[This local archive copy is from the official and canonical URL, http://www.mediacenter.org/report.htm; please refer to the canonical source document if possible.]
To: API Grammarians
From: Tagging Team (Report prepared by Kathy Foley and Tom Johnson)
Re: Preliminary Thoughts on Journalism Tag Schema
What follows are the first-pass thoughts of the tagging sub-group of the API Grammarians who met in Dallas Jan. 7-8, 1999. Participants on the tag-team were: Kathy Foley (Editor, Information Services-San Antonio Express-News firstname.lastname@example.org ); Tom Johnson (Prof. of Journalism - San Francisco State University email@example.com ); Alan Karben ( Associate Director, Interactive Development, The Wall Street Journal Interactive Edition firstname.lastname@example.org ); B. C. Krishna (FutureTense - email@example.com ); Chris Ryan (Freedom Forum Fellow - University of Kansas ); Dennis Walsh ( Co-Head of the Interactive Media Center, Miami University of Ohio DPWALSH@miavx1.acs.muohio.edu ); Chris Willis (Senior Editor/Technology & Design - A. H. Belo Corp firstname.lastname@example.org ); Steve Yelvington ( Editor, Star-Tribune Online email@example.com ); MTK
Metadata: Metadata is, most generally, data that describes other data to enhance its usefulness. The catalog that emerged as an important component of the modern library is used as a canonical example of metadata, although there are many other well-developed examples within libraries, museums, corporations and other institutions that emphasize intellectual assets as a central part of their stock-in-trade. The development and maintenance of this metadata is, then, an essential activity for these institutions. They describe, keep track of, provide access to, and manage their collections by this means. (We recommend the document at http://www.csdl.tamu.edu/~marshall/dl98-making-metadata.html because it addresses situations in other industries and institutions exactly parallel to ours in the media.)
Metadata may not be universal in its scope. Some metadata is local and private, used only by data producers, managers or maintainers; for example, metadata relating to archiving or media production or distribution. Or metadata may pertain to a limited set of content users; for example, the metadata relating to the use of particular materials on a particular day or of a given subject by individuals using a particular access medium or service (e.g. the Web versus PalmPilot racing results).
Data Tags: Anyone who has used computer editing systems, created Web pages in HTML or even used WordPerfect or XyWrite word processing software is already familiar with the concept and usage of data tags. Data tags are to a markup language what letters and words are to a written language: the building blocks. A tag generally precedes a string of text and tells the computer to do something specific to whatever characters follow. When the instruction has been completed, a related tag is used to stop the action. For example, the tags around <ital>this phrase</ital> start and stop the italicization of the two words between them.
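As a minimal sketch of how software reads such paired tags, the fragment below uses Python's standard xml.etree.ElementTree module. The <ital> tag comes from the example above; the <story> wrapper element is an assumption, added only so the fragment is well-formed.

```python
import xml.etree.ElementTree as ET

# <ital> is the tag from the text; <story> is a hypothetical wrapper
# added only so the fragment has a single root element.
snippet = "<story>For example, around <ital>this phrase</ital> are tags.</story>"

root = ET.fromstring(snippet)
ital = root.find("ital")

before_tag = root.text   # text preceding the opening <ital> tag
inside_tag = ital.text   # text enclosed by the start and stop tags
after_tag = ital.tail    # text following the closing </ital> tag

print(repr(before_tag), repr(inside_tag), repr(after_tag))
```

The start tag and its matching stop tag mark exactly the span the instruction applies to; everything outside them is left alone.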
In computing and database terms, tags can have greater significance. They can be used to organize the creation, archiving, searching, sorting, retrieval, and communication of a variety of data and data types, so long as that data is stored in a digital format. Data tags can indicate which part of a story is the headline and which is the byline. Tags can be used to label the concepts or content of an audio, video or database file. Tags can also be used to annotate a file with meaning that is not itself present in the file's digital content. For example, a picture of Bill kissing Hillary could be tagged "love" or "devotion," though such data would not be automatically inherent in a graphic file.
Tags often reflect Parent-Child or Hierarchy-of-Meaning relationships. For example, a tag in a reporter's story marking a geo-spatial reference might simply point to the name of the city "Arlington." But Arlington alone is not sufficient because the term can be part of multiple hierarchies that extend from Universe-Solar System-Planet-Earth-Continent-North America-United States-Virginia [or Texas, Illinois, Maine, Wisconsin], Tarrant County, Municipality, ZIP, latitude-longitude-degree-minute. The advantage of using a tag to identify Arlington is that the software can be programmed to require the producer or editor to specify which Arlington. And once that location is specified, a digital tag dictionary/thesaurus can make the necessary hierarchical links that would permit "fuzzy" searching, as in "There's that city that starts with an 'A' in Texas that's near Ft. Worth...."
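The Arlington example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Place record, the GAZETTEER list and the resolve and fuzzy_search functions are hypothetical names, and only the Texas details (Tarrant County, near Fort Worth) come from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Place:
    name: str
    state: str
    county: Optional[str]
    near: Tuple[str, ...]


# Hypothetical gazetteer entries; a real tag dictionary would be far larger.
GAZETTEER = [
    Place("Arlington", "Texas", "Tarrant County", ("Fort Worth",)),
    Place("Arlington", "Virginia", None, ()),
    Place("Arlington", "Illinois", None, ()),
]


def resolve(name: str, state: Optional[str] = None) -> Place:
    """Return the single matching place, or refuse an ambiguous tag."""
    matches = [p for p in GAZETTEER
               if p.name == name and (state is None or p.state == state)]
    if len(matches) != 1:
        options = sorted(p.state for p in matches)
        raise ValueError(f"Specify which {name}: {options}")
    return matches[0]


def fuzzy_search(initial: str, state: str, near: str) -> List[Place]:
    """Find places by first letter, state and a nearby city."""
    return [p for p in GAZETTEER
            if p.name.startswith(initial)
            and p.state == state
            and near in p.near]
```

Calling resolve("Arlington") with no state raises an error, which models software that requires the editor to say which Arlington is meant, while fuzzy_search("A", "Texas", "Fort Worth") models the "city that starts with an 'A'" query.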
[CHRIS: DO WE NEED A SECTION HERE ON THE WHY'S AND WHEREFORE'S OF XML ???]
TAGS FOR JOURNALISM
In addition to the metatags for journalism, there appear to be three sub-sets. Sometimes, too, metatags will be duplicated in various ways in the sub-set tags. For example, a metatag referring to the size or location of the file could also show up as a tag necessary for the archiving of the content.
Fig. 1 Relationships of Journalism Content Tag Types
Often the tags for these sub-sets overlap, but each has specific uses. The three sub-sets are:
Process tags typically could include comments between writers, editors, producers, programmers or administrative personnel. They could pertain to publishing schedule or degree of content readiness. They could reflect an audit trail of changes and access rights. They can reflect embedded file types (e.g. a story or ad or announcement that links to an A/V file on the Web or an audio file that delivers driving directions over a cell phone).
Content tags tend to be much more specific, yet flexible enough to help differentiate between President Thomas Jefferson and Thomas Jefferson High School.
Digital journalism content is evolving to fall generally into three sub-sets: news/editorial; commercial (ads, transactions); and community (i.e. content generated and largely maintained by individuals or community groups, such as church announcements or Little League activities). The system we propose is malleable enough to accommodate all of these data types.
A well-designed tagging plan can help distinguish between the time and/or date a story was published and the time and/or date of an event. These tags can be used to mark up headlines, subheads, bylines and even such industry-specific components as ledes, nut grafs and kickers. The system, which uses a markup language called XML, is flexible enough that each newspaper, magazine or TV station can customize the tags to fit its unique newsroom vocabulary.
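The following sketch shows, under loudly stated assumptions, what such content tags might look like in XML and how software could tell the publication date from the event date. Every element name (story, published, event_date, lede, nut_graf, kicker), every sample value and the LOCAL_VOCABULARY mapping are hypothetical illustrations, not tags proposed by the sub-group.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Every element name and value here is hypothetical, not a proposed standard.
STORY_XML = """
<story>
  <headline>Sample headline</headline>
  <byline>Sample Reporter</byline>
  <published>1999-01-09</published>
  <event_date>1999-01-07</event_date>
  <lede>Opening sentence of the story.</lede>
  <nut_graf>Paragraph explaining why the story matters.</nut_graf>
  <kicker>Closing line.</kicker>
</story>
"""

# A newsroom that says "lead" or "intro" can map its own vocabulary
# onto the shared tag name.
LOCAL_VOCABULARY = {"lead": "lede", "intro": "lede", "nutgraph": "nut_graf"}


def get_part(story, local_name):
    """Look up a story component by a newsroom's own term for it."""
    tag = LOCAL_VOCABULARY.get(local_name, local_name)
    element = story.find(tag)
    return element.text if element is not None else None


story = ET.fromstring(STORY_XML)
published = date.fromisoformat(story.findtext("published"))
event_date = date.fromisoformat(story.findtext("event_date"))
days_after_event = (published - event_date).days
```

Because the two dates carry different tags, a search for "stories about events on Jan. 7" need not be confused with "stories published on Jan. 7." The vocabulary mapping is one possible way a newsroom could keep its own terms while still sharing tag names with others.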
On one hand, archive tags are tied to the special concerns of archivists and archive vendors such as Lexis-Nexis and MediaStream. But they also will be invaluable to content producers as media institutions come to realize that their archives are one of the few truly unique resources they have, especially as they pertain to local markets. Consequently, the ease of precision searching that facilitates, pardon the expression, re-purposing the content will drive P/L decisions. Typical archiving tags identify the source (byline, credit, publication, edition, page, section, zone), the type (news, feature, analysis, review, game story), or the relationship of the object to other objects (photo caption, graphics text, sidebars, series information).
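To illustrate how archive tags could support precision searching, here is a small Python sketch. The ArchiveRecord fields echo the source, type and relationship tags listed above, but the class, the search function and all sample records are hypothetical and stand in for whatever a real archive system would provide.

```python
from dataclasses import dataclass


@dataclass
class ArchiveRecord:
    headline: str
    byline: str
    publication: str
    section: str
    story_type: str        # e.g. news, feature, analysis, review, game story
    related_to: str = ""   # headline of a parent object, e.g. for a sidebar


def search(records, **criteria):
    """Return records whose tagged fields match every given criterion."""
    return [r for r in records
            if all(getattr(r, name) == value
                   for name, value in criteria.items())]


# Hypothetical sample records, for illustration only.
ARCHIVE = [
    ArchiveRecord("Main story", "Reporter A", "Sample Daily", "Metro", "news"),
    ArchiveRecord("Sidebar", "Reporter B", "Sample Daily", "Metro",
                  "analysis", related_to="Main story"),
    ArchiveRecord("Film review", "Reporter C", "Sample Daily", "Arts",
                  "review"),
]

reviews = search(ARCHIVE, story_type="review", section="Arts")
sidebars = search(ARCHIVE, related_to="Main story")
```

Because each record is tagged by type and by relationship, a search can return only reviews from one section, or only the sidebars that belong to a given story, rather than every item containing a keyword.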
Here are the suggested tags the sub-group came up with in a short time. While we have tried to group the tags in their principal family -- Process, Content or Archive -- a tag will often serve more than one master. It is important to understand, however, that the term standard as applied to these tags does not mean mandatory or constricted. The system proposed is of value primarily because it allows -- and perhaps even encourages -- flexibility and customization.
NOTE: The tags below are for illustrative purposes only; they do not approach a complete list. Everyone should feel free to suggest tags at all levels, keeping in mind the nested functions and cross usage. This formulation is ideally suited for Web presentation. For a model, see "The FGDC Content Standard for Digital Geospatial Metadata" http://www.its.nbs.gov/nbs/meta/meta.html
Journalism Content Metatags (preliminary):
Preliminary Journalism Process Tags
Preliminary Journalism Content Tags
Preliminary Journalism Archive Tags
ISSUES FOR THE IMMEDIATE FUTURE:
Brief History of Metadata and Content Tagging
Nov. 1998 API Media Center Conference: "Developing a Grammar for New Media." A proposal to create a news markup language for the Web may be the most influential development to emerge from a gathering of some of the finest creative minds from a broad spectrum of disciplines held Nov. 7-10, 1998 at The Media Center at the American Press Institute in Reston, VA. http://www.mediacenter.org/grammar
Jan. 1999 API Media Center Conference: "Grammar II: The Sequel," in Dallas. The intrepid grammarians attempt to develop a News Markup Language. http://www.mediacenter.org/nml.htm
Marshall, Catherine C. "Making Metadata: a study of metadata creation for a mixed physical-digital collection." ABSTRACT: Metadata is an important way of creating order in emerging distributed digital library collections. This paper presents an analysis of ethnographic data gathered in a university library's educational technology center as the staff develops metadata for a mixed physical-digital collection of visual resources. In particular, the paper explores issues associated with the application of standards, uncertain collection and metadata boundaries, distribution and responsibility, the types of description that arise in practice, and metadata temporality and scope. These issues help to characterize a problem space, and to explore the trade-offs collection maintainers must face when they create metadata for heterogeneous materials. http://www.csdl.tamu.edu/~marshall/dl98-making-metadata.html#table2
Rust, Godfrey. "Metadata: The Right Approach An Integrated Model for Descriptive and Rights Metadata in E-commerce." "There are currently four major active communities of rights-holders directly confronting these questions [involving digital content and metatags]: the DOI (Digital Object Identifier) community, at present based in the book and electronic publishing sector; the IFPI community of record companies; the ISAN community embracing producers, users, and rights owners of audiovisuals; and the CISAC community of collecting societies for composers and publishers of music, but also extending into other areas of authors' rights, including literary, visual, and plastic arts.... This paper examines three propositions that support the need for radical integration of metadata and rights management concerns for disparate and heterogeneous materials, and sets out a possible framework for an integrated approach. It draws on models developed in the CIS (Common Information System) plan and the DOI Rights Metadata group, and work on the ISRC (International Standard Recording Code), ISAN (International Standard Audiovisual Number), and ISWC standards and proposals." http://www.dlib.org/dlib/july98/rust/07rust.html#introduction
NOTE: "The Dublin Core" SITE IS THE JUMP STATION TO REVIEW THE MANY YEARS OF INTERNATIONAL WORK ALREADY COMPLETED ON TAGGING CONTENT.
The Dublin Core: A Simple Content Description Model for Electronic Resources : Metadata for Electronic Resources
The Dublin Core is a metadata element set intended to facilitate discovery of electronic resources. Originally conceived for author-generated description of Web resources, it has attracted the attention of formal resource description communities such as museums, libraries, government agencies, and commercial organizations.
The Dublin Core Workshop Series has gathered experts from the library world, the networking and digital library research communities, and a variety of content specialties in a series of invitational workshops. The building of an interdisciplinary, international consensus around a core element set is the central feature of the Dublin Core. The progress represents the emergent wisdom and collective experience of many stakeholders in the resource description arena. An open mailing list supports ongoing work. See: http://purl.oclc.org/dc
Metadata Related Tools: http://purl.oclc.org/dc/tools/index.htm
The Meta Data Coalition (formerly Metadata Coalition) brings together vendors and users allied with a common purpose of driving forward the definition, implementation and ongoing evolution of a meta data interchange format and its support mechanisms. The need for such standards arises as meta data, or the information about the enterprise data, emerges as a critical element in effective data management. Different tools, including data warehousing, distributed client/server computing, databases (relational, OLAP, OLTP...), integrated enterprise-wide applications, etc., must be able to cooperate and make use of meta data generated by each other. http://www.he.net/~metadata/index.html
Meta Data Interchange Specification (MDIS Version 1.1) This is version 1.1 of the Meta Data Interchange Specification. It is available here as a table of contents and complete downloadable copies in PDF and PostScript formats. This document is dated August 1, 1997 http://www.csdl.tamu.edu/~marshall/dl98-making-metadata.html#table2 The Table of Contents here offers a model for how we might structure our reports and presentation.
Geospatial Support Staff Metadata Tutorial Introduction
"In the beginning, one collected ... data without considering that somewhere, sometime, someone might ask -
Why was this data gathered?
What was collected?
Who collected it?
How was it collected?
How current is it?
Where is this data?
[Who has access rights to the data? Who has rights to change the data? Should the data be updated? Should it be referenced to other data? Should the data have a warning flag of any sort (e.g. potentially libelous)?]
"This brings us to the somewhat confusing business of collecting Metadata. When first faced with this nasty problem, the collection of Metadata can seem overwhelming. Fortunately, the Federal Geographic Data Committee (FGDC) has thought this out and has published a metadata content standard. So we know what to collect. Our job is to figure out what information would be most meaningful and best define our data sets using their standard. So let's begin by looking at the 10 Metadata sections." http://www.blm.gov/gis/meta/barney/meta1.html
XML Resources (Note that XML is a trademark held by M.I.T.):
Frequently Asked Questions about the Extensible Markup Language. Maintained on behalf of the World Wide Web Consortium's XML Special Interest Group, with contributions from many other members of the SIG as well as FAQ readers around the world. A site with reliable, straightforward answers. http://www.ucc.ie/xml
What the ?XML! may be one of the worst designed Web sites in existence, and Geocities a challenge to navigate, but be patient and drill down. The information is straightforward and helpful. This is the general site with the "Learn XML in 11.5 Minutes" document. http://www.geocities.com/SiliconValley/Peaks/5957/wxml.html
XML.COM is a Seybold Publications and O'Reilly & Associates venture that apparently aims to cover the XML world as it evolves. Good at staying up-to-date. http://www.xml.com
W3C XML This is the home page of the W3C XML Activity, part of the Architecture Domain. The XML Activity Statement explains the W3C's work on this topic in more detail. http://www.w3.org/XML
Microsoft's XML resource page. This section provides information on Microsoft's support of the Extensible Markup Language (XML) -- the universal format for data on the Web. XML allows developers to easily describe and deliver rich, structured data from any application in a standard, consistent way. XML does not replace HTML; rather, it is a complementary format. (Note that XML has its own newsgroup: microsoft.public.xml. You can use this newsgroup to get answers to your XML questions, learn what's new with XML, and find out what you can do using XML.) http://www.microsoft.com/xml/default.asp
Arbortext, a vendor "of open, XML-based software solutions that accelerate the process of creating, managing, and delivering product information in medium and large enterprises." Has NT products. Click down, however, to find the links to other online XML resources. http://www.arbortext.com