[This local archive copy is mirrored from the canonical site: http://www.hti.umich.edu/misc/ssp/workshops/teixml-qs.html; links may not have complete integrity, so use the canonical document at this URL if possible.]
TEI and XML in Digital Libraries
June 30-July 1st, 1998
Questions and Resources for Working Groups
Group 1: Descriptive Metadata: MARC, AACR2, and the TEI Header
- Will the library's online catalog serve as the primary (or a primary) database for identifying electronic texts? If so, does this argue for finding a way to migrate header content to the catalog?
- Hypothesis: we can generate most of one record (MARC or header) from the other (header or MARC), and doing so will save time and effort and ensure greater reliability. Is this a reasonable hypothesis? Can we test it?
- Are there tools or resources available to and commonly used by libraries that would argue for a distinct approach to creating TEI Headers? (For example, is the library community more apt to take advantage of authority control sources, to use forms of name or entry, etc., because of the skill set and personnel?)
- Are TEI Headers more likely to be created by catalogers when the electronic text is created as part of a library-based text-encoding project?
- Are there approaches (such as punctuation or types of content) that are part of "library" work and that should make their way into the TEI Header? (For example, some library-based text-encoding projects put the MARC 245 statement of responsibility in the TEI Header's TITLE.)
- Are electronic texts created as part of a library-based text encoding project frequently or always cataloged?
- To what extent is a round-trip from MARC/AACR2 to TEI Header possible?
- To what extent is a one-way trip from MARC/AACR2 to TEI Header possible?
- To what extent is a one-way trip from the TEI Header to MARC/AACR2 possible?
- To what extent are these three conversions (round-trip, MARC/AACR2 to TEI Header, and TEI Header to MARC/AACR2) desirable?
- What are the impediments to creating guidelines for TEI Headers in libraries?
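The hypothesis and the round-trip questions above can be tested on a small scale by pushing records through a crosswalk and back and listing which fields survive. The sketch below is a minimal illustration under loudly stated assumptions: the three-entry crosswalk (`MARC_TO_TEI`), the helper names, and the sample record are all hypothetical, records are simplified to tag/value pairs, and subfields, indicators, and repeatable fields are ignored.

```python
"""Toy round-trip check: MARC -> TEI Header paths -> MARC.

The crosswalk below is hypothetical and deliberately tiny; it is not the
UM DLPS program or any published crosswalk.
"""

# Hypothetical crosswalk from MARC tags to TEI Header element paths.
MARC_TO_TEI = {
    "245": "fileDesc/titleStmt/title",
    "100": "fileDesc/titleStmt/author",
    "260": "fileDesc/sourceDesc/bibl/publisher",
}
TEI_TO_MARC = {path: tag for tag, path in MARC_TO_TEI.items()}


def marc_to_header(marc: dict[str, str]) -> dict[str, str]:
    """One-way trip: keep only the fields the crosswalk knows about."""
    return {MARC_TO_TEI[tag]: value for tag, value in marc.items() if tag in MARC_TO_TEI}


def header_to_marc(header: dict[str, str]) -> dict[str, str]:
    """Reverse trip: map header paths back to MARC tags."""
    return {TEI_TO_MARC[path]: value for path, value in header.items() if path in TEI_TO_MARC}


def round_trip_report(marc: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (preserved, lost) MARC tags after MARC -> header -> MARC."""
    back = header_to_marc(marc_to_header(marc))
    preserved = sorted(tag for tag in marc if back.get(tag) == marc[tag])
    lost = sorted(tag for tag in marc if back.get(tag) != marc[tag])
    return preserved, lost


if __name__ == "__main__":
    record = {  # hypothetical placeholder values, not a real catalog record
        "245": "Sample title",
        "100": "Author, Sample",
        "260": "Sample Press",
        "650": "Sample subject",
    }
    print(round_trip_report(record))
```

Here the 650 field is reported as lost only because this toy crosswalk has no entry for it; applied to a real sample of records and a real crosswalk, the same comparison would give a rough measure of how far a round trip is possible.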
Group 1 resources
- Draft Interim Guidelines for Cataloging Electronic Resources, Library of Congress, Cataloging Policy and Support Office. See especially sections B19.3, B19.4.1, B19.4.5, and B19.4.6.
- UVa Cataloging Procedures Manual, Chapter XII: Computer Files Cataloging
- UM Cataloging Procedure Manual for HTI
- ALCTS Committee on Cataloging: Description and Access, Task Force on Metadata and the Cataloging Rules, Final Report
- The University of Michigan Digital Library Production Service automatically generates TEI Headers from USMARC communications-format records (the Michigan MARC records) for the Making of America project and related activities. The program is not yet available for distribution; a sample TEI Header produced by the transformation is provided here.
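Since the Michigan program is not available, the following is only a generic sketch of what a MARC-to-header transformation involves; it is not that program. It assumes a hypothetical simplified record (MARC tag mapped to a dictionary of subfield codes) and builds a TEI Header skeleton with Python's standard `xml.etree.ElementTree`. The `strip_isbd` helper and the `include_resp` option are illustrative assumptions showing how the punctuation and statement-of-responsibility choices raised in Group 1 surface as code decisions.

```python
import xml.etree.ElementTree as ET

# Hypothetical simplified record: MARC tag -> {subfield code: value}.
Record = dict[str, dict[str, str]]


def strip_isbd(value: str) -> str:
    """Remove one trailing ISBD-style mark (' /', ' :', ' ;', ',') from a subfield."""
    value = value.rstrip()
    for mark in (" /", " :", " ;", ","):
        if value.endswith(mark):
            return value[: -len(mark)].rstrip()
    return value


def marc_to_tei_header(rec: Record, publisher: str, include_resp: bool = False) -> ET.Element:
    """Build a minimal teiHeader from a few MARC fields (illustrative only)."""
    header = ET.Element("teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")

    # titleStmt: 245$a, optionally followed by the 245$c statement of responsibility.
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    f245 = rec.get("245", {})
    title = strip_isbd(f245.get("a", ""))
    if include_resp and "c" in f245:
        title = f"{title} / {f245['c']}"
    ET.SubElement(title_stmt, "title").text = title
    if "100" in rec:
        ET.SubElement(title_stmt, "author").text = strip_isbd(rec["100"].get("a", ""))

    # publicationStmt describes the electronic edition, so it comes from the caller.
    pub_stmt = ET.SubElement(file_desc, "publicationStmt")
    ET.SubElement(pub_stmt, "publisher").text = publisher

    # sourceDesc describes the print source, taken from 245 and 260.
    bibl = ET.SubElement(ET.SubElement(file_desc, "sourceDesc"), "bibl")
    ET.SubElement(bibl, "title").text = strip_isbd(f245.get("a", ""))
    if "260" in rec:
        f260 = rec["260"]
        ET.SubElement(bibl, "pubPlace").text = strip_isbd(f260.get("a", ""))
        ET.SubElement(bibl, "publisher").text = strip_isbd(f260.get("b", ""))
        ET.SubElement(bibl, "date").text = f260.get("c", "").rstrip(".")
    return header


if __name__ == "__main__":
    sample = {  # hypothetical placeholder record
        "245": {"a": "Sample title /", "c": "by A. Author."},
        "100": {"a": "Author, A.,"},
        "260": {"a": "Sample City :", "b": "Sample Press,", "c": "1880."},
    }
    print(ET.tostring(marc_to_tei_header(sample, "Hypothetical Library"), encoding="unicode"))
```

Keeping the ISBD punctuation or the statement of responsibility is a policy choice rather than a technical one; the sketch exposes it as a parameter so that different projects could make different choices from the same records.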
Group 2: TEI Text Encoding: Library Application
- Are there principles of projects or encoding in a library that make the work different from work done in, for example, a scholarly project?
- Are there principles of projects or encoding in a library that make the work different from work done in, for example, a commercial project?
- Is there a need to establish a common subset of the TEI DTD and/or TEI application guidelines for library-based TEI encoding projects?
- Will it facilitate data sharing?
- Will it facilitate tool sharing?
- Will it contribute to interoperability (e.g., in searching across collections)?
- Will it provide guidance for new projects?
- Will it provide guidance for outsourcing?
- Will it aid in creating specifications for outsourcing?
- Is it possible to establish a common subset of the TEI DTD and/or TEI application guidelines?
- If the answer is "a common subset of the TEI DTD and/or TEI application guidelines is possible and desirable," should we create them?
- If the answer is "a common subset of the TEI DTD and/or TEI application guidelines is possible," how do we do it?
- Can a common subset of the TEI DTD and/or TEI application guidelines be developed informally?
- Is there a formal channel or group within the library community through which to develop the common subset of the TEI DTD and/or TEI application guidelines?
- If there is no formal channel or group within the library community, should there be one? Who creates it? How is it formed? Within what organization is it constituted?
- Who is "in charge"?
- What are some areas upon which to focus?
- Are there resources that should be tapped to begin this process?
- If the answer is "a common subset of the TEI DTD and/or TEI application guidelines is not possible or desirable," what then?
- What are the impediments to creating a common subset of the TEI DTD and/or TEI application guidelines for libraries?
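One practical use of an agreed subset, raised above under outsourcing specifications, is an automated check of vendor-delivered files. The sketch below assumes the files are well-formed XML and uses a short, hypothetical list of element names (`AGREED_SUBSET`); it is not CIC TEI Lite or any group's actual subset. It only checks element names; checking content models and attributes would need a DTD-aware validator.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical agreed subset of element names, for illustration only;
# a real list would come from the group's guidelines.
AGREED_SUBSET = {
    "TEI.2", "teiHeader", "fileDesc", "titleStmt", "title", "publicationStmt",
    "publisher", "sourceDesc", "bibl", "text", "body", "div1", "head", "p",
    "lg", "l", "pb",
}


def elements_outside_subset(xml_text: str, allowed: set[str]) -> Counter:
    """Count elements whose names are not in the allowed subset.

    Checks element names only; content models and attributes are not validated.
    """
    root = ET.fromstring(xml_text)
    return Counter(elem.tag for elem in root.iter() if elem.tag not in allowed)


if __name__ == "__main__":
    delivered = (  # hypothetical vendor file
        "<TEI.2><text><body><div1><head>Poem</head>"
        "<lg><l>First <hi>line</hi></l><l>Second <hi>line</hi></l></lg>"
        "<seg>note</seg></div1></body></text></TEI.2>"
    )
    for name, count in elements_outside_subset(delivered, AGREED_SUBSET).items():
        print(f"{name}: {count}")
```

A report like this gives a vendor and a library a shared, mechanical test of conformance, which is one concrete sense in which a common subset could facilitate both tool sharing and outsourcing.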
Group 2 Resources
Encoding Guidelines from Library-based Text-encoding projects
- University of Virginia Electronic Text Center: (guidelines | site) "These guidelines cover not only tags and their usage, but offer suggestions on the processing of electronic texts and related images. The guidelines assume that the text in question is already in some electronic format; information on OCR scanning is available elsewhere in the Etext Center cluster of WWW documents."
- University of Michigan Humanities Text Initiative: (manual | site) "This specialized resource has been developed to assist everyone who is working on the American Verse Project, with additions for Middle English. The Styleguide answers specific questions, such as 'What tags do I use on the titlepage of my text?' or 'How do I markup this bit of poetry?' Composed by Maria Bonn to anticipate many of the questions one might ask while tagging American verse texts."
- Indiana University Victorian Women Writers Project (site): General Guidelines | Front Matter | Verse | Drama | Back Matter and Proofreading
- CIC TEI Lite, preliminary guidelines for the common application of TEI Lite in CIC library-based text encoding projects.
- Please send suggestions for additions to this list to Chris Powell <sooty@umich.edu>
Group 3: Structural & Descriptive Metadata: Page-Image Conversion Projects
The focus in Group 3 is specifically on encoding structural and administrative metadata related to the capture, organization, and presentation of page images, with or without associated OCR. The organizers of this meeting assume that any "deeper" markup of such materials will draw on the TEI Guidelines for its encoding specifications (see Group 2, above).
- Is there a need to establish a common set of metadata elements for page image conversion projects?
- Would it facilitate data sharing?
- Would it facilitate tool sharing?
- Would it contribute to interoperability (e.g., in searching across collections)?
- Would it provide guidance for new projects?
- Would it provide guidance for outsourcing?
- Would it aid in creating specifications for outsourcing?
- If such a common set of metadata elements is useful, should the elements be marked up in SGML?
- If such a common set of metadata elements is useful, are there advantages to requiring outsourcing vendors to deliver metadata created in the capture process in SGML?
- Can the TEI be augmented to accommodate this metadata?
- Should the TEI be augmented to accommodate this metadata?
- Is there a formal channel or group within the library community through which to develop and maintain a common set of page image metadata elements?
- What are the challenges to effective storage by libraries and to delivery by vendors to libraries?
- What are the impediments to creating a common framework and set of data elements for structural metadata?
Group 3 Resources
The following table of elements captured in several recent conversion projects is organized by institution, project, or specification. It lists the form in which the authoritative version of the metadata is stored and the data elements captured or reflected. "Feature," in each case, is an element intended to reflect the functional role (e.g., "list of illustrations," "first table of contents page"); all other categories are assumed to be self-explanatory or undocumented. We are eager to expand this to include information from other projects. Please send submissions to Nigel Kerr <nigelk@umich.edu>.
| Institution | Storage form | Element 1 | Element 2 | Element 3 | Element 4 | Element 5 | Element 6 | Element 7 |
|---|---|---|---|---|---|---|---|---|
| CIHM/ECO | MS-Access | sequence | number (e.g., "i" or "2") | feature | file reference | rescan | | |
| Cornell MOA | modified Effect spec. | sequence | number (e.g., "i" or "2") | feature | file reference | | | |
| EBind | SGML | sequence | number (e.g., "i" or "2") | feature | file reference | unique identifier | "ext.ptr" | misc. global |
| JSTOR | modified Effect spec. | sequence | number (e.g., "i" or "2") | | file reference | | | |
| Oxford | SGML | sequence | number (e.g., "i" or "2") | feature | file reference | section/issue/article boundary | unique identifier | |
| UM MOA | SGML | sequence | number (e.g., "i" or "2") | feature | file reference | resolution (implied 600dpi) | source format (implied TIFF G4) | OCR confidence level |
Attributes currently supported by the TEI PB element are coded in red.
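The four elements shared by nearly every project in the table (sequence, number, feature, file reference) can be carried on a page-break element in the delivered markup. The sketch below is a minimal illustration, assuming one record per page image and XML output via Python's standard `xml.etree.ElementTree` (XML being a subset of SGML). `n` is a standard TEI attribute; `seq`, `ref`, and `ftr` are hypothetical attribute names chosen for illustration, and the file names are placeholders.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class PageImage:
    """One row of page-level metadata, mirroring the shared table columns."""
    seq: int        # position of the image in capture order
    number: str     # page number as printed, e.g. "i" or "2"; "" if unnumbered
    feature: str    # functional role, e.g. "list of illustrations"; "" if none
    file_ref: str   # file name of the page image (hypothetical naming scheme)


def page_break(page: PageImage) -> ET.Element:
    """Serialize one page as a PB element.

    `n` is a standard TEI attribute; `seq`, `ref`, and `ftr` are hypothetical
    extension attributes used here for illustration only.
    """
    attrs = {"seq": f"{page.seq:08d}", "ref": page.file_ref}
    if page.number:
        attrs["n"] = page.number
    if page.feature:
        attrs["ftr"] = page.feature
    return ET.Element("pb", attrs)


def page_breaks(pages: list[PageImage]) -> list[ET.Element]:
    """Emit PB elements in capture (sequence) order."""
    return [page_break(p) for p in sorted(pages, key=lambda p: p.seq)]


if __name__ == "__main__":
    pages = [
        PageImage(2, "", "", "00000002.tif"),
        PageImage(1, "i", "title page", "00000001.tif"),
    ]
    for pb in page_breaks(pages):
        print(ET.tostring(pb, encoding="unicode"))
```

Omitting `n` and `ftr` when they are empty, rather than writing empty strings, keeps unnumbered pages and pages without a functional role distinguishable from pages whose values were simply not captured.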