[This local archive copy mirrored from the canonical site: http://www.hti.umich.edu/misc/ssp/workshops/teixml-qs.html; links may not have complete integrity, so use the canonical document at this URL if possible.]

TEI and XML in Digital Libraries
June 30-July 1, 1998
Questions and Resources for Working Groups

Group 1: Descriptive Metadata: MARC, AACR2, and the TEI Header

  1. Will the library's online catalog serve as the primary (or a primary) database for identifying electronic texts? If so, does this argue for finding a way to migrate header content to the catalog?
  2. Hypothesis: We can generate most of one (MARC or header) from the other (header or MARC), and that doing so will save us time/energy and ensure greater reliability. Is this a reasonable hypothesis? Can we test it?
  3. Are there tools or resources available to and commonly used by libraries that would argue for a distinct approach to creating TEI Headers? (For example, is the library community more apt to take advantage of authority control sources, to use forms of name or entry, etc., because of the skill set and personnel?)
  4. Are TEI Headers more likely to be created by catalogers when the electronic text is created as part of a library-based text-encoding project?
  5. Are there approaches (such as punctuation or types of content) that are part of "library" work, that should make their way into the TEI Header? (For example, some library-based text encoding projects put the MARC 245 statement of responsibility in the TEI Header's TITLE.)
  6. Are electronic texts created as part of a library-based text encoding project frequently or always cataloged?
  7. To what extent is a round-trip from MARC/AACR2 to TEI Header possible?
  8. To what extent is a one-way trip from MARC/AACR2 to TEI Header possible?
  9. To what extent is a one-way trip from the TEI Header to MARC/AACR2 possible?
  10. To what extent are 7, 8, and 9 (above) desirable?
  11. What are the impediments to creating guidelines for TEI Headers in libraries?
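
As a way of making the one-way trip from MARC to the TEI Header (questions 5 and 8) concrete, the following is a minimal sketch rather than a tested crosswalk. The dictionary layout of the record, the sample values, and the function names are all hypothetical; it assumes that the MARC record describes the print source, maps only 100 $a, 245 $a/$c, and 260 $a/$b/$c, and omits the publicationStmt that a complete header would need.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified MARC record: tag -> {subfield code: value}.
# Real records also carry indicators, repeated fields, and more subfields.
marc_record = {
    "100": {"a": "Austen, Jane, 1775-1817."},
    "245": {"a": "Pride and prejudice /", "c": "by Jane Austen."},
    "260": {"a": "London :", "b": "T. Egerton,", "c": "1813."},
}


def strip_isbd(value):
    """Drop trailing ISBD separators (space, slash, colon, semicolon, comma)."""
    return value.rstrip(" /:;,")


def marc_to_tei_header(record, include_responsibility=False):
    """Build a partial teiHeader element from the simplified record above."""
    header = ET.Element("teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")

    # titleStmt: title from 245 $a, optionally with the 245 $c
    # statement of responsibility appended.
    f245 = record.get("245", {})
    title_text = strip_isbd(f245.get("a", ""))
    if include_responsibility and "c" in f245:
        title_text = f"{title_text} / {f245['c']}"
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = title_text

    # author from the 100 $a main entry, kept in its catalog form of name.
    author = record.get("100", {}).get("a")
    if author:
        ET.SubElement(title_stmt, "author").text = author

    # sourceDesc: imprint from 260, assuming the record describes the print source.
    f260 = record.get("260", {})
    bibl = ET.SubElement(ET.SubElement(file_desc, "sourceDesc"), "bibl")
    for code, tag in (("a", "pubPlace"), ("b", "publisher"), ("c", "date")):
        if code in f260:
            ET.SubElement(bibl, tag).text = strip_isbd(f260[code])
    return header


header = marc_to_tei_header(marc_record, include_responsibility=True)
print(ET.tostring(header, encoding="unicode"))
```

The include_responsibility flag corresponds to the practice mentioned in question 5 of carrying the 245 statement of responsibility into the header's title. The reverse trip would have to restore the ISBD punctuation that strip_isbd discards, which is one reason a full round-trip (question 7) is harder than either one-way conversion.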

Group 1 Resources


Group 2: TEI Text Encoding: Library Application

  1. Are there principles of projects or encoding in a library that make the work different from work done in, for example, a scholarly project?
  2. Are there principles of projects or encoding in a library that make the work different from work done in, for example, a commercial project?
  3. Is there a need to establish a common subset of the TEI DTD and/or TEI application guidelines for library-based TEI encoding projects?
    1. Will it facilitate data sharing?
    2. Will it facilitate tool sharing?
    3. Interoperability (e.g., in searching across collections)?
    4. Guidance for new projects?
    5. Guidance for outsourcing?
    6. Specifications for outsourcing?
  4. Is it possible to establish a common subset of the TEI DTD and/or TEI application guidelines?
  5. If the answer is "a common subset of the TEI DTD and/or TEI application guidelines is possible and desirable," should we create them?
  6. If the answer is "a common subset of the TEI DTD and/or TEI application guidelines is possible," how do we do it?
    1. Can a common subset of the TEI DTD and/or TEI application guidelines be developed informally?
    2. Is there a formal channel or group within the library community through which to develop the common subset of the TEI DTD and/or TEI application guidelines?
    3. If there is no formal channel or group within the library community, should there be one? Who creates it? How is it formed? In what organization is it constituted?
    4. Who is "in charge"?
    5. What are some areas upon which to focus?
    6. Are there resources that should be tapped to begin this process?
  7. If the answer is "a common subset of the TEI DTD and/or TEI application guidelines is not possible or desirable," what then?
  8. What are the impediments to creating a common subset of the TEI DTD and/or TEI application guidelines for libraries?

Group 2 Resources

Encoding Guidelines from Library-based Text-encoding projects


Group 3: Structural & Descriptive Metadata: Page-Image Conversion Projects

The focus in Group 3 is specifically on encoding structural and administrative metadata related to the capture, organization, and presentation of page images, with or without associated OCR. It is assumed by the organizers of this meeting that markup at a "deeper" level for such materials will draw on the TEI Guidelines for encoding specifications (see Group 2, above).
  1. Is there a need to establish a common set of metadata elements for page image conversion projects?
    1. Would it facilitate data sharing?
    2. Would it facilitate tool sharing?
    3. Would it contribute to interoperability (e.g., in searching across collections)?
    4. Would it provide guidance for new projects?
    5. Would it provide guidance for outsourcing?
    6. Would it aid in creating specifications for outsourcing?
  2. If such a common set of metadata elements is useful, should they be marked up as SGML?
  3. If such a common set of metadata elements is useful, are there advantages to requiring outsourcing vendors to deliver metadata created in the capture process in SGML?
  4. Can the TEI be augmented to accommodate this metadata?
  5. Should the TEI be augmented to accommodate this metadata?
  6. Is there a formal channel or group within the library community through which to develop and maintain a common set of page image metadata elements?
  7. What are the challenges for effective storage by libraries and delivery by vendors to libraries?
  8. What are the impediments to creating a common framework and set of data elements for structural metadata?

Group 3 Resources

The following table of elements captured in several recent conversion projects is organized by institution, project, or specification. It lists the form in which the authoritative version of the metadata is stored and the data elements captured or reflected. "Feature," in each case, is an element intended to reflect the functional role (e.g., "list of illustrations," "first table of contents page"); all other categories are assumed to be self-explanatory or undocumented. We are eager to expand this to include information from other projects. Please send submissions to Nigel Kerr <nigelk@umich.edu>.

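| Institution | Storage form          | data1    | data2                     | data3   | data4          | data5                          | data6                           | data7                |
|-------------|-----------------------|----------|---------------------------|---------|----------------|--------------------------------|---------------------------------|----------------------|
| CIHM/ECO    | MS-Access             | sequence | number (e.g., "i" or "2") | feature | file reference | rescan                         |                                 |                      |
| Cornell MOA | modified Effect spec. | sequence | number (e.g., "i" or "2") | feature | file reference |                                |                                 |                      |
| EBind       | SGML                  | sequence | number (e.g., "i" or "2") | feature | file reference | unique identifier              | "ext.ptr"                       | misc. global         |
| JSTOR       | modified Effect spec. | sequence | number (e.g., "i" or "2") |         | file reference |                                |                                 |                      |
| Oxford      | SGML                  | sequence | number (e.g., "i" or "2") | feature | file reference | section/issue/article boundary | unique identifier               |                      |
| UM MOA      | SGML                  | sequence | number (e.g., "i" or "2") | feature | file reference | resolution (implied 600dpi)    | source format (implied TIFF G4) | OCR confidence level |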

Attributes currently supported in the TEI PB element are coded in red.
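
To illustrate the elements that most of the projects in the table share (sequence, number, feature, file reference), here is a minimal sketch of a page-image record and one possible rendering as a TEI-style PB tag. The class name, field names, file names, and the "p0001" identifier pattern are hypothetical, and the mapping of the page number to the n attribute and of a generated identifier to the id attribute is an assumption made for illustration, not a mapping taken from any of the listed projects.

```python
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import quoteattr


@dataclass
class PageImage:
    sequence: int           # position of the image in capture order
    number: Optional[str]   # page number as printed, e.g. "i" or "2"; None if unnumbered
    feature: Optional[str]  # functional role, e.g. "list of illustrations"
    file_ref: str           # reference to the page-image file (hypothetical names below)


def to_pb(page):
    """Render one record as a <pb/> tag (an assumed mapping, see lead-in)."""
    attrs = {"id": f"p{page.sequence:04d}"}
    if page.number is not None:
        attrs["n"] = page.number
    rendered = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs.items())
    return f"<pb {rendered}/>"


# Sample records, deliberately out of order to show that sequence drives ordering.
pages = [
    PageImage(2, "ii", "first table of contents page", "00000002.tif"),
    PageImage(1, "i", None, "00000001.tif"),
    PageImage(3, None, "list of illustrations", "00000003.tif"),
]

for page in sorted(pages, key=lambda p: p.sequence):
    print(to_pb(page), page.file_ref, page.feature or "")
```

In this sketch the feature and file reference have no home in the tag itself, which is one way of seeing why questions 4 and 5 ask whether the TEI can or should be augmented to carry this metadata.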