[Cache from http://sunsite.berkeley.edu/MOA2/papers/dtdtutorial2.htm; please use this canonical URL/source if possible.]


MOA2 Digital Object Document Type Definition Tutorial

Introduction

An XML Document Type Definition has been created for MOA2 digital objects. This DTD provides a means of encoding the various descriptive, administrative and structural metadata for all electronic versions of a particular archival object.

An MOA2 digital object consists of four major sections:

  1. Descriptive metadata. The descriptive metadata section may point to external descriptive metadata (such as a finding aid or MARC record), or it may itself contain embedded descriptive metadata, or both.
  2. File inventory. The file inventory section lists all of the files comprising all electronic versions of the archival object, with the files grouped together by version.
  3. Administrative metadata. The administrative metadata section provides information regarding how the files were created and stored, intellectual property rights, and the source of the files.
  4. Structural map. The structural map provides one or more hierarchical descriptions of the original archival object’s structure (either logical or physical structure), and provides pointers from locations within those hierarchies to the files which comprise the various electronic versions.
A more detailed explanation of each section, and of how the sections relate to one another, follows.

The examples in this tutorial are taken from a simplified version of the MOA2 encoding of the Breen Diary (from the collection of The Bancroft Library). This simplified Breen Diary consists of two pages of the diary proper, followed by a two-page letter that has been inserted at the end of the diary. You may view the entire XML source for the sample document. Each section below is also directly linked to the pertinent section of the source XML file. You can also activate the sample document in the MOA2 Viewer.
 

Descriptive Metadata

The descriptive metadata section of an MOA2 XML object (the <DescMD> element) may refer to an external source of descriptive metadata, may itself contain embedded descriptive metadata, or both. References to external descriptive metadata appear in <DMDRef> elements. Embedded descriptive metadata appears within a <DMD> element.

External Descriptive Metadata: Descriptive Metadata Reference. A descriptive metadata reference element (<DMDRef>) simply provides the URI for an external source of descriptive metadata. For example, the descriptive metadata reference below points to an external finding aid:

<DescMD>
<DMDRef LOCTYPE='URL' DMDTYPE='FINDAID'>http://sunsite2.berkeley.edu/cgi-bin/oac/calher/breen
</DMDRef>
</DescMD>

This <DMDRef> carries two attributes. LOCTYPE specifies the type of URI being provided (PURL, HANDLE, DOI and PDI would be other options). DMDTYPE identifies the type of descriptive metadata being referred to: MARC record, FINDAID, RDF, PICS or OTHER. Additional supported attributes allow you to specify the MIMETYPE of the external descriptive metadata and a LABEL that can be used to identify the available descriptive metadata to the user.
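
As an illustration of how a program might read such a reference, the sketch below parses the <DescMD> fragment shown above with Python's standard xml.etree.ElementTree module. The helper name external_metadata_refs and the dictionary keys are assumptions made for this example; they are not defined by the MOA2 DTD.

```python
import xml.etree.ElementTree as ET

# The <DescMD> fragment shown above, reproduced as a string.
DESC_MD = """<DescMD>
<DMDRef LOCTYPE='URL' DMDTYPE='FINDAID'>http://sunsite2.berkeley.edu/cgi-bin/oac/calher/breen
</DMDRef>
</DescMD>"""


def external_metadata_refs(desc_md_xml):
    """Return one dict per <DMDRef> (hypothetical helper, not part of MOA2)."""
    root = ET.fromstring(desc_md_xml)
    refs = []
    for ref in root.iter("DMDRef"):
        refs.append({
            "uri": (ref.text or "").strip(),   # drop the newline before </DMDRef>
            "loctype": ref.get("LOCTYPE"),
            "dmdtype": ref.get("DMDTYPE"),
            "mimetype": ref.get("MIMETYPE"),   # None when the attribute is absent
            "label": ref.get("LABEL"),         # None when the attribute is absent
        })
    return refs


refs = external_metadata_refs(DESC_MD)
print(refs[0]["dmdtype"], refs[0]["uri"])
```

Because the URI is element content rather than an attribute, the surrounding whitespace has to be stripped before the value is used.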

Click here to view the entire descriptive metadata section of the simplified encoding for the Breen Diary.

Embedded Descriptive Metadata. Embedded descriptive metadata appearing under a <DMD> element can either use generic descriptive metadata elements defined in the DTD, or another user-defined text format (e.g., MARC, Dublin Core) enclosed in a wrapper. The generic descriptive metadata elements provided for by the DTD are grouped under a <GDM> element. These are closely related to the descriptive metadata fields supported by GenDB, a database designed and used at UC Berkeley to gather the metadata needed to construct both EAD and MOA2 objects. (If GenDB is used, a program will automatically create the MOA2 XML objects from the database.) The core descriptive metadata elements include title, date, caption, dimensions, and material origin. In addition to the core descriptive metadata elements, the following elements are supported: administrative information, alternate date, content, creator, general notes, physical description, related materials, subobject source, and subject.

Note that a <GDM> element includes an ID attribute. This attribute provides a unique, internal name for each <GDM> element, which can be used in the <StructMap> to link a particular division of the document hierarchy to a particular <GDM> element. This allows specific sections of descriptive metadata to be linked to specific parts of the digital object. In other words, a <GDM> element may pertain to the entire digital object described by an MOA2 XML file or to just a portion of it. For example, in the case of the Breen Diary it would be possible to set up one <GDM> element that pertained to the Breen Diary as a whole, and a second <GDM> element that pertained only to the letter appended to the end of the diary.
 

File Inventory

The file inventory section consists of one or more <FileGrp> elements used to group together related files. A <FileGrp> lists all of the files which comprise a single electronic version of the archival object. For example, there might be separate <FileGrp> elements for the thumbnails, the master archival images, the PDF versions, the TEI-encoded text versions, etc.

Consider the first <FileGrp> from the simplified encoding of the Breen Diary:

<FileGrp VERSDATE ='12/4/1998'>
    <File ID ='FID1' MIMETYPE ='text/sgml' SEQ = '1' CREATED = '12/4/1998' ADMID = 'ADM4 ADM4 ADM6'
    GROUPID = 'GID1' USE = 'ARCHIVE'>
        <FLocat LOCTYPE = 'URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/sgml/breen2.sgm
        </FLocat>
    </File>
</FileGrp>

The <FileGrp> above represents the single file containing the SGML-encoded text transcription of the Breen Diary. The <FileGrp> tag contains a VERSDATE attribute, which provides the date the SGML version was created. It could also include an ADMID attribute, which would list the IDs of the sections within the administrative metadata portion of the document that apply to all the files in the file group. However, the ADMID information may also be supplied, as it is here, at the <File> element level. The <FileGrp> here contains a single <File> element, which identifies the one file in this file group. Its attributes include such information as the MIME type of the file and its intended use. The <File> element in turn contains a <FLocat> (file location) element. The <FLocat> provides a network location for the file (in this case, a URL), and its LOCTYPE attribute specifies whether this location is a URL, PURL, URN, etc.

A more complicated <FileGrp> from the simplified Breen Diary is shown below. It aggregates all of the <File> elements that represent medium-resolution JPEG versions of the diary. Within the top-level <FileGrp>, the <File> elements are divided between two nested <FileGrp> elements. The first nested <FileGrp> represents the medium-resolution JPEGs of the diary pages; the second represents the medium-resolution JPEGs of the letter pages.

<FileGrp VERSDATE ='4/3/1998'>
    <FileGrp>
        <File ID ='FID6' MIMETYPE='image/jpg' SEQ = '1' X ='512' Y = '768' UNIT = 'PIXELS' CREATED = '4/3/1998'
        ADMID ='ADM2 ADM4 ADM11' GROUPID = 'GID2' USE = 'REFERENCE'>
            <FLocat LOCTYPE = 'URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/figures/I0018236B.jpg
            </FLocat>
        </File>
        <File ID ='FID7' MIMETYPE='image/jpg' SEQ = '2' X ='512' Y = '768' UNIT = 'PIXELS' CREATED = '4/3/1998'
        ADMID ='ADM2 ADM4 ADM12' GROUPID = 'GID3' USE = 'REFERENCE'>
            <FLocat LOCTYPE = 'URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/figures/I0018237B.jpg
            </FLocat>
        </File>
    </FileGrp>
    <FileGrp>
        <File ID ='FID8' MIMETYPE='image/jpg' SEQ = '3' X ='512' Y = '768' UNIT = 'PIXELS' CREATED = '4/3/1998'
        ADMID ='ADM2 ADM4 ADM9' GROUPID = 'GID31' USE = 'REFERENCE'>
            <FLocat LOCTYPE = 'URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/figures/I0018266B.jpg
            </FLocat>
        </File>
        <File ID ='FID9' MIMETYPE='image/jpg' SEQ = '4' X ='512' Y = '768' UNIT = 'PIXELS' CREATED = '4/3/1998'
        ADMID ='ADM2 ADM4 ADM10' GROUPID = 'GID32' USE = 'REFERENCE'>
            <FLocat LOCTYPE = 'URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/figures/I0018267B.jpg
            </FLocat>
        </File>
    </FileGrp>
</FileGrp>

Note that the <File> element contains an ID attribute. This attribute provides a unique, internal name for this file which can be referenced by other portions of the document. You’ll see this type of referencing in action when we look at the Structural Map Section.
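
To make the role of the ID attribute concrete, the following sketch builds a lookup table from <File> IDs to file locations, again using Python's xml.etree.ElementTree. The sample string is an abbreviated copy of the nested <FileGrp> above (two of the four files), and the function name index_files is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the nested <FileGrp> above (two of the four files).
FILE_INVENTORY = """<FileGrp VERSDATE='4/3/1998'>
    <FileGrp>
        <File ID='FID6' MIMETYPE='image/jpg' SEQ='1' X='512' Y='768' UNIT='PIXELS'
        ADMID='ADM2 ADM4 ADM11' GROUPID='GID2' USE='REFERENCE'>
            <FLocat LOCTYPE='URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/figures/I0018236B.jpg
            </FLocat>
        </File>
    </FileGrp>
    <FileGrp>
        <File ID='FID8' MIMETYPE='image/jpg' SEQ='3' X='512' Y='768' UNIT='PIXELS'
        ADMID='ADM2 ADM4 ADM9' GROUPID='GID31' USE='REFERENCE'>
            <FLocat LOCTYPE='URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/figures/I0018266B.jpg
            </FLocat>
        </File>
    </FileGrp>
</FileGrp>"""


def index_files(file_grp_xml):
    """Map each <File> ID to its MIME type, use and location (hypothetical helper)."""
    root = ET.fromstring(file_grp_xml)
    index = {}
    # iter() walks the whole subtree, so nested <FileGrp> levels are flattened.
    for file_elem in root.iter("File"):
        flocat = file_elem.find("FLocat")
        index[file_elem.get("ID")] = {
            "mimetype": file_elem.get("MIMETYPE"),
            "use": file_elem.get("USE"),
            "location": (flocat.text or "").strip() if flocat is not None else None,
        }
    return index


index = index_files(FILE_INVENTORY)
print(index["FID8"]["location"])
```

Flattening the nested groups is a simplification chosen for this sketch; a fuller implementation would probably also record which <FileGrp> each file belongs to.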

Click here to view the entire file inventory of the sample Breen Diary.
 

Administrative Metadata

<AdminMD> elements contain the administrative metadata pertaining to the files comprising an MOA2 document. There are three main forms of administrative metadata that are provided for: file management information (<FileMgmt> element), intellectual property rights information (<Rights> element), and information regarding the original source of the electronic files referred to by the document (<Source> element). Multiple instances of each of these types of information may occur within a single document.

An example of file management information for an image file associated with the Breen Diary appears below:

<AdminMD ID='ADM2'>
    <FileMgmt>
        <Image>
            <Compression>JPEG
            </Compression>
            <BitDepth BITS='24' />
            <ColorSpace>RGB
            </ColorSpace>
            <Resolution>90
            </Resolution>
        </Image>
    </FileMgmt>
</AdminMD>

Note that each administrative metadata section (<AdminMD>) has an ID attribute. In the sample above it is "ADM2". This ID attribute allows the <AdminMD> element to be linked to particular files or file groups. For example, the <File> element below links to the <AdminMD> element shown above, as well as to two additional <AdminMD> elements: one containing rights information and one containing source information.

<File ID ='FID7' MIMETYPE='image/jpg' SEQ = '2' X ='512' Y = '768' UNIT = 'PIXELS' CREATED = '4/3/1998'
ADMID ='ADM2 ADM4 ADM12' GROUPID = 'GID3' USE = 'REFERENCE'>
    <FLocat LOCTYPE = 'URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/figures/I0018237B.jpg
    </FLocat>
</File>

Notice that the <File> tag has an ADMID attribute, the first item in which is the name ADM2, providing the link to the <AdminMD> element above. Note that if a particular <AdminMD> element pertains to all of the files in a <FileGrp>, the pertinent ADMID attribute can be specified at the <FileGrp> level rather than as an attribute of each <File> in the <FileGrp>.

You’ll note that the <File> tag also has ADMID names of ADM4 and ADM12. If you examine the XML for the  simplified Breen Diary, you’ll find the administrative metadata sections carrying these names. These sections provide additional administrative metadata describing the files in this file group. Click here to view the <Rights> administrative metadata (ADM4). Click here to view the <Source> administrative metadata (ADM12).
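
The ADMID linkage described above can be sketched in Python as follows. Several things here are assumptions made for illustration only: the <Wrapper> root element, the empty <Rights /> and <Source /> placeholders (their real contents are not reproduced), the choice to place ADM4 at the <FileGrp> level, and the rule that a file's effective ADMID list is the union of the IDs on its enclosing <FileGrp> elements and on the <File> itself.

```python
import xml.etree.ElementTree as ET

# <Wrapper> is a hypothetical root; <Rights /> and <Source /> are placeholders.
SAMPLE = """<Wrapper>
  <FileGrp VERSDATE='4/3/1998' ADMID='ADM4'>
    <File ID='FID7' MIMETYPE='image/jpg' ADMID='ADM2 ADM12'>
      <FLocat LOCTYPE='URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/figures/I0018237B.jpg</FLocat>
    </File>
  </FileGrp>
  <AdminMD ID='ADM2'>
    <FileMgmt><Image><Compression>JPEG</Compression><BitDepth BITS='24' /></Image></FileMgmt>
  </AdminMD>
  <AdminMD ID='ADM4'><Rights /></AdminMD>
  <AdminMD ID='ADM12'><Source /></AdminMD>
</Wrapper>"""


def effective_admin_ids(file_elem, parents):
    """Collect ADMID tokens from enclosing <FileGrp> elements, then the <File>."""
    chain = []
    node = file_elem
    while node is not None:          # walk up to the root
        chain.append(node)
        node = parents.get(node)
    ids = []
    for node in reversed(chain):     # outermost element first
        if node.tag in ("FileGrp", "File"):
            for token in node.get("ADMID", "").split():
                if token not in ids:
                    ids.append(token)
    return ids


root = ET.fromstring(SAMPLE)
# ElementTree keeps no parent pointers, so build a child-to-parent map.
parents = {child: parent for parent in root.iter() for child in parent}
admin_by_id = {adm.get("ID"): adm for adm in root.iter("AdminMD")}

file_elem = root.find(".//File[@ID='FID7']")
ids = effective_admin_ids(file_elem, parents)
kinds = [admin_by_id[admin_id][0].tag for admin_id in ids]
print(list(zip(ids, kinds)))
```

ADMID is a whitespace-separated list of IDs, so splitting on whitespace yields the individual references; each one is then looked up among the <AdminMD> elements by its ID attribute.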

Click here to view the entire Administrative Metadata section of the sample XML document.
 

Structural Map

The structural map section of an MOA2 object defines a hierarchical structure (or structures) which will eventually be presented to users of the electronic archival object to allow them to navigate through it. The <StructMap> element encodes this hierarchy as a nested series of <div> elements. Each <div> carries attribute information specifying what kind of division it is, and may also contain multiple file pointer (<fptr>) elements. File pointers specify files (or in some cases, locations within files) that correspond to the portion of the hierarchy represented by the <div>.

To get a sense of the information encoded in <div> elements, consider the following <div> element for the first entry in the Breen Diary:

<div N = '1' TYPE = 'entry' LABEL = 'Friday Nov. 20th 1846'>
    <fptr FILEID = 'FID2' MIMETYPE = 'image/tif' />
    <fptr FILEID = 'FID6' MIMETYPE = 'image/jpg' />
    <fptr FILEID = 'FID10' MIMETYPE = 'image/GIF' />
    <fptr FILEID = 'FID1' MIMETYPE = 'text/sgml' TAGID = 'entry1' />
</div>

The type of object represented by this <div> is a diary entry (TYPE='entry'), and the entry has a label which should be displayed to the user ('Friday Nov. 20th 1846'). The <fptr> elements specify the files that correspond to this level of the hierarchy: a master TIFF file, a JPEG file, a GIF file, and an SGML file containing a transcription. The FILEID attributes in the <fptr> elements link to the corresponding <File> elements in the file inventory portion of the MOA2 XML document. To see the medium-resolution JPEG image associated with the "Friday Nov. 20th" entry in the diary, for example, you would look at the <File> element with the ID attribute of 'FID6'.

Note that in the case of the SGML file (see the last <fptr> element in the example above), one additional piece of information is provided as an attribute: a TAGID ('entry1'). This indicates that within the file identified in this document by the <File> element 'FID1', you should find an SGML element with the ID attribute value 'entry1'. This element within the SGML document marks the beginning of the diary entry in question.
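
The FILEID and TAGID lookups described above might be sketched as follows. The <Wrapper> root element, the trimmed attribute lists, and the helper name resolve_pointers are assumptions made for this example; only two of the four file pointers are included to keep the sample short.

```python
import xml.etree.ElementTree as ET

# <Wrapper> is a hypothetical root holding a cut-down inventory and one <div>.
SAMPLE = """<Wrapper>
  <FileGrp VERSDATE='12/4/1998'>
    <File ID='FID1' MIMETYPE='text/sgml' USE='ARCHIVE'>
      <FLocat LOCTYPE='URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/sgml/breen2.sgm</FLocat>
    </File>
  </FileGrp>
  <FileGrp VERSDATE='4/3/1998'>
    <File ID='FID6' MIMETYPE='image/jpg' USE='REFERENCE'>
      <FLocat LOCTYPE='URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/figures/I0018236B.jpg</FLocat>
    </File>
  </FileGrp>
  <div N='1' TYPE='entry' LABEL='Friday Nov. 20th 1846'>
    <fptr FILEID='FID6' MIMETYPE='image/jpg' />
    <fptr FILEID='FID1' MIMETYPE='text/sgml' TAGID='entry1' />
  </div>
</Wrapper>"""


def resolve_pointers(div, locations):
    """Turn each <fptr> in a <div> into a location plus optional TAGID."""
    resolved = []
    for fptr in div.findall("fptr"):
        file_id = fptr.get("FILEID")
        resolved.append({
            "file_id": file_id,
            "location": locations.get(file_id),  # None if the ID is unknown
            "tag_id": fptr.get("TAGID"),         # None for whole-file pointers
        })
    return resolved


root = ET.fromstring(SAMPLE)
locations = {
    f.get("ID"): f.findtext("FLocat", default="").strip()
    for f in root.iter("File")
}
pointers = resolve_pointers(root.find("div"), locations)
for pointer in pointers:
    print(pointer["file_id"], pointer["location"], pointer["tag_id"])
```

A viewer would then fetch the located file and, when a TAGID is present, position itself at the element carrying that ID inside the SGML file. That second step is not shown here.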

To get a sense of the hierarchical structure that can be encoded in a <StructMap> we need to look at the entire <StructMap> from the sample document.

<StructMap TYPE='logical'>
    <div N = '1' TYPE = 'diary' LABEL = '[Patrick Breen Diary November 20, 1846 - March 1, 1847]'>
        <div N = '1' TYPE = 'entry' LABEL = 'Friday Nov. 20th 1846'>
            <fptr FILEID = 'FID2' MIMETYPE = 'image/tif' />
            <fptr FILEID = 'FID6' MIMETYPE = 'image/jpg' />
            <fptr FILEID = 'FID10' MIMETYPE = 'image/GIF' />
            <fptr FILEID = 'FID1' MIMETYPE = 'text/sgml' TAGID = 'entry1' />
        </div>
        <div N = '2' TYPE = 'entry' LABEL = 'sat. 21st'>
            <fptr FILEID = 'FID3' MIMETYPE = 'image/tif' />
            <fptr FILEID = 'FID7' MIMETYPE = 'image/jpg' />
            <fptr FILEID = 'FID11' MIMETYPE = 'image/GIF' />
            <fptr FILEID = 'FID1' MIMETYPE = 'text/sgml' TAGID = 'entry2' />
        </div>
        <div N = '1' TYPE = 'letter' LABEL = 'Letter by George McKinstry, tipped into original diary'>
            <div N = '1' TYPE = 'page' LABEL = 'Letter, G. McKinstry, page 1'>
                <fptr FILEID = 'FID4' MIMETYPE = 'image/tif' />
                <fptr FILEID = 'FID8' MIMETYPE = 'image/jpg' />
                <fptr FILEID = 'FID12' MIMETYPE = 'image/GIF' />
                <fptr FILEID = 'FID1' MIMETYPE = 'text/sgml' TAGID = 'GMletter1' />
            </div>
            <div N = '2' TYPE = 'page' LABEL = 'Letter, G. McKinstry, Page 2'>
                <fptr FILEID = 'FID5' MIMETYPE = 'image/tif' />
                <fptr FILEID = 'FID9' MIMETYPE = 'image/jpg' />
                <fptr FILEID = 'FID13' MIMETYPE = 'image/GIF' />
                <fptr FILEID = 'FID1' MIMETYPE = 'text/sgml' TAGID = 'GMletter2' />
            </div>
        </div>
    </div>
</StructMap>
 

This structural map indicates that the document has a three-level hierarchy: it is a 'diary' with two 'entry' components (or <div> elements) and one 'letter' component. (The letter is appended at the end of the diary.) The 'letter' component has, in turn, two 'page' components.
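
The three-level reading above can be checked with a short recursive walk over the <div> elements. This is a minimal sketch: the sample is the <StructMap> above with the file pointers reduced to one per division, and the function name outline is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# The <StructMap> above, with file pointers reduced to one per division.
STRUCT_MAP = """<StructMap TYPE='logical'>
    <div N='1' TYPE='diary' LABEL='[Patrick Breen Diary November 20, 1846 - March 1, 1847]'>
        <div N='1' TYPE='entry' LABEL='Friday Nov. 20th 1846'>
            <fptr FILEID='FID6' MIMETYPE='image/jpg' />
        </div>
        <div N='2' TYPE='entry' LABEL='sat. 21st'>
            <fptr FILEID='FID7' MIMETYPE='image/jpg' />
        </div>
        <div N='1' TYPE='letter' LABEL='Letter by George McKinstry, tipped into original diary'>
            <div N='1' TYPE='page' LABEL='Letter, G. McKinstry, page 1'>
                <fptr FILEID='FID8' MIMETYPE='image/jpg' />
            </div>
            <div N='2' TYPE='page' LABEL='Letter, G. McKinstry, Page 2'>
                <fptr FILEID='FID9' MIMETYPE='image/jpg' />
            </div>
        </div>
    </div>
</StructMap>"""


def outline(div, depth=0):
    """Return (depth, TYPE, LABEL, number of direct <fptr>) rows, depth-first."""
    rows = [(depth, div.get("TYPE"), div.get("LABEL"), len(div.findall("fptr")))]
    for child in div.findall("div"):   # direct children only
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(STRUCT_MAP)
rows = outline(root.find("div"))
levels = max(depth for depth, _, _, _ in rows) + 1

for depth, div_type, label, pointer_count in rows:
    print("  " * depth + f"{div_type}: {label} ({pointer_count} file pointers)")
print("levels:", levels)
```

The 'diary' and 'letter' divisions have no file pointers of their own in this sample; only the leaf divisions point at files.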

Click here to view the entire <StructMap> of the sample Breen Diary document in context.
 

Conclusion


The DTD provides a flexible mechanism for encoding the descriptive, administrative and structural metadata that describe the files comprising multiple electronic versions of an archival object and their relationships. It also manages to encode this information in a relatively efficient format. This flexibility and efficiency do come at the cost of some complexity. However, it is anticipated that MOA2 XML documents will be primarily machine-generated and machine-processed for display, so that complexity should be relatively well hidden from those producing documents and from users examining them.