[Cache from http://sunsite.berkeley.edu/MOA2/papers/dtdtutorial2.htm; please use this canonical URL/source if possible.]
An MOA2 digital object consists of four major sections: descriptive metadata, a file inventory, administrative metadata, and a structural map.
The examples in this tutorial are taken from a simplified
version of the MOA II encoding of the Breen Diary (from the collection
of The Bancroft Library). This simplified Breen Diary consists
of two pages of the diary proper, followed by a two-page letter that has
been inserted at the end of the diary. You may view the entire XML
source for the sample document. Each section below is also directly
linked to the pertinent section of the source XML file. You can also
activate the sample document in the MOA2 Viewer.
External Descriptive Metadata: Descriptive Metadata Reference. A descriptive metadata reference element (<DMDRef>) simply provides the URI for an external source of descriptive metadata. For example, the descriptive metadata reference below points to an external finding aid:
<DescMD>
<DMDRef LOCTYPE='URL' DMDTYPE='FINDAID'>http://sunsite2.berkeley.edu/cgi-bin/oac/calher/breen
</DMDRef>
</DescMD>
This <DMDRef> contains two attributes. The LOCTYPE specifies the type of URI being provided (PURL, HANDLE, DOI and PDI would be other options). The DMDTYPE identifies the type of descriptive metadata being referred to: MARC record, FINDAID, RDF, PICS or OTHER. Additional supported attributes provide for specifying the MIMETYPE of the external descriptive metadata, and a LABEL that can be used to identify the available descriptive metadata to the user.
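As a rough illustration of how a processing program might read these attributes, the sketch below parses the <DescMD> fragment above with Python's standard xml.etree.ElementTree module. The function name read_dmdref is hypothetical and is not part of any MOA2 tooling.

```python
import xml.etree.ElementTree as ET

# The <DescMD> fragment from the tutorial, reproduced as a string.
DESC_MD = """
<DescMD>
<DMDRef LOCTYPE='URL' DMDTYPE='FINDAID'>http://sunsite2.berkeley.edu/cgi-bin/oac/calher/breen
</DMDRef>
</DescMD>
"""

def read_dmdref(xml_text):
    """Return (LOCTYPE, DMDTYPE, URI) for the first <DMDRef> found.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text.strip())
    ref = root.find("DMDRef")
    # The URI is the element's text; strip the newline left by the layout.
    return ref.get("LOCTYPE"), ref.get("DMDTYPE"), ref.text.strip()

loctype, dmdtype, uri = read_dmdref(DESC_MD)
print(loctype, dmdtype, uri)
```

The optional MIMETYPE and LABEL attributes could be read the same way with ref.get(), which returns None when an attribute is absent.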
Click here to view the entire descriptive metadata section of the simplified encoding for the Breen Diary.
Embedded Descriptive Metadata. Embedded descriptive metadata appearing under a <DMD> element can either use generic descriptive metadata elements defined in the DTD, or another user-defined text format (e.g., MARC, Dublin Core) enclosed in a wrapper. The generic descriptive metadata elements provided for by the DTD are grouped under a <GDM> element. These are closely related to the descriptive metadata fields supported by GenDB, a database designed and used at UC Berkeley to gather the metadata needed to construct both EAD and MOA2 objects. (If GenDB is used, a program will automatically create the MOA2 XML objects from the database.) The core descriptive metadata elements include title, date, caption, dimensions, and material origin. In addition to the core descriptive metadata elements, the following elements are supported: administrative information, alternate date, content, creator, general notes, physical description, related materials, subobject source, and subject.
Note that a <GDM> element includes an ID attribute.
This attribute provides a unique, internal name for each GDM element, which
can be used in the StructMap to link a particular division of the document
hierarchy to a particular GDM element. This allows specific sections of
descriptive metadata to be linked to specific parts of the digital object.
In other words, a <GDM> element may pertain to the entire digital object
described by an MOA2 XML file or just a portion of it. For example,
in the case of the Breen Diary it would be possible to set up one <GDM>
element that pertained to the Breen Diary as a whole, and a second <GDM>
element that pertained only to the letter appended to the end of the Diary.
Consider the first <FileGrp> from the simplified encoding of the Breen Diary:
<FileGrp VERSDATE ='12/4/1998'>
<File ID ='FID1' MIMETYPE ='text/sgml' SEQ = '1'
CREATED = '12/4/1998' ADMID = 'ADM4 ADM4 ADM6'
GROUPID = 'GID1' USE = 'ARCHIVE'>
<FLocat LOCTYPE = 'URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/sgml/breen2.sgm
</FLocat>
</File>
</FileGrp>
The <FileGrp> above represents the single file containing the SGML-encoded text transcription of the Breen Diary. The <FileGrp> tag contains a VERSDATE attribute, which provides the date the SGML version was created. It could also include an ADMID attribute, which would name the sections within the administrative metadata portion of the document that apply to all the files in the file group. However, the ADMID information may also be supplied, as it is here, at the <File> element level. The <FileGrp> here contains a single <File> element, which identifies the one file in this file group. Its attributes include such information as the MIME type of the file and its intended use. The <File> element in turn contains a <FLocat> (file location) element. The <FLocat> provides a network location for the file (in this case, a URL), and its LOCTYPE attribute specifies whether this location is a URL, PURL, URN, etc.
A more complicated <FileGrp> from the simplified Breen Diary is shown below. This aggregates all of the <File> elements that represent medium resolution jpeg versions of the diary. Within the highest level <FileGrp> the <File> elements are divided between two secondary <FileGrp> elements. The first secondary <FileGrp> represents the medium resolution jpegs of the diary pages; the second represents the medium resolution jpegs of the letter pages.
<FileGrp VERSDATE ='4/3/1998'>
<FileGrp>
<File ID ='FID6' MIMETYPE='image/jpg'
SEQ = '1' X ='512' Y = '768' UNIT = 'PIXELS' CREATED = '4/3/1998'
ADMID ='ADM2 ADM4 ADM11'
GROUPID = 'GID2' USE = 'REFERENCE'>
<FLocat LOCTYPE = 'URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/figures/I0018236B.jpg
</FLocat>
</File>
<File ID ='FID7' MIMETYPE='image/jpg'
SEQ = '2' X ='512' Y = '768' UNIT = 'PIXELS' CREATED = '4/3/1998'
ADMID ='ADM2 ADM4 ADM12'
GROUPID = 'GID3' USE = 'REFERENCE'>
<FLocat LOCTYPE = 'URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/figures/I0018237B.jpg
</FLocat>
</File>
</FileGrp>
<FileGrp>
<File ID ='FID8' MIMETYPE='image/jpg'
SEQ = '3' X ='512' Y = '768' UNIT = 'PIXELS' CREATED = '4/3/1998'
ADMID ='ADM2 ADM4 ADM9'
GROUPID = 'GID31' USE = 'REFERENCE'>
<FLocat LOCTYPE = 'URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/figures/I0018266B.jpg
</FLocat>
</File>
<File ID ='FID9' MIMETYPE='image/jpg'
SEQ = '4' X ='512' Y = '768' UNIT = 'PIXELS' CREATED = '4/3/1998'
ADMID ='ADM2 ADM4 ADM10'
GROUPID = 'GID32' USE = 'REFERENCE'>
<FLocat LOCTYPE = 'URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/figures/I0018267B.jpg
</FLocat>
</File>
</FileGrp>
</FileGrp>
Note that the <File> element contains an ID attribute. This attribute provides a unique, internal name for this file which can be referenced by other portions of the document. You’ll see this type of referencing in action when we look at the Structural Map Section.
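To make the role of the ID attribute concrete, here is a minimal sketch of how a program might build a lookup table from <File> IDs to file locations, regardless of how deeply the <File> elements are nested inside <FileGrp> elements. It uses Python's standard xml.etree.ElementTree module on an abbreviated copy of the nested <FileGrp> above (two of the four files, with some attributes omitted). The function name index_files is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the nested <FileGrp> from the tutorial:
# one <File> from each secondary group, with some attributes omitted.
FILE_GRP = """
<FileGrp VERSDATE='4/3/1998'>
  <FileGrp>
    <File ID='FID6' MIMETYPE='image/jpg' SEQ='1' ADMID='ADM2 ADM4 ADM11' USE='REFERENCE'>
      <FLocat LOCTYPE='URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/figures/I0018236B.jpg
      </FLocat>
    </File>
  </FileGrp>
  <FileGrp>
    <File ID='FID8' MIMETYPE='image/jpg' SEQ='3' ADMID='ADM2 ADM4 ADM9' USE='REFERENCE'>
      <FLocat LOCTYPE='URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/figures/I0018266B.jpg
      </FLocat>
    </File>
  </FileGrp>
</FileGrp>
"""

def index_files(xml_text):
    """Map each <File> ID to its <FLocat> text (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    index = {}
    # iter() visits every descendant, so the nesting depth does not matter.
    for file_el in root.iter("File"):
        index[file_el.get("ID")] = file_el.findtext("FLocat").strip()
    return index

files = index_files(FILE_GRP)
for file_id, url in files.items():
    print(file_id, url)
```

A table like this is what lets other parts of the document refer to a file by its short internal name rather than by its full location.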
Click here to view the entire
file inventory of the sample Breen Diary.
An example of file management information for an image file associated with the Breen Diary appears below:
<AdminMD ID='ADM2'>
<FileMgmt>
<Image>
<Compression>JPEG
</Compression>
<BitDepth BITS='24' />
<ColorSpace>RGB
</ColorSpace>
<Resolution>90
</Resolution>
</Image>
</FileMgmt>
</AdminMD>
Note that each administrative metadata section (<AdminMD>) has an ID attribute. In the sample above it is "ADM2". This ID attribute allows the <AdminMD> element to be linked to particular files or file groups. For example, the <File> element below links to the <AdminMD> element shown above, as well as to two additional <AdminMD> elements: one containing rights information and one containing source information.
<File ID ='FID7' MIMETYPE='image/jpg' SEQ = '2' X ='512' Y = '768'
UNIT = 'PIXELS' CREATED = '4/3/1998'
ADMID ='ADM2 ADM4 ADM12' GROUPID = 'GID3' USE = 'REFERENCE'>
<FLocat LOCTYPE = 'URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/figures/I0018237B.jpg
</FLocat>
</File>
Notice that the <File> tag has an ADMID attribute, the first item in which is the name ADM2, providing the link to the <AdminMD> element above. Note that if a particular <AdminMD> element pertains to all of the files in a <FileGrp>, the pertinent ADMID attribute can be specified at the <FileGrp> level rather than as an attribute of each <File> in the <FileGrp>.
You’ll note that the <File> tag also has ADMID names of ADM4 and ADM12. If you examine the XML for the simplified Breen Diary, you’ll find the administrative metadata sections carrying these names. These sections provide additional administrative metadata describing the files in this file group. Click here to view the <Rights> administrative metadata (ADM4). Click here to view the <Source> administrative metadata (ADM12).
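The ADMID linkage described above can be sketched in a few lines of Python using the standard xml.etree.ElementTree module. This is only an illustration under stated assumptions: the <Sample> wrapper element is hypothetical (it exists only so the fragments parse as one document), the bodies of ADM4 and ADM12 are empty placeholders rather than the real rights and source metadata, and the function name admin_sections_for is invented for this example.

```python
import xml.etree.ElementTree as ET

# <Sample> is a hypothetical wrapper, not an MOA2 element.
# <Rights/> and <Source/> are empty placeholders; the real ADM4 and ADM12
# sections carry content that is not reproduced in this tutorial.
DOC = """
<Sample>
  <AdminMD ID='ADM2'>
    <FileMgmt><Image><Compression>JPEG</Compression></Image></FileMgmt>
  </AdminMD>
  <AdminMD ID='ADM4'><Rights/></AdminMD>
  <AdminMD ID='ADM12'><Source/></AdminMD>
  <File ID='FID7' MIMETYPE='image/jpg' ADMID='ADM2 ADM4 ADM12' USE='REFERENCE'/>
</Sample>
"""

def admin_sections_for(root, file_id):
    """Return the <AdminMD> elements named in a <File>'s ADMID attribute."""
    admin_by_id = {a.get("ID"): a for a in root.iter("AdminMD")}
    file_el = next(f for f in root.iter("File") if f.get("ID") == file_id)
    # ADMID holds a whitespace-separated list of <AdminMD> ID values.
    return [admin_by_id[name] for name in file_el.get("ADMID", "").split()]

root = ET.fromstring(DOC.strip())
sections = admin_sections_for(root, "FID7")
# The first child of each <AdminMD> shows what kind of metadata it holds.
kinds = [section[0].tag for section in sections]
print(kinds)
```

Handling an ADMID given at the <FileGrp> level would require also collecting the attribute from each enclosing <FileGrp>; that step is left out here to keep the sketch short.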
Click here to view the entire
Administrative Metadata section of the sample XML document.
To get a sense of the information encoded in <div> elements, consider the following <div> element for the first entry in the Breen Diary:
<div N = '1' TYPE = 'entry' LABEL = 'Friday Nov. 20th 1846'>
<fptr FILEID = 'FID2' MIMETYPE = 'image/tif' />
<fptr FILEID = 'FID6' MIMETYPE = 'image/jpg' />
<fptr FILEID = 'FID10' MIMETYPE = 'image/GIF' />
<fptr FILEID = 'FID1' MIMETYPE = 'text/sgml' TAGID = 'entry1' />
</div>
The type of object represented by this <div> is a diary entry (TYPE=’entry’), and the entry has a label which should be displayed to the user (‘Friday Nov. 20th 1846’). The <fptr> elements specify the files that correspond with this level of the hierarchy: there is a master TIFF file, a JPEG file, a GIF file, and an SGML file containing a transcription. The FILEID attributes in the <fptr> elements link to the corresponding <File> elements in the file inventory portion of the MOA2 XML document. To see the medium resolution JPEG image associated with the "Friday Nov. 20" entry in the diary, for example, you would look at the <File> element with the ID attribute of ‘FID6’.
Note in the case of the SGML file (see the last <fptr> element in the example above), there is one additional piece of information provided as an attribute, a TAGID (‘entry1’). This indicates that within the actual file identified within this document by the <File> element ‘FID1,’ you should find an SGML element tag with the ID attribute value of ‘entry1.’ This element within the SGML document marks the beginning of the diary entry in question.
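The lookup from <fptr> to <File> described above might be sketched as follows, using Python's standard xml.etree.ElementTree module. The <Sample> wrapper element and the function name resolve are assumptions made for this illustration. Only the two files whose locations appear in this tutorial (FID1 and FID6) are included, so the TIFF and GIF pointers are omitted.

```python
import xml.etree.ElementTree as ET

# <Sample> is a hypothetical wrapper so that the file inventory entries
# and the <div> parse as a single document.
DOC = """
<Sample>
  <File ID='FID1' MIMETYPE='text/sgml'>
    <FLocat LOCTYPE='URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/sgml/breen2.sgm
    </FLocat>
  </File>
  <File ID='FID6' MIMETYPE='image/jpg'>
    <FLocat LOCTYPE='URL'>http://sunsite.berkeley.edu/~jmcdonou/BREEN/figures/I0018236B.jpg
    </FLocat>
  </File>
  <div N='1' TYPE='entry' LABEL='Friday Nov. 20th 1846'>
    <fptr FILEID='FID6' MIMETYPE='image/jpg' />
    <fptr FILEID='FID1' MIMETYPE='text/sgml' TAGID='entry1' />
  </div>
</Sample>
"""

def resolve(root, div, mimetype):
    """Return (url, tagid) for the div's <fptr> of the given MIME type.

    tagid is None when the pointer has no TAGID attribute.
    Returns None if the div has no pointer of that type.
    """
    files = {f.get("ID"): f for f in root.iter("File")}
    for ptr in div.findall("fptr"):
        if ptr.get("MIMETYPE") == mimetype:
            url = files[ptr.get("FILEID")].findtext("FLocat").strip()
            return url, ptr.get("TAGID")
    return None

root = ET.fromstring(DOC.strip())
entry = root.find("div")
jpeg = resolve(root, entry, "image/jpg")
sgml = resolve(root, entry, "text/sgml")
print(jpeg)
print(sgml)
```

For the SGML pointer, a viewer would presumably fetch the file at the returned URL and then locate the element whose ID is the returned TAGID value; that second step depends on an SGML parser and is not shown.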
To get a sense of the hierarchical structure that can be encoded in a <StructMap> we need to look at the entire <StructMap> from the sample document.
<StructMap TYPE='logical'>
<div N = '1' TYPE = 'diary' LABEL = '[Patrick Breen Diary November 20, 1846 - March 1, 1847]'>
<div N = '1' TYPE = 'entry'
LABEL = 'Friday Nov. 20th 1846'>
<fptr FILEID = 'FID2' MIMETYPE = 'image/tif' />
<fptr FILEID = 'FID6' MIMETYPE = 'image/jpg' />
<fptr FILEID = 'FID10' MIMETYPE = 'image/GIF' />
<fptr FILEID = 'FID1' MIMETYPE = 'text/sgml' TAGID = 'entry1' />
</div>
<div N = '2' TYPE = 'entry'
LABEL = 'sat. 21st'>
<fptr FILEID = 'FID3' MIMETYPE = 'image/tif' />
<fptr FILEID = 'FID7' MIMETYPE = 'image/jpg' />
<fptr FILEID = 'FID11' MIMETYPE = 'image/GIF' />
<fptr FILEID = 'FID1' MIMETYPE = 'text/sgml' TAGID = 'entry2' />
</div>
<div N = '1' TYPE = 'letter'
LABEL = 'Letter by George McKinstry, tipped into original diary'>
<div N = '1' TYPE = 'page' LABEL = 'Letter, G. McKinstry, page 1'>
<fptr FILEID = 'FID4' MIMETYPE = 'image/tif' />
<fptr FILEID = 'FID8' MIMETYPE = 'image/jpg' />
<fptr FILEID = 'FID12' MIMETYPE = 'image/GIF' />
<fptr FILEID = 'FID1' MIMETYPE = 'text/sgml' TAGID = 'GMletter1' />
</div>
<div N = '2' TYPE = 'page' LABEL = 'Letter, G. McKinstry, Page 2'>
<fptr FILEID = 'FID5' MIMETYPE = 'image/tif' />
<fptr FILEID = 'FID9' MIMETYPE = 'image/jpg' />
<fptr FILEID = 'FID13' MIMETYPE = 'image/GIF' />
<fptr FILEID = 'FID1' MIMETYPE = 'text/sgml' TAGID = 'GMletter2' />
</div>
</div>
</div>
</StructMap>
This structural map indicates the document has a three-level hierarchy: it is a ‘diary’ with two ‘entry’ components (or <div> elements) and one ‘letter’ component. (The letter is appended at the end of the diary.) The ‘letter’ component has, in turn, two ‘page’ components.
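One way to see this hierarchy is to walk the nested <div> elements recursively. The sketch below does so with Python's standard xml.etree.ElementTree module on a copy of the <StructMap> above, with the <fptr> elements left out for brevity. The function name outline is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

# The <StructMap> from the tutorial with the <fptr> elements omitted.
STRUCT_MAP = """
<StructMap TYPE='logical'>
  <div N='1' TYPE='diary' LABEL='[Patrick Breen Diary November 20, 1846 - March 1, 1847]'>
    <div N='1' TYPE='entry' LABEL='Friday Nov. 20th 1846'/>
    <div N='2' TYPE='entry' LABEL='sat. 21st'/>
    <div N='1' TYPE='letter' LABEL='Letter by George McKinstry, tipped into original diary'>
      <div N='1' TYPE='page' LABEL='Letter, G. McKinstry, page 1'/>
      <div N='2' TYPE='page' LABEL='Letter, G. McKinstry, Page 2'/>
    </div>
  </div>
</StructMap>
"""

def outline(div, depth=0):
    """Yield (depth, TYPE, LABEL) for a <div> and every nested <div>."""
    yield depth, div.get("TYPE"), div.get("LABEL")
    for child in div.findall("div"):
        yield from outline(child, depth + 1)

root = ET.fromstring(STRUCT_MAP.strip())
rows = list(outline(root.find("div")))
for depth, kind, label in rows:
    # Indent each line by its depth to show the nesting.
    print("  " * depth + f"{kind}: {label}")

# Depth is zero-based, so the number of levels is the maximum depth plus one.
levels = max(depth for depth, _, _ in rows) + 1
```

A display program could use the same traversal to build a table of contents from the LABEL attributes.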
Click here to view the entire
<StructMap> of the sample Breen Diary document in context.
The DTD provides a flexible mechanism for encoding
the descriptive, administrative and structural metadata that describe the
files comprising multiple electronic versions of an archival object and
their relationships. It also manages to encode this information in a relatively
efficient format. This flexibility and efficiency do come at the cost
of some complexity. However, it is anticipated that MOA2 XML documents
will be primarily machine-generated, and machine-processed for display,
so that complexity should be relatively well hidden from those producing
documents and from users examining them.