@N: the 'name' of the column as presented for display, unless n="0", in which
case the column is a redundant wrapper around a single-column text.
Constraint: id must be unique among the entire CDLI corpus.
-->
<!ELEMENT column (l|nonl)+ >
<!ATTLIST column
id ID #REQUIRED
n CDATA #REQUIRED
certain (y|n) "y">
<!--
Lines are wrapped in L.
@N: the 'name' of the line as presented for display
Constraint: id must be unique among the entire CDLI corpus.
-->
<!ELEMENT l (nong|n|w|g|cg|gg|igg)*>
<!ATTLIST l
id ID #REQUIRED
n CDATA #REQUIRED>
<!--
Words are wrapped in W.
-->
<!ELEMENT w (g|cg|gg|igg)*>
<!ATTLIST w
lang NMTOKEN #IMPLIED
xml:lang NMTOKEN #IMPLIED >
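<!--
Illustrative sketch only, assuming a single-column text; the ids, line name
and readings are hypothetical and not taken from the corpus. The redundant
n="0" wrapper column holds one line of two words:
  <column id="P000001.c0" n="0">
    <l id="P000001.l1" n="1">
      <w><g>lugal</g></w>
      <w><g>e2</g><g>gal</g></w>
    </l>
  </column>
-->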
<!--
Numbers are wrapped in N. The content is the word or grapheme sequence
constituting the number; attributes give information on the numeric system
and the value for which the sequence is taken to stand.
-->
<!-- add systems here as desired -->
<!ENTITY % systems "(
sexagesimal
| capacity
| unidentified
)" >
<!ELEMENT n (w|g|cg|gg|igg)*>
<!ATTLIST n
value CDATA #IMPLIED
system %systems; "unidentified" >
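<!--
Illustrative sketch only; the grapheme notation and the value are
hypothetical. The first instance is a number taken to stand for 3 in the
sexagesimal system; the second omits both attributes, so SYSTEM defaults
to "unidentified":
  <n system="sexagesimal" value="3"><g>3(disz)</g></n>
  <n><g>3(disz)</g></n>
-->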
<!-- IGG = Interpretive Grapheme Group; a mechanism for inline presentation
of both an interpretive version of what the graphemes on the tablet were
supposed to be, and the literal sequence occurring on the tablet.
By definition, the first child of the group is the interpretation; the second
child is the literal grapheme sequence on the object.
-->
<!ELEMENT igg ((g|cg|gg),(g|cg|gg))>
<!ATTLIST igg
type (ordering|correction|explanation) #REQUIRED >
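<!--
Illustrative sketch only; the graphemes are hypothetical. In each group the
first child is the interpretation and the second child is the literal
sequence on the object.
A correction (the tablet has ki, taken to be intended as di):
  <igg type="correction"><g>di</g><g>ki</g></igg>
An ordering (the tablet has gal e2, taken to be intended as e2 gal):
  <igg type="ordering">
    <gg><g>e2</g><g>gal</g></gg>
    <gg><g>gal</g><g>e2</g></gg>
  </igg>
-->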
<!--
GG (grapheme group) is exclusively a scoping mechanism for treating
several graphemes as a single unit.
-->
<!ELEMENT gg ((g|cg),(g|cg)+)>
<!--
Graphemes are wrapped in G.
** Grapheme regexps to be added. **
-->
<!ELEMENT g (#PCDATA)>
<!--
Compound graphemes are wrapped in CG.
** Content model to be added. **
-->
<!ELEMENT cg ANY >
<!--
The grapheme attribute definitions were made with an underlying
assumption that CDLI transliterations would be as simple as possible for
manipulation as data, and that wherever possible editorial commentary and
squeamishness should be reserved to a commentary file.
The commentary is not expected to be machine-manipulated, beyond the
assumption that commentary entries will reference the ID at the L level,
such that an HTML version of the corpus could include machine-generated
links back from the lines to the commentary.
Items to be removed/reserved to the commentary include:
- erasures
- palimpsest writings
- alternate possible identifications; e.g., ki/di
- alternate readings; e.g., gin/du
- explanatory addition of sign name; e.g., mu4(TUG2)
@NAMETYPE:
Defines whether the grapheme content is a sign-value, a sign-name (signref)
or a reference to an entry in a sign list (listref). Grapheme readings and
sign-names are not differentiated by use of lowercase and uppercase, but
instead by use of @NAMETYPE.
@GLOSS:
Allows qualification of whether graphemes are glosses or not. No distinction
is made between types of gloss. Glosses are simply characterized as
pre (i.e., occurring before what they gloss) or post (i.e., occurring
after what they gloss).
-->
<!ATTLIST g
nametype (signref|listref) #IMPLIED
breakage (damaged|missing) #IMPLIED
sign (unusual.form|really.is|ed.emended|ed.removed|ed.supplied
|scribe.implied) #IMPLIED
uncertain (y) #IMPLIED
collated (y) #IMPLIED
gloss (pre|post) #IMPLIED
>
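<!--
Illustrative sketch only; the readings are hypothetical, and the
interpretation given for each attribute value is inferred from its name
rather than defined by this DTD.
A plain grapheme with no attributes:
  <g>lugal</g>
Content given as a sign-name:
  <g nametype="signref">TUG2</g>
A damaged grapheme:
  <g breakage="damaged">gal</g>
An uncertain reading that glosses what precedes it:
  <g gloss="post" uncertain="y">ki</g>
-->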
<!--
All the non-x types (noncolumn, nonl (non-line) and nong (non-grapheme))
share a common set of attributes and content model.
The content model, PCDATA, is intended purely for the preservation of the
verbatim text of comments in legacy data.
@TYPE:
broken, gap and ruling: self-explanatory
traces: for extents with signs or traces which have not been transliterated
as data
image: for drawings included by the scribe
seal: for seals
@UNIT:
self: derived from object-oriented programming practice;
'self' indicates that the extent is given in units of the type of the element on
which the attribute occurs: for noncolumn, self means 'column(s)'; for nonl,
self means 'line(s)'; for nong, self means 'grapheme(s)'.
quantity: indicates that the extent is given as a quantity
Co-constraint notes (these cannot be expressed in the DTD):
For type=image, UNIT may be 'self'. If type=image and unit=quantity, the
extent indicates the amount of 'self' which is covered with the image.
Regardless of the value of UNIT, the REF attribute may be used when
type=image to give a URL which shows the image.
For type=seal, UNIT is always 'self'; the REF attribute is always used. The
reference is either to a text-local seal-transliteration or to the seal
corpus entry of which the instance seal is an exemplar.
@REF:
A reference. For seals, the REF should give the local ID of the seal whose
occurrence is being noted in the non-x element.
For images, the REF should be a URL (note: this is technically never
necessary for CDLI; an exception could be a situation in which a specific file
contains a shot of the image which is better than, or more specifically
targeted than, the images which give shots of the tablets).
@EXTENT: the extent of the non-x material. Should match
\d+(mm|cm)?
i.e., it may be a number, or a measurement in mm or cm.
-->
<!ELEMENT noncolumn (#PCDATA)>
<!ELEMENT nonl (#PCDATA)>
<!ELEMENT nong (#PCDATA)>
<!ENTITY % non-x-attr-set "
type (broken|traces|gap|ruling|image|seal) #REQUIRED
unit (self|quantity|ref) #IMPLIED
extent CDATA #IMPLIED
ref CDATA #IMPLIED
">
<!ATTLIST noncolumn %non-x-attr-set; >
<!ATTLIST nonl %non-x-attr-set; >
<!ATTLIST nong %non-x-attr-set; >
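<!--
Illustrative sketch only; the extents, the seal id and the legacy comment
text are hypothetical.
Three lines broken away, with the legacy comment preserved as content:
  <nonl type="broken" unit="self" extent="3">3 lines broken</nonl>
A ruling with no extent given:
  <nonl type="ruling"></nonl>
A seal impression, where UNIT is 'self' and REF is given:
  <nonl type="seal" unit="self" ref="S000001"></nonl>
Extent values that match \d+(mm|cm)? include 3, 15mm and 2cm; values such
as 2.5cm or 3in do not match.
-->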