[Cache from http://www.carc.aist.go.jp/nlwww/i-content/GDA/tagset.html; please use this canonical URL/source if possible.]
The GDA Tag Set
Draft Version 0.65 (June 12, 2001)
HASIDA Kôiti
This document is properly displayed by Netscape Navigator 4.0 or later
set up to enable stylesheets.
Tags, attribute names, and attribute values are in bold face in the text. Technical terms, tags, attribute names, and attribute values are in red bold font where they are introduced. * (asterisk) means that the term is used not widely but only locally in this and related documents. Examples are in a green typewriter font. Annotation is normally done at a fixed level of granularity, but the level of granularity varies across the examples below.
1. Introduction
This draft discusses the GDA (Global Document Annotation) tag set, providing the rationale for the tags and examples of their usage. It may serve as a tagging manual for human annotators with ample knowledge of both theoretical and computational linguistics, but the actual tagging manual for most annotators must be written on the basis of this document.
The GDA tag set aims at making the semantic and pragmatic structure of electronic texts automatically recognizable. It is being developed so as to be easy to embed into the TEI, EAGLES, and HTML tag sets. So the meanings of the GDA tags should be maximally consistent with these three tag sets. Some tags are imported from them, but when such a tag is defined in two or more of them, the meaning in HTML is preferred to that of TEI and EAGLES, because GDA tags are expected to be very often embedded in HTML files.
This draft is released for the sake of public survey and evaluation. Empirical evaluation is necessary on both how useful the tags described below are for practical applications and how consistently people can annotate documents with those tags. We would like to improve the tag set by taking results of such evaluations into account, before announcing it for public, extensive use.
To optimize the cost-benefit ratio of tagging, we try to design a tag set that is as simple as possible while capturing enough content for practical applications. The semantic and pragmatic content of an utterance might be unlimitedly complex due to the complexity of the context. An appropriate degree of complexity of the tag set could be identified, however, because the present technology concerning natural language can effectively process only limited sorts of information. For instance, tagging for metonymy may not be very useful. The tag set should go along with the contemporary state of the art. We can refine the tag set when more detailed tags become useful as technology advances.
Here we do not restrict ourselves to any single NLP/AI application, but try to address as many aspects of language as seem useful for any of translation, retrieval, summarization, question answering, case-based reasoning, presentation, and so forth. Users interested in only some of these applications may want to use subsets of the tag set. In this connection, the GDA tags are almost entirely optional, as application technologies do not normally require exhaustive tagging. In fact, many relatively simple untagged sentences can be analyzed correctly by the current technology. So we have tried to design the GDA tag set in such a way that more minute annotation entails more information; in particular, if you do not annotate then you do not commit yourself to any specific interpretation.
The GDA tag set is not specific to any particular language, though the example passages below are mostly in English. The usage of the tags is subject to some customization for particular languages, but we want to use the same vocabulary for the sake of coordination across different languages. Of course different tagging manuals are necessary for different languages. However, we hope to design the tag set so that it is easy for you to write such a manual once you have understood the idea behind the tag set.
The tag set is not a linguistic theory. It encodes semantic and pragmatic structures of documents, remaining somewhat neutral among linguistic theories. Encoding semantic or pragmatic structures and capturing linguistic generalizations are different issues. In particular, we will sacrifice syntactic generalization very often, because syntax is not our primary concern but used as a partial aid for encoding semantics and pragmatics. This could be justified because people probably have better intuition about semantics and pragmatics than about syntax. Of course linguistic theories are very helpful in designing the tag set, but what is important is that the tag set can represent the semantic and pragmatic structure of a wide range of documents, but not that it captures linguistic generalizations. Needless to say, we will attempt to capture as much linguistic generalization as possible as far as we do not sacrifice clarity and ease of encoding semantic and pragmatic structures.
In principle, null annotation entails no information in the GDA tag set. This is to allow partial annotation. In particular, nothing is meant by the absence of a tag or an attribute. For instance, lack of specification of the scope of a plural noun phrase (as an alleged quantifier) does not mean that the noun phrase has no scope. The GDA tag set sometimes allows you to entail some information by lack of annotation, but the tag set is designed in such a way that you should be aware that you are meaning something with null annotation in such cases.
The rest of this document consists of two parts.
The first is Section 2, which may be regarded as a user manual.
This section contains subsections each of which addresses
how to annotate a particular type of linguistic construction
such as dependency, anaphora, and scoping.
The second part is Section 3 and thereafter, which may be thought
of as a reference manual, discussing how to use each tag and attribute.
2. Annotation of Semantic Structures
The GDA tag set is designed so that the GDA-annotation reduces the ambiguity in mapping a document to a sort of entity-relation graph (or semantic network) representing the underlying semantic structure. The tag set does not directly encode such graphs, though it should be straightforward to encode them with RDF or related tag sets such as DAML.
A semantic network consists of nodes and links, which may be labelled with concept identifiers. For instance, sentence `Tom met a girl' may have the semantic structure shown in Figure 1.
Figure 1: A semantic structure of `Tom met a girl.'
Operator identifiers are concept identifiers representing the (syntactic, semantic, or other) functions of function words (auxiliary, preposition, postposition, article, etc.) and function morphemes (tense marker, number marker, etc.). In Figure 1, the gray rectangles are operator identifiers. Many operator identifiers, such as agt (agent), sg (singular), and past, are defined natively in the GDA tag set, and thus are of the simplest form without the `ont:' prefix. An operator identifier is either a relation identifier or a unary operator identifier. A relation identifier represents a binary relation and is often associated with a dependency. For instance, the dependency of the subject on the verb may carry an AGENT relation, which is indicated by relation identifier agt. A unary operator identifier represents the function of an auxiliary, an article, some inflectional morpheme, or the like. Both relation identifiers and unary operator identifiers may lack explicit linguistic markings. For instance, no explicit word or morpheme designates the agt relationship between `met' and `Tom' in `Tom met a girl.'
The disks in Figure 1 are entities such as objects and events. As mentioned above, the dashed arrows represent membership to or containment in concepts. For instance, node m1 is an instance of concept eng:meet, thus representing a meeting event. Similarly, node g2 represents a single girl. The labelled solid arrows are instances of (primitive) binary relations designated by the labelling relation identifiers. For instance, the arrow from node m1 to Tom means that Tom is the agent of meeting event m1.
The semantic structure in Figure 1 can be encoded as in Figure 2.
<su>
<persnamep opr="agt">Tom</persnamep>
<v sem="past.eng:meet">met</v>
<np opr="obj">
<adp sem="sg">a</adp>
<n sem="eng:girl">girl</n>
</np>.
</su>
Figure 2: Annotation encoding the semantic structure in Figure 1.
The semantic structure of each GDA element consists of two parts which are called the head semantics and the operator, partially encoded by the sem and opr attributes, respectively. The value of each of these attributes in the above example is a concept string, which is a sequence of one or more concept identifiers connected via dots (`.'). In general, the value of a sem or opr attribute may be one or more concept strings separated by blanks. The only nonatomic concept string in the above example is past.eng:meet. The order among the concept identifiers in a concept string should be consistent with the directions of the solid arrows in the corresponding part of the semantic structure. For instance, the following annotation is possible, where past and ont1:buy#2 must precede obj and obj must precede sg and deu:Blume, but there is no restriction on the order between past and ont1:buy#2 or that between sg and deu:Blume.
<v sem="past.ont1:buy#2.obj.sg.deu:Blume">bought a flower</v>
Compare this with the corresponding semantic structure below:
Figure 3: The directed path represented by
past.ont1:buy#2.obj.sg.deu:Blume.
In general, a concept string represents a directed path as in Figure 3, which is in general a linear sequence of disk nodes connected via solid arrows in the same direction, where the nodes are instances of lexical concepts and unary operators, and the arrows are instances of binary relations, and the order among the concept identifiers in the concept string is compatible with the order along the directed path.
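The way an attribute value breaks down into concept strings and concept identifiers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `parse_attribute` and `namespace` are hypothetical, and treating the text before `:` as a namespace prefix is an assumption based on the identifiers shown in the examples, not a rule stated by the tag set.

```python
def parse_attribute(value):
    """Split a sem/opr attribute value into concept strings, then identifiers.

    Blanks separate concept strings; dots connect the concept identifiers
    that make up one concept string (one directed path).
    """
    return [concept_string.split(".") for concept_string in value.split()]


def namespace(identifier):
    """Return the text before ':' (assumed to be a namespace prefix),
    or None for identifiers without a prefix, such as agt, sg, and past."""
    prefix, separator, _ = identifier.partition(":")
    return prefix if separator else None


paths = parse_attribute("past.ont1:buy#2.obj.sg.deu:Blume")
print(paths)
# → [['past', 'ont1:buy#2', 'obj', 'sg', 'deu:Blume']]
print([namespace(identifier) for identifier in paths[0]])
# → [None, 'ont1', None, None, 'deu']
```

Each inner list corresponds to one directed path; the order of the identifiers in the list is the order along the path.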
In general, the head semantics and the operator of a GDA element are directed paths, as shown in Figure 4:
Figure 4: Two parts of semantic structure of a GDA element.
The initial node of the head semantics of GDA element X is equal to the final node of the operator of X; it is called the body node of X and written body(X). The initial node of the operator of X is called the tail node of X and written tail(X).
There are four types of syntactic constructions:
dependency structure,
coordinate structure,
apposition, and repair.
Dependency structures and appositive structures partially overlap.
The type of the syntactic construction at the top of an element
is specified by the syn attribute.
2.1. Dependency Structure
If GDA element X depends on another GDA element Y or X is the head of Y, then tail(X) = body(Y). In Figure 1, for instance, tail(`Tom') = body(`met') = m1, where the solid arrow labelled with agt is the operator of `Tom.' The operator of X may be null, in which case body(X) = tail(X) = body(Y). In fact, body(`a') = tail(`a') = body(`girl') = g2 in `a girl' in Figure 1. The meaning of a preposition must be annotated by an opr attribute, because a preposition connects its object and the head. In Figure 2, for instance, `for' has sem="ben".
The opr attribute encodes a relationship in which the current element stands with respect to the element that it depends on, as in:
<v>go <adp opr="fin">to Paris</adp></v>
The opr attribute with a relation identifier as its value may be attached to a function word, as in:
<v>go <ad sem="fin">to</ad> Paris</v>
Relation identifiers can be combined to make compound relation identifiers. There are two types of combination. If a and b are relation identifiers, then so are a.b and a b. The operator `.' has precedence over the blank operator. That is, a.b c is the combination of a.b and c through blank.
a.b represents the composition of a and b as binary relations. That is, x and z stand in relation a.b, if and only if there exists y such that x and y stand in relation a and y and z stand in relation b.
Multiple concept strings in the value of a sem or opr attribute are conjunctive: they represent the intersection of the denotations of the concept strings. Here we generally regard the denotation of a concept string as a binary relation, even when the concept string is a unary operator. So the conjunction of concept strings denotes the intersection of the binary relations denoted by the concept strings. (A binary relation is a subset of the Cartesian product of two sets. So the conjunction of several binary relations is the intersection of those subsets.) The value of a sem or opr attribute consists of multiple concept strings when the meaning in question cannot be captured by a single category, as below.
I came <adp opr="res pur">so that I met him</adp>.
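The two ways of combining relation identifiers can be made concrete by modelling a relation as a set of ordered pairs. The following is a minimal sketch; the function names `compose` and `conjoin` and the toy relations `a`, `b`, and `c` are hypothetical and are not part of the GDA tag set.

```python
def compose(a, b):
    """Denotation of a.b: x and z are related iff there is some y
    with (x, y) in a and (y, z) in b."""
    return {(x, z) for (x, y1) in a for (y2, z) in b if y1 == y2}


def conjoin(*relations):
    """Denotation of blank-separated concept strings: the intersection
    of the binary relations they denote."""
    result = set(relations[0])
    for relation in relations[1:]:
        result &= set(relation)
    return result


# Toy denotations (hypothetical data, chosen only for illustration).
a = {(1, 2), (1, 3)}
b = {(2, 4), (3, 5)}
c = {(1, 4), (9, 9)}

# "a.b c" groups as (a.b) c, because '.' binds tighter than the blank.
print(conjoin(compose(a, b), c))   # → {(1, 4)}
```

Under this reading, opr="res pur" in the example above denotes the pairs that stand in both the res relation and the pur relation.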
In `Kim likes Mary better than Betty,' for instance, we must specify whether `Betty' is compared with `Kim' or `Mary.' Similarly, in `Kim blamed Mary together with Betty,' we want to mark whether Kim and Betty blamed Mary or Kim blamed Mary and Betty. To implement this in general, we use extended relation identifiers of the form a-b, where a and b are relation identifiers but not extended relation identifiers. a is a relation identifier such as cmp and bsc, and b is a relation identifier to indicate which element is in parallel with the current element, as in what follows:
The relation identifiers are used as attributes as well, which we will call relational attributes. While the opr attribute appears in the depending element (the satellite in the case of rhetorical relations), the relational attribute appears in the governing element (the nucleus) and points to the depending element. Namely, the value of the relational attribute is the referential index of the element, if any, which semantically or pragmatically depends on the element containing this relational attribute. Of course the attribute name indicates the type of the dependency.
Coreference is encoded by the relational attribute eq.
<np id="j0">John</np> beats <adp eq="j0">his</adp> wife.
Most of the syntactic constructions are dependency structures. The dep attribute can specify dependencies across element boundaries. The other, ordinary dependencies hold within elements. When an intrasentential element lacks the dep attribute, its syntactic relationship with the surrounding context is specified by the syn attribute of its parent element. syn="f" entails forward dependencies, whereas syn="b" backward dependencies.
The complexity of tagging is reduced by syn="f" and syn="b". For instance,
<adp syn="b">in order to talk to one of them</adp>
is equivalent to the following:
<adp>
in
<seg>
order
<seg>
to
<seg>
talk
<seg>
to
<seg>
one
<seg>
of them
</seg>
</seg>
</seg>
</seg>
</seg>
</seg>
</adp>
Thus syn="b" (may be syn="f" in the following example) allows us to dispense with tags embedding each other. On the other hand, a structure in which many constituents depend on one constituent can be easily treated with phrasal tags for the dependants:
<su syn="b">
<np>I</np>
went
<np>there</np>
<np>yesterday</np>
<adp>by foot</adp>
<adp>with you</adp>
<adp>after lunch</adp>.
</su>
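The equivalence between syn="b" and explicitly nested <seg> elements shown earlier in this section can be reproduced mechanically. The sketch below is an illustration only: the function name `expand_backward_chain` is hypothetical, and the division of the text into chunks (with `of them' kept as one chunk, as in the example) is supplied by hand rather than derived from the tag set.

```python
def expand_backward_chain(tag, chunks):
    """Spell out a syn="b" element as nested <seg> elements.

    Every chunk after the first is wrapped, together with everything that
    follows it, in a <seg>, so that each chunk depends on the chunk
    immediately to its left.
    """
    if not chunks:
        raise ValueError("at least one chunk is required")
    head, rest = chunks[0], chunks[1:]
    inner = ""
    for chunk in reversed(rest):            # build from the innermost <seg> outwards
        tail = " " + inner if inner else ""
        inner = f"<seg>{chunk}{tail}</seg>"
    body = f"{head} {inner}" if inner else head
    return f"<{tag}>{body}</{tag}>"


chunks = ["in", "order", "to", "talk", "to", "one", "of them"]
print(expand_backward_chain("adp", chunks))
```

The printed string has the same nesting as the explicit <adp> example, apart from whitespace.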
2.2. Nonstandard Dependency
Nonstandard dependencies including crossing dependencies and parentheticals are marked up by the dep attribute. Here is an example of crossing dependency:
<su>
<np>I</np>
saw
<np id="m0">a man</np>
<np>yesterday</np>
<vp dep="m0">who I don't know</vp>
</su>
Although `a man' is a phrasal element, it is a head due to the dependency enforced by the dep attribute. Note that the following tagging is wrong, because it entails that `who I don't know' depends on `saw'.
<su>
<np>I</np>
saw
<np mod="w0">a man</np>
<np>yesterday</np>
<vp id="w0">who I don't know</vp>
</su>
Parentheticals with normal dependencies can be annotated normally:
<su syn="b">
<np>
Admission,
<adp>even of a regular customer</adp>,
</np>
is prohibited.
</su>
Parentheticals without dependencies can be annotated with dep="nil":
<su>
<np>That dog</np>,
<adp dep="nil">
or <vp><np>it</np> may be <np>a cat</np></vp>
</adp>,
is scary.
</su>
Alternatively, <su> may be used to indicate that the parenthetical element does not depend on anything around it:
<su>
<np>That dog</np>,
<su>
or <vp><np>it</np> may be <np>a cat</np></vp>
</su>,
is scary.
</su>
Parentheticals with inward dependencies can be annotated with the dep attribute:
<su dep="S" syn="b" opr="cnt">
<np>You</np> should,
<su id="S">I suppose</su>,
do <np>it</np> by yourself.
</su>
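The dep pointers used in this section can be followed mechanically once the annotation is parsed as XML. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function names `flat_text` and `dep_links` are hypothetical, and the special handling of dep="nil" reflects only the usage shown in the examples above.

```python
import xml.etree.ElementTree as ET


def flat_text(element):
    """Text content of an element with whitespace normalized."""
    return " ".join("".join(element.itertext()).split())


def dep_links(xml_text):
    """List (dependent text, governor text) pairs for every dep attribute.

    dep="nil" yields None as the governor; a dep value that matches no id
    raises KeyError, which signals a dangling reference.
    """
    root = ET.fromstring(xml_text)
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    links = []
    for el in root.iter():
        dep = el.get("dep")
        if dep is None:
            continue
        governor = None if dep == "nil" else flat_text(by_id[dep])
        links.append((flat_text(el), governor))
    return links


crossing = """<su>
<np>I</np>
saw
<np id="m0">a man</np>
<np>yesterday</np>
<vp dep="m0">who I don't know</vp>
</su>"""
print(dep_links(crossing))   # → [("who I don't know", 'a man')]
```

The same function reports the parenthetical in the dep="nil" example as having no governor.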
2.3. Other Local Constructions
Local syntactic constructions other than dependency are coordination, apposition, repair, error, and idiosyncratic structures, specified by syn="c", syn="a", syn="r", syn="e", and syn="i", respectively. These structures consist of peer terms and operators. In `A and B,' for instance, `A' and `B' are peer terms and `and' is an operator. <fo> (forward affixing operator), <bo> (backward affixing operator), <io> (infix operator) elements are operators when they are children of coordinate, apposition, repair, error, and idiosyncratic structures.
syn="c" specifies a coordination, which may have a scope. In general, scoping is encoded by the sce (scoping element) attribute, which points at the scoping element (the coordinate structure in this case). In the following, the whole sentence is the scope of `London and Paris.'
<su sce="LP">I lived in <np id="LP" syn="c">London and Paris</np>.</su>
A collective coordination, which lacks a scope, is encoded with sce="self".
<np syn="c" sce="self">London and Paris</np> are different.
Appositive structure involving gapping is specified by syn="a" and annotated similarly to coordinate structure.
<su syn="a">
<vp>
<np>I</np>
gave
<np id="it">it</np>
<adp id="mary">to Mary</adp>
</vp>,
<bo>that is</bo>,
<vp>
<np pel="it">the present</np>
<adp pel="mary">to my wife</adp>
</vp>.
</su>
In apposition, two elements corresponding via pel or phd are regarded as coreferential. So `it' corefers with `the present' and `to Mary' corefers with `to my wife' here.
Again here is an explicit alternative:
<su syn="a">
<vp>
<np>I</np>
gave
<np id="it">it</np>
<adp id="mary">to Mary</adp>
</vp>,
<bo>that is</bo>,
<vp>
<np ed=":">I</np>
<v ed=":">gave</v>
<np pel="it">the present</np>
<adp pel="mary">to my wife</adp>
</vp>.
</su>
Repair involving gapping is specified by syn="r".
<su syn="r">
<vp>
I gave
<np id="boy">the boy</np>
<adp id="dog">to the dog</adp>
</vp>,
<io>oh excuse me</io>,
<vp>
<np pel="boy">the dog</np>
<adp pel="dog">to the boy</adp>
</vp>
</su>
2.4. Relative Clause
A relative clause is a phrase governed by a noun semantically related with some parts of it. Just like a topicalized sentence, a relative clause with a WH complementizer is regarded as a constituent (typically a <vp> element) whose head is the main verb of the clause; the WH complementizer depends on the main verb.
If the relative clause lacks a WH complementizer, then the relation between the noun governing the relative clause and the pronoun (a gap or a resumptive pronoun) coreferring with the governing noun can be encoded by a relational attribute, as below.
The relation between the relative pronoun, if any, and the head noun governing the relative clause can be encoded by an eq attribute. The relation between the gap (or resumptive pronoun) in the relative clause and the WH complementizer (`which' and `for whom' in the examples below) can be encoded by a relational attribute if necessary, too.
opr="uba" means that `whom' does not semantically depend on `want'.
<np>
<n>people</n>
<vp syn="b">
<np id="X" opr="uba" eq="mgn">whom</np>
<np>I</np>
want to
<v obj="X">meet</v>
</vp>
</np>
mgn cannot be used if `work' is an element.
<np syn="b">
the
<n>painter</n>
<vp>
<np><ad eq="mgn">whose</ad> work</np>
surprised
<np>me</np>
</vp>
</np>
plg cancels uba, so that `work' is associated with `whom' via ben.
<np>
the
<n>man</n>
<vp>
<adp id="X" opr="uba.ben">for <n eq="mgn">whom</n></adp>
<np>I</np>
can
<v plg="X">work</v>
</vp>
</np>
Parasitic gaps are not distinguished from normal gaps.
<np>
the
<n id="B">book</n>
<vp syn="b">
<n id="X" opr="uba" eq="B">which</n>
<np>I</np>
have
<v obj="X">lost</v>
<adp>before <v obj="X">reading</v></adp>
</vp>
</np>
Infinitival relative clause:
the <n>tool</n>
<vp syn="b">
<adp opr="mns">by <np eq="mgn">which</np></adp>
to <v>open</v> it
</vp>
If the noun governing the relative clause is the closest noun governing the gap or the resumptive pronoun, then the value of the relational attribute may be mgn (minimal governing noun), as follows.
- <np>
<n>people</n>
<vp syn="b"><np>I</np> want to <v obj="mgn">meet</v></vp>
</np>
- <np syn="f">the <n>bar</n> <vp obj="mgn"><np>I</np> love</vp></np>
- <np syn="f">the <n>man</n> <vp>I think <vp aen="mgn">is crazy</vp></vp></np>
- <np syn="b"><adp>the</adp> <n>person</n> to <vp obj="mgn">blame</vp></np>
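One way to see what mgn stands for in the examples above is to resolve it to a noun by climbing the element tree. The sketch below rests on a loudly stated assumption: it takes the minimal governing noun to be the first <n> element found as a sibling while walking from the marked element up through its ancestors. This heuristic and the function names `flat_text` and `resolve_mgn` are hypothetical; the tag set itself does not define the procedure.

```python
import xml.etree.ElementTree as ET


def flat_text(element):
    """Text content of an element with whitespace normalized."""
    return " ".join("".join(element.itertext()).split())


def resolve_mgn(xml_text):
    """Return (attribute, element text, noun text) for each attribute whose
    value is mgn, using the sibling-<n> heuristic described above."""
    root = ET.fromstring(xml_text)
    parent_of = {child: parent for parent in root.iter() for child in parent}
    links = []
    for element in root.iter():
        for attribute, value in element.attrib.items():
            if value != "mgn":
                continue
            node, noun = element, None
            # Assumption: the closest ancestor level offering an <n> sibling wins.
            while noun is None and node in parent_of:
                ancestor = parent_of[node]
                noun = next((c for c in ancestor
                             if c is not node and c.tag == "n"), None)
                node = ancestor
            links.append((attribute, flat_text(element),
                          None if noun is None else flat_text(noun)))
    return links


example = ('<np><n>people</n><vp syn="b"><np>I</np> want to '
           '<v obj="mgn">meet</v></vp></np>')
print(resolve_mgn(example))   # → [('obj', 'meet', 'people')]
```

Under this heuristic, tagging `work' as an <n> element next to `whose' would make `work' rather than `painter' the resolved noun, which is in line with the earlier remark that mgn cannot be used if `work' is an element.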
2.5. Topicalization, etc.
Topicalization and other long-distance dependencies are interpreted by relational attributes. Topicalization is regarded as essentially the same structure as a relative clause with a WH complementizer. Relation identifier uba means that the element has no direct semantic relationship with what it syntactically depends on. So extraposed elements tend to have uba. plg cancels uba.
<su>
<adp opr="uba.gol" id="X">To her</adp>,
<v>
I think
<vp plg="X">he sent a bouquet</vp>
</v>.
</su>
<su>
<np opr="uba" id="X">This</np>
is
<ajp>easy <vp>for me to <vp obj="X">reach</vp></vp></ajp>
</su>
<su>
<np opr="uba" id="WH">What</np>
<v>
do you like to <vp obj="WH">eat</vp>
</v>?
</su>
Combination of a relative clause and another long-distance dependency:
the <n>piano</n>
<vp>
<np eq="mgn" id="WH" opr="uba">which</np>
<np id="S" opr="uba">this sonata</np>
is easy
<vp>
to make her
<v obj="S">play</v>
<adp arg="WH">with</adp>
</vp>
</vp>
2.6. Gapping and Abstraction
In coordination (whether distributive or collective), apposition, and repair, gapping can be encoded with the pel (parallel element) and the phd (peer head) attributes. When pel appears in a peer term, the parts of the peer terms which neither point via pel nor are pointed at by pel are copied to the corresponding loci in the other peer terms. Similarly, the constituents depending on an element pointed at by phd are copied to be dependants of the element with that phd.
<su syn="c">
<vp syn="b">
<adp pel="nil">Perhaps</adp>
<np opr="ctl.obj"><nump id="n">nearly 270</nump> people</np>
were
<v id="k" phd="inj">killed</v>
</vp>
and
<vp syn="b">
<nump pel="n">1,400</nump>
<vp syn="b" pel="k">
reported
<v id="inj">injured</v>
in <np sce="top">the earthquake</np>
</vp>
</vp>
</su>
This means `perhaps nearly 270 people were killed in the earthquake, and 1,400 people were reported injured in the earthquake' where the underlined parts are copies due to pel and phd. pel and phd are adaptations of = and *RNR*, respectively, of Penn TreeBank. The sce="top" entails that the two peer terms concern the same earthquake.
Another solution to gapping is to interpret gaps more explicitly:
<su syn="c">
<vp syn="b">
<adp>Perhaps</adp>
<np><nump>nearly 270</nump> people</np>
were killed
<adp eq="E"/>
</vp>
and
<vp syn="b">
<np><nump>1,400</nump> <n ed=":">people</n></np>
<v ed=":">were</v>
reported injured
<adp id="E">in the earthquake</adp>
</vp>
</su>
The eq="E" specifies coreference, which in this case means that the two peer terms concern the same earthquake.
Below is another example of how to use pel and phd.
<su syn="c">
<vp syn="b">
<np syn="b">
Passengers going to
<placename id="S">Shinjuku</placename>
</np>,
please go to
<np id="T" syn="b">Track No.1</np>
</vp>,
and
<vp syn="b">
<placename pel="S">Ueno</placename>,
<np syn="b" pel="T">Track No.3</np>
</vp>.
</su>
Here is an explicit solution using fil:
<su syn="c">
<vp syn="b">
<np syn="b">
Passengers going to
<placename>Shinjuku</placename>
</np>,
please go to
<np syn="b">Track No.1</np>
</vp>,
and
<vp syn="b">
<np syn="b">
<bo syn="b" ed=":">passengers going to</bo>
<placename>Ueno</placename>,
</np>
<bo syn="b" ed=":">please go to</bo>
<np syn="b">Track No.3</np>
</vp>.
</su>
2.7. Scope
Scopes of quantifier, negation, modal operator, conditional operator, coordination, plural, and so forth are encoded by the sce (scoping element) attribute.
The sce value of an element A is the id value of another element B such that A is a scope of B. Here A must command B; an element commands another element when the former contains the latter or contains an element which either points at the latter element via a relational attribute or is pointed at by the latter element via the dep attribute. For instance, the following annotation entails the interpretation that each of three collectors bought one and the same painting, so that this painting has been bought three times as far as the sentence entails.
<su sce="c3">
<np id="c3">Three collectors</np>
have bought <np sce="top">a painting</np>.
</su>
An optionally scope-introducing element, such as `three collectors' and `Tom and Mary,' actually introduces a scope only when pointed at via the sce attribute. In the above example, if sce="c3" were absent, the interpretation would be that the three collectors cooperatively bought a painting, so that the painting was bought once.
The scopes of elements such as `every man' and `Tom or Mary,' which always introduce scopes, are assumed to be the minimal dominating <vp> or <np> elements. For instance,
<su><np syn="c">Tom or Mary</np> came.</su>
means that Tom came or Mary came, where the scope of `Tom or Mary' is the entire sentence.
For instance, the following means that each of three collectors bought a painting, entailing that three possibly distinct paintings were bought.
<su sce="c3"><np id="c3">Three collectors</np> bought <np sce="c3">a painting</np>.</su>
Here the referent of `a painting' is in the scope introduced by `three collectors.' Since there are three instantiations of this scope, corresponding to the three collectors, there are three possibly distinct paintings each of which was bought in one of those instantiations.
For another example, the de dicto reading of `Jane wants to marry a doctor,' which entails no specific doctor, is marked up as follows:
Jane <v id="w1">wants</v> to marry <np sce="w1">a doctor</np>.
Here the doctor is situated in the scope introduced by the modal operator `wants.' Being the head of the complement of `want,' `marry' is forced to be situated in the scope of `wants.' So the sce attribute need not be specified for `marry.' As for the other elements, absence of the sce attribute entails no specific default reading. To entail the de re reading involving a specific doctor, `a doctor' must have an sce attribute pointing to an ancestor <q> or <quote> or <dv> element or the whole document (represented by top).
Similarly, the reading of `every man loves a woman' in which `a woman' is in the outermost situation (that is, one woman is loved by all the men) is encoded by the following annotation:
Every man loves <np sce="top">a woman</np>
The other reading, in which the referent of `a woman' is in the scope of `every man' (different men may love different women), is encoded by:
<np id="e0">Every man</np> loves <np sce="e0">a woman</np>
under construction
3. General Attributes
The following attributes are globally applicable to all the GDA tags except <anchor/> and <alt/>. These attributes are all optional. lang, next and prev are straightforward imports from TEI.
- id
- Unique identifier for the element. The value must begin with a letter and can contain letters, digits, hyphens, and periods.
- lang
- Language of the text in this element; if not specified, the language is assumed to be the same as in the surrounding context. The value should be a three-letter language identifier in ISO 639-2, such as eng (English) and jpn (Japanese).
- resp
- The annotator. That is, the person responsible for annotation.
- dtp
- Description type. The possible values are listed below.
- nm
- Normal. This is the default value.
- sc
- So-called. Word or phrase for which the addressor disclaims responsibility.
- qt
- Quotation. Used when the quoted material is at the same level of discourse. <q> is used otherwise.
- mt
- Word or phrase not used but mentioned.
- <ajp dtp="mt">Long</ajp> is a short word.
- so
- Onomatopoeia (sound).
- <su dtp="so">Bang!</su>
- mn
- Mimesis (manner).
- vi
- Icon (vision).
- That's great <ij dtp="vi">:-)</ij>.
- op
- Option.
- from <bo dtp="op">(inside of)</bo> the car
- em
- Emphasis.
- That's a <n dtp="em">*common sense*</n>.
- next
- Next element's id value in an aggregate.
- prev
- Previous element's id value in an aggregate.
<q id="q1" who="j1" next="q2">`<su id="s1" next="s2">If it rains,</su>'</q> <np id="j1">John</np> said, <q id="q2" prev="q1">`<su id="s2" prev="s1">I won't come.</su>'</q>
- dep
- An ID attribute pointing to the governor in intrasentential syntactic dependency or intersentential rhetorical dependency. If dep points to something belonging to an upper context, the element having this dep attribute is an insertion from that context.
- sbu
- The id values of the child elements which do not explicitly occur in the current element (sub-utterances). Used to encode ambiguity, typically in the output of automatic analysis by computers.
- cocu
- Concurrent utterance. An IDREFS attribute. Span tags (<span>, <bspan>, and <espan>) are used if concurrent utterances are not discourse or syntactic constituents.
- coiu
- Coinitial utterance. An IDREFS attribute. Span tags are used when coinitial utterances begin in the middle of words.
- err
- Possibility of tagging error. Mainly intended for automated annotation. Typical usages and their meanings follow:
err="t" (tag name is questionable)
err="b" (position of begin tag is questionable)
err="e" (position of end tag is questionable)
err="a opr" (opr attribute is questionable)
err="?a opr" (combination of opr and element is questionable)
err="tb" (tag name and position of begin tag are questionable)
The syn attribute below can appear in every tag except <anchor/> and <alt/>.
- syn
- Synthesis. Type of syntactic or pragmatic construction among the child elements and texts. The possible values are the following.
- n
- No relation among child elements.
- d
- Dependency. The default value of the syn attribute for intrasentential tags in any language. If an element has value d for the syn attribute, then tags can be inserted into its child texts so that the following conditions hold.
1. Each child is either an element or a symbol (such as comma and period).
2. An empty head element must be created if all the existing children are phrasal elements and symbols.
3. There is exactly one child non-phrasal element which does not depend on any other element.
4. Each of the other elements depends on a non-phrasal element.
Note that the dependencies in an element whose syn value is d are not in general uniquely determined. In
<np><aj>American</aj> <n>stock</n> <n>holder</n></np>
for instance, the three child elements may have arbitrary dependency relations under the above conditions. Two readings are possible which respect the English syntax: stock holder who is American and holder of American stock.
The first of the examples below means that I have boiled several eggs, whereas the second means that I have several boiled eggs.
- <su><np>I</np> have <vp><v>boiled</v> <np>eggs</np></vp>.</su>
- <su><np>I</np> have <np><vp>boiled</vp> <n>eggs</n></np>.</su>
- f
- Forward dependency chain. If an element has syn="f", then it is possible to insert tags into its child texts so that the following conditions hold.
- Each child is either an element or a symbol (such as comma and period).
- The elements created by the insertion of tags are not empty, except that an empty head copular element must be created if all the existing children are phrasal elements and symbols.
- The created elements lack dep, pel, and phd attributes.
- There is exactly one child non-phrasal element which does not depend on any other element.
- Each of the other elements depends on the nearest non-phrasal element to its right if any, and otherwise the nearest non-phrasal element to its left.
Each child element is interpreted as lacking dep, pel, and phd attributes unless they are explicitly specified. So syn="f" uniquely determines the dependency relationships among the child elements, unlike syn="d". Note that the non-phrasal elements mentioned above may be either head elements or non-intrasentential elements such as <p> and <ss>.
The following Russian example shows how to use the latter half of condition 2 above. Note that an empty head copular element must be assumed to be there because the existing children are phrasal elements and a symbol.
<su lang="rus" syn="f"><np>Eta</np> <np>dom</np>.</su>
(word-by-word gloss: `this house'; translation: `This is a house.')
- b
- Backward dependency chain. The reversal of f. If an element has syn="b", then it is possible to insert tags into its child texts so that the following conditions hold.
- Each child is either an element or a symbol (such as comma and period).
- The elements created by the insertion of tags are not empty, except that an empty head copular element must be created if all the existing children are phrasal elements and symbols.
- The created elements lack dep, pel, and phd attributes.
- There is exactly one child non-phrasal element which does not depend on any other element.
- Each of the other elements depends on the nearest non-phrasal element to its left if any, and otherwise the nearest non-phrasal element to its right.
Each child element is interpreted as lacking dep, pel, and phd attributes unless they are explicitly specified. So syn="b" uniquely determines the dependency relationships among the child elements, just as syn="f" does. Again the non-phrasal elements mentioned above may be either head elements or non-intrasentential elements such as <p> and <ss>.
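The chain conditions for syn="f" and syn="b" can be read as a small procedure. The following Python sketch is one possible reading, not part of the GDA specification: the function name chain_governors and the boolean encoding of children are hypothetical, symbols are assumed to have been filtered out, and any required empty head element is assumed to have been created already.

```python
from typing import List, Optional


def chain_governors(phrasal: List[bool], direction: str = "f") -> List[Optional[int]]:
    """Return the governor index of each child element (None for the root).

    phrasal[i] is True when child i is a phrasal element and False when it is
    a non-phrasal element. direction is "f" (forward chain) or "b" (backward).
    """
    if direction not in ("f", "b"):
        raise ValueError("direction must be 'f' or 'b'")
    heads = [i for i, is_phrasal in enumerate(phrasal) if not is_phrasal]
    if not heads:
        raise ValueError("no non-phrasal child: an empty head element is needed")

    governors: List[Optional[int]] = []
    for i in range(len(phrasal)):
        right = [h for h in heads if h > i]   # non-phrasal elements to the right
        left = [h for h in heads if h < i]    # non-phrasal elements to the left
        if direction == "f":
            preferred, fallback = right[:1], left[-1:]
        else:
            preferred, fallback = left[-1:], right[:1]
        if preferred:
            governors.append(preferred[0])
        elif not phrasal[i]:
            # Last non-phrasal element in the chain direction: the unique root.
            governors.append(None)
        else:
            governors.append(fallback[0])
    return governors
```

For three non-phrasal children, chain_governors([False, False, False], "f") gives [1, 2, None], and the same input with "b" gives [None, 0, 1], which illustrates why f and b fix the dependencies uniquely while d does not.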
In an element whose syn value is one of the following (c, a, r, e, and i), each child element must be either a peer term or an operator element (an <fo>, <bfo>, <bo>, <fbo>, or <io> element). So the following annotation is wrong, because `came here' is a peer term but `came' and `here' are not.
- Kim <v syn="c"><v opr="pre">came</v> <adp>here</adp> and left</v>.
The following annotations are correct.
- Kim <v syn="c"><v opr="pre">came here</v> and left</v>.
- Kim ate <np syn="c">bread, <bo>though</bo> not egg</np>.
- <np id="sk" syn="c"><bo>Instead of</bo> Sue, Kim</np> came.
- c
- Coordination.
A coordination may be collective or distributive. For instance, the example below means either that Tom and Mary married each other (collective reading), or that Tom married somebody and Mary married somebody else (distributive reading).
<su><np syn="c">Tom and Mary</np> got married.</su>
The following annotation means that `two hours and a half' is two hours plus half an hour.
<np syn="c"><n>two hours</n> and <n>a half</n></np>
When the peer terms which are children of an element with syn="c" do not have relation identifiers, the entire element refers to the sum (aggregation) of the referents of the peer terms.
When the peer terms which are children of an element with syn="c" have relation identifiers, the child elements semantically depend on the governor of the entire element.
<n>route</n>
<adp syn="c">
<adp opr="int">from London</adp>
<adp opr="fin">to Paris</adp>
</adp>
Compare:
<v>
go
<adp opr="src">from London</adp>
<adp opr="gol">to Paris</adp>
</v>
Dates and times are not marked with syn="c".
<timep><time>two </time> <time>thirty</time></timep>
- a
- Apposition.
<su syn="a">
<su>I introduced <np id="M">Mary</np> <adp id="S">to Sue</adp></su>,
that is,
<su><np pel="M">my girlfriend</np> <adp pel="S">to my wife</adp></su>,
</su>
- r
- Repair.
<su>
<vp>
I gave
<persname id="M">Mary</persname>
<adp id="D">to the dog</adp>
</vp>,
<io>oh I'm sorry</io>,
<vp>
<np pel="M">the dog</np>
<adp pel="D">to Mary</adp>
</vp>.
</su>
- e
- Error. Repair of a special sort in which the last peer term contains an error.
- i
- Idiosyncratic construction.
- <num syn="i"><num>4</num> over <num>7</num></num>
- <np syn="i"><num>1</num> vs. <num>2</num></np>
- <np syn="i"><num>2</num> : <num>4</num> : <num>1</num></np>
The following attributes are applicable to all the intrasentential tags (tags for intrasentential contents).
The syn attribute will be discussed in more detail in the next section.
- sem
- Word sense, or more precisely a semantic class which the referent of the element belongs to. The value is one or more concept strings.
- opr
- Operator. The value is one or more concept strings.
- pel
- Parallel element. Specifies substitution between corresponding elements in abstraction and instantiation.
- phd
- Peer head. Specifies that the dependents of the pointed element are also dependents of the pointing element.
- sce
- Scoping element. An IDREFS attribute. Points at the element introducing the smallest scope containing the present element. If an element X has an sce attribute pointing at X or one of its descendants, then the scope contains no elements outside X. So an element pointing at itself by sce, such as the one below, lacks a scope.
<su><np syn="c" sce="self">Tom and Mary</np> got married.</su>
- pco
- Parallel correspondence. An IDREFS attribute. Connects elements with the same scope.
- ref
- The id value of the element to be filled as the head of the current element. Omission of a non-head (maximal projection) should be annotated not by ref but by relational attributes.
`Tom <v id="cm1">came</v>.' `Who <v ref="cm1"/>?'
- ed
- Edition by the annotator. The value is a colon followed by the string in the original document which has been replaced with the text content of the element. In the following example, `do that' has been inserted.
I will <vp ed=":">do that</vp>.
- orth
- Orthography. Correction of spelling and speech errors.
<n orth="enough">enuff</n>
- abbr
- Class of abbreviation. Its default value is none. Other values include contraction, suspension, brevigraph, superscription, and acronym, each of which means that the element is an abbreviation.
- expan
- Expansion of abbreviation. The presence of this attribute means that the element is an abbreviation.
<orgname expan="Electrotechnical Laboratory" abbr="acronym">ETL</orgname>
- pron
- Pronunciation.
<n pron="meetee">MITI</n>
4. Tags
Hereafter the addressor means not only the agent of a speech, but also the author of a written passage, the thinker of a thought, the performer of a sign language or a gesture, and so on, where the speech, the passage, the thought, etc. appear as tagged elements in the document. Similarly, the addressee means the recipient of them intended by the addressor.
A referential index is a name. A referential index usually refers to the element which has it as the id value. There are special referential indices which are not the id value of any element. We will call them deictic indices. Different occurrences of the same deictic index may refer to different things in one GDA file. For instance, fwd refers to the element or text subsequent to the element containing it, so that two occurrences of fwd in two distinct elements must refer to different things. The deictic indices are p0 (generic people), p1 (first person (addressor) singular, or `I/my/me'), p1p (first person plural, or `we/our/us'), p1i (first person plural including second person), p1x (first person plural excluding second person), p2 (second person (addressee) singular), p2p (second person plural), nil (nothing), top (entire discourse), self (the element itself), fwd (forward), bwd (backward), and mgn (minimal governing noun).
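The distinction between ordinary referential indices and deictic indices can be sketched as a lookup. The Python below is only an illustration: the function classify_index and the choice to test the deictic table before the set of id values are assumptions, not something the tag set prescribes.

```python
from typing import Dict, Set, Tuple

# Deictic indices with the glosses given in the text.
DEICTIC_INDICES: Dict[str, str] = {
    "p0": "generic people",
    "p1": "first person (addressor) singular",
    "p1p": "first person plural",
    "p1i": "first person plural including second person",
    "p1x": "first person plural excluding second person",
    "p2": "second person (addressee) singular",
    "p2p": "second person plural",
    "nil": "nothing",
    "top": "entire discourse",
    "self": "the element itself",
    "fwd": "forward",
    "bwd": "backward",
    "mgn": "minimal governing noun",
}


def classify_index(index: str, element_ids: Set[str]) -> Tuple[str, str]:
    """Classify a referential index as deictic or as the id of an element.

    element_ids is the set of id values found in the GDA file.
    """
    if index in DEICTIC_INDICES:
        # Deictic: the referent depends on where the index occurs.
        return ("deictic", DEICTIC_INDICES[index])
    if index in element_ids:
        # Ordinary: refers to the element carrying this id value.
        return ("element", index)
    raise KeyError(f"unknown referential index: {index!r}")
```

A deictic result still has to be resolved relative to the element containing the index, which is why two occurrences of fwd cannot share a referent.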
Tags defined in the GDA tag set follow.
- <gda>
- The whole document. A GDA document must be one <gda> element following some header elements except when embedded in an HTML or other file.
- <dv>
- Subdivision of document.
- type
- Conventional name of the division. The standard values are part, chapter, section, and subsection, but any other character string is also possible for the value.
- <h> <h1> ··· <h6>
- Title of the division. <title> is not used here, because HTML browsers hide <title> elements. A <dv> may contain an <h> or <hi>. <h>, which is undefined in HTML, can be used to avoid HTML formatting effects.
- <p>
- Paragraph.
- <ss>
- Sequence of sentences.
- <q>
- Quotation and citation including direct speech, thought, etc. A <q> element contains the quotation marks or their equivalents, if any. <q> elements do not have to be syntactic constituents.
In the following example, `YES' is interpreted as printed on the button, as if the button were saying `YES.'
Press the <q>`YES'</q> button.
- type
- Type of the content matter, such as speech or thought. The values may be spoken, written, thought, sign (for sign language), and gesture.
The following tags represent structures in sentences, and are called intrasentential tags. Elements with those tags are called intrasentential elements. Among them, phrasal tags are <su>, <ij>, and tags whose names end with `p' (except <p> above, which represents a paragraph), such as <np> and <vp>. Elements with phrasal tags are called phrasal elements. Phrasal elements represent maximal projections, which cannot be heads of larger constituents. No elements can syntactically depend on them, except when stipulated by the dep attribute. Head tags are the other intrasentential tags, such as <n> and <v>. Elements with head tags are called head elements. They can be heads in dependency structures without being specified by the dep attribute.
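The naming rule for phrasal tags is mechanical enough to sketch in code. The Python below is a minimal illustration; the function name is_phrasal_tag is hypothetical, the input is assumed to be the name of an intrasentential tag (or <p>), and <bibref>, <bfo>, and <fbo> are included because their own entries in this document declare them phrasal.

```python
# Tags that are phrasal although their names do not end with "p".
# <su> and <ij> come from the rule itself; the other three are declared
# phrasal in their own entries.
NAMED_PHRASAL = {"su", "ij", "bibref", "bfo", "fbo"}


def is_phrasal_tag(name: str) -> bool:
    """Return True if an intrasentential tag name denotes a phrasal tag."""
    if name == "p":
        # <p> is the paragraph tag, explicitly excluded from the rule.
        return False
    return name in NAMED_PHRASAL or name.endswith("p")
```

Under this reading <np>, <vp>, and <timep> are phrasal, while <n>, <v>, and <time> are head tags.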
- <su>
- Sentential unit. A piece of utterance which has no direct syntactic relation with other utterances except stipulated by the dep attribute.
- <segs>
- Sequence of syntactic constituents without direct syntactic relations with each other. The default value for syn is n.
- <seg> <segp>
- Subsentential segment, which is a syntactic constituent. Used when the constituent cannot or need not be categorized by any of the following tags.
- I <seg>saw a girl</seg> with a telescope.
- <n> <np>
- Noun and noun phrase.
- <v> <vp>
- Verb, verb phrase, and sentence.
- <su>I <v>want <vp>to <vp>sleep</vp></vp></v>.</su>
Note that `want to sleep' must not be <vp> but <v>, because it is the head of the whole sentence.
- <aj> <ajp>
- Adjective and adjective phrase.
- <ad> <adp>
- Adverb, adverbial phrase, adnoun, adnominal phrase, preposition, prepositional phrase, postposition, postpositional phrase, determiner, determiner phrase, complementizer, and complement sentence.
- <np><ad>the</ad> <n>man</n></np>
- <ij>
- Interjection. <ij> elements do not participate in dependency relations.
- <date> <datep>
- Date.
- value
- Value of the date in the format of ISO 8601.
- <time> <timep>
- Time of day.
- value
- Value of the time in the format of ISO 8601.
- <name> <namep>
- Proper noun or noun phrase.
- type
- Type of the object referred to. The value is a concept identifier.
- <persname> <persnamep>
- Name of person.
- <name>Mr. <persname>Brown</persname></name>
- <orgname> <orgnamep>
- Name of organization.
- <placename> <placenamep>
- Name of place.
- <geogname> <geognamep>
- Name of geographical object such as mountain, river, sea, etc.
- <num> <nump>
- Number.
- type
- Type of numeric value. The values include int, real, float, ordinal, fraction, and percentage.
- value
- Value of the number in a standard form.
- <num type="int" value="21">twenty one</num> <num type="percentage" value="10">10%</num>
- <num type="ordinal" value="2">second</num>
- <num type="fraction" value="1/3">one third</num>
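To show how the type and value attributes of <num> might be consumed by a program, here is a hedged Python sketch. The function num_value is hypothetical, and two choices in it are assumptions rather than rules of the tag set: a percentage is converted to a fraction of one, and ordinals are returned as plain integers.

```python
import xml.etree.ElementTree as ET
from fractions import Fraction
from typing import Union

Number = Union[int, float, Fraction]


def num_value(elem: ET.Element) -> Number:
    """Convert the value attribute of a <num> element according to its type."""
    kind = elem.get("type")
    raw = elem.get("value")
    if raw is None:
        raise ValueError("<num> element has no value attribute")
    if kind in ("int", "ordinal"):
        return int(raw)
    if kind in ("real", "float"):
        return float(raw)
    if kind == "fraction":
        return Fraction(raw)          # accepts strings such as "1/3"
    if kind == "percentage":
        return Fraction(raw) / 100    # assumption: 10 means 10 percent
    raise ValueError(f"unsupported num type: {kind!r}")


# Usage with one of the examples from the text.
third = ET.fromstring('<num type="fraction" value="1/3">one third</num>')
print(num_value(third))
```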
- <address> <addr> <addrp>
- Postal address. HTML browsers italicize <address> elements by default. <addr> should be used to avoid that effect.
- <bibref>
- Bibliographic reference. A phrasal tag.
- <np>Incompleteness Theorem <bibref>(Goedel, 1931)</bibref></np>
<bo>, <bfo>, <fo>, <fbo>, and <io> are called operator tags, and the elements they enclose are operator elements. Operator elements are operators of coordination, apposition, or repair, when their parent elements' syn values are c, a, r, or e.
- <bo>
- Backward affixing operator. Its default value for the syn attribute is bc, and its rightmost child head element is the head of a larger constituent involving some material on the right. A <bo> element is not a syntactic constituent when it contains more than one non-phrasal child.
work <bo sem="pur">in order to</bo> live
A <bo> element is an operator of coordination, apposition, repair, or error, when its parent element's syn value is c, a, r, or e.
<v syn="c">weaken, <bo>rather than</bo> strengthen</v>, <np>the control</np>
- <bfo>
- Backward-affixing forward-depending operator. The child elements can depend forward on an external governor. <bfo> is a phrasal tag. A <bfo> element is not necessarily a linguistic constituent. A <bfo> element is an operator of coordination, apposition, or repair, when its parent element's syn value is c, a, or r.
<su>
<np syn="c">
<bfo>Not only</bfo>
<persname>Tom</persname>
<bfo>but also</bfo>
<persname>Mary</persname>
</np>
came.
</su>
- <fo>
- Forward affixing operator. Its default value for the syn attribute is fc, and its leftmost child head element is the head of a larger constituent involving some material on the left. An <fo> element is not a syntactic constituent when it contains more than one non-phrasal child. An <fo> element is an operator of coordination, apposition, or repair, when its parent element's syn value is c, a, or r.
- <fbo>
- Forward-affixing backward-depending operator. The child elements can depend backward on an external governor. <fbo> is a phrasal tag. An <fbo> element is not a linguistic constituent when more than one of its child elements depend on external heads. An <fbo> element is an operator of coordination, apposition, or repair, when its parent element's syn value is c, a, or r.
- <io>
- Infix operator of coordinate, appositive, and repair structures, whose default value for the syn attribute is n. An <io> element is an operator of coordination, apposition, or repair, when its parent element's syn value is c, a, or r.
<su>
<np syn="c">
<persname>Tom</persname>
<io>and</io>
<persname>Mary</persname>
</np>
got married.
</su>
<p> and the tags thereafter are called intradivisional tags. <su> and the tags thereafter are called intrasentential tags. The following table shows which tags (right column) the elements of each tag (left column) can contain as children (not as descendants in general).
parent                   child
<gda>, <dv>, and <q>     all tags except <gda>
<h> and <hi>             intradivisional tags
<p> and <ss>             <ss>, phrasal tags, and <q>
intrasentential tags     <q> and intrasentential tags
The following tags are used to encode ambiguities. In GDA, these tags are usually not handled manually, but are processed automatically by computers. The elements of these tags are all empty, and can appear anywhere in the document. These tags except <anchor/> are called link tags. Link elements (elements with link tags) are children of other elements only when they are referred to via the dtrs attribute.
- <anchor/>
- Anchor point. The <anchor/> element must have id attribute, and assigns an identifier to a point in the document.
- <alt/>
- Alternatives.
- targets
- The id values of the alternatives.
- weights
- The percentage probabilities of the corresponding alternatives.
- content
- id values of <anchor/> elements. The content attribute specifies the virtual content of the element. For instance, the virtual content of <alt content="n0 n1 s0 s1" targets="v1 v2"> is `The idea' plus `that I should go' if the following text is in the same document file.
- <anchor id="n0"/>The idea<anchor id="n1"/> occurred to me <anchor id="s0"/>that I should go<anchor id="s1"/>.
In general, the virtual content of an element with attribute content="id_1 ··· id_2n" is the aggregate of the regions between the <anchor/> elements with id values id_(2i-1) and id_(2i) for i from 1 to n. content="id_1 ··· id_(2n+1)" is equivalent to content="id_1 ··· id_(2n+1) id_(2n+1)".
- <su dtrs="va0">I <anchor id="a0"/>saw <anchor id="a1"/>the girl <anchor id="a2"/>with a telescope<anchor id="a3"/>.</su>
<alt id="va0" content="a0 a3" targets="vp1 vp2"/>
<v id="vp1" dtrs="np1"/>
<v id="vp2" dtrs="vp3 pp1"/>
<n id="np1" content="a1 a3" dtrs="pp1"/>
<v id="vp3" content="a0 a2"/>
<ad id="pp1" content="a2 a3"/>
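The pairing rule for the content attribute can be made concrete with a short Python sketch. It is an illustration under simplifying assumptions: the helper names anchor_offsets and virtual_content are hypothetical, anchors are located with a regular expression rather than an XML parser, and the input is assumed to contain no markup other than <anchor/> elements.

```python
import re
from typing import Dict, List, Tuple

ANCHOR = re.compile(r'<anchor id="([^"]+)"/>')


def anchor_offsets(marked: str) -> Tuple[str, Dict[str, int]]:
    """Strip <anchor/> elements and record each anchor's offset in the plain text."""
    parts: List[str] = []
    offsets: Dict[str, int] = {}
    pos = 0      # position in the marked-up string
    length = 0   # length of the plain text collected so far
    for match in ANCHOR.finditer(marked):
        chunk = marked[pos:match.start()]
        parts.append(chunk)
        length += len(chunk)
        offsets[match.group(1)] = length
        pos = match.end()
    parts.append(marked[pos:])
    return "".join(parts), offsets


def virtual_content(plain: str, offsets: Dict[str, int], content: str) -> List[str]:
    """Return the regions selected by a content attribute value."""
    ids = content.split()
    if len(ids) % 2 == 1:
        # An odd-length list is equivalent to repeating its last id.
        ids.append(ids[-1])
    return [plain[offsets[a]:offsets[b]] for a, b in zip(ids[0::2], ids[1::2])]


marked = ('<anchor id="n0"/>The idea<anchor id="n1"/> occurred to me '
          '<anchor id="s0"/>that I should go<anchor id="s1"/>.')
plain, offsets = anchor_offsets(marked)
print(virtual_content(plain, offsets, "n0 n1 s0 s1"))
```

With the sentence from the text, content="n0 n1 s0 s1" selects the two regions `The idea' and `that I should go'. An odd-length value such as "n0 n1 s0" adds an empty region at the position of the last anchor.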
5. Relation Identifiers
Relation identifiers represent primitive binary relations between dependent elements and governing (depended) elements, and include grammatical functions, thematic roles, and rhetorical relations. The distinction among these three types of relations is often vague. For instance, LOCATION counts as both a grammatical function and a thematic role. Although CAUSE is usually regarded as a rhetorical relation, it can also serve as a thematic role of phrases such as `due to lack of money.' This is why we conflate grammatical functions, thematic roles, and rhetorical relations. Among the values introduced below, cau, cnc, cnd, and so on, serve as both rhetorical relations and thematic roles.
One purpose of relation identifiers is to associate complement elements (subjects, objects, indirect objects, and so forth) with the corresponding arguments of verbs, adjectives, etc. To fulfill this, we employ a rather standard approach: the association is specified by marking elements with grammatical functions such as SUBJECT and OBJECT (sbj and obj below, respectively), provided that we have a dictionary containing the argument structures of verbs and so on. In many languages, there is usually no need to explicitly mark up complements such as subjects, objects, and indirect objects, because their grammatical functions are obvious from the surface forms and hence their thematic roles can be inferred from the dictionary. When the verb has multiple argument structures, such as with `Tom opens the door' (where `Tom' is the agent) and `The key opens the door' (where `the key' is the instrument), we can either mark up the subject noun phrases with the thematic role or mark up the verb in terms of the argument structure. Also, by using grammatical functions we do not have to worry about whether the subject of buy should be AGENT or RECIPIENT, for example.
The rest of the purpose of relation identifiers is to resolve ambiguities of both the thematic roles of adjunct elements, which are typically prepositions and postpositions, and the rhetorical relations which are not explicitly marked. To attain this, we must simply markup the elements in question with thematic roles and rhetorical relations. However, the exhaustive listing of thematic roles and rhetorical relations appears impossible, as widely recognized. We are not yet sure about how many thematic roles and rhetorical relations are sufficient for engineering applications such as machine translation, but as mentioned before, the appropriate granularity of classification will be determined by the current level of technology.
The relation identifiers are enumerated below in several clusters.
5.1. Grammatical Function
- arg
- Primary or unique argument, such as the arguments of auxiliary verbs and prepositions. Used as relational attribute only. The concrete relation between the first and the second argument of arg is entailed by the governor. Morphological dependencies (such as the one between a preposition/postposition and its complement) are characterized by arg, but they need not be explicitly annotated so. The relation of the argument as to a relational noun is arg, too.
- a friend <ad sem="arg">of</ad> mine
- which <ad sem="arg">of</ad> them
- mod
- Modification. The concrete relation between the first and the second argument of mod is entailed by the depender. Adverbs and demonstrative pronouns have mod, so they need not be annotated with respect to the relation identifier. A relative clause should have opr="mod" when the role of the head noun in the relative clause is not annotated with a relational attribute.
- ctl
- Controller.
- He promised <np opr="ctl.exp">me</np> <vp opr="cnt">to be permitted</vp>
- xpl
- Expletive complement.
- <np opr="xpl">It</np> is easy to do.
- <np opr="xpl">There</np> lived a man in this house.
- uba
- Unbounded argument, which means that the element has no direct semantic relation with the governing element.
- plg
- Plug. Used in a relational attribute, as the last component or a component just in front of ppa, and plugs the relational attribute with the relation identifier of the element pointed by the relational attribute. That is, let x be the referent of an element with relational attribute A.plg="Y" or A.plg.ppa="Y", y be the referent of the element pointed by this relational attribute (i.e., the element with id="Y"), and B be the relation identifier of this element, then x A.B y holds.
- <su>
<adp id="K" opr="uba.ben">For Kim</adp>,
<v syn="b"><np>I</np> have <v plg="K">worked</v> hard</v>.
</su>
- ppa
- Permanent predicate ambiguity. Used as the last component of a relational attribute. In the example below, whether `slowly' depends on `walk' or `talking' cannot be determined.
- <vp syn="ba"><v ppa="SLOWLY">walk</v> talking <adp id="SLOWLY">slowly</adp></vp>
5.2. Participant
- agt
- Agent of action.
- <su><np opr="agt">Tom</np> came.</su>
- obj
- Object of action or event.
- Mary beats <np opr="obj">her husband</np>.
- res
- Result, which is another special case of obj. A resulting event or object.
- Tom is gone, <adp opr="res">so that I'm alone</adp>.
- Sue built <np opr="res">a house</np>.
- Kim turned the car <adp opr="res">to garbage</adp>.
- src
- Source. The second argument of src is the initial position or state of the entity denoted by the subject or object of the verb denoting the first argument of src.
- get it <adp opr="src">from him</adp>
- gol
- Goal. The second argument of gol is the final position or state of the entity denoted by the subject or object of the verb denoting the first argument of gol.
- give <np opr="gol">him</np> the book
- rpt
- Reciprocal partner.
- Tom had a date <ad sem="agt.rpt">with</ad> Mary.
- mix sugar <ad sem="obj.rpt">with</ad> salt
- rcp
- Reciprocity. Almost equivalent to the conjunction of eq and rpt. For instance, "agt.rcp" is almost equal to "agt agt.rpt". The second argument of rcp must be a set of plural things.
- <np opr="agt.rcp">Tom and Mary</np> got married.
- mix <np opr="obj.rcp">sugar and salt</np>
- similarity <ad sem="obj.rcp">of</ad> the two
- ben
- Beneficiary. Different from gol (which specifies destination only) and pur (which concerns event). Expressible with `for the sake of,' though it may mean pur as well.
- I gave <np sem="gol">him</np> a book <adp sem="ben">for <np>her</np> sake</adp>
- a present <adp opr="gol ben">for you</adp>
- exp
- Experiencer.
- It seems <adp opr="exp">to me</adp> that he left.
- jnt
- Joint participant in the event.
- You came <adp opr="agt.jnt">with her</adp>.
- pos
- Possessor.
- a daughter <adp opr="pos">of mine</adp>
5.3. Apposition
- ela
- Elaboration.
- <ss>Tom is gone. <su opr="ela">He escaped.</su></ss>
- sum
- Summary.
- eg
- Example.
- expensive cars <adp opr="eg">such as Mercedes</adp>
- cnt
- Content of thought, belief, speech, promise, rumor, plan, request, and so forth.
- <np>plan <vp opr="cnt">to visit Tokyo</vp></np>
- ask <np opr="exp">her</np> <adp opr="cnt">for a date</adp>
- the <n>fact <adp opr="cnt">that you're here</adp></n>
- persuade him <vp opr="cnt">to go</vp>
5.4. Causality and Reasoning
- cau
- Cause, reason, or motivation.
- <ss><su opr="cau">Tom came.</su> <su>Mary was surprised.</su></ss>
- I went home <adp opr="cau">because I was sleepy</adp>
- He died <adp opr="cau">of cancer</adp>.
- pur
- Purpose.
- I went there <vp opr="pur">to see her</vp>.
- cnd
- Condition.
- I'll come <adp opr="cnd">if you're there</adp>
- cnc
- Concession.
- <ss><su opr="cnc">Tom came.</su> <su>Mary wasn't surprised.</su></ss>
- cntrst
- Contrast.
- <ss>Tom came. <su opr="cntrst">However, Bill left.</su></ss>
5.5. Spatiotemporal Relation
- tmx
- Temporal extension.
- I was asleep <adp opr="tmx">during his talk</adp>.
`During his talk' in `I slept during his talk' has opr="tim" rather than opr="tmx" if the intended interpretation does not entail that I was sleeping throughout his talk.
- tim
- Temporal location. Equivalent to tmx.sup.
- I was born <ad opr="tim">in</ad> 1958.
- pre
- Precedence.
- She came <ad opr="pre">after</ad> he arrived.
- pst
- Postcedence.
- coc
- Cooccurrence.
- eating <ad sem="coc">while</ad> driving
- spx
- Spatial extension.
- loc
- Location (typically spatial). Equivalent to spx.sup. This entails spatial inclusion (for instance, `walk in the garden' entails that the walking event is spatially included in the garden). [Should we mark up metaphoric usages?]
- live <ad opr="loc">in</ad> Tokyo
- ilc
- Internal location.
- hang <ad sem="ilc">on</ad> a bar
- cut <ad sem="ilc">at</ad> the center
- via
- Intermediate location.
- exit <ad sem="via">from</ad> the window
- go <ad sem="via">through</ad> the tunnel
- climb <np opr="via">a mountain</np>
- pass <ad sem="via">by</ad> a bridge
- dir
- Direction.
- I walked <adp opr="dir">to the north</adp>.
- opp
- Opposite direction.
- keep yourself <ad sem="opp">from</ad> the evil
- independent <ad sem="opp">of</ad> her parents
- int
- Initial point of the event or the object.
- stay here <adp opr="int">from tomorrow</adp>
- fin
- Final point of the event or the object.
- keep drinking <adp opr="fin">until next morning</adp>
- stx
- Stative or situational extension.
- sit
- State or situation. Equivalent to stx.sup.
- play a role <adp opr="sit">in a ceremony</adp>
- txx
- Taxonomic extension.
- in
- Inclusion or membership in concepts or sets. Synonym of txx.sup.
- Dog is <np opr="in">animal</np>.
- Guernica is <np opr="in">his work</np>.
- This is <np opr="in">beer</np>.
- a <ajp opr="in">female</ajp> doctor
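Several identifiers in this cluster are defined as shorthands for an extension identifier combined with sup: tim for tmx.sup, loc for spx.sup, sit for stx.sup, and in for txx.sup. A program that compares annotations may want to normalize them first. The Python sketch below is a hypothetical helper, and it assumes that composite identifiers are always joined with a period, as in the examples of this document.

```python
# Shorthand identifiers and the expansions stated in their definitions.
SHORTHAND = {
    "tim": "tmx.sup",
    "loc": "spx.sup",
    "sit": "stx.sup",
    "in": "txx.sup",
}


def expand_identifier(identifier: str) -> str:
    """Replace each shorthand component of a relation identifier by its expansion."""
    components = identifier.split(".")
    return ".".join(SHORTHAND.get(component, component) for component in components)
```

For instance, expand_identifier("loc") yields "spx.sup", while an identifier without shorthand components, such as "agt.rpt", is returned unchanged.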
5.6. Logical Relation
- eq
- Equivalence.
- New York <ad opr="app">or</ad> Big Apple
- and
- Conjunction.
- or
- Disjunction.
- xor
- Exclusive disjunction.
- sup
- Includer of any sort: superset as to subset, whole as to part, set as to element.
- sub
- Subset, part, or element. Inverse of sup.
5.7. Other Semantic Relation
- ccm
- Circumstance.
- go out <adp opr="ccm">with a book in hand</adp>
- met
- Metonymy. In the following example, obj.eq="mgn" means that it is exactly the ham sandwich that is the most expensive, and opr="agt.met" that the thing which is gone is something which stands in a metonymic relation with that ham sandwich, such as the person who ordered it.
- <np opr="agt.met">The most <aj obj.eq="mgn">expensive</aj> ham sandwich</np> is gone.
When more concrete relation identifiers are applicable, they should be used instead of met. For instance, part-whole relations should be encoded by sub and sup, and possession by pos.
- bsc
- Base of symmetric comparison.
- resemble <np opr="bsc">his father</np>
- different <ad opr="bsc">from</ad> the promise
- say the same thing as <np opr="bsc-agt">I</np>
- cmp
- Base of degree comparison.
- as tall <adp opr="cmp">as Bill</adp>
- taller <adp opr="cmp">than Bill</adp>
- sim
- Similarity.
- dance <ad opr="sim">like</ad> a butterfly
- bas
- Basis or principle on which to do the action in question.
- judge <adp opr="bas">according to the law</adp>
- cev
- Criterion of evaluation.
- These shoes are too small <adp opr="cev">for me</adp>
- who
- Addressor as to utterance. Imported from TEI. The first argument must be utterance rather than the uttering event. So it is wrong to attach opr="who" to `Tom' in `Tom says that he is hungry.' Below are some examples of correct usage.
- <su><np opr="who">(Tom)</np> <q><su><np eq="p1">I</np> am hungry.</su></q></su>
- <q who="TOM"><su><np eq="p1">I</np>'m hungry.</su></q>
- <bo sem="who">According to</bo> the police, the criminal escaped.
When who is used as a relational attribute and it points to an upper context (e.g., who="top"), the utterance is a note by the addressor of that context (in the case of who="top", the author of the entire document).
- <q>`<su><adp who="top">(Tom is)</adp> crazy.</su>'</q>
- <q>`<su>Th<su who="top">(laughter)</su>at's funny.</su>'</q>
- whm
- Addressee as to utterance. The first argument must be utterance rather than the uttering event. The utterance is a monologue if it has whm="p1". The discourse below means that Tom told the author to come:
- <np id="tom">Tom</np> said.
<q who="tom" whm="p1">`Come.'</q>
- mns
- Means or instrument.
- survive <ad sem="mns">by</ad> eating grasses
- paint <ad sem="mns">with</ad> a brush
- mat
- Material.
- made <ad sem="mat">of</ad> wood
- msr
- Measure.
- weigh <np opr="msr"><nump opr="msr">two</nump> kilograms</np>
- mob
- Measured object (inverse of msr).
- <np>two cups <ad sem="mob">of</ad> tea</np>
- ql
- Qualification.
- I scolded her <ad sem="ql">as</ad> her father.
- sbm
- Subject matter.
- talk <ad sem="sbm">about</ad> it.
- uni
- Unit of measure.
- work three days <np opr="uni">a week</np>
5.8. Backward-Looking Communicative Functions
- und
- Understand.
- nun
- Not understand.
- rpl
- Reply.
- rpw
- Reply WH.
- rpy
- Reply YES.
- rpn
- Reply NO.
- acc
- Accept.
- rej
- Reject.
- hld
- Hold.
6. Unary Operator Identifiers
6.1. Forward-Looking Communicative Functions
Backward-looking functions are encoded by und through hld discussed before. They can of course be relational attributes which take the id values of the utterances with corresponding forward-looking functions.
- stt
- Statement.
- ord
- Order.
- req
- Request.
- ofr
- Offer.
- cmt
- Commitment or promise.
- qyn
- YES/NO query.
- qw
- WH query.
- cnv
- Convention, including greetings.
- smn
- Summon.
- exc
- Exclamation.
- abu
- Abuse.
- blm
- Blame.
6.2. Reference Types
Not only noun phrases but also verb phrases, sentences, and so on refer to objects, events, states of affairs, and so on. Here we introduce attributes to classify such references.
Of course sg, pl, and du are for countable nouns only, and pt is for uncountable nouns only.
- gn
- Generic or attributive.
- <np opr="gn">Dinosaurs</np> are extinct.
- dance like <np opr="gn">a butterfly</np>
- sg
- Singular.
- Give me <np opr="sg">your fish</np>.
- pt
- Partitive.
- Give me <np opr="pt">water</np>.
- pl
- Plural.
- du
- Dual.
- plgn
- Plural generic.
- <np opr="plgn">These cars</np> are expensive. (Several models of cars are entailed here.)
- dugn
- Dual generic.
In a generic reading, the predication concerns (default properties of) the whole kind referred to by the noun phrase in question. An accidental universal quantification, such as with `I know (all) the Emperors of Japan,' does not qualify as a generic reading. We do not distinguish the two types of generic reading: those such as with `Chickens evolved from dinosaurs' and those such as with `Chickens lay eggs.' This distinction is captured by classifying the predicates.
individual vs. stage reading?
6.3. Tense and Aspect
Most languages have grammaticized marking of tense, but Chinese, for instance, lacks tense marking, so tense tagging will be of great benefit in Chinese. Perhaps no language lacks grammaticized aspect marking, but aspect tagging could be useful in some cases.
- past
- Past, including historical present.
- Brutus <v opr="past">murders</v> Caesar.
- pres
- Present.
- He <v opr="pres">could do it</v>.
- futr
- Future.
Aspects are interpreted with the following unary operator identifiers:
prf and npt are special cases of tel, and prog and stat are special cases of atel. Perhaps we do not need to subdivide atel.
- tel
- Telic.
- prf
- Perfect.
- npt
- Non-perfect telic.
- atel
- Atelic.
- prog
- Progressive.
- stat
- Stative.
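The aspect identifiers form a two-level hierarchy: prf and npt are special cases of tel, and prog and stat are special cases of atel. A tool that queries annotations could encode this as a parent table. The Python below is a sketch only, and the names ASPECT_PARENT and is_special_case_of are hypothetical.

```python
from typing import Dict, Optional

# Each specific aspect identifier mapped to the more general one it specializes.
ASPECT_PARENT: Dict[str, str] = {
    "prf": "tel",
    "npt": "tel",
    "prog": "atel",
    "stat": "atel",
}


def is_special_case_of(aspect: str, general: str) -> bool:
    """Return True if aspect equals general or is one of its special cases."""
    current: Optional[str] = aspect
    while current is not None:
        if current == general:
            return True
        current = ASPECT_PARENT.get(current)  # None once the top is reached
    return False
```

A query for telic events would then also match elements annotated with prf or npt, but not those annotated with prog or stat.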
under construction
stl politeness
7. Others
under construction