[Cache version from http://www.cs.brandeis.edu/%7Ejamesp/arda/random/TimeML-Draft2.2.htm; please see this canonical location if possible.]
Release Date: May 14, 2002
Authors: Bob Ingria and James Pustejovsky
TERQAS TimeML Working Group Members: Branimir Boguraev, Michael Bukatin, Jose Castano, John Frank, Rob Gaizauskas, Bob Ingria, Graham Katz, Andy Latto, Inderjeet Mani, James Pustejovsky, Erik Rauch, Antonio Sanfilippo.
This document describes the initial specification of the markup language for temporal and event expressions in text being developed by the TERQAS TimeML Working Group. The group started with two major design goals: (1) to use the core of Andrea Setzer's thesis annotation, which was christened STAG (Sheffield Temporal Annotation Guidelines); and (2) to remain, as much as possible, compliant with the TIDES TIMEX2 annotation effort. The initial list of tasks that were scheduled for the first version of TimeML is given below.
The working group also discussed the following features, although in less detail:
In the remainder of the document, we outline a BNF for TimeML, with the ultimate goal of specifying complete XML schema definitions for the language, and not simply a DTD. An XML schema is preferable to a DTD because it provides a richer initial set of data types for constraining attribute values, as well as a mechanism for adding user-defined types.
This section presents a BNF for the temporal annotation language described in Andrea Setzer's thesis (Setzer, 2001), which, as noted above, the TimeML working group came to call STAG (Sheffield Temporal Annotation Guidelines). Examining the details of this BNF, together with the problems that arose in applying it to actual texts, led to several changes and extensions to Setzer's original scheme. Detailing these issues will help justify the design of this initial pass at TimeML.
A note on attribute values: All attribute values are marked as 'potential values' in Setzer's thesis; those marked with * may need to have a larger, but still closed, set specified. Those not so marked are probably sufficient as is.
attributes ::= eid class [argEvent] [tense] [aspect] [([signalID] relatedToEvent eventRelType) | ([signalID] relatedToTime timeRelType)]
//* N.B. argEvent is dependent on class='REPORTING'
eid ::= <integer>
*class ::= 'OCCURRENCE' | 'PERCEPTION' | 'REPORTING' | 'ASPECTUAL'
argEvent ::= <integer>
tense ::= 'PAST' | 'PRESENT' | 'FUTURE'
aspect ::= 'PROGRESSIVE' | 'PERFECTIVE'
signalID ::= <integer>
relatedToEvent ::= <integer>
*eventRelType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' | 'SIMULTANEOUS'
relatedToTime ::= <integer>
*timeRelType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' | 'SIMULTANEOUS'
attributes ::= tid type calDate [(eid signalID relType)]
//* calDate is limited to [[DD]MM]YYYY | ('SPR'|'SUM'|'AUT'|'WIN')YYYY
//* N.B. (eid signalID relType) is dependent on type='COMPLEX'
tid ::= <integer>
*type ::= 'DATE' | 'TIME' | 'COMPLEX'
calDate ::= PCDATA
eid ::= <integer>
signalID ::= <integer>
*relType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' | 'SIMULTANEOUS'
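The calDate restriction above is compact enough to check mechanically. The following is a minimal Python sketch, not part of STAG or TimeML: the function name parse_cal_date is hypothetical, it reads [[DD]MM]YYYY as "optional month, optional day only when a month is present, then a four-digit year", and its day check is deliberately crude (1 to 31, without consulting month lengths).

```python
import re
from typing import Dict, Optional, Union

# [[DD]MM]YYYY: an optional month (optionally preceded by a day), then a year.
_NUMERIC = re.compile(r"(?:(?P<dd>\d{2})?(?P<mm>\d{2}))?(?P<yyyy>\d{4})")
# ('SPR'|'SUM'|'AUT'|'WIN')YYYY
_SEASON = re.compile(r"(?P<season>SPR|SUM|AUT|WIN)(?P<yyyy>\d{4})")


def parse_cal_date(value: str) -> Optional[Dict[str, Union[int, str]]]:
    """Split a calDate into its parts, or return None if it is not well formed."""
    m = _SEASON.fullmatch(value)
    if m:
        return {"season": m["season"], "year": int(m["yyyy"])}
    m = _NUMERIC.fullmatch(value)
    if m is None:
        return None
    parts: Dict[str, Union[int, str]] = {"year": int(m["yyyy"])}
    if m["mm"] is not None:
        month = int(m["mm"])
        if not 1 <= month <= 12:
            return None
        parts["month"] = month
    if m["dd"] is not None:
        day = int(m["dd"])
        if not 1 <= day <= 31:  # crude: month lengths are not checked
            return None
        parts["day"] = day
    return parts


print(parse_cal_date("031994"))   # {'year': 1994, 'month': 3}
print(parse_cal_date("SPR1996"))  # {'season': 'SPR', 'year': 1996}
```

Because the day is only accepted in front of a month, an eight-digit value is always read as DDMMYYYY, so a month-first string such as 12201999 is rejected (month 20).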
attributes ::= sid
sid ::= <integer>
'DOA' stands for 'Date of article'.
No attributes
One striking feature of this BNF is the following fragment of the attribute structure of EVENT:
[([signalID] relatedToEvent eventRelType) | ([signalID] relatedToTime timeRelType)]
In each case, we are dealing not with three unrelated attributes, but with three attributes that only make sense as a unit. The same triad also appears in the attribute structure of TIMEX:
[(eid signalID relType)]
Moreover, as the value specifications for the eventRelType and timeRelType attributes of EVENT and the relType attribute of TIMEX show, we are really dealing with one property whose values are specified three times. For eventRelType and timeRelType on EVENT, this duplication is forced because only the name of the attribute can link it to relatedToEvent or relatedToTime, respectively. And, of course, since relType is defined on TIMEX rather than EVENT, it must repeat the specification of permissible values.
All these considerations suggest that these triplets of attributes should be factored out into the form of a new abstract tag (i.e. one which consumes no input text). This would formally express the fact that these attributes are linked, allow eventRelType, timeRelType and relType to be collapsed into a single attribute, and allow the specification of the possible values of this single attribute to be stated only once.
[Note: Of course in BNF (or in an XML DTD) it would be possible to specify an abstract element as the value of eventRelType, timeRelType and relType and thus state their possible values only once, but we would still be left with the fact that the inherent relation between a signalID, relatedID, and relType would be unexpressed in the STAG annotation language.]
For these reasons, we remove the cited triplets from the definitions of EVENT and TIMEX and introduce the LINK tag:
attributes ::= (eventID | timeID) [signalID] (relatedToEvent | relatedToTime) relType
eventID ::= <integer>
timeID ::= <integer>
signalID ::= <integer>
relatedToEvent ::= <integer>
relatedToTime ::= <integer>
*relType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' | 'SIMULTANEOUS'
eventID and timeID anchor the link to an EVENT or TIMEX (the element that would have contained the [signalID] (relatedToEvent | relatedToTime) relType triplet before it was factored out into LINK). Note that factoring out this triplet also means that the decision on where to record this information is no longer arbitrary. Previously, the information could be recorded on either of two related events, and there was no principle for deciding which event should contain it.
In addition to purely formal considerations of the geometry of STAG (essentially, refactoring considerations, in the sense of Fowler (1999) and many others), the TimeML working group also found empirical reasons for adding the LINK tag. The original STAG framework had been designed on the presupposition that any given EVENT would be related to at most one other EVENT or other indexed element. Attempts to annotate various newswire articles showed that this assumption was false: a single EVENT can be related to more than one other indexed element. Here is one such example:
FAMILIES SUE OVER AEROFLOT CRASH DEATHS
The Russian airline Aeroflot has been <EVENT eid=1 relatedToTime=1 timeRelType=BEFORE tense=PRESENT aspect=PERFECTIVE class=OCCURRENCE> hit </EVENT> with a writ for loss and damages, <EVENT eid=2 tense=NONE aspect=PERFECTIVE relatedToEvent=1 eventRelType=BEFORE class=OCCURRENCE> filed </EVENT> in Hong Kong by the families of seven passengers <EVENT eid=3 tense=NONE aspect=PERFECTIVE relatedToEvent=2 eventRelType=BEFORE class=OCCURRENCE relatedToEvent2=4 eventRel2Type=IS_INCLUDED signal2=1> killed </EVENT> <SIGNAL sid=1> in </SIGNAL> an air <EVENT eid=4 class=OCCURRENCE> crash </EVENT>. All 75 people <STATE stid=1 relatedToEvent=5 eventRelType=INCLUDES> on board </STATE> the Aeroflot Airbus <EVENT eid=5 tense=PAST aspect=PERFECTIVE relatedToEvent=6 eventRelType=IAFTER signal=2> died </EVENT> <SIGNAL sid=2> when </SIGNAL> it <EVENT eid=6 tense=PAST aspect=PERFECTIVE relatedToTime=2 timeRelType=IS_INCLUDED relatedToEvent=4 eventRelType=ID> ploughed </EVENT> into a Siberian mountain <SIGNAL sid=3> in </SIGNAL> <TIMEX tid=2 type=DATE calDate=031994> March 1994 </TIMEX>. ... <DOA tid=1> 03-27-96 </DOA>
There are several notable features of this annotation. First, notice this <EVENT> element:
<EVENT eid=3 tense=NONE aspect=PERFECTIVE relatedToEvent=2 eventRelType=BEFORE class=OCCURRENCE relatedToEvent2=4 eventRel2Type=IS_INCLUDED signal2=1> killed </EVENT>
which adds the ad hoc attributes relatedToEvent2 and eventRel2Type in an attempt to allow multiple related events, together with a new attribute, signal2, that links the signal to this related event and no other. This solution is clearly fragile: it requires either an arbitrary cut-off on the number of related events or an open-ended set of triplets of the form:
signalN relatedToEventN and eventRelNType
Note that the LINK tag introduced above solves both the problem of the potentially unbounded number of related events and that of relating a particular signal to a given related EVENT (or other indexed element).
Given the existence of the LINK tag, we can rewrite the above annotation as follows:
FAMILIES SUE OVER AEROFLOT CRASH DEATHS
The Russian airline Aeroflot has been <EVENT eid=1 tense=PRESENT aspect=PERFECTIVE class=OCCURRENCE> hit </EVENT> <LINK eventID=1 relatedToTime=1 relType=BEFORE/> with a writ for loss and damages, <EVENT eid=2 tense=NONE aspect=PERFECTIVE class=OCCURRENCE> filed </EVENT> <LINK eventID=2 relatedToEvent=1 relType=BEFORE/> in Hong Kong by the families of seven passengers <EVENT eid=3 tense=NONE aspect=PERFECTIVE class=OCCURRENCE> killed </EVENT> <LINK eventID=3 relatedToEvent=2 relType=BEFORE/> <LINK eventID=3 signalID=1 relatedToEvent=4 relType=IS_INCLUDED/> <SIGNAL sid=1> in </SIGNAL> an air <EVENT eid=4 class=OCCURRENCE> crash </EVENT>. All 75 people <STATE stid=1> on board </STATE> <LINK stateID=1 relatedToEvent=5 relType=INCLUDES/> the Aeroflot Airbus <EVENT eid=5 tense=PAST aspect=PERFECTIVE> died </EVENT> <LINK eventID=5 signalID=2 relatedToEvent=6 relType=IAFTER/> <SIGNAL sid=2> when </SIGNAL> it <EVENT eid=6 tense=PAST aspect=PERFECTIVE> ploughed </EVENT> <LINK eventID=6 relatedToTime=2 relType=IS_INCLUDED/> <LINK eventID=6 relatedToEvent=4 relType=ID/> into a Siberian mountain <SIGNAL sid=3> in </SIGNAL> <TIMEX tid=2 type=DATE calDate=031994> March 1994 </TIMEX>. ... <DOA tid=1> 03-27-96 </DOA>
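The move from inline triplets to LINK elements is mechanical enough to sketch in code. The Python below is a minimal, hypothetical illustration (Link, factor_out_links and render are not part of any TimeML tooling). It assumes the inline spellings used in the first Aeroflot annotation: numbered event triplets relatedToEvent/eventRelType/signal, then relatedToEvent2/eventRel2Type/signal2, and so on, plus at most one relatedToTime/timeRelType pair. It handles EVENTs only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Link:
    eventID: int
    relType: str
    relatedToEvent: Optional[int] = None
    relatedToTime: Optional[int] = None
    signalID: Optional[int] = None


def factor_out_links(attrs: Dict[str, str]) -> Tuple[Dict[str, str], List[Link]]:
    """Split inline relation triplets off an EVENT's attributes into LINKs."""
    eid = int(attrs["eid"])
    used: Set[str] = set()
    links: List[Link] = []
    # At most one time relation, as in the example (relatedToTime/timeRelType).
    if "relatedToTime" in attrs:
        links.append(Link(eid, attrs["timeRelType"],
                          relatedToTime=int(attrs["relatedToTime"])))
        used |= {"relatedToTime", "timeRelType"}
    # Numbered event relations: suffix "" for the first, then "2", "3", ...
    n = 1
    while True:
        suffix = "" if n == 1 else str(n)
        target_key = "relatedToEvent" + suffix
        if target_key not in attrs:
            break
        type_key = "eventRel" + suffix + "Type"
        signal_key = "signal" + suffix
        signal = attrs.get(signal_key)
        links.append(Link(eid, attrs[type_key],
                          relatedToEvent=int(attrs[target_key]),
                          signalID=int(signal) if signal is not None else None))
        used |= {target_key, type_key, signal_key}
        n += 1
    core = {k: v for k, v in attrs.items() if k not in used}
    return core, links


def render(link: Link) -> str:
    """Serialize a Link in the attribute order used in the rewritten example."""
    parts = [f"eventID={link.eventID}"]
    if link.signalID is not None:
        parts.append(f"signalID={link.signalID}")
    if link.relatedToEvent is not None:
        parts.append(f"relatedToEvent={link.relatedToEvent}")
    if link.relatedToTime is not None:
        parts.append(f"relatedToTime={link.relatedToTime}")
    parts.append(f"relType={link.relType}")
    return "<LINK " + " ".join(parts) + "/>"


# The 'killed' EVENT from the first annotation.
killed = {"eid": "3", "tense": "NONE", "aspect": "PERFECTIVE",
          "relatedToEvent": "2", "eventRelType": "BEFORE", "class": "OCCURRENCE",
          "relatedToEvent2": "4", "eventRel2Type": "IS_INCLUDED", "signal2": "1"}
core, links = factor_out_links(killed)
for link in links:
    print(render(link))
# <LINK eventID=3 relatedToEvent=2 relType=BEFORE/>
# <LINK eventID=3 signalID=1 relatedToEvent=4 relType=IS_INCLUDED/>
```

The remaining core attributes (eid, tense, aspect, class) are exactly what the rewritten EVENT carries, and the two rendered LINKs match those attached to 'killed' in the second annotation.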
In addition to the abstraction that LINK provides, the annotation above exhibits several other additions.
The tense attribute adds the value:
'NONE'
DOA previously had no attributes; it now adds the attribute:
tid ::= <integer>
The relType attribute adds the values:
'IAFTER' | 'IBEFORE' | 'ID'
attributes ::= stid
stid ::= <integer>
The TimeML working group found that, in addition to annotating events, it is also useful to annotate a select subset of states. We have decided to mark up only those states that identifiably change over the course of the document being annotated. For example, in the expression "the Aeroflot Airbus" in the article above, the relationship indicating that the Airbus is run and operated by Aeroflot is not a state in the present sense. Rather, because it persists throughout the event line of the document, we factor it out and do not mark it up. On the other hand, properties that are known to change during the events represented or reported in an article will be marked as STATEs, as illustrated below:
All 75 people <STATE stid=1> on board </STATE> <LINK stateID=1 relatedToEvent=5 relType=INCLUDES/> the Aeroflot Airbus <EVENT eid=5 tense=PAST aspect=PERFECTIVE> died </EVENT> <LINK eventID=5 signalID=2 relatedToEvent=6 relType=IAFTER/> <SIGNAL sid=2> when </SIGNAL> ...
Putting together all the modifications and additions discussed above gives us the following BNF for a first pass at TimeML.
attributes ::= eid class [argEvent] [tense] [aspect]
//* N.B. argEvent is dependent on class='REPORTING'
eid ::= <integer>
*class ::= 'OCCURRENCE' | 'PERCEPTION' | 'REPORTING' | 'ASPECTUAL'
argEvent ::= <integer>
tense ::= 'PAST' | 'PRESENT' | 'FUTURE' | 'NONE'
aspect ::= 'PROGRESSIVE' | 'PERFECTIVE'
attributes ::= tid type calDate
//* calDate is limited to [[DD]MM]YYYY | ('SPR'|'SUM'|'AUT'|'WIN')YYYY
tid ::= <integer>
*type ::= 'DATE' | 'TIME' | 'COMPLEX'
calDate ::= PCDATA
attributes ::= stid
stid ::= <integer>
attributes ::= sid
sid ::= <integer>
'DOA' stands for 'Date of article'.
attributes ::= tid
tid ::= <integer>
attributes ::= (eventID | timeID | stateID) [signalID] (relatedToEvent | relatedToTime) relType
eventID ::= <integer>
timeID ::= <integer>
stateID ::= <integer>
signalID ::= <integer>
relatedToEvent ::= <integer>
relatedToTime ::= <integer>
*relType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' | 'SIMULTANEOUS' | 'IAFTER' | 'IBEFORE' | 'ID'
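As a rough illustration of how this LINK production might be enforced over a tag's attributes, here is a minimal Python checker. The function name link_errors is hypothetical, <integer> is approximated as a string of digits, and only the attributes in this BNF are accepted, so the magnitude attribute proposed later in this document would be reported as unknown.

```python
from typing import Dict, List

REL_TYPES = {"BEFORE", "AFTER", "INCLUDES", "IS_INCLUDED", "SIMULTANEOUS",
             "IAFTER", "IBEFORE", "ID"}
ANCHORS = ("eventID", "timeID", "stateID")
TARGETS = ("relatedToEvent", "relatedToTime")
INTEGER_ATTRS = ANCHORS + TARGETS + ("signalID",)
ALLOWED = set(INTEGER_ATTRS) | {"relType"}


def link_errors(attrs: Dict[str, str]) -> List[str]:
    """Return the violations of the LINK production; an empty list means valid."""
    errors: List[str] = []
    unknown = sorted(set(attrs) - ALLOWED)
    if unknown:
        errors.append("unknown attribute(s): " + ", ".join(unknown))
    # (eventID | timeID | stateID): exactly one anchor.
    if sum(a in attrs for a in ANCHORS) != 1:
        errors.append("exactly one of eventID, timeID, stateID is required")
    # (relatedToEvent | relatedToTime): exactly one target.
    if sum(t in attrs for t in TARGETS) != 1:
        errors.append("exactly one of relatedToEvent, relatedToTime is required")
    if attrs.get("relType") not in REL_TYPES:
        errors.append(f"relType missing or invalid: {attrs.get('relType')!r}")
    # <integer> values, approximated here as unsigned digit strings.
    for name in INTEGER_ATTRS:
        if name in attrs and not attrs[name].isdigit():
            errors.append(f"{name} must be an integer, got {attrs[name]!r}")
    return errors


print(link_errors({"stateID": "1", "relatedToEvent": "5", "relType": "INCLUDES"}))
# []
print(link_errors({"eventID": "1", "relatedToTime": "1", "relType": "DURING"}))
# ["relType missing or invalid: 'DURING'"]
```

Returning a list of messages rather than raising on the first problem lets an annotation tool report every defect in a LINK at once.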
The working group discussed the case of temporal expressions in which the calendar date is not referred to directly, but an expression acts as a temporal function over a TIMEX expression. Several examples are given below.
Assume these are all bound to an event predicate such as "John died", with an already established DOA. One proposal is to treat the "modifiers" such as "next", "last", and "N_<time-unit>_ago" as signals, relating to other times and events. For example, (1) would be marked up as follows:
<DOA tid=1>DDMMYYYY</DOA> he <EVENT eid=1 tense=PAST aspect=PERFECTIVE> died </EVENT> <SIGNAL sid=1> last </SIGNAL> <LINK eventID=1 signalID=1 relatedToTime=2 relType=BEFORE magnitude=+1/> <TIMEX tid=2 type=DATE> Monday </TIMEX>.
Sentence (2) would be represented as follows:
<DOA tid=1>DDMMYYYY</DOA> he <EVENT eid=1 tense=FUTURE> die </EVENT> <SIGNAL sid=1> next </SIGNAL> <LINK eventID=1 signalID=1 relatedToTime=2 relType=AFTER magnitude=+1/> <TIMEX tid=2 type=DATE> month </TIMEX>.
It would also be possible to link the temporal function expression to the DOA, but this would make it less explicit how the event predicate of "die" is bound to "last Monday". Furthermore, this would not be parallel to the derivation of "John died on Monday".
<DOA tid=1>DDMMYYYY</DOA> he <EVENT eid=1 tense=PAST> died </EVENT> <SIGNAL sid=1> on </SIGNAL> <LINK eventID=1 signalID=1 relatedToTime=2 relType=IS_INCLUDED magnitude=0/> <TIMEX tid=2 type=DATE> Monday </TIMEX>.
Notice that this particular proposal adds a new attribute to <LINK>, magnitude, which acts as a quantifier over the appropriate temporal granularity. The granularity depends on the type and the calDate of the <TIMEX>. Hence, if the value is "Monday", the granularity over which magnitude operates is a week, and so on. This relates to the issues in the Ontology Working Group as well as Hobbs' Semantic Web Temporal Spec Notes. The new specification for <LINK> is therefore as follows:
attributes ::= (eventID | timeID | stateID) [signalID] (relatedToEvent | relatedToTime) relType magnitude
eventID ::= <integer>
timeID ::= <integer>
stateID ::= <integer>
signalID ::= <integer>
relatedToEvent ::= <integer>
relatedToTime ::= <integer>
*relType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' | 'SIMULTANEOUS' | 'IAFTER' | 'IBEFORE' | 'ID'
magnitude ::= <integer>
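One possible reading of magnitude for a weekday TIMEX can be made concrete in a few lines of Python. This is a sketch under stated assumptions, not the working group's definition: the function resolve_weekday and the DIRECTION table are hypothetical, BEFORE is taken to count weeks backward from the week containing the DOA, AFTER forward, and IS_INCLUDED pairs only with magnitude 0 (the DOA's own week). This week-shift reading agrees with the note below that "last Thursday" and "Thursday of last week" should receive the same interpretation. The DOA 1996-03-27 is borrowed from the Aeroflot example purely for concreteness, and only week granularity is handled, so the "next month" case is out of scope.

```python
from datetime import date, timedelta

WEEKDAYS = ("MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
            "FRIDAY", "SATURDAY", "SUNDAY")
# Assumed mapping from relType to the direction in which magnitude counts weeks.
DIRECTION = {"BEFORE": -1, "AFTER": 1, "IS_INCLUDED": 0}


def resolve_weekday(doa: date, weekday: str, rel_type: str, magnitude: int) -> date:
    """Resolve a weekday TIMEX whose granularity is taken to be the week."""
    direction = DIRECTION[rel_type]
    if direction == 0 and magnitude != 0:
        raise ValueError("IS_INCLUDED is only paired with magnitude=0 in this sketch")
    week_start = doa - timedelta(days=doa.weekday())   # Monday of the DOA's week
    shifted = week_start + timedelta(weeks=direction * magnitude)
    return shifted + timedelta(days=WEEKDAYS.index(weekday.upper()))


doa = date(1996, 3, 27)  # a Wednesday
print(resolve_weekday(doa, "Monday", "BEFORE", 1))       # "last Monday": 1996-03-18
print(resolve_weekday(doa, "Monday", "IS_INCLUDED", 0))  # "on Monday":   1996-03-25
```

Under a different convention, where "last Monday" means the most recent Monday before the DOA, the first call would instead yield 1996-03-25; the magnitude attribute alone does not decide between the two readings.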
Many of the temporal-function cases discussed by the WG might be handled somewhat differently if this proposal is adopted. For example, direct reference to events would be more complex:
Beyond handling expressions not covered by the TIMEX2 annotation, temporal functions of the sort sketched above are also useful for creating intensional descriptions of temporal expressions. During the first plenary meetings, the group found that many temporal expressions are too indeterminate or fuzzy to be reduced to a concrete ISO time value.
The notion of scale was introduced, but we will defer discussion until the next version of this document.
Various temporal expressions have interpretations that are best expressed in terms of temporal functions. For example, temporal expressions like the following might be annotated with interpretations such as those specified. (We have used pre-theoretical, but fairly transparent, names to represent the hypothetical functions. DOA is the date of the article, presupposed for concreteness.)
last week = (predecessor (week DOA))
last Thursday = (thursday (predecessor (week DOA)))
the week before last = (predecessor (predecessor (week DOA)))
next week = (successor (week DOA))
Such representations present a problem for annotation because the needed functions, which would be best expressed as XML tags, cannot appear as the values of attributes in another XML tag. As usual, the solution is to use a tag's ID as the value of an attribute in place of the tag itself. Given this strategy, TimeML can define whatever functions are necessary and pass in the ID of a function (tag) whenever it is needed as the value of an attribute. If we further allow the arguments of functions to be function IDs, we can compose functions by using their IDs as pointers (much as in the box-and-pointer notation for LISP). To show how this would work, we present the following sample annotations. For concreteness, we assume each appears in a document which has
<DOA tid=t1>
This ID provides a temporal anchor in all the examples.
(1) last week
<SIGNAL sid=1> last </SIGNAL> <TIMEX3 tid=2 type=DATE valueFromFunction=tf2> week </TIMEX3> <coerceTo tfid=tf1 argumentID=t1 value=WEEK/> <predecessor tfid=tf2 signalID=1 argumentID=tf1 value=1/>
(2) last Thursday
<SIGNAL sid=1> last </SIGNAL> <TIMEX3 tid=2 type=DATE valueFromFunction=tf3> Thursday </TIMEX3> <coerceTo tfid=tf1 argumentID=t1 value=WEEK/> <predecessor tfid=tf2 signalID=1 argumentID=tf1 value=1/> <getNamedElementOf tfid=tf3 argumentID=tf2 value=THURSDAY/>
N.B. 'last Thursday' and 'Thursday of last week' should get the same interpretation.
(3) the week before last
<TIMEX3 tid=2 type=DATE valueFromFunction=tf2> the week </TIMEX3> <SIGNAL sid=1> before last </SIGNAL> <coerceTo tfid=tf1 argumentID=t1 value=WEEK/> <predecessor tfid=tf2 signalID=1 argumentID=tf1 value=2/>
(4) next week
<SIGNAL sid=1> next </SIGNAL> <TIMEX3 tid=2 type=DATE valueFromFunction=tf2> week </TIMEX3> <coerceTo tfid=tf1 argumentID=t1 value=WEEK/> <successor tfid=tf2 signalID=1 argumentID=tf1 value=1/>
Some notes on these sample representations:
(SUCCESSOR foo 1) = (primSuccessor foo)
(SUCCESSOR foo 2) = (primSuccessor (primSuccessor foo))
...
(SUCCESSOR foo n) = (primSuccessor_1 ... (primSuccessor_n foo) ...)
This is much like the INCF (increment-by) and DECF (decrement-by) macros in Common LISP, except that INCF and DECF allow their second argument to be omitted, in which case it defaults to 1. For our purposes, we want the 'count' argument always to be present.
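The ID-as-pointer scheme in the sample annotations, together with the counted SUCCESSOR definition, can be sketched as a small recursive evaluator. This is a minimal Python illustration under assumptions that are not part of the proposal: a week is represented by the date of its Monday, function tags are held in a dictionary keyed by tfid, only the WEEK coercion is implemented, the pointer chain is assumed to be acyclic, and the names evaluate, apply_n, prim_successor and prim_predecessor are hypothetical. The DOA is set to 1996-03-27, borrowed from the Aeroflot example, purely for concreteness.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Tuple

# Hypothetical encoding: a week is the date of its Monday, and each function
# tag is stored as tfid -> (tag name, attributes).
WEEKDAYS = ("MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
            "FRIDAY", "SATURDAY", "SUNDAY")
FunctionTags = Dict[str, Tuple[str, Dict[str, str]]]


def prim_successor(week_start: date) -> date:
    return week_start + timedelta(weeks=1)


def prim_predecessor(week_start: date) -> date:
    return week_start - timedelta(weeks=1)


def apply_n(prim: Callable[[date], date], foo: date, count: int) -> date:
    """(SUCCESSOR foo n) as n nested primitive calls; count has no default."""
    for _ in range(count):
        foo = prim(foo)
    return foo


def evaluate(ref: str, functions: FunctionTags, anchors: Dict[str, date]) -> date:
    """Follow argumentID pointers from a function ID down to an anchoring time ID."""
    if ref in anchors:  # e.g. the DOA, tid=t1
        return anchors[ref]
    name, attrs = functions[ref]
    arg = evaluate(attrs["argumentID"], functions, anchors)
    value = attrs["value"]
    if name == "coerceTo" and value == "WEEK":
        return arg - timedelta(days=arg.weekday())  # Monday of arg's week
    if name == "predecessor":
        return apply_n(prim_predecessor, arg, int(value))
    if name == "successor":
        return apply_n(prim_successor, arg, int(value))
    if name == "getNamedElementOf":
        return arg + timedelta(days=WEEKDAYS.index(value))
    raise ValueError(f"unsupported function tag: {name} (value={value})")


anchors = {"t1": date(1996, 3, 27)}  # assumed DOA
last_thursday = {  # sample annotation (2), "last Thursday"
    "tf1": ("coerceTo", {"argumentID": "t1", "value": "WEEK"}),
    "tf2": ("predecessor", {"signalID": "1", "argumentID": "tf1", "value": "1"}),
    "tf3": ("getNamedElementOf", {"argumentID": "tf2", "value": "THURSDAY"}),
}
print(evaluate("tf3", last_thursday, anchors))  # 1996-03-21
```

Setting value=2 on the predecessor tag gives "the week before last" (the week starting 1996-03-11), mirroring (predecessor (predecessor (week DOA))). Making count a required parameter of apply_n reflects the requirement that the count argument always be present.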
Relations expressed through predicates and nominal expressions are typically anchored as deictic events. Aspect expressed on the verb is a way of looking inside an event to focus on a particular segment or part of it. For example,
For TimeML, we will designate the class of aspectual predicates as events of class ASPECTUAL. This class will have an additional attribute, which we will call phase, taking one of the four facets listed above. Finally, we will add the attribute argEvent to ASPECTUAL events as well.
attributes ::= eid class [argEvent] [tense] [aspect] [phase]
//* N.B. argEvent is dependent on class='REPORTING' or class='ASPECTUAL'
eid ::= <integer>
*class ::= 'OCCURRENCE' | 'PERCEPTION' | 'REPORTING' | 'ASPECTUAL'
argEvent ::= <integer>
tense ::= 'PAST' | 'PRESENT' | 'FUTURE' | 'NONE'
aspect ::= 'PROGRESSIVE' | 'PERFECTIVE'
phase ::= 'INITIATION' | 'COMPLETION' | 'TERMINATION' | 'CONTINUATION'
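The co-occurrence constraints in this extended EVENT production can be checked in a few lines. The following Python sketch is illustrative only: event_errors is a hypothetical name, the argEvent rule comes straight from the N.B. comment above, and the rule that phase appears only on ASPECTUAL events is an assumption drawn from the prose rather than stated in the BNF itself.

```python
from typing import Dict, List

CLASSES = {"OCCURRENCE", "PERCEPTION", "REPORTING", "ASPECTUAL"}
TENSES = {"PAST", "PRESENT", "FUTURE", "NONE"}
ASPECTS = {"PROGRESSIVE", "PERFECTIVE"}
PHASES = {"INITIATION", "COMPLETION", "TERMINATION", "CONTINUATION"}


def event_errors(attrs: Dict[str, str]) -> List[str]:
    """Check an EVENT's attributes against the extended BNF; [] means valid."""
    errors: List[str] = []
    if not attrs.get("eid", "").isdigit():
        errors.append("eid must be an integer")
    cls = attrs.get("class")
    if cls not in CLASSES:
        errors.append(f"class missing or invalid: {cls!r}")
    # From the BNF's N.B.: argEvent only with REPORTING or ASPECTUAL.
    if "argEvent" in attrs and cls not in ("REPORTING", "ASPECTUAL"):
        errors.append("argEvent requires class REPORTING or ASPECTUAL")
    # Assumption: phase is meaningful only on ASPECTUAL events.
    if "phase" in attrs:
        if cls != "ASPECTUAL":
            errors.append("phase requires class ASPECTUAL")
        if attrs["phase"] not in PHASES:
            errors.append(f"invalid phase: {attrs['phase']!r}")
    for name, allowed in (("tense", TENSES), ("aspect", ASPECTS)):
        if name in attrs and attrs[name] not in allowed:
            errors.append(f"invalid {name}: {attrs[name]!r}")
    return errors


began = {"eid": "1", "class": "ASPECTUAL", "tense": "PAST",
         "aspect": "PERFECTIVE", "phase": "INITIATION", "argEvent": "2"}
print(event_errors(began))  # []
print(event_errors({"eid": "3", "class": "OCCURRENCE", "phase": "TERMINATION"}))
# ['phase requires class ASPECTUAL']
```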
To illustrate this markup, consider two example sentences.
The boat <EVENT eid=1 class=ASPECTUAL tense=PAST aspect=PERFECTIVE phase=INITIATION argEvent=2> began </EVENT> to <EVENT eid=2 tense=null aspect=null> sink </EVENT>.
The search party <EVENT eid=1 class=ASPECTUAL tense=PAST aspect=PERFECTIVE phase=TERMINATION argEvent=2> stopped </EVENT> <EVENT eid=2 tense=null aspect=PROGRESSIVE> looking </EVENT> for the survivors.
In various discussions of the full TERQAS group, participants pointed out the utility of being able to mark confidence values for various aspects of the annotation. In general, it would be useful to allow confidence values to be assigned to any tag and, in fact, to any attribute of any tag.
A convenient way to do this would be to create a confidence tag, which would consume no input, and which would have the following attributes:
attributes ::= tagType tagID [attributeName] confidenceValue
where
So, for example, given this annotation:
The TWA flight <EVENT eid=1 class=OCCURRENCE tense=PAST aspect=NONE> crashlanded </EVENT> <LINK eventID=1 signalID=1 relatedToTime=1 relType=BEFORE durationID=1/> on Easter Island <DURATION did=1 value=2w> two weeks </DURATION> <SIGNAL sid=1> ago </SIGNAL>. ... <DocCreationTime> <TIMEX tid=1 type=DATE calDate=20121999> 12-20-1999 </TIMEX> </DocCreationTime>
if we wanted to indicate that we were unsure whether we had annotated DURATION correctly, we could add this annotation:
<CONFIDENCE tagType=DURATION tagID=1 confidenceValue=0.50/>
where the lack of the optional attribute, attributeName, indicates that the confidence applies to the whole tag.
On the other hand, if we wanted to indicate that we weren't sure if the tense of 'crashlanded' was really PAST, we could add this annotation:
<CONFIDENCE tagType=EVENT tagID=1 attributeName=tense confidenceValue=0.75/>
Abstracting confidence measures into a separate tag frees the annotation from having to include a confidence-value attribute in every tag and removes any uncertainty about which attribute of a tag a confidence value applies to.
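A consumer of such annotations needs to look up the confidence that applies to a given tag or attribute. The Python below is a minimal sketch of that lookup; the Confidence record and confidence_for are hypothetical names that simply mirror the tagType, tagID, attributeName and confidenceValue attributes. Falling back from an attribute to a whole-tag CONFIDENCE is an assumption of this sketch, not something the proposal specifies.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Confidence:
    tagType: str
    tagID: int
    confidenceValue: float
    attributeName: Optional[str] = None  # absent: applies to the whole tag


def confidence_for(records: Iterable[Confidence], tag_type: str, tag_id: int,
                   attribute: Optional[str] = None) -> Optional[float]:
    """Most specific confidence for a tag, or for one attribute of it.

    Assumption: an attribute with no CONFIDENCE of its own inherits the
    whole-tag value, if there is one.
    """
    whole: Optional[float] = None
    for r in records:
        if r.tagType != tag_type or r.tagID != tag_id:
            continue
        if attribute is not None and r.attributeName == attribute:
            return r.confidenceValue
        if r.attributeName is None:
            whole = r.confidenceValue
    return whole


records = [
    Confidence("DURATION", 1, 0.50),        # the whole DURATION tag
    Confidence("EVENT", 1, 0.75, "tense"),  # tense of 'crashlanded'
]
print(confidence_for(records, "EVENT", 1, "tense"))     # 0.75
print(confidence_for(records, "DURATION", 1, "value"))  # 0.5 (inherited)
print(confidence_for(records, "EVENT", 1))              # None
```

Keeping the records outside the tags they describe is what makes this lookup uniform: the same function serves any tag type, including LINKs once they receive IDs.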
Note: currently LINKs do not have IDs. If we want to apply confidence measures to LINKs and/or their attributes, we will need to give each LINK a unique ID under this proposal.
As for how confidence values should be assigned in manual annotation, we feel that, in a large-scale annotation effort such as TIMEBANK, two conditions should be satisfied:
Therefore, the annotation of a scalar value such as confidence should have at least two features:
Fowler, Martin (1999) Refactoring: Improving the Design of Existing Code. Addison-Wesley, Reading, Massachusetts.
Setzer, Andrea (2001) Temporal Information in Newswire Articles: An Annotation Scheme and Corpus Study. Doctoral Dissertation, University of Sheffield, Sheffield, UK.