[Cache version from http://www.cs.brandeis.edu/%7Ejamesp/arda/random/TimeML-Draft2.2.htm; please see this canonical location if possible.]

TimeML Specification: Draft 2

Release Date: May 14, 2002

Authors: Bob Ingria and James Pustejovsky

TERQAS TimeML Working Group Members: Branimir Boguraev, Michael Bukatin, Jose Castano, John Frank, Rob Gaizauskas, Bob Ingria, Graham Katz, Andy Latto, Inderjeet Mani, James Pustejovsky, Erik Rauch, Antonio Sanfilippo.

1.0 Introduction

This document describes the initial specification of the markup language for temporal and event expressions in text being developed by the TERQAS TimeML Working Group. The group started with two major design goals: (1) to use the core of Andrea Setzer's thesis annotation, which was christened STAG (Sheffield Temporal Annotation Guidelines); and (2) to remain, as much as possible, compliant with the TIDES TIMEX2 annotation effort. The initial list of tasks that were scheduled for the first version of TimeML is given below.

  1. Incorporate most of STAG.
  2. Stay compliant (after study) with TIMEX2 guidelines.
  3. Introduce a LINK tag: an object that links events/times to events/times but consumes no input text.
  4. Introduce a STATE tag: annotate only states that are updated in the context of the narrative being tagged.
    Any state persistent throughout the entire article would not be tagged as a state.
  5. Enrich time relations: add immediately-after (IAFTER) and immediately-before (IBEFORE).
  6. Introduce scale as a relation attribute: we need to convert preexisting Timex data into the TML standard.
  7. Modify the DOA tag to have a tid attribute.
  8. Introduce event identity (ID).
  9. Add NONE as a value for the tense attribute.
  10. Label aspect as a signal, but retain its existing values (NONE, PROGRESSIVE, PERFECTIVE).
  11. Add temporal functions for doing temporal math on expressions such as "last week".
    Track this as an enrichment over the TIMEX2 guidelines.

The working group also discussed the following features, although in less detail:

  1. Possibly identify "Event Clusters" or "time frames". This would be useful for clustering related events in a narrative and for temporal segmentation of the narrative, and it would reduce the number of temporal relations that need to be annotated.
  2. Brief discussion of negation and modality. One suggestion is to use a polarity attribute on negative propositional content:
    1. The plane did not crash.
    2. No survivors were found.
  3. Enrich the Event Typology to improve temporal inference.
    This is related to the next point.
  4. Add hooks to the event ontology for event entailment operations.
  5. Include event and time closure operations as part of TML.
  6. It was suggested that the head verb should not be annotated as an EVENT but rather as a signal to the event.
    This would mean that the <LINK> tag (see below) would contain all the semantic information in the annotation.
  7. Introduce init and cul attributes to events, or else reify init and cul as events, to handle aspectual events:
    1. The party will begin at noon.
    2. The man began the lecture at noon.

In the remainder of the document, we outline a BNF for TimeML, with the ultimate goal of specifying complete XML schema definitions for the language, and not simply a DTD. An XML schema is preferable to a DTD because it provides a richer initial set of data types for constraining the values of attributes, and also provides a mechanism for adding user-defined types.

2.0 From STAG to TimeML

This section presents a BNF for the temporal annotation language presented in Andrea Setzer's thesis (Setzer, 2001). Consideration of the details of this BNF, which, as noted above, the TimeML working group came to call STAG (Sheffield Temporal Annotation Guidelines), in conjunction with problems raised in trying to apply it to actual texts, resulted in several changes and extensions to Setzer's original scheme. Detailing these issues will help justify the details of this initial pass at TimeML.

A note on attribute values: All attribute values are marked as 'potential values' in Setzer's thesis; those marked with * may need to have a larger, but still closed, set specified. Those not so marked are probably sufficient as is.

2.1 Tags and Attributes for STAG

<EVENT>

attributes ::= eid class [argEvent] [tense] [aspect] [([signalID]
relatedToEvent eventRelType) | ([signalID] relatedToTime timeRelType)]
//* N.B. argEvent is dependent on class='REPORTING'
eid ::= <integer>
*class ::= 'OCCURRENCE' | 'PERCEPTION' | 'REPORTING' | 'ASPECTUAL'
argEvent ::= <integer>
tense ::= 'PAST' | 'PRESENT' | 'FUTURE'
aspect ::= 'PROGRESSIVE' | 'PERFECTIVE'
signalID ::= <integer>
relatedToEvent ::= <integer>
*eventRelType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' |
'SIMULTANEOUS'
relatedToTime ::= <integer>
*timeRelType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' |
'SIMULTANEOUS'

<TIMEX>

attributes ::= tid type calDate [(eid signalID relType)]
//* calDate is limited to [[DD]MM]YYYY | ('SPR'|'SUM'|'AUT'|'WIN')YYYY
//* A standard SGML or XML DTD cannot represent this, but an XML schema can

//* N.B. (eid signalID relType) is dependent on type='COMPLEX'

tid ::= <integer>
*type ::= 'DATE' | 'TIME' | 'COMPLEX'
calDate ::= PCDATA
eid ::= <integer>
signalID ::= <integer>
*relType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' |
'SIMULTANEOUS'

<SIGNAL>

attributes ::= sid
sid ::= <integer>

<DOA>

'DOA' stands for 'Date of article'.

No attributes
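
The calDate restriction noted under TIMEX above cannot be stated in a DTD, but it is easy to state as a pattern. The following is a minimal sketch in Python, assuming that the day may only appear together with a month, that each of DD and MM is exactly two digits, and that YYYY is exactly four digits. The names CALDATE and is_valid_caldate are ours, not part of STAG, and the check covers only the shape of the value, not calendar validity (a month of 20 would pass).

```python
import re

# Assumed reading of the note on calDate: an optional two-digit day, which may
# only appear together with a two-digit month, followed by a four-digit year;
# or a season code followed by a four-digit year.
CALDATE = re.compile(r"(?:(?:\d{2})?\d{2})?\d{4}|(?:SPR|SUM|AUT|WIN)\d{4}")


def is_valid_caldate(value: str) -> bool:
    """Return True if value has the shape [[DD]MM]YYYY or a season plus YYYY."""
    return CALDATE.fullmatch(value) is not None
```

A pattern facet of roughly this form is the kind of constraint that an XML schema could attach to the calDate attribute.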

2.2 Comments and Extensions

One thing that is striking in looking at this BNF is this fragment of the attribute structure of EVENT:

[([signalID] relatedToEvent eventRelType) | ([signalID] relatedToTime
timeRelType)]

In each case, we are dealing not with three unrelated attributes, but with three attributes that only make sense as a unit. The same triad also appears in the attribute structure of TIMEX:

[(eid signalID relType)]

Moreover, as the specification of the values for the eventRelType and timeRelType attributes of EVENT and the relType attribute of TIMEX shows, we are really dealing with one property, whose values are specified three times. In the case of eventRelType and timeRelType for EVENT, this is forced by the fact that only the name of the attribute can link it to relatedToEvent or relatedToTime, respectively. And, of course, since relType is defined on TIMEX, not EVENT, it must repeat the specification of permissible values.

All these considerations suggest that these triplets of attributes should be factored out into the form of a new abstract tag (i.e. one which consumes no input text). This would formally express the fact that these attributes are linked, allow eventRelType, timeRelType and relType to be collapsed into a single attribute, and allow the specification of the possible values of this single attribute to be stated only once.

[Note: Of course in BNF (or in an XML DTD) it would be possible to specify an abstract element as the value of eventRelType, timeRelType and relType and thus state their possible values only once, but we would still be left with the fact that the inherent relation between a signalID, relatedID, and relType would be unexpressed in the STAG annotation language.]

For these reasons, we remove the cited triplets from the definition of EVENT and TIMEX and introduce the tag:

<LINK>

attributes ::= (eventID | timeID) [signalID] (relatedToEvent |
relatedToTime) relType
eventID ::= <integer>
timeID ::= <integer>
signalID ::= <integer>
relatedToEvent ::= <integer>
relatedToTime ::= <integer>
*relType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' |
'SIMULTANEOUS'

eventID and timeID are used to anchor the link to an EVENT or TIMEX (the element that would have contained the [signalID] (relatedToEvent | relatedToTime) relType triple before it was factored out into LINK). Note that factoring out this triplet also entails that the decision on where to record this information is now no longer arbitrary. Previously, the information could be recorded on either of two related events, but there was no principle to decide which event should contain this information.
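
To make the shape of the factored-out tag concrete, here is a minimal sketch of LINK as a Python record, following the BNF just given. The class name Link and the error messages are ours; the field names mirror the BNF attributes. The two checks encode the alternations (eventID | timeID) and (relatedToEvent | relatedToTime), which we read as "exactly one of".

```python
from dataclasses import dataclass
from typing import Optional

REL_TYPES = {"BEFORE", "AFTER", "INCLUDES", "IS_INCLUDED", "SIMULTANEOUS"}


@dataclass(frozen=True)
class Link:
    """One LINK tag; field names mirror the BNF attributes above."""
    relType: str
    eventID: Optional[int] = None
    timeID: Optional[int] = None
    signalID: Optional[int] = None
    relatedToEvent: Optional[int] = None
    relatedToTime: Optional[int] = None

    def __post_init__(self) -> None:
        # (eventID | timeID): exactly one anchor must be given.
        if (self.eventID is None) == (self.timeID is None):
            raise ValueError("give exactly one of eventID, timeID")
        # (relatedToEvent | relatedToTime): exactly one target must be given.
        if (self.relatedToEvent is None) == (self.relatedToTime is None):
            raise ValueError("give exactly one of relatedToEvent, relatedToTime")
        if self.relType not in REL_TYPES:
            raise ValueError(f"unknown relType: {self.relType}")
```

Because each relation is now its own record, the set of permissible relType values is stated once, and an element can be the anchor of any number of Link records.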

In addition to purely formal considerations of the geometry of STAG (essentially, refactoring considerations, in the sense of Fowler (1999) and many others), the TimeML working group also found empirical considerations motivating adding the LINK tag. The original STAG framework had been designed with the presupposition that any given EVENT would be related to at most one other EVENT or other indexed element. Attempts to annotate various newswire articles showed that this assumption was false, and that a single EVENT could be related to more than one other indexed element. Here is one such example:

2.2.1 Document Annotation in STAG

FAMILIES SUE OVER AEROFLOT CRASH DEATHS

   The Russian airline Aeroflot has been
<EVENT eid=1 relatedToTime=1 timeRelType=BEFORE tense=PRESENT
aspect=PERFECTIVE class=OCCURRENCE>
hit
</EVENT>
with a writ for loss and damages,
<EVENT eid=2 tense=NONE aspect=PERFECTIVE relatedToEvent=1
eventRelType=BEFORE class=OCCURRENCE>
filed
</EVENT>
in Hong Kong by the families of seven passengers
<EVENT eid=3 tense=NONE aspect=PERFECTIVE relatedToEvent=2
eventRelType=BEFORE
class=OCCURRENCE relatedToEvent2=4 eventRel2Type=IS_INCLUDED
signal2=1>
killed
</EVENT>
<SIGNAL sid=1>
in
</SIGNAL>
an air
<EVENT eid=4 class=OCCURRENCE>
crash
</EVENT>.

   All 75 people
<STATE stid=1 relatedToEvent=5 eventRelType=INCLUDES>
on board
</STATE>
the Aeroflot Airbus
<EVENT eid=5 tense=PAST aspect=PERFECTIVE relatedToEvent=6
eventRelType=IAFTER signal=2>
died
</EVENT>
<SIGNAL sid=2>
when
</SIGNAL>
it
<EVENT eid=6 tense=PAST aspect=PERFECTIVE relatedToTime=2
timeRelType=IS_INCLUDED relatedToEvent=4 eventRelType=ID>
ploughed
</EVENT>
into a Siberian mountain
<SIGNAL sid=3>
in
</SIGNAL>
<TIMEX tid=2 type=DATE calDate=031994>
March 1994
</TIMEX>.

...

<DOA tid=1>
03-27-96
</DOA>

There are several notable features of this annotation. First, notice this <EVENT> element:

<EVENT eid=3 tense=NONE aspect=PERFECTIVE relatedToEvent=2
eventRelType=BEFORE
class=OCCURRENCE relatedToEvent2=4 eventRel2Type=IS_INCLUDED
signal2=1>
killed
</EVENT>

which features the addition of the ad hoc attributes relatedToEvent2 and eventRel2Type as an attempt to allow multiple related events, along with the new attribute signal2, to link the signal to this related event and no other. Clearly this solution is fragile, in that it requires either a cut-off of related events at some arbitrary number or else an open-ended set of triplets of the form:

signalN relatedToEventN eventRelNType

Note that the LINK tag introduced above solves both the problem of the potentially unbounded number of related events and that of relating a particular signal to a given related EVENT (or other indexed element).
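
The move from numbered attribute triplets to LINKs can be sketched mechanically. The following Python function is illustrative only: the function name is ours, an EVENT's attributes are assumed to be available as a dictionary, and the attribute spellings it recognizes (relatedToEvent2, eventRel2Type, signal2, and so on) are the ad hoc ones used in the example above. An unnumbered signal attribute is attached to the unnumbered relation of the same element, which is ambiguous when an element carries both a time relation and an event relation; that ambiguity is one of the problems LINK removes.

```python
import re


def links_from_stag_event(attrs: dict) -> list:
    """Turn numbered relation triples on one STAG EVENT into LINK-like dicts."""
    links = []
    for name, target in attrs.items():
        m = re.fullmatch(r"relatedTo(Event|Time)(\d*)", name)
        if m is None:
            continue
        kind, n = m.group(1), m.group(2)
        rel_key = f"{kind.lower()}Rel{n}Type"  # eventRelType, eventRel2Type, ...
        link = {
            "eventID": attrs["eid"],
            f"relatedTo{kind}": target,
            "relType": attrs[rel_key],
        }
        signal = attrs.get(f"signal{n}")
        if signal is not None:
            link["signalID"] = signal
        links.append(link)
    return links
```

Applied to the attributes of the EVENT with eid=3 above, this yields two records: one relating event 3 to event 2 with BEFORE, and one relating event 3 to event 4 with IS_INCLUDED and signal 1.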

2.2.2 Document Annotation with LINK

Given the existence of the LINK tag, we can rewrite the above annotation as follows:

FAMILIES SUE OVER AEROFLOT CRASH DEATHS

   The Russian airline Aeroflot has been
<EVENT eid=1 tense=PRESENT aspect=PERFECTIVE class=OCCURRENCE>
hit
</EVENT>
<LINK eventID=1 relatedToTime=1 relType=BEFORE/>
with a writ for loss and damages,
<EVENT eid=2 tense=NONE aspect=PERFECTIVE class=OCCURRENCE>
filed
</EVENT>
<LINK eventID=2 relatedToEvent=1 relType=BEFORE/>
in Hong Kong by the families of seven passengers
<EVENT eid=3 tense=NONE aspect=PERFECTIVE class=OCCURRENCE>
killed
</EVENT>
<LINK eventID=3 relatedToEvent=2 relType=BEFORE/>
<LINK eventID=3 signalID=1 relatedToEvent=4
relType=IS_INCLUDED/>
<SIGNAL sid=1>
in
</SIGNAL>
an air
<EVENT eid=4 class=OCCURRENCE>
crash
</EVENT>.

   All 75 people
<STATE stid=1>
on board
</STATE>
<LINK stateID=1 relatedToEvent=5 relType=INCLUDES/>
the Aeroflot Airbus
<EVENT eid=5 tense=PAST aspect=PERFECTIVE>
died
</EVENT>
<LINK eventID=5 signalID=2 relatedToEvent=6 relType=IAFTER/>
<SIGNAL sid=2>
when
</SIGNAL>
it
<EVENT eid=6 tense=PAST aspect=PERFECTIVE>
ploughed
</EVENT>
<LINK eventID=6 relatedToTime=2 relType=IS_INCLUDED/>
<LINK eventID=6 relatedToEvent=4 relType=ID/>
into a Siberian mountain
<SIGNAL sid=3>
in
</SIGNAL>
<TIMEX tid=2 type=DATE calDate=031994>
March 1994
</TIMEX>.

...

<DOA tid=1>
03-27-96
</DOA>
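
Since each relation is now a separate tag, a consumer of the annotation can collect all the relations of a given element with a single pass. The sketch below is one way to do this in Python. The annotations in this document use SGML-style unquoted attribute values, which a strict XML parser would reject, so the sketch uses regular expressions instead; the function name and the returned structure are ours, not part of TimeML.

```python
import re
from collections import defaultdict

# A start tag or empty tag with unquoted attribute values, e.g. <LINK a=1 b=X/>.
TAG = re.compile(r"<(\w+)((?:\s+\w+=[^\s/>]+)*)\s*/?>")
ATTR = re.compile(r"(\w+)=([^\s/>]+)")


def links_by_source(text: str) -> dict:
    """Map each LINK anchor, e.g. ('eventID', '3'), to its (relType, target) pairs."""
    grouped = defaultdict(list)
    for name, raw_attrs in TAG.findall(text):
        if name != "LINK":
            continue
        attrs = dict(ATTR.findall(raw_attrs))
        source = next((k, attrs[k]) for k in ("eventID", "timeID", "stateID")
                      if k in attrs)
        target = next((k, attrs[k]) for k in ("relatedToEvent", "relatedToTime")
                      if k in attrs)
        grouped[source].append((attrs["relType"], target))
    return dict(grouped)
```

Run over the annotation above, the entry for ('eventID', '3') holds two relations, BEFORE event 2 and IS_INCLUDED in event 4, with no need for numbered attributes.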

In addition to the abstraction that LINK provides, there are several other additions exhibited in the annotation presented above.

3.0 Further Additions

3.1 Modifications to Existing Tags and Attributes

<EVENT>

The tense attribute adds the value:

'NONE'
for untensed verb forms, such as participles, etc.

<DOA>

DOA previously had no attributes; it now adds the attribute

tid ::= <integer>
which allows the DOA to serve as a temporal anchor for the entire article

<LINK>

The relType attribute adds the values:

'IAFTER'
immediately after
'IBEFORE'
immediately before
'ID'
identity (of events)

3.2 New Tags

<STATE>

attributes ::= stid
stid ::= <integer>

The TimeML working group found that, in addition to annotating events, it is also useful to annotate a select subset of states. We have decided to mark up only those states that identifiably change over the course of the document being annotated. For example, in the article annotated above, the expression "the Aeroflot Airbus" implies that the Airbus is run and operated by Aeroflot, but this relationship is not a State in the present sense. Because it persists throughout the event line of the document, we factor it out and do not mark it up. On the other hand, properties that are known to change during the events reported in an article are marked as States, as illustrated below:

All 75 people
<STATE stid=1>
on board
</STATE>
<LINK stateID=1 relatedToEvent=5 relType=INCLUDES/>
the Aeroflot Airbus
<EVENT eid=5 tense=PAST aspect=PERFECTIVE>
died
</EVENT>
<LINK eventID=5 signalID=2 relatedToEvent=6 relType=IAFTER/>
<SIGNAL sid=2>
when
</SIGNAL>

4.0 BNF for TimeML

4.1 Tags and Attributes Defined Above

Putting together all the modifications and additions discussed above gives us the following BNF for a first pass at TimeML.

<EVENT>

attributes ::= eid class [argEvent] [tense] [aspect]
//* N.B. argEvent is dependent on class='REPORTING'
eid ::= <integer>
*class ::= 'OCCURRENCE' | 'PERCEPTION' | 'REPORTING' | 'ASPECTUAL'
argEvent ::= <integer>
tense ::= 'PAST' | 'PRESENT' | 'FUTURE' | 'NONE'
aspect ::= 'PROGRESSIVE' | 'PERFECTIVE'

<TIMEX>

attributes ::= tid type calDate
//* calDate is limited to [[DD]MM]YYYY | ('SPR'|'SUM'|'AUT'|'WIN')YYYY
//* A standard SGML or XML DTD cannot represent this, but an XML schema can
tid ::= <integer>
*type ::= 'DATE' | 'TIME' | 'COMPLEX'
calDate ::= PCDATA

<STATE>

attributes ::= stid
stid ::= <integer>

<SIGNAL>

attributes ::= sid
sid ::= <integer>

<DOA>

'DOA' stands for 'Date of article'.

attributes ::= tid
tid ::= <integer>

<LINK>

attributes ::= (eventID | timeID | stateID) [signalID] (relatedToEvent
| relatedToTime) relType
eventID ::= <integer>
timeID ::= <integer>
stateID ::= <integer>
signalID ::= <integer>
relatedToEvent ::= <integer>
relatedToTime ::= <integer>
*relType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' |
'SIMULTANEOUS' | 'IAFTER' | 'IBEFORE' | 'ID'

4.2 Additional Tags and Attributes

TEMPORAL FUNCTIONS

The working group discussed the case of temporal expressions where the calendar date is not referred to directly, but there is an expression that acts as a temporal function over a Timex expression. Different examples of this are given below.

  1. John died last Monday.
  2. John will die next month.
  3. John died two weeks ago.

Assume these are all bound to an event predicate such as "John died", with an already established DOA. One proposal is to treat the "modifiers" such as "next", "last", and "N_<time-unit>_ago" as signals, relating to other times and events. For example, (1) would be marked up as follows:

 
<DOA tid=1>DDMMYYYY</DOA>
he
<EVENT eid=1 tense=PAST aspect=PERFECTIVE>
died
</EVENT>
<SIGNAL sid=1>
last
</SIGNAL>
<LINK  eventID=1 signalID=1 relatedToTime=2 relType=BEFORE
magnitude=+1/>
<TIMEX tid=2 type=DATE>
Monday
</TIMEX>.

Sentence (2) would be represented as follows:
 
<DOA tid=1>DDMMYYYY</DOA>
he
<EVENT eid=1 tense=FUTURE >
die
</EVENT>
<SIGNAL sid=1>
next
</SIGNAL>
<LINK eventID=1 signalID=1 relatedToTime=2 relType=AFTER
magnitude=+1/>
<TIMEX tid=2 type=DATE>
month
</TIMEX>.

It would also be possible to link the temporal function expression to the DOA, but this would make it less explicit how the event predicate of "die" is bound to "last Monday". Furthermore, this would not be parallel to the derivation of "John died on Monday".

<DOA tid=1>DDMMYYYY</DOA>
he
<EVENT eid=1 tense=PAST>
died
</EVENT>
<SIGNAL sid=1>
on
</SIGNAL>
<LINK eventID=1 signalID=1 relatedToTime=2 relType=IS_INCLUDED
magnitude=0/>
<TIMEX tid=2 type=DATE>
Monday
</TIMEX>.

Notice that this particular proposal has added a new attribute to <LINK>, magnitude, which acts as a quantifier over the appropriate temporal granularity. The granularity depends on the type and the calDate of the <TIMEX>. Hence, if the value is "Monday", then the granularity for magnitude to operate over is a week, and so on. This relates to the issues in the Ontology Working Group as well as Hobbs' Semantic Web Temporal Spec Notes. Hence, the new specification for <LINK> is as follows:

<LINK>

attributes ::= (eventID | timeID | stateID) [signalID] (relatedToEvent
| relatedToTime) relType magnitude
 
eventID ::= <integer>
timeID ::= <integer>
stateID ::= <integer>
signalID ::= <integer>
relatedToEvent ::= <integer>
relatedToTime ::= <integer>
*relType ::= 'BEFORE' | 'AFTER' | 'INCLUDES' | 'IS_INCLUDED' |
'SIMULTANEOUS' | 'IAFTER' | 'IBEFORE' | 'ID'
magnitude ::= <integer>

Many of the temporal function cases discussed by the WG would perhaps be handled in a slightly different way if this proposal were adopted. For example, direct reference to events would be more complex:

  1. He died the week before the party.
  2. He died after the hijacking.

In addition to the need for temporal functions of the sort sketched out to handle expressions beyond those handled by the TIMEX2 annotation, they are also useful for creating intensional descriptions of temporal expressions. During the first meetings of the plenary group, it was discovered that many temporal expressions are too indeterminate or fuzzy to be reduced to a concrete ISO time value.

<SCALE>

The notion of scale was introduced, but we will defer discussion until the next version of this document.

Temporal Functions

Various temporal expressions have interpretations that are best expressed in terms of temporal functions. For example, temporal expressions like the following might be annotated with interpretations such as those specified. (We have used pre-theoretical, but fairly transparent, names to represent the hypothetical functions. DOA is the date of the article, presupposed for concreteness.)

last week = (predecessor (week DOA))

last Thursday = (thursday (predecessor (week DOA)))

the week before last = (predecessor (predecessor (week DOA)))

next week = (successor (week DOA))
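
These compositions can be tried out directly. The following Python sketch is only an illustration under stated assumptions: a week is represented by the date of its Monday (the draft does not say where a week starts), and the DOA is taken to be 27 March 1996, reading the 03-27-96 of the sample article as month-day-year. The function names follow the hypothetical names used above.

```python
from datetime import date, timedelta


def week(d: date) -> date:
    """Coerce a date to its week, represented by that week's Monday (assumed)."""
    return d - timedelta(days=d.weekday())


def predecessor(week_start: date) -> date:
    return week_start - timedelta(weeks=1)


def successor(week_start: date) -> date:
    return week_start + timedelta(weeks=1)


def thursday(week_start: date) -> date:
    """Pick out the Thursday of a week given by its Monday."""
    return week_start + timedelta(days=3)


doa = date(1996, 3, 27)  # a Wednesday
last_week = predecessor(week(doa))
last_thursday = thursday(predecessor(week(doa)))
week_before_last = predecessor(predecessor(week(doa)))
next_week = successor(week(doa))
```

Under these assumptions, last week begins on 18 March 1996, last Thursday is 21 March 1996, the week before last begins on 11 March 1996, and next week begins on 1 April 1996.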

Such representations present a problem for annotation because the needed functions, which would be best expressed as XML tags, can't appear as the values of attributes in another XML tag. As always, the solution is to use a tag's ID as the value of an attribute, in place of the tag itself. Given this strategy, TimeML can define whatever functions are necessary, and pass in the id of a function (tag) whenever we want to use it as the value of an attribute. If we further allow the arguments of functions to be function IDs, we can compose functions by using their IDs as pointers (much as in the box and pointer notation for LISP). To show how this would work, we present the following sample annotations. For concreteness, we assume each appears in a document which has

<DOA tid=t1>

This ID provides a temporal anchor in all the examples.

(1) last week

<SIGNAL sid=1>
last
</SIGNAL>
<TIMEX3 tid=2 type=DATE valueFromFunction=tf2>
week
</TIMEX3>

<coerceTo tfid=tf1 argumentID=t1 value=WEEK/>

<predecessor tfid=tf2 signalID=1 argumentID=tf1 value=1/>

(2) last Thursday

<SIGNAL sid=1>
last
</SIGNAL>
<TIMEX3 tid=2 type=DATE valueFromFunction=tf3>
Thursday
</TIMEX3>

<coerceTo tfid=tf1 argumentID=t1 value=WEEK/>

<predecessor tfid=tf2 signalID=1 argumentID=tf1 value=1/>

<getNamedElementOf tfid=tf3 argumentID=tf2 value=THURSDAY/>

N.B. 'last Thursday' and 'Thursday of last week' should get the same interpretation.

(3) the week before last

<TIMEX3 tid=2 type=DATE valueFromFunction=tf2>
the week
</TIMEX3>
<SIGNAL sid=1>
before last
</SIGNAL>

<coerceTo tfid=tf1 argumentID=t1 value=WEEK/>

<predecessor tfid=tf2 signalID=1 argumentID=tf1 value=2/>

(4) next week

<SIGNAL sid=1>
next
</SIGNAL>
<TIMEX3 tid=2 type=DATE valueFromFunction=tf2>
week
</TIMEX3>

<coerceTo tfid=tf1 argumentID=t1 value=WEEK/>

<successor tfid=tf2 signalID=1 argumentID=tf1 value=1/>
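
The ID-as-pointer strategy in examples (1) to (4) can be sketched as a small recursive evaluator. This is a hedged illustration in Python, not a proposed implementation: function tags are assumed to be available as dictionaries keyed by tfid, the "name" key stands for the tag name, a week is represented by its Monday, and predecessor and successor are assumed to step by weeks because their argument has been coerced to WEEK. A fuller version would have to track the granularity of each intermediate value.

```python
from datetime import date, timedelta

WEEKDAYS = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
            "FRIDAY", "SATURDAY", "SUNDAY"]


def resolve(ref: str, anchors: dict, functions: dict) -> date:
    """Follow an ID to a date: either a temporal anchor or a function tag."""
    if ref in anchors:
        return anchors[ref]
    tag = functions[ref]
    # The argument may itself be a function ID, so evaluate it first.
    arg = resolve(tag["argumentID"], anchors, functions)
    name, value = tag["name"], tag["value"]
    if name == "coerceTo" and value == "WEEK":
        return arg - timedelta(days=arg.weekday())  # Monday of that week (assumed)
    if name == "predecessor":
        return arg - timedelta(weeks=int(value))    # value = number of steps back
    if name == "successor":
        return arg + timedelta(weeks=int(value))
    if name == "getNamedElementOf":
        return arg + timedelta(days=WEEKDAYS.index(value))
    raise ValueError(f"unsupported function tag: {name} value={value}")
```

With the DOA anchor t1 set to 27 March 1996 and the three function tags of example (2), resolving tf3 walks from tf3 to tf2 to tf1 to t1 and back, giving 21 March 1996.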

Some notes on these sample representations:

  1. We have used pre-theoretical, relatively obvious names for the functions in the examples above. We intend to use functions (tags) that, as closely as possible, follow the names of the temporal functions in the ontology we adopt. By doing this, our ontology would provide a denotational semantics for our XML-based temporal function markup.
  2. Similarly, we have used relatively arbitrary and generic names for the attributes of the sample temporal functions (i.e. 'argumentID' and 'value'). We would want to give the attributes (i.e. arguments or parameters) of these function entities more meaningful names in the final version of this specification.
  3. Since it would be unwieldy for our markup to recursively wrap SUCCESSOR (or PREDECESSOR) around itself, we have given SUCCESSOR (and PREDECESSOR) a second integer argument, which essentially encodes the number of calls to the function; i.e.
    (SUCCESSOR foo 1) = (primSuccessor foo)
    
    (SUCCESSOR foo 2) = (primSuccessor (primSuccessor foo))
    
    ...
    
    (SUCCESSOR foo n) = (primSuccessor_1 ... (primSuccessor_n foo) ...)
    

    This is much like the INCF (increment-by) and DECF (decrement-by) macros in Common LISP, except that INCF and DECF allow their second argument to be omitted, in which case it defaults to 1. For our purposes, we want the 'count' argument to always be present.

Aspectual Verbs

Relations expressed through predicates and nominal expressions are typically anchored as deictic events. Aspect expressed on the verb is a means of looking inside the event to focus on a segment or particular part of an event. For example,

  1. a. John built the house.
    b. John has built the house.
    c. John is building the house.
    d. John had built the house.

In languages such as English and French, there is an additional grammatical device of aspectual predication, which focuses on four facets of the event history:
  1. a. Initiation: begin, start
    b. Termination: stop, end
    c. Completion: finish, complete
    d. Continuation: continue, keep

Here, a member of a closed class of predicates is able to select a verbal or nominal complement as an argument and mark that event with the function (designation) associated with one of the facets above.

For TimeML, we will designate the class of aspectual predicates as events of class ASPECTUAL. This class will have an additional attribute which we will call PHASE. This attribute will take one of the four facets listed above. Finally, we will add the attribute ARGEVENT to ASPECTUAL events as well.

<EVENT>

attributes ::= eid class [argEvent] [tense] [aspect] [phase]
//* N.B. argEvent is dependent on class='REPORTING' or class='ASPECTUAL'
eid ::= <integer>
*class ::= 'OCCURRENCE' | 'PERCEPTION' | 'REPORTING' | 'ASPECTUAL'
argEvent ::= <integer>
tense ::= 'PAST' | 'PRESENT' | 'FUTURE' | 'NONE'
aspect ::= 'PROGRESSIVE' | 'PERFECTIVE'
phase ::= 'INITIATION' | 'COMPLETION' | 'TERMINATION' | 'CONTINUATION'
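
The dependencies among these attributes can be checked mechanically. The sketch below, in Python, takes an EVENT's attributes as a dictionary and returns a list of problems. The function name and messages are ours. The rule that phase requires class ASPECTUAL is our reading of the prose above; the BNF itself only marks phase as optional.

```python
PHASES = {"INITIATION", "COMPLETION", "TERMINATION", "CONTINUATION"}


def check_event(attrs: dict) -> list:
    """Return a list of problems with one EVENT's attributes (empty if none)."""
    problems = []
    cls = attrs.get("class")
    if "eid" not in attrs or cls is None:
        problems.append("eid and class are required")
    if "argEvent" in attrs and cls not in ("REPORTING", "ASPECTUAL"):
        problems.append("argEvent requires class REPORTING or ASPECTUAL")
    if "phase" in attrs:
        if attrs["phase"] not in PHASES:
            problems.append("unknown phase: " + attrs["phase"])
        # Assumption: phase is only meaningful on ASPECTUAL events.
        if cls != "ASPECTUAL":
            problems.append("phase requires class ASPECTUAL")
    return problems
```

A check of this kind is something an XML schema alone would not express directly, since it relates the value of one attribute to the presence of another.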

To illustrate this mark up, consider a couple of example sentences.

  1. The boat began to sink quickly.
  2. The search party stopped looking for the survivors.

These two sentences are represented below in the markup defined above.

The boat
<EVENT eid=1 class=ASPECTUAL tense=PAST aspect=PERFECTIVE phase=INITIATION
argEvent=2>
began
</EVENT>
to
<EVENT eid=2 tense=NONE aspect=NONE>
sink
</EVENT>
quickly.

The search party
<EVENT eid=1 class=ASPECTUAL tense=PAST aspect=PERFECTIVE phase=TERMINATION
argEvent=2>
stopped
</EVENT>
<EVENT eid=2 tense=NONE aspect=PROGRESSIVE>
looking
</EVENT>
for the survivors.

Confidence Levels

In various discussions of the full TERQAS group, the utility of being able to mark confidence values for various aspects of the annotation was pointed out. In general, it would be useful to allow confidence values to be assigned to any tag, and, in fact, to any attribute of any tag.

A convenient way to do this would be to create a confidence tag, which would consume no input, and which would have the following attributes:

<CONFIDENCE>

attributes ::= tagType tagID [attributeName] confidenceValue

where

tagType
would range over the names of all the tags of TimeML
tagID
would range over the set of actual tag IDs within the current document
attributeName
would range over the names of all the attributes of all the tags of TimeML
confidenceValue
would range over the rationals (i.e. would have a floating point value) between 0 and 1

So, for example, given this annotation:

The TWA flight
<EVENT eid=1 class=OCCURRENCE tense=PAST aspect=NONE>
crashlanded
</EVENT>
<LINK eventID=1 signalID=1 relatedToTime=1 relType=BEFORE
durationID=1/>
on Easter Island
<DURATION did=1 value=2w>
two weeks
</DURATION>
<SIGNAL sid=1>
ago
</SIGNAL>.

...

<DocCreationTime>
<TIMEX tid=1 type=DATE calDate=20121999>
12-20-1999
</TIMEX>
</DocCreationTime>

if we wanted to indicate that we were unsure whether we had annotated DURATION correctly, we could add this annotation:

<CONFIDENCE tagType=DURATION tagID=1 confidenceValue=0.50/>

where the lack of the optional attribute, attributeName, indicates that the confidence applies to the whole tag.

On the other hand, if we wanted to indicate that we weren't sure if the tense of 'crashlanded' was really PAST, we could add this annotation:

<CONFIDENCE tagType=EVENT tagID=1 attributeName=tense
confidenceValue=0.75/>

Abstracting confidence measures as a separate tag frees the annotation from having to include a confidence value attribute in every tag and eliminates the problem of uncertainty over the exact attribute of a tag the confidence value applies to.

Note: currently LINKs do not have IDs. If we want to apply confidence measures to LINKs and/or their attributes, we will need to give each LINK a unique ID under this proposal.

As for how confidence values should be assigned in manual annotation, we feel that, in a large-scale annotation effort such as TIMEBANK, two conditions should be satisfied:

  1. Fairly high inter-annotator agreement on the tag assignment in the text.
  2. Ease of use and habitability of the tool from the annotator's perspective.

Therefore, the annotation of a scalar value such as confidence should have at least two features:

  • The choice of confidence values should be as clearly defined as possible to cover the options; this relates to the granularity and orders of magnitude as presented by Jerry Hobbs as well. This would suggest a selection from a small set (e.g. low, mid, high; not_sure, sure, absolutely_sure). These could be interpreted or rescaled to a (0,1] range, if need be, for subsequent inference.
  • There should be a default value specified (at high (=1)) so that it is not necessary to annotate all links and attributes for them with a confidence.

The restriction of human annotators to a subset of the possible values should be documented in the annotation guidelines and implemented in the annotation tool. It would probably be best if the annotation tool presented not numbers but natural language descriptions such as those suggested above, which would be represented numerically in the underlying annotation. For example, the annotator might pick "moderately certain", which would enter the annotation as 0.5. Moreover, for manual annotation, it does not seem that the values 0 and 1 will be used or useful. Presumably, an annotator who does not trust an annotation at all will not add it. And, as suggested above, 1 should be the default or unmarked value, at least for manual annotation, and so need not be noted, since noting it would bulk up the files considerably, even if it were used only on entire tags.
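
The interaction between explicit CONFIDENCE tags and the default value can be sketched as a lookup. The Python below is illustrative only: CONFIDENCE tags are assumed to be available as dictionaries, the function name is ours, and the fallback from a missing attribute-level record to the whole-tag record is an assumption that the proposal above does not spell out.

```python
from typing import Optional

DEFAULT_CONFIDENCE = 1.0  # unmarked tags and attributes default to high (=1)


def confidence(records: list, tag_type: str, tag_id: int,
               attribute: Optional[str] = None) -> float:
    """Look up the confidence of a tag, or of one attribute of a tag."""
    by_key = {
        (r["tagType"], r["tagID"], r.get("attributeName")): r["confidenceValue"]
        for r in records
    }
    if attribute is not None and (tag_type, tag_id, attribute) in by_key:
        return by_key[(tag_type, tag_id, attribute)]
    # Assumption: with no attribute-level record, fall back to the whole-tag
    # record, and then to the default.
    return by_key.get((tag_type, tag_id, None), DEFAULT_CONFIDENCE)


records = [
    {"tagType": "DURATION", "tagID": 1, "confidenceValue": 0.50},
    {"tagType": "EVENT", "tagID": 1, "attributeName": "tense",
     "confidenceValue": 0.75},
]
```

With the two sample CONFIDENCE tags from this section, the DURATION tag as a whole has confidence 0.5, the tense of EVENT 1 has 0.75, and the unannotated aspect of EVENT 1 falls back to the default of 1.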

Bibliography

Fowler, Martin (1999) Refactoring: Improving the Design of Existing Code. Addison-Wesley, Reading, Massachusetts.

Setzer, Andrea (2001) Temporal Information in Newswire Articles: An Annotation Scheme and Corpus Study. Doctoral Dissertation, University of Sheffield, Sheffield, UK.