DARPA Communicator Testbed


Human Annotations File

A human annotations file consists of a top-level GC_LOG_ANNOTATIONS element containing any number of GC_SESSION elements. Each GC_SESSION element may contain a GC_ANNOT element, which records task completion, and any number of GC_DATA elements holding transcriptions. Here is a sample annotations file:

<GC_LOG_ANNOTATIONS>
   <GC_SESSION id="199.94.106.6:20300:0">
      <GC_ANNOT type_task_completion="1"/>
      <GC_DATA type_utt_text="transcription" turnid="0" dtype="string">
         i'd like a flight from boston to san francisco
      </GC_DATA>
      <GC_DATA type_utt_text="transcription" turnid="2" dtype="string">
         in the morning
      </GC_DATA>
      <GC_DATA type_utt_text="transcription" turnid="4" dtype="string">
         does the american flight serve breakfast
      </GC_DATA>
      <GC_DATA type_utt_text="transcription" turnid="6" dtype="string">
         what kind of plane is that
      </GC_DATA>
      <GC_DATA type_utt_text="transcription" turnid="8" dtype="string">
         goodbye
      </GC_DATA>
   </GC_SESSION>
</GC_LOG_ANNOTATIONS>
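As an illustration, the following Python sketch reads an annotations file with the standard library's `xml.etree.ElementTree` and collects, for each session, the task-completion value and the transcribed turns in document order. The function name `read_annotations` and the returned dictionary layout are hypothetical. The sketch accepts the task-completion attribute under either `type_task_completion` (the name declared in the DTD) or `task_completion`, keeps only GC_DATA elements whose `type_utt_text` is `"transcription"`, and assumes the whitespace around each transcription is indentation that can be stripped.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical sample with the same structure as the file above.
SAMPLE = """<GC_LOG_ANNOTATIONS>
   <GC_SESSION id="199.94.106.6:20300:0">
      <GC_ANNOT type_task_completion="1"/>
      <GC_DATA type_utt_text="transcription" turnid="0" dtype="string">
         i'd like a flight from boston to san francisco
      </GC_DATA>
      <GC_DATA type_utt_text="transcription" turnid="2" dtype="string">
         in the morning
      </GC_DATA>
   </GC_SESSION>
</GC_LOG_ANNOTATIONS>"""


def read_annotations(xml_text):
    """Map each session id to its task completion and (turnid, text) pairs."""
    root = ET.fromstring(xml_text)
    sessions = {}
    for session in root.findall("GC_SESSION"):
        completion = None
        turns = []
        for child in session:
            if child.tag == "GC_ANNOT":
                # Accept the DTD's attribute name, falling back to the short form.
                completion = child.get("type_task_completion",
                                       child.get("task_completion"))
            elif (child.tag == "GC_DATA"
                  and child.get("type_utt_text") == "transcription"):
                # Assumption: surrounding whitespace is layout, not content.
                text = (child.text or "").strip()
                # turnid is kept as a string because the DTD types it NMTOKEN.
                turns.append((child.get("turnid"), text))
        sessions[session.get("id")] = {
            "task_completion": completion,
            "transcriptions": turns,
        }
    return sessions


if __name__ == "__main__":
    for sid, info in read_annotations(SAMPLE).items():
        print(sid, "task_completion =", info["task_completion"])
        for turnid, text in info["transcriptions"]:
            print(f"  turn {turnid}: {text}")
```

Turns are returned in the order they appear in the file rather than sorted, since `turnid` is optional in the DTD and may be absent on some elements.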
 

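Here is an XML DTD for this format. Besides the attributes used in the sample, it declares several optional (#IMPLIED) ones: annot_version on GC_LOG_ANNOTATIONS, stime and etime on GC_SESSION, and tidx and direction on GC_DATA.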

<?xml version="1.0"?>

<!ELEMENT GC_LOG_ANNOTATIONS (GC_SESSION)*>
<!ATTLIST GC_LOG_ANNOTATIONS annot_version CDATA #IMPLIED>

<!ELEMENT GC_SESSION ( GC_ANNOT | GC_DATA )*>
<!ATTLIST GC_SESSION id NMTOKEN #REQUIRED>
<!ATTLIST GC_SESSION stime NMTOKEN #IMPLIED>
<!ATTLIST GC_SESSION etime NMTOKEN #IMPLIED>

<!ELEMENT GC_ANNOT EMPTY>
<!ATTLIST GC_ANNOT type_task_completion CDATA #REQUIRED>

<!ELEMENT GC_DATA (#PCDATA)>
<!ATTLIST GC_DATA type_utt_text CDATA #IMPLIED>
<!ATTLIST GC_DATA dtype NMTOKEN #IMPLIED>
<!ATTLIST GC_DATA turnid NMTOKEN #IMPLIED>
<!ATTLIST GC_DATA tidx NMTOKEN #IMPLIED>
<!ATTLIST GC_DATA direction NMTOKEN #IMPLIED>
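Python's standard library cannot validate a document against a DTD, so the following is a hand-written sketch, not a real validator, that checks only the rules transcribed from the DTD above: which elements may nest inside which, which attributes are required, which attributes are declared at all, and that GC_ANNOT is empty. It does not check NMTOKEN syntax or anything else a validating parser would. The function `check_annotations` and its message wording are hypothetical; when a validating parser such as lxml's `etree.DTD` class is available, it is the better tool.

```python
import xml.etree.ElementTree as ET

# Allowed child elements, transcribed by hand from the DTD.
ALLOWED_CHILDREN = {
    "GC_LOG_ANNOTATIONS": {"GC_SESSION"},
    "GC_SESSION": {"GC_ANNOT", "GC_DATA"},
    "GC_ANNOT": set(),  # declared EMPTY
    "GC_DATA": set(),   # declared (#PCDATA): text only
}
# Attributes declared #REQUIRED.
REQUIRED_ATTRS = {
    "GC_SESSION": {"id"},
    "GC_ANNOT": {"type_task_completion"},
}
# Every attribute declared in an ATTLIST.
DECLARED_ATTRS = {
    "GC_LOG_ANNOTATIONS": {"annot_version"},
    "GC_SESSION": {"id", "stime", "etime"},
    "GC_ANNOT": {"type_task_completion"},
    "GC_DATA": {"type_utt_text", "dtype", "turnid", "tidx", "direction"},
}


def check_annotations(xml_text):
    """Return a list of problems found; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "GC_LOG_ANNOTATIONS":
        problems.append(f"root element is {root.tag}, expected GC_LOG_ANNOTATIONS")
    for elem in root.iter():
        if elem.tag not in ALLOWED_CHILDREN:
            problems.append(f"undeclared element {elem.tag}")
            continue
        for child in elem:
            if child.tag not in ALLOWED_CHILDREN[elem.tag]:
                problems.append(f"{child.tag} not allowed inside {elem.tag}")
        for name in sorted(REQUIRED_ATTRS.get(elem.tag, set()) - set(elem.attrib)):
            problems.append(f"{elem.tag} missing required attribute {name}")
        for name in sorted(set(elem.attrib) - DECLARED_ATTRS[elem.tag]):
            problems.append(f"{elem.tag} has undeclared attribute {name}")
        # EMPTY allows no content at all, not even whitespace.
        if elem.tag == "GC_ANNOT" and elem.text:
            problems.append("GC_ANNOT must be empty")
    return problems
```

The checker collects every problem instead of stopping at the first one, which is convenient when screening a batch of hand-edited annotation files.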