To accomplish this goal, we propose an XML DTD that records the basic events in a Communicator-compliant system. Logged data elements can be annotated with type information indicating that they are "significant" from the point of view of annotators (and annotation tools).
To clarify, consider the following data (the term definitions are by no means final and are open to suggestion):
Data | Obligatory | Standard access |
Duration of session | yes | readable directly off the XML representation proposed below |
Duration of turn (input or output) | yes | readable directly off the XML representation proposed below |
Duration of generation of output (in a phone demo, the time the synthesizer takes to generate the audio file) | yes | see 1 |
Duration of display of output (in a phone demo, how long it takes to play the audio file) | yes | see 2 |
Duration of recognition of input (in a phone demo, how long it takes the recognizer to produce its hypotheses) | yes | see 3 |
Duration of arbitrary operations | no | readable directly off the XML representation proposed below |
Number of turns within a session | yes | readable directly off the XML representation proposed below |
Number of sessions (in our current model each session is its own logfile) | yes | readable directly off the XML representation proposed below |
The audio files corresponding to the user input and system output and their formats. The audio files should be stored and distributed with the logs, and the pathnames of these files should be relative to the log. | yes | accessed given an arbitrary search of the logged data (see the "audio_input" and "audio_output" values for the type attribute of the GC_DATA tag, as well as the "mime_type" attribute) |
The text of the user input chosen by the system | yes | accessed given an arbitrary search of the logged data (see the "text_input" values for the type attribute of the GC_DATA tag) |
The text of the system output | yes | accessed given an arbitrary search of the logged data (see the "text_output" value for the type attribute of the GC_DATA tag) |
All possible input sentences (from the recognizer) up to a certain limit (TBD) (N/A to systems that use a word lattice) | no | accessed given an arbitrary search of the logged data (see the "text_input_hypothesis" value for the type attribute of the GC_DATA tag) |
Indication of whether the parse succeeded | no | see 4 |
The full input interpretation | no | accessed given an arbitrary search of the logged data |
The elements which may pose minor complications have been left blank. Here we make tentative proposals for each of these:
We propose that operations should be logged as single XML elements. For example:
<operation
server="nl"
turnid="-01"
location="localhost:11000"
name="paraphrase_reply"
stime="930254422.720000"
etime="930254422.790000"
>
<data
type="input"
key="tidx"
>
3813
</data>
<data
type="output"
key=":reply_string"
>
Hi! Welcome to MITRE's Travel demonstration. This call is being recorded for system development. You may hang up or ask for help at any time. How can I help you?
</data>
</operation>
Because messages in our distributed architecture are sent asynchronously,
and many events may occur before an operation completes, some caching
(or post-processing) will be necessary to log each operation as a single
element.
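The caching step described above can be sketched as follows. This is a minimal illustration, not part of the proposal: the event tuple layout, the `op_id` correlation key, and the second ("synth") operation are all hypothetical, and the sketch assumes every "end" event is preceded by a matching "start" event.

```python
import xml.etree.ElementTree as ET

# Hypothetical event stream: (kind, op_id, timestamp, attributes).
# The tuple layout and the op_id key are assumptions made for this sketch.
events = [
    ("start", 1, "930254422.720000",
     {"server": "nl", "turnid": "-01", "location": "localhost:11000",
      "name": "paraphrase_reply"}),
    ("start", 2, "930254422.730000",
     {"server": "synth", "turnid": "-01", "location": "localhost:12000",
      "name": "synthesize"}),
    ("end", 1, "930254422.790000", {}),
    ("end", 2, "930254423.100000", {}),
]


def assemble_operations(events):
    """Cache open operations and emit one element per completed operation."""
    pending = {}    # op_id -> element still waiting for its end event
    completed = []  # finished elements, in order of completion
    for kind, op_id, timestamp, attrs in events:
        if kind == "start":
            elem = ET.Element("GC_OPERATION", attrs)
            elem.set("stime", timestamp)
            pending[op_id] = elem
        elif kind == "end":
            elem = pending.pop(op_id)
            elem.set("etime", timestamp)
            completed.append(elem)
    return completed


operations = assemble_operations(events)
for op in operations:
    print(ET.tostring(op, encoding="unicode"))
```

Interleaved operations are kept apart by the `pending` cache, so each one is written out only once both its start and end times are known.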
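Next we define the main entities in the logfile and their formats. A DTD
that defines these terms and their relations is also available. We assume
all time values are measured from a standard base time known as "the epoch"
(January 1, 1970, 00:00:00 GMT) and are given as the number of milliseconds
since that instant. Note that the example values in this document (e.g.
"930254422.720000") are written with a fractional part and appear to be
seconds rather than milliseconds since the epoch.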
Name | Description | Type | Required |
id | A unique identifier for the session. MIT's solution uses the format IP:process id:session counter. Process ids may not be trivial to obtain in every programming language and OS, but "equivalent" data are usually available | string | yes |
stime | time when session started | milliseconds | yes |
etime | time when session finished | milliseconds | yes |
GC_TURN | see GC_TURN | GC_TURN | no |
<GC_SESSION
id="129.10.2.200:1010:3"
stime="930254422.720000"
etime="930254434.790000"
>
...
</GC_SESSION>
Name | Description | Type | Required |
id | A unique identifier within each session | number | yes |
stime | time when turn started | milliseconds | yes |
etime | time when turn ended | milliseconds | yes |
GC_DATA | see GC_DATA | GC_DATA | no |
GC_OPERATION | see GC_OPERATION | GC_OPERATION | no |
GC_FRAME | see GC_FRAME | GC_FRAME | no |
<GC_TURN
id="-01"
stime="930254422.720000"
etime="930254424.790000"
>
...
</GC_TURN>
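As an illustration of how session duration, turn duration, and number of turns can be read directly off this representation, here is a minimal Python sketch. The second turn in the sample log is hypothetical, and durations are reported in whatever unit the logged timestamps use.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Sample log; the second GC_TURN is made up for this sketch.
LOG = """<GC_LOG>
<GC_SESSION id="129.10.2.200:1010:3" stime="930254422.720000" etime="930254434.790000">
<GC_TURN id="-01" stime="930254422.720000" etime="930254424.790000"></GC_TURN>
<GC_TURN id="000" stime="930254424.790000" etime="930254434.790000"></GC_TURN>
</GC_SESSION>
</GC_LOG>"""


def duration(elem):
    """Return etime - stime, using Decimal to avoid float rounding."""
    return Decimal(elem.get("etime")) - Decimal(elem.get("stime"))


root = ET.fromstring(LOG)
sessions = list(root.iter("GC_SESSION"))
for session in sessions:
    turns = session.findall("GC_TURN")
    print("session", session.get("id"),
          "turns:", len(turns), "duration:", duration(session))
    for turn in turns:
        print("  turn", turn.get("id"), "duration:", duration(turn))
```

The same `duration` helper works for GC_OPERATION elements, since they carry the same stime and etime attributes.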
Name | Description | Type | Required |
turnid | the turn id that this operation was executed under | number | yes |
stime | time when operation started | milliseconds | yes |
etime | time when operation ended | milliseconds | yes |
server | the name (according to the program file) of the server that executed the operation | string | yes |
location | the server (real server name or IP address) and its port (server_name:port_number) | string | yes |
name | the name of the operation | string | yes |
GC_DATA | see GC_DATA | GC_DATA | no |
GC_FRAME | see GC_FRAME | GC_FRAME | no |
<GC_OPERATION
server="nl"
turnid="-01"
location="localhost:11000"
name="paraphrase_reply"
stime="930254422.720000"
etime="930254422.790000"
>
<GC_DATA
type="input"
key="tidx"
>
3813
</GC_DATA>
<GC_DATA
type="output"
key=":reply_string"
>
Hi! Welcome to Mitre's Travel demonstration. This call is being recorded for system development. You may hang up or ask for help at any time. How can I help you?
</GC_DATA>
</GC_OPERATION>
Name | Description | Type | Required |
key | the name of this data point | string | yes |
turnid | the turn id that this data point was logged under | number | no |
time | time stamp for this data point | milliseconds | no |
type | valid values of type include audio_input, audio_output, text_input, text_output, text_input_hypothesis, and concept. See the Content section. | string | yes |
mime_type | the mime type of the data | string | no |
<GC_DATA
key=":synth_log_filename"
turnid="-01"
type="audio_output"
mime_type="audio/wav"
>
/home/communicator/Travel-demo/../logs/travel_cfone/19990624/006/travel_cfone-19990624-006-synth--01.wav
</GC_DATA>
<GC_DATA
key=":listening_has_begun"
turnid="000"
time="930254422.790000"
>
</GC_DATA>
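The "arbitrary search of the logged data" mentioned for audio and text items could look like the following minimal Python sketch. The helper name `find_data` and the relative file path in the sample are hypothetical. Because the type attribute may hold several tokens, the sketch splits it on whitespace before matching.

```python
import xml.etree.ElementTree as ET

# Sample turn; the relative audio path is made up for this sketch.
LOG = """<GC_TURN id="-01" stime="930254422.720000" etime="930254424.790000">
<GC_DATA key=":synth_log_filename" turnid="-01" type="audio_output" mime_type="audio/wav">
logs/synth--01.wav
</GC_DATA>
<GC_DATA key=":reply_string" turnid="-01" type="text_output">
How can I help you?
</GC_DATA>
</GC_TURN>"""


def find_data(root, data_type):
    """Return (key, mime_type, text) for every GC_DATA of the given type."""
    matches = []
    for data in root.iter("GC_DATA"):
        types = (data.get("type") or "").split()
        if data_type in types:
            text = (data.text or "").strip()
            matches.append((data.get("key"), data.get("mime_type"), text))
    return matches


root = ET.fromstring(LOG)
print(find_data(root, "audio_output"))
print(find_data(root, "text_output"))
```

Since `iter` walks the whole subtree, the search finds GC_DATA elements regardless of whether they sit directly under a turn, inside an operation, or inside a frame.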
Name | Description | Type | Required |
frame_type | Galaxy frame type | string | no |
name | the name of the frame | string | no |
turnid | the turn id that this frame was logged under | number | no |
GC_DATA | see GC_DATA | GC_DATA | no |
<GC_FRAME
turnid="000"
name="scores"
frame_type="c"
>
<GC_DATA
key=":total_score"
>
-1408.9955
</GC_DATA>
<GC_DATA
key=":acoustic_score"
>
-1367.4408
</GC_DATA>
<GC_DATA
key=":ngram_score"
>
-15.5547
</GC_DATA>
<GC_DATA
key=":nphones"
>
58
</GC_DATA>
<GC_DATA
key=":nwords"
>
13
</GC_DATA>
</GC_FRAME>
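Because a GC_FRAME may contain both GC_DATA elements and nested GC_FRAME elements, a consumer will typically walk it recursively. The following Python sketch shows one possible way to flatten a frame into a dictionary. The helper name `frame_to_dict` is hypothetical, and the sketch assumes nested frames carry a name attribute (the tables above list it as optional).

```python
import xml.etree.ElementTree as ET

# Sample frame, abbreviated from the "scores" example.
FRAME = """<GC_FRAME turnid="000" name="scores" frame_type="c">
<GC_DATA key=":total_score">
-1408.9955
</GC_DATA>
<GC_DATA key=":nwords">
13
</GC_DATA>
</GC_FRAME>"""


def frame_to_dict(frame):
    """Map GC_DATA keys to their text, recursing into nested GC_FRAMEs."""
    result = {}
    for child in frame:
        if child.tag == "GC_DATA":
            result[child.get("key")] = (child.text or "").strip()
        elif child.tag == "GC_FRAME":
            # Assumes the nested frame has a name; unnamed frames would
            # collide under the key None.
            result[child.get("name")] = frame_to_dict(child)
    return result


scores = frame_to_dict(ET.fromstring(FRAME))
print(scores)
```

Values are kept as strings, since the log itself does not record whether a data point is numeric.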
<?xml version="1.0"?>
<!ELEMENT GC_LOG (GC_SESSION)*>
<!ELEMENT GC_SESSION (GC_TURN)*>
<!ATTLIST GC_SESSION id NMTOKEN #REQUIRED>
<!-- time could be defined as CDATA if we chose to use a non
millisecond format -->
<!ATTLIST GC_SESSION stime NMTOKEN #REQUIRED>
<!ATTLIST GC_SESSION etime NMTOKEN #REQUIRED>
<!ELEMENT GC_TURN ( GC_OPERATION | GC_DATA | GC_FRAME )*>
<!ATTLIST GC_TURN id NMTOKEN #REQUIRED>
<!ATTLIST GC_TURN stime NMTOKEN #REQUIRED>
<!ATTLIST GC_TURN etime NMTOKEN #REQUIRED>
<!ELEMENT GC_OPERATION ( GC_DATA | GC_FRAME )*>
<!ATTLIST GC_OPERATION turnid NMTOKEN #REQUIRED>
<!ATTLIST GC_OPERATION server CDATA #REQUIRED>
<!ATTLIST GC_OPERATION location NMTOKEN #REQUIRED>
<!ATTLIST GC_OPERATION name CDATA #REQUIRED>
<!ATTLIST GC_OPERATION stime NMTOKEN #REQUIRED>
<!ATTLIST GC_OPERATION etime NMTOKEN #REQUIRED>
<!ELEMENT GC_DATA ANY>
<!ATTLIST GC_DATA key NMTOKEN #REQUIRED>
<!ATTLIST GC_DATA type NMTOKENS #REQUIRED>
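<!-- mime_type is CDATA because values such as "audio/wav" contain "/",
which is not permitted in an NMTOKEN -->
<!ATTLIST GC_DATA mime_type CDATA #IMPLIED>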
<!ATTLIST GC_DATA time NMTOKEN #IMPLIED>
<!ATTLIST GC_DATA turnid NMTOKEN #IMPLIED>
<!ELEMENT GC_FRAME ( GC_DATA | GC_FRAME )*>
<!ATTLIST GC_FRAME frame_type NMTOKEN #IMPLIED>
<!ATTLIST GC_FRAME name CDATA #IMPLIED>
<!ATTLIST GC_FRAME turnid NMTOKEN #IMPLIED>