DARPA Communicator Testbed

Log Standard Proposal (v5)

Introduction

This document is intended to establish standards for logfile contents and format. We will try to determine the smallest set of data that is sufficient to re-run a system yet still supports meaningful metrics. This set may vary depending on how much of the system is to be re-run and on what we would like to measure. In the process, we will attempt to establish a standard format into which all logfiles can be converted (or in which they can be generated, although we foresee that at least a minimal amount of inferencing might be required to render the logs in this form). A goal of this document is to provide a standard that is flexible and general enough to be used in different domains.

In order to accomplish this goal, we propose an XML DTD for recording the basic events in a Communicator-compliant system. The logged data can be annotated with type information indicating that a data element is "significant" from the point of view of annotators (and annotation tools).

To clarify, we will use the following terms (these definitions are by no means final and are open to suggestion):

Session - The interaction of a user with the system. In our current demonstration, this is equivalent to a phone call. A session is composed of a set of turns.
Turn - The set of operations performed by the system in the course of processing and presenting a single dialogue participant's utterance.
Operation - Every command executed by the system within a turn. Every operation can send and receive data.
Message - Messages are items sent from a server to the hub, and their replies. In contrast to operations, messages are initiated by servers.
Event - An occurrence internal to the hub, such as a hub error, a lock, an alarm expiration, alarm enabling/disabling, or an alarm reset.
Data - A set of key/value pairs.

The definition of "turn" requires special attention. In some accounts, a turn is an exchange between user and system. In a robust dialogue context, this definition is inadequate when the user or system barges in with follow-up information, or when the dialogue involves more than two parties (a situation we should not rule out). We propose that the term "turn" in the context of these log files be reserved for the processing of a single participant's utterance (either user or system). This definition is not without its problems. For instance, it is not clear whether a call to the backend belongs at the end of the processing of a user's utterance (because it presents the utterance to the backend) or at the beginning of the processing of the system's utterance (because it is the source of the system's response). We currently know of nothing in the data analysis that depends on this decision, and recommend that, for now, either interpretation be accepted.

Content

Here we discuss the granularity of data to be logged in an end-to-end system. The contents of this table were derived mainly from the information MITRE needs for its own internal evaluation, and will probably change as the perspectives of other sites are incorporated. Every log should contain enough information to determine the following (here, input refers to the user sending information to the system, and output refers to the system sending information to the user). Ideally, all of this information should be extractable from the log file without any site-specific analysis. For each item, the table states whether it is obligatory and how we propose to standardize access to it:

Data	Obligatory	Standard access
Duration of session	yes	readable directly off the XML representation proposed below
Duration of turn (input or output)	yes	readable directly off the XML representation proposed below
Duration of generation of output (in a phone demo, the time the synthesizer takes to generate the audio file)	yes	see 1
Duration of display of output (in a phone demo, how long it takes to play the audio file)	yes	see 2
Duration of recognition of input (in a phone demo, how long it takes the recognizer to produce its hypotheses)	yes	see 3
Duration of arbitrary operations	no	readable directly off the XML representation proposed below
Number of turns within a session	yes	readable directly off the XML representation proposed below
Number of sessions (in our current model each session is its own logfile)	yes	readable directly off the XML representation proposed below
The audio files corresponding to the user input and system output and their formats. The audio files should be stored and distributed with the logs, and the pathnames of these files should be relative to the log.	yes	accessed given an arbitrary search of the logged data (see the "audio_input" and "audio_output" values for the type attribute of the GC_DATA tag, as well as the "mime_type" attribute)
The text of the user input chosen by the system	yes	accessed given an arbitrary search of the logged data (see the "text_input" value for the type attribute of the GC_DATA tag)
The text of the system output	yes	accessed given an arbitrary search of the logged data (see the "text_output" value for the type attribute of the GC_DATA tag)
All possible input sentences (from the recognizer) up to a certain limit (TBD) (N/A to systems that use a word lattice)	no	accessed given an arbitrary search of the logged data (see the "text_input_hypothesis" value for the type attribute of the GC_DATA tag)
Indication of whether the parse succeeded	no	see 4
The full input interpretation	no	accessed given an arbitrary search of the logged data
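As a minimal sketch of what "readable directly off the XML representation" could mean in practice, the Python below (standard library only) computes the obligatory duration and count metrics from a single GC_SESSION element. It assumes that stime and etime hold seconds since the epoch, as the example time stamps in this document do; the function name session_metrics is hypothetical.

```python
import xml.etree.ElementTree as ET

def session_metrics(session: ET.Element) -> dict:
    """Obligatory duration/count metrics for one GC_SESSION element.

    Assumes stime/etime are seconds since the epoch (with a fractional part).
    """
    turns = session.findall("GC_TURN")  # direct children only
    return {
        "session_duration": float(session.get("etime")) - float(session.get("stime")),
        "turn_count": len(turns),
        # Turn ids are kept exactly as logged (e.g. "-01").
        "turn_durations": {
            t.get("id"): float(t.get("etime")) - float(t.get("stime")) for t in turns
        },
    }

# Usage with the session and turn time stamps from the examples in this document.
log = (
    '<GC_SESSION id="129.10.2.200:1010:3" stime="930254422.720000" etime="930254434.790000">'
    '<GC_TURN id="-01" stime="930254422.720000" etime="930254424.790000"/>'
    "</GC_SESSION>"
)
metrics = session_metrics(ET.fromstring(log))
print(metrics["turn_count"])                  # 1
print(round(metrics["session_duration"], 2))  # 12.07
```

Because each session is currently its own logfile, the number of sessions is simply the number of GC_SESSION elements (or files) processed.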

The entries that may pose minor complications are marked "see 1" through "see 4" in the table above. Here we make tentative proposals for each of them:

1. Duration of output generation. In a system with a single, obvious call to the synthesizer, this is simply the duration of that operation, but this is only one possible configuration. We propose that the "type" attribute be added to the GC_OPERATION element and that a postprocess phase generate a "virtual" operation with a distinguished type (say, "synthesis_duration"); alternatively, we could introduce a new XML element (say, GC_EVENT) reserved for these "virtual" events.
2. Duration of output presentation. In the MIT system, this is inferred from notifications posted by the audio server (playing_has_begun, playing_has_ended; see the Communicator documentation for the MIT audio server). This could be handled like output generation, or we could add optional start and end time attributes to the GC_DATA element that contains the audio file.
3. Duration of recognition. Again, we propose to handle this like output generation.
4. Indication of whether the parse succeeded. Again, this is frequently an inference. We can insert a distinguished GC_DATA element (say, with a type of "input_parse_successful").
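For the duration of output presentation, a postprocess pass along the lines below could infer the playback time and record it as a "virtual" GC_OPERATION. This is a sketch under several assumptions: the audio server notifications are taken to be logged as GC_DATA elements keyed ":playing_has_begun" and ":playing_has_ended" with a time attribute (by analogy with the ":listening_has_begun" example), and the type value "output_presentation", the placeholder server/location values, and the function name are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical key names, assumed by analogy with ":listening_has_begun".
BEGIN_KEY = ":playing_has_begun"
END_KEY = ":playing_has_ended"

def add_presentation_operation(turn: ET.Element) -> Optional[ET.Element]:
    """Append a "virtual" GC_OPERATION spanning the playback notifications of a turn.

    Only the first begin and the first end notification are used; returns None
    when either is missing.
    """
    times = {}
    for data in turn.iter("GC_DATA"):  # searches all descendants of the turn
        key = data.get("key")
        if key in (BEGIN_KEY, END_KEY) and data.get("time"):
            times.setdefault(key, data.get("time"))
    if BEGIN_KEY not in times or END_KEY not in times:
        return None
    return ET.SubElement(turn, "GC_OPERATION", {
        "type": "output_presentation",  # hypothetical distinguished type
        "turnid": turn.get("id"),
        "server": "postprocess",        # placeholder: no real server ran this
        "location": "postprocess:0",    # placeholder so the required attribute is present
        "name": "output_presentation",
        "stime": times[BEGIN_KEY],
        "etime": times[END_KEY],
    })
```

Recognition and synthesis durations could be handled the same way, given whichever notifications a site's servers actually post.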

We believe that this sort of proposal will allow sites to gather data in the form they prefer and to augment it with shareable semantics, so that each site's data retains its site-specific integrity.

Format

We believe that XML is a good candidate language for this format for many reasons, among them a growing supply of viewers and editors, as well as parsers in many programming languages.

We propose that operations should be logged as single XML elements. For example:

<GC_OPERATION
    server="nl"
    turnid="-01"
    location="localhost:11000"
    name="paraphrase_reply"
    stime="930254422.720000"
    etime="930254422.790000"
>
    <GC_DATA
        type="input"
        key="tidx"
    >
    3813
    </GC_DATA>
    <GC_DATA
        type="output"
        key=":reply_string"
    >
    Hi! Welcome to MITRE's Travel demonstration. This call is being recorded for system development. You may hang up or ask for help at any time. How can I help you?
    </GC_DATA>
</GC_OPERATION>

Because messages in our distributed architecture are sent asynchronously, and many events may occur before an operation completes, some caching (or post-processing) will be necessary to log each operation as a single element.
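One way to do this caching is to hold the start of each operation until its reply arrives and only then emit a complete GC_OPERATION element. The sketch below assumes the logger can match each dispatched operation with its reply through some token; the token, the OperationCache class, and its methods are all hypothetical.

```python
import xml.etree.ElementTree as ET

class OperationCache:
    """Hold each operation's start until its reply arrives, then emit one element.

    Hypothetical interface: `token` stands for whatever identifier the logger
    already uses to match a dispatched operation with its reply.
    """

    def __init__(self) -> None:
        self._pending = {}  # token -> cached attributes and input data

    def start(self, token, *, server, location, name, turnid, stime, inputs=None):
        self._pending[token] = {
            "attrs": {"server": server, "location": location, "name": name,
                      "turnid": str(turnid), "stime": str(stime)},
            "inputs": dict(inputs or {}),
        }

    def finish(self, token, *, etime, outputs=None) -> ET.Element:
        entry = self._pending.pop(token)  # KeyError if the start was never logged
        op = ET.Element("GC_OPERATION", entry["attrs"], etime=str(etime))
        for data_type, pairs in (("input", entry["inputs"]), ("output", outputs or {})):
            for key, value in pairs.items():
                data = ET.SubElement(op, "GC_DATA", type=data_type, key=key)
                data.text = str(value)
        return op
```

Operations whose replies never arrive stay in the cache; a real logger would need a policy for them (for example, flushing them at session end), which this sketch does not implement.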

Next we define the main entities in the logfile and their formats. A DTD defining these entities and their relations is given at the end of this document. All time values use a standard base time known as "the epoch": times are expressed as the number of seconds (with a fractional part) since January 1, 1970, 00:00:00 GMT.
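The example time stamps in this document (for instance 930254422.720000) read naturally as seconds with a fractional part: interpreted that way, 930254422.72 falls on 24 June 1999 (UTC), the date embedded in the audio file name of the matching GC_DATA example. A minimal conversion in Python, with the hypothetical helper name epoch_to_utc:

```python
from datetime import datetime, timezone

def epoch_to_utc(stamp: str) -> datetime:
    """Convert a logged time stamp (seconds since the epoch) to an aware UTC datetime."""
    return datetime.fromtimestamp(float(stamp), tz=timezone.utc)

print(epoch_to_utc("930254422.720000").isoformat())
# 1999-06-24T20:00:22.720000+00:00
```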

GC_SESSION

A session represents an interaction of a user with the system; in our current demo, this is equivalent to a phone call. The elements in this table refer to the XML DTD.

Name	Description	Type	Required
id	A unique identifier for the session. MIT's solution uses the format IP:process id:session counter. Process IDs may not be easy to obtain in every programming language and operating system; however, equivalent data is usually available.	string	yes
stime	time when the session started	seconds	yes
etime	time when the session finished	seconds	yes
GC_TURN	see GC_TURN	GC_TURN	no

Example:

<GC_SESSION
    id="129.10.2.200:1010:3"
    stime="930254422.720000"
    etime="930254434.790000"
>
    ...
</GC_SESSION>
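A sketch of MIT's IP:process id:session counter scheme in Python follows. The counter is per process and starts at 1; the fallback that looks up the host's IP address through the socket module is an assumption (it can fail or return a loopback address on some machines), so callers may prefer to pass the address explicitly. The function name new_session_id is hypothetical.

```python
import itertools
import os
import socket
from typing import Optional

_session_counter = itertools.count(1)  # per-process session counter, starting at 1

def new_session_id(ip: Optional[str] = None) -> str:
    """Return an "IP:process id:session counter" identifier (MIT's format)."""
    if ip is None:
        # Assumption: the host name resolves to a useful address; this may raise
        # socket.gaierror or yield a loopback address on some machines.
        ip = socket.gethostbyname(socket.gethostname())
    return f"{ip}:{os.getpid()}:{next(_session_counter)}"
```

An IPv6 address would itself contain colons, so a consumer should not assume that splitting such an id on ":" yields exactly three fields.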

GC_TURN

A turn consists of the processing and presentation of a single dialogue participant's utterance, as discussed in the introduction. The elements in this table refer to the XML DTD.

Name	Description	Type	Required
id	A unique identifier within each session	number	yes
stime	time when the turn started	seconds	yes
etime	time when the turn ended	seconds	yes
GC_DATA	see GC_DATA	GC_DATA	no
GC_OPERATION	see GC_OPERATION	GC_OPERATION	no
GC_MESSAGE	see GC_MESSAGE	GC_MESSAGE	no
GC_EVENT	see GC_EVENT	GC_EVENT	no
GC_FRAME	see GC_FRAME	GC_FRAME	no

Example:

<GC_TURN
    id="-01"
    stime="930254422.720000"
    etime="930254424.790000"
>
    ...
</GC_TURN>
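Because turn ids must be unique within a session, a converter could run a consistency check like the one below. It is a sketch under two assumptions: turn ids are compared as integers, so that "000" and "0" (both of which appear in the examples) denote the same turn, and every turn should lie within its session's stime/etime span. The function name check_turns is hypothetical.

```python
import xml.etree.ElementTree as ET

def check_turns(session: ET.Element) -> list:
    """List consistency problems among the GC_TURN children of one GC_SESSION."""
    problems = []
    seen = set()
    s_start, s_end = float(session.get("stime")), float(session.get("etime"))
    for turn in session.findall("GC_TURN"):
        raw_id = turn.get("id")
        tid = int(raw_id)  # assumption: ids are integers, so "000" == "0"
        if tid in seen:
            problems.append(f"duplicate turn id {raw_id!r}")
        seen.add(tid)
        start, end = float(turn.get("stime")), float(turn.get("etime"))
        if end < start:
            problems.append(f"turn {raw_id!r} ends before it starts")
        if start < s_start or end > s_end:
            problems.append(f"turn {raw_id!r} lies outside its session")
    return problems
```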

GC_OPERATION

An operation is a command executed by the system within a turn. Operations can send and receive data, frames, or audio files. The elements in this table refer to the XML DTD.

Name	Description	Type	Required
type	the type of operation being executed (specific values TBD)	string	no
turnid	the turn id that this operation was executed under	number	yes
stime	time when the operation started	seconds	yes
etime	time when the operation ended	seconds	yes
server	the name (according to the program file) of the server that executed the operation	string	yes
location	the server (real server name or IP address) and its port (server_name:port_number)	string	yes
name	the name of the operation	string	yes
reply_type	valid values of reply_type include normal, destroy, and error	string	no
GC_DATA	see GC_DATA	GC_DATA	no

Example:

<GC_OPERATION
    server="nl"
    turnid="-01"
    location="localhost:11000"
    name="paraphrase_reply"
    stime="930254422.720000"
    etime="930254422.790000"
>
    <GC_DATA
        type="input"
        key="tidx"
    >
    3813
    </GC_DATA>
    <GC_DATA
        type="output"
        key=":reply_string"
    >
    Hi! Welcome to Mitre's Travel demonstration. This call is being recorded for system development. You may hang up or ask for help at any time. How can I help you?
    </GC_DATA>
</GC_OPERATION>
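Since the duration of arbitrary operations is meant to be read directly off this representation, a small helper can report an operation's duration together with its GC_DATA values, grouped by the raw value of the type attribute. This is a sketch; summarize_operation is a hypothetical name, and collecting untyped GC_DATA elements under an "untyped" entry is an assumption.

```python
import xml.etree.ElementTree as ET

def summarize_operation(op: ET.Element) -> dict:
    """Name, server, duration, and GC_DATA values (grouped by type) of one GC_OPERATION."""
    data_by_type = {"input": {}, "output": {}}
    for data in op.findall("GC_DATA"):
        group = data_by_type.setdefault(data.get("type") or "untyped", {})
        group[data.get("key")] = (data.text or "").strip()  # drop layout whitespace
    return {
        "name": op.get("name"),
        "server": op.get("server"),
        "duration": float(op.get("etime")) - float(op.get("stime")),  # seconds
        "data": data_by_type,
    }
```

Applied to the example above, this would report a duration of roughly 0.07 seconds for paraphrase_reply.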

GC_MESSAGE

Messages are items sent from a server to the hub, and their replies. In contrast to operations, messages are initiated by servers. The elements in this table refer to the XML DTD.

Name	Description	Type	Required
type	the type of message being issued (specific values TBD)	string	no
turnid	the turn id under which this message was issued	number	yes
time	time when the message was issued	seconds	yes
server	the name of the server that issued the message	string	yes
location	the server (real server name or IP address) and its port (server_name:port_number)	string	yes
name	the name of the message	string	yes
reply_type	valid values of reply_type include normal, destroy, and error	string	no
GC_DATA	see GC_DATA	GC_DATA	no

Example:

<GC_MESSAGE
    server="audio"
    turnid="0"
    location="localhost:15000"
    name="filelog"
    time="941241950.190000"
>
    <GC_DATA
        key="tidx"
    >
    23
    </GC_DATA>
    <GC_DATA
        key=":utt_log_filename"
    >
    /home/communicator/test/Travel-demo/../logs/travel_cfone/19991029/001/travel_cfone-19991029-001-000.wav
    </GC_DATA>
</GC_MESSAGE>
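The Content section asks for audio pathnames relative to the log, whereas the path logged in this example is absolute. A converter could rewrite such paths as sketched below; the log directory passed in is an assumption about where the converted log will live, and relativize is a hypothetical name. posixpath.relpath normalizes both paths lexically, so "Travel-demo/.." is collapsed without consulting the file system; that is wrong if Travel-demo is a symbolic link, which a real converter may need to check.

```python
import posixpath

def relativize(path: str, log_dir: str) -> str:
    """Rewrite an absolute audio path so that it is relative to the log's directory."""
    if not posixpath.isabs(path):
        return path  # already relative: leave it as logged
    return posixpath.relpath(path, log_dir)

logged = ("/home/communicator/test/Travel-demo/../logs/travel_cfone/19991029/001/"
          "travel_cfone-19991029-001-000.wav")
# Hypothetical directory of the converted log file.
print(relativize(logged, "/home/communicator/test/logs/travel_cfone/19991029/001"))
# travel_cfone-19991029-001-000.wav
```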

GC_EVENT

An event is an occurrence internal to the hub, such as a hub error, a lock, an alarm expiration, alarm enabling/disabling, or an alarm reset. The elements in this table refer to the XML DTD.

Name	Description	Type	Required
type	the type of hub event (SYSTEM_ERROR, LOCK, etc.)	string	yes
turnid	the turn id under which this event occurred	number	yes
time	time when the event occurred	seconds	yes
name	the name of the event	string	yes
GC_DATA	see GC_DATA	GC_DATA	no

Example:

<GC_EVENT
    type="LOCK"
    turnid="0"
    name=":hub_get_session_lock"
    time="941473398.130000"
/>
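Since the DTD declares the GC_EVENT type attribute as NMTOKENS (a whitespace-separated list of tokens), a summary of hub events by type should split that attribute rather than compare it as a whole. A minimal sketch, with the hypothetical helper name event_counts:

```python
from collections import Counter
import xml.etree.ElementTree as ET

def event_counts(root: ET.Element) -> Counter:
    """Count GC_EVENT elements under `root` by each token of their type attribute."""
    counts = Counter()
    for event in root.iter("GC_EVENT"):
        counts.update(event.get("type", "").split())  # NMTOKENS: whitespace-separated
    return counts
```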

GC_DATA

A key/value pair. This element records the information involved in an operation, message, or event, as well as the contents of a GC_FRAME. The elements in this table refer to the XML DTD.

Name	Description	Type	Required
key	the name of this data point	string	yes
turnid	the turn id under which this data point was recorded	number	no
time	time stamp for this data point	seconds	no
type	valid values of type include audio_input, audio_output, text_input, text_output, text_input_hypothesis, and concept. See the Content section.	string	no
mime_type	the mime type of the data	string	no
GC_FRAME	see GC_FRAME	GC_FRAME	no

Examples:

<GC_DATA
    key=":synth_log_filename"
    turnid="-01"
    type="audio_output"
    mime_type="audio/wav"
>
/home/communicator/Travel-demo/../logs/travel_cfone/19990624/006/travel_cfone-19990624-006-synth--01.wav
</GC_DATA>
<GC_DATA
    key=":listening_has_begun"
    turnid="000"
    time="930254422.790000"
>
</GC_DATA>
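Several Content items are to be "accessed given an arbitrary search of the logged data" using the type attribute of GC_DATA. A minimal search over a parsed log might look like the following; find_data is a hypothetical name, and because type is declared as NMTOKENS, the sketch matches any one of its whitespace-separated tokens.

```python
import xml.etree.ElementTree as ET

def find_data(root: ET.Element, data_type: str) -> list:
    """Return every GC_DATA element under `root` whose type includes `data_type`."""
    return [d for d in root.iter("GC_DATA") if data_type in d.get("type", "").split()]

doc = ET.fromstring(
    '<GC_TURN id="-01" stime="930254422.720000" etime="930254424.790000">'
    '<GC_DATA key=":synth_log_filename" turnid="-01" type="audio_output"'
    ' mime_type="audio/wav">travel_cfone-19990624-006-synth--01.wav</GC_DATA>'
    '<GC_DATA key=":listening_has_begun" time="930254422.790000"/>'
    "</GC_TURN>"
)
for d in find_data(doc, "audio_output"):
    print(d.get("mime_type"), d.text)  # audio/wav travel_cfone-19990624-006-synth--01.wav
```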

GC_FRAME

This structure records Galaxy frames. The elements in this table refer to the XML DTD.

Name	Description	Type	Required
frame_type	Galaxy frame type	string	no
name	the name of the frame	string	no
turnid	the turn id under which this frame was recorded	number	no
GC_DATA	see GC_DATA	GC_DATA	no

Example:

<GC_FRAME
    turnid="000"
    name="scores"
    frame_type="c"
>
    <GC_DATA
        key=":total_score"
    >
    -1408.9955
    </GC_DATA>
    <GC_DATA
        key=":acoustic_score"
    >
    -1367.4408
    </GC_DATA>
    <GC_DATA
        key=":ngram_score"
    >
    -15.5547
    </GC_DATA>
    <GC_DATA
        key=":nphones"
    >
    58
    </GC_DATA>
    <GC_DATA
        key=":nwords"
    >
    13
    </GC_DATA>
</GC_FRAME>
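For analysis, a frame's GC_DATA children can be flattened into a dictionary, as in the sketch below. Parsing numeric strings into int or float is an assumption about how consumers want scores such as :total_score, and nested frames inside a GC_DATA element are not handled, since only the element text is read. The helper name frame_to_dict is hypothetical.

```python
import xml.etree.ElementTree as ET

def frame_to_dict(frame: ET.Element) -> dict:
    """Map each GC_DATA key in a GC_FRAME to its value, parsing numbers where possible."""
    result = {}
    for data in frame.findall("GC_DATA"):
        text = (data.text or "").strip()
        try:
            value = int(text)
        except ValueError:
            try:
                value = float(text)
            except ValueError:
                value = text  # leave non-numeric values as strings
        result[data.get("key")] = value
    return result
```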

Code support

MITRE volunteers to work with sites to produce the appropriate tools for converting MIT logfiles to the proposed standard. If more appropriate, we will produce a new logging module for the Hub to simplify this process; however, we do not expect this to be necessary.

Document Type Definition (DTD)

Below we provide an XML DTD to define the above types.

<?xml version="1.0" encoding="UTF-8"?>

<!ELEMENT GC_LOG (GC_SESSION*)>
<!ATTLIST GC_LOG logfile_version NMTOKEN #IMPLIED>

<!ELEMENT GC_SESSION (GC_TURN*)>
<!ATTLIST GC_SESSION id NMTOKEN #REQUIRED>
<!ATTLIST GC_SESSION stime NMTOKEN #REQUIRED>
<!ATTLIST GC_SESSION etime NMTOKEN #REQUIRED>

<!ELEMENT GC_TURN ( GC_OPERATION | GC_MESSAGE | GC_EVENT | GC_DATA | GC_FRAME )*>
<!ATTLIST GC_TURN id NMTOKEN #REQUIRED>
<!ATTLIST GC_TURN stime NMTOKEN #REQUIRED>
<!ATTLIST GC_TURN etime NMTOKEN #REQUIRED>

<!ELEMENT GC_OPERATION (GC_DATA*)>
<!ATTLIST GC_OPERATION type NMTOKENS #IMPLIED>
<!ATTLIST GC_OPERATION turnid NMTOKEN #REQUIRED>
<!ATTLIST GC_OPERATION server CDATA #REQUIRED>
<!ATTLIST GC_OPERATION location NMTOKEN #REQUIRED>
<!ATTLIST GC_OPERATION name CDATA #REQUIRED>
<!ATTLIST GC_OPERATION reply_type CDATA #IMPLIED>
<!ATTLIST GC_OPERATION stime NMTOKEN #REQUIRED>
<!ATTLIST GC_OPERATION etime NMTOKEN #REQUIRED>

<!ELEMENT GC_MESSAGE (GC_DATA*)>
<!ATTLIST GC_MESSAGE type NMTOKENS #IMPLIED>
<!ATTLIST GC_MESSAGE turnid NMTOKEN #REQUIRED>
<!ATTLIST GC_MESSAGE server CDATA #REQUIRED>
<!ATTLIST GC_MESSAGE location NMTOKEN #REQUIRED>
<!ATTLIST GC_MESSAGE name CDATA #REQUIRED>
<!ATTLIST GC_MESSAGE reply_type CDATA #IMPLIED>
<!ATTLIST GC_MESSAGE time NMTOKEN #REQUIRED>

<!ELEMENT GC_EVENT (GC_DATA*)>
<!ATTLIST GC_EVENT type NMTOKENS #REQUIRED>
<!ATTLIST GC_EVENT turnid NMTOKEN #REQUIRED>
<!ATTLIST GC_EVENT name CDATA #REQUIRED>
<!ATTLIST GC_EVENT time NMTOKEN #REQUIRED>

<!ELEMENT GC_DATA ANY>
<!ATTLIST GC_DATA key NMTOKEN #REQUIRED>
<!ATTLIST GC_DATA type NMTOKENS #IMPLIED>
<!ATTLIST GC_DATA mime_type CDATA #IMPLIED>
<!ATTLIST GC_DATA time NMTOKEN #IMPLIED>
<!ATTLIST GC_DATA turnid NMTOKEN #IMPLIED>

<!ELEMENT GC_FRAME (GC_DATA*)>
<!ATTLIST GC_FRAME frame_type NMTOKEN #IMPLIED>
<!ATTLIST GC_FRAME name CDATA #IMPLIED>
<!ATTLIST GC_FRAME turnid NMTOKEN #IMPLIED>
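Python's standard XML parsers are non-validating, so a converter that wants a lightweight check without a validating parser could compare each element's attributes against the #REQUIRED declarations in the DTD. In the sketch below, the REQUIRED mapping is transcribed by hand from the DTD and would need to track any changes to it; missing_required is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# #REQUIRED attributes, transcribed by hand from the DTD.
REQUIRED = {
    "GC_SESSION": {"id", "stime", "etime"},
    "GC_TURN": {"id", "stime", "etime"},
    "GC_OPERATION": {"turnid", "server", "location", "name", "stime", "etime"},
    "GC_MESSAGE": {"turnid", "server", "location", "name", "time"},
    "GC_EVENT": {"type", "turnid", "name", "time"},
    "GC_DATA": {"key"},
}

def missing_required(root: ET.Element) -> list:
    """Return (tag, sorted missing attribute names) for each deficient element."""
    problems = []
    for elem in root.iter():  # includes root itself, in document order
        missing = REQUIRED.get(elem.tag, set()) - set(elem.attrib)
        if missing:
            problems.append((elem.tag, sorted(missing)))
    return problems
```

This checks attributes only; content models and NMTOKEN syntax still require a validating parser.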