Galaxy Communicator Tutorial:
Creating an End-to-End System
We're almost there. We can build servers,
write Hub program files, handle errors, set up broker backchannels, understand
the design of UI elements, log our activity, and keep track of multiple
users over multiple interactions. In this last lesson, we'll address a
number of remaining details.
Selecting your servers
The toy travel demo illustrated a representative
set of servers you might need in your end-to-end system:
- An audio server which supports some sort of telephony hardware, such as
  ComputerFone or Dialogic;
- A recognition server which maps audio samples into text strings;
- A parsing server which maps text strings into semantic frames;
- A dialogue manager server which incorporates each incoming semantic
  frame into the current context and decides how to act on the resulting
  context;
- An information server, such as a relational database;
- A language generation server which maps outgoing frames into text
  strings;
- A speech synthesis server which maps text strings into audio samples.
There is a wide range of variations on this
configuration. For instance:
- In some cases, you may choose to "federate" audio, recognition and
  synthesis in the same server. You might do this, for instance, because
  your recognizer can only read input from the audio device (this is the
  case with JSAPI-compliant recognizers in Java 1.2, due to limitations
  of Javasound in that version of Java).
- You may have a separate server for mapping dialogue representations
  into representations the information server can understand (an SQL
  generation server, for instance).
- You may have a different information server, such as a script which
  harvests information from Web pages, or you may have multiple
  information servers for caching, for redundancy, or to cover multiple
  domains.
- Your recognizer may return more structured output, such as n-best lists
  or word lattices, or provide confidence measures along with its
  recognition results.
- You may choose to separate the dialogue server's function of
  incorporation into context from its function of choosing what to do
  with the resulting context.
- You may choose to provide a user profile server, either for security or
  user modeling or both, which might register a user by voice or by
  touchtone input.
- You may choose to have multiple UI elements, such as a Web browser and
  a telephone connection working in tandem.
The Galaxy Communicator architecture places
no restrictions on any of these configurations (although in its current
form, it may accidentally or intentionally make some configurations easier
than others).
We plan in FY02 to develop an open source
toolkit of all the available Communicator-compliant wrappers and servers
developed by the program participants which are free and available to use.
Contacting the Hub
One thing we haven't talked about in detail
is how to set up servers which contact the Hub. It's actually trivial.
All Communicator-compliant servers accept the -contact_hub command line
argument by default; its value is the host:port where a Hub has set up
the appropriate listener. You can find detailed documentation about this
in the Hub listener documentation. The short version is that there are
two steps. First, use the CLIENT_PORT: program file directive to set up
the Hub listener, as we saw in our program file tutorial. Second, start
up the server using -contact_hub. As an example, here's the command line
to start up the Audio server from the toy travel demo:
% $DEMO_ROOT/bin/Audio -audio_data $GC_HOME/tutorial/toy-travel/short-example.frames \
    -contact_hub localhost:2800 -verbosity 0
The Hub program file has the appropriate listener
set up:
SERVICE_TYPE: Audio
CLIENT_PORT: 2800
OPERATIONS: Play
Under normal operating circumstances, you
should be able to connect as many servers as you want to the Hub listener;
there is no limit imposed. If you want the listener locked to a particular
session, you can use the -session_id
command line argument we talked about in the session
lesson.
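The direction of connection described above can be modelled in a few lines. The following is a minimal Python sketch, not the Galaxy Communicator API: a plain TCP listener stands in for the Hub listener set up by CLIENT_PORT:, and each client stands in for a server started with -contact_hub. The function names open_listener, contact_hub and connect_servers are hypothetical, and the listener uses an OS-chosen port on the loopback interface rather than port 2800.

```python
import socket


def open_listener(host="127.0.0.1", port=0):
    """Open a listening socket, standing in for a Hub listener port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, port))   # port 0 lets the OS pick a free port
    listener.listen(8)
    listener.settimeout(5)        # applies to accept(); avoids hanging
    return listener


def contact_hub(host_port):
    """Connect to a 'host:port' string, as a server started with
    -contact_hub would (illustrative only)."""
    host, port = host_port.rsplit(":", 1)
    return socket.create_connection((host, int(port)), timeout=5)


def connect_servers(n_servers):
    """Connect n_servers clients to one listener and return how many
    connections the listener accepted."""
    listener = open_listener()
    port = listener.getsockname()[1]
    clients = [contact_hub("127.0.0.1:%d" % port) for _ in range(n_servers)]
    accepted = [listener.accept()[0] for _ in clients]
    for sock in clients + accepted:
        sock.close()
    listener.close()
    return len(accepted)


if __name__ == "__main__":
    print(connect_servers(3))
```

The point of the sketch is only that the listener owns the port and any number of clients may connect to it; the real Hub additionally matches each incoming server against the declared service type.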
Guidelines for writing servers and Hub programs
Always use the environment to write messages
to the Hub
There are publicly available functions in
the Galaxy Communicator API which allow you to write messages to the Hub
using the connection directly, instead of the environment. But if you use
call environments, you're guaranteed to pass along the correct
session information. We recommend always using the call environment.
Make sure to declare your dispatch functions
and Hub operations
If you write a dispatch function in a server,
you have to declare
it using GAL_SERVER_OP if you want the server to know about it. Similarly,
if you want the Hub to know that the server has a dispatch function, you
have to declare it in the appropriate OPERATIONS:
directive entry.
Make sure your messages are distinct
The Hub programming language is very forgiving,
in the sense that it will ignore or pay attention to messages which invoke
programs depending on the keys in the message. So, for example, you could
use main for the name of every
message you send to the Hub via GalSS_EnvWriteFrame()
or GalSS_EnvDispatchFrame(),
and use the keys in the message to decide what to do with them. However,
this strategy makes the program file extremely hard to read. We have learned,
from studying MIT's program files, that it's much more straightforward
to distinguish your messages at least by the server that sent them. Our
toy travel example demonstrates this strategy.
Use the listener-in-Hub capability at least
for your UI elements
The easiest way to manage UI connections to
the Hub (that is, GUIs, audio servers, and the like) is to have them contact
the Hub instead of the other way around. If you set things up using the
listener-in-Hub
capability, you'll be able to have UI elements contact the Hub on an ad-hoc
basis, and there are simple hooks for keeping the sessions separate as
well. You can use listener-in-Hub for all your server connections, if you
choose, but it's most desirable for UI elements.
The Builtin server
The Builtin server is a special server which is implemented as part of
the Hub itself.
This server is always available, which means that it doesn't need to be
declared in the Hub program file. It has special access to the internals
of the Hub operations, so that it can be used to provide information about
the Hub state (such as the servers which are available, or the state of
various namespaces). The Builtin server is a grab bag of functionality,
and we're not going to talk about most of it. We'll concentrate here on
a small number of potentially interesting and relevant functions.
nop
The dispatch function nop is a no-op. It simply returns its input frame.
You might want to use nop when all you need to do is update a key-value
pair in a namespace when a certain condition is met:
RULE: :output_parse == "PARSE_FAILED" --> Builtin.nop
OUT: (:parse_failed 1)
The Hub scripting language probably ought
to provide a case where you can omit the operation, but for now, using
nop is the appropriate strategy.
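To make the role of nop concrete, here is a minimal Python model of the rule above. It is a sketch under simplifying assumptions, not Hub code: the token is represented as a plain dictionary, the rule condition as a Python function, and fire_rule is a hypothetical helper that applies literal OUT: pairs after the operation returns.

```python
def nop(frame):
    """Model of Builtin.nop: return the input frame unchanged."""
    return frame


def fire_rule(token, condition, operation, out_literals):
    """Fire one rule against a token (a dict of key-value pairs).

    If condition(token) is true, call the operation and then write the
    literal OUT: pairs into the token. Returns True if the rule fired.
    """
    if not condition(token):
        return False
    operation(dict(token))        # nop's reply carries nothing new
    token.update(out_literals)    # models OUT: (:parse_failed 1)
    return True


token = {":output_parse": "PARSE_FAILED"}
fired = fire_rule(
    token,
    condition=lambda t: t.get(":output_parse") == "PARSE_FAILED",
    operation=nop,
    out_literals={":parse_failed": 1},
)
print(fired, token)
```

The operation contributes nothing here; the rule exists only so that its condition and its OUT: entry can run, which is why a no-op is the natural thing to call.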
call_program
This dispatch function can be used to construct
a new message to the Hub. The name of the program should be passed in as
the value of the :program
key. All the other key-value pairs passed to call_program
will appear in the new message. The new message will be processed like
any other new message; it will either invoke a program with the appropriate
name, be relayed to a server which supports an operation with the appropriate
name, or be discarded. This dispatch function will also return the result
of the executed program, if requested.
For example, the toy travel demo unifies
the processing of its typed input and the "output" of the Recognizer server
by routing both to the same program (named UserInput),
as follows:
PROGRAM: FromRecognizer
RULE: :input_string --> Builtin.call_program
IN: (:program "UserInput") :input_string
OUT: none!
In this example, no result is requested, so
none will be provided.
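The behavior of call_program described above can be modelled as follows. This is a hedged Python sketch rather than the Hub's implementation: messages are plain dictionaries, route is a hypothetical helper, and checking programs before server operations is an assumption taken from the order in which the outcomes are listed above. The value "hello" and the operation name set are illustrative.

```python
def call_program(frame):
    """Model of Builtin.call_program: build a new message.

    The message name comes from :program; every other key-value pair
    is copied into the new message.
    """
    name = frame[":program"]
    body = {k: v for k, v in frame.items() if k != ":program"}
    return name, body


def route(name, programs, operations):
    """Decide what happens to a new message with the given name.

    Assumption of this sketch: programs are checked before operations.
    """
    if name in programs:
        return "program"
    if name in operations:
        return "operation"
    return "discarded"


name, body = call_program({":program": "UserInput", ":input_string": "hello"})
outcome = route(name, programs={"UserInput", "FromRecognizer"},
                operations={"Play"})
print(name, body, outcome)
```

In the toy travel rule, the IN: entry supplies exactly these two pieces: the literal :program value and the :input_string key copied from the token.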
new_session
This dispatch function creates or resets a
session. For creation, this dispatch function is really not needed, because
sessions are created automatically when they're mentioned, but this dispatch
function is crucial if you try to reuse a session ID. For instance, you
might be running a system which will never have multiple simultaneous users,
and you might be using the default session as your session every time.
This is not recommended. However, if you must do this, new_session will
reset the current session state and start a new log file in the appropriate
circumstances (i.e., if you're calling new_session on this session for
the second time, or later).
end_session
This dispatch function ends the current session.
We discussed in the session
lesson how important this is.
destroy
This dispatch function destroys the current
token. We alluded to this dispatch function when we talked about the special
destroy! value of OUT: in the program file tutorial. This method of
destroying the token is somewhat less efficient than the OUT: value, but
it's an available alternative.
hub_exit
This dispatch function causes the Hub to exit,
in case you ever want this sort of thing under program control.
Debugging strategies
There are a number of things you can do to
make it easier for you to understand what's going on (and what's going
wrong) with your end-to-end system. In this section, we describe some of
them briefly.
Builtin.hub_break
This Builtin dispatch function suspends the
execution of the Hub and enters a loop where you can inspect the state
of various namespaces. We used this functionality to halt the Hub in our
initial lesson on the toy
travel demo. If there's a Hub program rule that isn't getting fired
and you think it should be (or vice versa), you may find it useful to insert
a call to Builtin.hub_break
at the appropriate point in your Hub program.
The -debug argument to the Hub
You can also access this same breakpoint behavior
by using the -debug Hub
command line argument. This argument will cause the Hub to break after
it sends each new message. You can shut this behavior off as the Hub is
running if you choose to, by typing a capital C at the breakpoint prompt.
Exploit your verbosity settings
As we described in the toy travel demo lesson, the Hub and servers support
verbosity levels from 0 (nothing) to 6 (everything). These levels can be
controlled by setting the GAL_VERBOSE environment variable in your shell,
or by using the -verbosity command line argument for the Hub and servers.
The default is 3. If you choose a level above 3, you'll get more
information. These levels are not clearly defined yet, but among the most
reliable are:
- At level 4, you'll get a full printout of the message traffic in the
  servers instead of the truncated messages with ellipses that you
  usually see.
- At level 6, you'll get the actual XDR encodings which are written to
  and read from the servers and the Hub. It's not necessarily advisable
  to use this level if you have broker connections, since the broker
  data encoding will be dumped as well.
In the future, we plan to define the verbosity
levels more clearly and make them more useful, but you may find them
helpful as they are.
MODE: pedantic
In normal circumstances, the Hub will report
errors in program files, but a number of those errors will not cause the
Hub to exit (for instance, a reference to an undeclared operation). If
you insert the directive entry "MODE: pedantic" into your program file,
the Hub will always exit when it encounters these errors. This is a good
way to find typos in your server and operation names.
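The difference between the two modes can be illustrated with a small Python model. It is a sketch only, not the Hub's program file parser: check_rules is a hypothetical function, rules are reduced to "Server.operation" strings, and the error text is invented for the illustration.

```python
import sys


def check_rules(rules, declared, pedantic=False):
    """Report rules that reference undeclared operations.

    rules    -- list of "Server.operation" strings used in RULE: lines
    declared -- dict mapping each server name to its declared operations
    pedantic -- if True, exit on the first error (models MODE: pedantic)
    """
    errors = []
    for rule in rules:
        server, _, operation = rule.partition(".")
        if operation not in declared.get(server, set()):
            message = "undeclared operation: " + rule
            print(message, file=sys.stderr)
            if pedantic:
                sys.exit(1)
            errors.append(message)
    return errors


declared = {"Audio": {"Play"}}
# "Audio.Paly" is a deliberate typo for "Audio.Play".
print(check_rules(["Audio.Play", "Audio.Paly"], declared))
```

In the default mode the typo is reported and processing continues, so it is easy to miss in a long startup log; with pedantic=True the same call terminates the process at the first error.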
Fake desktop audio
If you're building a telephony application,
we strongly recommend constructing one version of your audio server to
handle desktop audio and another to handle text I/O. Both of these should
present functionality as similar as possible to that of the telephony server.
MITRE is currently leading an effort to develop an open-source audio solution,
but it is not yet available. We're also hoping to cooperate with other
providers of open-source audio servers to settle on a common message set.
Avoiding trouble
Consider using the -assert command line argument
If a server is starting up a listener for
Hub connections (as opposed to using the listener-in-Hub
capability), it will use the first port it finds available. If the requested
port isn't available, it will try the requested port + 1, and so on until
it finds a port it can start up on. This behavior can sometimes lead to
unexpected consequences. For instance, if you start up a server on port
6000 and there's already a server running on that port, the new server
will happily start up on port 6001. If you don't notice this, your Hub
may try to contact the wrong server, or you might not be able to make a
connection at all. One way to avoid this is to use the -assert server
command line argument, which will force the server to exit if the requested
port isn't available.
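The port search and the effect of -assert can be modelled with ordinary sockets. The following Python sketch is an illustration of the behavior described above, not the server library's code: start_listener is a hypothetical function, the max_tries bound is an assumption of the sketch, and the sockets are bound to the loopback interface only.

```python
import socket


def start_listener(requested_port, assert_port=False, max_tries=100,
                   host="127.0.0.1"):
    """Bind a listening socket, modelling the server's port search.

    Without assert_port, try requested_port, requested_port + 1, and so
    on. With assert_port (modelling -assert), fail as soon as the
    requested port turns out to be taken.
    """
    for port in range(requested_port, requested_port + max_tries):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen(1)
            return sock, port
        except OSError:
            sock.close()
            if assert_port:
                raise
    raise OSError("no free port found")


if __name__ == "__main__":
    first, port = start_listener(6000)         # may land above 6000
    second, fallback = start_listener(port)    # port is taken: falls back
    print(port, fallback)
    try:
        start_listener(port, assert_port=True)  # models -assert
    except OSError:
        print("requested port unavailable; exiting")
    first.close()
    second.close()
```

The second call succeeds silently on a different port, which is exactly the situation in which a Hub configured for the original port would contact the wrong server; the third call fails loudly instead.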
Don't forget about the initial token
You may remember that we talked about the initial token when we talked
about how to send new messages to the Hub. This initial token can be
specified using the INITIAL_TOKEN: directive entry in the Hub program
file. While we don't really recommend
using the initial token for anything besides global initializations and
simple tests and examples, you should be aware that you don't have complete
control over the appearance and content of the initial token. In particular,
for historical reasons, even if you don't specify INITIAL_TOKEN:, an initial
token will be created if there's a program named main.
If your main program is being
invoked unexpectedly on a token with token index 0, you should check to
see if there is some accidental overlap between the keys you expect and
the keys in the default initial token.
Turn off the Hub pacifier
In all of our exercises, we've used the -suppress_pacifier
command line argument to the Hub. If this flag is not provided, the Hub
will print out a period (".") each second when it is idle. If you're trying
to scan back through the Hub output, many displays (X terminals and the
process monitor, for instance) will cause problems because they scroll
to the end whenever there's output. In the process monitor, you can press
the "Pause" button; but in general, if you're having problems with this,
use -suppress_pacifier.
Don't forget to activate your broker client
When we talked about broker clients, we emphasized the importance of
GalIO_SetBrokerActive. It's important to reemphasize that if you don't do
this, the broker client will never try to read data from the broker
server. This feature is a historical idiosyncrasy which we hope to remove
in the future, but for now, this can cause problems.
Builtin.increment_utterance and logging
You may recall when we talked about the content
of the Hub log, we told you to ignore the features :BEGIN_UTT
and the reference to -01 throughout
the log. These elements have to do with the way the Hub was originally
designed to handle utterance boundaries. The Hub has an utterance counter,
which can be incremented using the Builtin dispatch function increment_utterance.
The original designers assumed that a Communicator-compliant system would
begin with a presentation to the user (utterance -1), followed by a call
to increment_utterance, and
then a call to increment_utterance
each time the Hub program writer determined an utterance boundary was reached.
The utterance counter has had three uses
in the history of the Hub:
- It is used to index entries in a Hub-internal database which is local
  to each session. You can use the STORE: directive entry to store
  key-value pairs in a frame associated with the current utterance
  counter in the current session, and use the RETRIEVE: directive entry
  to retrieve key-value pairs from a frame associated with (by default)
  the previous utterance counter in the current session. If you have a
  strong commitment to stateless servers, you may find this functionality
  useful as a storage device. However, it clearly depends on the
  utterance counter being incremented; otherwise, you'd just be able to
  use the session namespace.
- The original MIT audio server used this counter to help name the audio
  files logged for the given session. We believe there are better ways to
  do this; in particular, we recommend that the audio server determine
  its own file names, and report them to the Hub so that they can be
  logged in the Hub log.
- Incrementing the utterance counter has the effect of ending an
  utterance in the log and starting a new one. When we designed the XML
  DTD for the XML version of the Hub log, we introduced a tag called
  GC_TURN which corresponds to this boundary. However, we subsequently
  moved to a different model of introducing "meaningful" information into
  the log, through a system of annotation rules which "decorate" existing
  tags with meaningful landmarks which indicate things like the beginning
  and end of turns. This was because we didn't want people to have to
  write their program files a particular way in order to guarantee logs
  that could be scored. As a result, the GC_TURN tag is mostly ignored in
  the log analysis process, with a single exception: it's possible to
  write annotation rules which rely on GC_TURN as a scoping device, and
  under some circumstances, it's difficult to write your rules if you
  don't have enough scoping levels. This is a flaw which ought to be
  addressed eventually, but has not been yet.
In summary, unless you want to use STORE:
and RETRIEVE:, you can usually ignore the utterance counter. But you should
be aware of the issues involved.
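The dependence of STORE: and RETRIEVE: on the utterance counter is easier to see in a small model. The Python sketch below is an illustration under stated assumptions, not the Hub's database: SessionStore is a hypothetical class covering a single session, the counter starts at -1 as described above, and the key :destination is invented for the example.

```python
class SessionStore:
    """Model of the per-session database indexed by utterance counter."""

    def __init__(self):
        self.utterance = -1   # the initial presentation is utterance -1
        self.frames = {}      # utterance counter -> stored key-value pairs

    def increment_utterance(self):
        """Model of Builtin.increment_utterance."""
        self.utterance += 1

    def store(self, pairs):
        """Model of STORE: save pairs under the current utterance."""
        self.frames.setdefault(self.utterance, {}).update(pairs)

    def retrieve(self, keys, utterance=None):
        """Model of RETRIEVE: read keys, by default from the previous
        utterance."""
        if utterance is None:
            utterance = self.utterance - 1
        frame = self.frames.get(utterance, {})
        return {k: frame[k] for k in keys if k in frame}


store = SessionStore()
store.store({":destination": "Boston"})     # hypothetical key, utterance -1
before = store.retrieve([":destination"])   # looks at utterance -2
store.increment_utterance()                 # counter is now 0
after = store.retrieve([":destination"])    # looks at utterance -1
print(before, after)
```

Until the counter is incremented, the default retrieval looks one utterance behind the stored frame and finds nothing, which is why this mechanism only pays off in Hub programs that call increment_utterance at each utterance boundary.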
Other historical artifacts
There are a few other elements which are worth
mentioning which may surprise you or can get you into trouble:
- The frame type (the c at the beginning of the frame's string
  representation) is a relic of how frames were originally used (and are
  still used in the MIT servers): as hierarchical representations of
  parses and semantic interpretations. The c stands for "clause"; there
  are also predicates ("p") and topics ("q" for quantificational
  element). Frames without the leading type (e.g., { foo :bar 1 }) are
  interpreted as clauses. We only use the clause type in the
  infrastructure proper.
- The Galaxy Communicator header include line for C code should always be
  -I$GC_HOME/include, and the main header file should always be
  #include <galaxy/galaxy_all.h>. This is because all the Galaxy
  Communicator header files assume that the include lines end with the
  include/ directory, and will include other files which are referenced
  as, say, #include <galaxy/util.h>. This is a historical idiosyncrasy
  which will never be changed.
- We told you that when you define a server, you should always specify
  #define USE_SERVER_DATA. Originally, dispatch functions only had one
  argument, the frame, and it's still technically possible to define
  dispatch functions which don't have the second call environment
  argument. There are a few servers still floating around in the
  Communicator community which are implemented this way. Don't do this.
  You won't have the appropriate access to the channels to write new
  messages through, or to probe the call environment in any way. We're
  hoping that in time, this declaration will be superfluous, but for now,
  it's crucial.
- Finally, you may wonder why the second argument of dispatch functions
  is a void *, rather than a call environment pointer. This, again, is a
  historical relic, but because of the vagaries of typing in C and the
  installed base of existing code, it's something that we can't change.
Summary
Congratulations! You've completed the Galaxy
Communicator tutorial. Not only should you have enough information to understand
the remainder of the documentation, you should also know enough by now
to actually get started building your own end-to-end system. Good luck!
Please send comments and suggestions to:
bugs-darpacomm@linus.mitre.org
Last updated October 4, 2001
Copyright (c) 1999 - 2001
The MITRE
Corporation
ALL RIGHTS RESERVED