Open Source Toolkit Documentation

The MITRE Java(TM) Desktop Audio Server


The MITRE Java(TM) Desktop Audio Server (JDAS) is intended to provide a cross-platform Communicator-compliant desktop audio interface to speech recognition/synthesis servers. This server interacts appropriately with legacy servers using MIT's audio server, features both server-driven and frame-driven barge-in, and introduces the capability of sending and receiving audio in a variety of formats supported by the Java(TM) Sound API.  Telephony support is emerging, but simulated telephony is available by way of a keypad GUI.  JDAS features an event-driven, multi-threaded architecture which is described in the associated JDAS API documentation.



Status

History and plans

We have been working on JDAS for about a year. It works fairly well on Windows, although the voice activity detection algorithm is difficult to control. Unfortunately, due to limitations of the Java Sound API, the server does not work on SPARC Solaris or Intel Linux, and we do not expect it to work on those platforms in the future. We have therefore abandoned development of JDAS in favor of a C implementation that is based on the JDAS design and uses the PortAudio cross-platform audio I/O library.

Version history


Usage

To run this server, you need at least the Java(TM) 2 Runtime Environment, Standard Edition v1.3.0 or later, available for download at http://java.sun.com/j2se/. You also need the galaxy.jar file from GCSI version 3.0 or later.

Assuming the JRE is installed at JAVATM_HOME, that MITRE_ROOT is set to <GC_HOME>/contrib/MITRE, and that JDAS_ROOT is set to <OSTK_ROOT>/audio_io/jdas, JDAS may be invoked (in Unix csh) as follows:

% setenv CLASSPATH $CLASSPATH:$MITRE_ROOT/bindings/java/lib/galaxy.jar
% setenv CLASSPATH $CLASSPATH:$JDAS_ROOT/src/jdas.jar
% $JAVATM_HOME/bin/java org.mitre.jdas.JdasMainServer

The usual launch script is also provided in the bin directory.

Command line

<OSTK_ROOT>/bin/jdas ...

Default port

12345

Command line arguments

All command line arguments that take a value must be followed by an equal sign ('=') and then the value, for example -record_sample_rate=8000. No spaces are permitted before or after the equal sign.
 
switch | optional | argument type | default value | description/constraints
-playbox | yes | N/A | N/A | MIT playbox mode. Forces backward compatibility with the MIT playbox brokering protocol, to provide broker-level interoperability with legacy servers.
-ptt | yes | N/A | N/A | Disable voice activity detection, allowing for push-to-talk operation.
-nobarge | yes | N/A | N/A | Disable server-driven barge-in.
-debug | yes | N/A | N/A | Enable debugging output.
-half_duplex | yes | N/A | N/A | Enable half duplex operation. NOTE: This is highly recommended on Linux platforms.
-nogui | yes | N/A | N/A | Disable controller GUI.
-nokeypad | yes | N/A | N/A | Disable keypad GUI.
-nostreaming | yes | N/A | N/A | Disable streaming mode: force the server to record a complete utterance, then send the utterance at once.
-record_encoding | yes | string | PCM_SIGNED | Encoding used for recording. Supported formats: PCM_SIGNED, PCM_UNSIGNED, ULAW, ALAW. See the Java Sound Home Page for more information.
-record_sample_rate | yes | integer | 8000 | Sample rate used for recording.
-record_sample_size | yes | integer | 16 | Sample size in bits/sample used for recording.
-record_channels | yes | integer | 1 | Number of channels to record.
-record_little_endian | yes | N/A | N/A | Record using little endian byte ordering rather than big endian.
-playback_encoding | yes | string | PCM_SIGNED | Default encoding used for playback. Supported formats: PCM_SIGNED, PCM_UNSIGNED, ULAW, ALAW. May be overridden by the :encoding key of a broker frame; see receive_audio.
-playback_sample_rate | yes | integer | 8000 | Default sample rate used for playback. May be overridden by the :sample_rate key of a broker frame; see receive_audio.
-playback_sample_size | yes | integer | 16 | Default sample size in bits/sample used for playback. May be overridden by the :sample_size_in_bits key of a broker frame; see receive_audio.
-playback_channels | yes | integer | 1 | Default number of channels for playback. May be overridden by the :channels key of a broker frame; see receive_audio.
-playback_little_endian | yes | N/A | N/A | Play back using little endian byte ordering by default rather than big endian. May be overridden by the :big_endian key of a broker frame; see receive_audio.
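
The switch=value convention described above can be illustrated with a minimal parsing sketch. This is not the parser JDAS uses; the class JdasArgs and its methods are hypothetical names introduced for illustration, and only the switch names and default values are taken from the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JDAS: illustrates the -switch=value convention.
final class JdasArgs {
    // Splits each argument at the first '='; switches without a value map to "true".
    static Map<String, String> parse(String[] args) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String arg : args) {
            if (!arg.startsWith("-")) {
                throw new IllegalArgumentException("not a switch: " + arg);
            }
            int eq = arg.indexOf('=');
            if (eq < 0) {
                out.put(arg.substring(1), "true");
                continue;
            }
            String name = arg.substring(1, eq);
            String value = arg.substring(eq + 1);
            // No spaces are permitted before or after the equal sign.
            if (name.isEmpty() || value.isEmpty()
                    || name.endsWith(" ") || value.startsWith(" ")) {
                throw new IllegalArgumentException("malformed argument: " + arg);
            }
            out.put(name, value);
        }
        return out;
    }

    // Looks up an integer option, falling back to the documented default.
    static int intOption(Map<String, String> opts, String name, int dflt) {
        String v = opts.get(name);
        return v == null ? dflt : Integer.parseInt(v);
    }

    public static void main(String[] argv) {
        Map<String, String> opts = parse(new String[] {
            "-ptt", "-record_encoding=ULAW", "-record_sample_rate=16000"
        });
        System.out.println(opts);
        // -record_sample_size was not given, so the documented default (16) applies.
        System.out.println(intOption(opts, "record_sample_size", 16));
    }
}
```

With this convention, an argument written as "-record_sample_rate = 16000" would reach the program as three separate strings, which is why the sketch rejects anything that does not begin with a dash.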


Message set



receive_audio handles incoming audio broker requests, and places received audio data in a playback queue.
 
parameter | type | optional | depends on | description/constraints
IN:
:binary_host | string | | | Broker server host.
:binary_port | integer | | | Broker server port.
:call_id | string | | | Call ID string.
:encoding | string | yes | | Sample encoding. Same options as listed in command line arguments.
:sample_rate | integer | yes | | Sample rate.
:sample_size_in_bits | integer | yes | | Sample size in bits.
:channels | integer | yes | | Number of channels to play.
:frame_size | integer | yes | | Frame size in bytes.
:frame_rate | integer | yes | | Frame rate, in frames per second.
:big_endian | string | yes | | Big endian byte ordering indicator; recognized values are "true" and "false".

This message returns a frame.
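
The way optional receive_audio keys override the command-line playback defaults can be sketched with the Java Sound AudioFormat class. This is an illustration under stated assumptions, not the JDAS implementation: a plain Map stands in for the keys of a broker frame, the class PlaybackFormat is hypothetical, and the fallback values for :frame_size and :frame_rate (the usual PCM relations) are our assumption rather than documented behavior.

```java
import java.util.HashMap;
import java.util.Map;
import javax.sound.sampled.AudioFormat;

// Hypothetical sketch: a plain Map stands in for the keys of a broker frame.
final class PlaybackFormat {
    // Maps the documented encoding names onto Java Sound encodings.
    static AudioFormat.Encoding encoding(String name) {
        switch (name) {
            case "PCM_SIGNED":   return AudioFormat.Encoding.PCM_SIGNED;
            case "PCM_UNSIGNED": return AudioFormat.Encoding.PCM_UNSIGNED;
            case "ULAW":         return AudioFormat.Encoding.ULAW;
            case "ALAW":         return AudioFormat.Encoding.ALAW;
            default:
                throw new IllegalArgumentException("unsupported encoding: " + name);
        }
    }

    // Reads an integer key, falling back to the given default when it is absent.
    static int intKey(Map<String, Object> frame, String key, int dflt) {
        Object v = frame.get(key);
        return v == null ? dflt : ((Number) v).intValue();
    }

    // Frame keys win; otherwise the documented command-line defaults apply.
    static AudioFormat resolve(Map<String, Object> frame) {
        Object enc = frame.get(":encoding");
        AudioFormat.Encoding encoding =
            encoding(enc == null ? "PCM_SIGNED" : enc.toString());
        int rate = intKey(frame, ":sample_rate", 8000);
        int bits = intKey(frame, ":sample_size_in_bits", 16);
        int channels = intKey(frame, ":channels", 1);
        // Assumption: when absent, frame size and frame rate follow the PCM relations.
        int frameSize = intKey(frame, ":frame_size", ((bits + 7) / 8) * channels);
        int frameRate = intKey(frame, ":frame_rate", rate);
        // :big_endian is a string, "true" or "false"; the default is big endian.
        Object be = frame.get(":big_endian");
        boolean bigEndian = be == null || "true".equals(be.toString());
        return new AudioFormat(encoding, rate, bits, channels,
                               frameSize, frameRate, bigEndian);
    }

    public static void main(String[] args) {
        Map<String, Object> frame = new HashMap<>();
        frame.put(":encoding", "ULAW");
        frame.put(":sample_size_in_bits", 8);
        frame.put(":big_endian", "false");
        System.out.println(resolve(frame));
    }
}
```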

disable_streaming turns off stream-based brokering.  This causes the server to broker entire utterances once they are recorded. This message returns a frame.

enable_streaming turns on stream-based brokering.  This causes the server to broker each utterance one buffer at a time, as it is recorded. This message returns a frame.

toggle_streaming toggles the streaming mode. This message returns a frame.
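
The difference between the two brokering modes can be shown with a small model. This is a sketch only, not JDAS code: the class UtteranceBroker, its method names, and the list that stands in for the broker connection are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not JDAS code: shows when recorded buffers would be brokered.
final class UtteranceBroker {
    private boolean streaming = true;  // streaming is on unless disabled
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final List<byte[]> sent = new ArrayList<>();  // stands in for the broker

    void enableStreaming()  { streaming = true; }
    void disableStreaming() { streaming = false; }
    void toggleStreaming()  { streaming = !streaming; }

    // Called for each recorded buffer.
    void onBuffer(byte[] buf) {
        if (streaming) {
            sent.add(buf.clone());              // broker one buffer at a time
        } else {
            pending.write(buf, 0, buf.length);  // hold until the utterance ends
        }
    }

    // Called when recording of the utterance ends.
    void onUtteranceEnd() {
        if (pending.size() > 0) {
            sent.add(pending.toByteArray());    // broker the whole utterance at once
            pending.reset();
        }
    }

    List<byte[]> sent() { return sent; }

    public static void main(String[] args) {
        UtteranceBroker broker = new UtteranceBroker();
        broker.disableStreaming();
        broker.onBuffer(new byte[320]);
        broker.onBuffer(new byte[320]);
        broker.onUtteranceEnd();
        System.out.println(broker.sent().size() + " message(s) brokered");
    }
}
```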

mute_toggle toggles audio muting.  This effectively turns off voice activity detection. This message returns a frame.

resend brokers the most recently recorded utterance. This message returns a frame.

barge_in implements frame-driven barge-in. The currently playing message is stopped, but any other queued messages are still played. This message returns a frame.

flush_messages implements obnoxious frame-driven barge-in, dumping all queued messages.  Currently unimplemented. This message returns a frame.
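
The intended difference between barge_in and flush_messages can be modeled on a simple playback queue. This sketch is illustrative only: PlaybackQueue and its methods are hypothetical names, messages are represented as strings, and flush_messages itself is currently unimplemented in JDAS.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a playback queue; names are illustrative, not JDAS's API.
final class PlaybackQueue {
    private final Deque<String> queued = new ArrayDeque<>();
    private String playing;  // null when nothing is playing

    // Adds a message; it starts playing immediately if the queue is idle.
    void enqueue(String message) {
        queued.addLast(message);
        if (playing == null) {
            playing = queued.pollFirst();
        }
    }

    // barge_in: stop the current message; the next queued message, if any, starts.
    void bargeIn() {
        playing = queued.pollFirst();
    }

    // flush_messages: stop the current message and drop everything queued.
    void flushMessages() {
        queued.clear();
        playing = null;
    }

    String playing()  { return playing; }
    int queuedCount() { return queued.size(); }

    public static void main(String[] args) {
        PlaybackQueue q = new PlaybackQueue();
        q.enqueue("greeting");
        q.enqueue("prompt");
        q.enqueue("help");
        q.bargeIn();
        System.out.println(q.playing() + ", queued: " + q.queuedCount());
        q.flushMessages();
        System.out.println(q.playing() + ", queued: " + q.queuedCount());
    }
}
```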

start_recording implements frame-driven record start. This message returns a frame.

stop_recording implements frame-driven record stop. This message returns a frame.

reinitialize sets up JDAS for operation, including default recording format options. See the special properties of reinitialize.
 
parameter | type | optional | depends on | description/constraints
IN:
:encoding | string | yes | | Sample encoding. Same options as listed in command line arguments.
:sample_rate | integer | yes | | Sample rate.
:sample_size_in_bits | integer | yes | | Sample size in bits.
:channels | integer | yes | | Number of channels to play.
:frame_size | integer | yes | | Frame size in bytes.
:frame_rate | integer | yes | | Frame rate, in frames per second.
:big_endian | string | yes | | Big endian byte ordering indicator; recognized values are "true" and "false".

This message returns a frame.


Messages issued

JDAS issues messages to the Hub when recording begins and ends, when playback begins, ends, or is cancelled, and in response to interaction with the simulated telephony keypad GUI.

  • Audio Ready
  • Audio Status
  • Simulated Telephony

    Audio Ready

    When the audio server detects audio input, it dispatches a broker request for another server to read the audio, which can be claimed by the recognizer. The broker request is of the form

    In default operation, JDAS expects the broker to accept GAL_BINARY data, and expects the receiving server to determine the appropriate audio format from the broker frame.  In playbox compatibility mode, JDAS expects the server to accept GAL_INT_16 (for 16-bit PCM) data, as well as the incoming control messages for the MIT playbox protocol.

    Audio Status

    JDAS dispatches frames to the Hub that report its current audio I/O state.  These are:
     
    :recording_has_begun, :recording_has_ended, :playback_has_begun, :playback_has_ended | {c jdas <status key> <int: 1>}
    :playback_cancelled | {c jdas :playback_cancelled <int: 1> :seconds_played <float:>}

    Simulated Telephony

    JDAS offers a "fake telephony" keypad GUI for simulated telephony input.  This includes a normal telephone keypad, plus a button to change the hook status of the "phone", and a push-to-talk button to start/stop recording.  Messages issued in response to keypad presses and hook status button presses are as follows:
     
    :call_answered, :call_disconnected | {c jdas <status key> <int: 1>}
    :touchtone | {c jdas :touchtone <string: "1"-"9","*","#">}
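
The status and telephony frames listed above can be written out concretely. The sketch below renders them as text in the spirit of the notation used in this section; it is not necessarily the exact Galaxy frame syntax, and it does not use the Galaxy Java bindings. The class JdasMessages and its methods are hypothetical, and the touchtone check follows the documented value range literally.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical formatter: renders frames in the notation used in this section.
final class JdasMessages {
    // Status keys that carry only the integer 1.
    private static final Set<String> STATUS_KEYS = new HashSet<>(Arrays.asList(
        ":recording_has_begun", ":recording_has_ended",
        ":playback_has_begun", ":playback_has_ended",
        ":call_answered", ":call_disconnected"));

    static String status(String key) {
        if (!STATUS_KEYS.contains(key)) {
            throw new IllegalArgumentException("unknown status key: " + key);
        }
        return "{c jdas " + key + " 1}";
    }

    // :playback_cancelled also reports how many seconds were played.
    static String playbackCancelled(double secondsPlayed) {
        return String.format(Locale.ROOT,
            "{c jdas :playback_cancelled 1 :seconds_played %.2f}", secondsPlayed);
    }

    // Documented touchtone values only: "1" through "9", "*", and "#".
    static String touchtone(String key) {
        if (!key.matches("[1-9*#]")) {
            throw new IllegalArgumentException("not a documented key: " + key);
        }
        return "{c jdas :touchtone \"" + key + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(status(":recording_has_begun"));
        System.out.println(playbackCancelled(2.5));
        System.out.println(touchtone("#"));
    }
}
```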


    Known bugs

    We know they exist, but we haven't tabulated them yet.


    Please send comments and suggestions to: bugs-darpacomm@linus.mitre.org
    Last updated January 25, 2002

    Copyright (c) 1998-2002
    The MITRE Corporation
    ALL RIGHTS RESERVED