Open Source Toolkit Documentation:

The MITRE SoX Wrapper

License / Documentation home / Help and feedback


MITRE is providing an initial implementation of a Communicator-compliant wrapper for the SoX audio manipulation toolkit. SoX is a well-established open-source tool for audio file manipulation of all sorts, including format conversion, resampling, volume manipulation, and effects (reverb, chorus, etc.). MITRE's wrapper can be used for any of these purposes, as well as for playing and recording audio files in any of the supported formats. The current wrapper requires SoX version 12.17.3 (note that even subminor version differences may be significant for this tool).



Status

Like CATS, the SoX wrapper requires threads. This is because the basic model for SoX has each formatter reading from and writing to a file, and it was necessary to introduce pipes and threads in order to interact with the SoX library. The interaction is complex, and we aren't foolish enough to believe we got it right the first time. However, initial tests appear to work.

One problem with the wrapper is that the SoX library is not cleanly designed as a library. There are two notable problems:

We imagine that the reason for these design flaws is that SoX is primarily a file-based command line tool which doesn't anticipate receiving multiple requests.

Like CATS, the SoX wrapper requires threads (because of SoX's file-based processing model), but is not known to be thread-safe enough that the individual timed tasks can be run as threads themselves. The core Communicator Makefile support for this configuration (thread-aware, but not tasks as threads) should be transparent, but isn't, which is why the Unix Makefile is more complicated than normal.

History and plans

The idea for this server originated in conjunction with plans for CATS, and the need for occasional audio manipulation facilities which we didn't want to have to build into CATS. Ideally, of course, they would be, for the sake of efficiency, but we judged a separate server to be a fair place to start.

Given the chance to continue this work, we'll probably try to make some contributions to SoX to address the library design issues.

Version history


Usage

Command line

<OSTK_HOME>/bin/sox_server

Default port

9305

Command line arguments

static char *oas[] = {
  "-cats_compatibility", "recognize only audio formats recognized by CATS",
  "-outbound_brokers", "force SoX to write to brokers instead of proxies",
  NULL
};

This server also accepts the standard server arguments. See the command-line argument parsing library in the Galaxy Communicator documentation for details.


Message set


sox_play  accepts the name of a file, and makes the file available via brokering according to the specified audio formatting and processing parameters. Note that the output data is raw, rather than in a file format; it's completely described by the sample rate, data encoding, sample size and number of channels. This dispatch function will attempt to return an error result if it encounters a processing problem.
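Since the brokered data carries no header, a receiver has to work out sizes and durations from the reported format values alone. The following is a minimal sketch of that arithmetic, assuming an encoding with fixed-size samples (for example PCM_SIGNED, PCM_UNSIGNED, ULAW or ALAW); it does not apply to block-compressed encodings such as ADPCM or GSM. The function names are hypothetical and are not part of the server or of SoX.

```c
#include <stddef.h>

/* Bytes occupied by one frame, i.e. one sample for every channel.
   data_size is the sample size in bits (8, 16 or 32). */
size_t raw_frame_bytes(int data_size, int channels)
{
  return (size_t) (data_size / 8) * (size_t) channels;
}

/* Bytes needed for one second of raw audio. */
size_t raw_bytes_per_second(int sample_rate, int data_size, int channels)
{
  return raw_frame_bytes(data_size, channels) * (size_t) sample_rate;
}

/* Number of complete frames contained in num_bytes of raw data. */
size_t raw_frame_count(size_t num_bytes, int data_size, int channels)
{
  return num_bytes / raw_frame_bytes(data_size, channels);
}
```

For instance, 8000 Hz 16-bit mono data takes 16000 bytes per second, and 44100 Hz 16-bit stereo data takes 176400 bytes per second.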
 
  parameter type optional depends on description/constraints
IN: :input_filename string no   name of the file to be played.
  :input_sample_rate integer yes   sample rate of input file, if different from what the file implies or if the file format doesn't include the sample rate
  :input_data_encoding string yes   data encoding of input file, if different from what the file implies or if the file format doesn't include the data encoding. Legal values are PCM_SIGNED, PCM_UNSIGNED, ULAW, ALAW (these names were chosen for consistency with CATS), FLOATING_POINT, ADPCM, IMA_ADPCM, GSM. See the SoX man page for more details.
  :input_channels integer yes   channels in input file, if different from what the filename implies or if the file format doesn't include the number of channels. Legal values are 1, 2, 4.
  :input_filetype string yes   type of input file, if different from what the file implies. See the SoX man page for legal values.
  :input_data_size integer yes   number of bits in each input sample, if different from what the file implies or if the file format doesn't include the data encoding.  Legal values are 8, 16, 32.
  :volume float yes   volume change from input to output. > 1.0 is increase, < 1.0 is decrease. See the SoX man page for more details.
  :verbose N/A yes   if present, SoX will provide verbose reports about what it's up to.
  :swap N/A yes   if this key is present, bytes will be swapped in sample sizes above 8 bits.
  :effects list of lists of strings yes   if you want to add effects processing, this value should be a list of lists, where each sublist is a sequence of strings corresponding to the effects entries on the SoX command line. For instance, you might specify
reverb 1.0 200 100 50
on the command line to add reverb; this would translate to a value for :effects of
( ( "reverb" "1.0" "200" "100" "50" ) )
The reason for this style is that the individual effects digest their arguments from the argv string array, and it's easy to translate a list of strings into an array of strings.
  :output_sample_rate integer yes   desired sample rate of brokered output data, if different from the input.
  :output_data_encoding string yes   desired data encoding for brokered output data, if different from input. See :input_data_encoding for values.
  :output_channels integer yes   desired number of channels for brokered output data, if different from input. See :input_channels for values.
  :output_data_size integer yes   desired size in bits of each sample in brokered output data, if different from input. See :input_data_size for values.
OUT: :proxy proxy yes   outbound broker proxy for audio data. Proxies are used unless the -outbound_brokers command line argument is present.
  :host string yes   if -outbound_brokers is present, host of SoX server
  :port integer yes   if -outbound_brokers is present, port where audio is available
  :call_id string yes   if -outbound_brokers is present, unique ID of broker request
  :output_sample_rate integer no   sample rate of brokered output data
  :output_data_encoding string no   data encoding for brokered output data. See :input_data_encoding for values.
  :output_channels integer no   number of channels for brokered output data. See :input_channels for values.
  :output_data_size integer no   size in bits of each sample in brokered output data. See :input_data_size for values.

This message returns a frame.
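The :effects description above notes that each sublist maps naturally onto an argv-style string array. A minimal sketch of that translation follows, assuming the sublist has already been extracted into a plain C array of strings (the step of pulling strings out of a Communicator frame is not shown). The helper names strings_to_argv and free_argv are hypothetical and are not part of the wrapper or of SoX.

```c
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated array produced by strings_to_argv. */
void free_argv(char **argv)
{
  int i;

  if (argv == NULL)
    return;
  for (i = 0; argv[i] != NULL; i++)
    free(argv[i]);
  free(argv);
}

/* Copy count strings into a freshly allocated, NULL-terminated
   argv-style array. Returns NULL if an allocation fails. */
char **strings_to_argv(const char *const *items, int count)
{
  char **argv = malloc(((size_t) count + 1) * sizeof(char *));
  int i;

  if (argv == NULL)
    return NULL;
  /* Start with every slot NULL so that free_argv is safe on a
     partially filled array. */
  for (i = 0; i <= count; i++)
    argv[i] = NULL;
  for (i = 0; i < count; i++) {
    size_t len = strlen(items[i]) + 1;

    argv[i] = malloc(len);
    if (argv[i] == NULL) {
      free_argv(argv);
      return NULL;
    }
    memcpy(argv[i], items[i], len);
  }
  return argv;
}
```

Given the sublist ( "reverb" "1.0" "200" "100" "50" ), the result is a six-element array whose first element is "reverb" and whose last element is NULL, which is the shape an argv-consuming routine expects.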

sox_record accepts the name of a file, and writes incoming broker data into that file, modifying it according to the specified audio formatting and processing parameters. Note that the input data is raw, rather than in a file format; it's completely described by the sample rate, data encoding, sample size and number of channels. This dispatch function will attempt to return an error result if it encounters a processing problem.
 
  parameter type optional depends on description/constraints
IN: :output_filename string no   name of the file for recording.
  :proxy proxy yes   inbound broker proxy for audio data
  :host string yes   host for inbound broker data, if proxies are not being used
  :port integer yes   port for inbound broker data, if proxies are not being used
  :call_id string yes   unique call ID for inbound broker data, if proxies are not being used
  :output_filetype string yes   type of output file, if different from what the filename implies. See the SoX man page for legal values.
  :input_sample_rate integer yes   sample rate of brokered input
  :input_data_encoding string yes   data encoding of brokered input. See sox_play for values.
  :input_channels integer yes   channels in brokered input. See sox_play for values.
  :input_data_size integer yes   number of bits in each input sample. See sox_play for values.
  :volume float yes   volume change from input to output. See sox_play for values.
  :verbose N/A yes   if present, SoX will provide verbose reports about what it's up to.
  :swap N/A yes   if this key is present, bytes will be swapped in sample sizes above 8 bits.
  :effects list of lists of strings yes   See sox_play for values.
  :output_sample_rate integer yes   desired sample rate of output data, if different from the input.
  :output_data_encoding string yes   desired data encoding for output data, if different from input. See :input_data_encoding for values.
  :output_channels integer yes   desired number of channels for output data, if different from input. See :input_channels for values.
  :output_data_size integer yes   desired size in bits of each sample in output data, if different from input. See :input_data_size for values.
OUT: :output_filename string no   identical to :output_filename in input frame.
  :output_sample_rate integer no   sample rate of output data
  :output_data_encoding string no   data encoding for output data. See :input_data_encoding for values.
  :output_channels integer no   number of channels for output data. See :input_channels for values.
  :output_data_size integer no   size in bits of each sample in output data. See :input_data_size for values.

This message returns a frame.
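The :swap key above asks for byte swapping in samples wider than 8 bits, which is what is needed when raw data was produced on a machine of the opposite byte order. As a minimal sketch of what such a swap does to the data (not SoX's own implementation; the function names are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the two bytes of every 16-bit sample, in place. */
void swap_samples_16(uint16_t *samples, size_t count)
{
  size_t i;

  for (i = 0; i < count; i++)
    samples[i] = (uint16_t) ((samples[i] << 8) | (samples[i] >> 8));
}

/* Reverse the four bytes of every 32-bit sample, in place. */
void swap_samples_32(uint32_t *samples, size_t count)
{
  size_t i;

  for (i = 0; i < count; i++) {
    uint32_t s = samples[i];

    samples[i] = (s << 24) | ((s & 0x0000FF00u) << 8)
               | ((s & 0x00FF0000u) >> 8) | (s >> 24);
  }
}
```

8-bit samples need no such treatment, since a single byte has no byte order.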

sox_convert maps from incoming to outgoing broker, modifying the audio data according to the specified audio formatting and processing parameters. Note that both the input and output data will be raw, rather than in a file format; each is completely described by its sample rate, data encoding, sample size and number of channels. This dispatch function will attempt to return an error result if it encounters a processing problem.
 
  parameter type optional depends on description/constraints
IN: :proxy proxy yes   proxy for inbound broker data
  :host string yes   host for inbound broker data, if proxies are not being used
  :port integer yes   port for inbound broker data, if proxies are not being used
  :call_id string yes   unique call ID for inbound broker data, if proxies are not being used
  :input_sample_rate integer yes   sample rate of brokered input
  :input_data_encoding string yes   data encoding of brokered input. See sox_play for values.
  :input_channels integer yes   channels in brokered input. See sox_play for values.
  :input_data_size integer yes   number of bits in each input sample. See sox_play for values.
  :volume float yes   volume change from input to output. See sox_play for values.
  :verbose N/A yes   if present, SoX will provide verbose reports about what it's up to.
  :swap N/A yes   if this key is present, bytes will be swapped in sample sizes above 8 bits.
  :effects list of lists of strings yes   See sox_play for values.
  :output_sample_rate integer yes   desired sample rate of output data, if different from the input.
  :output_data_encoding string yes   desired data encoding for output data, if different from input. See :input_data_encoding for values.
  :output_channels integer yes   desired number of channels for output data, if different from input. See :input_channels for values.
  :output_data_size integer yes   desired size in bits of each sample in output data, if different from input. See :input_data_size for values.
OUT: :proxy proxy yes   outbound broker proxy for audio data. Proxies are used if proxies were used for input.
  :host string yes   if brokers were used for input, host of SoX server
  :port integer yes   if brokers were used for input, port where audio is available
  :call_id string yes   if brokers were used for input, unique ID of broker request
  :output_sample_rate integer no   sample rate of brokered output data
  :output_data_encoding string no   data encoding for brokered output data. See :input_data_encoding for values.
  :output_channels integer no   number of channels for brokered output data. See :input_channels for values.
  :output_data_size integer no   size in bits of each sample in brokered output data. See :input_data_size for values.

This message returns a frame.
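The :volume parameter above is a linear multiplier: values above 1.0 make the output louder and values below 1.0 make it quieter. As a minimal sketch of what such a gain means for 16-bit PCM_SIGNED samples, the function below scales each sample and clips at the limits of the 16-bit range. The clipping and the truncation toward zero are assumptions made for this illustration; the source does not say how SoX itself handles overflow or rounding, and scale_volume_16 is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply every 16-bit signed sample by volume, in place.
   Results outside the 16-bit range are clipped (an assumption of
   this sketch), and fractional parts are truncated toward zero. */
void scale_volume_16(int16_t *samples, size_t count, double volume)
{
  size_t i;

  for (i = 0; i < count; i++) {
    double v = (double) samples[i] * volume;

    if (v > 32767.0)
      v = 32767.0;
    else if (v < -32768.0)
      v = -32768.0;
    samples[i] = (int16_t) v;
  }
}
```

With a volume of 2.0, a sample of 1000 becomes 2000, while a sample of 30000 is clipped to 32767.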


Messages issued

None. All interactions are synchronous.


Known bugs

Two bugs related to the design of the SoX library are documented in the status section.



Last updated October 2, 2002