(DRAFT -- NOT FOR PUBLIC RELEASE)

Speech Recognition Attributes and Systems

Herein we describe our internal process for selecting a speech recognition system. First, we identify the relevant speech recognition attributes. Then we map current research and commercial speech recognition systems against the attribute list.


Speech Recognition Attributes

The following attributes were identified as interesting for speech recognition:

| Code | Attribute Name | Attribute Description | Importance (Research) | Importance (COTS) |
|---|---|---|---|---|
| LM | Language Models | The recognizer can generate new language models, either from a large corpus of sentences (stochastic grammar) or from rules (finite-state grammar). | High | High |
| N-GRAM | N-GRAM Models | The recognizer can train language models with N-grams. | High | Medium |
| ACOUST | New Acoustic Models | The recognizer can generate new acoustic models from speech waveforms. | Low | Low |
| PHONE | Telephone Speech Models | The recognizer has telephone speech models. | High | High |
| MIC | Microphone Speech Models | The recognizer has microphone speech models. | High | High |
| FILE | File Input | The recognizer can take audio samples from a file. This attribute helps with replay. | High | Medium |
| OUT-SENT | Sentence output | The recognizer outputs the most likely sentence as a string. | High | High |
| OUT-LH | Likelihood (prob.) output | The recognizer outputs a likelihood (measure, probability, confidence) score. | High | High |
| OUT-NBEST | N-Best output | The recognizer outputs an N-best list with likelihood scores. | High | High |
| OUT-LATT | Word lattice | The recognizer outputs a word lattice. | Medium | Low |
| VOC-L | Large vocabulary | The recognizer supports a large vocabulary (20K words). | Medium | Low |
| VOC-M | Medium vocabulary | The recognizer supports a medium vocabulary (~1K words). | High | High |
| VOC-S | Small vocabulary | The recognizer supports a small vocabulary (< 1K words). | Low | Low |
| NEW-W | Add new words | The recognizer allows new words to be added (dynamically or statically). | High | High |
| GRAM-DY | Change grammar regions | The recognizer supports the dynamic changing of grammar regions. | Medium | Medium |
| PHONE-MP | Phonetic Mapping | The recognizer has tools to convert orthographic spelling to phonetic spelling. | High | High |
| GAR | Garbage, Noise, Junk | The recognizer has models for silence, noise, junk, and garbage. | High | Medium |
| CLIENT | Client Server Architecture | The recognizer supports a client/server architecture, meaning that the client gathers the audio samples and the server performs the recognition. | High | High |
| API | Recognizer has an API | The recognizer has an API. | High | High |
| JSAPI | JSAPI | The recognizer supports JSAPI. | Low | High |
| SAPI | SAPI | The recognizer supports SAPI. | Low | Medium |
| ECTF | ECTF Compliance | The recognizer complies with the ECTF S.100 speech recognition interface standards. | Low | Medium |
| HUB | Galaxy Hub | The recognizer supports the DARPA Communicator Galaxy Hub. | Low | Low |
| HUBDEV | Develop Galaxy Hub | The recognizer has the development tools needed to make the system Hub-compliant. | High | High |
| FAST | FAST | The recognizer is fast (0.9 × real-time). | Medium | Medium |
| MEMBER | Communicator Member | The university or vendor is a member or an affiliate of the DARPA Communicator program. | High | Medium |
| MULTI-LING | Multilingual Support | The recognizer supports languages other than English. | Low | Low |
| CUST-SUPP | Customer Support | Customer support and/or research expertise is available (and responsive) to MITRE. | High | High |
| DATA-OTH | Other Training Data | The recognizer accepts training data sources other than speech. | Low | Low |
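
The attribute list above can be held in a small lookup structure when filtering by importance. The following is a minimal sketch, assuming Python; the `Attribute` record and the `high_priority` helper are hypothetical names introduced for illustration, and only a few rows are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """One row of the attribute list (hypothetical record type)."""
    code: str
    name: str
    research: str  # importance for research systems: "High", "Medium", or "Low"
    cots: str      # importance for COTS systems: "High", "Medium", or "Low"


# A few rows copied from the attribute list; this is not the full list.
ATTRIBUTES = [
    Attribute("LM", "Language Models", "High", "High"),
    Attribute("N-GRAM", "N-GRAM Models", "High", "Medium"),
    Attribute("ACOUST", "New Acoustic Models", "Low", "Low"),
    Attribute("OUT-LATT", "Word lattice", "Medium", "Low"),
    Attribute("JSAPI", "JSAPI", "Low", "High"),
]


def high_priority(attributes, kind):
    """Return the codes rated "High" for kind = "research" or "cots"."""
    return [a.code for a in attributes if getattr(a, kind) == "High"]


if __name__ == "__main__":
    print(high_priority(ATTRIBUTES, "research"))
    print(high_priority(ATTRIBUTES, "cots"))
```

Keeping the two importance ratings as separate fields mirrors the two Importance columns, so the same list can be filtered for research systems or for COTS systems.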


Research Speech Recognition Systems

We evaluate the following research systems against our speech recognition attribute criteria:

The column entries are as follows:
| Code | MIT-Galaxy | SPHINX | MSU-ISIP | COLORADO |
|---|---|---|---|---|
| LM | YES | YES (w/ tool) | YES | YES |
| N-GRAM | YES | YES | YES | YES |
| ACOUST | YES (limited) | YES (w/ tool) | NO (planned) | YES |
| PHONE | YES | YES | NO (planned) | YES |
| MIC | YES | YES | YES | YES |
| FILE | YES | YES | YES | YES |
| OUT-SENT | YES | YES | YES | YES |
| OUT-LH | YES | YES | YES | NO (planned) |
| OUT-NBEST | YES | YES | YES | YES |
| OUT-LATT | | YES | YES | YES |
| VOC-L | YES | YES | YES | Maybe (not tested) |
| VOC-M | YES | YES | YES | YES |
| VOC-S | YES | YES | YES | YES |
| NEW-W | YES | YES | YES | YES |
| GRAM-DY | NO | YES | YES | NO |
| PHONE-MP | YES | YES (w/ tool) | YES | YES (limited) |
| GAR | YES (limited) | YES | YES | YES |
| CLIENT | YES | Maybe** | NO (planned) | NO (planned) |
| API | YES | YES | YES | YES (limited) |
| JSAPI | NO | NO | NO | NO |
| SAPI | NO | NO | NO | NO |
| ECTF | | NO | NO | NO |
| HUB | YES | YES | NO | NO (planned) |
| HUBDEV | YES | YES | YES | YES |
| FAST | YES | YES | YES | YES |
| MEMBER | YES | YES | YES | YES |
| MULTI-LING | YES | YES | YES | NO |
| CUST-SUPP | YES (limited) | YES | YES | YES |
| DATA-OTH | YES | YES | YES | YES |

** Depends on the configuration.
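
One way to condense the matrix above is an importance-weighted coverage score. The sketch below is illustrative only and is not the selection method used in this evaluation. The numeric weights (High=3, Medium=2, Low=1) and the mapping of entries to support levels (YES=1, Maybe=0.5, anything else=0) are assumptions, as are the names `support` and `weighted_coverage`. Only six attributes are copied from the tables, so the resulting numbers should not be read as a ranking.

```python
# Numeric weights for the importance ratings (an assumption, not from the source).
WEIGHT = {"High": 3, "Medium": 2, "Low": 1}

# Research-column importance for a few attributes, copied from the attribute list.
IMPORTANCE = {
    "LM": "High", "ACOUST": "Low", "OUT-LH": "High",
    "GRAM-DY": "Medium", "CLIENT": "High", "HUB": "Low",
}

# Entries copied from the research systems table, one list per attribute,
# in the column order given by SYSTEMS.
SYSTEMS = ["MIT-Galaxy", "SPHINX", "MSU-ISIP", "COLORADO"]
ENTRIES = {
    "LM": ["YES", "YES (w/ tool)", "YES", "YES"],
    "ACOUST": ["YES (limited)", "YES (w/ tool)", "NO (planned)", "YES"],
    "OUT-LH": ["YES", "YES", "YES", "NO (planned)"],
    "GRAM-DY": ["NO", "YES", "YES", "NO"],
    "CLIENT": ["YES", "Maybe**", "NO (planned)", "NO (planned)"],
    "HUB": ["YES", "YES", "NO", "NO (planned)"],
}


def support(entry):
    """Map a table entry to a support level in [0, 1] (assumed mapping)."""
    text = entry.strip().upper()
    if text.startswith("YES"):
        return 1.0  # includes qualified entries such as "YES (limited)"
    if text.startswith("MAYBE"):
        return 0.5
    return 0.0  # "NO", "NO (planned)", or a blank cell


def weighted_coverage(system):
    """Fraction of the total importance weight that the system covers."""
    col = SYSTEMS.index(system)
    total = sum(WEIGHT[IMPORTANCE[code]] for code in ENTRIES)
    covered = sum(
        WEIGHT[IMPORTANCE[code]] * support(row[col])
        for code, row in ENTRIES.items()
    )
    return covered / total


if __name__ == "__main__":
    for name in SYSTEMS:
        print(f"{name}: {weighted_coverage(name):.2f}")
```

Treating "YES (limited)" as full support and "NO (planned)" as no support is the coarsest choice; a finer mapping would change the scores, which is why the mapping is kept in a single function.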


COTS Speech Recognition Systems

We evaluate the following commercial off-the-shelf systems against our speech recognition attribute criteria:


The column entries are as follows:

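| Code | NUANCE | Philips | L&H | SpeechWorks | IBM |
|---|---|---|---|---|---|
| LM | YES | | YES | YES | YES |
| N-GRAM | NO | | NO | YES (bigrams only) | NO |
| ACOUST | NO | | NO | YES | NO |
| PHONE | YES | | YES | YES | NO |
| MIC | YES | | NO | NO | YES |
| FILE | YES | | YES | YES | YES |
| OUT-SENT | YES | | YES | YES | YES |
| OUT-LH | YES | | YES (rejection) | YES | NO |
| OUT-NBEST | YES | | YES (unexplained asterisk) | YES | NO |
| OUT-LATT | NO | | YES | YES | NO |
| VOC-L | YES | | YES | YES | YES |
| VOC-M | YES | | YES | YES | YES |
| VOC-S | YES | | YES | YES | YES |
| NEW-W | YES | | YES | YES | YES |
| GRAM-DY | YES | | | | YES (column unclear) |
| PHONE-MP | YES | | YES | YES | NO |
| GAR | NO | | YES | YES | NO |
| CLIENT | YES | | YES | YES | NO |
| API | YES | | YES | YES (not publicly available) | YES |
| JSAPI | NO | | YES (application-dependent) | NO | YES |
| SAPI | NO | | YES | NO | YES |
| ECTF | NO | | YES | NO (planned) | NO |
| HUB | NO | | NO | NO | NO |
| HUBDEV | YES | | YES | YES | |
| FAST | YES | | YES | YES | YES |
| MEMBER | NO | | NO | YES | YES |
| MULTI-LING | YES | | YES | YES | YES |
| CUST-SUPP | YES (limited) | | YES | | YES |
| DATA-OTH | NO | | NO | YES | |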



Last updated February 10, 2000.