Speech Recognition Attributes and Systems
Here we describe our internal process for selecting a speech recognition system. First, we identify the relevant speech recognition attributes. Then we map current research and commercial speech recognition systems against the attribute list.
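The two-step process described above (identify attributes, then map systems against them) can be sketched as a small filter-and-rank routine. This is a minimal illustration, not the procedure actually used: the system names, the split into required and nice-to-have attributes, and the weights are all hypothetical, and only the attribute names are borrowed from this document.

```python
from dataclasses import dataclass, field

# Hypothetical requirements: attribute names come from this document,
# but which ones are required and how they are weighted is an assumption.
REQUIRED = {"File Input", "N-Best output"}
NICE_TO_HAVE = {"JSAPI": 1.0, "Word lattice": 2.0}


@dataclass
class Recognizer:
    name: str
    supported: set = field(default_factory=set)  # attributes marked YES


def rank(recognizers, required, nice_to_have):
    """Drop systems missing any required attribute, then sort by weighted score."""
    eligible = [r for r in recognizers if required <= r.supported]

    def score(r):
        return sum(w for attr, w in nice_to_have.items() if attr in r.supported)

    return sorted(eligible, key=score, reverse=True)


# Hypothetical systems, for illustration only.
systems = [
    Recognizer("SystemA", {"File Input", "N-Best output", "JSAPI"}),
    Recognizer("SystemB", {"File Input", "N-Best output", "Word lattice"}),
    Recognizer("SystemC", {"File Input", "JSAPI", "Word lattice"}),
]

ranked = [r.name for r in rank(systems, REQUIRED, NICE_TO_HAVE)]
print(ranked)
```

In this sketch SystemC is excluded because it lacks a required attribute, and the remaining systems are ordered by the weighted count of optional attributes they support.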
The following attributes were identified as interesting for speech recognition:
Name | Description |
Language Models | The recognizer can generate new language models, either from a large corpus of sentences (stochastic grammar) or from rules (finite-state grammar). |
N-GRAM Models | The recognizer can train language models with N-grams. |
New Acoustic Models | The recognizer can generate new acoustic models from speech waveforms. |
Telephone Speech Models | The recognizer has telephone speech models. |
Microphone Speech Models | The recognizer has microphone speech models. |
File Input | The recognizer can take audio samples from a file. This attribute helps with replay. |
Sentence output | The recognizer outputs the most likely sentence as a string. |
Likelihood (prob.) output | The recognizer outputs a likelihood (measure, probability, confidence) score. |
N-Best output | The recognizer outputs the N-best hypotheses with likelihood scores. |
Word lattice | The recognizer outputs a word lattice. |
Large vocabulary | The recognizer supports a large vocabulary (20K words). |
Medium vocabulary | The recognizer supports a medium vocabulary (~1K words). |
Small vocabulary | The recognizer supports a small vocabulary (< 1K words). |
Add new words | The recognizer allows new words to be added (dynamically or statically). |
Change grammar regions | The recognizer supports dynamically changing grammar regions. |
Phonetic Mapping | The recognizer has tools to convert orthographic spelling to phonetic spelling. |
Garbage, Noise, Junk | The recognizer has models for silence, noise, junk, and garbage. |
Client Server Architecture | The recognizer supports a client/server architecture, meaning that the client gathers the audio samples and the server performs the recognition. |
Recognizer has an API | The recognizer has an API. |
JSAPI | The recognizer supports JSAPI. |
SAPI | The recognizer supports SAPI. |
ECTF Compliance | The recognizer complies with the ECTF S.100 speech recognition interface standards. |
Galaxy Hub | The recognizer supports the DARPA Communicator Galaxy Hub. |
Develop Galaxy Hub | The recognizer has the development tools needed to make the system Hub-compliant. |
FAST | The recognizer is fast (0.9 × real-time). |
Communicator Member | The university or vendor is a member or an affiliate of the DARPA Communicator program. |
Multilingual Support | The recognizer supports languages other than English. |
Customer Support | Customer support and/or research expertise is available (and responsive) to MITRE. |
Other Training Data | The recognizer accepts data sources other than speech. |
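To make the "N-GRAM Models" attribute concrete, the following minimal sketch trains a bigram (N = 2) language model from a small corpus of sentences using unsmoothed maximum-likelihood estimates. It is an illustration only: the function name `train_bigram`, the sentence boundary tokens, and the toy corpus are assumptions, and it does not reflect how any of the recognizers evaluated here train their models. Practical systems normally add smoothing so that unseen word pairs do not receive zero probability.

```python
from collections import Counter


def train_bigram(sentences):
    """Return maximum-likelihood bigram probabilities P(w2 | w1)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # <s> and </s> mark sentence start and end (an illustrative convention).
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            history_counts[w1] += 1        # times w1 appears as a history
            bigram_counts[(w1, w2)] += 1   # times w2 follows w1
    return {
        (w1, w2): count / history_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }


# Toy corpus, hypothetical.
corpus = ["call home", "call the office", "call home now"]
model = train_bigram(corpus)

print(model[("call", "home")])  # relative frequency of "home" after "call"
```

Here "call" is followed by "home" in two of its three occurrences, so the estimate for that pair is 2/3; any pair absent from the corpus is simply missing from the dictionary.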
Research Speech Recognition Systems
We evaluate the following research systems against our speech recognition attribute criteria:
[Research systems comparison table: cell layout not recoverable. Surviving column header: COLORADO. Surviving cell entries: "(w/ tool)" (three occurrences), "Maybe**", "(planned)", "YES (limited)".]
** Depends on the configuration.
COTS Speech Recognition Systems
We evaluate the following commercial off-the-shelf systems against our speech recognition attribute criteria:
The column entries are as follows:
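Only the IBM column is recoverable. Its 29 entries are paired below with the 29 attributes in the order of the attribute list; this row alignment is inferred from the matching counts and is an assumption.
Attribute | IBM |
Language Models | YES |
N-GRAM Models | NO |
New Acoustic Models | NO |
Telephone Speech Models | NO |
Microphone Speech Models | YES |
File Input | YES |
Sentence output | YES |
Likelihood (prob.) output | NO |
N-Best output | NO |
Word lattice | NO |
Large vocabulary | YES |
Medium vocabulary | YES |
Small vocabulary | YES |
Add new words | YES |
Change grammar regions | YES |
Phonetic Mapping | NO |
Garbage, Noise, Junk | NO |
Client Server Architecture | NO |
Recognizer has an API | YES |
JSAPI | YES |
SAPI | YES |
ECTF Compliance | NO |
Galaxy Hub | NO |
Develop Galaxy Hub | YES |
FAST | YES |
Communicator Member | YES |
Multilingual Support | YES |
Customer Support | YES |
Other Training Data | YES |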