CMU Sphinx

Pocketsphinx
Stable release	5-prealpha / August 5, 2015; 9 years ago
Written in	C
Operating system	Cross-platform
Type	Speech recognition library
License	BSD-style
Website	cmusphinx.github.io/wiki/

Sphinx4
Stable release	5-prealpha / August 3, 2015; 9 years ago
Written in	Java
Operating system	Cross-platform
Type	Speech recognition library
License	BSD-style
Website	cmusphinx.github.io/wiki/

CMU Sphinx, also called Sphinx for short, is the general term for a group of speech recognition systems developed at Carnegie Mellon University. These include a series of speech recognizers (Sphinx 2 through 4) and an acoustic model trainer (SphinxTrain).

In 2000, the Sphinx group at Carnegie Mellon committed to releasing several speech recognizer components as open source, including Sphinx 2 and later Sphinx 3 (in 2001). The speech decoders come with acoustic models and sample applications. The available resources also include software for acoustic model training, language model compilation, and a public domain pronunciation dictionary, cmudict.

Sphinx encompasses a number of software systems, described below.

Sphinx

Sphinx is a continuous-speech, speaker-independent recognition system that uses hidden Markov acoustic models (HMMs) and an n-gram statistical language model. It was developed by Kai-Fu Lee. Sphinx demonstrated the feasibility of continuous-speech, speaker-independent, large-vocabulary recognition, the possibility of which was in dispute at the time (1986). Sphinx is now of historical interest only; it has been superseded in performance by subsequent versions. An archival article[2] describes the system in detail.

Sphinx 2

Sphinx 2 is a fast, performance-oriented recognizer, originally developed by Xuedong Huang at Carnegie Mellon and released as open source with a BSD-style license on SourceForge by Kevin Lenzo at LinuxWorld in 2000. Sphinx 2 focuses on real-time recognition suitable for spoken language applications. It therefore incorporates functionality such as end-pointing, partial hypothesis generation and dynamic language model switching. It is used in dialog systems and language learning systems, and it can be used in computer-based PBX systems such as Asterisk. Sphinx 2 code has also been incorporated into a number of commercial products. It is no longer under active development (other than for routine maintenance). Current real-time decoder development is taking place in the PocketSphinx project. An archival article[3] describes the system.

Sphinx 3

Sphinx 2 used a semi-continuous representation for acoustic modeling (i.e., a single set of Gaussians is used for all models, with individual models represented as a weight vector over these Gaussians). Sphinx 3 adopted the prevalent continuous HMM representation and has been used primarily for high-accuracy, non-real-time recognition. Subsequent advances in algorithms and hardware have made Sphinx 3 near real-time, although it is not yet suitable for critical interactive applications. Sphinx 3 is under active development and, in conjunction with SphinxTrain, provides access to a number of modern modeling techniques, such as LDA/MLLT, MLLR and VTLN, that improve recognition accuracy (see the article on speech recognition for descriptions of these techniques).
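The difference between the two representations can be sketched in a few lines of Python. This is an illustrative sketch, not code taken from any Sphinx release: it assumes a single feature stream, diagonal-covariance Gaussians and log-domain scoring, and the function names (`semicontinuous_scores`, `continuous_scores`) and toy parameter values are hypothetical. In the semi-continuous case, every state scores a frame against one shared codebook of Gaussians using only its own weight vector; in the continuous case, each state carries its own Gaussians.

```python
import math
from typing import List, Sequence, Tuple

# A diagonal-covariance Gaussian, stored as (mean vector, variance vector).
Gaussian = Tuple[List[float], List[float]]


def log_gaussian(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of frame x under a diagonal-covariance Gaussian."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a non-empty sequence."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def semicontinuous_scores(
    x: Sequence[float],
    codebook: List[Gaussian],
    state_weights: List[List[float]],
) -> List[float]:
    """Semi-continuous model (hypothetical sketch): all states share one codebook.

    The codebook densities are evaluated once per frame; each state then
    only contributes its own mixture-weight vector over that shared codebook.
    Each weight vector is assumed to contain at least one positive entry.
    """
    shared = [log_gaussian(x, mean, var) for mean, var in codebook]
    return [
        log_sum_exp([math.log(w) + g for w, g in zip(weights, shared) if w > 0.0])
        for weights in state_weights
    ]


def continuous_scores(
    x: Sequence[float],
    state_mixtures: List[Tuple[List[float], List[Gaussian]]],
) -> List[float]:
    """Continuous model (hypothetical sketch): every state owns its own
    Gaussians and weights, so Gaussian evaluations are not shared."""
    return [
        log_sum_exp([
            math.log(w) + log_gaussian(x, mean, var)
            for w, (mean, var) in zip(weights, gaussians)
            if w > 0.0
        ])
        for weights, gaussians in state_mixtures
    ]


if __name__ == "__main__":
    # Toy 2-dimensional codebook with two Gaussians (made-up values).
    codebook = [([0.0, 0.0], [1.0, 1.0]), ([3.0, 3.0], [1.0, 1.0])]
    state_weights = [[0.9, 0.1], [0.2, 0.8]]
    frame = [0.5, 0.2]
    print(semicontinuous_scores(frame, codebook, state_weights))

    # A continuous model in which each state happens to hold copies of the
    # same Gaussians gives the same scores, but evaluates them per state.
    mixtures = [(weights, codebook) for weights in state_weights]
    print(continuous_scores(frame, mixtures))
```

Because the semi-continuous version evaluates each codebook Gaussian once per frame, no matter how many states use it, it needs fewer Gaussian evaluations per frame than a continuous model with separate Gaussians per state. That saving fits a speed-oriented decoder, while giving each state its own Gaussians lets a continuous model fit the data more closely.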

Sphinx 4

Sphinx 4 is a complete rewrite of the Sphinx engine, written entirely in the Java programming language, with the goal of providing a more flexible framework for research in speech recognition. Sun Microsystems supported the development of Sphinx 4 and contributed software engineering expertise to the project. Participants included individuals at MERL, MIT and CMU. (Currently supported languages are C, C++, C#, Python, Ruby, Java, and JavaScript.)

Current development goals include:

developing a new acoustic model trainer
implementing speaker adaptation (e.g. MLLR)
improving configuration management
creating a graph-based UI for graphical system design

PocketSphinx

PocketSphinx is a version of Sphinx that can be used in embedded systems (e.g., those based on an ARM processor). It is under active development and incorporates features such as fixed-point arithmetic and efficient algorithms for GMM computation.

References

External links

The Sphinx developers now recommend Vosk
CMU Sphinx homepage
Sphinx's repository on GitHub, which should be considered the definitive source for code
SourceForge hosts older releases and files
NeXT on Campus, Fall 1990 (PostScript document compressed with gzip): "Carnegie Mellon University - Breakthroughs in speech recognition and document management", pp. 12-13

[1] http://www.speech.cs.cmu.edu/sphinx

[2] "lee_k_f_1990_1.pdf" (PDF).

[3] "huang92sphinxii.pdf" (PDF).
