Speex

libspeex
Developer(s)	Xiph.Org Foundation, Jean-Marc Valin
Initial release	1.0 / March 2003
Stable release	1.2.1 / June 16, 2022
Repository	gitlab.xiph.org/xiph/speex.git
Operating system	Cross-platform
Type	Audio codec, reference implementation
License	BSD-style license
Website	Xiph.org downloads

Speex
Filename extension	.spx
Internet media type	audio/x-speex, audio/speex, audio/ogg
Developed by	Xiph.Org Foundation, Jean-Marc Valin
Type of format	Lossy audio
Contained by	Ogg
Standard	RFC 5574
Open format?	Yes
Website	www.speex.org

Speex is an audio compression codec specifically tuned for the reproduction of human speech. It is a free software speech codec that may be used in voice over IP applications and podcasts.^[6] It is based on the code excited linear prediction speech coding algorithm.^[7] Its creators claim Speex to be free of any patent restrictions, and it is licensed under the revised (3-clause) BSD license. It may be used with the Ogg container format or transmitted directly over UDP/RTP. It may also be used with the FLV container format.^[8]

The Speex designers see their project as complementary to the Vorbis general-purpose audio compression project.

Speex is a lossy format, i.e. quality is permanently degraded to reduce file size.

The Speex project was created on February 13, 2002.^[9] The first development versions of Speex were released under the LGPL license, but as of version 1.0 beta 1, Speex is released under Xiph's version of the (revised) BSD license.^[10] Speex 1.0 was announced on March 24, 2003, after a year of development.^[11] The latest stable version of the Speex encoder and decoder is 1.2.1.^[3]

Xiph.Org now considers Speex obsolete; its successor is the more modern Opus codec, which incorporates technology from the SILK codec originally developed by Skype (now part of Microsoft) and surpasses Speex's performance in most areas except at the lowest sample rates.^[12]

Description

Speex is targeted at voice over IP (VoIP) and file-based compression. The design goal was a codec optimized for high-quality speech at low bit rates. To achieve this, the codec supports multiple bit rates and three bandwidths: ultra-wideband (32 kHz sampling rate), wideband (16 kHz sampling rate) and narrowband (telephone quality, 8 kHz sampling rate). Since Speex was designed for VoIP instead of cell phone use, the codec must be robust to lost packets, but not to corrupted ones. All this led to the choice of code excited linear prediction (CELP) as the encoding technique for Speex.^[7] One of the main reasons is that CELP has long proven that it can do the job and scale well to both low bit rates (as evidenced by DoD CELP at 4.8 kbit/s) and high bit rates (as with G.728 at 16 kbit/s). The main characteristics can be summarized as follows:

Free software/open-source, patent- and royalty-free.
Integration of narrowband and wideband in the same bit-stream.
Wide range of bit rates available (from 2 kbit/s to 44 kbit/s).
Dynamic bit rate switching and variable bit-rate (VBR).
Voice activity detection (VAD, integrated with VBR; not working since version 1.2).
Variable complexity.
Ultra-wideband mode at 32 kHz (up to 48 kHz).
Intensity stereo encoding option.
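
To give a feel for what the 2 kbit/s to 44 kbit/s range means for a VoIP packet, the following Python sketch converts a constant bit rate into bits and bytes per frame. It assumes a 20 ms frame duration, which the list above does not state; the function names are illustrative and are not part of libspeex, and the sample rates are simply points within the quoted range.

```python
import math

FRAME_MS = 20  # assumption: one Speex frame covers 20 ms of audio


def bits_per_frame(bit_rate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Bits occupied by one frame at a constant bit rate (hypothetical helper)."""
    return bit_rate_bps * frame_ms // 1000


def payload_bytes(bit_rate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Whole bytes needed to carry one frame, rounded up (hypothetical helper)."""
    return math.ceil(bits_per_frame(bit_rate_bps, frame_ms) / 8)


if __name__ == "__main__":
    # Low end, a mid-range rate, and high end of the quoted range.
    for rate in (2_000, 8_000, 44_000):
        print(f"{rate} bit/s: {bits_per_frame(rate)} bits, "
              f"{payload_bytes(rate)} bytes per frame")
```

Under these assumptions, even the highest rate needs only about a hundred bytes of codec payload per frame, before any RTP, UDP or IP headers are added.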

Features

Sampling rate: Speex is mainly designed for three different sampling rates: 8 kHz (the same sampling rate used to transmit telephone calls), 16 kHz, and 32 kHz. These are respectively referred to as narrowband, wideband and ultra-wideband.
Quality: Speex encoding is controlled most of the time by a quality parameter that ranges from 0 to 10. In constant bit-rate (CBR) operation, the quality parameter is an integer, while for variable bit-rate (VBR), the parameter is a real (floating point) number.
Complexity (variable): With Speex, it is possible to vary the complexity allowed for the encoder. This is done by controlling how the search is performed with an integer ranging from 1 to 10, in a way similar to the -1 to -9 options of gzip compression utilities. For normal use, the noise level at complexity 1 is between 1 and 2 dB higher than at complexity 10, but the CPU requirements for complexity 10 are about five times higher than for complexity 1. In practice, the best trade-off is between complexity 2 and 4,^[13] though higher settings are often useful when encoding non-speech sounds like DTMF tones, or if encoding is not in real-time.
Variable bit-rate (VBR): Variable bit-rate (VBR) allows a codec to change its bit rate dynamically to adapt to the "difficulty" of the audio being encoded. In the example of Speex, sounds like vowels and high-energy transients require a higher bit rate to achieve good quality, while fricatives (e.g. s and f sounds) can be coded adequately with fewer bits. For this reason, VBR can achieve a lower bit rate for the same quality, or a better quality for a certain bit rate. Despite its advantages, VBR has three main drawbacks: first, by only specifying quality, there is no guarantee about the final average bit-rate. Second, for some real-time applications like voice over IP (VoIP), what counts is the maximum bit-rate, which must be low enough for the communication channel. Third, encryption of VBR-encoded speech may not ensure complete privacy, as phrases can still be identified, at least in a controlled setting with a small dictionary of phrases,^[14] by analysing the pattern of variation of the bit rate.
Average bit-rate (ABR): Average bit-rate solves one of the problems of VBR, as it dynamically adjusts VBR quality in order to meet a specific target bit-rate. Because the quality/bit-rate is adjusted in real-time (open-loop), the global quality will be slightly lower than that obtained by encoding in VBR with exactly the right quality setting to meet the target average bitrate.
Voice activity detection (VAD): When enabled, voice activity detection detects whether the audio being encoded is speech or silence/background noise. VAD is always implicitly activated when encoding in VBR, so the option is only useful in non-VBR operation. In this case, Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called "comfort noise generation" (CNG). The last version in which VAD worked properly is 1.1.12; since version 1.2 it has been replaced with a simple Any Activity Detection.
Discontinuous transmission (DTX): Discontinuous transmission is an addition to VAD/VBR operation which allows transmission to stop completely when the background noise is stationary. In a file, 5 bits are used for each missing frame (corresponding to 250 bit/s).
Perceptual enhancement: Perceptual enhancement is a part of the decoder which, when turned on, tries to reduce (the perception of) the noise produced by the coding/decoding process. In most cases, perceptual enhancement makes the sound further from the original objectively (signal-to-noise ratio), but in the end it still sounds better (subjective improvement).
Algorithmic delay: Every codec introduces a delay in the transmission. For Speex, this delay is equal to the frame size, plus some amount of "look-ahead" required to process each frame. In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16 kHz), the delay is 34 ms. These values do not account for the CPU time it takes to encode or decode the frames.
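
The figures in the sampling rate, DTX and algorithmic delay entries can be checked with a few lines of arithmetic. The Python sketch below assumes a 20 ms frame, which is implied by 5 bits per frame corresponding to 250 bit/s. The look-ahead values of 10 ms and 14 ms are not given in the text; they are inferred here by subtracting that frame size from the stated delays. All names are hypothetical and none of this is libspeex code.

```python
FRAME_MS = 20  # assumption: frame duration implied by 5 bits <-> 250 bit/s

# Sampling rates named in the text, in Hz.
SAMPLE_RATES_HZ = {
    "narrowband": 8_000,
    "wideband": 16_000,
    "ultra-wideband": 32_000,
}

# Assumption: look-ahead inferred from the stated delays (30 - 20, 34 - 20).
LOOKAHEAD_MS = {"narrowband": 10, "wideband": 14}


def samples_per_frame(band: str) -> int:
    """Number of PCM samples in one frame for the given band."""
    return SAMPLE_RATES_HZ[band] * FRAME_MS // 1000


def dtx_bit_rate(bits_per_missing_frame: int = 5) -> int:
    """Bit rate in bit/s spent on frames skipped by DTX in a file."""
    return bits_per_missing_frame * 1000 // FRAME_MS


def algorithmic_delay_ms(band: str) -> int:
    """Delay = frame size + look-ahead, excluding CPU time."""
    return FRAME_MS + LOOKAHEAD_MS[band]


if __name__ == "__main__":
    for band in SAMPLE_RATES_HZ:
        print(band, samples_per_frame(band), "samples per frame")
    print("DTX overhead:", dtx_bit_rate(), "bit/s")
    for band in LOOKAHEAD_MS:
        print(band, "delay:", algorithmic_delay_ms(band), "ms")
```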

Applications

There is a large base of applications supporting the Speex codec. Examples include:

Streaming applications like teleconference (e.g. TeamSpeak, Mumble)
VoIP systems (e.g. Asterisk)
Video games (e.g. Xbox Live,^[15] Civilization 4, DropMix vocal tracks, ...)
Audio processing applications.

Most of these are based on the DirectShow filter or OpenACM codec (e.g. Microsoft NetMeeting) on Microsoft Windows, or Xiph.org's reference implementation, libspeex, on Linux (e.g. Ekiga). There are also plugins for many audio players. See the plugin and software page on the speex.org site for more details.^[16]

The media type for Speex is audio/ogg when contained by Ogg, and audio/speex (previously audio/x-speex) when transported through RTP or without a container.
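
As a minimal sketch of the rule above, the following Python function picks the media type from the way a Speex stream is carried. The function name and the carrier labels "ogg", "rtp" and "raw" are hypothetical choices for this illustration; only the media type strings come from the text.

```python
# Media types named in the text, keyed by a hypothetical carrier label.
MEDIA_TYPES = {
    "ogg": "audio/ogg",    # Speex contained by Ogg
    "rtp": "audio/speex",  # Speex transported through RTP
    "raw": "audio/speex",  # Speex without a container
}

# Older name for audio/speex, kept only for recognising legacy content.
LEGACY_MEDIA_TYPE = "audio/x-speex"


def speex_media_type(carrier: str) -> str:
    """Return the media type for a Speex stream carried as `carrier`."""
    key = carrier.lower()
    if key not in MEDIA_TYPES:
        # Other carriers are outside the rule stated in the text.
        raise ValueError(f"no media type rule for carrier: {carrier!r}")
    return MEDIA_TYPES[key]


if __name__ == "__main__":
    for carrier in ("ogg", "rtp", "raw"):
        print(carrier, "->", speex_media_type(carrier))
```

Raising an error for unknown carriers is a deliberate choice here, so that the sketch does not guess a media type the text does not specify.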

The United States Army's Land Warrior system, designed by General Dynamics, also uses Speex for VoIP on an EPLRS radio designed by Raytheon.

The Ear Bible^[17] is a single-ear headphone with a built-in Speex player with 1 GB of flash memory,^[18] preloaded with a recording of the New American Standard Bible.

ASL Safety & Security's^[19] Linux-based VIPA OS software,^[20] which is used in long line public address systems and voice alarm systems at major international air transport hubs and rail networks, also uses Speex.

The Rockbox project uses Speex for its voice interface. It can also play Speex files on supported players, such as the Apple iPod or the iRiver H10.

The Vernier LabQuest^[21] handheld data acquisition device for science education uses Speex for voice annotations created by students and teachers using either the built-in or an external microphone.

The Google Mobile App for iPhone currently incorporates Speex.^[22] It has also been suggested that the new Google voice search iPhone app is using Speex to transmit voice to Google servers for interpretation.^[23]

Adobe Flash Player supports Speex starting with Flash Player 10.0.12.36, released in October 2008.^[24] Because of some bugs in Flash Player, the recommended versions for Speex support are 10.0.22.87 and later. Speex in Flash Player can be used for both kinds of communication, through Flash Media Server or P2P. Speex can be decoded or converted to any format, unlike Nellymoser audio, which was the only speech format in previous versions of Flash Player.^[25]^[26] Speex can also be used in the Flash Video container format (.flv), starting with version 10 of the Video File Format Specification (published in November 2008).^[27]

The JavaSonics ListenUp^[28] voice recorder uses Speex to compress voice messages that are recorded in a browser and then uploaded to a web server. Primary applications are language training, transcription and social networking.

Speex is used as the voice compression algorithm in the Siri voice assistant on the iPhone 4S.^[29] Since speech recognition occurs on Apple's servers, the Speex codec is used to minimize network bandwidth.

Sources

References

[1] "PlayOgg! - FSF - Free Software Foundation". 2010-03-17. Retrieved 2013-10-01.
[2] Jean-Marc Valin (2009). "people.xiph.org - personal webspace of the xiphs - Jean-Marc Valin". Xiph.Org. Retrieved 2009-09-11.
[3] "Speex News". Xiph.Org Foundation. Retrieved 2023-04-13.
[4] "The Speex Codec Manual - Speex License". Xiph.Org Foundation. Retrieved 2009-09-01.
[5] "Sample Xiph.Org Variant of the BSD License". Xiph.Org Foundation. Retrieved 2009-08-29.
[6] Xiph.Org. Speex: A Free Codec For Free Speech. Retrieved 2009-09-01.
[7] Xiph.Org. Introduction to CELP Coding. Retrieved 2009-09-01.
[8] Adobe. FLV format specification. Retrieved 2016-04-18.
[9] Xiph.org. Speex releases - pre-1.0 - NEWS and ChangeLog in speex-0.0.1.tar.gz. Retrieved 2009-09-01.
[10] Xiph.Org. Speex FAQ – Under what license is Speex released? Retrieved 2009-09-01.
[11] Xiph.Org (2003-03-24). Speex reaches 1.0; Xiph.Org now a 501(c)(3) Non-Profit Organization. Retrieved 2009-09-01.
[12] Speex homepage. Retrieved 2017-04-11.
[13] "Codec description". www.speex.org.
[14] "Spot me if you can: Uncovering Spoken Phrases in Encrypted VoIP Conversations (Charles V. Wright, Lucas Ballard, Scott E. Coull, Fabian Monrose, Gerald M. Masson)" (PDF).
[15] As announced by Ralph Giles, the Theora codec maintainer, on LugRadio episode 29.
[16] "A free codec for free speech". Speex. Retrieved 2012-12-29.
[17] Lascelles, LLC. "The worlds most convenient Audio Bible". Ear Bible. Retrieved 2012-12-29.
[18] Lascelles, LLC. "Support". Ear Bible. Retrieved 2012-12-29.
[19] "PA/VA, PSIM Software and Station Management Systems > ASL Safety & Security". Asl-control.co.uk. Retrieved 2012-12-29.
[20] IPAM 400: IP Based Intelligent Public Address Amplifier - User Manual. Archived 2011-09-04 at the Wayback Machine.
[21] "LabQuest 2 > Vernier Software & Technology". Vernier.com. 2012-05-23. Retrieved 2012-12-29.
[22] "Legal Notices". Google Inc. Retrieved 2014-12-05.
[23] Baio, Andy (November 18, 2008). "Deconstructing Google Mobile's Voice Search on the iPhone".
[24] Adobe (2008). Flash Player 10 Datasheet. Retrieved 2009-09-01.
[25] AskMeFlash.com (2009-05-10). Speex for Flash. Retrieved 2009-08-12.
[26] AskMeFlash.com (2009-05-10). Speex vs Nellymoser. Archived 2009-04-15 at the Wayback Machine. Retrieved 2009-08-12.
[27] Adobe Systems Incorporated (November 2008). "Video File Format Specification, Version 10" (PDF). Adobe Systems Incorporated. Archived from the original (PDF) on 2010-09-23. Retrieved 2014-12-05.
[28] Phil Burk. "JavaSonics ListenUp voice recording Applet for Java that uploads messages to a web server". Javasonics.com. Retrieved 2012-12-29.
[29] "Applidium — News". Applidium.com. Archived from the original on 2011-11-16. Retrieved 2012-12-29.

External links

RFC 5574 – RTP Payload Format for the Speex Codec
Official Speex homepage
Plugin & software page
JSpeex, a port of Speex to the Java platform
NSpeex, a port of Speex to the .NET platform and Silverlight based on JSpeex (archived 2011-12-03 at the Wayback Machine)
CSpeex, a port of Speex to the .NET platform based on JSpeex (archived 2009-12-13 at the Wayback Machine)
RFC 5334 – Ogg Media Types
Speex Encoder Player (César MBUMBA) (archived 2013-12-21 at the Wayback Machine)
