
sevagh/pitch-detection


pitch-detection

Autocorrelation-based C++ pitch detection algorithms with O(n log n) or lower running time.

The size of the FFT used is the same as the size of the input waveform, such that the output is a single pitch for the entire waveform.

Librosa (among other libraries) uses the STFT to create frames of the input waveform, and applies pitch tracking to each frame with a fixed FFT size (typically 2048 or some other power of two). If you want to track the temporal evolution of pitches in sub-sections of the waveform, you have to handle the waveform splitting yourself (look at wav_analyzer for more details).
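To make "autocorrelation-based" concrete, here is a minimal sketch of the underlying idea: correlate the waveform with delayed copies of itself and take the lag with the strongest match as the period. This is not the library's implementation. It is a direct O(n^2) loop with no FFT, no normalization, and no peak interpolation, and the names `naive_autocorrelation_pitch` and `make_sine`, the 50-1000 Hz search range, and the -1 "no pitch" return value are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical test-signal helper: n samples of a sine wave at `hz`.
std::vector<double> make_sine(double hz, int sample_rate, std::size_t n)
{
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<double> x(n);
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = std::sin(two_pi * hz * static_cast<double>(i) / sample_rate);
    }
    return x;
}

// Illustrative only: estimate one pitch for the whole buffer by finding the
// lag (in samples) that maximizes the raw autocorrelation sum, searching only
// lags that correspond to frequencies between min_hz and max_hz.
// Returns -1.0 if no lag with a positive correlation is found.
double naive_autocorrelation_pitch(const std::vector<double> &x, int sample_rate,
                                   double min_hz = 50.0, double max_hz = 1000.0)
{
    assert(sample_rate > 0 && min_hz > 0.0 && max_hz > min_hz);

    const std::size_t min_lag =
        std::max<std::size_t>(1, static_cast<std::size_t>(sample_rate / max_hz));
    const std::size_t max_lag = static_cast<std::size_t>(sample_rate / min_hz);

    std::size_t best_lag = 0;
    double best_value = 0.0;

    for (std::size_t lag = min_lag; lag <= max_lag && lag < x.size(); ++lag) {
        // r[lag] = sum over i of x[i] * x[i + lag]
        double sum = 0.0;
        for (std::size_t i = 0; i + lag < x.size(); ++i) {
            sum += x[i] * x[i + lag];
        }
        if (sum > best_value) {
            best_value = sum;
            best_lag = lag;
        }
    }

    if (best_lag == 0) {
        return -1.0;
    }
    return sample_rate / static_cast<double>(best_lag);
}
```

Because the lag is an integer number of samples, the estimate is only as fine as the sample period allows: a 440 Hz sine at 48000 Hz has a period of about 109.09 samples, so this sketch can only land on a neighbouring whole-sample lag.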

📯 Latest news 📰

Dec 27, 2023 🎅 release:

  • Removed SWIPE' algorithm
    • It is not based on autocorrelation, I skipped it in all of the tests, and my implementation was basically copy-pasted from kylebgorman/swipe: just use their code instead!
  • Fix autocorrelation (in YIN and MPM) for power-of-two sizes in FFTS (see ffts issue #65) by using r2c/c2r transforms (addresses bug #72 reported by jeychenne)
  • Fix PYIN bugs to pass all test cases (addresses jansommer's comments in pull-request #84)
  • Added many more unit tests, all passing (228/228)

Other programming languages

  • Go: Go implementation of YIN in this repo (for tutorial purposes)
  • Rust: Rust implementation of MPM in this repo (for tutorial purposes)
  • Python: transcribe is a Python version of MPM for a proof-of-concept of primitive pitch transcription
  • Javascript (WebAssembly): pitchlite has WASM modules of MPM/YIN running at realtime speeds in the browser, and also introduces sub-chunk detection to return the overall pitch of the chunk and the temporal sub-sequence of pitches within the chunk

Usage

Suggested usage of this library can be seen in the utility wav_analyzer, which divides a wav file into chunks of 0.01s and checks the pitch of each chunk. The core calls for a single chunk:

std::vector<float> chunk; // chunk of audio

float pitch_mpm = pitch::mpm(chunk, sample_rate);
float pitch_yin = pitch::yin(chunk, sample_rate);
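The chunking step described above can be sketched as follows. This is a minimal illustration and not the code from wav_analyzer: `split_into_chunks` is a hypothetical helper, and dropping the trailing partial chunk is an assumption made here so that every chunk has the same size.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the library): split `audio` into
// consecutive, non-overlapping chunks of `chunk_seconds` each.
// Assumption: a trailing partial chunk is dropped.
std::vector<std::vector<float>> split_into_chunks(const std::vector<float> &audio,
                                                  int sample_rate,
                                                  double chunk_seconds)
{
    assert(sample_rate > 0 && chunk_seconds > 0.0);

    // Round to the nearest whole sample, e.g. 44100 Hz * 0.01 s = 441 samples.
    const auto chunk_size =
        static_cast<std::size_t>(std::lround(sample_rate * chunk_seconds));

    std::vector<std::vector<float>> chunks;
    if (chunk_size == 0) {
        return chunks;
    }

    for (std::size_t start = 0; start + chunk_size <= audio.size();
         start += chunk_size) {
        const auto first = audio.begin() + static_cast<std::ptrdiff_t>(start);
        const auto last = first + static_cast<std::ptrdiff_t>(chunk_size);
        chunks.emplace_back(first, last);
    }
    return chunks;
}
```

Each returned chunk can then be passed to `pitch::mpm` or `pitch::yin` as in the snippet above.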

Tests

Unit tests

There are unit tests that use sinewaves (both generated with std::sin and with librosa.tone), and instrument tests using txt files containing waveform samples from the University of Iowa MIS recordings:

$ ./build/pitch_tests
Running main() from ./googletest/src/gtest_main.cc
[==========] Running 228 tests from 22 test suites.
[----------] Global test environment set-up.
[----------] 2 tests from MpmSinewaveTestManualAllocFloat
[ RUN ] MpmSinewaveTestManualAllocFloat.OneAllocMultipleFreqFromFile
[ OK ] MpmSinewaveTestManualAllocFloat.OneAllocMultipleFreqFromFile (38 ms)
...
[----------] 5 tests from YinInstrumentTestFloat
...
[ RUN ] YinInstrumentTestFloat.Acoustic_E2_44100
[ OK ] YinInstrumentTestFloat.Acoustic_E2_44100 (1 ms)
[ RUN ] YinInstrumentTestFloat.Classical_FSharp4_48000
[ OK ] YinInstrumentTestFloat.Classical_FSharp4_48000 (58 ms)
[----------] 5 tests from YinInstrumentTestFloat (174 ms total)
...
[----------] 5 tests from MpmInstrumentTestFloat
[ RUN ] MpmInstrumentTestFloat.Violin_A4_44100
[ OK ] MpmInstrumentTestFloat.Violin_A4_44100 (61 ms)
[ RUN ] MpmInstrumentTestFloat.Piano_B4_44100
[ OK ] MpmInstrumentTestFloat.Piano_B4_44100 (24 ms)

...
[==========] 228 tests from 22 test suites ran. (2095 ms total)
[ PASSED ] 228 tests.

Degraded audio tests

All testing files are here - the progressive degradations are described by the respective numbered JSON file, generated using audio-degradation-toolbox. The original clip is a viola playing E3 from the University of Iowa MIS. The results come from parsing the output of wav_analyzer to count how many 0.1s slices of the input clip were in the ballpark of the expected value of 164.81 Hz - I considered anything in 160-169 Hz to be acceptable:

Degradation level   MPM # correct   YIN # correct
0                   26              22
1                   23              21
2                   19              21
3                   18              19
4                   19              19
5                   18              19
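The scoring rule used for this table can be sketched as below. This is an illustration and not the script that produced the results: `midi_to_hz` and `count_in_range` are hypothetical names, and the sketch assumes equal temperament with A4 = 440 Hz, under which E3 (MIDI note 52) comes out at about 164.81 Hz.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Equal-temperament frequency of a MIDI note, assuming A4 (MIDI 69) = 440 Hz.
double midi_to_hz(int midi_note)
{
    return 440.0 * std::pow(2.0, (midi_note - 69) / 12.0);
}

// Count how many per-slice pitch estimates fall inside [lo_hz, hi_hz].
std::size_t count_in_range(const std::vector<double> &pitches, double lo_hz,
                           double hi_hz)
{
    assert(lo_hz <= hi_hz);
    std::size_t count = 0;
    for (const double pitch : pitches) {
        if (pitch >= lo_hz && pitch <= hi_hz) {
            ++count;
        }
    }
    return count;
}
```

With one estimate per slice collected into a vector, `count_in_range(estimates, 160.0, 169.0)` gives the "# correct" figure for one algorithm at one degradation level.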

Build and install

You need Linux, cmake, and gcc (I don't officially support other platforms). The library depends on ffts and mlpack. The tests depend on libnyquist, googletest, and google benchmark. Dependency graph: dep-graph

Build and install with cmake:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build "build"

# install to your system
cd build && make install

# run tests and benches
./build/pitch_tests
./build/pitch_bench

# run wav_analyzer
./build/wav_analyzer

Docker

To simplify the setup, there's a Dockerfile that sets up an Ubuntu container with all the dependencies for compiling the library and running the included tests and benchmarks:

# build
$ docker build --rm --pull -f "Dockerfile" -t pitchdetection:latest "."
$ docker run --rm --init -it pitchdetection:latest

n.b. You can pull the esimkowitz/pitchdetection image from DockerHub, but I can't promise that it's up-to-date.

Detailed usage

Read the header and the example wav_analyzer program.

The namespaces are pitch and pitch_alloc. The functions and classes are templated for <double> and <float> support.

The pitch namespace functions perform automatic buffer allocation, while pitch_alloc::{Yin, Mpm} give you a reusable object (useful for computing pitch for multiple uniformly-sized buffers):

#include <pitch_detection.h>
#include <vector>

std::vector<double> audio_buffer(8192);

// one-shot functions: buffers are allocated automatically on every call
double pitch_yin = pitch::yin<double>(audio_buffer, 48000);
double pitch_mpm = pitch::mpm<double>(audio_buffer, 48000);
double pitch_pyin = pitch::pyin<double>(audio_buffer, 48000);
double pitch_pmpm = pitch::pmpm<double>(audio_buffer, 48000);

// reusable objects: allocate once for a fixed buffer size of 8192
pitch_alloc::Mpm<double> ma(8192);
pitch_alloc::Yin<double> ya(8192);

for (int i = 0; i < 10000; ++i) {
    auto pitch_yin = ya.pitch(audio_buffer, 48000);
    auto pitch_mpm = ma.pitch(audio_buffer, 48000);
    auto pitch_pyin = ya.probabilistic_pitch(audio_buffer, 48000);
    auto pitch_pmpm = ma.probabilistic_pitch(audio_buffer, 48000);
}