sevagh/pitch-detection

pitch-detection

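Autocorrelation-based C++ pitch detection algorithms with O(n log n) or lower running time. The API described later in this README exposes YIN, MPM, and their probabilistic variants (PYIN and PMPM).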

The size of the FFT used is the same as the size of the input waveform, such that the output is a single pitch for the entire waveform.
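As a rough illustration of what "autocorrelation-based" means, here is a minimal sketch that picks the lag with the strongest autocorrelation and converts it to a frequency. It is not this library's implementation: it uses a naive O(n^2) time-domain sum instead of an FFT, and the names `naive_autocorrelation_pitch` and `make_sine`, as well as the 50-2000 Hz search range, are assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of this library: estimate one pitch for the
// whole buffer by picking the lag with the largest time-domain
// autocorrelation. Returns -1 when the buffer is too short for the lag range
// or when no lag has a positive correlation.
double naive_autocorrelation_pitch(const std::vector<double> &x, int sample_rate,
                                   double min_hz = 50.0, double max_hz = 2000.0)
{
    const std::size_t min_lag = static_cast<std::size_t>(sample_rate / max_hz);
    const std::size_t max_lag = static_cast<std::size_t>(sample_rate / min_hz);
    if (min_lag == 0 || x.size() <= max_lag)
        return -1.0;

    std::size_t best_lag = 0;
    double best_value = 0.0;
    for (std::size_t lag = min_lag; lag <= max_lag; ++lag) {
        // r[lag] = sum over i of x[i] * x[i + lag]
        double acc = 0.0;
        for (std::size_t i = 0; i + lag < x.size(); ++i)
            acc += x[i] * x[i + lag];
        if (acc > best_value) {
            best_value = acc;
            best_lag = lag;
        }
    }
    return best_lag == 0 ? -1.0 : static_cast<double>(sample_rate) / best_lag;
}

// Hypothetical helper: generate a sine wave to feed the estimator.
std::vector<double> make_sine(double freq_hz, int sample_rate, std::size_t n)
{
    const double two_pi = 6.283185307179586;
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::sin(two_pi * freq_hz * i / sample_rate);
    return out;
}
```

Because the lag is an integer number of samples, the estimate is quantized: a 440 Hz sine at 48000 Hz lands near, but not exactly on, 440 Hz. YIN and MPM refine this basic idea considerably.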

Librosa (among other libraries) uses the STFT to create frames of the input waveform, and applies pitch tracking to each frame with a fixed FFT size (typically 2048 or some other power of two). If you want to track the temporal evolution of pitches in sub-sections of the waveform, you have to handle the waveform splitting yourself (look at wav_analyzer for more details).
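The waveform splitting mentioned above can be sketched as follows. This is a minimal example, not code from wav_analyzer: `pitch_per_chunk` is a hypothetical helper, and dropping the trailing partial chunk is an assumption made to keep every chunk the same size.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of this library: split `audio` into
// consecutive, non-overlapping chunks of `chunk_size` samples and call
// `detect(chunk, sample_rate)` on each one. A trailing partial chunk is
// dropped so that every chunk has the same size.
template <typename T, typename Detector>
std::vector<T> pitch_per_chunk(const std::vector<T> &audio, std::size_t chunk_size,
                               int sample_rate, Detector detect)
{
    std::vector<T> pitches;
    if (chunk_size == 0)
        return pitches;
    for (std::size_t start = 0; start + chunk_size <= audio.size(); start += chunk_size) {
        // copy the current window into its own buffer
        std::vector<T> chunk(audio.data() + start, audio.data() + start + chunk_size);
        pitches.push_back(detect(chunk, sample_rate));
    }
    return pitches;
}
```

With this library, `detect` could be a lambda that forwards to one of the functions documented in this README, for example `[](const std::vector<float> &c, int sr) { return pitch::mpm<float>(c, sr); }`.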

📯 Latest news 📰

Dec 27, 2023 🎅 release:

  • Removed SWIPE' algorithm
    • It is not based on autocorrelation, I skipped it in all of the tests, and my implementation was basically copy-pasted from kylebgorman/swipe: just use their code instead!
  • Fix autocorrelation (in YIN and MPM) for power-of-two sizes in FFTS (see ffts issue #65) by using r2c/c2r transforms (addresses bug #72 reported by jeychenne)
  • Fix PYIN bugs to pass all test cases (addresses jansommer's comments in pull-request #84)
  • Added many more unit tests, all passing (228/228)

Other programming languages

  • Go: Go implementation of YIN in this repo (for tutorial purposes)
  • Rust: Rust implementation of MPM in this repo (for tutorial purposes)
  • Python: transcribe is a Python version of MPM for a proof-of-concept of primitive pitch transcription
  • JavaScript (WebAssembly): pitchlite has WASM modules of MPM/YIN running at real-time speeds in the browser, and also introduces sub-chunk detection to return the overall pitch of the chunk and the temporal sub-sequence of pitches within the chunk

Usage

Suggested usage of this library can be seen in the utility wav_analyzer, which divides a wav file into chunks of 0.01s and checks the pitch of each chunk. The core calls are:

std::vector<float> chunk; // chunk of audio

float pitch_mpm = pitch::mpm(chunk, sample_rate);
float pitch_yin = pitch::yin(chunk, sample_rate);

Tests

Unit tests

There are unit tests that use sinewaves (both generated with std::sin and with librosa.tone), and instrument tests using txt files containing waveform samples from the University of Iowa MIS recordings:

$ ./build/pitch_tests
Running main() from ./googletest/src/gtest_main.cc
[==========] Running 228 tests from 22 test suites.
[----------] Global test environment set-up.
[----------] 2 tests from MpmSinewaveTestManualAllocFloat
[ RUN      ] MpmSinewaveTestManualAllocFloat.OneAllocMultipleFreqFromFile
[       OK ] MpmSinewaveTestManualAllocFloat.OneAllocMultipleFreqFromFile (38 ms)
...
[----------] 5 tests from YinInstrumentTestFloat
...
[ RUN      ] YinInstrumentTestFloat.Acoustic_E2_44100
[       OK ] YinInstrumentTestFloat.Acoustic_E2_44100 (1 ms)
[ RUN      ] YinInstrumentTestFloat.Classical_FSharp4_48000
[       OK ] YinInstrumentTestFloat.Classical_FSharp4_48000 (58 ms)
[----------] 5 tests from YinInstrumentTestFloat (174 ms total)
...
[----------] 5 tests from MpmInstrumentTestFloat
[ RUN      ] MpmInstrumentTestFloat.Violin_A4_44100
[       OK ] MpmInstrumentTestFloat.Violin_A4_44100 (61 ms)
[ RUN      ] MpmInstrumentTestFloat.Piano_B4_44100
[       OK ] MpmInstrumentTestFloat.Piano_B4_44100 (24 ms)

...
[==========] 228 tests from 22 test suites ran. (2095 ms total)
[  PASSED  ] 228 tests.

Degraded audio tests

All testing files are here. The progressive degradations are described by the respective numbered JSON file, generated using audio-degradation-toolbox. The original clip is a viola playing E3 from the University of Iowa MIS. The results come from parsing the output of wav_analyzer to count how many 0.1s slices of the input clip were in the ballpark of the expected value of 164.81 Hz. I considered anything in the range 160-169 Hz to be acceptable:

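| Degradation level | MPM # correct | YIN # correct |
|-------------------|---------------|---------------|
| 0                 | 26            | 22            |
| 1                 | 23            | 21            |
| 2                 | 19            | 21            |
| 3                 | 18            | 19            |
| 4                 | 19            | 19            |
| 5                 | 18            | 19            |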
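The scoring described above can be sketched in a few lines. This is not the script that produced the table: `midi_to_hz` and `count_in_range` are hypothetical helpers, the inclusive bounds are an assumption, and E3 is taken to be MIDI note 52 under the common convention that A4 (MIDI 69) is 440 Hz.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Equal-temperament frequency of a MIDI note number, with A4 (MIDI 69) = 440 Hz.
double midi_to_hz(int midi_note)
{
    return 440.0 * std::pow(2.0, (midi_note - 69) / 12.0);
}

// Hypothetical scoring helper: count how many detected pitches fall inside
// the inclusive range [lo_hz, hi_hz].
std::size_t count_in_range(const std::vector<double> &pitches, double lo_hz, double hi_hz)
{
    std::size_t n = 0;
    for (double p : pitches)
        if (p >= lo_hz && p <= hi_hz)
            ++n;
    return n;
}
```

Here `midi_to_hz(52)` evaluates to roughly 164.81 Hz, matching the expected value quoted above, and `count_in_range(pitches, 160.0, 169.0)` gives the per-clip score for a list of per-slice pitches.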

Build and install

You need Linux, cmake, and gcc (I don't officially support other platforms). The library depends on ffts and mlpack. The tests depend on libnyquist, googletest, and google benchmark.

Build and install with cmake:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build "build"

# install to your system (run from the repository root)
cmake --build build --target install

# run tests and benches
./build/pitch_tests
./build/pitch_bench

# run wav_analyzer
./build/wav_analyzer

Docker

To simplify the setup, there's a Dockerfile that sets up an Ubuntu container with all the dependencies for compiling the library and running the included tests and benchmarks:

# build the image
$ docker build --rm --pull -f "Dockerfile" -t pitchdetection:latest "."

# run the container
$ docker run --rm --init -it pitchdetection:latest

n.b. You can pull the esimkowitz/pitchdetection image from DockerHub, but I can't promise that it's up-to-date.

Detailed usage

Read the header and the example wav_analyzer program.

The namespaces are pitch and pitch_alloc. The functions and classes are templated for <double> and <float> support.

The pitch namespace functions perform automatic buffer allocation, while pitch_alloc::{Yin, Mpm} give you a reusable object (useful for computing pitch for multiple uniformly-sized buffers):

#include <pitch_detection.h>
#include <vector>

std::vector<double> audio_buffer(8192);

// one-shot functions: buffers are allocated automatically
double pitch_yin = pitch::yin<double>(audio_buffer, 48000);
double pitch_mpm = pitch::mpm<double>(audio_buffer, 48000);
double pitch_pyin = pitch::pyin<double>(audio_buffer, 48000);
double pitch_pmpm = pitch::pmpm<double>(audio_buffer, 48000);

// reusable objects, created once for buffers of 8192 samples
pitch_alloc::Mpm<double> ma(8192);
pitch_alloc::Yin<double> ya(8192);

// reuse the same objects for many buffers of the same size
for (int i = 0; i < 10000; ++i) {
    auto pitch_yin = ya.pitch(audio_buffer, 48000);
    auto pitch_mpm = ma.pitch(audio_buffer, 48000);
    auto pitch_pyin = ya.probabilistic_pitch(audio_buffer, 48000);
    auto pitch_pmpm = ma.probabilistic_pitch(audio_buffer, 48000);
}
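To illustrate why a reusable object helps with many uniformly-sized buffers, here is a minimal sketch of the allocate-once pattern. It is not how pitch_alloc is implemented: `ReusableAutocorrelation` is a hypothetical class, it uses a naive O(n^2) sum rather than an FFT, and throwing on a size mismatch is an assumption of the sketch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical class, not this library's implementation: the scratch buffer
// is allocated once in the constructor and reused by every call to compute().
class ReusableAutocorrelation {
  public:
    explicit ReusableAutocorrelation(std::size_t size) : scratch_(size, 0.0) {}

    // Writes r[lag] = sum over i of x[i] * x[i + lag] into the preallocated
    // buffer and returns a reference to it. No allocation happens here.
    // Throws if the input size differs from the size given at construction.
    const std::vector<double> &compute(const std::vector<double> &x)
    {
        if (x.size() != scratch_.size())
            throw std::invalid_argument("buffer size mismatch");
        for (std::size_t lag = 0; lag < x.size(); ++lag) {
            double acc = 0.0;
            for (std::size_t i = 0; i + lag < x.size(); ++i)
                acc += x[i] * x[i + lag];
            scratch_[lag] = acc;
        }
        return scratch_;
    }

  private:
    std::vector<double> scratch_;
};
```

The returned reference is only valid until the next call to `compute()`, which is the usual trade-off of reusing internal buffers: copy the result if you need to keep it.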