A virtual musical instrument inspired by the Theremin. You control the instrument's pitch by moving your thumb up and down in front of a webcam. Quick demo:
Motion tracking in this application is based on Mean Shift (OpenCV's Mean Shift tutorial was very useful for this). The user's thumb is tracked, and its position on the screen determines the pitch to be played. This logic can be summarized as follows:
- Record a color sample of the user's thumb (or any object for that matter) and calculate a histogram for it.
- Capture a camera frame, back-project the sample histogram onto it (see Back Projection) and obtain a confidence map that assigns each pixel a probability of belonging to the thumb.
- Detect the thumb's current location by locally maximizing confidence values using Mean Shift.
- Calculate a pitch frequency according to the thumb's position relative to the playing region and feed it to the audio back end to generate sound.
- Go back to the frame capture step.
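The Mean Shift step in the list above can be sketched in plain C++ with no OpenCV dependency. This is only an illustration under stated assumptions: `Window`, `ConfidenceMap` and `meanShift` are hypothetical names, and the confidence map is taken to be the row-major output of the back-projection step.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not the project's own declarations.
struct Window { int x, y, width, height; };

// Row-major confidence map holding one value in [0, 1] per pixel.
struct ConfidenceMap {
    int width, height;
    std::vector<float> values;
    float at(int x, int y) const {
        return values[static_cast<std::size_t>(y) * width + x];
    }
};

// Repeatedly re-centers the window on the centroid of the confidence mass
// it currently covers, stopping when the window no longer moves.
Window meanShift(const ConfidenceMap& map, Window w, int maxIterations = 20) {
    for (int it = 0; it < maxIterations; ++it) {
        double mass = 0.0, sumX = 0.0, sumY = 0.0;
        for (int y = w.y; y < w.y + w.height; ++y) {
            for (int x = w.x; x < w.x + w.width; ++x) {
                // Skip the part of the window that falls outside the frame.
                if (x < 0 || y < 0 || x >= map.width || y >= map.height) continue;
                const double c = map.at(x, y);
                mass += c;
                sumX += c * x;
                sumY += c * y;
            }
        }
        if (mass <= 0.0) break;  // no confidence inside the window to follow

        // Top-left corner that puts the window center on the centroid.
        const int newX = static_cast<int>(std::lround(sumX / mass - (w.width - 1) / 2.0));
        const int newY = static_cast<int>(std::lround(sumY / mass - (w.height - 1) / 2.0));
        if (newX == w.x && newY == w.y) break;  // converged
        w.x = newX;
        w.y = newY;
    }
    return w;
}
```

Starting each frame from the window found in the previous frame keeps the search local, which is why the method finds a local rather than a global maximum of the confidence values.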
As a final project for a Computer Organization course at the University of Buenos Aires (UBA), a motion tracking application using this same tracking algorithm was also implemented purely with Intel x86 Streaming SIMD Extensions (SSE). A substantial improvement in performance was achieved, with the vectorized SSE implementation running roughly 20 to 30 times faster than several sequential C++ implementations. While the application is not currently limited by per-frame performance, such improvements are interesting because they leave headroom for further processing of camera frames, such as noise reduction.
The SSE implementation is available in this repository. A detailed description of the algorithm, its re-implementation to fit the SIMD model, and timing benchmark results are included in report-ES.pdf. That document is written in Spanish because it was submitted as a lab report for the course project.
This implementation allows for an easy replacement of the sound back end by creating a subclass of the abstract class `SoundGenerator`. The new class defines its `update` method, which handles updating playback according to the tracking information provided to it with each invocation.
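As a rough sketch of what such a replacement could look like, the block below declares an assumed shape for `SoundGenerator` and a toy subclass. The `TrackingInfo` struct, the exact signature of `update`, the `RecordingSoundGenerator` class and the 220 to 880 Hz range are all hypothetical and chosen for illustration; the project's real declarations may differ.

```cpp
#include <vector>

// Hypothetical: tracking data handed to the back end on each frame.
struct TrackingInfo {
    double x;  // horizontal position, normalized to [0, 1]
    double y;  // vertical position, normalized to [0, 1], with 0 at the top
};

// Assumed shape of the abstract base class.
class SoundGenerator {
public:
    virtual ~SoundGenerator() = default;
    // Called once per tracked frame to update playback.
    virtual void update(const TrackingInfo& info) = 0;
};

// Hypothetical back end that only records the pitch it would play.
class RecordingSoundGenerator : public SoundGenerator {
public:
    void update(const TrackingInfo& info) override {
        // Map vertical position linearly onto an arbitrary 220..880 Hz range,
        // so that a higher thumb position gives a higher pitch.
        history_.push_back(220.0 + (1.0 - info.y) * 660.0);
    }
    const std::vector<double>& history() const { return history_; }

private:
    std::vector<double> history_;
};
```

Because the tracking loop would only depend on the base class interface, a different back end could be swapped in without touching the tracking code.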
Such a `SoundGenerator` subclass, called `RangeSoundGenerator`, is included as an example and is used by default in the `master` branch. In each call to `update`, it determines the frequency to be played out of a statically defined set of frequencies and explicitly produces an array of samples to be fed to the system through the PortAudio audio I/O API.
This is a very low-level approach, intended mostly as a proof of concept for the application. It is quite inflexible and results in lower audio quality.
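The per-call work of such a low-level back end can be sketched as follows. This is not the code of `RangeSoundGenerator`: the frequency table, the function names `pickFrequency` and `renderSine`, and the choice of a plain sine wave are assumptions made for illustration, and the step of handing the buffer to PortAudio is left out.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical note set (A minor pentatonic from A4), not the project's table.
constexpr std::array<double, 5> kFrequencies = {440.00, 523.25, 587.33, 659.25, 783.99};

// Picks one of the fixed frequencies for a position normalized to [0, 1].
double pickFrequency(double position) {
    if (position < 0.0) position = 0.0;
    if (position > 1.0) position = 1.0;
    std::size_t index = static_cast<std::size_t>(position * kFrequencies.size());
    if (index >= kFrequencies.size()) index = kFrequencies.size() - 1;
    return kFrequencies[index];
}

// Produces `frames` sine samples in [-1, 1]. The phase is carried across
// calls so that consecutive buffers join without a discontinuity.
std::vector<float> renderSine(double frequency, double sampleRate,
                              std::size_t frames, double& phase) {
    const double twoPi = 6.283185307179586;
    const double step = twoPi * frequency / sampleRate;
    std::vector<float> samples(frames);
    for (std::size_t i = 0; i < frames; ++i) {
        samples[i] = static_cast<float>(std::sin(phase));
        phase += step;
        if (phase >= twoPi) phase -= twoPi;
    }
    return samples;
}
```

Quantizing to a fixed table is what makes the instrument jump between notes instead of gliding, which is one way the inflexibility mentioned above shows up.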
A demo of the application running on this low-level back end is available here:
A higher-level audio back end implementation is possible through MIDI. In this approach, a `SoundGenerator` subclass updates playback by writing MIDI notes to a port, which a running MIDI back end reads and plays. A working implementation of this approach is available in the `theremoog` branch.
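To give an idea of what writing MIDI notes involves, the sketch below converts a frequency to the nearest MIDI note number and builds a raw Note On message. The function names are hypothetical and the code is not taken from the `theremoog` branch; actually sending the bytes to a port depends on the MIDI library in use and is not shown.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Nearest MIDI note number for a frequency in Hz, using the standard
// equal-temperament mapping where A4 = 440 Hz is note 69.
int frequencyToMidiNote(double frequencyHz) {
    const double note = 69.0 + 12.0 * std::log2(frequencyHz / 440.0);
    long rounded = std::lround(note);
    if (rounded < 0) rounded = 0;      // clamp to the valid MIDI range
    if (rounded > 127) rounded = 127;
    return static_cast<int>(rounded);
}

// Raw three-byte Note On message: status byte 0x90 plus the channel (0..15),
// followed by the note number and the velocity (both 0..127).
std::array<std::uint8_t, 3> noteOn(int channel, int note, int velocity) {
    return {{static_cast<std::uint8_t>(0x90 | (channel & 0x0F)),
             static_cast<std::uint8_t>(note & 0x7F),
             static_cast<std::uint8_t>(velocity & 0x7F)}};
}
```

For example, `noteOn(0, frequencyToMidiNote(440.0), 100)` yields the bytes `0x90, 69, 100`. Rounding to the nearest note discards the continuous glide of a Theremin; MIDI pitch bend messages are one way to recover it, though whether the project uses them is not stated here.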
The application demo shown in the first video, linked at the beginning of this README, uses such a back end. It is the result of a collaboration with the creators of the RaffoSynth, whose virtual Minimoog runs as the MIDI back end. In this form, the project was presented at the University of Buenos Aires (UBA) Computer Science Fair 2017.