Real-Time SWIPE

This repository contains code examples and applications that accompany the following papers:

  1. 📘 Peter Meier, Sebastian Strahl, Simon Schwär, Meinard Müller, and Stefan Balke
    Pitch Estimation in Real Time: Revisiting SWIPE with Causal Windowing
    In Proceedings of the International Symposium on Computer Music Multidisciplinary Research (CMMR), 2025, Accepted.
@inproceedings{MeierSSMB25_RealTimeSWIPE_CMMR,
    author    = {Peter Meier and Sebastian Strahl and Simon Schw{\"a}r and Meinard M{\"u}ller and Stefan Balke},
    title     = {Pitch Estimation in Real Time: Revisiting {SWIPE} With Causal Windowing},
    booktitle = {Proceedings of the International Symposium on Computer Music Multidisciplinary Research ({CMMR})},
    address   = {London, UK},
    year      = {2025, Accepted}
}
  2. 📘 Peter Meier, Meinard Müller, and Stefan Balke
    A Multi-User Interface for Real-Time Intonation Monitoring in Music Ensembles
    In Proceedings of the Workshop for Innovative Computer-Based Music Interfaces (ICMI): 1–5, 2025.
@inproceedings{MeierMB25_IntonationMonitoring_ICMI,
    author       = {Peter Meier and Meinard M{\"u}ller and Stefan Balke},
    title        = {A Multi-User Interface for Real-Time Intonation Monitoring in Music Ensembles},
    booktitle    = {Proceedings of the Workshop for Innovative Computer-Based Music Interfaces ({ICMI})},
    address      = {Chemnitz, Germany},
    doi          = "10.18420/muc2025-mci-ws06-202",
    howpublished = "Mensch und Computer 2025 - Workshopband",
    publisher    = "Gesellschaft für Informatik e.V.",
    year         = {2025},
    pages        = {1--5},
}
  3. 📘 Peter Meier, Simon Schwär, Gerhard Krump, and Meinard Müller
    Evaluating Real-Time Pitch Estimation Algorithms for Creative Music Game Interaction
    In: INFORMATIK 2023 — Designing Futures: Zukünfte gestalten, Gesellschaft für Informatik e.V.: 873–882, 2023.
@incollection{MeierSKM23_EvaluatingPitchGame_GI,
    author    = {Peter Meier and Simon Schw{\"a}r and Gerhard Krump and Meinard M{\"u}ller},
    title     = {Evaluating Real-Time Pitch Estimation Algorithms for Creative Music Game Interaction},
    booktitle = {INFORMATIK 2023 -- Designing Futures: Zuk{\"u}nfte gestalten},
    publisher = {Gesellschaft f{\"u}r Informatik e.V.},
    address   = {Bonn, Germany},
    year      = {2023},
    doi       = {10.18420/inf2023_97},
    pages     = {873--882}
}

💻 Install

This project requires Python 3.12 or higher. You can download Python from the official Python website or install it with a package manager for your operating system.

To verify your Python version, run the following command in your terminal or command prompt:

python --version
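The same check can be scripted, for example at the top of a setup script. This is a minimal sketch; the helper name `python_is_supported` is hypothetical and not part of this repository.

```python
import sys

REQUIRED = (3, 12)  # minimum version stated in this README


def python_is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True if the given interpreter version meets the requirement."""
    return tuple(version_info[:2]) >= required


if __name__ == "__main__":
    status = "OK" if python_is_supported() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```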

Install uv Package Manager

This project uses uv for fast and reliable Python package management. If you don't have uv installed, follow the official uv installation guide for your operating system.

Install Python Environment

Navigate to the project directory and install the basic dependencies:

cd rtswipe
uv sync

This command will:

  • Create a virtual environment in .venv/.
  • Install the core dependencies (numpy, resampy, soundfile) used by the rtswipe module.

Install Application Dependencies

For specific applications, you'll need to install additional dependency groups:

  • For the cli.py applications (pitch estimation, OSC server):
uv sync --group cli
  • For the game.py pitch game (pygame):
uv sync --group game
  • For the web.py applications (FastAPI server, WebSockets):
uv sync --group web
  • For development (testing, linting, Jupyter notebooks):
uv sync --group dev

Note: The Python package sounddevice requires PortAudio to be installed on your system to capture audio from a microphone.

Install All Application Dependencies

To install all dependencies for the module and all applications, run:

uv sync --all-groups

📂 Content

Module: rtswipe

The rtswipe module provides a real-time pitch estimation algorithm based on the SWIPE method with causal windowing. It can process audio streams frame-by-frame and return pitch estimates with confidence values.

Real-Time Processing Example

import numpy as np
from rtswipe import RTSwipe

# Initialize the RTSwipe estimator
swipe = RTSwipe(
    fs=22050,          # Sample rate
    hop_len=256,       # Hop length (frame size)
    f_min=55.0,        # Minimum frequency (Hz)
    f_max=1760.0,      # Maximum frequency (Hz)
    num_channels=1,    # Number of audio channels
    delay=0.0          # Delay factor (0 = no delay, 1 = maximum delay)
)

# Process audio frames (shape: hop_len x num_channels)
audio_frame = np.random.randn(256, 1)  # Example audio frame
freqs, confs = swipe(audio_frame)

💡 How to use the rtswipe module inside an audio callback function is demonstrated in the cli.py, game.py, and web.py applications described below.
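For offline experiments, a recorded signal can be cut into blocks of the shape the estimator expects. The following is a minimal sketch, not code from this repository: the helper `iter_frames` is hypothetical, and dropping the incomplete last block is an assumption made for simplicity. It also shows the time between two consecutive estimates, which is `hop_len / fs`.

```python
import numpy as np


def iter_frames(signal, hop_len):
    """Yield consecutive blocks of shape (hop_len, num_channels).

    Trailing samples that do not fill a whole block are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.ndim == 1:
        signal = signal[:, np.newaxis]  # mono -> (num_samples, 1)
    num_frames = signal.shape[0] // hop_len
    for i in range(num_frames):
        yield signal[i * hop_len:(i + 1) * hop_len, :]


fs, hop_len = 22050, 256
hop_duration_ms = 1000.0 * hop_len / fs  # time between estimates, about 11.6 ms

# One second of a 220 Hz sine as a stand-in for recorded audio
t = np.arange(fs) / fs
frames = list(iter_frames(np.sin(2 * np.pi * 220.0 * t), hop_len))
```

Each yielded block could then be passed to the estimator as in the example above, e.g. `freqs, confs = swipe(frame)`.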

Application 1: cli.py Pitch Command-Line Interface

Make sure you have the cli dependencies installed:

uv sync --group cli

Getting help:

uv run src/cli/cli.py --help
usage: cli.py [-h] [-l] [--device ID] [--channel NUMBER] [--samplerate FS] [--blocksize SAMPLES] [--freq FMIN FMAX] [--ip IP] [--port PORT]

Pitch (C)ommand (L)ine (I)nterface.

options:
    -h, --help           show this help message and exit
    -l, --list-devices   show list of audio devices and exit
    --device ID          (None) device id for sounddevice input
    --channel NUMBER     (1) channel number for sounddevice input
    --samplerate FS      (44100) samplerate for sounddevice
    --blocksize SAMPLES  (512) blocksize for sounddevice
    --freq FMIN FMAX     ([55, 1760]) pitch range in Hz
    --ip IP              (0.0.0.0) ip address for OSC client
    --port PORT          (5005) port for OSC client

Show list of audio devices and exit:

uv run src/cli/cli.py -l

Start the application listening to your system's default input sound device:

uv run src/cli/cli.py

Terminal output example:

OSC to 0.0.0.0:5005: time=20:23:06.972 freqs=[941.63, 941.63] confs=[0.6, 0.6]
OSC to 0.0.0.0:5005: time=20:23:06.994 freqs=[945.89, 945.89] confs=[0.6, 0.6]
OSC to 0.0.0.0:5005: time=20:23:07.004 freqs=[973.61, 973.61] confs=[0.56, 0.56]
OSC to 0.0.0.0:5005: time=20:23:07.015 freqs=[975.36, 975.36] confs=[0.59, 0.59]
OSC to 0.0.0.0:5005: time=20:23:07.026 freqs=[996.72, 996.72] confs=[0.55, 0.55]
OSC to 0.0.0.0:5005: time=20:23:07.036 freqs=[1030.57, 1030.57] confs=[0.58, 0.58]
OSC to 0.0.0.0:5005: time=20:23:07.047 freqs=[1102.74, 1102.74] confs=[0.6, 0.6]
OSC to 0.0.0.0:5005: time=20:23:07.058 freqs=[1098.77, 1098.77] confs=[0.46, 0.46]
OSC to 0.0.0.0:5005: time=20:23:07.068 freqs=[1188.52, 1188.52] confs=[0.59, 0.59]
OSC to 0.0.0.0:5005: time=20:23:07.078 freqs=[1220.04, 1220.04] confs=[0.59, 0.59]
  • Each line reports one audio frame: freqs holds the estimated frequencies in Hz and confs the corresponding confidence values, with one entry per audio channel.
  • This data is both displayed in the terminal and transmitted as OSC messages, enabling integration with other systems or applications that support OSC for further processing or visualization.
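If you redirect the terminal output to a file, the lines can be parsed back into numbers for later analysis. The sketch below assumes the line format shown in the example output above; the regular expression and the helper `parse_line` are illustrative and not part of cli.py.

```python
import re
from ast import literal_eval

# Matches the "time=... freqs=[...] confs=[...]" part of one terminal line
LINE_RE = re.compile(
    r"time=(?P<time>\S+)\s+freqs=(?P<freqs>\[[^\]]*\])\s+confs=(?P<confs>\[[^\]]*\])"
)


def parse_line(line):
    """Parse one terminal line into (time, freqs, confs), or None if no match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return (
        match["time"],
        literal_eval(match["freqs"]),  # list of frequencies in Hz
        literal_eval(match["confs"]),  # list of confidence values
    )


example = "OSC to 0.0.0.0:5005: time=20:23:06.972 freqs=[941.63, 941.63] confs=[0.6, 0.6]"
time_str, freqs, confs = parse_line(example)
```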

Application 2: game.py Pitch Game

The game.py application is an interactive pitch-based game where players control a character not only with traditional controllers but also with their singing voice.

For detailed information and gameplay examples, please refer to the Dagstuhl Music Game Experiment Website.

Make sure you have the game dependencies installed:

uv sync --group game

Start the game:

uv run src/game/game.py

Getting help:

uv run src/game/game.py --help
usage: game.py [-h] [-l] [--device ID] [--channel NUMBER] [--samplerate FS] [--blocksize SAMPLES] [--freq FMIN FMAX] [--rootnote NUMBER] [--user ID] [--level ID]

Pitch Game: Sing Your Way

options:
  -h, --help           show this help message and exit
  -l, --list-devices   show list of audio devices and exit
  --device ID          (None) device id for sounddevice input
  --channel NUMBER     (1) channel number for sounddevice input
  --samplerate FS      (48000) samplerate for sounddevice
  --blocksize SAMPLES  (512) blocksize for sounddevice
  --freq FMIN FMAX     ([55, 1760]) pitch range in Hz
  --rootnote NUMBER    (48) MIDI note number for root note in game level
  --user ID            (0) ID number of the user
  --level ID           (0) ID number of the game level
  • The game listens to your system's default input sound device unless you specify a different one with --device ID.
  • This game implementation is designed to run experiments with multiple users, each having their own game level and pitch range.
  • The --rootnote NUMBER argument specifies the MIDI note number for the root note of the game level (default: 48). Singing this pitch will make a pitch line appear just above the ground level in the game.
  • For experiments involving multiple users, you can set the user ID with --user ID (default: 0) and the game level with --level ID (default: 0).
  • Starting a game automatically saves game information (user ID, level ID, timestamp, audio, events, pitchdata) in the src/game/users directory for later analysis.
  • This repository includes levels 0 to 5 from the Dagstuhl Music Game Experiment. For more details, visit the Dagstuhl Music Game Experiment Website.
  • You can also create your own levels or modify existing ones using the Tiled Map Editor.
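When choosing a value for --rootnote, it can help to know which frequency a MIDI note number corresponds to, for example to check that it lies inside the --freq range. The sketch below uses the standard twelve-tone equal temperament mapping with MIDI note 69 = 440 Hz; whether game.py uses exactly this conversion is an assumption, and `midi_to_hz` is a hypothetical helper.

```python
def midi_to_hz(midi_note, tuning=440.0):
    """Convert a MIDI note number to Hz (equal temperament, note 69 = tuning)."""
    return tuning * 2.0 ** ((midi_note - 69) / 12.0)


f_min, f_max = 55.0, 1760.0        # default --freq range from the help text
root_hz = midi_to_hz(48)           # default --rootnote, about 130.81 Hz
root_in_range = f_min <= root_hz <= f_max
```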

Game Controls

The game supports multiple input methods:

⌨️ Keyboard Controls
  • Left/Right Arrow Keys: Move character left/right
  • Up Arrow: Fly mode (the vertical position is controlled by the pitch you sing)
  • Down Arrow: Delete note block (while landing on it)
  • Space: Jump
  • F: Toggle mute (silence note block)
  • D: Toggle drone sound (this plays the root note you set with --rootnote as a reference pitch)
🎮 Gamepad Controls
  • D-Pad Left/Right (buttons 13/14): Move character left/right
  • D-Pad Up (button 11): Fly mode (the vertical position is controlled by the pitch you sing)
  • D-Pad Down (button 12): Delete note block (while landing on it)
  • Button A (button 0): Jump
  • Button B (button 1): Toggle mute (silence note block)
  • Button X (button 2): Toggle drone sound (this plays the root note you set with --rootnote as a reference pitch)
  • Button Y (button 3): Fly mode (the vertical position is controlled by the pitch you sing)
🎤 Voice Controls
  • Singing: A pitch line appears on screen showing your current vocal pitch relative to the game world.
  • The --rootnote parameter sets the reference pitch (default: MIDI note 48, which is C3 in the convention where middle C = MIDI note 60 = C4) at which the pitch line aligns with the bottom of the game world.
  • Fly Mode: Control the character's vertical position by singing different pitches.
  • Singing higher pitches moves the character up, lower pitches move down.
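The relation between a sung frequency and the root note can be expressed as a distance in semitones. This is a minimal sketch under the assumption of equal temperament with MIDI note 69 = 440 Hz; `semitones_above_root` is a hypothetical helper, and how game.py maps this distance to a screen position is not shown here.

```python
import math


def semitones_above_root(freq_hz, rootnote=48, tuning=440.0):
    """Distance in semitones between a frequency and the root note.

    Positive values are above the root note, negative values below it.
    """
    midi_pitch = 69.0 + 12.0 * math.log2(freq_hz / tuning)
    return midi_pitch - rootnote


offset_a4 = semitones_above_root(440.0)     # A4 relative to root note 48
offset_low = semitones_above_root(110.0)    # A2, below the default root note
```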

Game Assets

The game assets for game.py are from "Pixel Adventure" and "Pixel Adventure 2" by Pixel Frog and are licensed under Creative Commons CC0. Thank you!

Application 3: web.py Intonation Monitoring Web Application

The web application provides a web-based visualization interface for multi-channel pitch estimation and real-time intonation monitoring using FastAPI and WebSockets.

[Screenshots: Conductor View and Single View]

Make sure you have the web dependencies installed:

uv sync --group web

Getting help:

uv run src/web/web.py --help
usage: web.py [-h] [-l] [--device ID_OR_NAME] [--channel NUMBER_OR_RANGE] [--samplerate FS] [--blocksize SAMPLES] [--freq FMIN FMAX] [--ip IP] [--port PORT] [--tuning TUNING] [--channel-names NAME [NAME ...]] [--smoothing FRAMES] [--gate DBFS] [--websocket-fps FPS]

Pitch (C)ommand (L)ine (I)nterface.

options:
  -h, --help            show this help message and exit
  -l, --list-devices    show list of audio devices and exit
  --device ID_OR_NAME   device id (int) or name (str) for sounddevice input
  --channel NUMBER_OR_RANGE
                        channel number or range for sounddevice input (e.g. 1 or 9-12)
  --samplerate FS       (44100) samplerate for sounddevice
  --blocksize SAMPLES   (512) blocksize for sounddevice
  --freq FMIN FMAX      ([55, 1760]) pitch range in Hz
  --ip IP               (0.0.0.0) ip address for OSC client
  --port PORT           (5005) port for OSC client
  --tuning TUNING       (440.0) tuning reference frequency in Hz
  --channel-names NAME [NAME ...]
                        (None) channel names (e.g. --channel-names A B C D)
  --smoothing FRAMES    (15) smoothing window size in frames for moving average
  --gate DBFS           (-60.0) gate threshold in dBFS; channels below this RMS level have confidence set to zero
  --websocket-fps FPS   (30) WebSocket update rate in frames per second; lower values reduce network traffic
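The --gate and --smoothing options describe two post-processing steps: setting the confidence to zero for quiet channels and averaging values over the most recent frames. The following sketch illustrates both ideas and is not the code used in web.py. It assumes that full scale corresponds to an amplitude of 1.0; the names `rms_dbfs`, `apply_gate`, and `MovingAverage` are hypothetical.

```python
import math
from collections import deque


def rms_dbfs(samples):
    """RMS level of one channel in dBFS (full scale = amplitude 1.0)."""
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)


def apply_gate(conf, samples, gate_dbfs=-60.0):
    """Return conf unchanged, or 0.0 if the channel is below the gate."""
    return conf if rms_dbfs(samples) >= gate_dbfs else 0.0


class MovingAverage:
    """Average of the most recent num_frames values."""

    def __init__(self, num_frames=15):
        self.values = deque(maxlen=num_frames)

    def update(self, value):
        self.values.append(value)
        return sum(self.values) / len(self.values)
```

A `deque` with `maxlen` discards the oldest value automatically, so the average always covers at most the last `num_frames` frames.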

Start the web server with default parameters:

uv run src/web/web.py
Pitch Estimation Web App: {'list_devices': False, 'device': None, 'channel': '1', 'samplerate': 44100, 'blocksize': 512, 'freq': [55, 1760], 'ip': '0.0.0.0', 'port': 5005, 'tuning': 440.0, 'channel_names': None, 'smoothing': 15, 'gate': -60.0, 'websocket_fps': 30}
INFO:     Started server process [6319]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

Open your browser and navigate to http://localhost:8000 to access the web interface.

Set specific parameters:

uv run src/web/web.py --device MixDevice --channel 1-4 --tuning 442 --channel-names Soprano Alto Tenor Bass

This example uses the audio device named MixDevice, listens to channels 1 to 4, sets the tuning reference frequency to 442 Hz, and assigns custom names (Soprano, Alto, Tenor, Bass) to the channels to be displayed in the web interface.
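To illustrate what a tuning reference of 442 Hz means for intonation monitoring, the sketch below computes the deviation of a frequency from the nearest equal-tempered note in cents. This is an assumption about how such a deviation can be computed, not the code used in web.py; `cents_from_nearest_note` is a hypothetical helper.

```python
import math


def cents_from_nearest_note(freq_hz, tuning=440.0):
    """Return (nearest MIDI note, deviation in cents) for a frequency in Hz."""
    midi_pitch = 69.0 + 12.0 * math.log2(freq_hz / tuning)
    nearest = round(midi_pitch)
    return nearest, 100.0 * (midi_pitch - nearest)


# A tone at 440 Hz is slightly flat when the ensemble tunes to A4 = 442 Hz
note, cents = cents_from_nearest_note(440.0, tuning=442.0)
```

With these values, `note` is 69 (A4) and `cents` is roughly -7.9, i.e. the tone is a little less than 8 cents flat.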

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer Institute for Integrated Circuits IIS. This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under grant numbers 500643750 (MU 2686/15-1) and 555525568 (MU 2686/18-1).
