Skip to content

evenfurther/phonopaper-rs

Repository files navigation

phonopaper-rs

A Rust library and command-line tool for encoding and decoding audio in the PhonoPaper format.

PhonoPaper (invented by Alexander Zolotov / NightRadio) represents audio as a camera-readable grayscale spectrogram image that can be printed on paper and played back by sweeping a phone camera across it.

Independence notice: phonopaper-rs is an independent, clean-room implementation written from scratch by studying the publicly available PhonoPaper format specification. It is not affiliated with, endorsed by, or derived from Alexander Zolotov's original work in any way. The PhonoPaper name and concept belong to their respective author.

AI disclosure: Large language models were used heavily during the development of this library — for code drafting, test generation, and documentation — under strict human supervision and validation.


Repository layout

This is a Cargo workspace with two crates:

Crate Path Description
phonopaper-rs phonopaper-rs/ Core library — encode, decode, render, vector output
phonopaper-cli phonopaper-cli/ phonopaper binary — four subcommands

Format summary

Property Value
X axis Time, left → right
Y axis Frequency, logarithmic musical scale (top = high, bottom = low)
Pixel brightness White = silence, black = maximum amplitude
Frequency bins 384 (8 octaves × 12 semitones × 4 subdivisions)
Frequency range ≈ 66 Hz (C2) … ≈ 4186 Hz (C8); bins 0–95 above C8 are above human hearing
Marker bands Alternating black/white stripes at top and bottom for auto-calibration

For a detailed technical description see PHONOPAPER_SPEC.md.


Command-line tool

Installation

From this repository:

cargo install --path phonopaper-cli

Or run directly from the workspace:

cargo run -p phonopaper-cli -- <subcommand> [options]

Or, if you want to install the released version:

cargo install phonopaper-cli

decode — image → WAV

Convert a PhonoPaper image to a WAV audio file.

phonopaper decode <INPUT> [OUTPUT] [OPTIONS]

Supported input formats: PNG, JPEG, SVG, PDF. Default output: <input-stem>.wav (mono, 16-bit PCM).

Option Default Description
--samples-per-column <N> 353 PCM samples synthesised per image column (353 ≈ Android's 44 100 ÷ 125)
--sample-rate <HZ> 44100 Output sample rate
--gain <GAIN> 3.0 Master output gain (linear)
--gamma <G> 1.0 Inverse gamma for pixel → amplitude; must match the --gamma used when encoding
--min-db <DB> -60.0 Lower bound of the dB window used when the image was encoded
--max-db <DB> -10.0 Upper bound of the dB window used when the image was encoded
--amplitude-threshold <T> off Binary threshold (e.g. 0.85 for Android-like fidelity); conflicts with --linear
--linear off Linear amplitude mode with no threshold; conflicts with --amplitude-threshold

Examples:

phonopaper decode code.png
phonopaper decode code.png output.wav --gain 2.0
phonopaper decode scan.pdf output.wav
phonopaper decode code.svg output.wav --amplitude-threshold 0.85

encode — WAV / MP3 → image

Convert an audio file to a PhonoPaper image.

phonopaper encode <INPUT> [OUTPUT] [OPTIONS]

Supported input formats: WAV (any bit depth, mono or stereo), MP3. Supported output formats: PNG, JPEG, SVG, PDF (inferred from extension). Default output: <input-stem>.png.

Option Default Description
--fft-window <N> 4096 FFT window size in samples (power of two)
--hop-size <N> 353 Hop size between FFT frames; controls columns/second
--px-per-octave <PX> 90 Pixels per octave (total data height = value × 8)
--gamma <G> 1.0 Gamma correction applied to amplitudes before writing pixels
--min-db <DB> -60.0 Bins quieter than this map to white (silence)
--max-db <DB> -10.0 Bins at this level or louder map to black (maximum)
--octave-lines off Draw light-gray octave separator lines (⚠ produces audible hum on decode)
--pdf-page <SIZE> a4-landscape PDF page size for centred output; use pixel-perfect for a 1 px = 1 pt page (PDF only)

Supported --pdf-page values: a4-landscape (default), a4-portrait, a3-landscape, a3-portrait, letter-landscape, letter-portrait, pixel-perfect.

Examples:

phonopaper encode input.wav
phonopaper encode input.mp3 code.png
phonopaper encode input.wav code.svg
phonopaper encode input.wav code.pdf --px-per-octave 120

JPEG output applies lossy compression that can corrupt amplitude data and degrade decode quality. Prefer PNG for any software round-trip; JPEG is acceptable only when the image will be scanned by a phone camera.

robust-decode — camera photo → WAV

Decode a PhonoPaper print photographed by a camera, tolerating keystone distortion, tilt, and paper curl. Uses per-column marker detection followed by linear interpolation of the data boundaries — no explicit de-warp step is required.

phonopaper robust-decode --input <FILE> --output <FILE> [OPTIONS]
Option Default Description
-i, --input <FILE> required Input image (PNG or JPEG)
-o, --output <FILE> required Output WAV file (mono, 16-bit PCM)
--sample-columns <N> 50 Number of evenly-spaced columns to sample for marker detection
--gain <GAIN> 3.0 Master output gain (linear)
--sample-rate <HZ> 44100 Output sample rate
--samples-per-column <N> 353 PCM samples synthesised per image column
--min-db <DB> -60.0 Lower bound of the dB window
--max-db <DB> -10.0 Upper bound of the dB window
--debug-image <FILE> off Save a PNG with detected data_top/data_bottom lines overlaid
--rectified <FILE> off Save a perspective-rectified PNG of the data area

Example:

phonopaper robust-decode --input photo.jpg --output decoded.wav
phonopaper robust-decode --input photo.jpg --output decoded.wav \
    --debug-image debug.png --rectified rectified.png

blank — generate a blank template PDF

Generate a ready-to-print PhonoPaper template: marker bands, blank white data area for hand-drawn music, octave separator lines, octave labels (C2–C8), and a footer.

phonopaper blank [OPTIONS]
Option Default Description
-o, --output <FILE> blank_phonopaper.pdf Output PDF path
--paper <FORMAT> a4 Paper size: a4, a3, letter, legal
--portrait off Portrait orientation (default is landscape)

Example:

phonopaper blank
phonopaper blank --paper a3 --output template_a3.pdf
phonopaper blank --paper letter --portrait

Library usage

Add to your Cargo.toml:

[dependencies]
phonopaper-rs = "0.1.0"

The crate name on disk is phonopaper-rs; the Rust module name is phonopaper_rs.

Decode an image to a WAV file

use phonopaper_rs::decode::{decode_image_to_wav, SynthesisOptions};

decode_image_to_wav("code.png", "output.wav", SynthesisOptions::default())?;
# Ok::<(), phonopaper_rs::PhonoPaperError>(())

Encode an audio file to an image

use phonopaper_rs::encode::{encode_wav_to_image, AnalysisOptions};
use phonopaper_rs::render::RenderOptions;

encode_wav_to_image(
    "input.wav",
    "code.png",
    AnalysisOptions::default(),
    RenderOptions::default(),
)?;
# Ok::<(), phonopaper_rs::PhonoPaperError>(())

Real-time / column-by-column playback

Synthesizer maintains oscillator phase across calls, avoiding clicks at column boundaries — matching the PhonoPaper Android app's playback model.

use phonopaper_rs::decode::{
    Synthesizer, SynthesisOptions,
    column_amplitudes_from_image, detect_markers,
};

let image = image::open("code.png")?;
let bounds = detect_markers(&image)?;

let mut synth = Synthesizer::<353>::new(SynthesisOptions::default());
let mut pcm = [0.0_f32; 353];

for col_x in 0..image.width() {
    let amps = column_amplitudes_from_image(&image, Some(bounds), col_x)?;
    synth.synthesize_column(&amps, &mut pcm);
    // Send `pcm` to an audio device...
}
# Ok::<(), phonopaper_rs::PhonoPaperError>(())

Vector output (SVG / PDF)

use phonopaper_rs::encode::{AnalysisOptions, audio_to_spectrogram};
use phonopaper_rs::render::RenderOptions;
use phonopaper_rs::vector::{PdfPageLayout, page_size, spectrogram_to_pdf, spectrogram_to_svg};

let (mono, sample_rate) = phonopaper_rs::audio::read_audio_file("input.wav")?;
let spec = audio_to_spectrogram(&mono, sample_rate, &AnalysisOptions::default())?;
let render = RenderOptions::default();

std::fs::write("code.svg", spectrogram_to_svg(&spec, &render))?;
// A4 landscape, 10 mm margins, image centred on the page:
std::fs::write("code.pdf", spectrogram_to_pdf(&spec, &render, PdfPageLayout::FitToPage {
    page_width_pt:  page_size::A4_LANDSCAPE.0,
    page_height_pt: page_size::A4_LANDSCAPE.1,
    margin_pt: 28.35,
}))?;
# Ok::<(), Box<dyn std::error::Error>>(())

Public API overview

Module Key items
phonopaper_rs::error PhonoPaperError, Result
phonopaper_rs::format TOTAL_BINS, OCTAVES, SAMPLE_RATE, bin_to_freq, freq_to_bin
phonopaper_rs::spectrogram Spectrogram<S>, SpectrogramVec, SpectrogramBuf, SpectrogramBufMut
phonopaper_rs::render RenderOptions, spectrogram_to_image, spectrogram_to_image_buf
phonopaper_rs::vector spectrogram_to_svg, spectrogram_to_pdf, image_from_svg, image_from_pdf, PdfPageLayout, page_size
phonopaper_rs::audio read_audio_file (WAV + MP3 → mono f32 + sample rate)
phonopaper_rs::encode AnalysisOptions, audio_to_spectrogram, encode_wav_to_image
phonopaper_rs::decode SynthesisOptions, AmplitudeMode, Synthesizer<SPS>, DataBounds, detect_markers, detect_markers_at_column, column_amplitudes_from_image, spectrogram_to_audio, decode_image_to_wav, decode_image_to_wav_sps

Supported formats

Direction Format Notes
Input audio WAV Any bit depth (8/16/24/32-bit int, 32-bit float); mono or stereo
Input audio MP3 Via symphonia; mono or stereo
Input image PNG Lossless; preferred for software round-trips
Input image JPEG Accepted; lossy compression may degrade decode quality
Input image SVG Embedded raster image extracted automatically
Input image PDF Embedded raster image extracted automatically
Output image PNG Lossless; strongly preferred
Output image JPEG Lossy; acceptable for camera-scanned use cases only
Output image SVG Vector; marker bands as <rect> elements + embedded PNG
Output image PDF Vector; marker bands as rectangles + deflate-compressed raster XObject
Output audio WAV Mono, 16-bit signed PCM

Development

All four commands must pass with zero warnings and zero errors:

cargo fmt --check --all
cargo clippy --workspace --all-targets
cargo test --workspace
cargo bench -p phonopaper-rs

Coverage (informational, not a hard gate):

cargo llvm-cov -p phonopaper-rs --tests \
    --ignore-filename-regex='(benches|examples)' --summary-only

License

Licensed under either of Apache License, Version 2.0 or MIT License at your option.

About

Phonopaper encoder and decoder

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages