am2mid = Audio Melody to MIDI. Tiny CLI that turns a YouTube track into a melody MIDI file (and/or an isolated melody stem you can drag into NeuralNote or any other audio→MIDI VST).
Pipeline:
YouTube URL ──► yt-dlp ──► (optional trim) ──► Demucs (htdemucs) ──► melody stem ──► Basic Pitch ──► .mid
System tools (install once):
brew install ffmpeg
Python deps are managed with uv:
uv sync
That installs yt-dlp, demucs, basic-pitch, etc. into a local .venv.
Single track (default: all stems saved as wav, MIDI generated for other):
uv run am2mid "https://www.youtube.com/watch?v=zGDzdps75ns" \
--range "1:05-2:05" \
--out ./output
Skip transcription if you plan to use NeuralNote VST instead:
uv run am2mid "<URL>" --range "1:05-2:05" --no-midi
Transcribe every stem (not just other):
uv run am2mid "<URL>" --range "1:05-2:05" --midi-all
Quantize the transcription to a 1/16th grid (BPM auto-detected from the drums stem):
uv run am2mid "<URL>" --range "1:05-2:05" --quantize 1/16
Force a specific BPM and also keep the stem .wav files:
uv run am2mid "<URL>" --range "1:05-2:05" --quantize 1/16 --bpm 138 --with-stems
Use the 6-stem model when the lead is a recognizable guitar/piano:
uv run am2mid "<URL>" --range "1:05-2:05" --model htdemucs_6s --stem piano
Batch mode from a YAML file (see example.yaml):
uv run am2mid --batch example.yaml

| Flag | Default | Description |
|---|---|---|
| url (positional) | - | YouTube URL (omit if using --batch). |
| --range, -r | full track | Melody window, format MM:SS-MM:SS. |
| --batch, -b | - | YAML file with a list of songs. |
| --out, -o | ./output | Output directory. |
| --name, -n | from video title | Folder name (single-URL mode). |
| --stem | other | Which stem gets transcribed to MIDI. All stems are always saved as .wav. |
| --model | htdemucs | Demucs model (use htdemucs_6s for guitar/piano stems). |
| --midi-all | off | Transcribe every stem, not just --stem. |
| --no-midi | off | Skip basic-pitch (use a VST instead). |
| --quantize, -q | off | Snap MIDI notes to a grid. Subdivision as 1/16, 1/8, 16… (powers of 2, 1–32). Writes a sidecar *.q<N>.mid. |
| --bpm | auto (when --quantize) | Tempo. Either a number (138) or auto to detect from the drums stem. |
| --with-stems, -s / --no-stems | on | Save the chosen melody stem .wav (only --stem). |
| --all-stems, -a | off | Save ALL stems as .wav (for A/B comparison). |
| --with-full / --no-full | on | Save the un-separated full audio as midi/stems/<name>_full.wav. |
| --bpm-range | 120-160 | Window used to fold auto-detected BPM (e.g. 70 → 140). Format MIN-MAX. Ignored when --bpm is explicit. |
| --force, -f | off | Overwrite existing midi/stem files without prompting. |

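
The --quantize and --bpm rows above describe snapping note times to a tempo grid. As a rough illustration only, the sketch below shows one way such snapping could work, assuming 4/4 time so that a 1/N subdivision lasts 4/N beats. The function names (parse_subdivision, grid_seconds, quantize_notes) and the (start, end, pitch) tuple layout are hypothetical and are not taken from am2mid's source; the tool itself may treat note lengths or rounding differently.

```python
def parse_subdivision(text: str) -> int:
    """Accept '1/16' or '16' and return N; N must be a power of 2 in 1..32."""
    n = int(text.split("/")[-1])
    if n < 1 or n > 32 or n & (n - 1):
        raise ValueError(f"subdivision must be a power of 2 between 1 and 32: {text!r}")
    return n


def grid_seconds(bpm: float, subdivision: int) -> float:
    """Length of one 1/N note in seconds, assuming a quarter note is one beat."""
    return (60.0 / bpm) * 4.0 / subdivision


def quantize_notes(notes, bpm: float, subdivision: int):
    """Snap (start, end, pitch) tuples, with times in seconds, to the nearest grid line."""
    step = grid_seconds(bpm, subdivision)
    quantized = []
    for start, end, pitch in notes:
        q_start = round(start / step) * step
        q_end = round(end / step) * step
        # Assumption: never let a note collapse to zero length after snapping.
        q_end = max(q_end, q_start + step)
        quantized.append((q_start, q_end, pitch))
    return quantized


if __name__ == "__main__":
    n = parse_subdivision("1/16")
    print(quantize_notes([(0.13, 0.24, 60), (0.5, 0.52, 62)], bpm=120, subdivision=n))
```

At 120 BPM a 1/16 step is 0.125 s, so a note starting at 0.13 s moves to 0.125 s, and a very short note is stretched to one full grid step rather than dropped.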
output/
  midi/
    <name>.mid          # primary MIDI (the --stem one, default 'other')
    <name>.q16.mid      # quantized sidecar (if --quantize)
    <name>_drums.mid    # extra MIDIs when --midi-all
    stems/              # created when --with-stems / --all-stems / --with-full
      <name>_full.wav   # un-separated audio (default on)
      <name>_other.wav  # chosen melody stem (default on)
      <name>_drums.wav  # other stems only with --all-stems
      ...
  .work/<name>/         # raw download + demucs intermediates
<name> is your --name, or the YouTube title sanitized into a readable, filesystem-safe form.
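
To make "readable, filesystem-safe" concrete, here is a minimal sketch of how a video title might be turned into a folder name. It is an assumption for illustration, not am2mid's actual rule: the function name sanitize_name, the allowed character set, the 80-character limit, and the "untitled" fallback are all hypothetical.

```python
import re
import unicodedata


def sanitize_name(title: str, max_len: int = 80) -> str:
    """Reduce a title to ASCII letters, digits, dots, underscores, and hyphens."""
    # Split accented characters into base letter + accent, then drop non-ASCII.
    text = unicodedata.normalize("NFKD", title)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Replace anything outside the allowed set (slashes, colons, quotes, ...) with a space.
    text = re.sub(r"[^A-Za-z0-9._ -]+", " ", text)
    # Collapse whitespace runs into single hyphens, then tidy repeated or edge hyphens.
    text = re.sub(r"\s+", "-", text.strip())
    text = re.sub(r"-{2,}", "-", text).strip("-.")
    # Truncate, and fall back to a placeholder if nothing usable is left.
    return text[:max_len].rstrip("-.") or "untitled"


if __name__ == "__main__":
    print(sanitize_name("Daft Punk: One More Time (Official)"))
```

Passing --name explicitly sidesteps any such rule, which is useful when two videos would otherwise produce the same folder name.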
example.yaml:
defaults:                 # any CLI flag, applied to every song; CLI overrides win
  quantize: 1/16
  bpm: auto               # auto-detected, folded into bpm_range
  bpm_range: 120-160      # trance-friendly window
  with_full: true
songs:
  - url: https://www.youtube.com/watch?v=...
    name: my-track
    range: "1:05-2:05"
  - url: https://www.youtube.com/watch?v=...
    name: tricky-tempo
    bpm: 140              # per-song override beats defaults + CLI
    all_stems: true       # also dump every stem just for this track
Run with:
uv run am2mid --batch example.yaml
# Override a default at runtime (per-song overrides still win):
uv run am2mid --batch example.yaml --no-full

--bpm auto (the default when --quantize is set) runs librosa's beat tracker on the drums stem, which is cleaner than the full mix. The raw estimate is then folded into --bpm-range (default 120-160) by repeatedly doubling or halving it, so a half-time detection such as 70 BPM becomes 140 BPM. An explicit value such as --bpm 138 is never altered.
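
The folding step can be sketched in a few lines. This is an illustrative guess at the logic, not the tool's real code: fold_bpm and parse_bpm_range are hypothetical names, and returning the raw estimate unchanged when no doubled or halved value fits the window is an assumption, since that case is not documented here.

```python
def parse_bpm_range(text: str) -> tuple[float, float]:
    """Parse a 'MIN-MAX' string such as '120-160'."""
    lo, hi = text.split("-")
    return float(lo), float(hi)


def fold_bpm(raw_bpm: float, lo: float = 120.0, hi: float = 160.0) -> float:
    """Double or halve a tempo estimate until it lands inside [lo, hi]."""
    if raw_bpm <= 0 or lo <= 0 or hi < lo:
        raise ValueError("BPM values must be positive and MIN must not exceed MAX")
    bpm = raw_bpm
    while bpm < lo:   # half-time detections: 70 -> 140
        bpm *= 2
    while bpm > hi:   # double-time detections: 280 -> 140
        bpm /= 2
    # A window narrower than one octave can be skipped over entirely
    # (e.g. 100 -> 200 -> 100 with 120-160). Assumption: keep the raw value then.
    return bpm if bpm >= lo else raw_bpm


if __name__ == "__main__":
    lo, hi = parse_bpm_range("120-160")
    for estimate in (70.0, 280.0, 138.0, 100.0):
        print(estimate, "->", fold_bpm(estimate, lo, hi))
```

Because the default window spans less than a full octave, some estimates (roughly 80 to 120 BPM) have no doubled or halved value inside it; in those cases passing --bpm explicitly is the safer choice.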
- The synth lead almost always lands in the other stem of htdemucs (the default 4-stem model). If the track has a clear guitar or piano lead, switch to the 6-stem model with --model htdemucs_6s --stem guitar (or piano).
- Pick a clean melodic section (no vocal chops, no big build-up FX) with --range: Basic Pitch is much happier with 30–60s of clear melody than with the whole 7-minute track.
- For best quality, keep the stem (--with-stems, on by default) and run it through NeuralNote in your DAW, then nudge octaves and quantize manually.