4 changes: 3 additions & 1 deletion .gitignore
@@ -13,4 +13,6 @@ __pycache__/
*.pyw
*.pyz
*.pywz
*.pyzw

.DS_Store
33 changes: 32 additions & 1 deletion README.md
@@ -39,15 +39,46 @@ uv run noise-canceller.py input.flac --filter BVC

# Use WebRTC built-in noise suppression (faster, local processing)
uv run noise-canceller.py input.wav --filter WebRTC

# Run all filters and save separate output files
uv run noise-canceller.py input.mp3 --filter all
```

### Filter Types

- **NC**: Standard enhanced noise cancellation (default)
- **BVC**: Background voice cancellation (removes background voices + noise)
- **BVCTelephony**: BVC optimized for telephony applications
- **aic-quail-l**: Ai-Coustics QUAIL-L speech enhancement
- **aic-quail-vfl**: Ai-Coustics QUAIL-VF-L speech enhancement
- **WebRTC**: WebRTC's built-in `noise_suppression`, included for comparison

### Transcription & WER Analysis

When a ground-truth transcript is provided via `-t`, the tool transcribes both the original and processed audio using [LiveKit Inference STT](https://docs.livekit.io/agents/integrations/stt/) and generates a Markdown report comparing word error rates.

Transcription runs in parallel with audio processing. Each original audio chunk is streamed to the noise cancellation pipeline and to the STT service at the same time, and processed chunks are sent to a second STT stream as they arrive from the pipeline.
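
The fan-out described above can be sketched with `asyncio` queues. This is a minimal illustration, not the tool's actual code: `fake_noise_canceller`, `fake_stt`, and `run` are hypothetical stand-ins, and strings take the place of audio frames.

```python
import asyncio


async def fake_noise_canceller(inbox, outbox):
    # Hypothetical stand-in for the noise cancellation pipeline:
    # forwards each chunk as soon as it has been "processed".
    while (chunk := await inbox.get()) is not None:
        await outbox.put(f"clean({chunk})")
    await outbox.put(None)  # signal end of stream downstream


async def fake_stt(inbox, label, results):
    # Hypothetical stand-in for one STT stream: collects what it receives.
    received = []
    while (chunk := await inbox.get()) is not None:
        received.append(chunk)
    results[label] = received


async def run(chunks):
    to_nc, to_stt_orig, to_stt_proc = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    results = {}
    workers = [
        asyncio.create_task(fake_noise_canceller(to_nc, to_stt_proc)),
        asyncio.create_task(fake_stt(to_stt_orig, "original", results)),
        asyncio.create_task(fake_stt(to_stt_proc, "processed", results)),
    ]
    for chunk in chunks:
        # Fan out: every original chunk goes to both consumers.
        await to_nc.put(chunk)
        await to_stt_orig.put(chunk)
    # None marks the end of the input for both consumers.
    await to_nc.put(None)
    await to_stt_orig.put(None)
    await asyncio.gather(*workers)
    return results


results = asyncio.run(run(["c0", "c1", "c2"]))
```

Because the second STT stream reads from the pipeline's output queue, it starts receiving processed chunks before the input has been fully consumed, so neither transcription waits for the whole file.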

```bash
# Transcribe and compare against ground truth
uv run noise-canceller.py input.mp3 --filter NC -t transcript.txt

# Use a different STT model
uv run noise-canceller.py input.mp3 --filter BVC -t transcript.txt --stt deepgram/nova-3:en

# Run all filters with transcription
uv run noise-canceller.py input.mp3 --filter all -t transcript.txt
```

The report is saved as a `.transcript.md` file alongside each output file and includes:

- **Metrics table** with WER, substitutions, insertions, and deletions for both original and processed audio
- **Raw transcripts** for both original and processed audio
- **Diff view** with errors annotated inline:
  - ~~word~~: missing word (in ground truth but not transcribed)
  - **word**: extra word (transcribed but not in ground truth)
  - ~~expected~~**actual**: wrong word (substitution)
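
As a rough illustration of how the metrics and the annotated diff relate, the sketch below aligns two transcripts with a word-level Levenshtein alignment, counts substitutions, insertions, and deletions, and computes WER as their sum divided by the number of ground-truth words. The helper names `align` and `report` are hypothetical, and the tool itself may normalize text (case, punctuation) or use a different alignment library.

```python
def align(reference, hypothesis):
    """Word-level Levenshtein alignment; returns a list of (op, ref_word, hyp_word)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match or substitution
                             dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1)         # insertion
    # Walk back from the bottom-right corner to recover the operations.
    ops, i, j = [], len(ref), len(hyp)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            ops.append(("ok" if ref[i - 1] == hyp[j - 1] else "sub", ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def report(reference, hypothesis):
    """Return (wer, counts, annotated_diff). Assumes a non-empty reference."""
    ops = align(reference, hypothesis)
    counts = {kind: sum(op == kind for op, _, _ in ops) for kind in ("sub", "ins", "del")}
    wer = sum(counts.values()) / len(reference.split())
    render = {
        "ok": lambda r, h: h,
        "del": lambda r, h: f"~~{r}~~",        # missing word
        "ins": lambda r, h: f"**{h}**",        # extra word
        "sub": lambda r, h: f"~~{r}~~**{h}**", # wrong word
    }
    return wer, counts, " ".join(render[op](r, h) for op, r, h in ops)


wer, counts, diff = report("the quick brown fox jumps", "the quick red fox jumps high")
print(f"WER={wer:.2f} {counts}")
print(diff)
```

Here one substitution and one insertion against five ground-truth words give a WER of 0.40.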

## License

This tool is provided as-is under the MIT License. See [LICENSE](LICENSE) for details.