Release notes from NeMoFeatureExtractor-iOS

v1.0.5

2026-02-06T09:32:04Z

v1.0.5: clean up debug logging, fix resource loading

- Simplify mel_filterbank.bin loading (supports both .copy and .process)
- Remove debug print statements
- Update README with correct org URL and version
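
A lookup that tolerates both resource rules could be sketched as follows. This is a minimal illustration, not the library's code: `locateFilterbank` and `decodeFloats` are hypothetical names, the `Resources` directory name is assumed, and the file is assumed to hold raw little-endian Float32 values (the notes do not state the format).

```swift
import Foundation

// Hypothetical helper: look for mel_filterbank.bin at the bundle root
// (where .process typically places it) and then inside a subdirectory
// (where .copy typically keeps it). "Resources" is an assumed name.
func locateFilterbank(in bundle: Bundle, subdirectory: String = "Resources") -> URL? {
    if let url = bundle.url(forResource: "mel_filterbank", withExtension: "bin") {
        return url
    }
    return bundle.url(forResource: "mel_filterbank", withExtension: "bin",
                      subdirectory: subdirectory)
}

// Hypothetical decoder: assumes a flat array of little-endian Float32 values.
// Trailing bytes that do not fill a whole value are ignored.
func decodeFloats(_ data: Data) -> [Float] {
    let bytes = [UInt8](data)
    let count = bytes.count / 4
    var result = [Float]()
    result.reserveCapacity(count)
    for i in 0..<count {
        let base = i * 4
        let b0 = UInt32(bytes[base])
        let b1 = UInt32(bytes[base + 1]) << 8
        let b2 = UInt32(bytes[base + 2]) << 16
        let b3 = UInt32(bytes[base + 3]) << 24
        result.append(Float(bitPattern: b0 | b1 | b2 | b3))
    }
    return result
}
```

Inside a Swift package the bundle argument would usually be `Bundle.module`; it is passed in here so the sketch compiles on its own.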

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

v1.0.4: debug: add logging for resource loading

2026-02-06T09:01:26Z

v1.0.3: fix: use .process instead of .copy for resources

2026-02-06T08:49:19Z

Fixes code signing issues on iOS simulator where .copy creates
an unrecognized bundle format.
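
For context, a manifest using the .process rule might look like the fragment below. It is an illustrative sketch only: the target name, product name, and the `Resources` path are assumptions and are not taken from the package itself.

```swift
// swift-tools-version:5.3
import PackageDescription

// Illustrative Package.swift fragment; names and paths are assumed.
let package = Package(
    name: "NeMoFeatureExtractor",
    products: [
        .library(name: "NeMoFeatureExtractor", targets: ["NeMoFeatureExtractor"])
    ],
    targets: [
        .target(
            name: "NeMoFeatureExtractor",
            resources: [
                // .process lets the build system handle each file;
                // .copy would ship the directory unchanged.
                .process("Resources")
            ]
        )
    ]
)
```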

v1.0.2: Fix frame count formula based on NeMo source code

2026-02-06T08:27:29Z

Analyzed NeMo's FilterbankFeatures.get_seq_len() and forward() methods.

Correct formula:

- STFT frames = 1 + audio_length // hop_length (torch.stft with center=True)
- Output frames = round_up(stft_frames, pad_to) if pad_to > 0; otherwise the STFT frame count is used as-is

The previous formula, (n + windowSize) / hopLength, was incorrect.

Now all model configs match NeMo exactly:

- VAD: [80, 52] == [80, 52] (pad_to=2)
- Speaker: [80, 64] == [80, 64] (pad_to=16)
- ASR: [80, 51] == [80, 51] (pad_to=0)
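
The frame-count rule above can be sketched in Swift as follows. The function names are illustrative and are not the library's API. The input of 8000 samples with a hop of 160 and a window of 400 is an assumption chosen because it reproduces the frame counts listed above; the notes do not state the actual test input.

```swift
// Sketch of the frame-count rule described in this release.
func stftFrameCount(sampleCount: Int, hopLength: Int) -> Int {
    // torch.stft with center=True yields 1 + n // hop frames.
    return 1 + sampleCount / hopLength
}

func outputFrameCount(sampleCount: Int, hopLength: Int, padTo: Int) -> Int {
    let frames = stftFrameCount(sampleCount: sampleCount, hopLength: hopLength)
    guard padTo > 0 else { return frames }
    // Round up to the next multiple of padTo.
    return ((frames + padTo - 1) / padTo) * padTo
}

// The v1.0.1 formula, kept only for comparison.
func previousFrameCount(sampleCount: Int, windowSize: Int, hopLength: Int) -> Int {
    return (sampleCount + windowSize) / hopLength
}

// Assumed input: 8000 samples, hop 160, window 400.
let n = 8000, hop = 160, window = 400
print(outputFrameCount(sampleCount: n, hopLength: hop, padTo: 2))   // 52
print(outputFrameCount(sampleCount: n, hopLength: hop, padTo: 16))  // 64
print(outputFrameCount(sampleCount: n, hopLength: hop, padTo: 0))   // 51
print(previousFrameCount(sampleCount: n, windowSize: window, hopLength: hop)) // 52
```

Under these assumed values the previous formula agrees only with the pad_to=2 case, which is consistent with it having looked correct when only the VAD shape was checked.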

Also improved tests:

- Added strict frame count equality checks
- Updated tolerances to 1e-4 max diff, 1e-5 avg diff
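
A check of this kind could be sketched as below. `diffStats` and `matchesReference` are hypothetical helpers, not the project's test code, and treating the tolerances as inclusive upper bounds is an assumption.

```swift
// Hypothetical helper: max and mean absolute difference between two
// matrices, or nil when the shapes differ or the input is empty.
func diffStats(_ a: [[Float]], _ b: [[Float]]) -> (maxDiff: Float, avgDiff: Float)? {
    guard a.count == b.count else { return nil }
    var maxDiff: Float = 0
    var sum: Double = 0
    var count = 0
    for (rowA, rowB) in zip(a, b) {
        // Strict shape equality: every row must have the same frame count.
        guard rowA.count == rowB.count else { return nil }
        for (x, y) in zip(rowA, rowB) {
            let d = abs(x - y)
            maxDiff = max(maxDiff, d)
            sum += Double(d)
            count += 1
        }
    }
    guard count > 0 else { return nil }
    return (maxDiff, Float(sum / Double(count)))
}

// Applies the tolerances named in this release: 1e-4 max, 1e-5 average.
func matchesReference(_ a: [[Float]], _ b: [[Float]]) -> Bool {
    guard let stats = diffStats(a, b) else { return false }
    return stats.maxDiff <= 1e-4 && stats.avgDiff <= 1e-5
}
```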

v1.0.1: Fix STFT frame count formula to match NeMo exactly

2026-02-06T08:18:00Z

Changed frame count calculation for center padding from:
sampleCount / hopLength + 1 (gave 51 frames)
to:
(sampleCount + windowSize) / hopLength (gives 52 frames)

This matches NeMo's AudioToMelSpectrogramPreprocessor output exactly.
Also updated output frame calculation to use nFrames instead of validFrames.

Test results now show exact shape match:

- Swift mel shape: [80, 52]
- NeMo mel shape: [80, 52]
- Max diff: 5.4e-05

Initial release v1.0.0 - NeMoFeatureExtractor for iOS

2026-02-06T07:47:49Z

Swift library for extracting mel-spectrogram features compatible with
NVIDIA NeMo speech models. Features:

- Exact compatibility with NeMo's feature extraction pipeline
- Supports VAD (MarbleNet), Speaker (TitaNet), and ASR models
- High performance using Apple's Accelerate framework (vDSP)
- Pre-computed mel filterbank from NeMo for maximum accuracy
- Output as [[Float]] or MLMultiArray for CoreML inference
- Tested against NeMo Python reference (max diff < 6e-05)
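
To illustrate the two output forms, the sketch below converts a [[Float]] mel matrix into an MLMultiArray. It is not the library's API: `flattenRowMajor` and `makeMultiArray` are hypothetical names, and a given model may expect a different shape (for example an extra batch dimension).

```swift
import Foundation
#if canImport(CoreML)
import CoreML
#endif

// Hypothetical helper: flatten an [nMels][nFrames] matrix into a row-major
// buffer. Returns nil for empty or ragged input.
func flattenRowMajor(_ mel: [[Float]]) -> (shape: [Int], values: [Float])? {
    guard let width = mel.first?.count,
          mel.allSatisfy({ $0.count == width }) else {
        return nil
    }
    return ([mel.count, width], mel.flatMap { $0 })
}

#if canImport(CoreML)
// Hypothetical conversion to a float32 MLMultiArray of shape [nMels, nFrames].
func makeMultiArray(_ mel: [[Float]]) throws -> MLMultiArray? {
    guard let flat = flattenRowMajor(mel) else { return nil }
    let shape = flat.shape.map { NSNumber(value: $0) }
    let array = try MLMultiArray(shape: shape, dataType: .float32)
    // A freshly created array is contiguous, so linear indices are row-major.
    for (i, v) in flat.values.enumerated() {
        array[i] = NSNumber(value: v)
    }
    return array
}
#endif
```

The per-element NSNumber assignment keeps the sketch short; for large inputs, writing through the array's underlying buffer would be faster.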
