
Conversation


@Alex-Wengg Alex-Wengg commented Nov 3, 2025


================================================================================
BENCHMARK RESULTS
================================================================================
   Dataset: librispeech test-clean
   Model: parakeet-v3
   Device: NPU
   Files processed: 2620
   fluidAudio v3 (multi) Average WER: 2.6%
   Average WER: 3.7%
   Median WER: 0.0%
   Average CER: 1.9%
   Median CER: 0.0%
   Median RTFx: 23.5x
   Overall RTFx: 25.7x (19452.5s / 756.4s)
   Benchmark runtime: 789.8s
Results saved to: v3.json
   Normalization: OpenAI Whisper English
================================================================================

================================================================================
BENCHMARK RESULTS
================================================================================
   Dataset: librispeech test-clean
   Model: parakeet-v2
   Device: NPU
   Files processed: 2620
   fluidAudio v2 (EN) Average WER: 2.2%  
   Average WER: 2.8%
   Median WER: 0.0%
   Average CER: 1.1%
   Median CER: 0.0%
   Median RTFx: 25.7x
   Overall RTFx: 27.4x (19452.5s / 711.2s)
   Benchmark runtime: 764.2s
Results saved to: v2_100.json
   Normalization: OpenAI Whisper English
================================================================================


claude bot commented Nov 3, 2025

Pull Request Review: Add Parakeet v3 Support

Overview

This PR adds support for the Parakeet v3 model alongside the existing v2 model, with improved benchmarking metrics (WER and CER). The implementation is generally solid with good performance results, but there are several areas that need attention.


Code Quality & Best Practices

Strengths:

  • Clean separation of model configurations in model_configs.hpp
  • Good use of const-correctness and inline constants
  • Proper use of structured bindings and modern C++ features
  • Consistent naming conventions
  • Helpful comments explaining format differences

Issues:

1. Magic Number Without Documentation (src/eddy_c.cpp:379)
The value 8192 is an undocumented magic number. It should be a documented named constant; consider extracting both blank token IDs, v2 (1024) and v3 (8192), to model_configs.hpp.

2. Silent Fallback Logic (src/eddy_c.cpp:379)
All non-8192 values default to v2, which could hide configuration errors. Should validate that blank_token_id matches expected values and fail explicitly on mismatch.

3. Windows-Only Legacy Path Handling (src/eddy_c.cpp:401-406)
Only checks for v2 legacy path, not v3. Should either support both or be documented as migration code with a deprecation plan.


Potential Bugs

1. CRITICAL: Vocabulary Size Validation Missing (src/models/parakeet-v2/tokenizer.cpp:24-34)
The v3 format includes vocab_size metadata, but it is never validated against the actual loaded vocabulary. If the vocab file is corrupted or truncated, the mismatch will not be detected. Add validation after loading to check vocab_.size() matches expected vocab_size.

2. Integer Overflow Risk (src/models/parakeet-v2/tokenizer.cpp:39)
std::stoul throws std::out_of_range for values larger than ULONG_MAX (and std::invalid_argument for non-numeric input). Wrap the call in a try-catch or validate the string before conversion.

3. Benchmark Metric Calculation Changed (benchmarks/benchmark.py:401-404)
The PR changes from corpus-level WER to per-file average WER. These are mathematically different metrics:

  • Per-file average weights all files equally
  • Corpus-level WER weights by word count (standard in ASR)
  • Impact: Benchmark results are not comparable to previous versions
  • Fix: Either keep corpus-level WER as primary metric or document this breaking change prominently

Performance Considerations

Positive:

  • Excellent benchmark results: v3 shows 23.5x median RTFx
  • Good WER/CER metrics (3.7% / 1.9% for v3)

Concerns:

1. Redundant String Construction (src/eddy_c.cpp:388-394)
Creates unnecessary string copy for comparison. Consider using std::string_view.

2. Double Iteration Over Vocabulary
First loop finds max_id, second loop populates vocab. Could be optimized to single pass if vocab loading is on critical path.


Security Concerns

1. Unchecked File Path Concatenation (src/eddy_c.cpp:412-416)
If model_dir comes from user input (which it can via config.model_dir), there is no path traversal validation. User could pass ../../sensitive_dir as model_dir. Fix: Validate that model_dir does not contain ../ or canonicalize the path before use.

2. No Input Validation on blank_token_id
C++ side (eddy_c.cpp:421) allows arbitrary positive values without validation.


Test Coverage

CRITICAL ISSUE: No automated tests for this functionality

The PR adds significant new functionality but includes:

  • No unit tests for v3 tokenizer format parsing
  • No integration tests for v3 model loading
  • No tests for blank_token_id inference logic
  • No tests for model_dir resolution with cache special value
  • No tests for vocabulary size validation
  • Only manual benchmark results in PR description

Recommendations:

  1. Add unit tests for Tokenizer::load() with both v2 and v3 format fixtures
  2. Add tests for edge cases: empty vocab, malformed JSON, wrong vocab_size
  3. Add tests for eddy_parakeet_create() with different blank_token_id values
  4. Add CI job to run benchmarks on subset of data to catch regressions

Additional Issues

1. Inconsistent Default Behavior
The C++ side silently falls back to v2 when blank_token_id is unrecognized, while the Python side defaults to v2 explicitly. These behaviors should align.

2. Missing Documentation

  • No updates to README.md about v3 support
  • No mention of how to use v3 in docs/

3. Breaking Change Not Highlighted
The WER calculation change (corpus-level to per-file average) is a breaking change that makes old benchmarks incomparable. This should be documented in the PR description and commit message.


Recommendations Summary

Must Fix (Blocking):

  1. Add validation for blank_token_id values (reject invalid values)
  2. Add vocabulary size validation for v3 format
  3. Fix or document the WER metric change
  4. Add basic unit tests for new functionality

Should Fix:

  1. Extract magic numbers (8192, 1024) to named constants
  2. Add path validation for user-provided model_dir
  3. Handle v3 legacy path on Windows or document v2-only behavior
  4. Update README.md with v3 usage instructions

Nice to Have:

  1. Optimize vocabulary loading (single-pass iteration)
  2. Use std::string_view to avoid unnecessary copies
  3. Add CI integration tests

Conclusion

This is a solid PR that successfully adds v3 support with good performance results. However, it has several security and correctness issues that should be addressed before merging. The lack of automated tests is concerning for a core feature addition.

Recommendation: Request changes - address critical issues (validation, tests, breaking change documentation) before merging.

Great work on the implementation! The code is clean and the benchmarks show excellent performance. With the suggested fixes, this will be a strong addition to the codebase.


Review generated with assistance from Claude Code

Member

@BrandonWeng BrandonWeng left a comment


nice

@Alex-Wengg Alex-Wengg marked this pull request as ready for review November 3, 2025 02:10
@Alex-Wengg Alex-Wengg merged commit 5bdbbb5 into main Nov 3, 2025
2 checks passed
@BrandonWeng BrandonWeng deleted the parakeet-v3 branch November 3, 2025 02:14