Skip to content

Conversation

@Alex-Wengg
Copy link
Contributor

i decided to try to run v3 and v2 from scratched with a refresh git clone

  • Added --model parakeet-v2|parakeet-v3 flag to parakeet_cli
  • Fixed v3 model repo ID: 0.6b-v3-ov → 1.1b-v3-ov
  • Fixed v3 preprocessor bug: round sample counts to nearest 1000 to avoid edge case failures
  • Added dynamic windowing support for v3 (10s chunks)
  • Added encoder fallback for dynamic shapes (1250 frames default)
  • Enhanced README with OpenVINO installation, build tools, vcpkg setup, and Quick Start guide

@Alex-Wengg Alex-Wengg marked this pull request as ready for review November 6, 2025 06:29
@claude
Copy link

claude bot commented Nov 6, 2025

Pull Request Review

Overview

This PR adds Parakeet v3 model support and improves the build/setup documentation. The changes are functional and well-intentioned, but there are several areas that need attention.


Critical Issues

1. Workaround Needs Better Investigation (parakeet_preprocessor.cpp:47-56)

// Workaround for v3 preprocessor bug: round sample count to nearest 1000
// v3 fails on specific odd-length audio (e.g., 240,135 samples)
AudioSegment working_segment = segment;
const size_t original_size = segment.pcm.size();
const size_t round_to = 1000;
const size_t rounded_size = ((original_size + round_to - 1) / round_to) * round_to;

if (rounded_size != original_size) {
  working_segment.pcm.resize(rounded_size, 0.0F);  // Pad with zeros
}

Issues:

  • This unconditionally copies the entire audio segment even when padding isn't needed (line 49)
  • The root cause of the "v3 preprocessor bug" is unclear - is this a model export issue, OpenVINO compilation issue, or actual model limitation?
  • Padding with zeros could affect transcription quality for audio that ends precisely at boundaries
  • The value 1000 appears arbitrary without justification

Recommendations:

  • Only copy the segment when padding is actually needed: if (rounded_size != original_size) { working_segment.pcm = segment.pcm; working_segment.pcm.resize(...); }
  • Investigate the root cause - this might be a model export issue that should be fixed upstream
  • Document why 1000 samples (62.5ms at 16kHz) was chosen
  • Consider if this should only apply to v3, not all models

2. Magic Number for Dynamic Encoder Fallback (parakeet_openvino.cpp:395-399)

// Fallback for dynamic shapes: Use 1250 frames (10 seconds at 125 frames/sec mel rate)
if (impl_->encoder_expected_frames == 0) {
    impl_->encoder_expected_frames = 1250;
}

Issues:

  • The comment says "125 frames/sec mel rate" but this needs verification
  • No explanation for why 10 seconds is the right chunk size for dynamic models
  • Should this be different for v2 vs v3?

Recommendation:

  • Add a constant: constexpr size_t DEFAULT_ENCODER_FRAMES = 1250; // 10s at 125 frames/sec
  • Document why this matches the preprocessor window size

3. Inconsistent Windowing Logic (parakeet_preprocessor.cpp:79-83)

if (window_samples == 0 && working_segment.pcm.size() > 160000) {
    window_samples = 160000;  // 10 seconds at 16kHz (matches v2)
}

Issue:

  • This only sets windowing if audio is longer than 10 seconds, but the comment says it "matches v2"
  • If the model has dynamic shape AND audio is ≤10s, window_samples stays 0 and single-shot path is used
  • This is probably correct, but the logic flow is confusing

Recommendation:

  • Add a comment explaining the decision tree more clearly
  • Consider: constexpr size_t MAX_WINDOW_SAMPLES = 160000; // 10s at 16kHz

Code Quality Issues

4. Input Validation Missing (parakeet_cli.cpp:57-65)

The CLI accepts --model parakeet-v2|parakeet-v3 but doesn't validate the value against the actual MODEL_MAP in model_configs.hpp.

Recommendation:

  • Use model_configs::MODEL_MAP.find(model_name) to validate input
  • This makes the code more maintainable as new models are added

5. Hardcoded Paths in Documentation (README.md:79)

cmake -S . -B build -DCMAKE_TOOLCHAIN_FILE=C:\vcpkg\scripts\buildsystems\vcpkg.cmake

Issue:

  • Windows path is hardcoded, but vcpkg location varies

Recommendation:

  • Use [path-to-vcpkg] placeholder like the comment above it
  • Show both Windows and Linux examples

6. Incorrect Comment (README.md:147)

# Note: V3 currently only tested with English. Multilingual support coming soon.

Issue:

  • The code changes don't actually add language parameter support for v3
  • This creates false expectations

Recommendation:

  • Either implement language support or clarify that v3 multilingual is not yet integrated

Performance Considerations

7. Unnecessary Audio Copy

As mentioned in Issue #1, the audio segment is copied unconditionally for v3, even when no padding is needed. For long audio files, this could double memory usage temporarily.

Impact: Minor for typical use cases (<1 minute audio), but could be significant for long recordings.

8. No Benchmark for V3

The PR adds v3 support but doesn't mention if benchmarks were run to compare:

  • WER (Word Error Rate)
  • RTFx (Real-Time Factor)
  • Memory usage

Recommendation:

  • Run benchmark_librispeech.exe with both v2 and v3
  • Document results in PR description or BENCHMARK_RESULTS.md

Security Concerns

9. Integer Overflow Checks

The existing code already has good overflow protection (lines 147-150, 209-213 in parakeet_preprocessor.cpp). No new issues introduced.

10. Input Size Validation ⚠️

The rounding workaround could theoretically cause issues if:

  • original_size is near SIZE_MAX
  • The padding calculation overflows

Recommendation:

  • Add overflow check: if (original_size > SIZE_MAX - round_to) { throw; }

Test Coverage

11. No Automated Tests

The repository has no unit tests (tests/ directory doesn't exist). This PR adds complex preprocessing logic without test coverage.

Recommendations:

  • Add unit tests for:
    • Audio padding logic (especially edge cases like 999, 1000, 1001 samples)
    • v2 vs v3 model selection
    • Dynamic shape handling
    • CLI argument parsing
  • Add integration test with a small test audio file for both v2 and v3

12. Manual Testing Status

The PR description says "i decided to try to run v3 and v2 from scratch" but doesn't specify:

  • What audio files were tested?
  • What were the transcription results?
  • Were there any quality regressions?

Documentation

13. README Improvements

The documentation improvements are excellent:

  • Clear prerequisites for OpenVINO, CMake, vcpkg
  • Platform-specific instructions
  • Quick start guide

14. Missing Details

  • No explanation of when to use v2 vs v3
  • Model size mentioned (600MB vs 1.1GB) but no accuracy comparison
  • No mention of v3 performance characteristics (RTFx on CPU/NPU)

Best Practices

15. Commit Message

The PR title "build eddy from scratch" doesn't reflect the actual changes.

Suggestion:

  • "Add Parakeet v3 model support and improve setup documentation"
  • Follow conventional commits: feat: add parakeet-v3 model support

16. Code Comments

Comments are generally good and explain the "why" not just the "what". The preprocessor workaround comment could be more detailed.


Summary

Strengths:

  • Documentation improvements are very helpful
  • Model configuration abstraction is clean
  • Existing overflow protection maintained
  • Dynamic shape handling is thoughtful

Must Fix Before Merge:

  1. Optimize audio segment copying (Issue Parakeet TDT v2 implementation with chunking, #1)
  2. Investigate and document the "v3 preprocessor bug" root cause
  3. Add overflow check for padding calculation (Issue #10)

Should Fix:
4. Add at least basic unit tests for the new preprocessing logic
5. Run and document benchmarks comparing v2 and v3
6. Validate CLI model input against MODEL_MAP
7. Document when to use v2 vs v3

Nice to Have:
8. Extract magic numbers into named constants
9. Better commit message
10. Test with various audio lengths to validate the workaround


Overall Assessment: The functionality appears correct, but the preprocessing workaround is concerning without understanding the root cause. The documentation improvements are valuable. I'd recommend testing more thoroughly and adding some basic unit tests before merging.

Would you like me to help implement any of these suggestions?

Copy link
Member

@BrandonWeng BrandonWeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call

@Alex-Wengg Alex-Wengg merged commit 56281eb into main Nov 10, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants