build eddy from scratch #7

Alex-Wengg · 2025-11-06T06:29:25Z

i decided to try to run v3 and v2 from scratched with a refresh git clone

Added --model parakeet-v2|parakeet-v3 flag to parakeet_cli
Fixed v3 model repo ID: 0.6b-v3-ov → 1.1b-v3-ov
Fixed v3 preprocessor bug: round sample counts to nearest 1000 to avoid edge case failures
Added dynamic windowing support for v3 (10s chunks)
Added encoder fallback for dynamic shapes (1250 frames default)
Enhanced README with OpenVINO installation, build tools, vcpkg setup, and Quick Start guide

claude · 2025-11-06T06:31:24Z

Pull Request Review

Overview

This PR adds Parakeet v3 model support and improves the build/setup documentation. The changes are functional and well-intentioned, but there are several areas that need attention.

Critical Issues

1. Workaround Needs Better Investigation (`parakeet_preprocessor.cpp:47-56`)

// Workaround for v3 preprocessor bug: round sample count to nearest 1000
// v3 fails on specific odd-length audio (e.g., 240,135 samples)
AudioSegment working_segment = segment;
const size_t original_size = segment.pcm.size();
const size_t round_to = 1000;
const size_t rounded_size = ((original_size + round_to - 1) / round_to) * round_to;

if (rounded_size != original_size) {
  working_segment.pcm.resize(rounded_size, 0.0F);  // Pad with zeros
}

Issues:

This unconditionally copies the entire audio segment even when padding isn't needed (line 49)
The root cause of the "v3 preprocessor bug" is unclear - is this a model export issue, OpenVINO compilation issue, or actual model limitation?
Padding with zeros could affect transcription quality for audio that ends precisely at boundaries
The value 1000 appears arbitrary without justification

Recommendations:

Only copy the segment when padding is actually needed: if (rounded_size != original_size) { working_segment.pcm = segment.pcm; working_segment.pcm.resize(...); }
Investigate the root cause - this might be a model export issue that should be fixed upstream
Document why 1000 samples (62.5ms at 16kHz) was chosen
Consider if this should only apply to v3, not all models

2. Magic Number for Dynamic Encoder Fallback (`parakeet_openvino.cpp:395-399`)

// Fallback for dynamic shapes: Use 1250 frames (10 seconds at 125 frames/sec mel rate)
if (impl_->encoder_expected_frames == 0) {
    impl_->encoder_expected_frames = 1250;
}

Issues:

The comment says "125 frames/sec mel rate" but this needs verification
No explanation for why 10 seconds is the right chunk size for dynamic models
Should this be different for v2 vs v3?

Recommendation:

Add a constant: constexpr size_t DEFAULT_ENCODER_FRAMES = 1250; // 10s at 125 frames/sec
Document why this matches the preprocessor window size

3. Inconsistent Windowing Logic (`parakeet_preprocessor.cpp:79-83`)

if (window_samples == 0 && working_segment.pcm.size() > 160000) {
    window_samples = 160000;  // 10 seconds at 16kHz (matches v2)
}

Issue:

This only sets windowing if audio is longer than 10 seconds, but the comment says it "matches v2"
If the model has dynamic shape AND audio is ≤10s, window_samples stays 0 and single-shot path is used
This is probably correct, but the logic flow is confusing

Recommendation:

Add a comment explaining the decision tree more clearly
Consider: constexpr size_t MAX_WINDOW_SAMPLES = 160000; // 10s at 16kHz

Code Quality Issues

4. Input Validation Missing (`parakeet_cli.cpp:57-65`)

The CLI accepts --model parakeet-v2|parakeet-v3 but doesn't validate the value against the actual MODEL_MAP in model_configs.hpp.

Recommendation:

Use model_configs::MODEL_MAP.find(model_name) to validate input
This makes the code more maintainable as new models are added

5. Hardcoded Paths in Documentation (`README.md:79`)

cmake -S . -B build -DCMAKE_TOOLCHAIN_FILE=C:\vcpkg\scripts\buildsystems\vcpkg.cmake

Issue:

Windows path is hardcoded, but vcpkg location varies

Recommendation:

Use [path-to-vcpkg] placeholder like the comment above it
Show both Windows and Linux examples

6. Incorrect Comment (`README.md:147`)

# Note: V3 currently only tested with English. Multilingual support coming soon.

Issue:

The code changes don't actually add language parameter support for v3
This creates false expectations

Recommendation:

Either implement language support or clarify that v3 multilingual is not yet integrated

Performance Considerations

7. Unnecessary Audio Copy

As mentioned in Issue #1, the audio segment is copied unconditionally for v3, even when no padding is needed. For long audio files, this could double memory usage temporarily.

Impact: Minor for typical use cases (<1 minute audio), but could be significant for long recordings.

8. No Benchmark for V3

The PR adds v3 support but doesn't mention if benchmarks were run to compare:

WER (Word Error Rate)
RTFx (Real-Time Factor)
Memory usage

Recommendation:

Run benchmark_librispeech.exe with both v2 and v3
Document results in PR description or BENCHMARK_RESULTS.md

Security Concerns

9. Integer Overflow Checks ✅

The existing code already has good overflow protection (lines 147-150, 209-213 in parakeet_preprocessor.cpp). No new issues introduced.

10. Input Size Validation ⚠️

The rounding workaround could theoretically cause issues if:

original_size is near SIZE_MAX
The padding calculation overflows

Recommendation:

Add overflow check: if (original_size > SIZE_MAX - round_to) { throw; }

Test Coverage

11. No Automated Tests ❌

The repository has no unit tests (tests/ directory doesn't exist). This PR adds complex preprocessing logic without test coverage.

Recommendations:

Add unit tests for:
- Audio padding logic (especially edge cases like 999, 1000, 1001 samples)
- v2 vs v3 model selection
- Dynamic shape handling
- CLI argument parsing
Add integration test with a small test audio file for both v2 and v3

12. Manual Testing Status

The PR description says "i decided to try to run v3 and v2 from scratch" but doesn't specify:

What audio files were tested?
What were the transcription results?
Were there any quality regressions?

Documentation

13. README Improvements ✅

The documentation improvements are excellent:

Clear prerequisites for OpenVINO, CMake, vcpkg
Platform-specific instructions
Quick start guide

14. Missing Details

No explanation of when to use v2 vs v3
Model size mentioned (600MB vs 1.1GB) but no accuracy comparison
No mention of v3 performance characteristics (RTFx on CPU/NPU)

Best Practices

15. Commit Message

The PR title "build eddy from scratch" doesn't reflect the actual changes.

Suggestion:

"Add Parakeet v3 model support and improve setup documentation"
Follow conventional commits: feat: add parakeet-v3 model support

16. Code Comments ✅

Comments are generally good and explain the "why" not just the "what". The preprocessor workaround comment could be more detailed.

Summary

Strengths:

Documentation improvements are very helpful
Model configuration abstraction is clean
Existing overflow protection maintained
Dynamic shape handling is thoughtful

Must Fix Before Merge:

Optimize audio segment copying (Issue Parakeet TDT v2 implementation with chunking, #1)
Investigate and document the "v3 preprocessor bug" root cause
Add overflow check for padding calculation (Issue #10)

Should Fix:
4. Add at least basic unit tests for the new preprocessing logic
5. Run and document benchmarks comparing v2 and v3
6. Validate CLI model input against MODEL_MAP
7. Document when to use v2 vs v3

Nice to Have:
8. Extract magic numbers into named constants
9. Better commit message
10. Test with various audio lengths to validate the workaround

Overall Assessment: The functionality appears correct, but the preprocessing workaround is concerning without understanding the root cause. The documentation improvements are valuable. I'd recommend testing more thoroughly and adding some basic unit tests before merging.

Would you like me to help implement any of these suggestions?

BrandonWeng

good call

tried to use v2 and v3 from scratch

b2cbf16

Alex-Wengg marked this pull request as ready for review November 6, 2025 06:29

BrandonWeng approved these changes Nov 10, 2025

View reviewed changes

Alex-Wengg merged commit 56281eb into main Nov 10, 2025
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

build eddy from scratch #7

build eddy from scratch #7

Uh oh!

Alex-Wengg commented Nov 6, 2025

Uh oh!

claude bot commented Nov 6, 2025

Uh oh!

BrandonWeng left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

build eddy from scratch #7

build eddy from scratch #7

Uh oh!

Conversation

Alex-Wengg commented Nov 6, 2025

Uh oh!

claude bot commented Nov 6, 2025

Pull Request Review

Overview

Critical Issues

1. Workaround Needs Better Investigation (parakeet_preprocessor.cpp:47-56)

2. Magic Number for Dynamic Encoder Fallback (parakeet_openvino.cpp:395-399)

3. Inconsistent Windowing Logic (parakeet_preprocessor.cpp:79-83)

Code Quality Issues

4. Input Validation Missing (parakeet_cli.cpp:57-65)

5. Hardcoded Paths in Documentation (README.md:79)

6. Incorrect Comment (README.md:147)

Performance Considerations

7. Unnecessary Audio Copy

8. No Benchmark for V3

Security Concerns

9. Integer Overflow Checks ✅

10. Input Size Validation ⚠️

Test Coverage

11. No Automated Tests ❌

12. Manual Testing Status

Documentation

13. README Improvements ✅

14. Missing Details

Best Practices

15. Commit Message

16. Code Comments ✅

Summary

Uh oh!

BrandonWeng left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

1. Workaround Needs Better Investigation (`parakeet_preprocessor.cpp:47-56`)

2. Magic Number for Dynamic Encoder Fallback (`parakeet_openvino.cpp:395-399`)

3. Inconsistent Windowing Logic (`parakeet_preprocessor.cpp:79-83`)

4. Input Validation Missing (`parakeet_cli.cpp:57-65`)

5. Hardcoded Paths in Documentation (`README.md:79`)

6. Incorrect Comment (`README.md:147`)