Voxtral Realtime: Refactor StandardEncoderRingKVCache by manuelcandales · Pull Request #17729 · pytorch/executorch

manuelcandales · 2026-02-26T03:03:46Z

This pull request improves device and dtype consistency for streaming encoder buffers and cache classes in voxtral_realtime/model.py. The main changes ensure that all buffers and caches are initialized with the same dtype and device as the encoder weights, which helps prevent device- and dtype-mismatch errors at runtime and improves compatibility across different hardware and export scenarios. It also refactors the cache classes for better code reuse and maintainability.

Device and dtype consistency improvements:

The EncoderRingKVCache and StandardEncoderRingKVCache classes now accept dtype and device parameters, ensuring their buffers (k_cache, v_cache) are initialized to match the encoder's configuration.
The StreamingAudioEncoderExport class infers dtype and device from encoder weights and uses them when initializing convolution states and cache buffers, ensuring all state is consistently placed and typed.
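As a rough illustration of the pattern described above, the sketch below allocates the ring-cache buffers and a convolution state with the dtype and device read from the encoder's first parameter. The constructor signatures, tensor shapes, and the `conv_state` layout are assumptions, and `StreamingEncoderWrapper` is a hypothetical stand-in for `StreamingAudioEncoderExport`, not the code in voxtral_realtime/model.py.

```python
import torch
import torch.nn as nn


class EncoderRingKVCache(nn.Module):
    """Sketch only: ring-buffer KV cache whose buffers take an explicit dtype/device.

    Argument names and the (batch, window, heads, head_dim) layout are assumptions.
    """

    def __init__(self, window_size, n_heads, head_dim, dtype=torch.float32, device=None):
        super().__init__()
        self.window_size = window_size
        shape = (1, window_size, n_heads, head_dim)
        # Allocate both caches directly with the requested dtype/device; no later cast needed.
        self.register_buffer("k_cache", torch.zeros(shape, dtype=dtype, device=device))
        self.register_buffer("v_cache", torch.zeros(shape, dtype=dtype, device=device))


class StreamingEncoderWrapper(nn.Module):
    """Hypothetical stand-in for StreamingAudioEncoderExport."""

    def __init__(self, encoder, window_size, n_heads, head_dim, conv_channels, conv_kernel):
        super().__init__()
        self.encoder = encoder
        # Infer dtype/device from the encoder weights so every piece of state matches them.
        ref = next(encoder.parameters())
        dtype, device = ref.dtype, ref.device
        self.kv_cache = EncoderRingKVCache(
            window_size, n_heads, head_dim, dtype=dtype, device=device
        )
        # Assumed conv state layout: the last (kernel - 1) frames carried between chunks.
        self.register_buffer(
            "conv_state",
            torch.zeros(1, conv_channels, conv_kernel - 1, dtype=dtype, device=device),
        )


encoder = nn.Linear(8, 8).to(torch.bfloat16)  # toy encoder with bfloat16 weights
wrapper = StreamingEncoderWrapper(
    encoder, window_size=16, n_heads=2, head_dim=4, conv_channels=8, conv_kernel=3
)
print(wrapper.kv_cache.k_cache.dtype, wrapper.conv_state.dtype)  # torch.bfloat16 torch.bfloat16
```

Reading the reference tensor once at construction means the caches and the convolution state cannot drift from the weights, whatever precision or device the encoder was loaded with.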

Cache class refactoring and simplification:

StandardEncoderRingKVCache now subclasses EncoderRingKVCache to reduce code duplication, and its docstring clarifies its intended use for export scenarios.
The redundant create_causal_mask method in StandardEncoderRingKVCache is removed in favor of the inherited implementation.
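A minimal sketch of the subclassing structure, under assumed shapes: the base class owns the buffers, the ring-slot arithmetic, and `create_causal_mask`, and the subclass overrides only its write path. The PR does not say how the two classes actually differ; the `update` methods (indexed assignment in the base, `index_copy_` in the subclass) are hypothetical and chosen only to show where an override would go.

```python
import torch
import torch.nn as nn


class EncoderRingKVCache(nn.Module):
    """Sketch only: owns the ring buffers, the write path, and the causal mask."""

    def __init__(self, window_size, n_heads, head_dim, dtype=torch.float32, device=None):
        super().__init__()
        self.window_size = window_size
        shape = (1, window_size, n_heads, head_dim)
        self.register_buffer("k_cache", torch.zeros(shape, dtype=dtype, device=device))
        self.register_buffer("v_cache", torch.zeros(shape, dtype=dtype, device=device))

    def slots(self, start_pos, seq_len):
        # Ring slot for each absolute position: position modulo the window size.
        pos = torch.arange(start_pos, start_pos + seq_len, device=self.k_cache.device)
        return pos % self.window_size

    def update(self, start_pos, k, v):
        # Hypothetical base write path: advanced-index assignment into the ring slots.
        idx = self.slots(start_pos, k.shape[1])
        self.k_cache[:, idx] = k
        self.v_cache[:, idx] = v
        return self.k_cache, self.v_cache

    def create_causal_mask(self, start_pos, seq_len):
        device = self.k_cache.device
        end = start_pos + seq_len - 1  # last absolute position written so far
        slots = torch.arange(self.window_size, device=device)
        # Absolute position currently held by each slot (negative means never written).
        slot_pos = end - torch.remainder(end - slots, self.window_size)
        q_pos = torch.arange(start_pos, start_pos + seq_len, device=device).unsqueeze(1)
        allowed = (slot_pos >= 0) & (slot_pos <= q_pos) & (q_pos - slot_pos < self.window_size)
        mask = torch.zeros(seq_len, self.window_size, dtype=self.k_cache.dtype, device=device)
        return mask.masked_fill(~allowed, float("-inf"))


class StandardEncoderRingKVCache(EncoderRingKVCache):
    """Sketch of an export-oriented variant: only the write path is overridden.

    create_causal_mask is inherited from the base class, not redefined.
    """

    def update(self, start_pos, k, v):
        # Hypothetical alternative write path using index_copy_ along the sequence dim.
        idx = self.slots(start_pos, k.shape[1])
        self.k_cache.index_copy_(1, idx, k)
        self.v_cache.index_copy_(1, idx, v)
        return self.k_cache, self.v_cache


torch.manual_seed(0)
k, v = torch.randn(1, 3, 2, 4), torch.randn(1, 3, 2, 4)
base, std = EncoderRingKVCache(4, 2, 4), StandardEncoderRingKVCache(4, 2, 4)
base.update(2, k, v)
std.update(2, k, v)  # positions 2, 3, 4 land in ring slots 2, 3, 0
print(torch.equal(base.k_cache, std.k_cache))  # True
```

Because `create_causal_mask` lives only on the base class, a fix to the mask logic applies to both caches at once.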

Other minor improvements:

The create_causal_mask method in EncoderRingKVCache now always uses the device of the cache buffer, simplifying device management.
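The device rule can be shown in isolation. In this hypothetical minimal module, `create_causal_mask` takes no device argument and reads both device and dtype from `k_cache`; the ring-buffer slot arithmetic is deliberately omitted, so the mask is a plain square causal mask rather than the real windowed one.

```python
import torch
import torch.nn as nn


class MaskDeviceDemo(nn.Module):
    """Hypothetical minimal cache showing only where the mask's device comes from."""

    def __init__(self, window_size, dtype=torch.float32, device=None):
        super().__init__()
        self.register_buffer(
            "k_cache", torch.zeros(1, window_size, 1, 1, dtype=dtype, device=device)
        )

    def create_causal_mask(self, seq_len):
        # No device argument: read device and dtype from the cache buffer, so the mask
        # lands wherever the cache was allocated or later moved with .to(...).
        device, dtype = self.k_cache.device, self.k_cache.dtype
        mask = torch.full((seq_len, seq_len), float("-inf"), dtype=dtype, device=device)
        # Keep -inf strictly above the diagonal: query i may attend to keys 0..i.
        return torch.triu(mask, diagonal=1)


cache = MaskDeviceDemo(window_size=8)
print(cache.create_causal_mask(3))
cache = cache.to(torch.float64)  # registered buffers are cast; the mask follows
print(cache.create_causal_mask(3).dtype)  # torch.float64
```

Since `Module.to(...)` moves and casts registered buffers, deriving the mask's placement from `k_cache` keeps it aligned with the cache without any extra bookkeeping.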

pytorch-bot · 2026-02-26T03:03:49Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17729

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 12 New Failures

As of commit 76c8860 with merge base bac3ad3:

NEW FAILURES - The following jobs have failed:

periodic / test-models-linux (buck2, mv3, xnnpack-quantization-delegation, linux.2xlarge, 90) / linux-job (gh)
RuntimeError: Command docker exec -t 2e6e05937857c161585cde0d357bc79dbfc4e9795fdc2278d80d0f2b91544879 /exec failed with exit code 1
periodic / test-models-linux (cmake, mv3, xnnpack-quantization-delegation, linux.2xlarge, 90) / linux-job (gh)
RuntimeError: Command docker exec -t f5d81f30a5840a33c8f116fca72a8a4595691a9a38cdd701c0f526e63cb1cb74 /exec failed with exit code 1
pull / test-llama-runner-linux-android / linux-job (gh)
RuntimeError: Command docker exec -t 8048f5bd865f28e005f4e2b73620eebbdf820dba70f18f4aa3d2bc8b239f85de /exec failed with exit code 1
pull / test-models-linux (add, xnnpack-quantization-delegation, linux.2xlarge) / linux-job (gh)
RuntimeError: Command docker exec -t 5431907fd023b1cd60d8b60878814703a01882e0f72f8fef5e90a3aee11739a6 /exec failed with exit code 1
pull / test-models-linux (emformer_join, portable, linux.4xlarge.memory) / linux-job (gh)
RuntimeError: Command docker exec -t 5ecf1a8cd63f2bc97bf45daea46f32ea8f74c7d50e7a1e52add94e093f305b5a /exec failed with exit code 1
pull / test-models-linux (mv2, xnnpack-quantization-delegation, linux.2xlarge) / linux-job (gh)
RuntimeError: Command docker exec -t e4020dc50e423d3f1d8466b82ef2d346791212ca58ba206914bc70cce0fb5c01 /exec failed with exit code 1
pull / test-models-linux (w2l, portable, linux.4xlarge.memory) / linux-job (gh)
RuntimeError: Command docker exec -t 693719f9c48b3dc991b1f23f794c9d499877c8dcc991cc48686ac1c3ad6820c4 /exec failed with exit code 1
pull / test-qnn-wheel-packages-linux (3.13) / linux-job (gh)
RuntimeError: Command docker exec -t fe232f39c74e862fcb877de5410f2271525bdf2b23d31a60985ab0bed55c2d0e /exec failed with exit code 1
pull / test-selective-build-linux / linux-job (gh)
RuntimeError: Command docker exec -t f60d74f061f08afb0cfbcebe8cbb8d0eb4dc8e48b1e9cefa8c5a93b851540116 /exec failed with exit code 1
pull / test-voxtral-realtime-xnnpack-linux / linux-job (gh)
RuntimeError: Command docker exec -t f2963070ec2a32f85e0b401c6f47db2e80a10265e9121984051fbbf71b427d77 /exec failed with exit code 1
pull / unittest-arm-backend-with-no-deps (test_pytest_ops_tosa) / linux-job (gh)
RuntimeError: Command docker exec -t 1cd8827bcb97afb3f9cb32e297a252fa97f787f9878c71daf6bfbc68baeb5a60 /exec failed with exit code 1
pull / unittest-arm-backend-with-no-deps (test_run_tosa) / linux-job (gh)
RuntimeError: Command docker exec -t 60a8af32b9c255ea6f14318d925a1e61bcb38a1639db1f78295e72c382570eec /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Copilot

Pull request overview

This pull request refactors the streaming encoder cache classes to improve device and dtype consistency, ensuring all buffers are initialized to match the encoder's configuration. This helps prevent runtime errors and improves compatibility across different hardware and export scenarios.

Changes:

Added dtype and device parameters to cache classes (EncoderRingKVCache and StandardEncoderRingKVCache)
Modified StreamingAudioEncoderExport to infer dtype/device from encoder weights and use them consistently for all buffer initialization
Refactored StandardEncoderRingKVCache to inherit from EncoderRingKVCache, eliminating code duplication


manuelcandales added 2 commits February 25, 2026 21:59

consistent device

6782f74

StandardEncoderRingKVCache inherits from EncoderRingKVCache

76c8860

manuelcandales requested a review from mergennachin February 26, 2026 03:03

manuelcandales requested a review from lucylq as a code owner February 26, 2026 03:03

Copilot AI review requested due to automatic review settings February 26, 2026 03:03

meta-cla bot added the CLA Signed label Feb 26, 2026

Copilot started reviewing on behalf of manuelcandales February 26, 2026 03:04

manuelcandales removed the request for review from lucylq February 26, 2026 03:04

manuelcandales added the release notes: none label Feb 26, 2026

Copilot AI reviewed Feb 26, 2026


manuelcandales wants to merge 2 commits into main from manuel/metal-vr-streaming-refactor
