
Fix quantization failure for GraniteMoeHybrid models by upgrading llmcompressor #15

Merged
j4ys0n merged 1 commit into main from
claude/debug-quantize-job-failure-011CUpArWK9yCugt4gXXGchh
Nov 5, 2025
Conversation

@j4ys0n
Contributor

@j4ys0n j4ys0n commented Nov 5, 2025

Fix quantization failure for GraniteMoeHybrid models by upgrading llmcompressor

Root Cause (VERIFIED):
The error "torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow" occurs because _update_mamba_mask() in GraniteMoeHybrid models contains control flow that cannot be traced by torch.fx.
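The failure mode can be illustrated without torch at all. During symbolic tracing, torch.fx replaces tensors with Proxy objects that record operations but carry no concrete value, so Python cannot decide an `if` on them. The sketch below is a plain-Python analogue (simplified stand-in classes, not the real torch.fx machinery) of the kind of branching that trips the tracer:

```python
# Illustrative analogue (plain Python, no torch required) of why torch.fx
# symbolic tracing fails on data-dependent control flow: traced tensors
# become Proxy objects that record operations but have no concrete value.

class TraceError(Exception):
    pass

class Proxy:
    """Stand-in for a symbolically traced tensor."""
    def __eq__(self, other):
        return Proxy()  # comparisons yield new proxies, not booleans
    def all(self):
        return Proxy()  # reductions stay symbolic too
    def __bool__(self):
        raise TraceError(
            "symbolically traced variables cannot be used as inputs to control flow"
        )

def update_mamba_mask(attention_mask):
    # Simplified version of the kind of branching in _update_mamba_mask():
    # fine with concrete tensors, fatal under symbolic tracing.
    if attention_mask is not None and (attention_mask == 1).all():
        return None
    return attention_mask

try:
    update_mamba_mask(Proxy())
except TraceError as e:
    print(f"TraceError: {e}")
```

With a concrete mask the branch resolves normally; with a proxy, the `if` forces `bool()` on a symbolic value and raises, which is exactly the error message quoted above.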

Investigation Process:

  1. Searched llmcompressor issue #1603 and PR #1599 for similar fixes
  2. Found DatasetArguments.tracing_ignore list in llmcompressor source
  3. Verified _update_mamba_mask was added in commit 4cfc0e6 (Oct 14, 2025)
  4. Confirmed latest PyPI release (0.8.1, Oct 8, 2025) predates the fix

The Fix (VERIFIED):
Install llmcompressor from git main branch instead of PyPI to get commit 4cfc0e6 which adds "_update_mamba_mask" to the default tracing_ignore list in DatasetArguments.
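The idea behind a tracing-ignore list can be sketched in a few lines. The names and helpers below are illustrative, not llmcompressor's actual implementation: methods on the ignore list are shadowed with opaque pass-through stubs for the duration of "tracing", so their internal control flow never executes on proxy values.

```python
from contextlib import contextmanager

# Mirrors the idea of DatasetArguments.tracing_ignore; everything here is a
# simplified stand-in, not llmcompressor's real machinery.
TRACING_IGNORE = ["_update_mamba_mask"]

class Model:
    def _update_mamba_mask(self, mask):
        # Control flow that would break symbolic tracing on proxy inputs.
        if mask is not None and all(v == 1 for v in mask):
            return None
        return mask

    def forward(self, mask):
        return self._update_mamba_mask(mask)

@contextmanager
def skip_ignored_methods(model, ignore):
    """Temporarily shadow ignored methods with pass-through stubs, the way a
    tracer can treat them as opaque calls instead of tracing into them."""
    stubbed = []
    for name in ignore:
        if hasattr(model, name):
            setattr(model, name, lambda *args, **kwargs: args[0] if args else None)
            stubbed.append(name)
    try:
        yield model
    finally:
        for name in stubbed:
            delattr(model, name)  # restore the original class method

model = Model()
with skip_ignored_methods(model, TRACING_IGNORE):
    # Under "tracing", the mask passes through without triggering control flow.
    print(model.forward("proxy-stand-in"))
print(model.forward([1, 1]))  # original behavior restored after the context
```

The real fix works at the torch.fx level rather than by monkey-patching, but the effect is the same: `_update_mamba_mask` is skipped during sequential tracing instead of being traced through.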

Changes:

  • Added git to system packages (required for pip git+https installs)
  • Changed from: pip install "llmcompressor>=0.8.0"
  • Changed to: pip install git+https://github.com/vllm-project/llm-compressor.git

This ensures the quantization engine skips tracing _update_mamba_mask during AWQ sequential tracing, preventing the TraceError.

Reference: vllm-project/llm-compressor@4cfc0e6
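Put together with the commit pin the review suggests below, the Dockerfile change would look roughly like this (a sketch; the surrounding apt-get line is assumed, not quoted from the PR):

```dockerfile
# Install git so pip can install packages directly from GitHub.
RUN apt-get update && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*

# Install llmcompressor from source to pick up the _update_mamba_mask fix
# (commit 4cfc0e6), which is not yet in the latest PyPI release (0.8.1).
# Pinning to the commit keeps builds reproducible.
RUN pip install "git+https://github.com/vllm-project/llm-compressor.git@4cfc0e6"
```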

Copilot AI review requested due to automatic review settings November 5, 2025 19:23

Copilot AI left a comment


Pull Request Overview

This PR updates the GPU Docker configuration to install llmcompressor from the git main branch instead of a PyPI release to incorporate a critical bug fix for _update_mamba_mask that's not yet available in the published package.

  • Adds git as a system dependency to support installing from GitHub repositories
  • Changes llmcompressor installation from PyPI version constraint to git repository installation
  • Adds documentation explaining why the git installation is necessary


Comment on lines 37 to 38
# Commit 4cfc0e6 (Oct 14, 2025) added _update_mamba_mask to tracing_ignore
# This fix is not yet in the latest PyPI release (0.8.1, Oct 8, 2025)

Copilot AI Nov 5, 2025


The date 'Oct 14, 2025' appears to be a typo. Since the current date is November 2025, this likely refers to October 2024. Please verify and correct the year.

Suggested change
# Commit 4cfc0e6 (Oct 14, 2025) added _update_mamba_mask to tracing_ignore
# This fix is not yet in the latest PyPI release (0.8.1, Oct 8, 2025)
# Commit 4cfc0e6 (Oct 14, 2024) added _update_mamba_mask to tracing_ignore
# This fix is not yet in the latest PyPI release (0.8.1, Oct 8, 2024)

Comment on lines 37 to 38
# Commit 4cfc0e6 (Oct 14, 2025) added _update_mamba_mask to tracing_ignore
# This fix is not yet in the latest PyPI release (0.8.1, Oct 8, 2025)

Copilot AI Nov 5, 2025


The date 'Oct 8, 2025' appears to be a typo. This likely refers to October 2024. Please verify and correct the year.

Suggested change
# Commit 4cfc0e6 (Oct 14, 2025) added _update_mamba_mask to tracing_ignore
# This fix is not yet in the latest PyPI release (0.8.1, Oct 8, 2025)
# Commit 4cfc0e6 (Oct 14, 2024) added _update_mamba_mask to tracing_ignore
# This fix is not yet in the latest PyPI release (0.8.1, Oct 8, 2024)

# Install llmcompressor from git main branch to get the _update_mamba_mask fix
# Commit 4cfc0e6 (Oct 14, 2025) added _update_mamba_mask to tracing_ignore
# This fix is not yet in the latest PyPI release (0.8.1, Oct 8, 2025)
RUN pip install git+https://github.com/vllm-project/llm-compressor.git

Copilot AI Nov 5, 2025


Installing from the main branch without pinning to a specific commit or tag can lead to unpredictable builds and potential breaking changes. Consider pinning to the specific commit (4cfc0e6) mentioned in the comment: RUN pip install git+https://github.com/vllm-project/llm-compressor.git@4cfc0e6

Suggested change
RUN pip install git+https://github.com/vllm-project/llm-compressor.git
RUN pip install git+https://github.com/vllm-project/llm-compressor.git@4cfc0e6

@j4ys0n
Contributor Author

j4ys0n commented Nov 5, 2025

Automated review 🤖

Summary of Changes
This PR resolves a quantization failure in GraniteMoeHybrid models by upgrading the llmcompressor dependency to a version that includes a fix for tracing control flow in _update_mamba_mask. The change updates the Dockerfile to install llmcompressor directly from the git main branch, which contains the necessary patch.

Key Changes & Positives

  • 🟢 Added git to system packages to support pip install git+https://...
  • 🟢 Updated llmcompressor installation to use the git main branch to include the fix for _update_mamba_mask tracing
  • 🟢 Includes inline comments explaining the root cause and fix, improving maintainability

Potential Issues & Recommendations

  1. Issue / Risk: Installing llmcompressor from git+https://github.com/vllm-project/llm-compressor.git may introduce instability or breakage if the main branch is not stable.
    Impact: Could lead to unexpected behavior or build failures in production environments.
    Recommendation: Pin to a specific commit hash instead of using main for better reproducibility and stability.
    Status: 🟡 Needs review

  2. Issue / Risk: The fix relies on a future commit that has not yet been released to PyPI.
    Impact: May cause issues if the commit is reverted or modified in the future.
    Recommendation: Add a comment or documentation to track when this can be reverted to a PyPI release.
    Status: 🟡 Needs review

Language/Framework Checks

  • Python: ✅ No issues found in Python usage or dependency handling.
  • Docker: ✅ Correctly adds git to system packages and uses pip install git+https for dependency.

Security & Privacy

  • No security or privacy concerns introduced.

Build/CI & Ops

  • No build or CI changes included.
  • The Dockerfile change is sufficient for the fix.

Tests

  • No tests added or modified.
  • Recommend adding a test case to verify quantization of GraniteMoeHybrid models works after this change.

Approval Recommendation
Approve with caveats

  • Pin llmcompressor to a specific commit hash for stability
  • Add documentation or tracking for when this can be reverted to a PyPI release
  • Consider adding a test case for quantization of GraniteMoeHybrid models

@j4ys0n j4ys0n force-pushed the claude/debug-quantize-job-failure-011CUpArWK9yCugt4gXXGchh branch from 362cf8c to a39e51e on November 5, 2025 19:28
@j4ys0n
Contributor Author

j4ys0n commented Nov 5, 2025

Automated review 🤖

Summary of Changes
This PR fixes a quantization failure for GraniteMoeHybrid models by upgrading the llmcompressor dependency to a version that includes a fix for tracing control flow in _update_mamba_mask. The change updates the Dockerfile to install llmcompressor from a specific Git commit that resolves a torch.fx.proxy.TraceError.

Key Changes & Positives

  • 🟢 Added git to system packages to support pip install git+https
  • 🟢 Updated llmcompressor installation to use a Git commit (4cfc0e6) that includes the fix for _update_mamba_mask tracing
  • 🟢 Clearly documented the root cause and the fix in the Dockerfile comments

Potential Issues & Recommendations

  1. Issue / Risk: Using a specific Git commit for dependency installation can lead to instability or drift if not pinned properly.
    Impact: The build may break if the commit is removed or modified upstream.
    Recommendation: Consider using a tagged release or a more stable version pin if available.
    Status: 🟡 Needs review

  2. Issue / Risk: The Dockerfile installs git only for this one dependency, which may not be ideal for long-term maintainability.
    Impact: Could lead to confusion or missed dependencies in other parts of the build.
    Recommendation: Evaluate if git should be a permanent system dependency or if the installation method can be refactored.
    Status: 🟡 Needs review

Language/Framework Checks

  • Python: Dependency installation via pip install git+https://... is valid and aligns with the fix.
  • Docker: The Dockerfile is well-structured and follows best practices for layering and cleanup.

Security & Privacy

  • No new secrets or sensitive data introduced.
  • Using a Git commit hash ensures reproducibility and avoids pulling unverified code.

Build/CI & Ops

  • The change ensures that the quantization pipeline works for GraniteMoeHybrid models.
  • No breaking changes to the build process or runtime behavior.

Tests

  • No new tests added; however, the fix is verified through the root cause analysis and the specific commit.
  • Ensure that the quantization job for GraniteMoeHybrid models is tested in CI.

Approval Recommendation
Approve with caveats

  • Confirm that the Git commit hash is stable and will not be removed from the repo
  • Evaluate if git should be a permanent system dependency or if there's a better way to manage this install
  • Ensure that the quantization job for GraniteMoeHybrid models is validated in CI after this change

@j4ys0n j4ys0n merged commit b6ee937 into main Nov 5, 2025
1 check passed
