
Fix quantization failure for GraniteMoeHybrid models by upgrading llmcompressor #15

Merged
j4ys0n merged 1 commit into main from
claude/debug-quantize-job-failure-011CUpArWK9yCugt4gXXGchh
Nov 5, 2025
Conversation

@j4ys0n
Contributor

@j4ys0n j4ys0n commented Nov 5, 2025

Fix quantization failure for GraniteMoeHybrid models by upgrading llmcompressor

Root Cause (VERIFIED):
The error "torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow" occurs because _update_mamba_mask() in GraniteMoeHybrid models contains control flow that cannot be traced by torch.fx.
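The failure mode can be illustrated without torch at all. During symbolic tracing, torch.fx replaces tensors with Proxy objects that record operations but carry no concrete value, so Python cannot decide an `if` on them. The sketch below is a plain-Python analogue (simplified stand-in classes, not the real torch.fx machinery) of the kind of branching that trips the tracer:

```python
# Illustrative analogue (plain Python, no torch required) of why torch.fx
# symbolic tracing fails on data-dependent control flow: traced tensors
# become Proxy objects that record operations but have no concrete value.

class TraceError(Exception):
    pass

class Proxy:
    """Stand-in for a symbolically traced tensor."""
    def __eq__(self, other):
        return Proxy()  # comparisons yield new proxies, not booleans
    def all(self):
        return Proxy()  # reductions stay symbolic too
    def __bool__(self):
        raise TraceError(
            "symbolically traced variables cannot be used as inputs to control flow"
        )

def update_mamba_mask(attention_mask):
    # Simplified version of the kind of branching in _update_mamba_mask():
    # fine with concrete tensors, fatal under symbolic tracing.
    if attention_mask is not None and (attention_mask == 1).all():
        return None
    return attention_mask

try:
    update_mamba_mask(Proxy())
except TraceError as e:
    print(f"TraceError: {e}")
```

With a concrete mask the branch resolves normally; with a proxy, the `if` forces `bool()` on a symbolic value and raises, which is exactly the error message quoted above.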

Investigation Process:

  1. Searched llmcompressor issue #1603 and PR #1599 for similar fixes
  2. Found DatasetArguments.tracing_ignore list in llmcompressor source
  3. Verified _update_mamba_mask was added in commit 4cfc0e6 (Oct 14, 2025)
  4. Confirmed latest PyPI release (0.8.1, Oct 8, 2025) predates the fix

The Fix (VERIFIED):
Install llmcompressor from git main branch instead of PyPI to get commit 4cfc0e6 which adds "_update_mamba_mask" to the default tracing_ignore list in DatasetArguments.
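The idea behind a tracing-ignore list can be sketched in a few lines. The names and helpers below are illustrative, not llmcompressor's actual implementation: methods on the ignore list are shadowed with opaque pass-through stubs for the duration of "tracing", so their internal control flow never executes on proxy values.

```python
from contextlib import contextmanager

# Mirrors the idea of DatasetArguments.tracing_ignore; everything here is a
# simplified stand-in, not llmcompressor's real machinery.
TRACING_IGNORE = ["_update_mamba_mask"]

class Model:
    def _update_mamba_mask(self, mask):
        # Control flow that would break symbolic tracing on proxy inputs.
        if mask is not None and all(v == 1 for v in mask):
            return None
        return mask

    def forward(self, mask):
        return self._update_mamba_mask(mask)

@contextmanager
def skip_ignored_methods(model, ignore):
    """Temporarily shadow ignored methods with pass-through stubs, the way a
    tracer can treat them as opaque calls instead of tracing into them."""
    stubbed = []
    for name in ignore:
        if hasattr(model, name):
            setattr(model, name, lambda *args, **kwargs: args[0] if args else None)
            stubbed.append(name)
    try:
        yield model
    finally:
        for name in stubbed:
            delattr(model, name)  # restore the original class method

model = Model()
with skip_ignored_methods(model, TRACING_IGNORE):
    # Under "tracing", the mask passes through without triggering control flow.
    print(model.forward("proxy-stand-in"))
print(model.forward([1, 1]))  # original behavior restored after the context
```

The real fix works at the torch.fx level rather than by monkey-patching, but the effect is the same: `_update_mamba_mask` is skipped during sequential tracing instead of being traced through.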

Changes:

  • Added git to system packages (required for pip git+https installs)
  • Changed from: pip install "llmcompressor>=0.8.0"
  • Changed to: pip install git+https://github.com/vllm-project/llm-compressor.git

This ensures the quantization engine skips tracing _update_mamba_mask during AWQ sequential tracing, preventing the TraceError.

Reference: vllm-project/llm-compressor@4cfc0e6
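Put together with the commit pin the review suggests below, the Dockerfile change would look roughly like this (a sketch; the surrounding apt-get line is assumed, not quoted from the PR):

```dockerfile
# Install git so pip can install packages directly from GitHub.
RUN apt-get update && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*

# Install llmcompressor from source to pick up the _update_mamba_mask fix
# (commit 4cfc0e6), which is not yet in the latest PyPI release (0.8.1).
# Pinning to the commit keeps builds reproducible.
RUN pip install "git+https://github.com/vllm-project/llm-compressor.git@4cfc0e6"
```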

Copilot AI review requested due to automatic review settings November 5, 2025 19:23

Copilot AI left a comment


Pull Request Overview

This PR updates the GPU Docker configuration to install llmcompressor from the git main branch instead of a PyPI release to incorporate a critical bug fix for _update_mamba_mask that's not yet available in the published package.

  • Adds git as a system dependency to support installing from GitHub repositories
  • Changes llmcompressor installation from PyPI version constraint to git repository installation
  • Adds documentation explaining why the git installation is necessary


Comment on lines 37 to 38
# Commit 4cfc0e6 (Oct 14, 2025) added _update_mamba_mask to tracing_ignore
# This fix is not yet in the latest PyPI release (0.8.1, Oct 8, 2025)

Copilot AI Nov 5, 2025


The date 'Oct 14, 2025' appears to be a typo. Since the current date is November 2025, this likely refers to October 2024. Please verify and correct the year.

Suggested change
# Commit 4cfc0e6 (Oct 14, 2025) added _update_mamba_mask to tracing_ignore
# This fix is not yet in the latest PyPI release (0.8.1, Oct 8, 2025)
# Commit 4cfc0e6 (Oct 14, 2024) added _update_mamba_mask to tracing_ignore
# This fix is not yet in the latest PyPI release (0.8.1, Oct 8, 2024)

Comment on lines 37 to 38
# Commit 4cfc0e6 (Oct 14, 2025) added _update_mamba_mask to tracing_ignore
# This fix is not yet in the latest PyPI release (0.8.1, Oct 8, 2025)

Copilot AI Nov 5, 2025


The date 'Oct 8, 2025' appears to be a typo. This likely refers to October 2024. Please verify and correct the year.

Suggested change
# Commit 4cfc0e6 (Oct 14, 2025) added _update_mamba_mask to tracing_ignore
# This fix is not yet in the latest PyPI release (0.8.1, Oct 8, 2025)
# Commit 4cfc0e6 (Oct 14, 2024) added _update_mamba_mask to tracing_ignore
# This fix is not yet in the latest PyPI release (0.8.1, Oct 8, 2024)

# Install llmcompressor from git main branch to get the _update_mamba_mask fix
# Commit 4cfc0e6 (Oct 14, 2025) added _update_mamba_mask to tracing_ignore
# This fix is not yet in the latest PyPI release (0.8.1, Oct 8, 2025)
RUN pip install git+https://github.com/vllm-project/llm-compressor.git

Copilot AI Nov 5, 2025


Installing from the main branch without pinning to a specific commit or tag can lead to unpredictable builds and potential breaking changes. Consider pinning to the specific commit (4cfc0e6) mentioned in the comment: RUN pip install git+https://github.com/vllm-project/llm-compressor.git@4cfc0e6

Suggested change
RUN pip install git+https://github.com/vllm-project/llm-compressor.git
RUN pip install git+https://github.com/vllm-project/llm-compressor.git@4cfc0e6

@j4ys0n
Contributor Author

j4ys0n commented Nov 5, 2025

Automated review 🤖

Summary of Changes
This PR resolves a quantization failure in GraniteMoeHybrid models by upgrading the llmcompressor dependency to a version that includes a fix for tracing control flow in _update_mamba_mask. The change updates the Dockerfile to install llmcompressor directly from the git main branch, which contains the necessary patch.

Key Changes & Positives

  • 🟢 Added git to system packages to support pip install git+https://...
  • 🟢 Updated llmcompressor installation to use the git main branch to include the fix for _update_mamba_mask tracing
  • 🟢 Includes inline comments explaining the root cause and fix, improving maintainability

Potential Issues & Recommendations

  1. Issue / Risk: Installing llmcompressor from git+https://github.com/vllm-project/llm-compressor.git may introduce instability or breakage if the main branch is not stable.
    Impact: Could lead to unexpected behavior or build failures in production environments.
    Recommendation: Pin to a specific commit hash instead of using main for better reproducibility and stability.
    Status: 🟡 Needs review

  2. Issue / Risk: The fix relies on a future commit that has not yet been released to PyPI.
    Impact: May cause issues if the commit is reverted or modified in the future.
    Recommendation: Add a comment or documentation to track when this can be reverted to a PyPI release.
    Status: 🟡 Needs review

Language/Framework Checks

  • Python: ✅ No issues found in Python usage or dependency handling.
  • Docker: ✅ Correctly adds git to system packages and uses pip install git+https for dependency.

Security & Privacy

  • No security or privacy concerns introduced.

Build/CI & Ops

  • No build or CI changes included.
  • The Dockerfile change is sufficient for the fix.

Tests

  • No tests added or modified.
  • Recommend adding a test case to verify quantization of GraniteMoeHybrid models works after this change.

Approval Recommendation
Approve with caveats

  • Pin llmcompressor to a specific commit hash for stability
  • Add documentation or tracking for when this can be reverted to a PyPI release
  • Consider adding a test case for quantization of GraniteMoeHybrid models

@j4ys0n j4ys0n force-pushed the claude/debug-quantize-job-failure-011CUpArWK9yCugt4gXXGchh branch from 362cf8c to a39e51e on November 5, 2025 19:28
@j4ys0n
Contributor Author

j4ys0n commented Nov 5, 2025

Automated review 🤖

Summary of Changes
This PR fixes a quantization failure for GraniteMoeHybrid models by upgrading the llmcompressor dependency to a version that includes a fix for tracing control flow in _update_mamba_mask. The change updates the Dockerfile to install llmcompressor from a specific Git commit that resolves a torch.fx.proxy.TraceError.

Key Changes & Positives

  • 🟢 Added git to system packages to support pip install git+https
  • 🟢 Updated llmcompressor installation to use a Git commit (4cfc0e6) that includes the fix for _update_mamba_mask tracing
  • 🟢 Clearly documented the root cause and the fix in the Dockerfile comments

Potential Issues & Recommendations

  1. Issue / Risk: Using a specific Git commit for dependency installation can lead to instability or drift if not pinned properly.
    Impact: The build may break if the commit is removed or modified upstream.
    Recommendation: Consider using a tagged release or a more stable version pin if available.
    Status: 🟡 Needs review

  2. Issue / Risk: The Dockerfile installs git only for this one dependency, which may not be ideal for long-term maintainability.
    Impact: Could lead to confusion or missed dependencies in other parts of the build.
    Recommendation: Evaluate if git should be a permanent system dependency or if the installation method can be refactored.
    Status: 🟡 Needs review

Language/Framework Checks

  • Python: Dependency installation via pip install git+https://... is valid and aligns with the fix.
  • Docker: The Dockerfile is well-structured and follows best practices for layering and cleanup.

Security & Privacy

  • No new secrets or sensitive data introduced.
  • Using a Git commit hash ensures reproducibility and avoids pulling unverified code.

Build/CI & Ops

  • The change ensures that the quantization pipeline works for GraniteMoeHybrid models.
  • No breaking changes to the build process or runtime behavior.

Tests

  • No new tests added; however, the fix is verified through the root cause analysis and the specific commit.
  • Ensure that the quantization job for GraniteMoeHybrid models is tested in CI.

Approval Recommendation
Approve with caveats

  • Confirm that the Git commit hash is stable and will not be removed from the repo
  • Evaluate if git should be a permanent system dependency or if there's a better way to manage this install
  • Ensure that the quantization job for GraniteMoeHybrid models is validated in CI after this change

@j4ys0n j4ys0n merged commit b6ee937 into main Nov 5, 2025
1 check passed
