Conversation

@kylesayrs
Collaborator

@kylesayrs kylesayrs commented Jun 26, 2025

Purpose

  • Fix tracing for model definitions introduced as part of transformers==4.53 (resolves #1603)

Background

In the latest transformers release, huggingface/transformers#37866 renamed the function that generates the causal mask.

Changes

  • Extend the list of function names to ignore during tracing, specifically targeting functions that create causal masks
  • Update the debugger tool to use the ignore list from DatasetArguments
  • Update the Tracer so that the masking functions are skipped when autowrapping any functions that were not caught by the autowrapper
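
A minimal sketch of the underlying idea, independent of the actual llm-compressor implementation (the helper below is a stand-in, not the transformers function): torch.fx can be told to leave a mask-building function as a single opaque call in the graph rather than tracing into it.

```python
import torch
import torch.fx

def create_causal_mask(seq_len: int) -> torch.Tensor:
    # Stand-in for a transformers mask helper; tracing into helpers like
    # this is what the ignore list avoids.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Registering the name keeps calls to it as leaf nodes during tracing.
torch.fx.wrap("create_causal_mask")

class TinyAttention(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = create_causal_mask(x.shape[-1])
        return x.masked_fill(~mask, 0.0)

gm = torch.fx.symbolic_trace(TinyAttention())
is_leaf = any(
    node.op == "call_function" and node.target is create_causal_mask
    for node in gm.graph.nodes
)
print(is_leaf)  # True: the helper stayed un-traced
```

The traced module still executes normally; the mask helper simply appears in the graph as one node instead of its internal ops.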

Testing

  • tests/llmcompressor/transformers/tracing/test_models.py now passes with the latest transformers==4.53

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@kylesayrs kylesayrs changed the title from "add create_causal_mask" to "[Tracing] Update ignored functions list" Jun 26, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @kylesayrs, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request fixes a tracing issue with Llama models by expanding the list of function names that are ignored during tracing, particularly those involved in creating causal attention masks. The change keeps tracing compatible with recent updates in the transformers library.

Highlights

  • Tracing Configuration Update: Expanded the tracing_ignore list within the DatasetArguments class in src/llmcompressor/args/dataset_arguments.py. This update adds several function names related to causal mask generation (e.g., create_causal_mask, make_causal_mask, get_causal_mask, mask_function, _prepare_4d_causal_attention_mask, _prepare_fsmt_decoder_inputs, _prepare_4d_causal_attention_mask_with_cache_position) to the list of functions to be ignored during tracing.
  • Llama Model Tracing Fix: The primary purpose of this change is to fix tracing issues for Llama models, specifically addressing a recent change in the huggingface/transformers library where the function name for generating causal masks was updated. By ignoring these new function names, Llama models can now be successfully traced.
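
The shape of such a field can be sketched as follows; this mirrors the function names listed above but is a hypothetical simplification, not the actual src/llmcompressor/args/dataset_arguments.py definition.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DatasetArguments:
    # Function names that the tracer should not trace into.
    tracing_ignore: List[str] = field(
        default_factory=lambda: [
            "create_causal_mask",
            "make_causal_mask",
            "get_causal_mask",
            "mask_function",
            "_prepare_4d_causal_attention_mask",
            "_prepare_fsmt_decoder_inputs",
            "_prepare_4d_causal_attention_mask_with_cache_position",
        ]
    )

args = DatasetArguments()
print(len(args.tracing_ignore))  # 7
```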

@kylesayrs kylesayrs added the ready When a PR is ready for review label Jun 26, 2025
rahul-tuli
rahul-tuli previously approved these changes Jun 26, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request updates the tracing_ignore list in DatasetArguments to include additional functions related to causal masks. This change aims to fix tracing for Llama models by ignoring specific functions that create causal masks. I suggested using a set instead of a list for tracing_ignore to improve performance.
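
A sketch of the reviewer's point, using a hypothetical should_skip helper: membership checks run for every function encountered during tracing, so a set gives O(1) lookups versus O(n) scans of a list.

```python
tracing_ignore = [
    "create_causal_mask",
    "make_causal_mask",
    "get_causal_mask",
    "_prepare_4d_causal_attention_mask",
]
# Build the set once; per-call membership tests are then constant time.
ignore_set = frozenset(tracing_ignore)

def should_skip(function_name: str) -> bool:
    return function_name in ignore_set

print(should_skip("create_causal_mask"), should_skip("forward"))  # True False
```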

@kylesayrs kylesayrs dismissed stale reviews from brian-dellabetta and rahul-tuli via c8d7323 June 26, 2025 18:21
rahul-tuli
rahul-tuli previously approved these changes Jun 27, 2025
@rahul-tuli rahul-tuli self-requested a review June 27, 2025 14:20
@rahul-tuli
Collaborator

The transformers tests still seem to be failing; is that expected? Changes look good otherwise.

@kylesayrs kylesayrs enabled auto-merge (squash) July 1, 2025 16:03
@kylesayrs kylesayrs merged commit 9e8210b into main Jul 1, 2025
10 checks passed
@kylesayrs kylesayrs deleted the kylesayrs/update-tracing_ignore branch July 1, 2025 16:50
dsikka added a commit that referenced this pull request Jul 7, 2025
## Prerequisites ##
* #1599

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
aireilly pushed a commit to aireilly/llm-compressor that referenced this pull request Jul 30, 2025
## Purpose ##
* Fix tracing for model definitions introduced as part of `transformers==4.53`
* Resolves vllm-project#1603

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
aireilly pushed a commit to aireilly/llm-compressor that referenced this pull request Jul 30, 2025
## Prerequisites ##
* vllm-project#1599

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>

Labels

ready When a PR is ready for review


Development

Successfully merging this pull request may close these issues.

I got "torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow" error
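
For context, the quoted error is what torch.fx raises when data-dependent Python control flow meets a traced Proxy value; a minimal reproduction, unrelated to any particular model:

```python
import torch
import torch.fx

class Gate(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Branching on a tensor value forces a Proxy into a bool during
        # symbolic tracing, which torch.fx cannot represent.
        if x.sum() > 0:
            return x
        return -x

caught = ""
try:
    torch.fx.symbolic_trace(Gate())
except torch.fx.proxy.TraceError as err:
    caught = str(err)

print(caught)
```

Mask-building helpers in transformers contain exactly this kind of data-dependent branching, which is why leaving them untraced (as this PR does) avoids the error.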

4 participants