Skip to content

Conversation

Rocketknight1
Copy link
Member

@Rocketknight1 Rocketknight1 commented Mar 25, 2025

This PR replaces three more problematic regexes. In all cases I eliminate the regex entirely.

When parsing Python files for import statements we now rely on ast.parse() instead. This is the most complex code but it seems to pass tests.

In chat.py I think I figured out what the regex was doing and wrote a Python method to duplicate its functionality - cc @gante to confirm I got it right!

Finally, the change in configuration_utils.py was very straightforward; we could just use str.startswith() and str.endswith() instead.

@Rocketknight1 Rocketknight1 marked this pull request as ready for review March 25, 2025 15:07
@github-actions github-actions bot requested a review from ArthurZucker March 25, 2025 15:07
@Rocketknight1 Rocketknight1 removed the request for review from ArthurZucker March 25, 2025 15:07
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with config changes!
imports should be tested in terms of speed. Ast is a bit slow as I remember it but for something lile this we can cache / optimize!
cc @LysandreJik

@Rocketknight1
Copy link
Member Author

ast.parse() seems quite performant for me! I parsed modeling_phi4.py (2.3k LOC) in ~10ms.

@gante
Copy link
Member

gante commented Mar 26, 2025

@Rocketknight1 yeah, the chat changes look good to me

@Rocketknight1 Rocketknight1 force-pushed the more-redos-purging branch 2 times, most recently from f52247a to a6d5eba Compare March 27, 2025 13:30
@Rocketknight1
Copy link
Member Author

One more performance note: Since most Transformers code is contained within classes, we can speed up walking the ast tree a lot if we just don't recurse into classes. This will miss imports inside classes, but I don't know if we count those as required imports or not, and probably we don't want to?

However, I don't think this is really that important - parsing and walking the tree takes << 1s for me. I'm not even sure it's slower than the regex solution!

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM then! Parsing a single file is fast!

@Rocketknight1 Rocketknight1 force-pushed the more-redos-purging branch 4 times, most recently from 3378c82 to ce86970 Compare April 2, 2025 14:19
@Rocketknight1 Rocketknight1 merged commit 126abe3 into main Apr 2, 2025
21 checks passed
@Rocketknight1 Rocketknight1 deleted the more-redos-purging branch April 2, 2025 17:46
@Rocketknight1
Copy link
Member Author

cc @Michellehbn merged! Should be in the next release and then we can close out those issues

duanjunwen pushed a commit to duanjunwen/transformers that referenced this pull request Apr 3, 2025
* More ReDOS fixes!

* Slight regex cleanup

* Cleanup regex replacement

* Drop that regex entirely too

* The regex didn't match config.json, let's make sure we don't either

* Cleanup allowed_value_chars a little

* Cleanup the import search

* Catch multi-condition blocks too

* Trigger tests

* Trigger tests
@Spico197
Copy link

Spico197 commented Apr 3, 2025

@Rocketknight1

Hi Matt, I find the ast-style parsing would introduce an error if a function is ast.Attribute rather than a ast.Name. Here's a minimal reproduction:

import torch.nn as nn


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 10)

    def forward(self, x):
        if x.size() == 1:   # error
            raise ValueError("x.size() == 1")
        return self.linear(x)

When I try to run get_imports("debug_model.py"), it would lead to this error. Here I changed the code a little bit to help debugging. Maybe we should add an additional if-else judgment to check if a func is ast.Name ?

image

@Rocketknight1
Copy link
Member Author

Hi @Spico197, yes, we noticed that bug too! My fault - I fixed it here: #37245

@Spico197
Copy link

Spico197 commented Apr 3, 2025

Hi @Spico197, yes, we noticed that bug too! My fault - I fixed it here: #37245

Thanks for the prompt reply! Really impressive fast feedback~

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* More ReDOS fixes!

* Slight regex cleanup

* Cleanup regex replacement

* Drop that regex entirely too

* The regex didn't match config.json, let's make sure we don't either

* Cleanup allowed_value_chars a little

* Cleanup the import search

* Catch multi-condition blocks too

* Trigger tests

* Trigger tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants