Skip to content

Conversation

@bolkedebruin
Copy link
Contributor

Regexps can be tough to get right. Typically someone would like to allow any classes below 'mymodule' to match. For example, 'mymodule.dataclasses' by setting allowed_deserialization_classes to 'mymodule.*'. However this matches everything starting with mymodule, so also mymodulemalicious. This change replaces bare '.' with '..' so it matches the literal '.' as well.

@kaxil


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

Regexps can be tough to get right. Typically someone would like
to allow any classes below 'mymodule' to match. For example,
'mymodule.dataclasses' by setting allowed_deserialization_classes
to 'mymodule.*'. However this matches everything starting with
mymodule, so also mymodulemalicious. This change replaces
bare '.' with '\..' so it matches the literal '.' as well.
Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@bolkedebruin bolkedebruin merged commit 12adb5e into apache:main Jan 10, 2023
@bolkedebruin bolkedebruin deleted the intuitive branch January 10, 2023 20:17
@kaxil kaxil added this to the Airflow 2.5.2 milestone Jan 15, 2023
@pierrejeambrun pierrejeambrun added the type:misc/internal Changelog: Misc changes that should appear in change log label Feb 27, 2023
@pierrejeambrun
Copy link
Member

Depends on #28829

@barrysteyn
Copy link

barrysteyn commented Sep 4, 2023

@kaxil @bolkedebruin

This change has totally broken things... for example, I have a class that needs to be on the allowed_deserialization_classes list... the error message I get is libs.dynamodb.models.ideas.IdeasModel is not on the allowed list.. I do understand that this is regex that is being matched against, however altering the regex is not (in my opinion) good. Firstly, people who are doing this are technical and should know that a . matches anything.

But things seem broken... For example, I have tested things by putting the following in theallowed_deserialization_classes (remember I am testing libs.dynamodb.models.ideas.IdeasModel for a match):

  1. libs.dynamodb.models.ideas.IdeasModel --> False (not expected)
  2. libs[.]dynamodb[.]models[.]ideas[.]IdeasModel --> True (weird)
  3. libs.dynamodb.* --> False
  4. libs[.]dynamodb.* --> True
  5. libs.* --> True (this mimic'd the unit test of this change).

It is obvious that the replace . with .. is the culprit. Instead, I suggest we undo the changes, and rather document in detail on how to use the regex.

@barrysteyn
Copy link

So, this is the regex replace for this PR:
p = re.sub(r"(\w)\.", r"\1\..", p)

I am no expert, but it seems to me that it replaces word. with word\.. - I think the idea is to match ., however all that does is ensure that there must be a . followed by any character... That is why things are failing...

I don't advocate for changing anything, but if you really want to match just ., then the regex should be p = re.sub(r"(\w)\.", r"\1[.]", p)

@barrysteyn
Copy link

At the very least, if I am using things incorrectly, then add more tests to explain how it should work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:serialization type:misc/internal Changelog: Misc changes that should appear in change log

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants