-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow per item validators to be specified by string #419
base: master
Are you sure you want to change the base?
Conversation
This latest version should pass all tests, including the new one, at least on 3.9-3.11 (I don't have 3.8 installed to test but it should be fine). |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #419 +/- ##
=======================================
Coverage 79.39% 79.39%
=======================================
Files 76 76
Lines 3222 3222
Branches 534 534
=======================================
Hits 2558 2558
Misses 593 593
Partials 71 71
☔ View full report in Codecov by Sentry. |
'DummyItem': '/path/to/dummyitem_schema.json', | ||
'OtherItem': '/path/to/otheritem_schema.json', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm… I don’t like that this is done by name instead of import path. Would you be OK with switching approaches? It is how Scrapy settings handle such situations (in fact, Scrapy settings first supported import paths as strings, and later also added support for actual objects).
@@ -56,7 +56,7 @@ def set_validators(loader, schema): | |||
if type(schema) in (list, tuple): | |||
schema = {Item: schema} | |||
for obj, paths in schema.items(): | |||
key = obj.__name__ | |||
key = obj.__name__ if hasattr(obj, "__name__") else str(obj) | |||
paths = paths if type(paths) in (list, tuple) else [paths] | |||
objects = [loader(v) for v in paths] | |||
validators[key].extend(objects) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm…
So, I am not very familiar with Spidermon code, but it feels wrong that the pre-existing code was only taking into account obj.__name__
. Sounds like different item types with the same name but imported from different modules would be validated with the same schema.
I find the pre-existing code a bit hard to read, though, so I might be misreading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your read is correct. The find_validators
method at line 126 only uses item.__class__.__name__
to select a validator out of the dict so same name different module still gets the same schema. This is why I just went with simple class names in the string alternative because they were already being stored that way internally so the change was small.
I like the idea of using fully qualified import paths as keys but it would require some careful thought to be backwards compatible (unless we want a breaking change?). Internally we'd have to store validators using the import path but then during lookup if we fail to find a given item class we could loop over keys and check just the class name part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it might be cleaner to just store two keys internally, fully qualified and class name and then look them up in that order. No more looping over strings and splitting them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I came back to look at this again finally and this is a bigger lift than I have time to work on right now, especially being less familiar with this code myself. Besides requiring changing the existing internal representation and all the related tests, it would likely be considered a breaking change given that someone could be inadvertently relying on the (admittedly bad) behavior of mapping only by the unqualified class name and would need some careful documentation.
Can I request that you either accept this PR as is since it's consistent with the existing code or reject it and turn @Gallaecio's request into it's own issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about we keep the current behavior, but we require strings to be complete import parts, we load them with load_object
, and then use their name?
obj = load_object(obj)
key = obj.__name__
It’s backward-compatible, and while not fixing the “comparison by name” issue now, it does allow us to fix it in the future if we want without requiring user code to change, which would not be possible if we started allowing class names as strings now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like it would work well as a compromise keeping existing behavior intact but solving for the problem I was originally dealing with.
Co-authored-by: Adrián Chaves <adrian@chaves.io>
We ran into a problem where another extension required our settings to be json serializable which was barfing on the class names used for specifying json schemas. I'm separately working on a fix for that extension but since under the hood the classes are just getting converted to
obj.__name__
anyway I wondered why I couldn't just specify them that way to begin with.