Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow per item validators to be specified by string #419

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

rvandam
Copy link

@rvandam rvandam commented Sep 13, 2023

We ran into a problem where another extension required our settings to be json serializable which was barfing on the class names used for specifying json schemas. I'm separately working on a fix for that extension but since under the hood the classes are just getting converted to obj.__name__ anyway I wondered why I couldn't just specify them that way to begin with.

@rvandam rvandam changed the title Master Allow per item validators to be specified by string Sep 13, 2023
@rvandam
Copy link
Author

rvandam commented Sep 14, 2023

This latest version should pass all tests, including the new one, at least on 3.9-3.11 (I don't have 3.8 installed to test but it should be fine).

@codecov
Copy link

codecov bot commented Sep 15, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (6edb037) 79.39% compared to head (a75e6fb) 79.39%.

❗ Current head a75e6fb differs from pull request most recent head 5acd876. Consider uploading reports for the commit 5acd876 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #419   +/-   ##
=======================================
  Coverage   79.39%   79.39%           
=======================================
  Files          76       76           
  Lines        3222     3222           
  Branches      534      534           
=======================================
  Hits         2558     2558           
  Misses        593      593           
  Partials       71       71           
Files Changed Coverage Δ
spidermon/contrib/scrapy/pipelines.py 97.80% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

docs/source/item-validation.rst Outdated Show resolved Hide resolved
Comment on lines +149 to +150
'DummyItem': '/path/to/dummyitem_schema.json',
'OtherItem': '/path/to/otheritem_schema.json',
Copy link
Member

@Gallaecio Gallaecio Sep 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm… I don’t like that this is done by name instead of import path. Would you be OK with switching approaches? It is how Scrapy settings handle such situations (in fact, Scrapy settings first supported import paths as strings, and later also added support for actual objects).

@@ -56,7 +56,7 @@ def set_validators(loader, schema):
if type(schema) in (list, tuple):
schema = {Item: schema}
for obj, paths in schema.items():
key = obj.__name__
key = obj.__name__ if hasattr(obj, "__name__") else str(obj)
paths = paths if type(paths) in (list, tuple) else [paths]
objects = [loader(v) for v in paths]
validators[key].extend(objects)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm…

So, I am not very familiar with Spidermon code, but it feels wrong that the pre-existing code was only taking into account obj.__name__. Sounds like different item types with the same name but imported from different modules would be validated with the same schema.

I find the pre-existing code a bit hard to read, though, so I might be misreading.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your read is correct. The find_validators method at line 126 only uses item.__class__.__name__ to select a validator out of the dict so same name different module still gets the same schema. This is why I just went with simple class names in the string alternative because they were already being stored that way internally so the change was small.

I like the idea of using fully qualified import paths as keys but it would require some careful thought to be backwards compatible (unless we want a breaking change?). Internally we'd have to store validators using the import path but then during lookup if we fail to find a given item class we could loop over keys and check just the class name part.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it might be cleaner to just store two keys internally, fully qualified and class name and then look them up in that order. No more looping over strings and splitting them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came back to look at this again finally and this is a bigger lift than I have time to work on right now, especially being less familiar with this code myself. Besides requiring changing the existing internal representation and all the related tests, it would likely be considered a breaking change given that someone could be inadvertently relying on the (admittedly bad) behavior of mapping only by the unqualified class name and would need some careful documentation.

Can I request that you either accept this PR as is since it's consistent with the existing code or reject it and turn @Gallaecio's request into it's own issue?

Copy link
Member

@Gallaecio Gallaecio Nov 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about we keep the current behavior, but we require strings to be complete import parts, we load them with load_object, and then use their name?

obj = load_object(obj)
key = obj.__name__

It’s backward-compatible, and while not fixing the “comparison by name” issue now, it does allow us to fix it in the future if we want without requiring user code to change, which would not be possible if we started allowing class names as strings now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like it would work well as a compromise keeping existing behavior intact but solving for the problem I was originally dealing with.

Co-authored-by: Adrián Chaves <adrian@chaves.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants