- 
                Notifications
    You must be signed in to change notification settings 
- Fork 31k
Validate processing kwargs with @strict from huggingface_hub #40793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate processing kwargs with @strict from huggingface_hub #40793
Conversation
| The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. | 
| @gante I think you’re the best person to review this. One issue I ran into is with fields that rely on safe-imports (e.g. torch or PIL objects). In CI they're failing rn with errors like  I suggest we skip the auto-validation when the type hint resolves to a  | 
| Two global comments from a brief review of the PR: 
 Other than that, data validation 🫶 | 
| 
 I wanted to do that as my first option though that would mean we will have to delete typed dicts and we will lose beautiful type hinting in  
 Thanks, I will take a look. TBH these can be evaluated with Python but that requires us to have the package in the namespace and already imported. Sometimes the packages are imported in another file from which we import to another and so on.... Typical large codebase. So this caused issues because python couldn't resolve the package name. I will definitely try your way as well | 
| 
 That's a good point, and important for DevX. In that case, I'd suggest documenting this design decision in  Ping me again when you're happy with a solution for the forward references, so I can nitpick a version closer to its merging state :D | 
| @gante Updated the docs. For the second point on safe type checks: Using  For ex, if we import  I couldn't find a way to trigger python to resolve annotations using imported modules's namespaces as well, have you encountered anything similar prev? | 
| @zucchini-nlp I haven't 🤔 (Then maybe the solution for the future is to add forward reference resolution to the hub code) | 
| Yep, that is what I also wanted. Though it means we need to first wait for the next hub release and pin the version in out deps. Let me submit a PR on hub and see | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, added a few nits 🤗
I like that adding the validators actually resulted in catching many issues with type hints and test values 🫶
Missing: minimal tests for TypedDictAdapter (e.g. that it raises an exception when it should)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, meant to approve!
| Hey @zucchini-nlp @gante, I have now opened huggingface/huggingface_hub#3408 to validate TypedDict schemas directly from  from typing import Annotated, TypedDict
from huggingface_hub.dataclasses import validate_typed_dict
def positive_int(value: int):
    if not value >= 0:
        raise ValueError(f"Value must be positive, got {value}")
class User(TypedDict):
    name: str
    age: Annotated[int, positive_int]
# Valid data
validate_typed_dict(User, {"name": "John", "age": 30})Could you have a look at this PR huggingface/huggingface_hub#3408 and let me know what you think? Would be even very nice if you could install it locally and adapt this branch to check that is correctly handles validation. Note: I suppose at some point you could even have a mixin class that automatically validates the  | 
| @Wauplin hi, will there be a  | 
| Hey @zucchini-nlp , sorry for the delay. The  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Massive work @zucchini-nlp !! Left a couple comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good from an huggingface_hub integration perspective but I left a comment regarding optional values. Let me know what you think :)
| [For maintainers] Suggested jobs to run (before merge) run-slow: aria, beit, bridgetower, cohere2_vision, conditional_detr, convnext, deepseek_vl, deepseek_vl_hybrid, deformable_detr, detr, dia, donut, dpt, efficientloftr, efficientnet, emu3 | 
| Yay! Glad to see  | 
…face#40793) * initial design draft * delete * fix a few tests * fix * fix the rest of tests * common-kwargs * why the runner complains about typing with "|"? * revert * forgot to delete * update * fix last issues * add more detalis in docs * pin the latest hub release * fix tests for new models * also fast image processor * fix copies * image processing ast validated * fix more tests * typo.and fix copies * bump * style * fix some tests * fix copies * pin rc4 and mark all TypedDict as non-total * delete typed dict adaptor * address comments * delete optionals
…face#40793) * initial design draft * delete * fix a few tests * fix * fix the rest of tests * common-kwargs * why the runner complains about typing with "|"? * revert * forgot to delete * update * fix last issues * add more detalis in docs * pin the latest hub release * fix tests for new models * also fast image processor * fix copies * image processing ast validated * fix more tests * typo.and fix copies * bump * style * fix some tests * fix copies * pin rc4 and mark all TypedDict as non-total * delete typed dict adaptor * address comments * delete optionals
What does this PR do?
Draft PR which will allow us to have a strict type validation on all processing kwargs without having to add a dataclass object for each. The idea is to keep
TypedDictfor hinting and dynamically adapt aTypedDictto be compatible withhuggingface_hub.strictvalidatorsThis will allow us to get rid of some validations we already have in vision processing and enforce a better validation on all kwargs
For reviewers: I recommend to start from
/utils/type_validators.pyandprocessing_utils.py. The model files just fix incorrect and incomplete type hints we had