-
Notifications
You must be signed in to change notification settings - Fork 27.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reorganize file utils #16264
Reorganize file utils #16264
Conversation
from ..models.auto import get_values | ||
from ..utils.versions import importlib_metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was imported from the wrong place.
|
||
>>> tokenizer = {processor_class}.from_pretrained("{checkpoint}") | ||
>>> model = {model_class}.from_pretrained("{checkpoint}") | ||
# Backward compatibility imports, to make sure all those objects can be found in file_utils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From here to line 122 is what you should read to review this PR, no code has changed and this summarizes all the moves done.
The documentation is not available anymore as the PR was closed or merged. |
src/transformers/__init__.py
Outdated
@@ -91,6 +91,7 @@ | |||
"debug_utils": [], | |||
"dependency_versions_check": [], | |||
"dependency_versions_table": [], | |||
"doc_utils": [], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additions here are for the repo consistency checks to pass. When we settle on names and objects to be moved, each submodule will contain the objects of the public init it defines.
Related question: we have many files named |
src/transformers/file_utils.py
Outdated
>>> next_token_logits = outputs.logits[:, -1] | ||
``` | ||
""" | ||
MULTIPLE_CHOICE_DUMMY_INPUTS = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are those constants imported a lot? They are only used for TF models no? Should we maybe move them to modeling_tf_utils.py
?
Thanks a lot for working on this! Very much needed change :-) Regarding the two points above:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That file really needed to be split, indeed. Agree with both points as well (keeping it named file_utils.py
and moving everything in the utils/
directory).
src/transformers/doc_utils.py
Outdated
```python | ||
>>> from transformers import {processor_class}, {model_class} | ||
>>> import torch | ||
|
||
>>> tokenizer = {processor_class}.from_pretrained("{checkpoint}") | ||
>>> model = {model_class}.from_pretrained("{checkpoint}") | ||
|
||
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt") | ||
>>> labels = torch.tensor([1] * inputs["input_ids"].size(1)).unsqueeze(0) # Batch size 1 | ||
|
||
>>> outputs = model(**inputs, labels=labels) | ||
>>> loss = outputs.loss | ||
>>> logits = outputs.logits | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe at some point we could provide a better example than this 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for working on this! I'm in favour of creating a utils subfolder.
As per your comments, I've put everything in a
If you agree, I'll clean up all internal imports then merge @patil-suraj @patrickvonplaten @LysandreJik |
@gante This is more or less legacy from where the repo was flat and had no subfolders. Some could probably go in the utils subfolder as well, though I'm keeping this PR focused on |
df51abf
to
b366d5f
Compare
* Split file_utils in several submodules * Fixes * Add back more objects * More fixes * Who exactly decided to import that from there? * Second suggestion to code with code review * Revert wront move * Fix imports * Adapt all imports * Adapt all imports everywhere * Revert this import, will fix in a separate commit
What does this PR do?
This PR reorganizes the gigantic
file_utils
in several submodules, namely:doc_utils
contain all the decorators we use for our docstrings as well as the generic examplesfile_utils
keep all Hub-related functionality (it might warrant a rename tohub_utils
see below)generic_utils
contain the "real" utils like custom Enums, to_py_obj etc.import_utils
contains all theis_xxx_available
as well as the modules that power our lazy initsTo avoid any breaking change, this is all done while reimporting the objects moved into
file_utils
. The only thing that will break is if a user imported a private object from there that has moved, but I think that's okay. The fact all tests pass while no import (except the private objects) was changed (yet) proves it.The easiest way to review this PR is to just look at the
file_utils
module and look after my comment, where you will find a summary of the functions/classes/constants that were moved in the form of imports. The rest of the diff (apart from the main util files) come from private objects imports, and one import that was incorrect.I have kept the PR simple for now but will change all the relative imports in every file to take each object from its real source, but before that I wanted to have your opinion on the splits done, as well as the two following points:
file_utils
to a more appropriatehub_utils
(while keeping afile_utils
for backward compatibility but with just the imports)..utils.doc
,.utils.hub
,.utils.generic
and.utils.import
for instance), still with keeping BC.As soon as we all agree on the move and the names of the new submodules, I will make the changes of import paths in all the files of the lib :-)