Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganize file utils #16264

Merged
merged 11 commits into from
Mar 23, 2022
Merged

Reorganize file utils #16264

merged 11 commits into from
Mar 23, 2022

Conversation

sgugger
Copy link
Collaborator

@sgugger sgugger commented Mar 18, 2022

What does this PR do?

This PR reorganizes the gigantic file_utils in several submodules, namely:

  • doc_utils contain all the decorators we use for our docstrings as well as the generic examples
  • file_utils keep all Hub-related functionality (it might warrant a rename to hub_utils see below)
  • generic_utils contain the "real" utils like custom Enums, to_py_obj etc.
  • import_utils contains all the is_xxx_available as well as the modules that power our lazy inits

To avoid any breaking change, this is all done while reimporting the objects moved into file_utils. The only thing that will break is if a user imported a private object from there that has moved, but I think that's okay. The fact all tests pass while no import (except the private objects) was changed (yet) proves it.

The easiest way to review this PR is to just look at the file_utils module and look after my comment, where you will find a summary of the functions/classes/constants that were moved in the form of imports. The rest of the diff (apart from the main util files) come from private objects imports, and one import that was incorrect.

I have kept the PR simple for now but will change all the relative imports in every file to take each object from its real source, but before that I wanted to have your opinion on the splits done, as well as the two following points:

  • as said above, we could also rename file_utils to a more appropriate hub_utils (while keeping a file_utils for backward compatibility but with just the imports).
  • also, the modules I have created could also go in the utils subfolder instead of the root subfolder (and be named .utils.doc, .utils.hub, .utils.generic and .utils.import for instance), still with keeping BC.

As soon as we all agree on the move and the names of the new submodules, I will make the changes of import paths in all the files of the lib :-)

from ..models.auto import get_values
from ..utils.versions import importlib_metadata
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was imported from the wrong place.


>>> tokenizer = {processor_class}.from_pretrained("{checkpoint}")
>>> model = {model_class}.from_pretrained("{checkpoint}")
# Backward compatibility imports, to make sure all those objects can be found in file_utils
Copy link
Collaborator Author

@sgugger sgugger Mar 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From here to line 122 is what you should read to review this PR, no code has changed and this summarizes all the moves done.

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Mar 18, 2022

The documentation is not available anymore as the PR was closed or merged.

@@ -91,6 +91,7 @@
"debug_utils": [],
"dependency_versions_check": [],
"dependency_versions_table": [],
"doc_utils": [],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additions here are for the repo consistency checks to pass. When we settle on names and objects to be moved, each submodule will contain the objects of the public init it defines.

@gante
Copy link
Member

gante commented Mar 20, 2022

also, the modules I have created could also go in the utils subfolder instead of the root subfolder (and be named .utils.doc, .utils.hub, .utils.generic and .utils.import for instance), still with keeping BC.

Related question: we have many files named xxx_utils.py outside the utils folder. Is it due to legacy files, or is it intentional?

>>> next_token_logits = outputs.logits[:, -1]
```
"""
MULTIPLE_CHOICE_DUMMY_INPUTS = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those constants imported a lot? They are only used for TF models no? Should we maybe move them to modeling_tf_utils.py?

@patrickvonplaten
Copy link
Contributor

Thanks a lot for working on this! Very much needed change :-)

Regarding the two points above:

  • I'd be in favor of keep calling the file file_utils.py actually as I think it's not only related to the Hub. E.g. the name of the model weights is also how they will be called locally "pytorch_model.bin". So don't think it's worth renaming the file here
  • I would be in favor of moving everything into a utils folder actually

Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That file really needed to be split, indeed. Agree with both points as well (keeping it named file_utils.py and moving everything in the utils/ directory).

Comment on lines 152 to 165
```python
>>> from transformers import {processor_class}, {model_class}
>>> import torch

>>> tokenizer = {processor_class}.from_pretrained("{checkpoint}")
>>> model = {model_class}.from_pretrained("{checkpoint}")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> labels = torch.tensor([1] * inputs["input_ids"].size(1)).unsqueeze(0) # Batch size 1

>>> outputs = model(**inputs, labels=labels)
>>> loss = outputs.loss
>>> logits = outputs.logits
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe at some point we could provide a better example than this 😄

Copy link
Contributor

@patil-suraj patil-suraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for working on this! I'm in favour of creating a utils subfolder.

@sgugger
Copy link
Collaborator Author

sgugger commented Mar 22, 2022

As per your comments, I've put everything in a utils submodule and reimported everything:

  • in file_utils for backward compatibility (again see all tests passing with no change yet)
  • in the utils init: this way if we deprecate file_utils at some point, the code migration is super easy: replace from transfomers.file_utils import xxx by from transformers.utils import xxx.

If you agree, I'll clean up all internal imports then merge @patil-suraj @patrickvonplaten @LysandreJik

@sgugger
Copy link
Collaborator Author

sgugger commented Mar 22, 2022

Related question: we have many files named xxx_utils.py outside the utils folder. Is it due to legacy files, or is it intentional?

@gante This is more or less legacy from where the repo was flat and had no subfolders. Some could probably go in the utils subfolder as well, though I'm keeping this PR focused on file_utils (it will be big enou already ;-) )

@sgugger sgugger force-pushed the reorganize_file_utils branch from df51abf to b366d5f Compare March 23, 2022 13:23
@sgugger sgugger merged commit 4975002 into main Mar 23, 2022
@sgugger sgugger deleted the reorganize_file_utils branch March 23, 2022 14:26
FrancescoSaverioZuppichini pushed a commit that referenced this pull request Mar 24, 2022
* Split file_utils in several submodules

* Fixes

* Add back more objects

* More fixes

* Who exactly decided to import that from there?

* Second suggestion to code with code review

* Revert wront move

* Fix imports

* Adapt all imports

* Adapt all imports everywhere

* Revert this import, will fix in a separate commit
@weiji14 weiji14 mentioned this pull request Dec 13, 2023
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants