[ray] Fix datasets_modules ImportError with Ray Tune #12749
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,12 +14,15 @@ | |
""" | ||
Integrations with other Python libraries. | ||
""" | ||
import functools | ||
import importlib.util | ||
import numbers | ||
import os | ||
import sys | ||
import tempfile | ||
from pathlib import Path | ||
|
||
from .file_utils import is_datasets_available | ||
from .utils import logging | ||
|
||
|
||
|
@@ -246,8 +249,34 @@ def _objective(trial, local_trainer, checkpoint_dir=None): | |
"Trainer `args`.".format(cls=type(kwargs["scheduler"]).__name__) | ||
) | ||
|
||
trainable = ray.tune.with_parameters(_objective, local_trainer=trainer) | ||
|
||
@functools.wraps(trainable) | ||
def dynamic_modules_import_trainable(*args, **kwargs): | ||
""" | ||
Wrapper around ``tune.with_parameters`` to ensure datasets_modules are loaded on each Actor. | ||
|
||
Without this, an ImportError will be thrown. See https://github.com/huggingface/transformers/issues/11565. | ||
|
||
Assumes that ``_objective``, defined above, is a function. | ||
""" | ||
if is_datasets_available(): | ||
import datasets.load | ||
|
||
dynamic_modules_path = os.path.join(datasets.load.init_dynamic_modules(), "__init__.py") | ||
# load dynamic_modules from path | ||
spec = importlib.util.spec_from_file_location("datasets_modules", dynamic_modules_path) | ||
datasets_modules = importlib.util.module_from_spec(spec) | ||
sys.modules[spec.name] = datasets_modules | ||
spec.loader.exec_module(datasets_modules) | ||
Comment on lines +266 to +271
Is it possible to use runtime environments here instead? Just curious.
Not without editing Ray Tune itself, since it would need to be added as an Actor option in the trial executor. Also, it doesn't appear that you can import a module from a path in a runtime env (only pip and conda), unless I missed that in the docs. |
||
return trainable(*args, **kwargs) | ||
|
||
# special attr set by tune.with_parameters | ||
if hasattr(trainable, "__mixins__"): | ||
dynamic_modules_import_trainable.__mixins__ = trainable.__mixins__ | ||
|
||
analysis = ray.tune.run( | ||
- ray.tune.with_parameters(_objective, local_trainer=trainer), | ||
+ dynamic_modules_import_trainable, | ||
config=trainer.hp_space(None), | ||
num_samples=n_trials, | ||
**kwargs, | ||
|
Another comment: should this be moved upstream to `datasets` eventually?
I'm not sure it belongs there. The actual import needs to happen somewhere in Tune, and this is the most convenient place.
OK, that's fine then.
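For readers following the thread, the pattern in the diff (load a module from a file path, register it in `sys.modules`, and do so inside a wrapper so it runs in whichever process calls the trainable) can be sketched without Ray. This is a minimal illustration and not the code from this PR: `load_module_from_path`, `with_module_loaded`, `demo_dynamic_modules`, and the temporary `__init__.py` are hypothetical names made up for the example, and unlike the PR the wrapper skips the load when the module is already registered.

```python
import functools
import importlib.util
import os
import sys
import tempfile


def load_module_from_path(name, init_path):
    """Load the file at ``init_path`` as module ``name`` and register it in ``sys.modules``."""
    spec = importlib.util.spec_from_file_location(name, init_path)
    module = importlib.util.module_from_spec(spec)
    # Register before executing, mirroring the order used in the diff.
    sys.modules[spec.name] = module
    spec.loader.exec_module(module)
    return module


def with_module_loaded(fn, name, init_path):
    """Wrap ``fn`` so the module is loaded in the process that actually calls it."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Assumption for this sketch: skip the load if already registered.
        if name not in sys.modules:
            load_module_from_path(name, init_path)
        return fn(*args, **kwargs)

    return wrapper


# Hypothetical stand-in for the dynamic modules directory.
demo_dir = tempfile.mkdtemp()
demo_init = os.path.join(demo_dir, "__init__.py")
with open(demo_init, "w") as f:
    f.write("ANSWER = 42\n")


def objective(x):
    # Resolves only because the wrapper registered the module in sys.modules.
    import demo_dynamic_modules

    return x + demo_dynamic_modules.ANSWER


wrapped = with_module_loaded(objective, "demo_dynamic_modules", demo_init)
result = wrapped(1)
```

Because the load happens inside the wrapper rather than at definition time, it runs wherever the wrapped function is eventually executed, which is the property the docstring in the diff relies on for Ray Actors. `functools.wraps` copies standard metadata such as `__name__`, but not arbitrary attributes, which would explain why the diff copies `__mixins__` by hand.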