[ray] Fix datasets_modules ImportError with Ray Tune #12749

Merged 2 commits on Jul 19, 2021
31 changes: 30 additions & 1 deletion src/transformers/integrations.py
@@ -14,12 +14,15 @@
"""
Integrations with other Python libraries.
"""
import functools
import importlib.util
import numbers
import os
import sys
import tempfile
from pathlib import Path

from .file_utils import is_datasets_available
from .utils import logging


@@ -246,8 +249,34 @@ def _objective(trial, local_trainer, checkpoint_dir=None):
"Trainer `args`.".format(cls=type(kwargs["scheduler"]).__name__)
)

+    trainable = ray.tune.with_parameters(_objective, local_trainer=trainer)
+
+    @functools.wraps(trainable)
+    def dynamic_modules_import_trainable(*args, **kwargs):
+        """
+        Wrapper around ``tune.with_parameters`` to ensure datasets_modules are loaded on each Actor.
+
+        Without this, an ImportError will be thrown. See https://github.com/huggingface/transformers/issues/11565.
+
+        Assumes that ``_objective``, defined above, is a function.
+        """
+        if is_datasets_available():
+            import datasets.load
+
+            dynamic_modules_path = os.path.join(datasets.load.init_dynamic_modules(), "__init__.py")
+            # load dynamic_modules from path
+            spec = importlib.util.spec_from_file_location("datasets_modules", dynamic_modules_path)
+            datasets_modules = importlib.util.module_from_spec(spec)
+            sys.modules[spec.name] = datasets_modules
Comment on lines +266 to +270
Collaborator

another comment: should this be moved upstream to datasets eventually?

Contributor Author

I'm not sure if it belongs there. The actual import needs to happen somewhere in Tune, and here is the most convenient place.

Collaborator

OK that's fine then.

+            spec.loader.exec_module(datasets_modules)
Comment on lines +266 to +271
Collaborator

is it possible to use runtime environments here instead? just curious

Contributor Author

Not without editing Ray Tune itself, as it would need to be added as an Actor option in the trial executor. Also, it doesn't appear you can actually import a module from a path in a runtime env (only pip and conda), unless I missed that in the docs.
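
For readers following the thread, the load-from-path sequence under discussion can be tried in isolation. This is only a minimal sketch using the standard library: the helper `load_package_from_path`, the module name `demo_dynamic_modules`, and the temporary directory are hypothetical stand-ins (the PR itself uses the directory returned by `datasets.load.init_dynamic_modules()` and the name `datasets_modules`).

```python
import importlib.util
import os
import sys
import tempfile


def load_package_from_path(module_name, package_dir):
    """Load the ``__init__.py`` found in ``package_dir`` under ``module_name`` (hypothetical helper)."""
    init_path = os.path.join(package_dir, "__init__.py")
    spec = importlib.util.spec_from_file_location(module_name, init_path)
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it, so later imports of
    # ``module_name`` in this process resolve to this object.
    sys.modules[spec.name] = module
    spec.loader.exec_module(module)
    return module


# Hypothetical stand-in for the dynamic modules directory used in the PR.
with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "__init__.py"), "w") as f:
        f.write("MARKER = 'loaded'\n")
    demo_module = load_package_from_path("demo_dynamic_modules", tmp_dir)

# The import statement now finds the module through sys.modules.
import demo_dynamic_modules
```

The registration in `sys.modules` is what makes a later plain `import` of that name succeed in the same process, which is presumably why the wrapper has to run this on each Actor rather than once on the driver.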

+        return trainable(*args, **kwargs)
+
+    # special attr set by tune.with_parameters
+    if hasattr(trainable, "__mixins__"):
+        dynamic_modules_import_trainable.__mixins__ = trainable.__mixins__

     analysis = ray.tune.run(
-        ray.tune.with_parameters(_objective, local_trainer=trainer),
+        dynamic_modules_import_trainable,
         config=trainer.hp_space(None),
         num_samples=n_trials,
         **kwargs,
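
The wrapper pattern in this hunk (run some setup, then delegate, while keeping the wrapped function's metadata) can be sketched without Ray. Everything below is illustrative: `hypothetical_setup`, `objective`, and the `__mixins__` value are made-up stand-ins, not Ray Tune or transformers code.

```python
import functools

calls = []


def hypothetical_setup():
    # Stand-in for the per-process work (e.g. registering a module).
    calls.append("setup")


def objective(config):
    calls.append("objective")
    return config["x"] * 2


# Stand-in for a special attribute attached to the wrapped callable.
objective.__mixins__ = ("hypothetical_mixin",)


@functools.wraps(objective)
def wrapped_objective(*args, **kwargs):
    # Setup runs in whichever process ends up calling the wrapper.
    hypothetical_setup()
    return objective(*args, **kwargs)


# Explicit copy, mirroring the diff. For plain functions, functools.wraps
# already merges the wrapped function's __dict__ into the wrapper.
if hasattr(objective, "__mixins__"):
    wrapped_objective.__mixins__ = objective.__mixins__

result = wrapped_objective({"x": 21})
```

Because the setup call sits inside the wrapper body rather than at definition time, it executes wherever the wrapper is eventually invoked, which appears to be the point of wrapping the trainable instead of importing once before `ray.tune.run`.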