Jemoka added the bug and triage labels on Jan 1, 2025
What happened + What you expected to happen
Similar to #28084, but distinct in that it reproduces only with datasets that require dynamic dataset generation.
When ray.data.from_huggingface is called with an HF IterableDataset (streaming=True), datasets that require running remote code to generate cause Ray Data to crash when attempting to materialize or otherwise interact with the dataset. This is likely because the dynamically imported datasets_modules has not been loaded during dataset generation on HF's end by the time Ray tries to iterate over the dataset.
Trace from reproducer below:
Versions / Dependencies
ray==2.40.0
datasets==3.2.0
huggingface-hub==0.27.0
Reproduction script
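import datasets
import ray

# Stream a dataset whose loading script requires running remote code.
ds = datasets.load_dataset(
    path="hotpotqa/hotpot_qa",
    name="fullwiki",
    split="train",
    streaming=True,
    trust_remote_code=True,
)

# Wrap the streaming IterableDataset in a Ray Dataset; materializing it crashes.
ds = ray.data.from_huggingface(ds)
ds.materialize()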
Issue Severity
Medium: It is a significant difficulty but I can work around it.
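One possible workaround, sketched here and not verified against the versions listed above: if the crash really is confined to Ray workers lacking datasets_modules, the rows can be pulled from the streaming dataset on the driver and handed to ray.data.from_items instead. The helper names take_rows and to_ray_dataset and the limit parameter are hypothetical, introduced only for illustration, and this approach holds every pulled row in driver memory.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def take_rows(rows: Iterable[Dict[str, Any]], limit: int) -> List[Dict[str, Any]]:
    """Pull at most `limit` rows from an iterable, on the calling (driver) process."""
    return list(islice(rows, limit))


def to_ray_dataset(rows: Iterable[Dict[str, Any]], limit: int):
    """Build a Ray Dataset from rows iterated on the driver.

    Assumption: iterating the HF streaming dataset on the driver works,
    so Ray workers never need to import datasets_modules themselves.
    """
    import ray  # imported lazily so take_rows can be used without Ray installed

    return ray.data.from_items(take_rows(rows, limit))
```

With the ds object returned by datasets.load_dataset in the reproduction script, this would be called as to_ray_dataset(ds, limit=1000). It gives up streaming, so it only suits subsets that fit in memory.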