What happened + What you expected to happen
I am trying to process some image data using Ray Data. However, whenever I use a torchvision dataset whose images vary in shape, the pipeline fails with a ray.air.util.tensor_extensions.arrow.ArrowConversionError: Error converting data to Arrow. An example stack trace is here, and a reproduction script is below.
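To illustrate what "variable image shapes" means when rows are batched into a tensor column, here is a minimal NumPy-only sketch. It assumes (the stack trace does not confirm this) that the conversion fails because images of different sizes, or a mix of grayscale (H, W) and RGB (H, W, 3) images, cannot be combined into one fixed-shape array. The arrays and the try_stack helper are hypothetical stand-ins, not part of Ray.

```python
import numpy as np

# Hypothetical stand-ins for two dataset images with different (H, W, C) shapes.
img_a = np.zeros((64, 48, 3), dtype=np.uint8)
img_b = np.zeros((32, 32, 3), dtype=np.uint8)


def try_stack(images):
    """Return one batched array if all shapes match, else None."""
    try:
        return np.stack(images)
    except ValueError:
        # np.stack requires every input array to have exactly the same shape.
        return None


same = try_stack([img_a, img_a])
mixed = try_stack([img_a, img_b])
print(same.shape)  # (2, 64, 48, 3)
print(mixed)       # None
```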
Versions / Dependencies
ray 2.42.0
Python 3.11.11
Reproduction script
import numpy as np
import ray
import torchvision

ray.init()


def extract_and_process_image(row: dict) -> dict:
    """Discard label and convert image to numpy array."""
    return {"image": np.array(row["item"][0])}


dataset = torchvision.datasets.Caltech256(root="~/tmp/data", download=True)
ds = ray.data.from_torch(dataset)
ds = ds.map(extract_and_process_image)
print(ds.take(1))
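One possible workaround, assuming the error is caused by mixed image shapes, is to normalize every image to a single fixed shape inside the map function. This is a hypothetical sketch that has not been verified against ray 2.42.0: to_fixed_shape, TARGET_H/TARGET_W, and extract_and_process_image_fixed are illustrative names, and it center-crops and zero-pads instead of resizing so that it needs only NumPy.

```python
import numpy as np

# Hypothetical target size; any fixed (H, W) would do for this sketch.
TARGET_H, TARGET_W = 224, 224


def to_fixed_shape(image: np.ndarray) -> np.ndarray:
    """Return a uint8 array of shape (TARGET_H, TARGET_W, 3).

    Grayscale (H, W) and single-channel (H, W, 1) images are expanded to
    3 channels; extra channels (e.g. alpha) are dropped. The image is then
    center-cropped where it is too large and zero-padded at the bottom/right
    where it is too small.
    """
    if image.ndim == 2:
        image = image[:, :, np.newaxis]
    if image.shape[2] == 1:
        image = np.repeat(image, 3, axis=2)
    image = image[:, :, :3]

    # Center-crop any dimension larger than the target.
    h, w = image.shape[:2]
    top = max((h - TARGET_H) // 2, 0)
    left = max((w - TARGET_W) // 2, 0)
    image = image[top:top + TARGET_H, left:left + TARGET_W]

    # Zero-pad any dimension smaller than the target.
    out = np.zeros((TARGET_H, TARGET_W, 3), dtype=np.uint8)
    h, w = image.shape[:2]
    out[:h, :w] = image
    return out


def extract_and_process_image_fixed(row: dict) -> dict:
    """Hypothetical drop-in replacement for extract_and_process_image."""
    return {"image": to_fixed_shape(np.array(row["item"][0]))}
```

In the reproduction script this would replace the map call as ds = ds.map(extract_and_process_image_fixed). Converting and resizing the PIL image first (Image.convert("RGB") followed by Image.resize) before calling np.array is an alternative that keeps the whole image instead of cropping it.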
Issue Severity
High: It blocks me from completing my task.