Description
What happened + What you expected to happen
I ran `sort` on large data (1000 million rows × 1000 columns, ~1 TB total), with this test code:
```python
import ray

ray.init()
ctx = ray.data.DataContext.get_current()
ctx.use_push_based_shuffle = True

data_path = "hdfs://ip:port/home/data/testdata/1y_rows_1000_columns/"
data = ray.data.read_csv(data_path)
data = data.sort("id")
data = data.materialize()

print(data.count())
print(data.schema())
print(data.take(10))
```
Running on my Ray cluster (40 workers, 16 CPUs / 64 GB each), it always fails with workers dying. Is there some config I am missing for sorting data this large?
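For reference, these are the only extra knobs I found in the Ray Data docs that seem relevant to large shuffles. Whether these values are appropriate for 1 TB is an assumption on my part; the sketch below is untested at this scale:

```python
import ray

ray.init()
ctx = ray.data.DataContext.get_current()
ctx.use_push_based_shuffle = True

# Assumption: capping the object store memory used by the streaming
# executor may reduce pressure that kills workers.
ctx.execution_options.resource_limits.object_store_memory = 10 * 1024**3  # 10 GiB

# Assumption: a smaller target block size produces more, smaller shuffle
# blocks, which may fit in memory better during the sort.
ctx.target_max_block_size = 64 * 1024 * 1024  # 64 MiB
```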
Versions / Dependencies
ray == 2.10.0
Reproduction script
```python
import ray

ray.init()
ctx = ray.data.DataContext.get_current()
ctx.use_push_based_shuffle = True

data_path = "hdfs://ip:port/home/data/testdata/1y_rows_1000_columns/"
data = ray.data.read_csv(data_path)
data = data.sort("id")
data = data.materialize()

print(data.count())
print(data.schema())
print(data.take(10))
```
Issue Severity
None