
[data] sort large dataset by ray.data.Dataset always fail #49679

Open
@Yanghello

Description

What happened + What you expected to happen

I ran a sort on a large dataset (1,000 million rows × 1,000 columns, ~1 TB total) with the following test code:

import ray

ray.init()

# Enable push-based shuffle for the sort.
ctx = ray.data.DataContext.get_current()
ctx.use_push_based_shuffle = True

data_path = "hdfs://ip:port/home/data/testdata/1y_rows_1000_columns/"

data = ray.data.read_csv(data_path)
data = data.sort("id")
data = data.materialize()

print(data.count())
print(data.schema())
print(data.take(10))

Running on my Ray cluster (40 workers × 16 CPUs / 64 GB each), it always fails with a worker-dead error. Is there some configuration I am missing for sorting data at this scale?
[Screenshot: worker-dead error, 2025-01-07 18:59:26]
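For context, a rough back-of-envelope memory estimate for this workload (all figures below are assumptions based on the cluster description above and Ray's commonly cited default of reserving roughly 30% of node memory for the object store; they are not measured values):

```python
# Back-of-envelope memory estimate for sorting ~1 TB on 40 x 64 GB workers.
# All constants are assumptions taken from the issue description, not
# measurements from the actual cluster.

TOTAL_DATA_GB = 1000           # ~1 TB dataset
NUM_WORKERS = 40
MEM_PER_WORKER_GB = 64
OBJECT_STORE_FRACTION = 0.3    # assumed default object store share per node

cluster_mem_gb = NUM_WORKERS * MEM_PER_WORKER_GB
object_store_gb = cluster_mem_gb * OBJECT_STORE_FRACTION
data_per_worker_gb = TOTAL_DATA_GB / NUM_WORKERS

print(f"cluster memory:            {cluster_mem_gb} GB")
print(f"aggregate object store:    {object_store_gb:.0f} GB")
print(f"data per worker:           {data_per_worker_gb:.0f} GB")

# A sort-shuffle materializes the whole dataset in the object store, so if
# the dataset exceeds the aggregate object store, Ray must spill to disk
# and memory pressure on individual workers can kill them.
print("dataset exceeds object store:", TOTAL_DATA_GB > object_store_gb)
```

Under these assumptions the ~1 TB dataset is larger than the ~768 GB aggregate object store, so heavy spilling (and per-worker memory pressure) during the shuffle is expected.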

Versions / Dependencies

ray == 2.10.0

Reproduction script

Same as the script in the description above.

Issue Severity

None


Metadata

Labels

P1 — Issue that should be fixed within a few weeks
bug — Something that is supposed to be working; but isn't
data — Ray Data-related issues
