-
Notifications
You must be signed in to change notification settings - Fork 130
fix: Continue to prefer List over ListView by default in ToArrow #4982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
DataFusion lacks complete support for ListView. Until that is resolved, for now the default to_arrow_preferred should prefer returning List over ListView. Signed-off-by: Andrew Duffy <andrew@a10y.dev>
|
I'm actually somewhat surprised that only 1 test failed since just changing the arrow dtype should also require that we need to make the conversions properly? Edit: Oh right we match on the vortex and arrow type and use the correct conversion function |
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
|
This is just the preferred conversion. Anything that has an explicit conversion should still go to ListView |
| // (32-bit), Large List View (64-bit). We cannot both guarantee zero-copy and commit to an | ||
| // Arrow dtype because we do not how large our offsets are. | ||
| DType::List(elem_dtype, _) => DataType::ListView(FieldRef::new(Field::new_list_field( | ||
| DType::List(elem_dtype, _) => DataType::List(FieldRef::new(Field::new_list_field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this deserves a comment noting that we are using List over ListView because datafusion doesn't really support it yet?
Codecov Report✅ All modified and coverable lines are covered by tests. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
For posterity, it falls through here https://github.com/apache/arrow-rs/blob/main/arrow-select/src/filter.rs#L309 (where there could be an optimised impl for listviews) and then panics in https://github.com/apache/arrow-rs/blob/main/arrow-data/src/transform/mod.rs#L269 |
|
Yep, I'll push a fix to arrow soon |
|
Fix to arrow: apache/arrow-rs#8645 |
Turns out, arrow-rs/DataFusion lacks complete support for ListView. Until that is resolved, for now the default to_arrow_preferred should prefer returning List over ListView.
See https://github.com/vortex-data/vortex/actions/runs/18591096514/job/53006051293?pr=4977 for example failure