-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Use DataFusionError instead of ArrowError in FileOpenFuture #17397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
582a1c2 to
75d54cd
Compare
| ) | ||
| .map_err(|err| { | ||
| arrow::error::ArrowError::ParseError(format!( | ||
| datafusion_common::DataFusionError::Execution(format!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| datafusion_common::DataFusionError::Execution(format!( | |
| exec_err!( |
| Ok(futures::future::ready(internal_err!("error opening")).boxed()) | ||
| } else if self.error_scanning_idx.contains(&idx) { | ||
| let error = futures::future::ready(Err(ArrowError::IpcError( | ||
| let error = futures::future::ready(Err(DataFusionError::Execution( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| let error = futures::future::ready(Err(DataFusionError::Execution( | |
| let error = futures::future::ready(exec_err!( |
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Result<RecordBatch, DataFusionError> can be replaced with pub type Result<T, E = DataFusionError> = result::Result<T, E>; declared in datafusion/common/src/error.rs
…ub type Result<T, E = DataFusionError> = result::Result<T, E>;` declared in `datafusion/common/src/error.rs`
|
Thanks @comphead ! Addressed your feedback |
| /// A fallible future that resolves to a stream of [`RecordBatch`] | ||
| pub type FileOpenFuture = | ||
| BoxFuture<'static, Result<BoxStream<'static, Result<RecordBatch, ArrowError>>>>; | ||
| BoxFuture<'static, Result<BoxStream<'static, Result<RecordBatch>>>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is amazing how rustc can distinguish Result implementations 😮
|
|
||
| impl Stream for TestStream { | ||
| type Item = Result<RecordBatch, ArrowError>; | ||
| type Item = Result<RecordBatch, datafusion_common::DataFusionError>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the Item also reuse DF Result?
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @adriangb
|
I also spoke with @alamb about this change, he agreed it's a positive change and that the breakage should be minimal because (1) not many people are implementing |
|
Also, I think the File opener trait is somewhat specialized / not used all that widely, so this change will likely not affect that many people |
I would like to re-use
ProjectionStreamfor #14993:datafusion/datafusion/physical-plan/src/projection.rs
Lines 362 to 388 in 0a9dcec
A barrier to this is that the FileOpenerFuture stream uses ArrowError while the ExecutionPlan::execute stream uses DataFusionError. I don't see any reason for this so I propose we change it.
Within DataFusion we should not really be creating ArrowErrors, I believe we should almost always use DataFusionError.
Upgrade guide note aside this is a -3 LOC diff -> reducing complexity (less error types, less lines of code).