Skip to content

Improve codebase readability and error messages by and consistently handle downcasting #3152

@alamb

Description

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
It is a common code pattern in DataFusion to downcast an ArrayRef to a concrete arrow array type.

There are actually at least three patterns to do so in the datafusion codebase https://github.com/apache/arrow-datafusion/search?l=Rust&q=downcast

I think this leads to

  1. Poor user experience (as a DataFusion bug might panic and quit the process, which is impolite)
  2. less readable code
  3. less efficient feature development and review (as people have to figure out which of the three patterns to use and they can't just look at what convention is used in the rest of the codebase

Pattern 1: downcast_ref()

let array = array.as_any().downcast_ref::<Date32Array>().unwrap();

panics on error

Pattern 2: use as_XXX_array functions from arrow

let array = as_boolean_array(&array);

panics on error

Pattern 3: check type and return an error:

let array = array.as_any().downcast_ref::<Date32Array>().ok_or(
    DataFusionError::Execution("Cast Date32Array error".to_string()),
)?;

Returns an error to the user (a better behavior I think)

Describe the solution you'd like
I would like to use one consistent pattern throughout the codebase. Specifically I propose functions that do the error checking / error message generation:

/// Return `array` cast to `Int32Array` otherwise an error
pub fn as_int32_array(array: &dyn Array) -> Result<&Int32Array, DataFusionError> {
  array.as_any().downcast_ref::<Int32Array>().ok_or(
    DataFusionError::Execution(format!("Expected a Int32Array, got: ", array.data_type()),
  )
}

/// And similar functions for the remaining array types

And then change all dynamic casts in the codebase:

  1. If they are fallable, to use the new functions that make a nice error message
  2. If they are infallible / somewhere were passing out the result is hard, use arrow functions

Describe alternatives you've considered
Leave the status quo

Additional context
Inspired by @avantgardnerio 's suggestion https://github.com/apache/arrow-datafusion/pull/3110/files#r945847871 (but this is not the first time it has come up)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions