Description
In the wake of ARROW-8792, this is an umbrella issue for follow-up work and the associated "buildout", which includes things like:
- Implementation of many new function types and adding new kernel cases to existing functions
- Adding implicit casting functionality to function execution
- Creation of "bound" physical array expressions and execution thereof
- Pipeline execution (executing multiple kernels while eliminating temporary allocation)
- Parallel execution of scalar and aggregate kernels (including parallel execution of pipelined kernels)
There are quite a few existing JIRAs in the project that I'll attach to this issue, and I'll open plenty more issues as things occur to me to help organize the work.
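To make the "pipeline execution" item above a bit more concrete, here is a minimal, dependency-free Python sketch of the general idea: a chain of elementwise kernels is run one batch at a time, ping-ponging between two preallocated batch-sized scratch buffers, so no full-length intermediate array is ever allocated. This is only an illustration under stated assumptions; `add_scalar`, `multiply_scalar`, and `run_pipeline` are hypothetical names invented here and are not part of the Arrow C++ API or of the design in ARROW-8792.

```python
from array import array


def add_scalar(src, out, value, length):
    """Hypothetical elementwise kernel: out[i] = src[i] + value."""
    for i in range(length):
        out[i] = src[i] + value


def multiply_scalar(src, out, value, length):
    """Hypothetical elementwise kernel: out[i] = src[i] * value."""
    for i in range(length):
        out[i] = src[i] * value


def run_pipeline(values, kernels, batch_size=4):
    """Apply (kernel, argument) pairs in order, one batch at a time.

    Only two batch-sized scratch buffers are allocated up front; each
    kernel reads from one and writes into the other, so intermediate
    results never exist as full-length temporaries.
    """
    n = len(values)
    result = array("d", [0.0]) * n
    scratch = [array("d", [0.0]) * batch_size,
               array("d", [0.0]) * batch_size]

    for start in range(0, n, batch_size):
        length = min(batch_size, n - start)

        # Load the current batch into the first scratch buffer.
        cur = 0
        for i in range(length):
            scratch[cur][i] = values[start + i]

        # Run every kernel on this batch, swapping buffers each step.
        for kernel, arg in kernels:
            kernel(scratch[cur], scratch[1 - cur], arg, length)
            cur = 1 - cur

        # Copy the finished batch into the output.
        for i in range(length):
            result[start + i] = scratch[cur][i]

    return result


pipeline_output = run_pipeline(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [(add_scalar, 1.0), (multiply_scalar, 2.0)],
    batch_size=2,
)
```

Because each batch is independent in this sketch, batches could in principle be handed to separate worker threads, which is the kind of combination the "parallel execution of pipelined kernels" item refers to; how batch length should be chosen is left open here.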
Reporter: Wes McKinney / @wesm
Related issues:
- [C++] String algorithm library for StringArray/BinaryArray (is a parent of)
- [C++] Add casting option to set unsafe casts to null rather than some garbage value (is a parent of)
- [C++] Implement kernel function that converts a dense array to dictionary given known dictionary (is a parent of)
- [C++] Parallelize execution of ScalarAggregateFunction (is a parent of)
- [C++] Implement hashing, dictionary-encoding for StructArray (is a parent of)
- [C++] Add function to "conform" a dictionary array to a target new dictionary (is a parent of)
- [C++] Support temporal arithmetic ({time,date}{32,64}, timestamp, interval) (is a parent of)
- [C++] Kernel functions for determining monotonicity (ascending or descending) for well-ordered types (is a parent of)
- [C++] Implement "fill null" kernels that replace null values with some scalar replacement value (is a parent of)
- [C++] Implement "drop null" kernels that return array without nulls (is a parent of)
- [C++] Forward, backward fill kernel functions (is a parent of)
- [C++] Implement "any" reduction kernel for boolean data (is a parent of)
- [C++] Implement casts from one struct type to another (with same field names and number of fields) (is a parent of)
- [R] Add bindings for sum and mean compute kernels (is a parent of)
- [C++] Support lossy casts from decimal128 to float32 and float64/double (is a parent of)
- [C++] Implement casts from float/double to decimal128 (is a parent of)
- [C++] Implement example string scalar kernel function to assist with string kernels buildout per ARROW-555 (is a parent of)
- [C++] Add timestamp subtract kernel aliased to int64 subtract implementation (is a parent of)
- [C++] Add "parse_strptime" function for string to timestamp conversions using the kernels framework (is a parent of)
- [R] Provide binding for arrow::compute::CallFunction (is a parent of)
- [C++][Compute] Add strftime kernel (is a parent of)
- [C++] Refactor filter/take kernels to use Datum instead of overloads (is a parent of)
- [C++] Incremental Variance, Standard Deviation aggregators (is a parent of)
- [C++] Cast to/from halffloat not implemented (is a parent of)
- [C++] Implement support for using selection vectors in scalar aggregate function kernels (is a parent of)
- [C++] Add options to ValueCount/Unique/DictEncode kernel to toggle null behavior (is a parent of)
- [C++][Python] Support ExtensionType arrays in more kernels (is a parent of)
- [C++] Allow automatic String -> LargeString promotions when concatenating tables (is a parent of)
- [C++] Determine strategy for propagating failures in initializing built-in function registry in arrow/compute (is a parent of)
- [C++] Determine desirable maximum length for ExecBatch in pipelined and parallel execution of kernels (is a parent of)
- [C++] Add "TypeResolver" class interface to replace current OutputType::Resolver pattern (is a parent of)
- [C++] Parallelize execution of arrow::compute::ScalarFunction (is a parent of)
- [C++] Add VectorFunction wrapping arrow::Concatenate (is a parent of)
- [C++] Deprecate or remove Scalar::Parse and Scalar::CastTo (is a parent of)
- [C++] Arithmetic kernels for numeric arrays (is a parent of)
- [C++][Compute] Extract preallocation logic from KernelExecutor (is a parent of)
- [C++][Compute] Dispatch* should examine options as well as input types (is a parent of)
- [C++] Implement hash_aggregate kernels (umbrella issue) (is a parent of)
- [C++/Python] Implement Array.isvalid/notnull/isnull as scalar functions (is a parent of)
- [Python/C++] Add index() method to find first occurrence of Python scalar (is a parent of)
- [Python] Expose compare kernels on Array class (is a parent of)
- [C++] Possible to reduce object code generated in compute/kernels/take.cc? (is a parent of)
- [R] Add bindings for compare and boolean kernels (is a parent of)
- [C++] Refactor AddKernel to support other operations and types (is a parent of)
- [C++][Compute] Consolidate fill_null and coalesce (is a parent of)
- [C++] Implement cast to Binary and FixedSizeBinary (is a parent of)
- [C++] Use selection vectors in Filter implementation for record batches, tables (is a parent of)
- [C++] Implement casts from date types to Timestamp (is a parent of)
- [C++] Add C++ unit tests for filter and take functions on temporal type inputs, including timestamps (is a parent of)
- [C++] Reimplement dictionary unpacking in Cast kernels using Take (is a parent of)
- [C++] Reduce number of take kernels (is a parent of)
- [C++] Implement optimized "unsafe take" for use with selection vectors for kernel execution (is a parent of)
- [C++][Compute] Formalize "metafunction" concept (is a parent of)
- [C++] Add cast "metafunction" to FunctionRegistry that addresses dispatching to appropriate type-specific CastFunction (is a parent of)
- [C++] Add "DispatchBest" APIs to compute::Function that selects a kernel that may require implicit casts to invoke (is a parent of)
- [C++] Improve usability of arrow::compute::CallFunction by moving ExecContext* argument to end and adding default (is a parent of)
- [C++] Improve docstrings in new public APIs in arrow/compute and fix miscellaneous typos (is a parent of)
- [C++] Measure microperformance associated with ExecBatchIterator (is a parent of)
- [C++] Change compute::Arity::VarArgs min_args default to 0 (is a parent of)
- [C++] Reduce generated code in vector_hash.cc (is a parent of)
- [C++] Reduce generated code in compute/kernels/scalar_compare.cc (is a parent of)
- [C++] compute::CallFunction can't Filter/Take with ChunkedArray (is a parent of)
- [C++] Implement BitBlockCounter interface for blockwise popcounts of validity bitmaps (is a parent of)
- [C++] Improve and expand Take/Filter benchmarks (is a parent of)
- [C++] Add sum/mean kernels for Boolean type (is a parent of)
- [C++] Support scalar aggregation over scalars (is a parent of)
- [C++][Compute] Add ExecNode hierarchy (is a parent of)
- [C++][Compute] Promote Expression to the compute namespace (is a parent of)
- [C++][Dataset][Compute] Refactor Dataset scans to use an ExecNode graph (is a parent of)
- [C++][Compute][R] Add ScalarAggregateOptions to Any and All kernels (is a parent of)
- [C++] Kernels to extract datetime components should be timezone aware (is a parent of)
- [C++] Support filter/take for union data type. (is a parent of)
- [C++][Compute] Enhance FunctionOptions with equality, debug representability, and serializability (is a parent of)
- [C++] Kernel to localize naive timestamps to a timezone (preserving clock-time) (is a parent of)
- [C++] Add option to specify the first day of the week for the "day_of_week" temporal kernel (is a parent of)
- [C++] Add a general "if, ifelse, ..., else" kernel ("CASE WHEN") (is a parent of)
- [C++] Add a 'choose' kernel/scalar compute function (is a parent of)
- [C++] Support variable-width types in case_when function (is a parent of)
- [C++] Implement datediff kernel (is a parent of)
- [C++] Implement timestamp to date/time cast that extracts value (is a parent of)
- [C++] Implement casting Binary <-> LargeBinary (is a parent of)
- [C++] Implement casting List <-> LargeList (is a parent of)
- [C++] SortToIndices kernel must support FixedSizeBinary (is a parent of)
- [C++] ArgSort kernel should not materialize the output internal (is a parent of)
- [C++] Option for Filter kernel how to handle nulls in the selection vector (is a parent of)
- [C++] Determine the feasibility and build a prototype to replace compute/kernels with gandiva kernels (is a parent of)
- [Python] Add relevant glue for implementing each kind of FunctionOptions (is a parent of)
- [C++] Implement aggregate compute functions for decimal datatypes (is a parent of)
- [C++] Refactor temporal casts to work with Scalar inputs (is a parent of)
- [C++] Improved declarative compute function / kernel development framework, normalize calling conventions (relates to)
- [C++] Add Subtract and Multiply arithmetic kernels with wrap-around behavior (is related to)
- [C++] Arrow-native C++ Data Frame-style programming interface for analytics (umbrella issue) (is related to)
- [C++] Implement List Flatten kernel (is related to)
- [C++] Sketch out design for kernels and "query" execution in compute layer (is related to)
- [C++] Optimize Take implementation (is related to)
- [C++] Add utf8proc library to toolchain (is related to)
- [C++] Split non-cast compute kernels into a separate shared library (is related to)
- [Python] Expose more compute kernels (is related to)
- [C++] Move arrow::ArrayData to a separate header file (is related to)
- [C++] Document available functions in compute::FunctionRegistry (is related to)
- [C++] Optimize Filter implementation (is related to)
- [C++] Expand SumKernel benchmark to more types (is related to)
- [C++] Support casting between decimal types with compatible precision/scales (is related to)
- [C++] Flatbuffers based serialization protocol for Expressions (is related to)
- [C++] Collapse Take APIs from 8 to 1 or 2 (is related to)
- [C++] [Python] Proposal for several Array utility functions (is related to)
Note: This issue was originally created as ARROW-8894. Please see the migration documentation for further details.