feat[array]: use execute to return Canonical array not Vector#5895
feat[array]: use execute to return Canonical array not Vector#5895joseph-isaacs merged 10 commits intodevelopfrom
Canonical array not Vector#5895Conversation
Merging this PR will improve performance by 10.71%
Performance Changes
Comparing |
Benchmarks: TPC-H SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: FineWeb NVMeSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=1 on S3Summary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on NVMESummary
Detailed Results Table
|
Benchmarks: FineWeb S3Summary
Detailed Results Table
|
Benchmarks: TPC-DS SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: Statistical and Population GeneticsSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on S3Summary
Detailed Results Table
|
Benchmarks: Clickbench on NVMESummary
Detailed Results Table
|
da7bd88 to
409aa6a
Compare
Canonical array not Vector
| /// | ||
| /// This is the core operation for dictionary decoding - it expands the dictionary | ||
| /// by looking up each code in the values array. | ||
| pub fn take_canonical(values: Canonical, codes: &PrimitiveArray) -> Canonical { |
There was a problem hiding this comment.
we don't need this, but I think we will want this once take compute fn is gone
| } | ||
|
|
||
| /// Filter a canonical array by a mask, returning a new canonical array. | ||
| pub fn filter_canonical(canonical: Canonical, mask: &Mask) -> Canonical { |
There was a problem hiding this comment.
same as take and mask
a107ea7 to
46424ab
Compare
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
193570d to
dc0e9c9
Compare
|
@claude analyze this pr deeply |
|
Claude finished @0ax1's task in 2m 52s —— View job Deep PR Analysis: "feat[array]: use execute to return
|
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
|
How would you want this split up? |
gatesn
left a comment
There was a problem hiding this comment.
Lots of follow ups, notably I think we want the API to be execute::<E>(ctx), but we have to interleave that with some other fixes
| // Method 3: Using the execute() method (this is what would be used in production). | ||
| let executed = bitpacked.into_array().execute_vector(&SESSION).unwrap(); | ||
| let executed = { | ||
| let mut ctx = SESSION.create_execution_ctx(); |
There was a problem hiding this comment.
Should we add SESSION.execute(bitpacked.into_array()) as an option?
There was a problem hiding this comment.
I think think so, I think the explicitness is okay
| let codes = array.codes().execute(ctx)?.into_primitive(); | ||
| Ok(values.take(&codes)) | ||
| fn execute(array: &Self::Array, ctx: &mut ExecutionCtx) -> VortexResult<Canonical> { | ||
| // TODO(joe): remove take compute fn |
There was a problem hiding this comment.
I think this is better worded as "inline Canonical::take functionality alongside DictArray"
| mod list_view; | ||
| mod null; | ||
| mod primitive; | ||
| pub mod null; |
There was a problem hiding this comment.
To share logic with the old too arrow
| Ok(match self { | ||
| Canonical::Null(a) => Vector::Null(NullVector::new(a.len())), | ||
| Canonical::Bool(a) => { | ||
| Vector::Bool(BoolVector::new(a.bit_buffer().clone(), a.validity_mask())) |
There was a problem hiding this comment.
.validity_mask() performs compute!
Remove the `VORTEX_OPERATORS` remove the old `to_canonical` method and using `execute::<E>` in its place. This cause CPU compute to return canonical (Not `ArrayRef` as before). see #5895 --------- Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Remove the `VORTEX_OPERATORS` remove the old `to_canonical` method and using `execute::<E>` in its place. This cause CPU compute to return canonical (Not `ArrayRef` as before). see #5895 --------- Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Use
execute_canonicalinstead ofexecute[likely no one will need to fix anything since the API is very new