Skip to content

perf[vortex-array]: don't materialize const arrays in to_arrow_dictionary#6265

Merged
joseph-isaacs merged 1 commit intodevelopfrom
asubiotto/constantdict
Feb 3, 2026
Merged

perf[vortex-array]: don't materialize const arrays in to_arrow_dictionary#6265
joseph-isaacs merged 1 commit intodevelopfrom
asubiotto/constantdict

Conversation

@asubiotto
Copy link
Contributor

Executing a Constant array into an arrow dictionary would materialize the full array and then perform the cast. Since we only have one value, we can just cast that value and create the codes array. This results in a 15% performance improvement on our real-world queries.

@asubiotto asubiotto added the changelog/performance A performance improvement label Feb 3, 2026
Copy link
Contributor

@joseph-isaacs joseph-isaacs Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test that 1 its correct (equal to the expected value) and 2 that the values count of the array dict is 1

Ok(dict) => return dict_to_dict(dict, codes_type, values_type, ctx),
Err(array) => array,
};
let array = match array.try_into::<ConstantVTable>() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better way to match on vtables? The other thing I though of doing is an if/else chain on array.is

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not now, but we should add this.

Copy link
Contributor

@joseph-isaacs joseph-isaacs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy to merge once you have those tests

@asubiotto
Copy link
Contributor Author

TFTR! Will work on adding some tests

@asubiotto asubiotto force-pushed the asubiotto/constantdict branch from 933e1fc to 3484ff1 Compare February 3, 2026 11:07
@codspeed-hq
Copy link

codspeed-hq bot commented Feb 3, 2026

Merging this PR will degrade performance by 32.7%

❌ 1 regressed benchmark
✅ 1137 untouched benchmarks
⏩ 1265 skipped benchmarks1

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation chunked_bool_into_canonical[(1000, 10)] 43 µs 63.9 µs -32.7%

Comparing asubiotto/constantdict (00403b6) with develop (0d0fc63)

Open in CodSpeed

Footnotes

  1. 1265 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Ok(())
}

#[test]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind writing these test in terms of assert_eq(expected, actual)

#[test]
fn dict_to_dict_with_null_codes() -> VortexResult<()> {
    let dict = DictArray::try_new(
        PrimitiveArray::from_option_iter(vec![Some(0u8), None, Some(1)]).into_array(),
        VarBinViewArray::from_iter_str(["a", "b"]).into_array(),
    )?;
    let actual = execute(dict.into_array(), &dict_type(DataType::UInt8, DataType::Utf8))?;

    let expected: DictionaryArray<UInt8Type> =
        vec![Some("a"), None, Some("b")].into_iter().collect();

    assert_eq!(expected, actual.as_dictionary::<UInt8Type>().clone());
    Ok(())
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its less code and tests more stuff

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will also use rstest

@asubiotto asubiotto force-pushed the asubiotto/constantdict branch from 3484ff1 to 3e0dd74 Compare February 3, 2026 12:16
…nary

Executing a Constant array into an arrow dictionary would materialize the full
array and then perform the cast. Since we only have one value, we can just cast
that value and create the codes array. This results in 15-20% performance
improvement on our real-world queries.

Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>
@asubiotto asubiotto force-pushed the asubiotto/constantdict branch from 3e0dd74 to 00403b6 Compare February 3, 2026 12:44
@joseph-isaacs joseph-isaacs merged commit 78b472b into develop Feb 3, 2026
43 of 46 checks passed
@joseph-isaacs joseph-isaacs deleted the asubiotto/constantdict branch February 3, 2026 13:29
danking pushed a commit that referenced this pull request Feb 6, 2026
…nary (#6265)

Executing a Constant array into an arrow dictionary would materialize
the full array and then perform the cast. Since we only have one value,
we can just cast that value and create the codes array. This results in
a 15% performance improvement on our real-world queries.

Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/performance A performance improvement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants