Skip to content

Conversation

@connortsui20
Copy link
Contributor

@connortsui20 connortsui20 commented Sep 3, 2025

Tracking Issue: #4372

Implements the same compute functions as ListArray, but for FixedSizeListArray.

  • Take: Select elements by indices with optimized paths for nullable/non-nullable data
  • Filter: Apply boolean masks with density-based expansion strategy (not sure it this is a premature optimization)
  • Cast
  • IsConstant: Check if all lists in the array are identical (this is super expensive and it might be worth revisiting the compute function characteristics itself, cc @gatesn)
  • Mask
  • Fuzzing: Extended arbitrary data generation to support FixedSizeListArray
  • Arrow conversion
  • Refactored ListArray filter compute function with a similar density check

The two complicated compute functions here are filter and take, with take being significantly more complicated. Those are accompanied by tests since the logic is a lot harder.

Nice-to-have-but-I'm-too-tired:

  • Tests for each compute function (missing for cast, is_constant, and mask)
  • Refactor the ListArray take compute function

@connortsui20 connortsui20 added the changelog/feature A new feature label Sep 3, 2025
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
@connortsui20 connortsui20 marked this pull request as ready for review September 3, 2025 21:17
@codecov
Copy link

codecov bot commented Sep 3, 2025

Codecov Report

❌ Patch coverage is 88.47385% with 108 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.80%. Comparing base (4caa917) to head (ad56a75).
⚠️ Report is 23 commits behind head on develop.

Files with missing lines Patch % Lines
...rtex-array/src/arrow/compute/to_arrow/canonical.rs 6.06% 31 Missing ⚠️
...x-array/src/arrays/fixed_size_list/compute/cast.rs 0.00% 22 Missing ⚠️
.../src/arrays/fixed_size_list/compute/is_constant.rs 0.00% 16 Missing ⚠️
...x-array/src/arrays/fixed_size_list/compute/mask.rs 0.00% 11 Missing ⚠️
encodings/runend/src/compute/filter.rs 68.18% 7 Missing ⚠️
vortex-array/src/arrays/list/compute/filter.rs 92.06% 5 Missing ⚠️
vortex-dtype/src/dtype.rs 54.54% 5 Missing ⚠️
...ortex-array/src/compute/conformance/consistency.rs 0.00% 4 Missing ⚠️
...array/src/arrays/fixed_size_list/compute/filter.rs 94.23% 3 Missing ⚠️
...x-array/src/arrays/fixed_size_list/compute/take.rs 98.03% 2 Missing ⚠️
... and 2 more

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@connortsui20 connortsui20 requested a review from gatesn September 3, 2025 21:23
Copy link
Contributor

@robert3005 robert3005 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can simplify things a little bit

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Comment on lines 14 to 21
let Some(target_element_type) = dtype.as_list_element_opt() else {
return Ok(None);
};

// The list elements could technically be from either `FixedSizeList` or `List`.
if dtype.is_list() {
return Ok(None);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems somewhat error-prone, should I create a as_fixed_size_list_element_opt that only returns the type if it is DType::FixedSizeList and have as_list_element_opt return None in that case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think s we should maybe just make this work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having thought about this more I think you want to separate this two out. If FixedSizeList is a dtype then it's elements are distinct from the list elements

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I separated them out:

    /// Get the inner element dtype if `self` is a [`DType::List`], otherwise returns `None`.
    ///
    /// Note that this does _not_ return `Some` if `self` is a [`DType::FixedSizeList`].
    pub fn as_list_element_opt(&self) -> Option<&Arc<DType>> {
        if let List(edt, _) = self {
            Some(edt)
        } else {
            None
        }
    }

    /// Get the inner element dtype if `self` is a [`DType::FixedSizeList`], otherwise returns
    /// `None`.
    ///
    /// Note that this does _not_ return `Some` if `self` is a [`DType::List`].
    pub fn as_fixed_size_list_element_opt(&self) -> Option<&Arc<DType>> {
        if let FixedSizeList(edt, ..) = self {
            Some(edt)
        } else {
            None
        }
    }

    /// Get the inner element dtype if `self` is **either** a [`DType::List`] or a
    /// [`DType::FixedSizeList`], otherwise returns `None`
    pub fn as_any_size_list_element_opt(&self) -> Option<&Arc<DType>> {
        if let FixedSizeList(edt, ..) = self {
            Some(edt)
        } else if let List(edt, ..) = self {
            Some(edt)
        } else {
            None
        }
    }

Comment on lines +25 to +32
// TODO(connor): There must be a more efficient way to do this. Each `scalar_at()` call
// makes several allocations...
for i in 1..array.len() {
let current_scalar = array.scalar_at(i);
if current_scalar != first_scalar {
return Ok(Some(false));
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is pretty bad. I'm also realizing that this will also check the dtypes are equal every single time. Maybe a eq_value or something is needed? I'll probably leave this for later.

Copy link
Contributor

@robert3005 robert3005 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only have some nits

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
@connortsui20 connortsui20 enabled auto-merge (squash) September 5, 2025 10:37
@connortsui20 connortsui20 merged commit f49b408 into develop Sep 5, 2025
39 checks passed
@connortsui20 connortsui20 deleted the ct/fsl-compute branch September 5, 2025 10:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/feature A new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants