Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new APIs to AddressableNonNullValueList to copy/append a stream of bytes #8653

Closed
wants to merge 1 commit into from

Conversation

aditi-pandit
Copy link
Collaborator

@aditi-pandit aditi-pandit commented Feb 2, 2024

This is the first in a set of PRs to add support for spilling distinct aggregations (see full version in #7791).

Spilling distinct aggregations needs support to spill SetAccumulators in which input values are cumulated. ComplexTypeSetAccumulators use AddressableNonNullValueList to serialize complex types. This PR adds new APIs to AddressableNonNullValueList so that it can copy/append a stream of bytes corresponding to a ComplexType value (array, map, struct).

Copy link

netlify bot commented Feb 2, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit a3b0469
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/65c2c0a5eb18fa000867fd5e

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 2, 2024
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aditi-pandit Aditi, can you remind me how are you planning to serialize distinct accumulators? Will you extract each accumulator into a single blob (VARBINARY values) or an array of blobs (ARRAY(VARBINARY)) or something else?

Would it be more efficient to serialize / deserialize all entries in bulk rather than one at a time?

velox/exec/AddressableNonNullValueList.cpp Outdated Show resolved Hide resolved
velox/exec/AddressableNonNullValueList.cpp Outdated Show resolved Hide resolved
@aditi-pandit
Copy link
Collaborator Author

aditi-pandit commented Feb 2, 2024

@mbasmanova : Answers inline

@aditi-pandit Aditi, can you remind me how are you planning to serialize distinct accumulators? Will you extract each accumulator into a single blob (VARBINARY values) or an array of blobs (ARRAY(VARBINARY)) or something else?

The implementation extracts each accumulator into a single blob. #8660 has the SetAccumulator serialization code.

Would it be more efficient to serialize / deserialize all entries in bulk rather than one at a time?

Do you mean serialize multiple ComplexTypes in one go ? Or do you mean something else ?

One issue for serializing multiple ComplexTypes is that we need a hash of each ComplexType value at a time for ComplexTypeSetAccumulator (using BaseVector::hashValueAt() logic). If there is a way to reconstruct such a hash from the ComplexType value one at a time, then doing a bulk write of values and reconstructing hash on read is possible. Right now, I didn't see a way to do this in the code. But its possible to do some refactoring for it.

wdyt ?

@aditi-pandit
Copy link
Collaborator Author

@mbasmanova : Answers inline

@aditi-pandit Aditi, can you remind me how are you planning to serialize distinct accumulators? Will you extract each accumulator into a single blob (VARBINARY values) or an array of blobs (ARRAY(VARBINARY)) or something else?

The implementation extracts each accumulator into a single blob. #8660 has the SetAccumulator serialization code.

Would it be more efficient to serialize / deserialize all entries in bulk rather than one at a time?

Do you mean serialize multiple ComplexTypes in one go ? Or do you mean something else ?

One issue for serializing multiple ComplexTypes is that we need a hash of each ComplexType value at a time for ComplexTypeSetAccumulator (using BaseVector::hashValueAt() logic). If there is a way to reconstruct such a hash from the ComplexType value one at a time, then doing a bulk write of values and reconstructing hash on read is possible. Right now, I didn't see a way to do this in the code. But its possible to do some refactoring for it.

wdyt ?

Confirmed with Masha about this limitation for bulk serialize/deserialize.

@aditi-pandit
Copy link
Collaborator Author

@mbasmanova : PTAL. Thanks.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aditi-pandit Overall looks good % some comments.

velox/exec/AddressableNonNullValueList.h Outdated Show resolved Hide resolved
velox/exec/AddressableNonNullValueList.h Outdated Show resolved Hide resolved
velox/exec/tests/AddressableNonNullValueListTest.cpp Outdated Show resolved Hide resolved
velox/exec/tests/AddressableNonNullValueListTest.cpp Outdated Show resolved Hide resolved
velox/exec/tests/AddressableNonNullValueListTest.cpp Outdated Show resolved Hide resolved
velox/exec/tests/AddressableNonNullValueListTest.cpp Outdated Show resolved Hide resolved
velox/exec/tests/AddressableNonNullValueListTest.cpp Outdated Show resolved Hide resolved
@aditi-pandit aditi-pandit force-pushed the non_null_value_list_api branch 2 times, most recently from 6705014 to be6a221 Compare February 6, 2024 22:35
@aditi-pandit
Copy link
Collaborator Author

@mbasmanova : PTAL.

@aditi-pandit
Copy link
Collaborator Author

@mbasmanova : Thanks for your review comments. PTAL.

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@aditi-pandit
Copy link
Collaborator Author

@mbasmanova : There are failures in Facebook Linter for this. Do you need any help from me ? Would be great to get the error messages.

@facebook-github-bot
Copy link
Contributor

@mbasmanova merged this pull request in 02a4f6c.

Copy link

Conbench analyzed the 1 benchmark run on commit 02a4f6c3.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@aditi-pandit aditi-pandit deleted the non_null_value_list_api branch February 7, 2024 20:11
FelixYBW pushed a commit to FelixYBW/velox that referenced this pull request Feb 10, 2024
…f bytes (facebookincubator#8653)

Summary:
This is the first in a set of PRs to add support for spilling distinct aggregations (see full version in facebookincubator#7791).

Spilling distinct aggregations needs support to spill SetAccumulators in which input values are cumulated. ComplexTypeSetAccumulators use AddressableNonNullValueList to serialize complex types. This PR adds new APIs to AddressableNonNullValueList so that it can copy/append a stream of bytes corresponding to a ComplexType value (array, map, struct).

Pull Request resolved: facebookincubator#8653

Reviewed By: Yuhta

Differential Revision: D53497000

Pulled By: mbasmanova

fbshipit-source-id: 66d44d02a2c3bd5775725c8b8559feaed17c0813
FelixYBW pushed a commit to FelixYBW/velox that referenced this pull request Feb 10, 2024
…f bytes (facebookincubator#8653)

Summary:
This is the first in a set of PRs to add support for spilling distinct aggregations (see full version in facebookincubator#7791).

Spilling distinct aggregations needs support to spill SetAccumulators in which input values are cumulated. ComplexTypeSetAccumulators use AddressableNonNullValueList to serialize complex types. This PR adds new APIs to AddressableNonNullValueList so that it can copy/append a stream of bytes corresponding to a ComplexType value (array, map, struct).

Pull Request resolved: facebookincubator#8653

Reviewed By: Yuhta

Differential Revision: D53497000

Pulled By: mbasmanova

fbshipit-source-id: 66d44d02a2c3bd5775725c8b8559feaed17c0813
FelixYBW pushed a commit to FelixYBW/velox that referenced this pull request Feb 12, 2024
…f bytes (facebookincubator#8653)

Summary:
This is the first in a set of PRs to add support for spilling distinct aggregations (see full version in facebookincubator#7791).

Spilling distinct aggregations needs support to spill SetAccumulators in which input values are cumulated. ComplexTypeSetAccumulators use AddressableNonNullValueList to serialize complex types. This PR adds new APIs to AddressableNonNullValueList so that it can copy/append a stream of bytes corresponding to a ComplexType value (array, map, struct).

Pull Request resolved: facebookincubator#8653

Reviewed By: Yuhta

Differential Revision: D53497000

Pulled By: mbasmanova

fbshipit-source-id: 66d44d02a2c3bd5775725c8b8559feaed17c0813
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants