Skip to content

Conversation

vuule
Copy link
Contributor

@vuule vuule commented May 10, 2021

Closes #8159, #7126

Current implementation uses an array to hold the exact size of each encoded element before the encode step. This allows us to simplify the encoding (each element encode is independent) and to allocate streams of exact size instead of the worst-case. The process is different from other types because decimal data streams do not use RLE encoding.

Will add benchmarks once data generator can produce decimal data.

vuule added 28 commits April 22, 2021 15:24
@vuule vuule self-assigned this May 10, 2021
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label May 10, 2021
@vuule vuule requested review from jrhemstad, shwina, isVoid, rgsl888prabhu and devavret and removed request for ttnghia and jrhemstad May 12, 2021 23:29
Copy link
Contributor

@isVoid isVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pytest lgtm

Copy link
Contributor

@devavret devavret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to say partial review but I know everything else is very orc and implementation specific.

@vuule vuule requested review from devavret and rgsl888prabhu May 14, 2021 22:37
Copy link
Contributor

@rgsl888prabhu rgsl888prabhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor comment, apart from that looks good.

@vuule
Copy link
Contributor Author

vuule commented May 17, 2021

rerun tests

@vuule vuule changed the title Add support for decimal types in ORC writer Add support for decimal types in ORC writer May 18, 2021
@vuule vuule changed the title Add support for decimal types in ORC writer Add support for decimal types in ORC writer May 18, 2021
@vuule
Copy link
Contributor Author

vuule commented May 18, 2021

rerun tests

@vuule vuule added the 5 - Ready to Merge Testing and reviews complete, ready to merge label May 18, 2021
@vuule
Copy link
Contributor Author

vuule commented May 18, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 72e017b into rapidsai:branch-21.06 May 18, 2021
@vuule vuule deleted the fea-orc-writer-decimal branch May 18, 2021 22:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge cuIO cuIO issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Add support for writing dataframes containing decimal columns to orc writer
4 participants