-
Notifications
You must be signed in to change notification settings - Fork 3.9k
ARROW-17932: [C++] Implement streaming RecordBatchReader for JSON #14355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
A couple notes on this:
|
|
Some answers to the points above:
|
c579b03 to
cba7028
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still learning the async utilities in Arrow, and this PR was a very helpful read. 😄 Had two questions on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was an interesting read! The general structure looks sound, here are some comments.
|
By the way, can you also:
|
418677b to
1c60039
Compare
|
@pitrou I still have to do the docs, but feel free to bring up any further points on the code in the meantime. |
|
I just opened a small PR that addresses the chunker issue: #14843. If that gets merged, I have a test I can add for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding docs!
8ba2260 to
4c7a5a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update!
|
@pitrou Just pushed a fix. Also, not sure if my rationale on the |
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
18e5d3a to
71db481
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR looks really good now. Great work @benibus !
|
Benchmark runs are scheduled for baseline = 375372b and contender = fe5c9fe. fe5c9fe is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
|
['Python', 'R'] benchmarks have high level of regressions. |
See: ARROW-17932.
Adds a
json::StreamingReaderclass (modeled aftercsv::StreamingReader) with an async-reentrant interface and support for parallel block decoding.Some parts of the existing
TableReaderimplementation have been refactored to utilize the new facilities.