ARROW-11972: [C++][R][Python][Dataset] Extract IPC/Parquet fragment scan options #9790

lidavidm · 2021-03-24T15:11:32Z

No description provided.

lidavidm · 2021-03-24T15:12:58Z

python/pyarrow/_dataset.pyx

This is mostly, but not fully, backwards compatible (something like ds.ReadOptions(use_buffered_stream=True) no longer works) - is that an issue?

github-actions · 2021-03-24T17:04:05Z

https://issues.apache.org/jira/browse/ARROW-11972

lidavidm · 2021-03-24T18:57:20Z

(I am working on the CI failures. They appear to occur only in release mode under Windows.)

lidavidm · 2021-03-25T11:53:50Z

cpp/src/arrow/dataset/file_ipc.h

Note I didn't bother exposing this to Python/R since the IPC options in general aren't really exposed.

lidavidm · 2021-03-25T13:56:15Z

@ursabot please benchmark

ursabot · 2021-03-25T13:56:19Z

Benchmark runs are scheduled for baseline = 9262a5d and contender = 01895d4cb3b4c8b47b744cc0339a6940a1f5cd5b. Results will be available as each benchmark for each run completes:
[Finished] ursa-dgx1: https://conbench.ursa.dev/compare/runs/45bf8238-f80f-4350-a5ca-20f4c4bce4c5...e76d8f35-b7f1-41cd-9258-da4e987fe93e/
[Finished] ursa-i9-9960x: https://conbench.ursa.dev/compare/runs/990971da-1f3c-4e7c-9260-55d3df2f2afa...13e5c6a9-38f4-4b77-80ce-a7dbbb62b970/
[Finished] ec2-t3-large-us-east-2: https://conbench.ursa.dev/compare/runs/ac060baf-fd23-4883-be2a-0ef39aafe390...a88bd410-2a04-4a42-b067-b5a33aa0b0ea/
[Finished] ec2-t3-xlarge-us-east-2: https://conbench.ursa.dev/compare/runs/918b9d31-06b4-4a84-a020-09ad9d1ca8f3...ed881f27-439f-49a9-8d1f-f538b38edbc3/

nealrichardson

Ok with me on the R side, looks like effectively no change for the usual R use case.

cpp/src/arrow/dataset/file_ipc.cc

cpp/src/arrow/dataset/file_parquet.h

cpp/src/arrow/dataset/type_fwd.h

ursabot · 2021-03-25T19:15:08Z

Benchmark runs are scheduled for baseline = 7692461 and contender = 2b1a6a6ec9e1e8cf92bb490e6a93ede3300e21c7. Results will be available as each benchmark for each run completes:
[Failed] ursa-dgx1: https://conbench.ursa.dev/compare/runs/30771e19-27cd-4aaf-b624-f13e2865e39d...87aec4ff-bc07-4670-a984-44657ac305b7/
[Failed] ursa-i9-9960x: https://conbench.ursa.dev/compare/runs/581ff5c5-ffe0-4f7e-964a-a7d5182f6754...e1eafb67-1b34-434e-96f0-29313eda621c/
[Failed] ec2-t3-large-us-east-2: https://conbench.ursa.dev/compare/runs/87b8fa5e-0c0b-443f-959c-5d632710ee1a...6bf067d8-9664-4336-92e9-cd0da079d1f3/
[Failed] ec2-t3-xlarge-us-east-2: https://conbench.ursa.dev/compare/runs/d9eed87e-63c6-4e39-a871-661e981faab3...a734519a-d29a-4ff6-8cb5-f6d9dc7cce97/

ursabot · 2021-03-25T20:45:08Z

Benchmark runs are scheduled for baseline = 7692461 and contender = 1396d4699a07fe1449b57d91c80d6dd8cd67ac34. Results will be available as each benchmark for each run completes:
[Failed] ursa-dgx1: https://conbench.ursa.dev/compare/runs/30771e19-27cd-4aaf-b624-f13e2865e39d...337e6a7d-dedf-4946-a4f1-ef808847f20f/
[Failed] ursa-i9-9960x: https://conbench.ursa.dev/compare/runs/581ff5c5-ffe0-4f7e-964a-a7d5182f6754...dbff0049-8e3b-4c28-93b5-9ae66f957f47/
[Failed] ec2-t3-large-us-east-2: https://conbench.ursa.dev/compare/runs/87b8fa5e-0c0b-443f-959c-5d632710ee1a...ce8c86f0-f3ba-4b70-a902-f471d938cd64/
[Failed] ec2-t3-xlarge-us-east-2: https://conbench.ursa.dev/compare/runs/d9eed87e-63c6-4e39-a871-661e981faab3...7fd02c93-0ded-4e16-b068-b20d85942c31/

ursabot · 2021-03-25T22:30:12Z

Benchmark runs are scheduled for baseline = 7692461 and contender = fbdf30f4856b99137dfb768fa4173e0a962ef250. Results will be available as each benchmark for each run completes:
[Finished] ursa-dgx1: https://conbench.ursa.dev/compare/runs/30771e19-27cd-4aaf-b624-f13e2865e39d...d6100ade-5f79-471b-8d89-713f520cad27/
[Finished] ursa-i9-9960x: https://conbench.ursa.dev/compare/runs/581ff5c5-ffe0-4f7e-964a-a7d5182f6754...e85390ab-7a83-4b41-a7dd-d9da33ef3b19/
[Finished] ec2-t3-large-us-east-2: https://conbench.ursa.dev/compare/runs/87b8fa5e-0c0b-443f-959c-5d632710ee1a...e8f67ecc-0293-489a-a01d-5e4520e120d5/
[Finished] ec2-t3-xlarge-us-east-2: https://conbench.ursa.dev/compare/runs/d9eed87e-63c6-4e39-a871-661e981faab3...01243b8a-6f05-45c0-a0aa-2bd79cf460fd/

ursabot · 2021-04-07T13:15:15Z

Benchmark runs are scheduled for baseline = d95c72f and contender = ebc7c60. Results will be available as each benchmark for each run completes:
[Finished] ursa-i9-9960x: https://conbench.ursa.dev/compare/runs/5b187ec1-ad7b-4506-a966-c68b161e0484...4cf3a81d-fa84-46d9-80f9-830b22cc275f/
[Finished] ursa-thinkcentre-m75q: https://conbench.ursa.dev/compare/runs/ed79f796-a744-4711-9e12-b74206ea4076...b5d429eb-de8b-4fbe-b560-ef5b9e21dd2b/
[Finished] ec2-t3-large-us-east-2: https://conbench.ursa.dev/compare/runs/55881d4c-27a5-40e5-94f5-262d15e2fb73...21a13a2c-e27a-4ba1-97b6-9644b6a126a8/
[Finished] ec2-t3-xlarge-us-east-2: https://conbench.ursa.dev/compare/runs/819a5aa0-8906-4d98-8934-b5d499cbdd25...bf66904c-019c-486f-99d6-3bf6cf41684b/

bkietz

Thanks for doing this!

lidavidm added the Component: R, Component: C++, and Component: Python labels Mar 24, 2021

lidavidm marked this pull request as draft March 24, 2021 18:57

lidavidm force-pushed the arrow-11972 branch from 460e477 to 01895d4 Compare March 25, 2021 11:52

lidavidm marked this pull request as ready for review March 25, 2021 13:26

lidavidm requested a review from bkietz March 25, 2021 13:26

nealrichardson approved these changes Mar 25, 2021

bkietz requested changes Mar 25, 2021

lidavidm force-pushed the arrow-11972 branch from 01895d4 to 2b1a6a6 Compare March 25, 2021 19:04

lidavidm requested a review from bkietz March 26, 2021 19:38

lidavidm added 9 commits April 7, 2021 08:59

ARROW-11972: [C++][Dataset] Add IpcFragmentScanOptions

165298a

ARROW-11972: [C++][Dataset] Add ParquetFragmentScanOptions

ee45743

ARROW-11972: [Python][Dataset] Add ParquetFragmentScanOptions

921029d

ARROW-11972: [R][Dataset] Add ParquetFragmentScanOptions

4aabd3b

Try to fix Windows tests by using less memory

74c0192

Try to fix Windows tests

3cc2a3b

Directly embed Parquet properties

ed2c4dc

Remove unnecessary test

987aec1

Add missing include

ebc7c60

lidavidm force-pushed the arrow-11972 branch from fbdf30f to ebc7c60 Compare April 7, 2021 13:00

bkietz approved these changes Apr 12, 2021

bkietz closed this in 66e1d2b Apr 12, 2021

asfimport mentioned this pull request Apr 12, 2021

[C++][Dataset] Extract IpcFragmentScanOptions, ParquetFragmentScanOptions #27804

Closed
