Skip to content

Conversation

@lidavidm
Copy link
Member

@lidavidm lidavidm commented Mar 12, 2021

  • ScanContext/ScanOptions have been merged, since they were essentially always passed together.
  • For scan options that are specific to a scan (e.g. CSV conversion options), a new FragmentScanOptions has been added to ScanOptions. Currently only CSV has this and it only wraps csv::ConvertOptions; follow up issues can tackle the rest.
  • GLib, Python, R, and Ruby bindings have been updated.

@github-actions
Copy link

@lidavidm lidavidm changed the title ARROW-9749: [C++][GLib][Python][R][Ruby][Dataset] WIP: Introduce FragmentScanOptions, consolidate ScanContext/ScanOptions ARROW-9749: [C++][GLib][Python][R][Ruby][Dataset] Introduce FragmentScanOptions, consolidate ScanContext/ScanOptions Mar 12, 2021
@lidavidm
Copy link
Member Author

As it stands implementing fragment scan options for a union dataset is a bit tricky, because the fragment doesn't know which dataset it originally belonged to (and it would be weird for every scan task implementation to have to know about a hypothetical UnionFragmentScanOptions). We could add a FragmentScanOptions field to Fragment, allowing UnionDataset::GetFragmentsImpl to do the work of matching up options to fragments there. (That'd also let an advanced caller manually specify per-fragment options if they really wanted to.)

However given we might just delete UnionDataset I'm going to leave this out of this PR for now.

@lidavidm lidavidm marked this pull request as ready for review March 15, 2021 13:10
@lidavidm lidavidm requested a review from bkietz March 15, 2021 13:10
@lidavidm
Copy link
Member Author

@mrkn would you be willing to look at the GLib/Datasets bindings changes here again? This reduces the API surface that has to be bound by a bit, fortunately.

@mrkn
Copy link
Member

mrkn commented Mar 15, 2021

@lidavidm I will look at it tomorrow.

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this cleanup, just a couple comments:

@bkietz
Copy link
Member

bkietz commented Mar 15, 2021

Could you add a follow up JIRAs for extracting ParquetFragmentScanOptions and IpcFragmentScanOptions?

@lidavidm
Copy link
Member Author

Could you add a follow up JIRAs for extracting ParquetFragmentScanOptions and IpcFragmentScanOptions?

Done, ARROW-11972.

@mrkn
Copy link
Member

mrkn commented Mar 16, 2021

+1 for GLib part.

@mrkn
Copy link
Member

mrkn commented Mar 16, 2021

I created a new issue to support CsvFragmentScanOption in GLib binding.

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants