Add scan concurrency config for datafusion source #5945

Merged
AdamGS merged 6 commits into develop from adamg/add-scan-concurrency
Feb 17, 2026

Conversation

@AdamGS
Contributor

@AdamGS AdamGS commented Jan 13, 2026

This PR adds a new configuration to the FileSource based API, allowing users to control the intra-partition scan concurrency.
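As a rough illustration of what an optional concurrency setting like this can look like to a caller, here is a minimal sketch. `ExampleScanOptions` and `effective_scan_concurrency` are hypothetical names, not part of the Vortex API; only the `scan_concurrency: Option<usize>` field with a `None` default comes from the diff. The fallback policy (use a caller-supplied default, never go below 1) is an assumption.

```rust
/// Hypothetical stand-in for the options struct touched by this PR.
/// Only the `scan_concurrency` field and its `None` default mirror the diff.
#[derive(Debug, Clone, Default)]
pub struct ExampleScanOptions {
    /// Intra-partition (per-file) scan concurrency. `None` defers to a fallback.
    pub scan_concurrency: Option<usize>,
}

impl ExampleScanOptions {
    /// Resolve the setting to a usable value (assumed policy, not Vortex's):
    /// take the configured value if present, otherwise `fallback`,
    /// and never return less than 1.
    pub fn effective_scan_concurrency(&self, fallback: usize) -> usize {
        self.scan_concurrency.unwrap_or(fallback).max(1)
    }
}
```

With this sketch, leaving the option unset keeps whatever default the scan layer picks, while `Some(n)` pins the per-file concurrency to `n`.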

@AdamGS AdamGS requested review from a10y and gatesn January 13, 2026 19:04
@AdamGS AdamGS added the changelog/feature (A new feature) label Jan 13, 2026
/// during footer parsing.
pub footer_initial_read_size_bytes: usize, default = DEFAULT_FOOTER_INITIAL_READ_SIZE_BYTES
/// The per-file Vortex scan concurrency.
pub scan_concurrency: Option<usize>, default = None
Contributor

Doesn't DF let you set this for an entire SessionContext? Do we want to override this on a per-source basis?

Contributor Author

I think that's the copy of VortexOptions that is part of VortexFormat, which propagates it downstream from there through either VortexFormatFactory::create or VortexFormat::file_source
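The propagation described here can be pictured with a small sketch. All types below (`ExampleOptions`, `ExampleFormat`, `ExampleSource`) are hypothetical stand-ins and do not reflect the real `VortexFormat` or `VortexFormatFactory` signatures; the sketch only shows the general pattern of a format owning a copy of the options and handing a clone to each source it creates.

```rust
/// Hypothetical options struct; only `scan_concurrency` mirrors the diff.
#[derive(Debug, Clone, Default)]
pub struct ExampleOptions {
    pub scan_concurrency: Option<usize>,
}

/// Hypothetical format object that owns a copy of the options.
pub struct ExampleFormat {
    opts: ExampleOptions,
}

/// Hypothetical per-scan source that receives its options from the format.
pub struct ExampleSource {
    opts: ExampleOptions,
}

impl ExampleFormat {
    pub fn new(opts: ExampleOptions) -> Self {
        Self { opts }
    }

    /// Build a source that carries a clone of the format's options downstream.
    pub fn file_source(&self) -> ExampleSource {
        ExampleSource {
            opts: self.opts.clone(),
        }
    }
}

impl ExampleSource {
    /// The concurrency setting as seen by the source.
    pub fn scan_concurrency(&self) -> Option<usize> {
        self.opts.scan_concurrency
    }
}
```

Under this pattern the setting is attached to the format, so every source built from that format sees the same value unless the format is rebuilt with different options.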

Contributor

I was wondering if it makes sense to use the target_partitions config off the environment, but I realize that's different.

Maybe we can make it clear in the doc comment that this is the intra-partition concurrency

Contributor Author

intra-partition was the term I was looking for!
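To make the distinction in this thread concrete: `target_partitions` governs how many partitions DataFusion runs in parallel, while the new setting bounds the concurrent work inside a single partition. The sketch below is a simplified model using only the standard library. `scan_partition` and `max_in_flight` are hypothetical helpers, and the wave-by-wave batching is an assumption chosen for brevity, not how Vortex schedules its scans.

```rust
use std::thread;

/// Scan one partition's splits with at most `scan_concurrency` running at once.
/// `splits` holds hypothetical row counts; "scanning" a split just returns its count.
pub fn scan_partition(splits: &[u64], scan_concurrency: usize) -> u64 {
    // Treat 0 as 1 so the partition always makes progress.
    let width = scan_concurrency.max(1);
    let mut total = 0;
    // Each chunk is one wave of concurrent work; waves run back to back.
    for wave in splits.chunks(width) {
        total += thread::scope(|s| {
            // Spawn every split in the wave before joining any of them.
            let handles: Vec<_> = wave
                .iter()
                .map(|&rows| s.spawn(move || rows))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
        });
    }
    total
}

/// Upper bound on split scans running at the same time across the whole query
/// in this model: one bounded scan per partition, all partitions in parallel.
pub fn max_in_flight(target_partitions: usize, scan_concurrency: usize) -> usize {
    target_partitions * scan_concurrency.max(1)
}
```

In this model the result of a scan does not depend on the concurrency setting; only the amount of simultaneous work changes, which is why the two knobs multiply rather than replace each other.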

@codecov

codecov bot commented Jan 13, 2026

Codecov Report

❌ Patch coverage is 60.71429% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.39%. Comparing base (d757be1) to head (f1a5852).
⚠️ Report is 390 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| vortex-datafusion/src/persistent/source.rs | 30.00% | 7 Missing ⚠️ |
| vortex-datafusion/src/persistent/format.rs | 62.50% | 3 Missing ⚠️ |
| vortex-datafusion/src/persistent/opener.rs | 90.00% | 1 Missing ⚠️ |

@AdamGS AdamGS force-pushed the adamg/add-scan-concurrency branch from 25c697a to f1a5852 Compare January 14, 2026 12:18
@github-actions
Contributor

This PR has been marked as stale because it has been open for 30 days with no activity. Please comment or remove the stale label if you wish to keep it active, otherwise it will be closed in 7 days

@github-actions github-actions bot added and removed the stale (This PR is stale and will be auto-closed soon) label Feb 14, 2026
@a10y
Contributor

a10y commented Feb 16, 2026

I really forgot about this. Think we still want this @AdamGS ?

@AdamGS
Contributor Author

AdamGS commented Feb 17, 2026

yeah I think so, I'll try and unconflict it later today 😢

Contributor

@a10y a10y left a comment

pre-approving

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
@AdamGS AdamGS force-pushed the adamg/add-scan-concurrency branch from f1a5852 to 3ff16ab Compare February 17, 2026 16:14
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
@AdamGS AdamGS added the ext/datafusion (Relates to the DataFusion integration) label Feb 17, 2026
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
@AdamGS AdamGS enabled auto-merge (squash) February 17, 2026 16:22
@AdamGS AdamGS disabled auto-merge February 17, 2026 16:22
@AdamGS AdamGS enabled auto-merge (squash) February 17, 2026 16:22
@AdamGS AdamGS merged commit 9ea526d into develop Feb 17, 2026
46 checks passed
@AdamGS AdamGS deleted the adamg/add-scan-concurrency branch February 17, 2026 16:30

Labels

changelog/feature (A new feature)
ext/datafusion (Relates to the DataFusion integration)

2 participants