Spark3 structured streaming micro_batch read support #1

Merged: 28 commits, Jun 2, 2021

SreeramGarlapati
Owner

This work extends the idea in issue apache#179 and the Spark2 work done in PR apache#2272; the difference is that this PR targets Spark3.

In the current implementation:

  • An Iceberg Snapshot is the upper bound for a MicroBatch. A given MicroBatch spans at most one Snapshot; it is never composed of files from multiple Snapshots. BatchSize limits the number of files within a given Snapshot that go into a MicroBatch.
  • The streaming reader errors out if it encounters any Snapshot whose type is not APPEND.
  • Handling DELETE, REPLACE and OVERWRITE Snapshots is left for future work.
  • Columnar reads are not enabled; this is also left for future work.
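
To make the batching rules above concrete, here is a minimal Python sketch of the planning logic as described in this PR. It is not the code in this PR and does not use the Iceberg API: `Snapshot`, `plan_micro_batches`, and the `batch_size` parameter are hypothetical names chosen for illustration, and the operation strings are assumed to be lowercase values such as `"append"`.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Snapshot:
    """Hypothetical stand-in for a table snapshot (not the Iceberg class)."""
    snapshot_id: int
    operation: str      # assumed values: "append", "delete", "replace", "overwrite"
    files: List[str]    # data files added by this snapshot


def plan_micro_batches(
    snapshots: List[Snapshot], batch_size: int
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (snapshot_id, files) pairs, one per micro-batch.

    A micro-batch never crosses a snapshot boundary, and holds at most
    batch_size files. Any non-append snapshot aborts planning.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for snap in snapshots:
        if snap.operation != "append":
            # Mirrors the described behaviour: only APPEND snapshots are readable.
            raise ValueError(
                f"snapshot {snap.snapshot_id} has unsupported operation "
                f"{snap.operation!r}"
            )
        # Slice this snapshot's files into chunks of at most batch_size.
        for start in range(0, len(snap.files), batch_size):
            yield snap.snapshot_id, snap.files[start:start + batch_size]
```

With two append snapshots holding three files and one file respectively, and `batch_size=2`, this sketch produces three micro-batches: two for the first snapshot and one for the second. The leftover file of the first snapshot is not merged with the second snapshot's file, which is the "Snapshot is the upper bound" rule.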

cc: @aokolnychyi & @RussellSpitzer & @holdenk @rdblue @rdsr

@github-actions github-actions bot added the SPARK label Jun 2, 2021
@SreeramGarlapati SreeramGarlapati merged commit 41041f3 into spark3.stream.read.baseline Jun 2, 2021
@SreeramGarlapati SreeramGarlapati deleted the spark3.stream.read.1 branch June 2, 2021 06:30