[SPARK-16430][SQL][STREAMING] Fixed bug in the maxFilesPerTrigger in FileStreamSource #14143


Closed
wants to merge 1 commit into from


tdas
Contributor

@tdas tdas commented Jul 11, 2016

What changes were proposed in this pull request?

An incorrect list of files was being allocated to each batch. This caused a file to be read multiple times across multiple batches.
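
To illustrate the invariant this fix is meant to restore (each file lands in exactly one batch, and no batch exceeds the `maxFilesPerTrigger` cap), here is a minimal sketch in plain Python. It is not the code from this patch and does not use Spark: `FileBatcher`, `next_batch`, and the `_seen` set are hypothetical names made up for this example, and the real `FileStreamSource` logic may track files differently.

```python
from typing import Iterable, List, Optional, Set


class FileBatcher:
    """Hypothetical stand-in for a file-based streaming source (not Spark's FileStreamSource)."""

    def __init__(self, max_files_per_trigger: Optional[int] = None) -> None:
        # None means "no cap": every new file goes into the next batch.
        self.max_files_per_trigger = max_files_per_trigger
        self._seen: Set[str] = set()

    def next_batch(self, listing: Iterable[str]) -> List[str]:
        # Drop duplicates in the listing while preserving order, then keep
        # only files that have not been assigned to an earlier batch.
        new_files = [f for f in dict.fromkeys(listing) if f not in self._seen]

        # Apply the per-trigger cap, if one is configured.
        if self.max_files_per_trigger is not None:
            batch = new_files[: self.max_files_per_trigger]
        else:
            batch = new_files

        # Record exactly the files placed in this batch. Recording more would
        # skip files; recording fewer would let a file be read again later.
        self._seen.update(batch)
        return batch


if __name__ == "__main__":
    batcher = FileBatcher(max_files_per_trigger=2)
    files = ["a", "b", "c", "d", "e"]
    for _ in range(4):
        print(batcher.next_batch(files))
```

The key design point in this sketch is that the set of "seen" files and the list of files handed to the batch are updated from the same value, so the two cannot drift apart.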

How was this patch tested?

Added unit tests

@SparkQA

SparkQA commented Jul 12, 2016

Test build #62126 has finished for PR 14143 at commit a810dd4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Contributor Author

tdas commented Jul 12, 2016

@zsxwing
I am merging this critical bug fix to master and 2.0. Feel free to leave reviews, and I will address them in follow-up PRs.

@asfgit asfgit closed this in e50efd5 Jul 12, 2016
asfgit pushed a commit that referenced this pull request Jul 12, 2016
…FileStreamSource

## What changes were proposed in this pull request?

An incorrect list of files was being allocated to each batch. This caused a file to be read multiple times across multiple batches.

## How was this patch tested?

Added unit tests

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #14143 from tdas/SPARK-16430-1.

(cherry picked from commit e50efd5)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
@zsxwing
Member

zsxwing commented Jul 12, 2016

LGTM!
