[SPARK-11225]Prevent generate empty file #9191

Closed
wants to merge 2 commits into from
Conversation

viper-kun
Contributor

If no data is written into a bucket, an empty file is still generated. To avoid this, open() should be called lazily, in the first write(key, value).
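
The change described above amounts to deferring `open()` until the first write. Here is a minimal sketch of that idea in Python. It is not Spark's actual writer code: the class `LazyBucketWriter`, the helper `write_buckets`, and the `bucket-N` file names are hypothetical and only illustrate the pattern.

```python
import os
import tempfile


class LazyBucketWriter:
    """Hypothetical writer that creates its file only on the first write."""

    def __init__(self, path):
        self.path = path
        self._file = None  # nothing is opened (or created) yet

    def _open(self):
        # Opening in binary write mode is what actually creates the file on disk.
        self._file = open(self.path, "wb")

    def write(self, key, value):
        if self._file is None:
            self._open()  # deferred until there is real data
        self._file.write(f"{key}\t{value}\n".encode("utf-8"))

    def close(self):
        # Closing a writer that never wrote is a no-op, so no empty file appears.
        if self._file is not None:
            self._file.close()
            self._file = None


def write_buckets(out_dir, records, num_buckets):
    """Route (int key, value) records to buckets; return the file names that exist afterwards."""
    writers = [
        LazyBucketWriter(os.path.join(out_dir, f"bucket-{i}"))
        for i in range(num_buckets)
    ]
    for key, value in records:
        writers[key % num_buckets].write(key, value)
    for writer in writers:
        writer.close()
    return sorted(os.listdir(out_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        # Only buckets 0 and 1 receive data, so only those two files are created.
        print(write_buckets(out_dir, [(0, "a"), (4, "b"), (1, "c")], 4))
```

With four buckets and records that land only in buckets 0 and 1, two files are created instead of four. The trade-off is that anything reading the output can no longer assume one file per bucket.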

@JoshRosen
Contributor

Jenkins, this is ok to test.

@JoshRosen
Contributor

Hey @viper-kun,

I played around with this optimization myself at one point. If I remember correctly, I think that you might have to update other parts of the code to account for the fact that empty partitions' files will now be missing instead of empty.

Do you have performance benchmarking results that motivated this change? Just curious to know how much of a speedup / benefit this gives.
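
The concern about missing files can be made concrete with a small reader-side sketch. It assumes a hypothetical layout of one `bucket-N` file per partition; the functions `read_bucket` and `bucket_lengths` are illustrative names, not Spark APIs. The point is only that any consumer must treat a missing file the same way it used to treat an empty one.

```python
import os
import tempfile


def read_bucket(path):
    """Return the records in a bucket file, treating a missing file as an empty bucket."""
    if not os.path.exists(path):
        return []  # the writer never opened this bucket, so there is nothing to read
    with open(path, "rb") as f:
        return [line.decode("utf-8").rstrip("\n") for line in f]


def bucket_lengths(out_dir, num_buckets):
    """Byte length per bucket; a missing file counts as length 0."""
    lengths = []
    for i in range(num_buckets):
        path = os.path.join(out_dir, f"bucket-{i}")
        lengths.append(os.path.getsize(path) if os.path.exists(path) else 0)
    return lengths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        # Only bucket 1 has data; buckets 0 and 2 were never created.
        with open(os.path.join(out_dir, "bucket-1"), "wb") as f:
            f.write(b"k\tv\n")
        print(read_bucket(os.path.join(out_dir, "bucket-0")))
        print(bucket_lengths(out_dir, 3))
```

Code that calls `open()` or `getsize()` on every bucket path without such a check would start failing once empty files are no longer written.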

@SparkQA

SparkQA commented Oct 21, 2015

Test build #44033 has finished for PR 9191 at commit 9887ef1.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viper-kun
Contributor Author

Thanks @JoshRosen

Sorry, I have not run a performance test. As far as I know, it reduces the number of open files. When there are many empty files, it should give some benefit.

@SparkQA

SparkQA commented Oct 21, 2015

Test build #44053 has finished for PR 9191 at commit 2dd4d1e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      * class BinaryClassificationEvaluator @Since("1.4.0") (@Since("1.4.0") override val uid: String)
      * class MulticlassClassificationEvaluator @Since("1.5.0") (@Since("1.5.0") override val uid: String)
      * final class RegressionEvaluator @Since("1.4.0") (@Since("1.4.0") override val uid: String)
      * abstract class ColumnarIterator extends Iterator[InternalRow]
      * class SpecificColumnarIterator extends $

@viper-kun
Contributor Author

@JoshRosen
In my test environment, this error does not occur. Please retest it. Thanks!

@JoshRosen
Contributor

Jenkins, retest this please.

@SparkQA

SparkQA commented Oct 22, 2015

Test build #44118 has finished for PR 9191 at commit 2dd4d1e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viper-kun
Contributor Author

Hi @davies,
Sorry, I do not understand Python. Can you help me fix it?

@davies
Contributor

davies commented Oct 22, 2015

This test is flaky, just re-test it.

@SparkQA

SparkQA commented Oct 22, 2015

Test build #1941 has finished for PR 9191 at commit 2dd4d1e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viper-kun
Contributor Author

@davies This test is flaky, please re-test it.

@SparkQA

SparkQA commented Oct 24, 2015

Test build #1949 has finished for PR 9191 at commit 2dd4d1e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viper-kun
Contributor Author

@JoshRosen

Is it OK? If it doesn't work, I will close this PR.

@srowen
Member

srowen commented Nov 5, 2015

Note this duplicates #5622

@viper-kun
Contributor Author

ok. close it

@viper-kun viper-kun closed this Nov 5, 2015
@srowen
Member

srowen commented Nov 5, 2015

OK, Josh also closed the other one. One JIRA is open now. At least have <= 1 PR.

@viper-kun viper-kun deleted the apache branch January 18, 2017 09:20