[SPARK-11225] Prevent generating empty files #9191
Jenkins, this is ok to test.
Hey @viper-kun, I played around with this optimization myself at one point. If I remember correctly, you might have to update other parts of the code to account for the fact that empty partitions' files will now be missing instead of empty. Do you have performance benchmarking results that motivated this change? Just curious how much of a speedup or benefit this gives.
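A minimal sketch of the kind of change the review comment points at, assuming a reader that loads one partition file per bucket: once empty partitions no longer produce a file, the reader has to treat a missing file the same as an empty one. `BucketReader` and `readLines` are hypothetical names for illustration, not Spark APIs.

```scala
import java.io.File
import scala.io.Source

// Hypothetical reader (not Spark's actual code): a partition whose file was
// never created, because no record was written to it, reads as empty.
object BucketReader {
  def readLines(path: File): List[String] =
    if (!path.exists()) Nil // missing file is treated as an empty partition
    else {
      val src = Source.fromFile(path)
      try src.getLines().toList
      finally src.close()
    }
}
```

Returning `Nil` keeps callers unaware of whether the partition was empty or simply never written.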
Test build #44033 has finished for PR 9191 at commit
Thanks @JoshRosen. Sorry, I haven't done a performance test. As far as I know, it reduces the number of open files, so when there are many empty files it should give some benefit.
Test build #44053 has finished for PR 9191 at commit
@JoshRosen
Jenkins, retest this please.
Test build #44118 has finished for PR 9191 at commit
Hi @davies,
This test is flaky; please just re-test it.
Test build #1941 has finished for PR 9191 at commit
@davies This test is flaky; please re-test it.
Test build #1949 has finished for PR 9191 at commit
Is it OK? If it doesn't work, I will close this PR.
Note that this duplicates #5622.
OK, closing it.
OK, Josh also closed the other one. The JIRA is still open, but at least there is now at most one PR for it.
If no data is written into a bucket, an empty file is still generated. So open() must be deferred until the first write(key, value).
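A minimal sketch of the lazy-open idea described above, assuming a simple text writer per bucket: the file is opened inside the first `write(key, value)` rather than up front, so a bucket that never receives a record leaves no file behind. `LazyBucketWriter` and its tab-separated output format are hypothetical illustrations, not the PR's actual code or a Spark API.

```scala
import java.io.{BufferedWriter, File, FileWriter}

// Hypothetical bucket writer (not Spark's actual API): the output file is
// opened lazily on the first write(key, value).
class LazyBucketWriter(path: File) {
  // Stays None until the first record arrives.
  private var out: Option[BufferedWriter] = None

  // Opens the underlying file the first time it is needed, then reuses it.
  private def open(): BufferedWriter = out match {
    case Some(w) => w
    case None =>
      val w = new BufferedWriter(new FileWriter(path))
      out = Some(w)
      w
  }

  // Writes one record as "key<TAB>value\n", opening the file if necessary.
  def write(key: String, value: String): Unit = {
    val w = open()
    w.write(s"$key\t$value\n")
  }

  // Closing a writer that never opened a file is a no-op.
  def close(): Unit = {
    out.foreach(_.close())
    out = None
  }

  // Whether the output file exists on disk.
  def fileCreated: Boolean = path.exists()
}
```

The trade-off is the one raised in review: any code that later reads these buckets must accept a missing file as an empty bucket.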