[SPARK-31471][TESTS][BUILD] Add a python script to run multiple benchmarks #28244
Conversation
@HyukjinKwon @dongjoon-hyun I think this script is useful for anyone who cares about Spark performance.
    from sparktestsupport.shellutils import run_cmd

    benchmarks = [
        ['sql/test', 'org.apache.spark.sql.execution.benchmark.AggregateBenchmark'],
Max, can we do testOnly, or list the files in specific directories, so that we don't need to add an entry here every time a new benchmark is added, if this targets listing all benchmarks?
It does runMain of a specific class. That is the recommended way of running benchmarks; look at the comments in all the benchmarks.
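To make the mechanism concrete, here is a minimal sketch of how such a loop could tie the quoted pieces together, assuming each [sbt project, benchmark class] pair is turned into an sbt runMain invocation (the loop body and the exact sbt command format are assumptions, not shown in the quoted diff):

    # Minimal sketch, not the exact script from this PR.
    # Assumes the dev/ directory is on PYTHONPATH so sparktestsupport is importable.
    from sparktestsupport.shellutils import run_cmd

    benchmarks = [
        ['sql/test', 'org.apache.spark.sql.execution.benchmark.AggregateBenchmark'],
    ]

    for project, klass in benchmarks:
        # Roughly equivalent to: build/sbt "<project>:runMain <class>",
        # the invocation recommended in the benchmark file headers.
        run_cmd(['build/sbt', '%s:runMain %s' % (project, klass)])

Because the benchmarks run one after another, a single invocation of the script can be left unattended, which is the overnight use case described below.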
list the files in specific directories, so that we don't need to add an entry here every time a new benchmark is added, if this targets listing all benchmarks?
I don't like this automation because I usually run a subset of benchmarks, and for me it is easier to just comment out the unneeded benchmarks in the script.
My use case is running some of the benchmarks overnight, let's say the date-time related ones plus parquet/orc. So I don't need to run all of them, because that can take a day to complete. I included the other benchmarks (their project and class names) because I had already tested them in PR #27078.
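In practice, that means editing the benchmarks list in place before an overnight run. A hypothetical subset (the class names other than AggregateBenchmark and the commented-out entry are illustrative, not copied from this PR):

    # Hypothetical subset: keep only the date-time and Parquet/ORC-related entries,
    # and comment out everything else before the overnight run.
    benchmarks = [
        ['sql/test', 'org.apache.spark.sql.execution.benchmark.DateTimeBenchmark'],
        ['sql/test', 'org.apache.spark.sql.execution.benchmark.DataSourceReadBenchmark'],
        # ['sql/test', 'org.apache.spark.sql.execution.benchmark.AggregateBenchmark'],
    ]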
Test build #121411 has finished for PR 28244 at commit
I also made and used a private one. So, an official script to run all benchmarks might be helpful for release managers and anyone who wants one. However, if this PR doesn't aim to cover all benchmarks, I'm -1 on maintaining this partial one.
For personal use cases, I'd recommend maintaining your custom one privately.
I don't understand your position. Let me ask: how often do you run all benchmarks? My use case, and I believe this is the most common one, is to run a few benchmarks overnight without tracking each of them. That is the main purpose of the script. Each time, I have to find my PR with the script and copy-paste it to a new EC2 instance.
But why? Why don't you want to make other developers' lives easier?
Maintaining? I don't think the cost of maintaining a few lines is that high.
I'm not sure why you don't automate your environment. I've been using terraform/ansible for a long time to create the same EC2 benchmark machine, clone Spark, check out the PR (or a dev branch), and upload my benchmark script to the machine. It's a pretty standard way of automating this.
cc @gatorsmile.
Thank you @dongjoon-hyun @HyukjinKwon for the review. I am closing this.
What changes were proposed in this pull request?
Added the script from PR #27078, which allows running multiple benchmarks one after another.
The script should be executed from Spark's home dir.
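The exact command line is not reproduced above. As an illustration of why the working directory matters (this guard is an assumption about how such a script could behave, not something taken from the PR), the sbt launcher has to be resolved relative to the current directory:

    # Hypothetical guard, not taken from the PR: the script shells out to build/sbt,
    # which only resolves correctly when started from the Spark home directory.
    import os
    import sys

    if not os.path.isfile(os.path.join('build', 'sbt')):
        sys.exit('Run this script from the Spark home directory (build/sbt not found).')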
Why are the changes needed?
Currently, I have to track when one benchmark finishes in order to launch the next one. This is inconvenient, especially when I need to run many benchmarks (3-5 or more), because I have to periodically check whether the current benchmark has already completed.
Does this PR introduce any user-facing change?
No
How was this patch tested?
By running the script manually