
[SPARK-31471][TESTS][BUILD] Add a python script to run multiple benchmarks #28244


Closed
wants to merge 1 commit

Conversation

MaxGekk
Member

@MaxGekk MaxGekk commented Apr 17, 2020

What changes were proposed in this pull request?

Added the script from PR #27078, which allows running multiple benchmarks.

The script should be executed from Spark's home directory:

$ ./dev/run-benchmarks.py

Why are the changes needed?

Currently, I have to track when one benchmark finishes in order to launch the next one. This is inconvenient, especially when I need to run several benchmarks (3-5 or more), since I have to periodically check whether the current benchmark has completed.

Does this PR introduce any user-facing change?

No

How was this patch tested?

By running the script manually

@MaxGekk
Member Author

MaxGekk commented Apr 17, 2020

@HyukjinKwon @dongjoon-hyun I think this script is useful for anyone who cares about Spark performance.

@MaxGekk MaxGekk changed the title [SPARK-31471][TESTS] Add a python script to run multiple benchmarks [SPARK-31471][TESTS][BUILD] Add a python script to run multiple benchmarks Apr 17, 2020
from sparktestsupport.shellutils import run_cmd

benchmarks = [
['sql/test', 'org.apache.spark.sql.execution.benchmark.AggregateBenchmark'],
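The diff excerpt above is truncated. As a minimal, self-contained sketch of what such a sequential runner could look like (the `build_command` helper, the `dry_run` flag, and the use of the standard `subprocess` module instead of `sparktestsupport.shellutils.run_cmd` are illustrative assumptions, not the PR's exact code):

```python
#!/usr/bin/env python3
# Hypothetical sketch of a sequential benchmark runner. The single entry
# below mirrors the excerpt above; build_command and dry_run are
# illustrative assumptions, not code from this PR.
import subprocess

benchmarks = [
    # (sbt project, fully qualified benchmark class)
    ('sql/test', 'org.apache.spark.sql.execution.benchmark.AggregateBenchmark'),
]

def build_command(project, benchmark_class):
    # Benchmarks are launched via sbt's runMain, as the review thread notes.
    return ['build/sbt', f'{project}:runMain {benchmark_class}']

def run_all(dry_run=False):
    commands = [build_command(p, c) for p, c in benchmarks]
    if not dry_run:
        for cmd in commands:
            # Run the benchmarks one after another, failing fast on error.
            subprocess.run(cmd, check=True)
    return commands
```

Because the benchmarks run strictly one after another, a script of this shape can be started once and left unattended, which is the workflow described in the PR description.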
Member


Max, if this targets listing all benchmarks, can we use testOnly or list the files in specific directories, so that we don't need to add an entry here every time a new benchmark is added?
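One way to realize this suggestion, sketched here purely as an illustration (the `path_to_class` and `discover_benchmarks` helpers and the `src/test/scala` layout convention are assumptions, not code from this PR), is to derive each fully qualified class name from the `*Benchmark.scala` files under a module's test tree:

```python
# Hypothetical sketch: discover benchmark classes from the filesystem
# instead of listing them by hand. The helpers and directory convention
# are illustrative assumptions.
import glob
import os

def path_to_class(path, src_root='src/test/scala'):
    # e.g. 'sql/core/src/test/scala/org/apache/spark/.../FooBenchmark.scala'
    #   -> 'org.apache.spark....FooBenchmark'
    marker = src_root + os.sep
    relative = path.split(marker, 1)[1]   # strip everything up to the source root
    no_ext = relative[:-len('.scala')]    # drop the file extension
    return no_ext.replace(os.sep, '.')    # directories become package segments

def discover_benchmarks(module_dir):
    # Find every *Benchmark.scala under the module's test sources.
    pattern = os.path.join(module_dir, 'src/test/scala', '**', '*Benchmark.scala')
    return sorted(path_to_class(p) for p in glob.glob(pattern, recursive=True))
```

This trades the explicit, hand-edited list for a filesystem convention.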

Member Author


It does runMain of a specific class. That is the recommended way of running benchmarks; see the comments in the benchmark classes themselves.

Member Author


listing the files in specific directories so that we don't need to add an entry here every time a new benchmark is added, if this targets listing all benchmarks?

I don't like this automation because I usually run a subset of the benchmarks, and for me it is easier to just comment out the unneeded ones in the script.

Member Author


My use case is running some of the benchmarks overnight, say the date-time related ones plus parquet/orc. So I don't need to run all of them, since that can take a day to complete. I included the other benchmarks (their project and class names) because I had already tested them in PR #27078.

@SparkQA

SparkQA commented Apr 17, 2020

Test build #121411 has finished for PR 28244 at commit 2afdc11.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment


I also made and used a private one. So an official script to run all benchmarks might be helpful for release managers and anyone who wants one. However, if this PR doesn't aim to cover all benchmarks, I'm -1 for maintaining this partial one.

For personal use cases, I'd like to recommend maintaining your custom one privately.

@MaxGekk
Member Author

MaxGekk commented Apr 17, 2020

However, if this PR doesn't aim to cover all benchmarks, I'm -1 for maintaining this partial one.

I don't understand your position. Let me ask you: how often do you run all benchmarks? My use case, and I believe the most common one, is to run a few benchmarks overnight without tracking each of them. That is the main purpose of the script. Each time, I have to find my PR with the script and copy-paste it to a new EC2 instance.

For personal use cases, I'd like to recommend to maintain your custom one privately.

But why? Why don't you want to make the lives of other developers easier?

for maintaining this partial one.

Maintaining? I don't think the cost of maintaining a few lines is that high.

@dongjoon-hyun
Member

dongjoon-hyun commented Apr 17, 2020

I'm not sure why you don't automate your environment. I have been using Terraform/Ansible for a long time to create the same EC2 benchmark machine, clone Spark, check out the PR (or a dev branch), and upload my benchmark script to the machine. It's a pretty standard automation approach.

@dongjoon-hyun
Member

cc @gatorsmile.

@MaxGekk
Member Author

MaxGekk commented Apr 21, 2020

Thank you @dongjoon-hyun @HyukjinKwon for review. I am closing this.

@MaxGekk MaxGekk closed this Apr 21, 2020
@MaxGekk MaxGekk deleted the dev-run-benchmarks branch June 5, 2020 19:47