Reduce CI Workload by Removing Some Spark Variants and Using Callable Workflows for Github Actions #5153

kbendick · 2022-06-28T21:46:10Z

We now have 30 GitHub Actions that run as part of the CI test suite, and it's starting to have a noticeable impact on CI runners.

We test Spark with a large number of combinations of Java versions and Scala versions.

We previously only tested the "latest" Spark version (i.e. Spark 3.2) with Scala 2.13.

We are now testing:

Spark 2 with Java 8 (1 workflow)
Spark 3.0, 3.1, 3.2, 3.3 with Java 8 and Scala 2.12 (4 workflows)
Spark 3.0, 3.1, 3.2, 3.3 with Java 11 and Scala 2.12 (4 workflows)
Spark 3.2, 3.3 with Java 8 and Scala 2.13 (2 workflows)
Spark 3.2, 3.3 with Java 11 and Scala 2.13 (2 workflows)

That brings the total to 13 Spark-specific CI variants that run on every PR that touches core or spark.
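
As a quick check of the count above, the listed combinations can be enumerated in a short Python sketch. The tuple layout and variable names are illustrative only and do not correspond to anything in the Iceberg build.

```python
from itertools import product

# Combinations listed above, as (Spark versions, Java versions, Scala versions).
# Spark 2 is listed without a Scala version, so None is used as a placeholder.
groups = [
    (["2"], [8], [None]),
    (["3.0", "3.1", "3.2", "3.3"], [8, 11], ["2.12"]),
    (["3.2", "3.3"], [8, 11], ["2.13"]),
]

# Expand each group into individual (spark, java, scala) variants.
variants = [
    combo
    for spark, java, scala in groups
    for combo in product(spark, java, scala)
]

print(len(variants))  # 13
```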

We should consider reducing the large number of combinations of JRE versions with Scala versions that are run for the various Spark versions, as CI is starting to take a good while longer.

We should also look (again) into refactoring our CI test suites to use callable workflows, such that all tests stem from one root test (very much like an Airflow DAG), so that if any one test fails, they all stop. We get this for free at present for any set of CI suites generated out of one matrix (such as Java 11 and Java 8 with Scala 2.12).

This will reduce the number of CI slots that are running for tests that will have to be run again (as something else failed).

We can also set up the faster tests first, to ensure they pass, before then calling out to the more expensive tests (such as Spark / Flink etc).
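
The effect of chaining suites behind a root job can be illustrated with a toy model. This is only a sketch: the stage names, the suite names, and the `run_staged` helper are hypothetical, and in GitHub Actions the ordering itself would be expressed with reusable workflows (`on: workflow_call`) and `needs:` dependencies between jobs rather than in Python.

```python
def run_staged(stages, failing):
    """Run stages in order and stop after the first stage containing a failure.

    stages: list of (stage_name, [suite names]), cheapest stage first.
    failing: set of suite names assumed to fail (hypothetical input).
    Returns the suites that actually consumed a runner slot.
    """
    started = []
    for _name, suites in stages:
        # All suites in a stage are counted as started.
        started.extend(suites)
        if any(suite in failing for suite in suites):
            # Later, more expensive stages are never scheduled.
            break
    return started


stages = [
    ("fast", ["core", "api"]),
    ("expensive", ["spark-3.2", "spark-3.3", "flink"]),
]

# A failure in a fast suite means none of the expensive suites start.
print(run_staged(stages, failing={"core"}))     # ['core', 'api']
# With no failures, every suite runs.
print(len(run_staged(stages, failing=set())))   # 5
```

In this model a failure in the fast stage costs two runner slots instead of five, which is the saving described above.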

I tried before with the callable workflow, but at the time it wasn't worth the effort. I think now it probably is.


kbendick · 2022-06-28T21:50:00Z

There are a few small changes I can make so that more Spark workflows terminate early if one fails, but that won't resolve the problem entirely by any means.

singhpk234 · 2022-07-09T09:08:59Z

+1 on the above. Along with this, we can also leverage the GitHub Actions resources of the forked repositories instead of using the resources of the ASF organisation on GitHub.

This is what Apache Spark does at present:

We create a PR, and our forked repository triggers the workflow. Our PR uses the resources allocated to us for testing.
The Apache Spark repository finds our workflow and links it in a comment on our PR.

Relevant PR in Spark:

[SPARK-35048][INFRA] Distribute GitHub Actions workflows to fork repositories to share the resources spark#32092

Would love to know your thoughts on this :)

HyukjinKwon · 2022-12-13T23:54:12Z

I happened to read the related links. Thanks @singhpk234 for elaborating on Spark's CI. To be clear, apache/spark#32092 implemented the logic you explained. After that, I also implemented the logic to leverage the GitHub check status (apache/spark#32193).

See one example of test results:

HyukjinKwon · 2022-12-13T23:55:36Z

In this way, we can remove all the overhead in the current repo and leverage the resources of the forked repositories.
Spark was one of the projects that used the most GitHub resources in the ASF, and now it's one of the lowest after this change :-).

I am willing to help and review if someone tries to port these changes to Iceberg :-).

github-actions · 2024-08-16T00:13:47Z

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions · 2024-09-11T00:14:08Z

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

singhpk234 mentioned this issue Apr 21, 2023

Build: Run Iceberg with JDK 17 #7391

Merged

github-actions bot added the stale label Aug 16, 2024

github-actions bot closed this as not planned Sep 11, 2024
