-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Description
Is your feature request related to a problem or challenge?
- Part of [EPIC] A collection of items to improve developer / CI speed #13813
- Related to CI: Windows flow takes 1.5h #13726
There is a tension between adding more tests and and PRs and code velocity (more tests --> longer CI)
DataFusion runs a many tests on every change to every PR. For example my most recent PR ran 24 tests (link) consuming over an hour of worker time.
This has several challenges
- Increases CI cycle time and thus how fast contributors get feedback on their PRs
- Limits new test we can add: As we contemplate adding even more testing (which we should) such as the sqllogictest suite adding them to CI would make the time even worse
- wastes worker credits (though we don't run out of credits largely because the Apache Software Foundation has approximately unlimited time from github, but there are broader concerns, like the amount of CO2 generated by that waste 🙈 )
Another observation is that there are several tests also rarely fail in PRs, but offer important coverage such as the Windows and mac tests. We even disabled the Windows test due to its (lack of) speed.
Describe the solution you'd like
I would like to improve the efficiency of existing CI jobs as well as have a mechanism to run both new and existing tests that offer important coverage but take too long to run on each CI
Describe alternatives you've considered
Here are some options from my past lives:
Option 1: Change some tests to only run on merges to main
In this option, we could change some checks to only run on merges to main
. For example, we could run the windows tests only on merges to main rather than also on PRs.
Instead of all jobs being triggered like
# trigger for all PRs and changes to main
on:
push:
branches:
- main
pull_request:
we would change some tests to run like
# trigger for only changes to main
on:
push:
branches:
- main
pros:
- simple to implement (change github branch matching rules)
cons: - PRs could merge to main that broke the tests, so the tests would be brokenand someone would have to chase / fix the tests
- Someone (maintainers?) would have to traige and fix these tests
We could probably add some sort of job that would automatically make revert PRs for any code that broke the main tests to help triage
Option 2: Implement more sophisticated merge flow
In this option, I would imagine a workflow like
- Open a PR
- Subset of tests pass
- PR is "marked for merge" somehow
- Additional longer CI checks run and if pass PR is merged, if fail PR is "unmarked for merge"
This might already be supported by github in Merge Queues
There are probably bots like https://github.com/rust-lang/homu that could automate something like this
pros:
- more automation / less human intervention
cons: - More complicated to setup / maintain
Option 3: Your idea here
Additional context
No response