[ci] Split up Jenkins into platform-specific jobs #13337
This breaks up the Jenkinsfile into separate ones for GPU, CPU, etc. This removes a false dependency between the build and test steps (e.g. previously the GPU tests had to wait for the Hexagon build to complete) and makes the Jenkins UI a bit better, since there are no longer 30 tests to scroll through to find a failure. An example can be found in the checks box of driazati#38 on my fork. Before this is merged, https://github.com/tlc-pack/ci/blob/main/jenkins/jenkins-jobs/prod/tvm.yaml will need to be updated to accept webhooks from apache/tvm instead of my fork. See #13337 for more context.
See apache#13337 for more context. This fixes `@tvm-bot rerun` to work with the new jobs.
This is done now.
@driazati I know this question is probably quite late, but I see that in many PRs the other jobs keep running when CI lint fails. There was a task above to fail other jobs when one fails. I wonder whether this could be enabled now, since PRs with one failed job still have others running, and those take up resources and leave many other PRs waiting (especially for GPU resources). Would it be possible to kill the other jobs when one fails, or could we enable that at least for lint failures? Could doing something like this cause other issues?
Right now all CI (excluding GitHub Actions, which this doesn't address at all) goes through 1 indirection to get any useful information outside of a basic pass/fail: the `tvm-ci/pr-head` job.

The `tvm-ci/pr-head` job then mixes all the tests together, which with sharding means dozens of tests in a long vertical column where it's hard to find exactly what failed. driazati#38 shows an alternative where each platform (cpu, gpu, arm, etc.) has its own job that reports from Jenkins to GitHub independently. To implement this, (1) Jenkins needs to be configured with job definitions for each of these platforms and (2) the `Jenkinsfile` in apache/tvm needs to be broken up.

`tvm-ci/pr-head` requirement.

`tvm-ci/pr-head` will stop it reporting to GitHub: tlc-pack/ci@20bdc59

Follow up fixes:

This will remove the `Jenkinsfile` at the top level, so Docker image updates would now happen in the `ci/jenkins/data.py` file, which has the source data for the Jenkinsfile templates.

cc @Mousius @areusch @gigiblender @leandron
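
As a rough illustration of the split described above, here is a minimal sketch of how one shared data table could drive one generated Jenkinsfile per platform. Everything in it is an assumption made for illustration: the `PLATFORM_IMAGES` table, the image tags, the output file names, and the use of the standard-library `string.Template` are hypothetical and are not taken from `ci/jenkins/data.py` or the actual templates in apache/tvm.

```python
from string import Template

# Hypothetical stand-in for the kind of source data a file like
# ci/jenkins/data.py could hold. The platform names come from the issue
# text; the image tags are placeholders, not real Docker images.
PLATFORM_IMAGES = {
    "cpu": "example/ci-cpu:placeholder",
    "gpu": "example/ci-gpu:placeholder",
    "arm": "example/ci-arm:placeholder",
}

# Hypothetical, heavily simplified Jenkinsfile template. Each platform gets
# its own build and test stages, so GPU tests no longer wait on other builds.
JENKINSFILE_TEMPLATE = Template(
    "// Generated file for the $platform job, do not edit by hand\n"
    "def docker_image = '$image'\n"
    "stage('Build $platform') { /* build steps */ }\n"
    "stage('Test $platform') { /* test steps */ }\n"
)


def render_jenkinsfiles(images):
    """Return a mapping of output file name -> rendered Jenkinsfile text."""
    return {
        f"{platform}_jenkinsfile.groovy": JENKINSFILE_TEMPLATE.substitute(
            platform=platform, image=image
        )
        for platform, image in sorted(images.items())
    }


if __name__ == "__main__":
    for name, text in render_jenkinsfiles(PLATFORM_IMAGES).items():
        print(f"--- {name} ---")
        print(text)
```

With a layout along these lines, a Docker image update would be a one-line change in the data table followed by regenerating the per-platform files, rather than an edit to a single top-level `Jenkinsfile`.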