[ci] Split up Jenkins into platform-specific jobs #13337

Closed
6 of 8 tasks
driazati opened this issue Nov 9, 2022 · 2 comments

@driazati
Member

driazati commented Nov 9, 2022

Right now all CI (excluding GitHub Actions, which this issue doesn't address at all) goes through one level of indirection before you can get any useful information beyond a basic pass/fail: the tvm-ci/pr-head job, shown here:

image

The tvm-ci/pr-head job then mixes all the tests together, which with sharding means dozens of tests in a long vertical column where it's hard to find exactly what failed. driazati#38 shows an alternative where each platform (cpu, gpu, arm, etc.) has its own job that reports from Jenkins to GitHub independently. To implement this, (1) Jenkins needs to be configured with job definitions for each of these platforms and (2) the Jenkinsfile in apache/tvm needs to be broken up.

Follow up fixes:

This will remove the Jenkinsfile at the top level, so Docker image updates would now happen in the ci/jenkins/data.py file, which holds the source data for the Jenkinsfile templates.
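
As a rough illustration of the "source data plus templates" idea, here is a minimal sketch of how per-platform Jenkinsfiles could be rendered from one shared data dictionary. Everything in it is an assumption made for illustration: the `DOCKER_IMAGES` dictionary, the image names and tags, the template text, and the output filenames are all hypothetical and are not taken from the real ci/jenkins/data.py or the real templates. It also uses the standard library's `string.Template` only to stay self-contained.

```python
from string import Template

# Hypothetical source data: one entry per platform. The image names and tags
# are placeholders, not the real values from ci/jenkins/data.py.
DOCKER_IMAGES = {
    "cpu": "example/ci-cpu:v0.01",
    "gpu": "example/ci-gpu:v0.01",
    "arm": "example/ci-arm:v0.01",
}

# Hypothetical, heavily simplified Jenkinsfile template for a single platform.
JENKINSFILE_TEMPLATE = Template(
    "// Generated file, do not edit by hand\n"
    "def docker_image = '$image'\n"
    "stage('Build $platform') { /* build steps */ }\n"
    "stage('Test $platform') { /* test steps */ }\n"
)


def render_jenkinsfiles(images):
    """Return a mapping of output filename to rendered Jenkinsfile text."""
    return {
        f"{platform}_jenkinsfile.groovy": JENKINSFILE_TEMPLATE.substitute(
            platform=platform, image=image
        )
        for platform, image in images.items()
    }


rendered = render_jenkinsfiles(DOCKER_IMAGES)
for name in sorted(rendered):
    print(name)
```

With a layout like this, bumping a Docker image means editing one dictionary entry and regenerating, rather than hand-editing each generated Jenkinsfile.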

cc @Mousius @areusch @gigiblender @leandron

@driazati driazati added the type:ci Relates to TVM CI infrastructure label Nov 9, 2022
@driazati driazati self-assigned this Nov 9, 2022
driazati added a commit that referenced this issue Dec 6, 2022
This breaks up the Jenkinsfile into separate ones for GPU, CPU, etc. This removes a false dependency between the build and test steps (e.g. before, the GPU tests had to wait on the Hexagon build to complete) and makes the Jenkins UI a bit better, since there aren't 30 tests to scroll through to find a failure. An example can be found in my fork here: driazati#38 in the checks box. Before this is merged, https://github.com/tlc-pack/ci/blob/main/jenkins/jenkins-jobs/prod/tvm.yaml will need to be updated to accept webhooks from apache/tvm instead of my fork.

See #13337 for more context
driazati added a commit to driazati/tvm that referenced this issue Dec 7, 2022
See apache#13337 for more context, this fixes `@tvm-bot rerun` to work with
the new jobs
Mousius pushed a commit that referenced this issue Dec 7, 2022
See #13337 for more context, this fixes `@tvm-bot rerun` to work with
the new jobs
@driazati
Member Author

driazati commented Jan 4, 2023

This is done now

@driazati driazati closed this as completed Jan 4, 2023
fzi-peccia pushed a commit to fzi-peccia/tvm that referenced this issue Mar 27, 2023
See apache#13337 for more context, this fixes `@tvm-bot rerun` to work with
the new jobs
@quic-sanirudh
Contributor

@driazati I know this question is probably quite late, but I see that in many PRs, when CI lint fails, the other jobs keep running. I see that there was a task above to stop the other jobs when one fails ("Cancel the remaining jobs when any fails (or if just lint fails)").

I wonder if this can be enabled now, since I see PRs where one job has failed but the others are still running. They take up resources and leave many other PRs waiting (especially for GPU resources).

Would it be possible to kill the other jobs when one fails, or could we enable that at least for lint failures? Could there be other issues because of doing something like this?
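
To make the two policies in the question concrete, here is a minimal sketch of the decision logic only. It is not how Jenkins or the TVM CI implements cancellation. The `jobs_to_cancel` function, the job names, and the status strings are all hypothetical, and actually aborting the jobs would still need a call into Jenkins that is not shown here.

```python
def jobs_to_cancel(statuses, lint_only=True):
    """Pick which running jobs to cancel.

    statuses maps a (hypothetical) job name to "running", "passed" or "failed".
    With lint_only=True, only a failed "lint" job triggers cancellation;
    with lint_only=False, any failed job does.
    """
    failed = {name for name, state in statuses.items() if state == "failed"}
    triggered = ("lint" in failed) if lint_only else bool(failed)
    if not triggered:
        return []
    return sorted(name for name, state in statuses.items() if state == "running")


# Lint failed while gpu is still running: gpu would be cancelled.
print(jobs_to_cancel({"lint": "failed", "cpu": "passed", "gpu": "running"}))
```

The `lint_only` flag separates the narrower option (cancel only on lint failure) from the broader one (cancel on any failure). The broader one frees more resources but hides failures in jobs that never got to finish.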

mikeseven pushed a commit to mikeseven/tvm that referenced this issue Sep 27, 2023
This breaks up the Jenkinsfile into separate ones for GPU, CPU, etc. This removes a false dependency between the build and test steps (e.g. before, the GPU tests had to wait on the Hexagon build to complete) and makes the Jenkins UI a bit better, since there aren't 30 tests to scroll through to find a failure. An example can be found in my fork here: driazati#38 in the checks box. Before this is merged, https://github.com/tlc-pack/ci/blob/main/jenkins/jenkins-jobs/prod/tvm.yaml will need to be updated to accept webhooks from apache/tvm instead of my fork.

See apache#13337 for more context
mikeseven pushed a commit to mikeseven/tvm that referenced this issue Sep 27, 2023
See apache#13337 for more context, this fixes `@tvm-bot rerun` to work with
the new jobs