[ci] Split up Jenkins into platform-specific jobs #13337

driazati · 2022-11-09T22:27:04Z

Right now all CI (excluding GitHub Actions, which this doesn't address at all) goes through one level of indirection, the tvm-ci/pr-head job, before you can get any useful information beyond a basic pass/fail.

The tvm-ci/pr-head job then mixes all the tests together, which with sharding means dozens of tests in a long vertical column where it's hard to find exactly what failed. driazati#38 shows an alternative where each platform (cpu, gpu, arm, etc.) has its own job that reports from Jenkins to GitHub independently. To implement this, (1) Jenkins needs to be configured with job definitions for each of these platforms and (2) the Jenkinsfile in apache/tvm needs to be broken up.

[ci] Split Jenkinsfile into platform-specific jobs #13300 splits the Jenkinsfile into one per platform, which the jobs from (1) are set up to read. This doesn't do anything on its own.
For (1), the Jenkins jobs are defined in https://github.com/tlc-pack/ci, here: https://github.com/tlc-pack/ci/blob/main/jenkins/jenkins-jobs/prod/tvm.yaml. Move temp jenkins jobs to apache/tvm tlc-pack/ci#58 has the changes to move them over from my test repo to the main repo. Once merged, this will start reporting the new statuses to GitHub.
Once the previous step is working, we can merge [ci] Remove Jenkinsfile for migration to platform-specific jobs #13316, which will make the new statuses required and get rid of the old tvm-ci/pr-head requirement.
Another PR in tlc-pack/ci to remove the tvm-ci/pr-head job will stop it reporting to GitHub: tlc-pack/ci@20bdc59
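
To make the switch in required statuses concrete, here is a rough sketch of the before/after check. The platform list and the `<platform>/pr-head` naming for the new status contexts are assumptions for illustration; only `tvm-ci/pr-head` is taken from this issue.

```python
# Hypothetical platform list; the issue names cpu, gpu, arm "etc."
PLATFORMS = ["cpu", "gpu", "arm"]

# The single combined status described in this issue.
OLD_CONTEXT = "tvm-ci/pr-head"


def new_contexts(platforms):
    """Build one status context per platform (naming scheme is an assumption)."""
    return [f"{p}/pr-head" for p in platforms]


def required_contexts(platforms, migrated):
    """Return the status contexts a PR must pass before or after the switch."""
    if migrated:
        return new_contexts(platforms)
    return [OLD_CONTEXT]


def is_mergeable(statuses, required):
    """A PR is mergeable only if every required context reported success."""
    return all(statuses.get(ctx) == "success" for ctx in required)


# During the overlap period both the old and the new statuses are reported.
statuses = {
    "tvm-ci/pr-head": "success",
    "cpu/pr-head": "success",
    "gpu/pr-head": "failure",
    "arm/pr-head": "success",
}
print(is_mergeable(statuses, required_contexts(PLATFORMS, migrated=False)))
print(is_mergeable(statuses, required_contexts(PLATFORMS, migrated=True)))
```

The point of the ordering above is that the new statuses report alongside the old one for a while, so the required set can be flipped only after the new jobs are known to work.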

Follow up fixes:

Make tvm-bot rerun aware of the new jobs: [ci] Make tvm-bot aware of platform specific jobs #13571
Cancel the remaining jobs when any fails (or if just lint fails)
Re-balance shards (e.g. for hexagon / cortexm)
Fix the docs deploy: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-gpu/detail/main/3/ [ci] Fix docs deploy #13570
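
For the shard re-balancing item, one possible approach (a sketch only, not what the TVM CI actually does) is a greedy longest-processing-time assignment: sort tests by duration and always hand the next one to the least-loaded shard. The test names and timings below are made up.

```python
import heapq


def balance_shards(durations, num_shards):
    """Greedy longest-processing-time assignment of tests to shards.

    durations maps a test name to its runtime in seconds (hypothetical data).
    Returns a list of shards, each a list of test names.
    """
    # Min-heap of (current load, shard index) so the lightest shard pops first.
    heap = [(0.0, i) for i in range(num_shards)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_shards)]
    # Longest tests first; ties broken by name so the result is deterministic.
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (load + secs, idx))
    return shards


durations = {"test_a": 600, "test_b": 300, "test_c": 300, "test_d": 120, "test_e": 60}
print(balance_shards(durations, 2))
```

This is a heuristic, so it won't always find the best split, but it only needs per-test timings from previous runs.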

This will remove the Jenkinsfile at the top level, so Docker image updates will now happen in ci/jenkins/data.py, which holds the source data for the Jenkinsfile templates.
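
As a rough picture of the "source data plus templates" setup, here is a minimal sketch. The dictionary contents, image tags, output file names, and the use of the standard library's `string.Template` are all assumptions standing in for whatever ci/jenkins/data.py and the real templates contain.

```python
from string import Template

# Hypothetical stand-in for the kind of source data described above;
# these image names and tags are made up.
DOCKER_IMAGES = {
    "cpu": "example/ci-cpu:20221101",
    "gpu": "example/ci-gpu:20221101",
}

# Hypothetical, heavily simplified template for a per-platform Jenkinsfile.
JENKINSFILE_TEMPLATE = Template(
    "// Generated file, do not edit by hand\n"
    "ci_image = '$image'\n"
    "// ... build and test stages for $platform would follow ...\n"
)


def render_jenkinsfiles(images):
    """Render one Jenkinsfile body per platform from the shared data."""
    return {
        f"{platform}_jenkinsfile.groovy": JENKINSFILE_TEMPLATE.substitute(
            platform=platform, image=image
        )
        for platform, image in images.items()
    }


for filename, body in render_jenkinsfiles(DOCKER_IMAGES).items():
    print(filename)
    print(body)
```

With this shape, bumping a Docker image means editing one entry in the data and regenerating the files, rather than editing each generated Jenkinsfile.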

cc @Mousius @areusch @gigiblender @leandron


This breaks up the Jenkinsfile into ones for GPU, CPU, etc. This removes a false dependency between the build and test steps (e.g. before, the GPU tests had to wait on the Hexagon build to complete) and makes the Jenkins UI a bit better since there aren't 30 tests to scroll through to find a failure. An example can be found in my fork here: driazati#38 in the checks box. Before this is merged, https://github.com/tlc-pack/ci/blob/main/jenkins/jenkins-jobs/prod/tvm.yaml will need to be updated to accept webhooks from apache/tvm instead of my fork. See #13337 for more context.


driazati · 2023-01-04T18:32:09Z

This is done now


quic-sanirudh · 2023-07-07T08:39:07Z

@driazati I know this question is probably quite late, but I see that in many PRs, when CI lint fails, the other jobs keep running. I see that there was a task above to fail other jobs when one fails (Cancel the remaining jobs when any fails (or if just lint fails)).

I wonder if this could be enabled now: I see PRs where one job has failed but the others are still running, and they take up resources, leaving many other PRs waiting (especially for GPU resources).

Would it be possible to kill the other jobs when one fails, or could we enable that at least for lint failures? Could there be other issues because of doing something like this?
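
One way to picture the policy being asked about is the sketch below. It is only decision logic under assumed job names and states ("running", "success", "failure"); it does not call Jenkins or GitHub, and how the cancellation would actually be triggered is left open.

```python
def jobs_to_cancel(statuses, cancel_on=("lint",)):
    """Pick running jobs to cancel once a triggering job has failed.

    statuses maps job name -> state (hypothetical names and states).
    cancel_on lists the jobs whose failure triggers cancellation;
    pass None to cancel on any failure.
    """
    failed = {name for name, state in statuses.items() if state == "failure"}
    if cancel_on is None:
        triggered = bool(failed)
    else:
        triggered = bool(failed & set(cancel_on))
    if not triggered:
        return []
    # Only jobs that are still running are worth cancelling.
    return sorted(name for name, state in statuses.items() if state == "running")


lint_failed = {"lint": "failure", "cpu": "running", "gpu": "running", "arm": "success"}
cpu_failed = {"lint": "success", "cpu": "failure", "gpu": "running"}

print(jobs_to_cancel(lint_failed))
print(jobs_to_cancel(cpu_failed))
print(jobs_to_cancel(cpu_failed, cancel_on=None))
```

A possible downside of the "any failure" mode is that a failure on one platform would hide whether the other platforms pass, which is likely why limiting it to lint is the more conservative option.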


driazati added the type:ci Relates to TVM CI infrastructure label Nov 9, 2022

driazati self-assigned this Nov 9, 2022

driazati mentioned this issue Nov 9, 2022

[ci] Split Jenkinsfile into platform-specific jobs #13300

Merged

driazati added a commit to driazati/tvm that referenced this issue Dec 7, 2022

[ci] Make tvm-bot aware of platform specific jobs

97dcdf9

See apache#13337 for more context, this fixes `@tvm-bot rerun` to work with the new jobs

driazati mentioned this issue Dec 7, 2022

[ci] Make tvm-bot aware of platform specific jobs #13571

Merged

Mousius pushed a commit that referenced this issue Dec 7, 2022

[ci] Make tvm-bot aware of platform specific jobs (#13571)

acef2ed

See #13337 for more context, this fixes `@tvm-bot rerun` to work with the new jobs

driazati closed this as completed Jan 4, 2023

fzi-peccia pushed a commit to fzi-peccia/tvm that referenced this issue Mar 27, 2023

[ci] Make tvm-bot aware of platform specific jobs (apache#13571)

aaa80e1

See apache#13337 for more context, this fixes `@tvm-bot rerun` to work with the new jobs

driazati mentioned this issue Apr 13, 2023

[ci] last-successful job is disabled #14618

Closed

mikeseven pushed a commit to mikeseven/tvm that referenced this issue Sep 27, 2023

[ci] Make tvm-bot aware of platform specific jobs (apache#13571)

c1c8e81

See apache#13337 for more context, this fixes `@tvm-bot rerun` to work with the new jobs
