Move performance testing YAML from dotnet/runtime to dotnet/performance #4639

Merged

@caaavik-msft caaavik-msft commented Jan 15, 2025

This PR implements the dotnet/performance portion of the work to move the performance testing YAML to the dotnet/performance repository. You can find the corresponding change for the dotnet/runtime repository here: dotnet/runtime#111454.

Summary

The goal of this work is to move as much of the CI logic as possible out of the dotnet/runtime repository and into the dotnet/performance repository. The reason we want to make this change is to decouple what is being tested from the code that does the testing.

Over the last few years we have had many occurrences where a bug in the performance testing logic caused invalid or skipped data, or where we had to make a breaking change. Because this logic lived in the dotnet/runtime repository, we were unable to re-run performance tests against runtime commits from the period when the bug was present or from before the breaking change was made. With this PR, if a bug ever causes invalid or skipped data, we can fix it in the dotnet/performance repo, re-run the performance tests with the fix, and correct the affected data.

Implementation

It is not practical to move everything out of the dotnet/runtime repository as there are some steps of our performance tests that may be strongly coupled to the version of the runtime. In particular, these would be any jobs that involve building the runtime or building test applications. To handle this, we have made extensive use of cross-repository template references.

If you look at the dotnet/runtime portion of this PR, you will see that eng/pipelines/coreclr/perf.yml will include the following lines:

resources:
  repositories:
    - repository: performance
      type: git
      name: internal/dotnet-performance

- template: /eng/pipelines/runtime-perf-jobs.yml@performance

There is a repository resource named performance which describes where the performance repository is located. For more information about repository resources, see the documentation here. Then we are able to reference a template from the performance repository by putting @performance at the end. When manually running the pipeline in Azure DevOps, under "Advanced options" there is a section called Resources which allows you to customise a branch/commit of the performance repository you want to run against, which is useful when testing changes that need to be made against both the runtime and performance repositories.

If you look at /eng/pipelines/runtime-perf-jobs.yml in this PR, you can see the following line:

- template: /eng/pipelines/coreclr/templates/perf-build-jobs.yml@${{ parameters.runtimeRepoAlias }}

This line is how the performance repository is able to reference back into the runtime repository and call the perf-build-jobs.yml template inside it. The dotnet/runtime repository PR sets runtimeRepoAlias to self, which means that it will reference the exact version of the runtime repository that called it. In a future PR, we will be able to add an additional pipeline to the dotnet/performance repository which will have a runtime repository resource defined, so that we can verify that a PR isn't going to cause the runtime performance tests to break.
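As a rough sketch of how the two sides could fit together, the fragment below shows the runtime-side call passing runtimeRepoAlias, followed by what a performance-side validation pipeline might look like. The `jobs:` placement, the `runtime` alias, and the `internal/dotnet-runtime` repository name are assumptions made for illustration; they are not taken from either PR.

```yaml
# --- dotnet/runtime: eng/pipelines/coreclr/perf.yml (sketch) ---
resources:
  repositories:
    - repository: performance
      type: git
      name: internal/dotnet-performance

jobs:  # assumed placement; the real file may nest the template differently
  - template: /eng/pipelines/runtime-perf-jobs.yml@performance
    parameters:
      # "self" refers to the repository that the running pipeline was loaded from
      runtimeRepoAlias: self
---
# --- dotnet/performance: hypothetical future validation pipeline (sketch) ---
resources:
  repositories:
    - repository: runtime              # alias chosen for this sketch
      type: git
      name: internal/dotnet-runtime    # assumed name, by analogy with internal/dotnet-performance

jobs:  # assumed placement
  - template: /eng/pipelines/runtime-perf-jobs.yml
    parameters:
      # The template then resolves perf-build-jobs.yml@runtime
      runtimeRepoAlias: runtime
```

In both cases the template in the performance repository stays unchanged; only the alias passed in decides which checkout of the runtime repository supplies perf-build-jobs.yml.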

YAML Convergence

Before this PR, there were many duplicated but slightly differing YAML files that did similar things. As part of this PR we are able to unify all of this duplicated logic so that everything is defined only once and a change no longer has to be made in many files. In particular, I want to point out the following two files, which are worth paying close attention to (both in the /eng/pipelines/templates/ directory):

  • run-performance-job.yml: Defines a job that clones the runtime and performance repositories, runs the performance job python script, and sends the helix job. Every performance test will go through this YAML file including those that run against the SDK or a build of the runtime.
  • runtime-perf-job.yml: A wrapper around run-performance-job.yml which sets up all the necessary parameters and additional steps for running performance tests against a build of the runtime. Most of this is just downloading the build artifacts and arranging them into the correct directories so that they can be used by the python scripts.

Other Changes

  • Job Names and Display Names
    • Job names are now standardised and should be more human readable. As an example:
      • Before: Performance osx x64 release iOSMono JIT ios_scenarios perfiphone12mini NoJS True False True net10.0
      • After: Performance ios_scenarios iOSMono JIT iOSLlvmBuild osx x64 perfiphone12mini net10.0
    • The ordering of the parts of the job name was changed so it is easier to scan through to find a particular job.
    • An additionalJobIdentifier parameter can be specified as extra information to include in the job name if needed to disambiguate two jobs.
  • Applying a parameter to every job
    • If you look at eng/pipelines/coreclr/perf.yml you will see I added a parameter called onlySanityCheck which uses the new jobParameters parameter to set the onlySanityCheck parameter on every job that gets run.
    • I have been using this in my testing, and it has greatly sped up testing because I don't have to block up the Helix queues with a large number of jobs when I don't care about the performance test results.
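For illustration, the onlySanityCheck wiring described above might look roughly like the following in eng/pipelines/coreclr/perf.yml. The display name, the `jobs:` placement, and the exact shape of jobParameters (assumed here to be a map applied to every job) are assumptions, not a copy of the actual file.

```yaml
parameters:
  - name: onlySanityCheck
    displayName: Only run sanity checks   # illustrative display name
    type: boolean
    default: false

jobs:  # assumed placement; the real file may nest the template differently
  - template: /eng/pipelines/runtime-perf-jobs.yml@performance
    parameters:
      runtimeRepoAlias: self
      # Assumed shape: values listed here are set on every job that gets run
      jobParameters:
        onlySanityCheck: ${{ parameters.onlySanityCheck }}
```

Exposing the flag as a pipeline-level parameter means it can be toggled when queuing a manual run, without editing any of the job templates.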

Validation

Since this change is large, it is likely to introduce bugs. To validate that things have been ported correctly, I looked at the Run performance job script step of the job to see the command line arguments that were passed to the python script to ensure that they are identical. There are still potentially other classes of bugs due to variables being different which are harder to detect, but from my testing I expect any bugs to be minimal.

In a future PR we should also ensure that onlySanityCheck is properly implemented for all scenarios and test cases, as it is currently only active for our microbenchmarks. This should be fine for now, though, as our microbenchmarks are the ones that take the longest.

@LoopedBard3 LoopedBard3 left a comment

General high level functionality check seems to look good to me, will take a closer look at changes individually soon.

caaavik-msft commented Jan 21, 2025

I had a discussion with @e-kharion, and there is one area that needs discussion before this PR can be merged: which parts of the runtime-perf-job.yml file should be kept inside the runtime repository rather than moved here. The criterion should be that if something in that file is strongly coupled with the code in the dotnet/runtime repository and has a high likelihood of requiring breaking changes over time, we should define it there. In this PR, the only things still kept in the runtime repository are those that build the artifacts we consume to run the performance tests. However, we could argue that the following might also be strongly coupled with the runtime repository:

  • The dependsOn which specifies the name of the build job that is a dependency for the performance test
    • It might be that in the future the name of the build job has to change or we have to have additional dependencies
  • Anything that calls download-artifact-step
    • The artifact names might change
    • We might add additional artifacts
  • Steps which take the downloaded artifact and put all the items in the right directories needed for the performance tests
    • The internal structure of the artifact may change and so this logic may need to change
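To make the three coupling points concrete, here is a minimal sketch of a job that contains all of them. The job names, artifact name, and directory layout are hypothetical, and the built-in DownloadPipelineArtifact@2 task stands in for the download-artifact-step template, whose parameters are not shown in this discussion.

```yaml
jobs:
  - job: perf_microbenchmarks                 # hypothetical job name
    # Coupling point 1: the name of the build job this test depends on
    dependsOn:
      - build_linux_x64_release_coreclr       # hypothetical build job name
    steps:
      # Coupling point 2: which artifacts are downloaded, and under what names
      - task: DownloadPipelineArtifact@2
        displayName: Download runtime build
        inputs:
          artifactName: BuildArtifacts_linux_x64_Release_coreclr   # hypothetical
          targetPath: $(Pipeline.Workspace)/download

      # Coupling point 3: arranging the artifact contents for the Python scripts
      - script: |
          mkdir -p "$(Build.SourcesDirectory)/artifacts/bin"
          cp -R "$(Pipeline.Workspace)/download/." "$(Build.SourcesDirectory)/artifacts/bin/"
        displayName: Arrange build output for the performance scripts
```

Each of the three commented sections would need to change if the build job were renamed, the artifact were renamed or split, or the artifact's internal layout changed.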

While each of these has reasons it may be strongly coupled to the runtime repository, you could equally say they are strongly coupled to the performance repository. Perhaps we add a new configuration to test and we want to ensure it depends on a job which we have already defined. Or maybe we need to change which artifacts we depend on, or we need to arrange the files from the downloaded artifacts in a different way.

I think it is more likely that breaking changes will happen in the performance repository than in the runtime repository, and that's why I defined all of them here. If we do have a breaking change like the ones described above in the runtime repository in the future, it might be possible for us to make it backwards compatible by using version numbers or something similar. The other important point is that we want to be able to test old versions of the runtime repository, but there is no need to test against old versions of the performance repository. That means if there is ever a bug in any of this logic, we can always fix it and re-run, but we can't do the same if this logic lives in the runtime repository.

@LoopedBard3 LoopedBard3 requested a review from Copilot January 30, 2025 19:11
LoopedBard3 previously approved these changes Jan 30, 2025

@LoopedBard3 LoopedBard3 left a comment

This looks good to me, and everything seems to be accounted for. On the runtime-perf-job.yml question of what should go where, I don't have a strong preference for a particular approach. I think keeping as much as possible in the performance repo, even if it is strongly coupled to dotnet/runtime, is preferable from a retesting standpoint.

Copilot reviewed 7 out of 19 changed files in this pull request and generated no comments.

Files not reviewed (12)
  • eng/performance/benchmark_jobs.yml: Language not supported
  • eng/performance/gc_jobs.yml: Language not supported
  • eng/performance/scenarios.yml: Language not supported
  • eng/performance/send-to-helix.yml: Language not supported
  • eng/pipelines/runtime-perf-jobs.yml: Evaluated as low risk
  • eng/pipelines/runtime-slow-perf-jobs.yml: Evaluated as low risk
  • eng/pipelines/runtime-wasm-perf-jobs.yml: Evaluated as low risk
  • eng/pipelines/templates/build-machine-matrix.yml: Evaluated as low risk
  • eng/pipelines/templates/download-artifact-step.yml: Evaluated as low risk
  • eng/pipelines/runtime-ios-scenarios-perf-jobs.yml: Evaluated as low risk
  • scripts/run_performance_job.py: Evaluated as low risk
  • eng/pipelines/templates/run-performance-job-script-step.yml: Evaluated as low risk
@DrewScoggins DrewScoggins self-requested a review February 3, 2025 17:15
@caaavik-msft caaavik-msft merged commit 304c424 into dotnet:main Feb 3, 2025
70 of 83 checks passed