Improve CI speed #7992

Our CI has been running slowly recently. Qing-qing, Yu Yang, Helin, Chen Xi, Ya-ming, Yi-bing, and I discussed this issue; here is what we learned and what we are going to do:

#### A. Reduce the number of SM architectures

1. We are building for many SM architectures in the CI: https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/cuda.cmake.
1. According to Qing-qing's experiment, https://github.com/PaddlePaddle/Paddle/issues/5491, nvcc runs faster if we generate code for fewer SM architectures.

Helin is going to configure the CI system to generate code for only one SM architecture when checking PRs, and for all SM architectures in the nightly build of the develop branch.
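As a rough illustration of the PR versus nightly split, the selection could look like the sketch below. The environment variable names `CI_BUILD_KIND` and `CI_SM_ARCH`, the default architecture, and the architecture list are all assumptions made for this example; the real list is defined in `cmake/cuda.cmake`. Only the `-gencode arch=compute_XX,code=sm_XX` flag format is actual nvcc syntax.

```python
import os

# Hypothetical architecture list for illustration only;
# the real list lives in cmake/cuda.cmake.
ALL_SM_ARCHS = ["30", "35", "50", "52", "60", "61", "70"]


def select_sm_archs(env=None):
    """Return the SM architectures to build for.

    CI_BUILD_KIND and CI_SM_ARCH are assumed variable names,
    not existing settings of the CI system.
    """
    env = os.environ if env is None else env
    if env.get("CI_BUILD_KIND", "pr") == "nightly":
        # Nightly build of develop: generate code for every architecture.
        return list(ALL_SM_ARCHS)
    # PR check: generate code for a single architecture only.
    return [env.get("CI_SM_ARCH", "61")]


def gencode_flags(archs):
    """Build one nvcc -gencode flag per selected architecture."""
    return [f"-gencode arch=compute_{a},code=sm_{a}" for a in archs]


if __name__ == "__main__":
    print(" ".join(gencode_flags(select_sm_archs())))
```

With no variables set, the sketch selects a single architecture, so PR checks take the fast path by default and only the nightly job has to opt in to the full list.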

#### B. Migrate the CI system to two servers

We are running four TeamCity agents on four GPU desktops, each with one GPU and a desktop-level CPU (a few cores).  We have two idle servers, each with 6 GPUs and a powerful CPU with 56 cores.

Helin will migrate the CI system to the servers. 

#### C. Distribute unit tests to multiple GPUs

Our CI system runs unit tests by calling `ctest -j N`, where `N` is the number of processes that run unit tests in parallel. However, all these `N` processes are using the same GPU.

Qing-qing is going to investigate whether we can make cmake/ctest use more than one GPU.
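One possible direction, sketched here under assumptions and not necessarily what will be adopted, is to pin each test process to a GPU through the `CUDA_VISIBLE_DEVICES` environment variable, assigning tests to GPUs round-robin. The helper names `assign_gpus` and `env_for_gpu` are hypothetical, and how this would be wired into cmake/ctest is left open.

```python
import os


def assign_gpus(tests, num_gpus):
    """Assign each test to a GPU index in round-robin order."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [(test, i % num_gpus) for i, test in enumerate(tests)]


def env_for_gpu(gpu_id, base_env=None):
    """Copy the environment and restrict the process to one GPU.

    CUDA_VISIBLE_DEVICES limits which devices a CUDA process can see.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


if __name__ == "__main__":
    # Hypothetical test names, for illustration only.
    for test, gpu in assign_gpus(["test_a", "test_b", "test_c"], num_gpus=2):
        print(test, "->", env_for_gpu(gpu)["CUDA_VISIBLE_DEVICES"])
```

Each test process would then be launched with the environment returned by `env_for_gpu`, so that the `N` parallel processes no longer share a single device.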

#### D. Add an environment variable to distinguish unit tests and regression tests

Both unit tests and regression tests currently run on the CI server for every PR. They should be distinguished: only unit tests should run for every PR, and nightly builds should run all tests. We should add an environment variable to control this.
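A minimal sketch of such a switch follows. The variable name `RUN_REGRESSION_TESTS`, the accepted truthy values, and the `(name, kind)` representation of a test are assumptions chosen for illustration; the actual name and mechanism are still to be decided.

```python
import os

# Values treated as "enabled"; an assumption for this sketch.
TRUTHY = {"1", "true", "on", "yes"}


def regression_enabled(env=None):
    """Return True if the (hypothetical) RUN_REGRESSION_TESTS flag is set."""
    env = os.environ if env is None else env
    return env.get("RUN_REGRESSION_TESTS", "0").strip().lower() in TRUTHY


def select_tests(tests, env=None):
    """Pick test names to run from (name, kind) pairs.

    kind is either "unit" or "regression".
    """
    if regression_enabled(env):
        # Nightly build: run everything.
        return [name for name, _ in tests]
    # PR check: unit tests only.
    return [name for name, kind in tests if kind == "unit"]


if __name__ == "__main__":
    # Hypothetical test names, for illustration only.
    tests = [("test_op", "unit"), ("test_train_full", "regression")]
    print(select_tests(tests))
```

Defaulting the flag to off keeps PR checks fast without any per-PR configuration; only the nightly job needs to set it.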
