Commit 28baac9

ci : migrate ggml ci to self-hosted runners (#16116)
* ci : migrate ggml ci to self-hosted runners
* ci : add T4 runner
* ci : add instructions for adding self-hosted runners
* ci : disable test-backend-ops from debug builds due to slowness
* ci : add AMD V710 runner (vulkan)
* cont : add ROCM workflow
* ci : switch to qwen3 0.6b model
* cont : fix the context size
1 parent 1eeb523 commit 28baac9

4 files changed: +295 / -372 lines changed

.github/workflows/build.yml

Lines changed: 192 additions & 0 deletions
````diff
@@ -1247,3 +1247,195 @@ jobs:
             -DGGML_CANN=on \
             -DSOC_TYPE=${{ matrix.device }}
           cmake --build build -j $(nproc)
+
+  # TODO: simplify the following workflows using a matrix
+  # TODO: run lighter CI on PRs and the full CI only on master (if needed)
+  ggml-ci-x64-cpu-low-perf:
+    runs-on: [self-hosted, Linux, X64, CPU, low-perf]
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v4
+
+      - name: Test
+        id: ggml-ci
+        run: |
+          bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
+
+  ggml-ci-arm64-cpu-low-perf:
+    runs-on: [self-hosted, Linux, ARM64, CPU, low-perf]
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v4
+
+      - name: Test
+        id: ggml-ci
+        run: |
+          bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
+
+  ggml-ci-x64-cpu-high-perf:
+    runs-on: [self-hosted, Linux, X64, CPU, high-perf]
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v4
+
+      - name: Test
+        id: ggml-ci
+        run: |
+          bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
+
+  ggml-ci-arm64-cpu-high-perf:
+    runs-on: [self-hosted, Linux, ARM64, CPU, high-perf]
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v4
+
+      - name: Test
+        id: ggml-ci
+        run: |
+          bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
+
+  ggml-ci-x64-nvidia-v100-cuda:
+    runs-on: [self-hosted, Linux, X64, NVIDIA, V100]
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v4
+
+      - name: Test
+        id: ggml-ci
+        run: |
+          nvidia-smi
+          GG_BUILD_CUDA=1 bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
+
+  ggml-ci-x64-nvidia-v100-vulkan:
+    runs-on: [self-hosted, Linux, X64, NVIDIA, V100]
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v4
+
+      - name: Test
+        id: ggml-ci
+        run: |
+          vulkaninfo
+          GG_BUILD_VULKAN=1 bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
+
+  ggml-ci-x64-nvidia-t4-cuda:
+    runs-on: [self-hosted, Linux, X64, NVIDIA, T4]
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v4
+
+      - name: Test
+        id: ggml-ci
+        run: |
+          nvidia-smi
+          GG_BUILD_CUDA=1 bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
+
+  ggml-ci-x64-nvidia-t4-vulkan:
+    runs-on: [self-hosted, Linux, X64, NVIDIA, T4]
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v4
+
+      - name: Test
+        id: ggml-ci
+        run: |
+          vulkaninfo
+          GG_BUILD_VULKAN=1 bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
+
+  ggml-ci-x64-nvidia-t4-vulkan-coopmat1:
+    runs-on: [self-hosted, Linux, X64, NVIDIA, T4]
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v4
+
+      - name: Test
+        id: ggml-ci
+        run: |
+          vulkaninfo
+          GG_BUILD_VULKAN=1 GGML_VK_DISABLE_COOPMAT2=1 bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
+
+  ggml-ci-x64-cpu-amx:
+    runs-on: [self-hosted, Linux, X64, CPU, AMX]
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v4
+
+      - name: Test
+        id: ggml-ci
+        run: |
+          bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
+
+  ggml-ci-x64-amd-v710-vulkan:
+    runs-on: [self-hosted, Linux, X64, AMD, V710]
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v4
+
+      - name: Test
+        id: ggml-ci
+        run: |
+          vulkaninfo
+          GG_BUILD_VULKAN=1 bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
+
+  ggml-ci-x64-amd-v710-rocm:
+    runs-on: [self-hosted, Linux, X64, AMD, V710]
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v4
+
+      - name: Test
+        id: ggml-ci
+        run: |
+          vulkaninfo
+          GG_BUILD_ROCM=1 GG_BUILD_AMDGPU_TARGETS="gfx1101" bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
+
+  ggml-ci-mac-metal:
+    runs-on: [self-hosted, macOS, ARM64]
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v4
+
+      - name: Test
+        id: ggml-ci
+        run: |
+          GG_BUILD_METAL=1 bash ./ci/run.sh ~/results/llama.cpp ~/mnt/llama.cpp
+
+  # TODO: install vulkan drivers
+  # ggml-ci-mac-vulkan:
+  #   runs-on: [self-hosted, macOS, ARM64]
+  #
+  #   steps:
+  #     - name: Clone
+  #       id: checkout
+  #       uses: actions/checkout@v4
+  #
+  #     - name: Test
+  #       id: ggml-ci
+  #       run: |
+  #         GG_BUILD_VULKAN=1 bash ./ci/run.sh ~/results/llama.cpp ~/mnt/llama.cpp
````
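
Each of the new jobs is the same `ci/run.sh` script behind a different set of `runs-on` labels and `GG_BUILD_*` environment variables. The following is only a reference sketch that collects the invocations used in the jobs above so that a specific job can be reproduced locally on matching hardware; the `RESULTS`/`MNT` paths are placeholders, not a required layout.

```bash
# Sketch: reproduce selected self-hosted jobs locally (paths are placeholders).
RESULTS=~/results/llama.cpp
MNT=~/mnt/llama.cpp
mkdir -p "$RESULTS" "$MNT"

# CPU-only jobs (low-perf / high-perf / AMX runners)
bash ./ci/run.sh "$RESULTS" "$MNT"

# NVIDIA CUDA jobs (V100 / T4 runners)
GG_BUILD_CUDA=1 bash ./ci/run.sh "$RESULTS" "$MNT"

# Vulkan job with cooperative-matrix-2 disabled (T4 coopmat1 runner)
GG_BUILD_VULKAN=1 GGML_VK_DISABLE_COOPMAT2=1 bash ./ci/run.sh "$RESULTS" "$MNT"

# ROCm job targeting the AMD V710 (gfx1101)
GG_BUILD_ROCM=1 GG_BUILD_AMDGPU_TARGETS="gfx1101" bash ./ci/run.sh "$RESULTS" "$MNT"
```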

ci/README-MUSA.md

Lines changed: 35 additions & 0 deletions
````diff
@@ -0,0 +1,35 @@
+## Running MUSA CI in a Docker Container
+
+Assuming `$PWD` is the root of the `llama.cpp` repository, follow these steps to set up and run MUSA CI in a Docker container:
+
+### 1. Create a local directory to store cached models, configuration files and venv:
+
+```bash
+mkdir -p $HOME/llama.cpp/ci-cache
+```
+
+### 2. Create a local directory to store CI run results:
+
+```bash
+mkdir -p $HOME/llama.cpp/ci-results
+```
+
+### 3. Start a Docker container and run the CI:
+
+```bash
+docker run --privileged -it \
+    -v $HOME/llama.cpp/ci-cache:/ci-cache \
+    -v $HOME/llama.cpp/ci-results:/ci-results \
+    -v $PWD:/ws -w /ws \
+    mthreads/musa:rc4.2.0-devel-ubuntu22.04-amd64
+```
+
+Inside the container, execute the following commands:
+
+```bash
+apt update -y && apt install -y bc cmake ccache git python3.10-venv time unzip wget
+git config --global --add safe.directory /ws
+GG_BUILD_MUSA=1 bash ./ci/run.sh /ci-results /ci-cache
+```
+
+This setup ensures that the CI runs within an isolated Docker environment while maintaining cached files and results across runs.
````
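
For unattended runs, the three steps above can be rolled into a single script. This is only a sketch built from the commands documented in the new README; the `--rm` flag, the `set -euo pipefail` preamble and the `bash -lc` wrapper are additions for non-interactive use, not part of the documented setup.

```bash
#!/usr/bin/env bash
# Sketch: run the documented MUSA CI steps end-to-end, non-interactively.
set -euo pipefail

# Local directories for the cache and the results (as in steps 1 and 2).
mkdir -p "$HOME/llama.cpp/ci-cache" "$HOME/llama.cpp/ci-results"

# Start the container and run the documented commands inside it (step 3).
docker run --privileged --rm \
    -v "$HOME/llama.cpp/ci-cache":/ci-cache \
    -v "$HOME/llama.cpp/ci-results":/ci-results \
    -v "$PWD":/ws -w /ws \
    mthreads/musa:rc4.2.0-devel-ubuntu22.04-amd64 \
    bash -lc '
        apt update -y && apt install -y bc cmake ccache git python3.10-venv time unzip wget
        git config --global --add safe.directory /ws
        GG_BUILD_MUSA=1 bash ./ci/run.sh /ci-results /ci-cache
    '
```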

ci/README.md

Lines changed: 10 additions & 45 deletions
````diff
@@ -1,18 +1,10 @@
 # CI
 
-In addition to [Github Actions](https://github.com/ggml-org/llama.cpp/actions) `llama.cpp` uses a custom CI framework:
+This CI implements heavy-duty workflows that run on self-hosted runners. Typically, the purpose of these workflows is to
+cover hardware configurations that are not available from Github-hosted runners and/or require more computational
+resources than are normally available.
 
-https://github.com/ggml-org/ci
-
-It monitors the `master` branch for new commits and runs the
-[ci/run.sh](https://github.com/ggml-org/llama.cpp/blob/master/ci/run.sh) script on dedicated cloud instances. This allows us
-to execute heavier workloads compared to just using Github Actions. Also with time, the cloud instances will be scaled
-to cover various hardware architectures, including GPU and Apple Silicon instances.
-
-Collaborators can optionally trigger the CI run by adding the `ggml-ci` keyword to their commit message.
-Only the branches of this repo are monitored for this keyword.
-
-It is a good practice, before publishing changes to execute the full CI locally on your machine:
+It is good practice, before publishing changes, to execute the full CI locally on your machine. For example:
 
 ```bash
 mkdir tmp
@@ -29,40 +21,13 @@ GG_BUILD_SYCL=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
 
 # with MUSA support
 GG_BUILD_MUSA=1 bash ./ci/run.sh ./tmp/results ./tmp/mnt
-```
-
-## Running MUSA CI in a Docker Container
 
-Assuming `$PWD` is the root of the `llama.cpp` repository, follow these steps to set up and run MUSA CI in a Docker container:
-
-### 1. Create a local directory to store cached models, configuration files and venv:
-
-```bash
-mkdir -p $HOME/llama.cpp/ci-cache
+# etc.
 ```
 
-### 2. Create a local directory to store CI run results:
-
-```bash
-mkdir -p $HOME/llama.cpp/ci-results
-```
-
-### 3. Start a Docker container and run the CI:
-
-```bash
-docker run --privileged -it \
-    -v $HOME/llama.cpp/ci-cache:/ci-cache \
-    -v $HOME/llama.cpp/ci-results:/ci-results \
-    -v $PWD:/ws -w /ws \
-    mthreads/musa:rc4.2.0-devel-ubuntu22.04-amd64
-```
-
-Inside the container, execute the following commands:
-
-```bash
-apt update -y && apt install -y bc cmake ccache git python3.10-venv time unzip wget
-git config --global --add safe.directory /ws
-GG_BUILD_MUSA=1 bash ./ci/run.sh /ci-results /ci-cache
-```
+# Adding self-hosted runners
 
-This setup ensures that the CI runs within an isolated Docker environment while maintaining cached files and results across runs.
+- Add a self-hosted `ggml-ci` workflow to `.github/workflows/build.yml` with an appropriate label
+- Request a runner token from `ggml-org` (for example, via a comment in the PR or email)
+- Set up a machine using the received token ([docs](https://docs.github.com/en/actions/how-tos/manage-runners/self-hosted-runners/add-runners))
+- Optionally, update [ci/run.sh](https://github.com/ggml-org/llama.cpp/blob/master/ci/run.sh) to build and run on the target platform by gating the implementation with a `GG_BUILD_...` env variable
````
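
For the "Set up a machine" step above, GitHub's standard `actions-runner` configuration scripts are used. The following is a rough sketch only, assuming the runner package has already been downloaded and unpacked on the machine; the runner name, token variable and labels are illustrative and should match the `runs-on` labels of the corresponding workflow job.

```bash
# Sketch: register and start a self-hosted runner for ggml-org/llama.cpp.
# Run from inside the unpacked actions-runner directory.
# RUNNER_TOKEN holds the registration token requested from ggml-org.
./config.sh \
    --url https://github.com/ggml-org/llama.cpp \
    --token "$RUNNER_TOKEN" \
    --name my-t4-box \
    --labels NVIDIA,T4 \
    --unattended

# Either run it in the foreground for a quick test ...
./run.sh

# ... or install and start it as a service so it survives reboots.
sudo ./svc.sh install
sudo ./svc.sh start
```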
