Description
What's the issue, what's expected?:
Issue: Running the mem-bw benchmark with the official Docker image superbench/superbench:v0.12.0-cuda12.9 in an offline environment fails with the error: The binary does not exist - benchmark: mem-bw, binary name: bandwidthTest. The image ships CUDA 12.9 but not the CUDA samples package, which provides the required bandwidthTest tool.
Expected: The official Docker image should include every dependency its benchmarks need, document the offline setup clearly, or fall back gracefully when a binary is missing.
How to reproduce it?:
In an offline environment, pull the official Docker image: docker pull superbench/superbench:v0.12.0-cuda12.9
Run a privileged container: docker run --privileged --rm -it superbench/superbench:v0.12.0-cuda12.9 bash
Inside the container, execute: sb run --no-docker -l localhost -c superbench_test.yaml, where superbench_test.yaml contains:
superbench:
  enable: ['mem-bw']
The mem-bw benchmark will fail with the missing bandwidthTest error.
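Before launching the run, the missing tool can be confirmed with a small preflight check. This is a minimal sketch, not SuperBench code: the helper name find_missing_binaries is hypothetical, and it assumes the benchmark resolves bandwidthTest through PATH, which the "binary directory: None" in the log suggests but does not confirm.

```python
import shutil


def find_missing_binaries(names):
    """Return the binary names that cannot be resolved on PATH.

    Hypothetical helper for illustration; SuperBench's own lookup
    logic may differ from a plain PATH search.
    """
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # bandwidthTest is the binary named in the mem-bw error message.
    missing = find_missing_binaries(["bandwidthTest"])
    if missing:
        print("missing binaries:", ", ".join(missing))
    else:
        print("all binaries found")
```

Running this inside the container before sb run shows whether the failure comes from the image contents rather than from the benchmark configuration.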
Log message or snapshot?:
[2025-12-10 02:27:29,029 HOSTNAME:326471][executor.py:251][INFO] Executor is going to execute mem-bw.
[2025-12-10 02:27:34,189 HOSTNAME:326471][micro_base.py:138][ERROR] The binary does not exist - benchmark: mem-bw, binary name: bandwidthTest, binary directory: None.
[2025-12-10 02:27:34,189 HOSTNAME:326471][executor.py:133][INFO] benchmark: mem-bw, return code: 31, result: {'return_code': [31]}.
[2025-12-10 02:27:34,189 HOSTNAME:326471][executor.py:140][ERROR] Executor failed in mem-bw.
[2025-12-10 02:27:35,261 HOSTNAME:323963][ansible.py:80][INFO] Run succeed, return code 0.
[2025-12-10 02:27:35,262 HOSTNAME:323963][runner.py:449][INFO] Runner is going to run mem-bw in local mode, proc rank 2.
[2025-12-10 02:27:35,262 HOSTNAME:323963][ansible.py:110][INFO] Run bash -c 'set -o allexport && source /tmp/sb.env && set +o allexport && cd $SB_WORKSPACE && PROC_RANK=2 CUDA_VISIBLE_DEVICES=2 numactl -N $((2/4)) sb exec --output-dir outputs/2025-12-10_02-26-55 -c sb.config.yaml -C superbench.enable=mem-bw' on remote ...
[WARNING]: Platform linux on host localhost is using the discovered Python interpreter at /usr/bin/python3.12, but future installation of another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-core/2.18/reference_appendices/interpreter_discovery.html for more information.
localhost | CHANGED | rc=0 >>
[2025-12-10 02:27:38,092 HOSTNAME:326760][executor.py:251][INFO] Executor is going to execute mem-bw.
[2025-12-10 02:27:43,221 HOSTNAME:326760][micro_base.py:138][ERROR] The binary does not exist - benchmark: mem-bw, binary name: bandwidthTest, binary directory: None.
[2025-12-10 02:27:43,221 HOSTNAME:326760][executor.py:133][INFO] benchmark: mem-bw, return code: 31, result: {'return_code': [31]}.
[2025-12-10 02:27:43,221 HOSTNAME:326760][executor.py:140][ERROR] Executor failed in mem-bw.
[2025-12-10 02:27:44,294 HOSTNAME:323963][ansible.py:80][INFO] Run succeed, return code 0.
[2025-12-10 02:27:44,294 HOSTNAME:323963][runner.py:449][INFO] Runner is going to run mem-bw in local mode, proc rank 3.
[2025-12-10 02:27:44,295 HOSTNAME:323963][ansible.py:110][INFO] Run bash -c 'set -o allexport && source /tmp/sb.env && set +o allexport && cd $SB_WORKSPACE && PROC_RANK=3 CUDA_VISIBLE_DEVICES=3 numactl -N $((3/4)) sb exec --output-dir outputs/2025-12-10_02-26-55 -c sb.config.yaml -C superbench.enable=mem-bw' on remote ...
[WARNING]: Platform linux on host localhost is using the discovered Python interpreter at /usr/bin/python3.12, but future installation of another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-core/2.18/reference_appendices/interpreter_discovery.html for more information.
localhost | CHANGED | rc=0 >>
[2025-12-10 02:27:47,117 HOSTNAME:327049][executor.py:251][INFO] Executor is going to execute mem-bw.
[2025-12-10 02:27:52,280 HOSTNAME:327049][micro_base.py:138][ERROR] The binary does not exist - benchmark: mem-bw, binary name: bandwidthTest, binary directory: None.
[2025-12-10 02:27:52,280 HOSTNAME:327049][executor.py:133][INFO] benchmark: mem-bw, return code: 31, result: {'return_code': [31]}.
[2025-12-10 02:27:52,280 HOSTNAME:327049][executor.py:140][ERROR] Executor failed in mem-bw.
[2025-12-10 02:27:53,361 HOSTNAME:323963][ansible.py:80][INFO] Run succeed, return code 0.
[2025-12-10 02:27:53,362 HOSTNAME:323963][runner.py:449][INFO] Runner is going to run mem-bw in local mode, proc rank 4.
[2025-12-10 02:27:53,362 HOSTNAME:323963][ansible.py:110][INFO] Run bash -c 'set -o allexport && source /tmp/sb.env && set +o allexport && cd $SB_WORKSPACE && PROC_RANK=4 CUDA_VISIBLE_DEVICES=4 numactl -N $((4/4)) sb exec --output-dir outputs/2025-12-10_02-26-55 -c sb.config.yaml -C superbench.enable=mem-bw' on remote ...
[WARNING]: Platform linux on host localhost is using the discovered Python interpreter at /usr/bin/python3.12, but future installation of another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-core/2.18/reference_appendices/interpreter_discovery.html for more information.
localhost | CHANGED | rc=0 >>
[2025-12-10 02:27:56,253 HOSTNAME:327338][executor.py:251][INFO] Executor is going to execute mem-bw.
[2025-12-10 02:28:01,476 HOSTNAME:327338][micro_base.py:138][ERROR] The binary does not exist - benchmark: mem-bw, binary name: bandwidthTest, binary directory: None.
[2025-12-10 02:28:01,476 HOSTNAME:327338][executor.py:133][INFO] benchmark: mem-bw, return code: 31, result: {'return_code': [31]}.
[2025-12-10 02:28:01,476 HOSTNAME:327338][executor.py:140][ERROR] Executor failed in mem-bw.
[2025-12-10 02:28:02,575 HOSTNAME:323963][ansible.py:80][INFO] Run succeed, return code 0.
[2025-12-10 02:28:02,575 HOSTNAME:323963][runner.py:449][INFO] Runner is going to run mem-bw in local mode, proc rank 5.
[2025-12-10 02:28:02,575 HOSTNAME:323963][ansible.py:110][INFO] Run bash -c 'set -o allexport && source /tmp/sb.env && set +o allexport && cd $SB_WORKSPACE && PROC_RANK=5 CUDA_VISIBLE_DEVICES=5 numactl -N $((5/4)) sb exec --output-dir outputs/2025-12-10_02-26-55 -c sb.config.yaml -C superbench.enable=mem-bw' on remote ...
[WARNING]: Platform linux on host localhost is using the discovered Python interpreter at /usr/bin/python3.12, but future installation of another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-core/2.18/reference_appendices/interpreter_discovery.html for more information.
localhost | CHANGED | rc=0 >>
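In the log above, the runner reports "Run succeed, return code 0" even though the executor failed with return code 31, so the failure is easy to miss. The sketch below scans a log like this one for failed benchmarks and missing binaries. The regular expressions are written against the lines quoted in this issue only; the function name summarize and the sample variable are hypothetical, and other SuperBench versions may format these messages differently.

```python
import re

# Patterns derived from the log lines quoted in this issue (an assumption
# about the format, not a documented SuperBench log schema).
RESULT_RE = re.compile(r"benchmark: (?P<name>[\w-]+), return code: (?P<code>\d+)")
MISSING_RE = re.compile(
    r"The binary does not exist - benchmark: (?P<name>[\w-]+), "
    r"binary name: (?P<binary>[^,]+),"
)


def summarize(log_text):
    """Collect nonzero executor return codes and missing binaries per benchmark."""
    failures = {}  # benchmark name -> set of nonzero return codes
    missing = {}   # benchmark name -> set of missing binary names
    for line in log_text.splitlines():
        m = MISSING_RE.search(line)
        if m:
            missing.setdefault(m["name"], set()).add(m["binary"])
        m = RESULT_RE.search(line)
        if m and int(m["code"]) != 0:
            failures.setdefault(m["name"], set()).add(int(m["code"]))
    return failures, missing


# Three lines copied from the log in this issue.
SAMPLE_LOG = """\
[2025-12-10 02:27:34,189 HOSTNAME:326471][micro_base.py:138][ERROR] The binary does not exist - benchmark: mem-bw, binary name: bandwidthTest, binary directory: None.
[2025-12-10 02:27:34,189 HOSTNAME:326471][executor.py:133][INFO] benchmark: mem-bw, return code: 31, result: {'return_code': [31]}.
[2025-12-10 02:27:35,261 HOSTNAME:323963][ansible.py:80][INFO] Run succeed, return code 0.
"""

if __name__ == "__main__":
    failures, missing = summarize(SAMPLE_LOG)
    print("failed benchmarks:", failures)
    print("missing binaries:", missing)
```

The runner's "Run succeed" line is deliberately not matched, since it describes the Ansible invocation rather than the benchmark result.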
Additional information:
- Environment: Ubuntu 22.04 x86_64, offline (no internet access)
- Following official TIP: Using a privileged container with --no-docker mode, as suggested in the SuperBench documentation
- Workaround limitations: The offline environment cannot download source code from GitHub or use alternative package sources