Hi, I am trying to solve a large-scale SDP using SCS, which converges too slowly on the CPU. So, I want to use the GPU version to get some speedup. I first tested the GPU version with a small problem on my laptop (Windows 11) with an RTX 4080 GPU, which works perfectly:
------------------------------------------------------------------
SCS v3.2.5 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem: variables n: 4875, constraints m: 594271
cones: z: primal zero / dual free vars: 31
l: linear vars: 30
q: soc vars: 0, qsize: 1
s: psd vars: 594210, ssize: 182
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
max_iters: 100000, normalize: 1, rho_x: 1.00e-06
acceleration_lookback: 10, acceleration_interval: 10
lin-sys: sparse-indirect GPU
nnz(A): 138701, nnz(P): 0
------------------------------------------------------------------
iter | pri res | dua res | gap | obj | scale | time (s)
------------------------------------------------------------------
0| 7.11e+00 6.71e+01 4.04e+02 -3.49e+02 1.00e-01 2.06e+00
250| 7.03e-04 5.26e-04 1.46e-03 4.16e-02 3.14e-02 1.11e+02
475| 5.59e-05 2.10e-04 7.97e-05 4.22e-02 3.14e-02 1.90e+02
------------------------------------------------------------------
status: solved
timings: total: 1.90e+02s = setup: 2.54e-01s + solve: 1.90e+02s
lin-sys: 1.26e+02s, cones: 5.84e+01s, accel: 2.88e-01s
------------------------------------------------------------------
objective = 0.042202
------------------------------------------------------------------
This is calling SCS via YALMIP in MATLAB R2023b. The CUDA version is 12.9. Inside MATLAB, gpuDevice shows:
>> gpuDevice
ans =
CUDADevice with properties:
Name: 'NVIDIA GeForce RTX 4080 Laptop GPU'
Index: 1
ComputeCapability: '8.9'
SupportsDouble: 1
GraphicsDriverVersion: '576.02'
DriverModel: 'WDDM'
ToolkitVersion: 11.8000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152 (49.15 KB)
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 12878086144 (12.88 GB)
AvailableMemory: 11573702656 (11.57 GB)
CachePolicy: 'balanced'
MultiprocessorCount: 58
ClockRateKHz: 1665000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
However, when I tried the same procedure on a cluster with an A100 GPU, the solver did not even start: it printed only the "-------------" separator line. The only change I made was replacing the path to the CUDA folder in compile_gpu.m (from scs-matlab) with the corresponding one on the cluster.
System information:
$ cat /etc/os-release
NAME="Rocky Linux"
VERSION="8.9 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.9"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.9 (Green Obsidian)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2029-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-8"
ROCKY_SUPPORT_PRODUCT_VERSION="8.9"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.9"
On the cluster I am using MATLAB R2022b, and gpuDevice shows:
>> gpuDevice
ans =
CUDADevice with properties:
Name: 'NVIDIA A100-SXM4-40GB MIG 3g.20gb'
Index: 1
ComputeCapability: '8.0'
SupportsDouble: 1
DriverVersion: 12.5000
ToolkitVersion: 11.2000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152 (49.15 KB)
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 21072183296 (21.07 GB)
AvailableMemory: 20629553152 (20.63 GB)
MultiprocessorCount: 42
ClockRateKHz: 1410000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 0
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
I have the following modules loaded on the cluster:
$ module list
Currently Loaded Modules:
1) gmp/6.3.0-fasrc01 2) mpfr/4.2.1-fasrc01 3) mpc/1.3.1-fasrc02 4) cuda/12.4.1-fasrc01 5) gcc/14.2.0-fasrc01
I have already tried multi-core CPUs, which take far too long to converge, so GPU acceleration may be my only option.
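For context on where the time goes in the working laptop run, here is a small Python sketch of the arithmetic behind the "timings" lines in the log above. The numbers are copied from that log; the variable names and the "other" bucket (solve time not attributed to any of the three listed components) are just my own bookkeeping, not something SCS reports.

```python
# Timings copied from the SCS log of the laptop GPU run above (seconds).
solve = 1.90e+02
parts = {"lin-sys": 1.26e+02, "cones": 5.84e+01, "accel": 2.88e-01}

# Share of the solve time spent in each reported component, in percent.
shares = {name: 100.0 * t / solve for name, t in parts.items()}

# Solve time not covered by the three reported components (my own bucket).
other = solve - sum(parts.values())

for name, pct in shares.items():
    print(f"{name:8s} {parts[name]:8.2f} s  {pct:5.1f} %")
print(f"{'other':8s} {other:8.2f} s  {100.0 * other / solve:5.1f} %")
```

In that run, roughly two thirds of the solve time is spent in the linear-system step and just under one third in the cone projections.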