Closed
Description
On my NVIDIA RTX 3050 Laptop GPU with 4 GB of VRAM, the OpenCL runtime reports a max_mem_alloc_size of only 1 GB, a quarter of the total (see the reflink below). This became the bottleneck for the entire compute session (~60% GPU core load and ~60 W power draw).
As a comparison, I simply multiplied the returned max_mem_alloc_size by 3, and the GPU then reached the expected full load (~75 W).
I also modified the post program so that it reports the time taken for each status update (the "time" field in the logs below). The time per update dropped from ~25 s to ~15 s.
However, I'm not sure how to detect this problem and fix it properly, for example how to determine the minimum amount of memory that is enough to reach full load.
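One possible way to approach the "minimum amount of memory" question is to benchmark several work sizes and pick the smallest one whose step time is close to the best observed. The sketch below is only an illustration of that idea: the `Sample` type, the `smallest_saturating` function, and the tolerance parameter are hypothetical names introduced here and are not part of post-rs.

```rust
/// One benchmark sample (hypothetical type, not from post-rs):
/// the work size that was tried and the seconds taken per status update.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Sample {
    global_work_size: usize,
    seconds_per_step: f64,
}

/// Returns the smallest work size whose step time is within `tolerance`
/// (e.g. 0.05 for 5%) of the fastest sample, or `None` if `samples` is empty.
fn smallest_saturating(samples: &[Sample], tolerance: f64) -> Option<Sample> {
    // Fastest (lowest) step time among all samples; infinity when empty.
    let best = samples
        .iter()
        .map(|s| s.seconds_per_step)
        .fold(f64::INFINITY, f64::min);

    samples
        .iter()
        // Keep only samples that are "close enough" to the fastest one.
        .filter(|s| s.seconds_per_step <= best * (1.0 + tolerance))
        // Of those, prefer the one that needs the least memory.
        .min_by_key(|s| s.global_work_size)
        .copied()
}
```

Fed only the two measurements from this issue (work size 2016 at about 25.2 s and work size 6112 at about 15.5 s) with a 5% tolerance, it selects the 6112 sample. Two points are not enough to locate the actual minimum; that would need measurements at intermediate sizes.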
reflink: https://forums.developer.nvidia.com/t/cl-device-max-mem-alloc-size-incorrect/19381/11
modified repo: https://github.com/LXY1226/post-rs
before mod:
2024-03-06T21:54:37.741+0800 INFO Using provider: [GPU] NVIDIA CUDA/NVIDIA GeForce RTX 3050 Laptop GPU {"module": "scrypt_ocl", "file": "scrypt-ocl\\src\\lib.rs", "line": 367}
2024-03-06T21:54:37.741+0800 INFO device memory: 4095 MB, max_mem_alloc_size: 1023 MB, max_compute_units: 16, max_wg_size: 1024 {"module": "scrypt_ocl", "file": "scrypt-ocl\\src\\lib.rs", "line": 153}
2024-03-06T21:54:37.813+0800 INFO preferred_wg_size_multiple: 32, kernel_wg_size: 256 {"module": "scrypt_ocl", "file": "scrypt-ocl\\src\\lib.rs", "line": 192}
2024-03-06T21:54:37.813+0800 INFO Using: global_work_size: 2016, local_work_size: 32 {"module": "scrypt_ocl", "file": "scrypt-ocl\\src\\lib.rs", "line": 205}
2024-03-06T21:55:10.314+0800 INFO initialization: status {"fileIndex": 27, "currentPosition": 78643200, "remaining": 189792256, "time": 32.6346623}
2024-03-06T21:55:35.546+0800 INFO initialization: status {"fileIndex": 27, "currentPosition": 79691776, "remaining": 188743680, "time": 25.2094854}
2024-03-06T21:56:01.238+0800 INFO initialization: status {"fileIndex": 27, "currentPosition": 80740352, "remaining": 187695104, "time": 25.6902011}
...
after mod:
2024-03-06T21:34:54.195+0800 INFO Using provider: [GPU] NVIDIA CUDA/NVIDIA GeForce RTX 3050 Laptop GPU {"module": "scrypt_ocl", "file": "scrypt-ocl\\src\\lib.rs", "line": 367}
2024-03-06T21:34:54.195+0800 INFO device memory: 4095 MB, max_mem_alloc_size: 3071 MB, max_compute_units: 16, max_wg_size: 1024 {"module": "scrypt_ocl", "file": "scrypt-ocl\\src\\lib.rs", "line": 153}
2024-03-06T21:34:54.282+0800 INFO preferred_wg_size_multiple: 32, kernel_wg_size: 256 {"module": "scrypt_ocl", "file": "scrypt-ocl\\src\\lib.rs", "line": 192}
2024-03-06T21:34:54.282+0800 INFO Using: global_work_size: 6112, local_work_size: 32 {"module": "scrypt_ocl", "file": "scrypt-ocl\\src\\lib.rs", "line": 205}
2024-03-06T21:35:16.459+0800 INFO initialization: status {"fileIndex": 27, "currentPosition": 0, "remaining": 268435456, "time": 22.3284093}
2024-03-06T21:35:31.951+0800 INFO initialization: status {"fileIndex": 27, "currentPosition": 1048576, "remaining": 267386880, "time": 15.4726663}
2024-03-06T21:35:47.454+0800 INFO initialization: status {"fileIndex": 27, "currentPosition": 2097152, "remaining": 266338304, "time": 15.501459}
2024-03-06T21:36:02.920+0800 INFO initialization: status {"fileIndex": 27, "currentPosition": 3145728, "remaining": 265289728, "time": 15.4633641}
2024-03-06T21:36:18.466+0800 INFO initialization: status {"fileIndex": 27, "currentPosition": 4194304, "remaining": 264241152, "time": 15.5435014}
2024-03-06T21:36:34.016+0800 INFO initialization: status {"fileIndex": 27, "currentPosition": 5242880, "remaining": 263192576, "time": 15.5477153}
...
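The two global_work_size values in the logs above are consistent with the work size being derived from max_mem_alloc_size: divide by a per-work-item buffer size, then round down to a multiple of local_work_size. The sketch below reproduces that arithmetic. The 512 KiB per-item figure is an assumption inferred from the logged numbers, not taken from the scrypt-ocl source, and the function name is hypothetical.

```rust
const KIB: u64 = 1024;
const MIB: u64 = 1024 * KIB;

/// ASSUMPTION: each work item needs 512 KiB of device memory.
/// This value is inferred from the logs, not read from scrypt-ocl.
const ASSUMED_BYTES_PER_ITEM: u64 = 512 * KIB;

/// Largest work size that fits in `max_mem_alloc_size` bytes,
/// rounded down to a multiple of `local_work_size`.
fn global_work_size(max_mem_alloc_size: u64, bytes_per_item: u64, local_work_size: u64) -> u64 {
    // How many work items fit into a single allocation.
    let max_items = max_mem_alloc_size / bytes_per_item;
    // Round down so the global size is a whole number of work groups.
    (max_items / local_work_size) * local_work_size
}
```

With local_work_size = 32, an allocation limit of 1023 MiB gives 2016 and a limit of 3071 MiB gives 6112, matching the "before mod" and "after mod" logs. If this reading is right, the reported max_mem_alloc_size directly caps how much work is in flight, which would explain the lower GPU load before the change.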
LXY1226

