Underutilized on small vram nvidia cards #192

@LXY1226

On my NVIDIA RTX 3050 Laptop GPU with 4 GB of VRAM, the OpenCL runtime reports a max_mem_alloc_size of only 1 GB (a quarter of device memory)^1. This became the bottleneck for the entire compute session: the GPU stayed at ~60% core load and ~60 W of power.
As a comparison, I simply multiplied the returned memory size by 3, and the GPU then reached full load (~75 W).
At the same time I modified the post program so that it reports the time of every generation step (the size per step looks the same to me). The time per step went down from ~25 s to ~15 s, roughly 1.6x faster.
I'm not sure how to detect this problem properly and fix it, for example how to find the minimum amount of memory that is enough to reach full load.
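To make the workaround concrete, here is a minimal sketch of how the allocation limit seems to translate into the logged global_work_size. It is not the code from post-rs or from the modified repo. The helper names, the 512 KiB of scratch memory per work item, and the 512 MiB reserve are all assumptions; the 512 KiB figure was picked only because it reproduces both global_work_size values in the logs below (2016 and 6112) when rounded down to the preferred work-group multiple of 32. The cap against device memory is an addition on top of the plain "multiply by 3" change.

```rust
const MIB: u64 = 1024 * 1024;

/// ASSUMPTION: scratch memory per work item. 512 KiB is not taken from the
/// post-rs source; it is chosen because it reproduces the logged work sizes.
const ASSUMED_BYTES_PER_WORK_ITEM: u64 = 512 * 1024;

/// Hypothetical helper: scale the allocation limit reported by the driver by
/// `factor`, but never beyond total device memory minus a safety reserve.
fn scaled_alloc_limit(reported_max_alloc: u64, device_mem: u64, factor: u64, reserve: u64) -> u64 {
    let ceiling = device_mem.saturating_sub(reserve);
    reported_max_alloc.saturating_mul(factor).min(ceiling)
}

/// Hypothetical helper: how many work items fit into `alloc_limit`, rounded
/// down to a multiple of the preferred work-group size multiple.
fn global_work_size(alloc_limit: u64, wg_multiple: u64) -> u64 {
    let items = alloc_limit / ASSUMED_BYTES_PER_WORK_ITEM;
    items / wg_multiple * wg_multiple
}

fn main() {
    // Values as printed in the logs (truncated to whole MB there).
    let device_mem = 4095 * MIB;
    let reported = 1023 * MIB;

    let before = global_work_size(reported, 32);
    // Factor 3 as in the experiment; the 512 MiB reserve is an assumption.
    let scaled = scaled_alloc_limit(reported, device_mem, 3, 512 * MIB);
    let after = global_work_size(scaled, 32);

    println!("before: {before} work items, after: {after} work items");
}
```

With the logged inputs this prints 2016 work items before and 6112 after, which matches the two "Using: global_work_size" lines. If the assumption holds, the work size scales linearly with the allocation limit, so one way to look for the smallest limit that reaches full load would be to try factors between 1 and 3 and compare the time per step.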

reflink: https://forums.developer.nvidia.com/t/cl-device-max-mem-alloc-size-incorrect/19381/11
modified repo: https://github.com/LXY1226/post-rs

before mod:

2024-03-06T21:54:37.741+0800    INFO    Using provider: [GPU] NVIDIA CUDA/NVIDIA GeForce RTX 3050 Laptop GPU    {"module": "scrypt_ocl", "file": "scrypt-ocl\\src\\lib.rs", "line": 367}
2024-03-06T21:54:37.741+0800    INFO    device memory: 4095 MB, max_mem_alloc_size: 1023 MB, max_compute_units: 16, max_wg_size: 1024   {"module": "scrypt_ocl", "file": "scrypt-ocl\\src\\lib.rs", "line": 153}
2024-03-06T21:54:37.813+0800    INFO    preferred_wg_size_multiple: 32, kernel_wg_size: 256     {"module": "scrypt_ocl", "file": "scrypt-ocl\\src\\lib.rs", "line": 192}
2024-03-06T21:54:37.813+0800    INFO    Using: global_work_size: 2016, local_work_size: 32      {"module": "scrypt_ocl", "file": "scrypt-ocl\\src\\lib.rs", "line": 205}
2024-03-06T21:55:10.314+0800    INFO    initialization: status  {"fileIndex": 27, "currentPosition": 78643200, "remaining": 189792256, "time": 32.6346623}
2024-03-06T21:55:35.546+0800    INFO    initialization: status  {"fileIndex": 27, "currentPosition": 79691776, "remaining": 188743680, "time": 25.2094854}
2024-03-06T21:56:01.238+0800    INFO    initialization: status  {"fileIndex": 27, "currentPosition": 80740352, "remaining": 187695104, "time": 25.6902011}
...
image

after mod:

2024-03-06T21:34:54.195+0800    INFO    Using provider: [GPU] NVIDIA CUDA/NVIDIA GeForce RTX 3050 Laptop GPU    {"module": "scrypt_ocl", "file": "scrypt-ocl\\src\\lib.rs", "line": 367}
2024-03-06T21:34:54.195+0800    INFO    device memory: 4095 MB, max_mem_alloc_size: 3071 MB, max_compute_units: 16, max_wg_size: 1024   {"module": "scrypt_ocl", "file": "scrypt-ocl\\src\\lib.rs", "line": 153}
2024-03-06T21:34:54.282+0800    INFO    preferred_wg_size_multiple: 32, kernel_wg_size: 256     {"module": "scrypt_ocl", "file": "scrypt-ocl\\src\\lib.rs", "line": 192}
2024-03-06T21:34:54.282+0800    INFO    Using: global_work_size: 6112, local_work_size: 32      {"module": "scrypt_ocl", "file": "scrypt-ocl\\src\\lib.rs", "line": 205}
2024-03-06T21:35:16.459+0800    INFO    initialization: status  {"fileIndex": 27, "currentPosition": 0, "remaining": 268435456, "time": 22.3284093}
2024-03-06T21:35:31.951+0800    INFO    initialization: status  {"fileIndex": 27, "currentPosition": 1048576, "remaining": 267386880, "time": 15.4726663}
2024-03-06T21:35:47.454+0800    INFO    initialization: status  {"fileIndex": 27, "currentPosition": 2097152, "remaining": 266338304, "time": 15.501459}
2024-03-06T21:36:02.920+0800    INFO    initialization: status  {"fileIndex": 27, "currentPosition": 3145728, "remaining": 265289728, "time": 15.4633641}
2024-03-06T21:36:18.466+0800    INFO    initialization: status  {"fileIndex": 27, "currentPosition": 4194304, "remaining": 264241152, "time": 15.5435014}
2024-03-06T21:36:34.016+0800    INFO    initialization: status  {"fileIndex": 27, "currentPosition": 5242880, "remaining": 263192576, "time": 15.5477153}
...
image

opencl info in gpu-z:
image

card info in gpu-z:
image
