Describe the bug
I have a dataset of 800k items (768-dimensional vectors). UMAP works on the full 800k dataset and on smaller, randomly sampled subsets of around 150k, but medium-sized subsets of ~300k, ~350k, etc. crash with this error:
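For scale, here is a rough sketch of the input sizes involved. It assumes the embeddings are stored as a dense float32 array (4 bytes per value), which the report does not state. Under that assumption the inputs range from about 0.43 GiB (150k rows) to about 2.29 GiB (800k rows), so raw input size alone does not obviously explain why only the mid-sized subsets fail.

```python
# Rough memory footprint of the embedding matrix at the sizes in the report.
# ASSUMPTION: dense float32 storage (4 bytes per value); dtype is not stated.
N_DIMS = 768
BYTES_PER_VALUE = 4  # float32 (assumed)


def input_bytes(n_rows, n_dims=N_DIMS, bytes_per_value=BYTES_PER_VALUE):
    """Size in bytes of a dense n_rows x n_dims matrix."""
    return n_rows * n_dims * bytes_per_value


if __name__ == "__main__":
    for n in (150_000, 300_000, 350_000, 800_000):
        size = input_bytes(n)
        print(f"{n:>7,} rows -> {size / 2**30:.2f} GiB")
```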
Traceback (most recent call last):
  File "/opt/project/callbacks.py", line 744, in on_click
    umap_data_3D = umap_for_clustering.fit_transform(embedding_matrix)
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/internals/api_decorators.py", line 549, in inner_set_get
    ret_val = func(*args, **kwargs)
  File "cuml/manifold/umap.pyx", line 659, in cuml.manifold.umap.UMAP.fit_transform
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/internals/api_decorators.py", line 409, in inner_with_setters
    return func(*args, **kwargs)
  File "cuml/manifold/umap.pyx", line 600, in cuml.manifold.umap.UMAP.fit
RuntimeError: Error in virtual void faiss::gpu::StandardGpuResourcesImpl::initializeForDevice(int) at /home/conda/feedstock_root/build_artifacts/faiss-split_1618468126454/work/faiss/gpu/StandardGpuResources.cpp:285: Error: 'err == cudaSuccess' failed: failed to cudaHostAlloc 268435456 bytes for CPU <-> GPU async copy buffer (error 2 out of memory)
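The traceback shows the error is raised while FAISS initializes its GPU resources inside `UMAP.fit`. The failing call is `cudaHostAlloc`, which allocates page-locked (pinned) host memory rather than GPU memory, and 268435456 bytes is exactly 256 MiB. The sketch below checks that arithmetic and optionally tries a pinned allocation of the same size in the same container. It assumes CuPy is importable there (not mentioned in the report); `try_pinned_alloc` is a hypothetical helper built on `cupy.cuda.alloc_pinned_memory`.

```python
# Size of the pinned host buffer that faiss failed to allocate.
FAILED_ALLOC_BYTES = 268_435_456


def to_mib(n_bytes):
    """Convert a byte count to MiB (2**20 bytes)."""
    return n_bytes / 2**20


def try_pinned_alloc(n_bytes=FAILED_ALLOC_BYTES):
    """Try to allocate page-locked host memory of the given size.

    Returns True on success, False if the CUDA runtime rejects the request,
    and None if CuPy is not installed. ASSUMPTION: CuPy is available.
    """
    try:
        import cupy
    except ImportError:
        return None
    try:
        # Page-locked host allocation through the CUDA runtime.
        buf = cupy.cuda.alloc_pinned_memory(n_bytes)
    except cupy.cuda.runtime.CUDARuntimeError:
        return False
    del buf
    return True


if __name__ == "__main__":
    print(f"requested pinned buffer: {to_mib(FAILED_ALLOC_BYTES):.0f} MiB")
    print("pinned allocation succeeded:", try_pinned_alloc())
```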
I'm using a Titan RTX GPU with 24 GB of memory, and before calling fit_transform nvidia-smi shows more than enough free memory for this operation (3678 MiB of 24576 MiB in use):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.10       Driver Version: 510.10       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA TITAN RTX   WDDM  | 00000000:01:00.0 Off |                  N/A |
| 41%   29C    P8    10W / 280W |   3678MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A         4      C   Insufficient Permissions      N/A        |
+-----------------------------------------------------------------------------+
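The snapshot above shows 3678 MiB of 24576 MiB in use, so roughly 20898 MiB (about 20.4 GiB) of device memory is free. That figure covers GPU memory only; nvidia-smi does not report the page-locked host memory that `cudaHostAlloc` requests. A minimal sketch for logging the same device numbers programmatically right before `fit_transform`, assuming `nvidia-smi` is on the PATH inside the container (`query_gpu_memory` and `parse_memory` are hypothetical helpers):

```python
import subprocess

# nvidia-smi query for used/total device memory, in MiB, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(line):
    """Parse one 'used, total' line (MiB) into (used, total, free)."""
    used, total = (int(field) for field in line.split(","))
    return used, total, total - used


def query_gpu_memory():
    """Return a (used, total, free) MiB tuple for each visible GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_memory(line) for line in out.strip().splitlines()]


if __name__ == "__main__":
    try:
        for idx, (used, total, free) in enumerate(query_gpu_memory()):
            print(f"GPU {idx}: {used} / {total} MiB used, {free} MiB free")
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print("could not query nvidia-smi:", err)
```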
The UMAP model is created with n_components=3, n_neighbors=15 and min_dist=0.0, and applied with fit_transform.
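A minimal reproduction sketch of the setup described above. Only the UMAP parameters and subset sizes come from the report; the random `embedding_matrix`, the seeds, and the `subsample`, `cuml_available` and `run_umap` helpers are hypothetical, and random data may or may not reproduce the crash seen with the real embeddings. The cuML import is deferred so the sampling helper can be checked on a machine without a GPU.

```python
import numpy as np

# UMAP parameters and subset sizes taken from the report.
UMAP_PARAMS = {"n_components": 3, "n_neighbors": 15, "min_dist": 0.0}
SUBSET_SIZES = (150_000, 300_000, 350_000, 800_000)


def subsample(X, n_rows, seed=0):
    """Return n_rows rows of X at distinct, uniformly sampled indices."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=n_rows, replace=False)
    return X[idx]


def cuml_available():
    """True if RAPIDS cuML can be imported in this environment."""
    try:
        import cuml  # noqa: F401
    except ImportError:
        return False
    return True


def run_umap(X):
    """Fit and apply cuML UMAP with the report's parameters."""
    from cuml.manifold import UMAP  # requires RAPIDS cuML and a CUDA GPU

    umap_for_clustering = UMAP(**UMAP_PARAMS)
    return umap_for_clustering.fit_transform(X)


if __name__ == "__main__" and cuml_available():
    # HYPOTHETICAL stand-in for the real 800k x 768 embeddings.
    embedding_matrix = np.random.default_rng(42).standard_normal(
        (800_000, 768), dtype=np.float32
    )
    for n in SUBSET_SIZES:
        try:
            result = run_umap(subsample(embedding_matrix, n))
            print(f"{n:,} rows: ok, output shape {result.shape}")
        except RuntimeError as err:
            print(f"{n:,} rows: failed with {err}")
```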
Environment: the rapidsai/rapidsai:21.10-cuda11.2-base-ubuntu18.04-py3.8 Docker image, with torch==1.9.1+cu111 installed on top.
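To record exactly what is installed alongside the base image, a small sketch that prints library versions without failing when a package is missing. The `+cu111` suffix on the torch wheel denotes a build against CUDA 11.1, while the image tag says CUDA 11.2; the report does not establish whether that matters here. `module_version` and `cuda_from_torch_tag` are hypothetical helpers, not part of any of these libraries.

```python
import importlib


def module_version(name):
    """Return name.__version__, None if the module is missing, else 'unknown'."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def cuda_from_torch_tag(version):
    """Read the CUDA version from a local tag such as '1.9.1+cu111' -> '11.1'.

    Returns None when there is no '+cuXYZ' suffix (e.g. CPU-only wheels).
    """
    _, _, local = version.partition("+")
    if not local.startswith("cu") or not local[2:].isdigit():
        return None
    digits = local[2:]
    return f"{digits[:-1]}.{digits[-1]}"


if __name__ == "__main__":
    # cuml and faiss appear in the traceback; torch was added on top.
    for name in ("cuml", "faiss", "torch"):
        print(f"{name}: {module_version(name)}")
    torch_version = module_version("torch")
    if torch_version:
        print("torch built for CUDA", cuda_from_torch_tag(torch_version))
```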
Any idea why this works for the full dataset but not for intermediate-sized ones?