1BRC in GPU Python with Dask and cuDF #487
jacobtomlinson started this conversation in Show and tell
Inspired by #450 and #62, I thought I would see how things perform with the cuDF GPU-accelerated backend for Dask. This is not an official submission, as it's not Java and it also uses GPUs.
This was run on a workstation with a 12-core CPU and two NVIDIA RTX 8000 GPUs using the `rapidsai/notebooks:23.12-cuda12.0-py3.10` container image. I know this is an extravagant hardware setup compared to the MacBooks used in other tests, so take that into consideration. It would be fun to run this on GPU servers in the cloud, as well as gaming PCs and gaming laptops, to see how different GPUs perform.

🔥 `4.5 s ± 27.7 ms per loop (mean ± std. dev. of 4 runs, 3 loops each)`
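For context, the pipeline that produces that timing looks roughly like this. The input path, column names, and cluster setup are illustrative guesses rather than the exact code from the run:

```python
import dask
import dask.dataframe as dd
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# One Dask worker per GPU (two RTX 8000s on this machine)
client = Client(LocalCUDACluster())

# Back Dask DataFrames with cuDF so partitions live on the GPUs
dask.config.set({"dataframe.backend": "cudf"})

df = dd.read_csv(
    "measurements.txt",  # 1BRC input: one "station;temperature" per line
    sep=";",
    header=None,
    names=["station", "measure"],
)

# The whole challenge reduces to a single groupby-aggregate
result = df.groupby("station").measure.agg(["min", "mean", "max"]).compute()
```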
I needed to cast the DataFrame back to pandas before calling `sort_values()` because of a bug in cuDF. But sorting a ~400-row DataFrame is a small effort anyway, so I'm not sure fixing the bug would improve the time much.
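The workaround is just a cheap cast on the tiny aggregated result (`result` here refers to the cuDF DataFrame from the groupby sketch above):

```python
# ~400 rows, so copying to the CPU costs next to nothing
final = result.reset_index().to_pandas().sort_values("station")
```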
I also experimented with rewriting the data generation for the GPU and managed to get that down to 24 seconds. It starts with the station seed data and then uses cuDF and CuPy to generate the random data in big chunks and append them to the file, roughly as sketched below. This way we can generate even larger datasets, like 10 Billion Rows, quickly.
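A rough sketch of that chunked generator; the station names, chunk size, and noise model here are illustrative stand-ins, not the actual generator code:

```python
import cudf
import cupy as cp

# Stand-in seed table; the real generator uses the full 1BRC station
# list with a mean temperature per station.
seed = cudf.DataFrame({
    "station": ["Hamburg", "Bulawayo", "Palembang"],
    "mean_temp": [12.0, 8.9, 38.8],
})

rows_per_chunk = 100_000_000  # tune to available GPU memory
n_chunks = 10                 # 10 x 100M rows = 1 billion rows

with open("measurements.txt", "w") as f:
    for _ in range(n_chunks):
        # Sample stations uniformly at random, entirely on the GPU
        idx = cp.random.randint(0, len(seed), rows_per_chunk)
        chunk = seed.take(idx).reset_index(drop=True)
        # Perturb each station's mean with Gaussian noise
        noise = cudf.Series(cp.random.normal(0, 10, rows_per_chunk))
        chunk["measure"] = (chunk["mean_temp"] + noise).round(1)
        # Serialise the chunk to CSV text and append it to the file
        f.write(chunk[["station", "measure"]].to_csv(
            sep=";", header=False, index=False))
```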
I tried this data generation method with just pandas and NumPy, but it's way slower than the Java implementation.