Report your benchmark results here! #8

Open
ProjectPhysX opened this issue Sep 22, 2022 · 163 comments
Labels
help wanted extra attention is needed

Comments

@ProjectPhysX
Owner

ProjectPhysX commented Sep 22, 2022

You are welcome to report your benchmark results for the FP32/FP16S/FP16C accuracy levels here.
Numbers for AMD GPUs (GCN/RDNA/RDNA2 architectures) are especially welcome.
Thank you!

@ProjectPhysX ProjectPhysX added the help wanted extra attention is needed label Sep 22, 2022
@ibonito1

ibonito1 commented Sep 23, 2022

I'd love to add to the benchmarks list. I've got two questions:

  1. I want to benchmark a dual Epyc system (so specifically the CPUs actually). How would I do that (under Windows, but Linux would also be fine), if I have a GPU installed? It always automatically detects the GPU when running the benchmark “releases”.
  2. How to post the benchmarks? Just copy the console output in here?

Cheers!

@ProjectPhysX
Owner Author

ProjectPhysX commented Sep 24, 2022

Hi ibonito1,

OpenCL support on EPYC CPUs is a bit difficult as these are not officially supported by AMD. Being x86-64, they should work with the Intel OpenCL CPU Runtime though, or alternatively with POCL. Fingers crossed!
To run on a specific device, pass its ID as an argument in the console: ./FluidX3D.exe 2 (on Linux) or FluidX3D.exe 2 (on Windows) selects the device with ID 2, for example.
You can just copy the console output here.

Regards,
Moritz
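
For anyone collecting pasted console output into a table, a minimal Python sketch could pull out the device name, precision level, and peak MLUPs/s. The regular expressions are assumptions based on the output lines posted in this thread; they are not part of FluidX3D, and `summarize` is a hypothetical helper name.

```python
import re

# Patterns written against the console lines as they appear in this thread (assumption).
DEVICE = re.compile(r"Device Name\s*\|\s*(.+?)\s*\|")
LBM_TYPE = re.compile(r"LBM Type\s*\|\s*(D\w+ \w+) \((FP32/\w+)\)")
PEAK = re.compile(r"Peak MLUPs/s = (\d+)")

def summarize(console_text):
    """Return (device, precision, peak MLUPs/s), or None if any field is missing."""
    device = DEVICE.search(console_text)
    lbm = LBM_TYPE.search(console_text)
    peak = PEAK.search(console_text)
    if not (device and lbm and peak):
        return None
    return device.group(1), lbm.group(2), int(peak.group(1))

# Three lines taken from a benchmark posted further down in this thread.
sample = """
| Device Name | gfx1010:xnack- |
| LBM Type | D3Q19 SRT (FP32/FP16S) |
| Info: Peak MLUPs/s = 3253 |
"""
print(summarize(sample))
```

Screenshots would still need to be read by hand; this only helps with text output.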

@C-Dub2022

AMD Radeon RX 580:
image

@ProjectPhysX
Owner Author

C-Dub2022 thank you very much for the RX 580 benchmark! If you can post the FP16S and FP16C benchmarks as well, I'll add them to the readme!

@C-Dub2022

Hopefully this is helpful. Let me know if there is anything else I can do.

image
image

@MarcoAurelioFerrari

MarcoAurelioFerrari commented Oct 7, 2022

RTX 3060 12GB - v1.1

FP32-FP16C
FP32-FP16C

FP32-FP16S
FP32-FP16S

FP32-FP32
FP32-FP32

@ProjectPhysX
Owner Author

MarcoAurelioFerrari thank you!

@dongwang22

Could you please tell me how to open the visual interface of the flow domain, as described in the readme file? You said that pressing 2 turns on the velocity field, but it does not work in the benchmark case. How can I generate pictures like the ones you present on Twitter?
image

@ProjectPhysX
Owner Author

Hi dongwang22,

thanks for the benchmark! For the visual interface, uncomment #define WINDOWS_GRAPHICS and comment out #define BENCHMARK in src/defines.hpp, and uncomment for example the Taylor-Green setup in src/setup.cpp. Then compile and you should see the graphical interface, where you can toggle rendering modes with the keys 1/2/3/4. To generate videos, see the other setups: basically, write a C++ loop that repeatedly runs some LBM time steps and renders images with the corresponding methods of the LBM class.

Regards,
Moritz

@fkay1

fkay1 commented Oct 17, 2022

AMD 5700 XT

|----------------.------------------------------------------------------------|
| Device ID 0 | gfx1010:xnack- |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID | 0 |
| Device Name | gfx1010:xnack- |
| Device Vendor | Advanced Micro Devices, Inc. |
| Device Driver | 3444.0 (PAL,LC) |
| OpenCL Version | OpenCL C 2.0 |
| Compute Units | 20 at 1905 MHz (2560 cores, 9.754 TFLOPs/s) |
| Memory, Cache | 8176 MB, 16 KB global / 64 KB local |
| Buffer Limits | 6949 MB global, 7116390 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
|-----------------.-----------------------------------------------------------|
| Grid Resolution | 256 x 256 x 256 = 16777216 |
| LBM Type | D3Q19 SRT (FP32/FP32) |
| Memory Usage | CPU 272 MB, GPU 1488 MB |
| Max Alloc Size | 1216 MB |
| Time Steps | 10 |
| Kin. Viscosity | 1.00000000 |
| Relaxation Time | 3.50000000 |
| Reynolds Number | Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs | Bandwidth | Steps/s | Current Step | Time Remaining |
| 1366 | 209 GB/s | 81 | 9996 60% | 0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 1368 |

|----------------.------------------------------------------------------------|
| Device ID 0 | gfx1010:xnack- |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID | 0 |
| Device Name | gfx1010:xnack- |
| Device Vendor | Advanced Micro Devices, Inc. |
| Device Driver | 3444.0 (PAL,LC) |
| OpenCL Version | OpenCL C 2.0 |
| Compute Units | 20 at 1905 MHz (2560 cores, 9.754 TFLOPs/s) |
| Memory, Cache | 8176 MB, 16 KB global / 64 KB local |
| Buffer Limits | 6949 MB global, 7116390 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
|-----------------.-----------------------------------------------------------|
| Grid Resolution | 256 x 256 x 256 = 16777216 |
| LBM Type | D3Q19 SRT (FP32/FP16S) |
| Memory Usage | CPU 272 MB, GPU 880 MB |
| Max Alloc Size | 608 MB |
| Time Steps | 10 |
| Kin. Viscosity | 1.00000000 |
| Relaxation Time | 3.50000000 |
| Reynolds Number | Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs | Bandwidth | Steps/s | Current Step | Time Remaining |
| 3253 | 250 GB/s | 194 | 9988 80% | 0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 3253 |

|----------------.------------------------------------------------------------|
| Device ID 0 | gfx1010:xnack- |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID | 0 |
| Device Name | gfx1010:xnack- |
| Device Vendor | Advanced Micro Devices, Inc. |
| Device Driver | 3444.0 (PAL,LC) |
| OpenCL Version | OpenCL C 2.0 |
| Compute Units | 20 at 1905 MHz (2560 cores, 9.754 TFLOPs/s) |
| Memory, Cache | 8176 MB, 16 KB global / 64 KB local |
| Buffer Limits | 6949 MB global, 7116390 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
|-----------------.-----------------------------------------------------------|
| Grid Resolution | 256 x 256 x 256 = 16777216 |
| LBM Type | D3Q19 SRT (FP32/FP16C) |
| Memory Usage | CPU 272 MB, GPU 880 MB |
| Max Alloc Size | 608 MB |
| Time Steps | 10 |
| Kin. Viscosity | 1.00000000 |
| Relaxation Time | 3.50000000 |
| Reynolds Number | Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs | Bandwidth | Steps/s | Current Step | Time Remaining |
| 3044 | 234 GB/s | 181 | 9992 20% | 0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 3049 |
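
The MLUPs, Bandwidth, and Steps/s columns in these tables appear to be tied together by simple arithmetic. The sketch below checks that against the three runs above. The bytes-per-update figures (153 for FP32, 77 for FP16) are an assumption inferred from the reported numbers: 19 distributions read and written per cell, plus one flag byte.

```python
CELLS = 256 ** 3  # grid resolution used by the benchmark

# Assumed memory traffic per cell update: 19 distributions read + written, plus 1 flag byte.
BYTES_PER_UPDATE = {"FP32": 19 * 4 * 2 + 1, "FP16": 19 * 2 * 2 + 1}  # 153 and 77

def bandwidth_gb_s(mlups, precision):
    """Memory bandwidth in GB/s implied by a MLUPs/s reading."""
    return mlups * 1e6 * BYTES_PER_UPDATE[precision] / 1e9

def steps_per_s(mlups, cells=CELLS):
    """Time steps per second implied by a MLUPs/s reading."""
    return mlups * 1e6 / cells

runs = [("FP32/FP32", "FP32", 1366), ("FP32/FP16S", "FP16", 3253), ("FP32/FP16C", "FP16", 3044)]
for label, precision, mlups in runs:
    print(label, round(bandwidth_gb_s(mlups, precision)), "GB/s,", round(steps_per_s(mlups)), "steps/s")
```

Under these assumptions the results come out at 209, 250, and 234 GB/s and 81, 194, and 181 steps/s, matching the tables.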

@funlennysub

FP32/FP16C
FluidX3D-Benchmark-FP32-FP16C-Windows_GvHd6N7oB6

FP32/FP16S
FluidX3D-Benchmark-FP32-FP16S-Windows_90ejyVLfVG

FP32/FP32
FluidX3D-Benchmark-FP32-FP32-Windows_W9hOfLroLA

@nicandris

nicandris commented Oct 18, 2022

RTX 2080 SUPER
image
image
image

@ProjectPhysX ProjectPhysX pinned this issue Oct 20, 2022
@gittigittibangbang

gittigittibangbang commented Oct 22, 2022

I tried a 6900XT, but the score is lower than anticipated. The max bandwidth seems to be limited to 300GB/s, although GPUZ says it's connected via PCIe 4.0 16x and should top out at 512GB/s. The GPU clock is at 2540MHz and the memory clock at 2000MHz. GPU and memory controller loads are at 100%.

image
image
image

With the 3D Taylor-Green model and FP32/FP16S, the MLUPs/s and the bandwidth go through the roof. I'll try some other models, too. FP32/FP32 goes up to 2400 MLUPs/s and 370GB/s, with FP32/FP16C it's 9000 MLUPs/s and 700GB/s.
image

@ProjectPhysX
Owner Author

Hi gittigittibangbang, thanks for the benchmarks! Efficiency is ~60%, which is typical for AMD GPUs. Performance is limited by VRAM bandwidth only, and the RX 6800 would presumably perform exactly the same. The benchmark setup is a 256³ box that fills 1.5GB (FP32) or 0.9GB (FP16) of VRAM. The large Infinity Cache (128MB) is only a small fraction of that, so it does not significantly boost performance.
With a smaller 128³ box, however, which fills only 186MB (FP32) or 110MB (FP16), almost the entire grid fits in the cache and the effective bandwidth is much larger.
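
The footprint figures can be reproduced from the benchmark logs in this thread: 1488 MB and 880 MB for 256³ cells work out to 93 and 55 bytes per cell, if MB is read as 2^20 bytes. The per-cell layout in the sketch below (19 distributions plus FP32 density, FP32 velocity, and one flag byte) is an assumption that happens to match those totals, not something stated here.

```python
MIB = 2 ** 20
CACHE_MIB = 128  # Infinity Cache size quoted above

def bytes_per_cell(ddf_bytes):
    # Assumed layout: 19 distributions, FP32 density (4 B), FP32 velocity (12 B), 1 flag byte.
    return 19 * ddf_bytes + 4 + 12 + 1

def footprint_mib(edge, ddf_bytes):
    """VRAM footprint in MiB of a cubic grid with the given edge length."""
    return edge ** 3 * bytes_per_cell(ddf_bytes) / MIB

for edge in (256, 128):
    for label, ddf_bytes in (("FP32", 4), ("FP16", 2)):
        size = footprint_mib(edge, ddf_bytes)
        print(f"{edge}^3 {label}: {size:.0f} MiB, fits in {CACHE_MIB} MiB cache: {size <= CACHE_MIB}")
```

Only the 128³ FP16 grid fits entirely in the cache under these assumptions, and the 128³ FP32 grid fits for the most part. That is consistent with the much higher effective bandwidth seen on small grids.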

@HighDoping

Vega 8 in R7 4750G
|----------------.------------------------------------------------------------|
| Device ID 0 | gfx90c |
| Device ID 1 | gfx90c |
| Device ID 2 | gfx90c |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID | 0 |
| Device Name | gfx90c |
| Device Vendor | Advanced Micro Devices, Inc. |
| Device Driver | 3380.6 (PAL,HSAIL) |
| OpenCL Version | OpenCL C 2.0 |
| Compute Units | 8 at 2100 MHz (512 cores, 2.150 TFLOPs/s) |
| Memory, Cache | 26899 MB, 16 KB global / 32 KB local |
| Buffer Limits | 19382 MB global, 19847731 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
|-----------------.-----------------------------------------------------------|
| Grid Resolution | 256 x 256 x 256 = 16777216 |
| LBM Type | D3Q19 SRT (FP32/FP32) |
| Memory Usage | CPU 272 MB, GPU 1488 MB |
| Max Alloc Size | 1216 MB |
| Time Steps | 10 |
| Kin. Viscosity | 1.00000000 |
| Relaxation Time | 3.50000000 |
| Reynolds Number | Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs | Bandwidth | Steps/s | Current Step | Time Remaining |
| 246 | 38 GB/s | 15 | 9999 90% | 0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 263 |

|----------------.------------------------------------------------------------|
| Device ID 0 | gfx90c |
| Device ID 1 | gfx90c |
| Device ID 2 | gfx90c |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID | 0 |
| Device Name | gfx90c |
| Device Vendor | Advanced Micro Devices, Inc. |
| Device Driver | 3380.6 (PAL,HSAIL) |
| OpenCL Version | OpenCL C 2.0 |
| Compute Units | 8 at 2100 MHz (512 cores, 2.150 TFLOPs/s) |
| Memory, Cache | 26899 MB, 16 KB global / 32 KB local |
| Buffer Limits | 19382 MB global, 19847731 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
|-----------------.-----------------------------------------------------------|
| Grid Resolution | 256 x 256 x 256 = 16777216 |
| LBM Type | D3Q19 SRT (FP32/FP16S) |
| Memory Usage | CPU 272 MB, GPU 880 MB |
| Max Alloc Size | 608 MB |
| Time Steps | 10 |
| Kin. Viscosity | 1.00000000 |
| Relaxation Time | 3.50000000 |
| Reynolds Number | Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs | Bandwidth | Steps/s | Current Step | Time Remaining |
| 505 | 39 GB/s | 30 | 9998 80% | 0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 511 |

|----------------.------------------------------------------------------------|
| Device ID 0 | gfx90c |
| Device ID 1 | gfx90c |
| Device ID 2 | gfx90c |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID | 0 |
| Device Name | gfx90c |
| Device Vendor | Advanced Micro Devices, Inc. |
| Device Driver | 3380.6 (PAL,HSAIL) |
| OpenCL Version | OpenCL C 2.0 |
| Compute Units | 8 at 2100 MHz (512 cores, 2.150 TFLOPs/s) |
| Memory, Cache | 26899 MB, 16 KB global / 32 KB local |
| Buffer Limits | 19382 MB global, 19847731 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
|-----------------.-----------------------------------------------------------|
| Grid Resolution | 256 x 256 x 256 = 16777216 |
| LBM Type | D3Q19 SRT (FP32/FP16C) |
| Memory Usage | CPU 272 MB, GPU 880 MB |
| Max Alloc Size | 608 MB |
| Time Steps | 10 |
| Kin. Viscosity | 1.00000000 |
| Relaxation Time | 3.50000000 |
| Reynolds Number | Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs | Bandwidth | Steps/s | Current Step | Time Remaining |
| 466 | 36 GB/s | 28 | 9998 80% | 0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 501 |

@edmond1992

Is it possible to add a ready-to-run benchmark for macOS so we can get more results on Mac?
The test is bandwidth-limited, and Apple silicon should be good at this.
Not to mention the relatively cheap 64GB+ of VRAM, since the GPU shares the main memory.

@edmond1992

RTX3060 Laptop GPU with 12700H on ASUS ROG M16 Turbo mode (120W GPU TDP) and external laptop fan
PS C:\Software\FluidX3D> .\FluidX3D-Benchmark-FP32-FP32-Windows.exe
.-----------------------------------------------------------------------------.
| ______________ ______________ |
| \ ________ | | ________ / |
| \ \ | | | | / / |
| \ \ | | | | / / |
| \ \ | | | | / / |
| \ _.-" | | "-._/ / |
| \ .-" _ "-. / |
| .-" .-" "-. "-./ |
| .-" .-"-. "-. |
| \ v" "v / |
| \ \ / / |
| \ \ / / |
| \ \ / / |
| \ ' / |
| \ / |
| \ / |
| ' ╕ Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID 0 | NVIDIA GeForce RTX 3060 Laptop GPU |
| Device ID 1 | Intel(R) Iris(R) Xe Graphics |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID | 0 |
| Device Name | NVIDIA GeForce RTX 3060 Laptop GPU |
| Device Vendor | NVIDIA Corporation |
| Device Driver | 512.78 |
| OpenCL Version | OpenCL C 1.2 |
| Compute Units | 30 at 1425 MHz (3840 cores, 10.944 TFLOPs/s) |
| Memory, Cache | 6143 MB, 840 KB global / 48 KB local |
| Buffer Limits | 1535 MB global, 64 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
|-----------------.-----------------------------------------------------------|
| Grid Resolution | 256 x 256 x 256 = 16777216 |
| LBM Type | D3Q19 SRT (FP32/FP32) |
| Memory Usage | CPU 272 MB, GPU 1488 MB |
| Max Alloc Size | 1216 MB |
| Time Steps | 10 |
| Kin. Viscosity | 1.00000000 |
| Relaxation Time | 3.50000000 |
| Reynolds Number | Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs | Bandwidth | Steps/s | Current Step | Time Remaining |
| 2014 | 308 GB/s | 120 | 9999 90% | 0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 2019 |

PS C:\Software\FluidX3D> .\FluidX3D-Benchmark-FP32-FP16C-Windows.exe
|----------------.------------------------------------------------------------|
| Device ID 0 | NVIDIA GeForce RTX 3060 Laptop GPU |
| Device ID 1 | Intel(R) Iris(R) Xe Graphics |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID | 0 |
| Device Name | NVIDIA GeForce RTX 3060 Laptop GPU |
| Device Vendor | NVIDIA Corporation |
| Device Driver | 512.78 |
| OpenCL Version | OpenCL C 1.2 |
| Compute Units | 30 at 1425 MHz (3840 cores, 10.944 TFLOPs/s) |
| Memory, Cache | 6143 MB, 840 KB global / 48 KB local |
| Buffer Limits | 1535 MB global, 64 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
|-----------------.-----------------------------------------------------------|
| Grid Resolution | 256 x 256 x 256 = 16777216 |
| LBM Type | D3Q19 SRT (FP32/FP16C) |
| Memory Usage | CPU 272 MB, GPU 880 MB |
| Max Alloc Size | 608 MB |
| Time Steps | 10 |
| Kin. Viscosity | 1.00000000 |
| Relaxation Time | 3.50000000 |
| Reynolds Number | Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs | Bandwidth | Steps/s | Current Step | Time Remaining |
| 3523 | 271 GB/s | 210 | 9996 60% | 0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 3572 |

PS C:\Software\FluidX3D> .\FluidX3D-Benchmark-FP32-FP16S-Windows.exe
|----------------.------------------------------------------------------------|
| Device ID 0 | NVIDIA GeForce RTX 3060 Laptop GPU |
| Device ID 1 | Intel(R) Iris(R) Xe Graphics |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID | 0 |
| Device Name | NVIDIA GeForce RTX 3060 Laptop GPU |
| Device Vendor | NVIDIA Corporation |
| Device Driver | 512.78 |
| OpenCL Version | OpenCL C 1.2 |
| Compute Units | 30 at 1425 MHz (3840 cores, 10.944 TFLOPs/s) |
| Memory, Cache | 6143 MB, 840 KB global / 48 KB local |
| Buffer Limits | 1535 MB global, 64 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
|-----------------.-----------------------------------------------------------|
| Grid Resolution | 256 x 256 x 256 = 16777216 |
| LBM Type | D3Q19 SRT (FP32/FP16S) |
| Memory Usage | CPU 272 MB, GPU 880 MB |
| Max Alloc Size | 608 MB |
| Time Steps | 10 |
| Kin. Viscosity | 1.00000000 |
| Relaxation Time | 3.50000000 |
| Reynolds Number | Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs | Bandwidth | Steps/s | Current Step | Time Remaining |
| 3991 | 307 GB/s | 238 | 9989 90% | 0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 4012 |

PS C:\Software\FluidX3D>

@ProjectPhysX
Owner Author

ProjectPhysX commented Oct 23, 2022

@HAL9000COM thanks for the Vega 8 benchmarks! Quick question: Is your RAM 2x16GB DDR4-3200MT/s? And do you have an idea why the GPU shows up 3 times?

@ProjectPhysX
Owner Author

Is it possible to add a ready-to-run benchmark for macOS so we can get more results on Mac? The test is bandwidth-limited, and Apple silicon should be good at this. Not to mention the relatively cheap 64GB+ of VRAM, since the GPU shares the main memory.

@edmond1992 unfortunately I don't have a Mac, so I can't compile and add the executables for macOS. But the code should work as-is; just compile it with the third line in make.sh and you'll get the FP32 benchmark. Uncomment FP16S/FP16C in src/defines.hpp and recompile to get the other 2 benchmarks.

@edmond1992

edmond1992 commented Oct 23, 2022 via email

@HighDoping

@HAL9000COM thanks for the Vega 8 benchmarks! Quick question: Is your RAM 2x16GB DDR4-3200MT/s? And do you have an idea why the GPU shows up 3 times?

2x32GB DDR4-3200, overclocked to 3533. No idea why the GPU shows up multiple times. After a few reboots, it now shows up as two devices.

@skoz90

skoz90 commented Oct 24, 2022

image
image
image

Nvidia Quadro RTX 5000

@SLGY

SLGY commented Oct 25, 2022

GTX 1050 on an old gaming laptop. It's amazing I figured out how to even run this and get a benchmark. Now I'm going to try and figure out how to run the simulation on an STL (or similar) file. I know how to use Blender quite well, but this is my first time with Visual Studio or command line stuff. I'm so out of my depth here 😟

Screenshot (103)

@SLGY

SLGY commented Oct 25, 2022

@ProjectPhysX have now added the FP16 benchmarks

RTX 3080 Ti

Updated FP32 (was concurrently baking a fluid in Blender when I ran the last one):
FP32

FP16S:
FP16S

FP16C:
FP16C

@ProjectPhysX
Owner Author

Hi @SirWixy, thank you so much for the benchmarks! Can you post the FP16S and FP16C results too?

@gittigittibangbang

gittigittibangbang commented Oct 25, 2022

Quadro RTX 4000 below. I also tried two Xeon Gold 5218 (2x16 cores), with the FP32/FP32 benchmark they top out at 126MLUPs/s, 20GB/s and 8 steps/s. I did not have the patience to run it to the end. The speedup with GPUs is really dramatic, damn.

image
image
image

@ProjectPhysX
Owner Author

@gittigittibangbang thanks for the benchmarks! For the CPU you can just stop it with Ctrl+C after it has leveled at constant performance, and take the last MLUPs/s reading. Can you post the program header with the Xeon Gold for the specs, and performance values for FP16S and FP16C too for the Xeon? Thanks!

@gittigittibangbang

|----------------.------------------------------------------------------------|
| Device ID 0 | Quadro RTX 4000 |
| Device ID 1 | Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID | 1 |
| Device Name | Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz |
| Device Vendor | Intel(R) Corporation |
| Device Driver | 6.4.0.37 |
| OpenCL Version | OpenCL C 2.0 |
| Compute Units | 32 at 2300 MHz (16 cores, 1.178 TFLOPs/s) |
| Memory, Cache | 261766 MB, 256 KB global / 32 KB local |
| Buffer Limits | 65441 MB global, 128 KB constant |

FP32/FP32: 132MLUPs/s, 20GB/s bandwidth, 8 steps/s
FP32/FP16C: 270MLUPs/s, 21GB/s bandwidth, 16 steps/s
FP32/FP16S: 135MLUPs/s, 10GB/s bandwidth, 8 steps/s

@gryoung4727

Results for the ASUS 4070 Ti Super 16GB card, non overclocked.

cmd_pwPtwWGKbE
cmd_qIN1aBHeNd
cmd_vlamptkJpO

@mckirkus

RTX 3080 12GB edition - FP16S
image

RTX 3080 12GB edition - FP16C
image

RTX 3080 12GB edition - FP32
image

@SLGY

SLGY commented Mar 8, 2024

Here's a multi GPU (technically) result for a Tesla K80 (2 core) GPU. There's a single core K80 (12GB) result in the benchmarks, but now that we have multi GPU functionality here's the 2 core K80 (24GB) result!
FP32-FP16C
FP32-FP16S
FP32-FP32

@chconnor

|                                     \ /               FluidX3D Version 2.14 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce GTX 1060 6GB                                |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce GTX 1060 6GB                                |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 535.161.07 (Linux)                                         |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 10 at 1784 MHz (1280 cores, 4.567 TFLOPs/s)                |
| Memory, Cache  | 6064 MB, 480 KB global / 48 KB local                       |
| Buffer Limits  | 1516 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| Info: Allocating memory. This may take a few seconds.                       |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|     995 |    152 GB/s |        59 |         9997  70% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 997                                                    |

|                                     \ /               FluidX3D Version 2.14 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce GTX 1060 6GB                                |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce GTX 1060 6GB                                |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 535.161.07 (Linux)                                         |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 10 at 1784 MHz (1280 cores, 4.567 TFLOPs/s)                |
| Memory, Cache  | 6064 MB, 480 KB global / 48 KB local                       |
| Buffer Limits  | 1516 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| Info: Allocating memory. This may take a few seconds.                       |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    1924 |    148 GB/s |       115 |         9994  40% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 1925                                                   |
|                                     \ /               FluidX3D Version 2.14 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce GTX 1060 6GB                                |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce GTX 1060 6GB                                |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 535.161.07 (Linux)                                         |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 10 at 1784 MHz (1280 cores, 4.567 TFLOPs/s)                |
| Memory, Cache  | 6064 MB, 480 KB global / 48 KB local                       |
| Buffer Limits  | 1516 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| Info: Allocating memory. This may take a few seconds.                       |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    1772 |    136 GB/s |       106 |         9994  40% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 1785                                                   |

@matteocavestri

Results on AMD Radeon RX590 8GB (Running on Clover-Mesa OpenCL 1.2)

FP32
FP32

FP16C
FP16C

FP16S
FP16S

@matteocavestri

Results on AMD Radeon RX590 8GB (Running on Rusticl-Mesa OpenCL 1.2)

FP32
FP32-rusticl

FP16C
FP16C-rusticl

FP16S
FP16S-rusticl

So if you want to use an open-source OpenCL implementation (Clover or Rusticl), use Clover until Rusticl becomes faster.

Clover by default is OpenCL 1.1 conformant, but you can export:

  • CLOVER_DEVICE_VERSION_OVERRIDE=1.2
  • CLOVER_DEVICE_CLC_VERSION_OVERRIDE=1.2

to use OpenCL 1.2
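
For anyone scripting this, here is a minimal Python sketch of the same idea: copy the current environment, add the two Clover overrides named above, and hand that environment to the process being launched. The helper names `clover_env` and `run_with_clover` are made up for illustration, and the command you pass in (for example `["./FluidX3D"]`) is an assumption about where your binary lives.

```python
import os
import subprocess

def clover_env(base=None):
    """Return a copy of the environment with Clover's OpenCL 1.2 overrides set."""
    env = dict(os.environ if base is None else base)
    env["CLOVER_DEVICE_VERSION_OVERRIDE"] = "1.2"
    env["CLOVER_DEVICE_CLC_VERSION_OVERRIDE"] = "1.2"
    return env

def run_with_clover(cmd):
    """Launch cmd (a list such as ["./FluidX3D"], path assumed) with the overrides applied."""
    return subprocess.run(cmd, env=clover_env(), check=False)
```

The overrides only affect the launched process, so the rest of the shell session keeps Clover's default behaviour.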

@gitcnd

gitcnd commented May 20, 2024

ROG Strix laptop:

|                                     \ /               FluidX3D Version 2.16 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce RTX 3080 Ti Laptop GPU                      |
| Device ID    1 | Intel(R) UHD Graphics 770                                  |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce RTX 3080 Ti Laptop GPU                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 516.40 (Windows)                                           |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 58 at 1590 MHz (7424 cores, 23.608 TFLOPs/s)               |
| Memory, Cache  | 16383 MB, 1624 KB global / 48 KB local                     |
| Buffer Limits  | 4095 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| Info: Allocating memory. This may take a few seconds.                       |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    2972 |    455 GB/s |       177 |         9992  20% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 2985                                                   |

pic_2024-05-20_22 00 23_569

Interesting how my laptop's RTX 3080 Ti beats the other laptop's RTX 4080!

@ProjectPhysX
Owner Author

Hi @gitcnd, thanks a lot! Can you please add the FP16S and FP16C benchmarks too?
Almost all RTX 40 series GPUs have severely reduced memory bus width and memory bandwidth as compared to their RTX 30 predecessors, making them slower in compute applications.
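
The bandwidth column in the logs posted in this thread is consistent with MLUPs/s multiplied by a fixed number of bytes moved per cell update. A rough sketch, assuming about 153 bytes per cell for FP32/FP32 and 77 bytes per cell for FP32/FP16S and FP32/FP16C (values inferred from the numbers posted here, such as 2972 MLUPs/s at 455 GB/s, so treat them as an assumption):

```python
# Bytes moved per lattice cell update; assumed values inferred from logs in this thread.
BYTES_PER_CELL = {"FP32": 153, "FP16S": 77, "FP16C": 77}

def bandwidth_gbs(mlups, precision):
    """Estimate effective memory bandwidth in GB/s from MLUPs/s."""
    return mlups * BYTES_PER_CELL[precision] / 1000.0

def steps_per_second(mlups, cells=256**3):
    """Time steps per second for a grid with the given number of cells."""
    return mlups * 1e6 / cells

print(round(bandwidth_gbs(2972, "FP32")))   # RTX 3080 Ti Laptop GPU, FP32/FP32 run
print(round(bandwidth_gbs(5832, "FP16S")))  # same GPU, FP32/FP16S run
print(round(steps_per_second(2972)))        # 256^3 grid
```

If both precisions land near the same GB/s figure on one GPU, that suggests the run is limited by memory bandwidth rather than by compute, which fits the point about narrower memory buses.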

@gitcnd

gitcnd commented May 20, 2024

Sorry about that - here they are:

|                                     \ /               FluidX3D Version 2.16 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce RTX 3080 Ti Laptop GPU                      |
| Device ID    1 | Intel(R) UHD Graphics 770                                  |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce RTX 3080 Ti Laptop GPU                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 516.40 (Windows)                                           |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 58 at 1590 MHz (7424 cores, 23.608 TFLOPs/s)               |
| Memory, Cache  | 16383 MB, 1624 KB global / 48 KB local                     |
| Buffer Limits  | 4095 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| Info: Allocating memory. This may take a few seconds.                       |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    5832 |    449 GB/s |       348 |         9993  30% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 5908                                                   |


|                                     \ /               FluidX3D Version 2.16 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce RTX 3080 Ti Laptop GPU                      |
| Device ID    1 | Intel(R) UHD Graphics 770                                  |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce RTX 3080 Ti Laptop GPU                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 516.40 (Windows)                                           |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 58 at 1590 MHz (7424 cores, 23.608 TFLOPs/s)               |
| Memory, Cache  | 16383 MB, 1624 KB global / 48 KB local                     |
| Buffer Limits  | 4095 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| Info: Allocating memory. This may take a few seconds.                       |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    5759 |    443 GB/s |       343 |         9983  30% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 5780                                                   |

@gitcnd

gitcnd commented May 20, 2024

And just for giggles... (the slowest benchmark here so far :-)

|                                     \ /               FluidX3D Version 2.16 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce RTX 3080 Ti Laptop GPU                      |
| Device ID    1 | Intel(R) UHD Graphics 770                                  |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Intel(R) UHD Graphics 770                                  |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 31.0.101.3962 (Windows)                                    |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 32 at 1550 MHz (256 cores, 0.794 TFLOPs/s)                 |
| Memory, Cache  | 12955 MB, 1920 KB global / 64 KB local                     |
| Buffer Limits  | 4095 MB global, 4194296 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| Info: Allocating memory. This may take a few seconds.                       |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|     243 |     19 GB/s |        14 |         9999  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 246                                                    |

C:\Users\cnd\Downloads\FluidX3D>bin\FluidX3D.exe -h
Lattice Boltzmann CFD software by Dr. Moritz Lehmann
Usage:
  bin\FluidX3D.exe [OPTION...]

  -h, --help            Print help
  -x arg                X proportion factor (default: 1.0)
  -y arg                Y proportion factor (default: 1.0)
  -z arg                Z proportion factor (default: 1.0)
  -r, --resolution arg  Resolution (default: 4096)
      --re arg          Reynolds number (default: 100000.0)
  -u arg                Velocity (default: 0.1)
  -t, --time arg        Time (default: 10000)
      --scale arg       Scale (default: 0.9)
  -f, --file arg        Filename (default: input.stl)
  -a, --aoa arg         Angle of attack (default: -5.0)
      --camx arg        Camera X (default: 19.0)
      --camy arg        Camera Y (default: 19.1)
      --camz arg        Camera Z (default: 19.2)
      --camzoom arg     Camera Zoom (default: 1.0)
      --camrx arg       Camera Rotation X (default: 33.0)
      --camry arg       Camera Rotation Y (default: 42.0)
      --camfov arg      Camera Field of View (default: 68.0)
  -s, --secs arg        Seconds (default: 10.0)
  -w, --window          Enable window instead of fullscreen mode
      --wait            Wait for keypress befor ending
      --pause           Do not auto-start the simulation
  -d, --display arg     Display (default: 0,1)

@biergaizi

@gitcnd Are both DIMM slots on the laptop populated for the Intel iGPU benchmark? If not, the results would be even slower... 😄

@gitcnd

gitcnd commented May 22, 2024

Yes - everything is populated and replaced for max performance (including special low-latency RAM: I replaced the originals).

RoG Benchmark 2022-08-25

This was the fastest laptop in the world when I finished upgrading it :-)

@GiyuuTH

GiyuuTH commented Jun 4, 2024

NVIDIA RTX 6000 Ada (ECC off)

GPUFP16C
GPUFP16S
GPUFP32

and
AMD Ryzen Threadripper PRO 7995WX (not overclocked)

CPUFP16C
CPUFP16S
CPUFP32

@roktmansean

Ryzen 7 7800X3D, FP16S
pic0

Ryzen 7 7800X3D, FP16C
image

Ryzen 7 7800X3D, FP32
image

@ProjectPhysX
Owner Author

Hi @roktmansean, thanks a lot! That's the AMD Radeon Graphics iGPU. What memory speed are you running, and is it 2x 8GB dual channel?

Can you please test the CPU itself as well? I'm curious how it performs. For this, install the Intel CPU Runtime for OpenCL, and then start the executables from within CMD with the device ID:

  • in Windows Explorer, go to the bin folder (or the folder where you downloaded the benchmarks), and then type cmd in the address bar and hit Enter
  • then run
    FluidX3D-Benchmark-FP32-FP32-Windows.exe 2
    FluidX3D-Benchmark-FP32-FP16S-Windows.exe 2
    FluidX3D-Benchmark-FP32-FP16C-Windows.exe 2
    
    (you might need a device index other than 2, depending on the order in which your devices are listed)
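
A minimal sketch of running the three benchmark executables on one device from a script instead of typing each command. The executable names are the ones listed above; the helper names, and the assumption that the files sit in the current working directory, are only for illustration.

```python
import subprocess

PRECISIONS = ["FP32", "FP16S", "FP16C"]

def benchmark_command(precision, device_id):
    """Build the command line for one benchmark executable and OpenCL device ID."""
    exe = f"FluidX3D-Benchmark-FP32-{precision}-Windows.exe"
    return [exe, str(device_id)]

def run_all(device_id):
    """Run the three benchmarks one after another on the chosen device."""
    for precision in PRECISIONS:
        subprocess.run(benchmark_command(precision, device_id), check=False)
```

Calling `run_all(2)` from the folder that contains the executables would correspond to the three commands above; use whichever ID the device list at startup shows for your CPU.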

Thanks!

@roktmansean

2x 16 GB, 6400 MHz

image

image

image

@gurkanctn

Orange Pi 5, with Rockchip RK3588S, ARM based SOC.

image

@gurkanctn

Orange Pi 5, with Rockchip RK3588S, ARM based SOC.

image

I don't know how to enable running on the CPU cores, instead of the default Mali GPU.

@squareSphere29

squareSphere29 commented Aug 6, 2024

NVIDIA RTX A1000 6GB Laptop GPU

FP32-FP16C
image

FP32-FP32
image

FP32-FP16S
image

@ProjectPhysX
Owner Author

@gurkanctn can you please post Orange Pi benchmarks for FP16S and FP16C too? (Enable those in src/defines.hpp.) Thanks!
I'm not sure if there is a way to run OpenCL on ARM CPU cores. Maybe self-compiled PoCL?

@squareSphere29

NVIDIA RTX 3060 Laptop
FP32-FP32
image

FP32-FP16S
image

FP32-FP16C
image

@gurkanctn

@ProjectPhysX, I'll rerun a fair benchmark for those cases (the previous one had some background applications running) and report back.

Thanks for the software. I started playing with it, and liked it a lot.

It would be great if visualizing pressure on the walls were easier.

@gurkanctn

Orange Pi 5 benchmark results for FP32, FP16C, and FP16S.

orangepi5_FluidX3D_benchmark_FP32

orangepi5_FluidX3D_benchmark_2

orangepi5_FluidX3D_benchmark_1

@alexcode9

NVIDIA GeForce RTX 2070 SUPER

FP16C:
FP16C

FP16S:
FP16S

FP32:
FP32

@dmaienza

AMD Radeon RX 5700

FP16S:
FP32FP16S

FP16C:
FP32FP16C

FP32:
FP32FP32

@PavelBlend

AMD Radeon RX 580:

.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /               FluidX3D Version 2.19 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID    0 | Ellesmere                                                  |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Ellesmere                                                  |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3516.0 (PAL,HSAIL) (Windows)                               |
| OpenCL Version | OpenCL C 2.0                                               |
| Compute Units  | 32 at 1206 MHz (2048 cores, 4.940 TFLOPs/s)                |
| Memory, Cache  | 8192 MB, 16 KB global / 32 KB local                        |
| Buffer Limits  | 7936 MB global, 8126464 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| Info: Allocating memory. This may take a few seconds.                       |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                     10000 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 256 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    1209 |     93 GB/s |        72 |         9996 100% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 1240                                                   |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /               FluidX3D Version 2.19 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID    0 | Ellesmere                                                  |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Ellesmere                                                  |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3516.0 (PAL,HSAIL) (Windows)                               |
| OpenCL Version | OpenCL C 2.0                                               |
| Compute Units  | 32 at 1206 MHz (2048 cores, 4.940 TFLOPs/s)                |
| Memory, Cache  | 8192 MB, 16 KB global / 32 KB local                        |
| Buffer Limits  | 7936 MB global, 8126464 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| Info: Allocating memory. This may take a few seconds.                       |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                     10000 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 256 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    1596 |    123 GB/s |        95 |         9995 100% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 1622                                                   |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /               FluidX3D Version 2.19 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID    0 | Ellesmere                                                  |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Ellesmere                                                  |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3516.0 (PAL,HSAIL) (Windows)                               |
| OpenCL Version | OpenCL C 2.0                                               |
| Compute Units  | 32 at 1206 MHz (2048 cores, 4.940 TFLOPs/s)                |
| Memory, Cache  | 8192 MB, 16 KB global / 32 KB local                        |
| Buffer Limits  | 7936 MB global, 8126464 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| Info: Allocating memory. This may take a few seconds.                       |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                     10000 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 256 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|     857 |    131 GB/s |        51 |         9998 100% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 868                                                    |

@aervnu

aervnu commented Oct 6, 2024

AMD Ryzen 7 5700X3D:
FP32
cpu_fp32_32
FP16C
cpu_fp32_16c
FP16S
cpu_fp32_16s

@ProjectPhysX
Owner Author

@aervnu thanks! What memory speed are you running? 3200 MT/s?

@aervnu

aervnu commented Oct 7, 2024

> @aervnu thanks! What memory speed are you running? 3200 MT/s?

Yeah. I originally wanted to buy a 3600 kit, but those were a bit more expensive than 3200, and since I figured I wouldn't need 3600, I ended up with this one. I can still try to overclock it to 3600, if the 5700X3D's IMC blesses me.
