@apuignav

First try at serious benchmarking.

We're still quite slow, partly because creating multiple graphs takes a long time (202 seconds of user CPU time in total!):

(zfit36) [10:34]farm-gpu:~/zfit/tfphasespace/benchmark[benchmarks]$ CUDA_VISIBLE_DEVICES= python3 bench_tfphasespace.py
2019-03-12 10:35:18.778705: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-03-12 10:35:19.994733: E tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2019-03-12 10:35:19.994797: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] retrieving CUDA diagnostic information for host: farm-gpu
2019-03-12 10:35:19.994807: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:170] hostname: farm-gpu
2019-03-12 10:35:19.994851: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:194] libcuda reported version is: 410.78.0
2019-03-12 10:35:19.994891: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:198] kernel reported version is: 410.78.0
2019-03-12 10:35:19.994900: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:305] kernel version seems to match DSO: 410.78.0
Initial run (may takes more time than consequent runs)
Elapsed time: 93251.58056803048 ms
starting benchmark
Total number of generated samples 100000100
Shape of one particle momentum (4, 1000001)
Elapsed time: 70495.63611880876 ms
Time per sample: 7.049556562324313e-07 ms
CUDA_VISIBLE_DEVICES= python3 bench_tfphasespace.py  202.81s user 12.83s system 99% cpu 3:37.76 total

vs

(zfit36) [10:38]farm-gpu:~/zfit/tfphasespace/benchmark[benchmarks]$ root -q bench_tgenphasespace.cxx+

Processing bench_tgenphasespace.cxx+...
(int) 0
root -l -q bench_tgenphasespace.cxx+  26.52s user 0.11s system 97% cpu 27.206 total
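
To put the two logs side by side, here is a small sketch of the arithmetic, using only the figures printed above. The variable names are made up for illustration. It assumes the ROOT macro runs a comparable workload, which the log does not show. It also suggests that the logged "Time per sample" value is in seconds rather than ms, since elapsed ms divided by the sample count gives about 7.05e-04 ms.

```python
# Figures copied from the two console logs above; variable names are illustrative.
initial_run_ms = 93251.58056803048   # "Initial run" elapsed time
benchmark_ms = 70495.63611880876     # elapsed time after "starting benchmark"
n_samples = 100_000_100              # "Total number of generated samples"

# Shell timing lines: user CPU time and wall-clock time, in seconds.
tf_user_s, tf_wall_s = 202.81, 3 * 60 + 37.76
root_user_s, root_wall_s = 26.52, 27.206

# Per-sample cost of the benchmark phase only (initial run excluded).
per_sample_ms = benchmark_ms / n_samples
per_sample_s = per_sample_ms / 1000.0

# Share of the measured time spent in the initial run (graph creation included).
initial_fraction = initial_run_ms / (initial_run_ms + benchmark_ms)

# Slowdown relative to the ROOT run, assuming a comparable workload.
wall_ratio = tf_wall_s / root_wall_s
user_ratio = tf_user_s / root_user_s

print(f"time per sample: {per_sample_ms:.3e} ms ({per_sample_s:.3e} s)")
print(f"initial run share of measured time: {initial_fraction:.1%}")
print(f"slowdown vs ROOT: {wall_ratio:.1f}x wall, {user_ratio:.1f}x user CPU")
```

On these numbers the TensorFlow run is roughly 8 times slower than ROOT in wall-clock time, and the initial run accounts for more than half of the measured time.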
