Replies: 1 comment
-
Hi Bartosz, the energy of the isolated atoms depends on the specific pseudopotential used as well as on the functional, so tabulating those would be quite a lot of work. Under normal circumstances these atomic energy calculations finish quickly (a few seconds to a few minutes), so it's usually easier to do them on the fly. What you describe is a bug that I've encountered once, on one specific cluster here in Belgium, and I haven't figured out what causes it. Heuristically, I've found that adding or removing a few MPI flags brings CP2K performance back to normal, but I don't understand why this is the case given that everything is executed within a container. What is the host OS, and which container runtime is on the host (singularity/apptainer version)? Did you modify the default MPI command in the .yaml?
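If it helps with collecting that information, here is a minimal sketch (not part of psiflow) that prints what a process inside the Slurm task can actually see. The function name `cpu_binding_report` is hypothetical. It assumes a Linux host and only reads the standard `OMP_NUM_THREADS`, `SLURM_CPUS_PER_TASK` and `SLURM_NTASKS` environment variables. One possible cause of very low CPU utilization is several MPI ranks being pinned to the same core, and that would show up here as `cpus_allowed` being much smaller than the number of cores requested per worker.

```python
import os
import platform


def cpu_binding_report(env=None):
    """Collect basic facts about CPU binding and thread settings for this process."""
    env = os.environ if env is None else env
    # CPUs this process is allowed to run on; sched_getaffinity is Linux-only,
    # so fall back to the total CPU count on other platforms.
    if hasattr(os, "sched_getaffinity"):
        allowed = len(os.sched_getaffinity(0))
    else:
        allowed = os.cpu_count()
    return {
        "kernel": platform.release(),
        "cpus_on_node": os.cpu_count(),
        "cpus_allowed": allowed,
        "OMP_NUM_THREADS": env.get("OMP_NUM_THREADS"),
        "SLURM_CPUS_PER_TASK": env.get("SLURM_CPUS_PER_TASK"),
        "SLURM_NTASKS": env.get("SLURM_NTASKS"),
    }


if __name__ == "__main__":
    for key, value in cpu_binding_report().items():
        print(f"{key}: {value}")
```

Running it once on the host and once inside the container, within the same Slurm allocation, would show whether the two see the same set of cores.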
-
I'm trying to reproduce the mof_phase_transition.py example, and I'm facing an issue where my calculations get prohibitively slow as the number of cores per worker increases. In all cases
max_walltime: 20
results in AssertionError: atomic energy calculation of O failed
because none of the CP2K tasks for oxygen complete within 20 minutes. I experimented with different numbers of cores per worker, and here are the numbers of SCF steps reached in 20 minutes for the oxygen task with multiplicity 5:
Finally, I was able to finish this part by increasing max_walltime to 180 minutes and using only 1 core per worker, but this will create another issue when ReferenceEvaluation is used for the whole MOF in the next steps.
I've never used CP2K, but I feel that 180 minutes is far too long for a single-point calculation on a single atom. I also observe surprisingly low CPU utilization of the Slurm tasks, below 10%. I checked the timings in the CP2K output, but the MPI timing doesn't seem to be that large (however, as I said, I have no experience, so maybe I'm misunderstanding something). Here is an example:
I'm using psiflow 3.0.4 and the container
oras://ghcr.io/molmod/psiflow:3.0.4_python3.10_cuda
Any idea what I could check to find where the problem is? Also, wouldn't it be better to tabulate the energies of all atoms in the psiflow source files? Thanks in advance for any help!