-
Notifications
You must be signed in to change notification settings - Fork 1
Calculate SNAP descriptors #71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally ok, but I left some nits. I do want a clearer separation between which functions are API and which functions are helpers and better (or any) docstrings on the API ones.
Oh, and I didn't understand the conda merge business. This is needed to have lammps installed for the tests but not as a dep? Can we not do it like in pyiron modules to have both environment files applied sequentially? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exhaustive or detailed, just some thoughts skimming through. Type hints would make it all easier to read.
# Conflicts: # .ci_support/environment.yml # .github/workflows/unittests.yml
… internal and external functions.
While it is technically possible to apply the files sequentially, this breaks the performance of the GitHub action. Still I agree that the |
I marked all the internal functions as private functions. |
This pull request is again ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstrings are in place, names make enough sense, and tests are in place and passing; I'm not super familiar with SNAP so I'm not going to dig further into details, and those things are enough to satisfy me. There remains a trivially inefficient for loop that I already complained about that needs to be fixed, but otherwise I have only the following non-blocking and details-agnostic suggestions for you to consider:
- Per my original review comment, the functions would benefit from type hints. Additionally, some of the docstring type hints can be expanded, e.g. is
atom_types (list):
actuallyatom_types (list[str]):
- I like that the function names have verbs in them, but the difference between
calc
andget
is implicit. Ifget
functions are cheap andcalc
functions are expensive, then this seems intuitive, otherwise maybe some other choice is needed. If both are going to be used, care will be needed to have the same distinction through the whole code base (in case this isn't already done) - I'm not sold on the necessity of including
snap
in function names in thesnap
module, e.g. I think I preferfrom ...snap import ...descriptors... as ...snap_descriptors...
vs the existingfrom ...snap import ...snap_drescriptors...
. This is a matter of taste and consistency with the rest of the modules is more important. - Using big static arrays of numbers in the tests is both hard to understand and fragile to maintain. Better would be to write the code for generating some arrays such that they are forced to obey some features (symmetry, diagonality, or maybe clearly a perfect bulk fcc, or whatever) and then test for some property (e.g. contrasting (non)-uniformity of per-atom descriptors in bulk vs vacancy structures, or similar).
assert np.all( | ||
lmp_atom_ids == 1 + np.arange(num_atoms) | ||
), "LAMMPS seems to have lost atoms" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There exists a condition under which the assert will get skipped -- are you sure you want this and not a try/except?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assert statement is a way to fail early. If users run their python code with -O
to improve performance they do not get this nice error message but rather a more cryptic one when we try to read from the LAMMPS memory. So from my perspective it is a hint when things go wrong rather than a try/except condition.
I agree that we should add type hints to the whole
While the initial intention was to highlight computationally expensive functions, I agree this is inconsistent with the rest of the package so I renamed the function to always use get rather than calc.
At the moment the users are intended to use the function directly from the structure toolkit module: from ase.build import bulk
import structuretookit as stk
stk.analyse.calc_snap_descriptors_per_atom(structure=bulk("Cu", cubic=True), atom_types=['Cu']) This usage is also demonstrated in the tests in
The arrays only store the descriptor for a single atom while the structures use multiple atoms. But I am a bit reluctant to re-implement the SNAP descriptor in python just to validate the test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO type hinting can be implemented one module at a time, so I suggest going for it here and now to start the ball moving rather than delaying it to a future PR; I also don't quite follow your argument re the arrays, but this was a "tests better" rather than a "needs tests" comment so I'm not too stressed about it. Otherwise lgtm; the if clause replacement is very elegantly expressed btw.
This pull request introduces an interface to LAMMPS: https://docs.lammps.org/compute_sna_atom.html to calculate the SNAP descriptors directly from python. It primarily implements three functions:
stk.analyse.get_snap_descriptor_names()
- to get the names of the SNAP components for a given 2j_max.stk.analyse.calc_snap_descriptors_per_atom()
- to calculate the per atom SNAP descriptors, for example to analyse crystal defects.stk.analyse.calc_snap_descriptor_derivatives()
- to calculate the per atom SNAP descriptors and their derivatives to fit a machine learning potential.