Skip to content

Calculate SNAP descriptors #71

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 37 commits into from
Mar 15, 2024
Merged

Calculate SNAP descriptors #71

merged 37 commits into from
Mar 15, 2024

Conversation

jan-janssen
Copy link
Member

@jan-janssen jan-janssen commented Sep 11, 2023

This pull request introduces an interface to LAMMPS: https://docs.lammps.org/compute_sna_atom.html to calculate the SNAP descriptors directly from python. It primarily implements three functions:

  • stk.analyse.get_snap_descriptor_names() - to get the names of the SNAP components for a given 2j_max.
  • stk.analyse.calc_snap_descriptors_per_atom() - to calculate the per atom SNAP descriptors, for example to analyse crystal defects.
  • stk.analyse.calc_snap_descriptor_derivatives() - to calculate the per atom SNAP descriptors and their derivatives to fit a machine learning potential.

Copy link
Contributor

@pmrv pmrv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally ok, but I left some nits. I do want a clearer separation between which functions are API and which functions are helpers and better (or any) docstrings on the API ones.

@pmrv
Copy link
Contributor

pmrv commented Sep 13, 2023

Oh, and I didn't understand the conda merge business. This is needed to have lammps installed for the tests but not as a dep? Can we not do it like in pyiron modules to have both environment files applied sequentially?

Copy link
Member

@liamhuber liamhuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exhaustive or detailed, just some thoughts skimming through. Type hints would make it all easier to read.

# Conflicts:
#	.ci_support/environment.yml
#	.github/workflows/unittests.yml
@jan-janssen jan-janssen marked this pull request as draft February 14, 2024 13:24
@jan-janssen
Copy link
Member Author

Oh, and I didn't understand the conda merge business. This is needed to have lammps installed for the tests but not as a dep? Can we not do it like in pyiron modules to have both environment files applied sequentially?

While it is technically possible to apply the files sequentially, this breaks the performance of the GitHub action. Still I agree that the conda_merge.py script is outdated and we can simply append the two environment files using tail.

@jan-janssen
Copy link
Member Author

Generally ok, but I left some nits. I do want a clearer separation between which functions are API and which functions are helpers and better (or any) docstrings on the API ones.

I marked all the internal functions as private functions.

@jan-janssen jan-janssen marked this pull request as ready for review March 8, 2024 22:54
@jan-janssen
Copy link
Member Author

This pull request is again ready for review

@jan-janssen jan-janssen requested review from pmrv and liamhuber March 11, 2024 14:03
Copy link
Member

@liamhuber liamhuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstrings are in place, names make enough sense, and tests are in place and passing; I'm not super familiar with SNAP so I'm not going to dig further into details, and those things are enough to satisfy me. There remains a trivially inefficient for loop that I already complained about that needs to be fixed, but otherwise I have only the following non-blocking and details-agnostic suggestions for you to consider:

  • Per my original review comment, the functions would benefit from type hints. Additionally, some of the docstring type hints can be expanded, e.g. is atom_types (list): actually atom_types (list[str]):
  • I like that the function names have verbs in them, but the difference between calc and get is implicit. If get functions are cheap and calc functions are expensive, then this seems intuitive, otherwise maybe some other choice is needed. If both are going to be used, care will be needed to have the same distinction through the whole code base (in case this isn't already done)
  • I'm not sold on the necessity of including snap in function names in the snap module, e.g. I think I prefer from ...snap import ...descriptors... as ...snap_descriptors... vs the existing from ...snap import ...snap_drescriptors.... This is a matter of taste and consistency with the rest of the modules is more important.
  • Using big static arrays of numbers in the tests is both hard to understand and fragile to maintain. Better would be to write the code for generating some arrays such that they are forced to obey some features (symmetry, diagonality, or maybe clearly a perfect bulk fcc, or whatever) and then test for some property (e.g. contrasting (non)-uniformity of per-atom descriptors in bulk vs vacancy structures, or similar).

Comment on lines +472 to +474
assert np.all(
lmp_atom_ids == 1 + np.arange(num_atoms)
), "LAMMPS seems to have lost atoms"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There exists a condition under which the assert will get skipped -- are you sure you want this and not a try/except?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert statement is a way to fail early. If users run their python code with -O to improve performance they do not get this nice error message but rather a more cryptic one when we try to read from the LAMMPS memory. So from my perspective it is a hint when things go wrong rather than a try/except condition.

@jan-janssen
Copy link
Member Author

  • Per my original review comment, the functions would benefit from type hints. Additionally, some of the docstring type hints can be expanded, e.g. is atom_types (list): actually atom_types (list[str]):

I agree that we should add type hints to the whole structuretoolkit package as well as atomistics but I leave this to a separate pull request for now.

  • I like that the function names have verbs in them, but the difference between calc and get is implicit. If get functions are cheap and calc functions are expensive, then this seems intuitive, otherwise maybe some other choice is needed. If both are going to be used, care will be needed to have the same distinction through the whole code base (in case this isn't already done)

While the initial intention was to highlight computationally expensive functions, I agree this is inconsistent with the rest of the package so I renamed the function to always use get rather than calc.

  • I'm not sold on the necessity of including snap in function names in the snap module, e.g. I think I prefer from ...snap import ...descriptors... as ...snap_descriptors... vs the existing from ...snap import ...snap_drescriptors.... This is a matter of taste and consistency with the rest of the modules is more important.

At the moment the users are intended to use the function directly from the structure toolkit module:

from ase.build import bulk
import structuretookit as stk
stk.analyse.calc_snap_descriptors_per_atom(structure=bulk("Cu", cubic=True), atom_types=['Cu'])

This usage is also demonstrated in the tests in TestSNAP.

  • Using big static arrays of numbers in the tests is both hard to understand and fragile to maintain. Better would be to write the code for generating some arrays such that they are forced to obey some features (symmetry, diagonality, or maybe clearly a perfect bulk fcc, or whatever) and then test for some property (e.g. contrasting (non)-uniformity of per-atom descriptors in bulk vs vacancy structures, or similar).

The arrays only store the descriptor for a single atom while the structures use multiple atoms. But I am a bit reluctant to re-implement the SNAP descriptor in python just to validate the test.

Copy link
Member

@liamhuber liamhuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO type hinting can be implemented one module at a time, so I suggest going for it here and now to start the ball moving rather than delaying it to a future PR; I also don't quite follow your argument re the arrays, but this was a "tests better" rather than a "needs tests" comment so I'm not too stressed about it. Otherwise lgtm; the if clause replacement is very elegantly expressed btw.

@jan-janssen jan-janssen merged commit 708ad2a into main Mar 15, 2024
@jan-janssen jan-janssen deleted the snap branch March 15, 2024 01:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants