Calculate SNAP descriptors #71

jan-janssen · 2023-09-11T09:43:53Z

This pull request introduces an interface to the LAMMPS compute sna/atom command (https://docs.lammps.org/compute_sna_atom.html) to calculate the SNAP descriptors directly from Python. It primarily implements three functions:

stk.analyse.get_snap_descriptor_names() - to get the names of the SNAP components for a given 2j_max.
stk.analyse.calc_snap_descriptors_per_atom() - to calculate the per atom SNAP descriptors, for example to analyse crystal defects.
stk.analyse.calc_snap_descriptor_derivatives() - to calculate the per atom SNAP descriptors and their derivatives to fit a machine learning potential.
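
For orientation, the number of SNAP components grows quickly with 2j_max. The sketch below enumerates the (j1, j2, j) index triples using the loop given in the linked LAMMPS compute sna/atom documentation. The function name enumerate_bispectrum_indices is hypothetical and is not part of this pull request; the names actually returned by get_snap_descriptor_names() may be formatted differently.

```python
def enumerate_bispectrum_indices(twojmax: int) -> list[tuple[float, float, float]]:
    """Return the (j1, j2, j) triples of the bispectrum components for a given 2j_max.

    Hypothetical helper: the loop follows the LAMMPS compute sna/atom
    documentation, the function itself is only an illustration.
    """
    triples = []
    for j1 in range(twojmax + 1):
        for j2 in range(j1 + 1):
            # j runs in steps of 2 so that j1 + j2 + j stays even
            for j in range(j1 - j2, min(twojmax, j1 + j2) + 1, 2):
                if j >= j1:
                    # indices are stored doubled; divide by 2 for half-integer values
                    triples.append((j1 / 2, j2 / 2, j / 2))
    return triples


for twojmax in (2, 4, 6):
    print(twojmax, len(enumerate_bispectrum_indices(twojmax)))
# prints:
# 2 5
# 4 14
# 6 30
```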

pmrv

Generally ok, but I left some nits. I do want a clearer separation between which functions are API and which functions are helpers and better (or any) docstrings on the API ones.

structuretoolkit/analyse/snap.py

pmrv · 2023-09-13T10:18:40Z

Oh, and I didn't understand the conda merge business. This is needed to have lammps installed for the tests but not as a dep? Can we not do it like in pyiron modules to have both environment files applied sequentially?

liamhuber

Not exhaustive or detailed, just some thoughts skimming through. Type hints would make it all easier to read.

structuretoolkit/analyse/snap.py

jan-janssen · 2024-03-08T21:50:10Z

> Oh, and I didn't understand the conda merge business. This is needed to have lammps installed for the tests but not as a dep? Can we not do it like in pyiron modules to have both environment files applied sequentially?

While it is technically possible to apply the files sequentially, this breaks the performance of the GitHub action. Still I agree that the conda_merge.py script is outdated and we can simply append the two environment files using tail.
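
The "append with tail" idea can be sketched in Python as follows. This is only an illustration under assumptions: the file contents and package names are made up, the second file is assumed to end in a flat dependencies: list, and the helper name append_dependencies is hypothetical. The actual workflow change is not shown in this conversation.

```python
def append_dependencies(base_text: str, extra_text: str) -> str:
    """Append the dependency entries of extra_text to base_text.

    Mimics `tail` by skipping everything up to and including the
    `dependencies:` line of the second environment file.
    """
    extra_lines = extra_text.splitlines()
    start = extra_lines.index("dependencies:") + 1
    entries = [line for line in extra_lines[start:] if line.strip()]
    return base_text.rstrip("\n") + "\n" + "\n".join(entries) + "\n"


# Made-up example contents for the two environment files
base = "channels:\n- conda-forge\ndependencies:\n- numpy\n- ase\n"
extra = "channels:\n- conda-forge\ndependencies:\n- lammps\n"

merged = append_dependencies(base, extra)
print(merged)
```

Appending yields a single environment file, so the environment can be created in one step instead of applying two files one after the other.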

jan-janssen · 2024-03-08T22:26:15Z

> Generally ok, but I left some nits. I do want a clearer separation between which functions are API and which functions are helpers and better (or any) docstrings on the API ones.

I marked all the internal functions as private functions.

jan-janssen · 2024-03-08T22:54:22Z

This pull request is again ready for review

liamhuber

Docstrings are in place, names make enough sense, and tests are in place and passing; I'm not super familiar with SNAP so I'm not going to dig further into details, and those things are enough to satisfy me. There remains a trivially inefficient for loop that I already complained about that needs to be fixed, but otherwise I have only the following non-blocking and details-agnostic suggestions for you to consider:

Per my original review comment, the functions would benefit from type hints. Additionally, some of the docstring type hints can be expanded, e.g. is atom_types (list): actually atom_types (list[str]):
I like that the function names have verbs in them, but the difference between calc and get is implicit. If get functions are cheap and calc functions are expensive, then this seems intuitive, otherwise maybe some other choice is needed. If both are going to be used, care will be needed to have the same distinction through the whole code base (in case this isn't already done)
I'm not sold on the necessity of including snap in function names in the snap module, e.g. I think I prefer from ...snap import ...descriptors... as ...snap_descriptors... vs the existing from ...snap import ...snap_descriptors.... This is a matter of taste and consistency with the rest of the modules is more important.
Using big static arrays of numbers in the tests is both hard to understand and fragile to maintain. Better would be to write the code for generating some arrays such that they are forced to obey some features (symmetry, diagonality, or maybe clearly a perfect bulk fcc, or whatever) and then test for some property (e.g. contrasting (non)-uniformity of per-atom descriptors in bulk vs vacancy structures, or similar).
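
The last suggestion can be illustrated without LAMMPS. In the sketch below a toy per-atom descriptor (the nearest-neighbour count on a periodic simple cubic lattice) stands in for the SNAP descriptors. That substitution is an assumption made purely for illustration, and all names are hypothetical. The test checks a property (uniform in bulk, non-uniform next to a vacancy) instead of comparing against stored numbers.

```python
import itertools


def simple_cubic(n: int) -> list[tuple[int, int, int]]:
    """Lattice sites of an n x n x n simple cubic supercell (lattice constant 1)."""
    return list(itertools.product(range(n), repeat=3))


def neighbor_counts(sites: list[tuple[int, int, int]], n: int) -> list[int]:
    """Toy descriptor: occupied nearest-neighbour sites per atom, periodic boundaries."""
    occupied = set(sites)
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    counts = []
    for x, y, z in sites:
        counts.append(
            sum(
                ((x + dx) % n, (y + dy) % n, (z + dz) % n) in occupied
                for dx, dy, dz in steps
            )
        )
    return counts


n = 4
bulk = simple_cubic(n)
with_vacancy = bulk[1:]  # remove the atom at (0, 0, 0)

bulk_counts = neighbor_counts(bulk, n)
vacancy_counts = neighbor_counts(with_vacancy, n)

# Property 1: every atom in the perfect crystal has the same descriptor.
assert len(set(bulk_counts)) == 1
# Property 2: the vacancy breaks the uniformity for its six neighbours only.
assert sorted(set(vacancy_counts)) == [5, 6]
assert vacancy_counts.count(5) == 6
```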

structuretoolkit/analyse/snap.py

liamhuber · 2024-03-12T19:08:59Z

structuretoolkit/analyse/snap.py

+    assert np.all(
+        lmp_atom_ids == 1 + np.arange(num_atoms)
+    ), "LAMMPS seems to have lost atoms"


There exists a condition under which the assert will get skipped -- are you sure you want this and not a try/except?

jan-janssen

The assert statement is a way to fail early. If users run their Python code with -O to improve performance, they do not get this clear error message but rather a more cryptic one when we try to read from the LAMMPS memory. So from my perspective it is a hint when things go wrong rather than a try/except condition.
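
The trade-off discussed here can be reproduced in isolation. The sketch below uses compile(..., optimize=1) to emulate running Python with -O, showing that the assert is dropped, and contrasts it with an explicit check that always runs. The helper check_atom_ids and the use of plain lists instead of NumPy arrays are assumptions for illustration, not code from this pull request.

```python
source = 'assert ids == expected, "LAMMPS seems to have lost atoms"'
namespace = {"ids": [1, 2, 4], "expected": [1, 2, 3]}

# optimize=0 keeps the assert, so the mismatch is reported early.
try:
    exec(compile(source, "<check>", "exec", optimize=0), dict(namespace))
    assert_fired = False
except AssertionError:
    assert_fired = True

# optimize=1 corresponds to `python -O`: the assert statement is removed,
# so the same mismatch passes silently.
exec(compile(source, "<check>", "exec", optimize=1), dict(namespace))


def check_atom_ids(lmp_atom_ids, num_atoms):
    """Hypothetical explicit check that is not affected by -O."""
    if list(lmp_atom_ids) != list(range(1, num_atoms + 1)):
        raise RuntimeError("LAMMPS seems to have lost atoms")


check_atom_ids([1, 2, 3], num_atoms=3)  # passes silently
try:
    check_atom_ids([1, 2, 4], num_atoms=3)
    explicit_check_fired = False
except RuntimeError:
    explicit_check_fired = True

print(assert_fired, explicit_check_fired)
```

Which variant is preferable depends on whether the check is meant as a debugging hint, as argued above, or as a guarantee that must hold in optimized runs as well.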

structuretoolkit/analyse/snap.py

jan-janssen · 2024-03-14T13:53:58Z

> Per my original review comment, the functions would benefit from type hints. Additionally, some of the docstring type hints can be expanded, e.g. is atom_types (list): actually atom_types (list[str]):

I agree that we should add type hints to the whole structuretoolkit package as well as atomistics, but I will leave this to a separate pull request for now.

> I like that the function names have verbs in them, but the difference between calc and get is implicit. If get functions are cheap and calc functions are expensive, then this seems intuitive, otherwise maybe some other choice is needed. If both are going to be used, care will be needed to have the same distinction through the whole code base (in case this isn't already done)

While the initial intention was to highlight computationally expensive functions, I agree this is inconsistent with the rest of the package, so I renamed the functions to always use get rather than calc.

> I'm not sold on the necessity of including snap in function names in the snap module, e.g. I think I prefer from ...snap import ...descriptors... as ...snap_descriptors... vs the existing from ...snap import ...snap_descriptors.... This is a matter of taste and consistency with the rest of the modules is more important.

At the moment, users are intended to call the function directly from the structuretoolkit module:

from ase.build import bulk
import structuretoolkit as stk
stk.analyse.get_snap_descriptors_per_atom(structure=bulk("Cu", cubic=True), atom_types=['Cu'])

This usage is also demonstrated in the tests in TestSNAP.

> Using big static arrays of numbers in the tests is both hard to understand and fragile to maintain. Better would be to write the code for generating some arrays such that they are forced to obey some features (symmetry, diagonality, or maybe clearly a perfect bulk fcc, or whatever) and then test for some property (e.g. contrasting (non)-uniformity of per-atom descriptors in bulk vs vacancy structures, or similar).

The arrays only store the descriptor for a single atom, while the structures contain multiple atoms. But I am a bit reluctant to re-implement the SNAP descriptor in Python just to validate the test.

liamhuber

IMO type hinting can be implemented one module at a time, so I suggest going for it here and now to start the ball moving rather than delaying it to a future PR; I also don't quite follow your argument re the arrays, but this was a "tests better" rather than a "needs tests" comment so I'm not too stressed about it. Otherwise lgtm; the if clause replacement is very elegantly expressed btw.
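
As a minimal sketch of what module-level type hinting with expanded docstring types could look like, consider the helper below. It is hypothetical and not part of snap.py; it only illustrates the atom_types (list[str]) style requested in the review.

```python
from typing import Sequence


def map_symbols_to_type_ids(symbols: Sequence[str], atom_types: list[str]) -> list[int]:
    """Map chemical symbols to 1-based type indices (hypothetical helper).

    Args:
        symbols (Sequence[str]): chemical symbol of each atom in the structure
        atom_types (list[str]): ordered list of element names, e.g. ["Al", "Cu"]

    Returns:
        list[int]: type index of each atom, starting at 1
    """
    type_ids = {element: index for index, element in enumerate(atom_types, start=1)}
    return [type_ids[symbol] for symbol in symbols]


print(map_symbols_to_type_ids(["Cu", "Al", "Cu"], atom_types=["Al", "Cu"]))
# prints: [2, 1, 2]
```

Keeping the signature annotations and the docstring types in agreement lets both readers and static checkers see that atom_types holds strings rather than arbitrary objects.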

jan-janssen added 10 commits September 11, 2023 11:36

Add calculation of SNAP descriptors

d201661

renaming

4014f0c

add lammps as optional dependency - only unix

0619bdd

black formatting

a2aec83

Add LAMMPS dependency only for unix

8ab9a86

Add shell

1f93d78

Add yaml

204779e

add pyyaml

8fc6b2c

fix order

f1130ed

install pyyaml

da7df81

jan-janssen requested a review from pmrv September 11, 2023 11:33

jan-janssen mentioned this pull request Sep 13, 2023

Structure Descriptors for Machine Learning #75

Open

pmrv requested changes Sep 13, 2023

liamhuber reviewed Sep 13, 2023

jan-janssen added 2 commits January 14, 2024 18:17

Merge remote-tracking branch 'origin/main' into snap

71b214d

# Conflicts: # .ci_support/environment.yml # .github/workflows/unittests.yml

simplify the API

b26a795

jan-janssen marked this pull request as draft February 14, 2024 13:24

jan-janssen added 8 commits March 8, 2024 13:18

Add more docstrings, refactor the order of the functions and separate…

4f99419

… internal and external functions.

Merge remote-tracking branch 'origin/main' into snap

d9dcd91

black formatting

416a9cc

remove print functions

2c9f631

fix tests

0f881a4

Move LAMMPS import to _get_default_parameters() function

37b253e

add quadratic tests

61c15bd

add more tests

b4f0038

jan-janssen added the format_black label Mar 8, 2024

pyiron-runner and others added 3 commits March 8, 2024 21:32

Format black

acb4e28

execute lammps commands with for loop

6d47420

Merge remote-tracking branch 'origin/snap' into snap

01c71b6

jan-janssen added 2 commits March 8, 2024 15:47

remove outdated conda_merge.py script

ad4da41

update lammps version

2368240

jan-janssen added 4 commits March 8, 2024 15:52

fix windows tests

6290750

Add links to LAMMPS documentation

3f1a05b

rename get_apre()

5d3f804

rename variables

7ad13a8

jan-janssen added 2 commits March 8, 2024 16:34

fix imports in test

6934cc2

add sorting on the set

a5cde35

jan-janssen marked this pull request as ready for review March 8, 2024 22:54

compare with reference

4bb4e4a

jan-janssen requested review from pmrv and liamhuber March 11, 2024 14:03

liamhuber requested changes Mar 12, 2024

Add comment about ignoring LAMMPS crashes

f6d6cd5

jan-janssen added 2 commits March 14, 2024 09:18

rename calc_*() to get_*() for consistency with other descriptors

21dc2a3

remove if statement

054c53f

liamhuber approved these changes Mar 14, 2024

Add more docstrings and type hints

f5f191b

jan-janssen added format_black and removed format_black labels Mar 14, 2024

Format black

65ece1a

jan-janssen merged commit 708ad2a into main Mar 15, 2024

jan-janssen deleted the snap branch March 15, 2024 01:23
