Add optimization_type property for structures #455

Status: Open. Wants to merge 35 commits into base: develop.
Changes from 5 commits
Commits (35):
3665014
Add structure_origin to structures endpoint
rartino Feb 17, 2023
a7b7316
Improved formulations for structure_origin
rartino Feb 20, 2023
a8d77cf
Adjust terminology to use property rather than field
rartino Feb 20, 2023
8aedb8b
Apply suggestions from code review
rartino Feb 20, 2023
14c2825
Suggestions from review
rartino Feb 21, 2023
529a65c
Apply suggestions from review
rartino Feb 22, 2023
da4c43a
Edited categories and formulations for structure_orgin based on revie…
rartino Feb 22, 2023
27cde5f
Remove redundant line about null values in structure_origin
rartino Feb 22, 2023
7d93bc8
Merge branch 'develop' into structure_origin
rartino Mar 2, 2023
391629d
Update according to feedback in web meeting; remove constraint on con…
rartino Mar 17, 2023
4d6d65d
Update optimade.rst
rartino Mar 17, 2023
d7c9918
Replacing processed with derived in structure_origin
rartino Mar 17, 2023
e1c2b43
Apply suggestions from review
rartino Mar 20, 2023
55e20d1
Attempt at clarifying non-required status of "predicted"
rartino Mar 20, 2023
dc198a4
Slightly adjust wording
rartino Mar 20, 2023
43d9799
Slightly adjust wording
rartino Mar 20, 2023
a885700
Remove "extreme conditions" also in the overview and clarifiy that so…
rartino Mar 20, 2023
69a54f8
Remove another reference to extreme conditions
rartino Mar 20, 2023
6fbb752
Fix formatting error
rartino Mar 20, 2023
2ca16b4
Apply suggestions from review
rartino Mar 20, 2023
cfaceab
Remove trailing whitespace
rartino Mar 22, 2023
94e07a6
Add "indeterminate" classification
rartino Mar 22, 2023
5b96c6e
Minor grammar fix
rartino Mar 22, 2023
ef40f6e
Minor grammar fix
rartino Mar 22, 2023
7fadfc2
Minor grammar fix
rartino Mar 22, 2023
6994d2e
Fix formatting
rartino Mar 22, 2023
3e0bb62
Remove incorrect sentence about null-valued structure_origin
rartino Mar 22, 2023
bcc44fa
Complete rewrite based on review feedback
rartino Jan 10, 2024
74b459c
Minor language and grammar corrections
rartino Jan 10, 2024
0747f46
Remove trailing whitespace
rartino Jan 10, 2024
e4549fc
Merge branch 'develop' into structure_origin
rartino Jan 10, 2024
69fdc58
Remove whitespace
rartino Jan 10, 2024
4a0dcd5
Slight adjustment of a formulation
rartino Jan 10, 2024
0e324d3
Fix punctuation
rartino Jan 10, 2024
3d76751
Fix punctuation
rartino Jan 10, 2024
42 changes: 42 additions & 0 deletions optimade.rst
@@ -2846,6 +2846,48 @@ structure\_features

- A structure having implicit atoms and using assemblies: :val:`["assemblies", "implicit_atoms"]`

structure\_origin
~~~~~~~~~~~~~~~~~

- **Description**: A string that describes aspects of the origin of the structural data to indicate if it is based directly or indirectly on experimental evidence, or inferred from other sources, giving some information on whether the structure is believed to exist in nature or can be synthesized as a compound stable at non-extreme conditions.

- **Type**: string
- **Requirements/Conventions**:

- **Support**: OPTIONAL support in implementations, i.e., MAY be :val:`null`.
- **Query**: MUST be a queryable property with support for all mandatory filter features.
- SHOULD take one of the following values:

* :val:`experimental`: the structural information is a faithful representation of the outcome of an experimental technique for structure determination.
Contributor

Suggested change
- * :val:`experimental`: the structural information is a faithful representation of the outcome of an experimental technique for structure determination.
+ * :val:`observed`: the structure has been observed experimentally and there is consensus that this structure exists.

Contributor

I find the current classification scheme too vague and too poorly defined to be satisfactory as a standard for database implementations, especially for automated applications. In particular, what counts as "observed experimentally" depends very much on the field of research (e.g., in what sense was the element Ds, with a half-life below a millisecond and an abundance of only several atoms, actually observed?), and "consensus" and "structure exists" are in the eye of the beholder.

An alternative suggestion comes from our lunch-time discussion with Rickard (@rartino) at the 2023 CECAM workshop. We noticed that most of the structures we are interested in come from some sort of minimization procedure, in which the atomic coordinates are refined against a chosen target function. This target function essentially defines the physical meaning of the refined structure (and thus the conclusions we can draw from it); closeness to the optimum can be evaluated and serves as a quality criterion. The target function can be specified regardless of the minimization procedures (algorithms, programs) that were employed. The target function is unique for a given structure and specific to a given method, so it can be used for a broad classification of the structures.

Examples of such classes could be:

  • structures determined by diffraction methods (aka "experimental"): minimize the discrepancy between the observed and predicted scattering amplitudes, min sum_{h,k,l} ||F_obs| - |F_calc||; see https://dictionary.iucr.org/R_factor. Subclasses can be introduced depending on the radiation type (X-rays, neutrons, electrons);
  • DFT-calculated structures: minimize the computed ground-state energy as a function of the electron density (right?), min E(rho(x,y,z)); subclasses can be introduced depending on the exact form of the Hamiltonian and on the approximations that were used;
  • classically minimized structures: minimize an empirical energy, min E(bonds, angles, dihedrals);
  • machine-learning structures: minimize the discrepancy between the actual and predicted coordinates in the training process, minimize some other target function of the ML model, or maximize the likelihood function.

Finally, there might be structures that were not optimized in any way, and these can be designated as "fantasy structures".

Note that classes introduced in this way will be unambiguous and non-overlapping, and structures within a single class will be more easily comparable with each other than with structures from different classes.

Should we go in this direction, as a more tractable one?
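
To make the diffraction target function in the comment above concrete, here is a minimal Python sketch of the conventional R factor from the IUCr dictionary entry linked there: the summed amplitude discrepancy normalized by the summed observed amplitudes. The function name `r_factor` and the sample amplitudes are illustrative assumptions; nothing here is part of the PR or of any OPTIMADE API.

```python
from typing import Sequence


def r_factor(f_obs: Sequence[complex], f_calc: Sequence[complex]) -> float:
    """Return sum(||F_obs| - |F_calc||) / sum(|F_obs|) over all reflections.

    Hypothetical helper for illustration; the inputs are structure factors
    (or their amplitudes) listed in the same reflection order.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must list the same reflections")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    # The numerator is the target function quoted in the comment.
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


# Made-up amplitudes for three reflections.
example_r = r_factor([10.0, 20.0, 30.0], [9.0, 22.0, 30.0])
```

A lower value means the refined model reproduces the measured amplitudes more closely, which is the "closeness to the optimum" quality criterion mentioned in the comment.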

Member

I must say that I'm not sure this differs much from our current proposal, though we could certainly adopt some of your wording. Is it the PR you disagree with in its current state or just the suggestion in the comment you are replying to?

The important distinction missing from your suggestion is the mechanism for a database to indicate that it believes a structure to be globally stable when compared to other structures in the same database; whilst we can handle this with the eventual "stability" namespace, I think it would be a shame to exclude e.g., generative structures with high predicted stability probability, or phases that cannot easily be exhaustively screened for stability from our definition (the "stability" namespace would strictly boil this down to a distance from a well-defined locally completed convex hull).

Contributor

> I must say that I'm not sure this differs much from our current proposal, though we could certainly adopt some of your wording. Is it the PR you disagree with in its current state or just the suggestion in the comment you are replying to?

To me the difference is fundamental.

The suggested definitions, unlike the current PR wording, are unambiguous (IMHO) and mathematically well defined. We just need to double-check exactly which minima are meant by the different methods. I am pretty sure about experimental crystallography, less sure about the other methods.

Contributor

> The important distinction missing from your suggestion is the mechanism for a database to indicate that they believe a structure to be globally stable when compared to other structures in the same database; whilst we can handle this with the eventual "stability" namespace, I think it would be a shame to exclude e.g., generative structures with high predicted stability probability, or phases that cannot easily be exhaustively screened for stability from our definition (the "stability" namespace would strictly boil this down to a distance from a well-defined locally completed convex hull).

I would strongly suggest that stability be conveyed by another, numeric property. As you suggest, it should give the energy above the convex hull, which would make it possible to search on quantitative results.
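
As a rough illustration of the numeric stability measure discussed above, the following Python sketch computes the energy above the convex hull for a two-element (binary) system only. It assumes formation energies per atom measured relative to the pure elements, so both end members sit at zero. All names (`lower_hull`, `hull_energy`, `energies_above_hull`) and the sample data are hypothetical; this is not a proposed property definition.

```python
from typing import Dict, List, Tuple

# (fraction of element B in A(1-x)B(x), formation energy per atom, e.g. in eV)
Point = Tuple[float, float]


def lower_hull(points: List[Point]) -> List[Point]:
    """Lower convex hull of points sorted by strictly increasing x."""
    hull: List[Point] = []
    for px, py in points:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # A positive cross product means hull[-1] lies below the line
            # from hull[-2] to the new point, so it stays on the lower hull.
            cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
            if cross > 0:
                break
            hull.pop()
        hull.append((px, py))
    return hull


def hull_energy(x: float, hull: List[Point]) -> float:
    """Linearly interpolate the hull energy at composition x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return y1 + (x - x1) / (x2 - x1) * (y2 - y1)
    raise ValueError("composition outside the range [0, 1]")


def energies_above_hull(entries: List[Point]) -> List[float]:
    """Energy above the hull for each entry, in the input order."""
    # Assumption: the pure elements define the zero of formation energy.
    best: Dict[float, float] = {0.0: 0.0, 1.0: 0.0}
    for x, energy in entries:
        if not 0.0 <= x <= 1.0:
            raise ValueError("composition fraction must lie in [0, 1]")
        # Only the lowest-energy entry at each composition can be on the hull.
        best[x] = min(energy, best.get(x, energy))
    hull = lower_hull(sorted(best.items()))
    return [energy - hull_energy(x, hull) for x, energy in entries]


# Made-up data: the x = 0.25 entry lies above the tie line from 0 to 0.5.
above = energies_above_hull([(0.5, -1.0), (0.25, -0.2), (0.75, -0.6)])
```

A value of zero means the entry lies on the hull for this data set. Because the result depends on which competing phases the database contains, it is only meaningful relative to that database, which is the point raised in the quoted comment.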


* :val:`processed`: the structural information originates from experimental data, but has undergone additional processing in such a way that the result is still recognizable as the experimental structure it was based on.
For example, experimental structures relaxed using *ab initio* calculations are meant to qualify for this category.
Substituting one or more elements in a structure (while, e.g., keeping the experimental coordinates the same) is not meant to qualify for this category.
The category definition involves a degree of subjectivity that has to be decided by the database provider.

* :val:`predicted`: the structural information is not directly related to the outcome of an experimental technique on an existing material, but has undergone theoretical processing to suggest it as a candidate for a potentially synthesizable structure.
This category includes theoretically invented structures that have been relaxed using *ab initio* calculations and found to be close to the convex hull of stability, or structures generated from AI models with a demonstrated reasonable predictive power.
This category definition also involves a degree of subjectivity that has to be determined by the database provider.

* :val:`hypothetical`: the structural information is known to have been created in a way that provides no guarantees of producing synthesizable structures or structures found in nature.
This category is suitable for randomly placed atoms (e.g., meant to provide a starting point for further processing) or outcomes of AI models with predictive power deemed insufficient for the :val:`predicted` category.

* :val:`other`: the origin of the structural information is not correctly described by any of the other categories.

* :val:`unknown`: no information is available regarding these aspects of the origin of the structural information.

The experiments and predictions referred to in the above definitions of the categories refer to existence at non-extreme conditions (i.e., existence around NTP or at lower temperatures) and in a regular atmosphere.
Providers who want to communicate structural information about compounds that exist only at unusual or extreme conditions should categorize them as :val:`other` and, if desired, use another facility (e.g., a provider-specific property) to communicate more specific information.

- If the property is omitted, set to an empty string, or `null` it means the same thing as :val:`unknown`.

- Database-specific strings using a database provider prefix (e.g., `_exmpl_experimental_at_extreme_pressure`) MAY be used but are strongly discouraged.
Clients encountering unrecognized strings SHOULD treat them to mean the same as :val:`unknown`.

- **Examples**:

- For a structure entry directly encoding structural information obtained from a neutron diffraction experiment: :val:`"experimental"`.
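
The fallback rules in this version of the diff (omitted, empty, `null`, or unrecognized values are read as :val:`unknown`) and the queryability requirement can be sketched from the client side as follows. This is a minimal sketch, not a reference implementation: the helper names, the value set (taken from the diff as shown here, which later commits may have changed), and the base URL `https://example.org/optimade` are all assumptions for illustration.

```python
from typing import Optional
from urllib.parse import urlencode

# Values listed in the version of the diff shown on this page (assumption:
# later commits in the PR may rename or extend them).
KNOWN_ORIGINS = frozenset(
    {"experimental", "processed", "predicted", "hypothetical", "other", "unknown"}
)


def normalize_structure_origin(value: Optional[str]) -> str:
    """Map a served structure_origin value to one of the listed categories."""
    if value is None or value == "":
        # Omitted, null, or empty string means the same as "unknown".
        return "unknown"
    if value in KNOWN_ORIGINS:
        return value
    # Unrecognized strings, including provider-prefixed ones, are treated
    # as "unknown" by the client.
    return "unknown"


def structures_query_url(base_url: str, origin: str) -> str:
    """Build a structures query URL filtering on structure_origin."""
    if origin not in KNOWN_ORIGINS:
        raise ValueError(f"not a listed structure_origin value: {origin!r}")
    # OPTIMADE filter syntax compares a property to a double-quoted string.
    query = urlencode({"filter": f'structure_origin="{origin}"'})
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


# Hypothetical base URL, for illustration only.
url = structures_query_url("https://example.org/optimade/", "predicted")
```

Note that a server following this text could return entries with a missing or `null` value that the filter above would not match, so a client interested in :val:`unknown` entries may need a separate query for those.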

Calculations Entries
--------------------
