Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Collections endpoint #386
base: develop
Are you sure you want to change the base?
Collections endpoint #386
Changes from 6 commits
120cf12
4adae80
d9312fe
abccba0
8308a50
9172374
38c46de
c7a00dd
e817790
614bc76
5e3dc8a
ad271ed
a60c4a7
8c2c7c8
f449350
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I would define collections here, as the list is primarily external defintions of terminology that we are going to use in OPTIMADE, not terminology we are really defining ourselves (e.g. structure is not in this list)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to go down the route of some defined (optional) fields for e.g.
description
,name
, then let provider-specific fields do the rest of the work (which can then be described in/info/collections
), e.g._exmpl_dft_parameters
if the collection defines a consistent set of DFT calculations.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe
properties
instead offields
, or am I missing something?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I need an example to see what this field is used for, this just aggregates field/property names but not values? Does every entry have to have a value of each field listed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had only said "fields" (or "properties" seems fine too) instead of also including values because:
A basic example of this, which we've been using for the OpenKIM/ColabFit project, is to have a "StructuresCollection" that aggregates all of the
attributes.elements
fields of the linked structures to provide a single set of elements present in the collection. Something likestructure1.attributes.elements = ['C', 'Fe']
,structure2.attributes.elements = ['Al']
,collection.attributes.elements = ['Al', 'C', 'Fe']
. Another simple example would be to aggregateattributes.nsites
to count the total number of sites in the collection.Though it's a bit restrictive, I think that I'd say yes, every entry should have a value for each of the aggregated fields. I think that a collection should be assumed to be homogenous, but perhaps that could use some discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should update the collection every time one of the entries gets updated.
If entries get updated regularly, you could place relationships in each entry pointing to the collections they belong to. If this rarely happens, you could probably query all the collections to check whether a particular entry is in that collection and then update it. We don't have to specify how to update the data belonging to the collections in the specification though, only that it SHOULD be updated.
In some cases it could however be worth while to create a new structure rather than to update the existing one. For example, when you want a collection you refer to in an article to stay the same.
Perhaps you could make a dictionary for each Optimade property.
Which could, depending on the property, hold a set or the minimum, average and maximum value in the collection.
When making the properties for these collections I think it would be good to think about how you would search for collections.
The number of entries in your collection would probably also be a good property to include.
There is also the info endpoint where you can specify which properties are shared for each endpoint
For collections, it would be /info/collections. You therefore do not have to specify which properties are available for the collections. (I do have a field like that in the trajectories endpoint because in that case the fields do not need to be queryable.) If they are queryable you could use the IS KNOWN query to check whether an entry has the particular field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have difficulty seeing the utility of aggregated_fields being an OPTIMADE standardized property. There is of course no problem for OpenKIM serving, e.g., an
_openkim_aggregated_elements
that aggregates the values of the elements, etc.; but it just seems the definition means this field anyway needs to be interpreted differently depending on which database is being queried.