-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ORM: Use pydantic to specify a schema for each ORM entity #6255
base: main
Are you sure you want to change the base?
Conversation
3848afd
to
c253259
Compare
c253259
to
fc52f41
Compare
49379d9
to
d1659b4
Compare
a7a6065
to
069bd1f
Compare
655fcf0
to
d778da8
Compare
07f3001
to
054c43c
Compare
054c43c
to
1c65856
Compare
7c7ef64
to
1e7f536
Compare
@sphuber just wondering, have you checked how this affects performance? |
This is implemented following the pydantic PR for aiida-core: aiidateam/aiida-core#6255
def serialize_computer(self, computer: Computer, _info): | ||
return computer.label | ||
|
||
# @model_validator(mode='after') # type: ignore[misc] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like some code that was forgotten
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking out this PR @agoscinski . For some background info: I commented this out on purpose for now. Ideally, this check should be there because when storing the code, this condition should be satisfied. Unfortunately, however, this now also prevents just instantiating this model that validates the check, even if we don't plan on storing it. This is problematic for the export behavior, because that would rely on instantiating a Model
that violates this uniqueness rules, even though in this case it doesn't apply.
The old CLI verdi code create core.code.installed
would also rely on something like this check to immediately return an error as soon as label
was specified in interactive mode and that label already existed for the same computer. It would then reprompt the user asking for a different label. I haven't been able to reimplement this feature in this PR so was keeping the comment as a reminder.
Safe to say that this PR is not completely done yet 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still "in progress"?
1e7f536
to
66f6096
Compare
@@ -132,6 +132,17 @@ requires-python = '>=3.9' | |||
'process.workflow.workchain' = 'aiida.orm.nodes.process.workflow.workchain:WorkChainNode' | |||
'process.workflow.workfunction' = 'aiida.orm.nodes.process.workflow.workfunction:WorkFunctionNode' | |||
|
|||
[project.entry-points.'aiida.orm'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really necessary to make orm pluggable, what is the use case of it?
a7c6c13
to
9816200
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6255 +/- ##
==========================================
+ Coverage 77.51% 77.97% +0.47%
==========================================
Files 560 566 +6
Lines 41444 42279 +835
==========================================
+ Hits 32120 32964 +844
+ Misses 9324 9315 -9 ☔ View full report in Codecov by Sentry. |
It would be nice to have some benchmarks, |
This is implemented following the pydantic PR for aiida-core: aiidateam/aiida-core#6255
* Introducing the kind checks. In principle, we can also not define the kinds, in that case it is up to the plugin to use the get_kinds() method of the StructureData to define kinds. * Starting to migrate to mkdocs * StructureData developer guide, missing: PropertyCollector and properties. * First profiling for the atomistic vs orm StructureData generation. Need more profiling for low-atoms systems, where it seems orm is better * Comment on profiling * Additional notes on profilings * Adding a last scaling test, to be commented and inserted in the github issue * Adding the automatic kind generation. Missing: deactivate this automatism. * Adding the `allow_kinds` input in the StructureData init. used to deactivate the automatic kind generation. * Adding the `to_legacy_structuredata` method in atomistc.StructureData This is provided in temporary way in order to make easier to migrate the plugin gradually to this new Data. In a first step, we should use this method anytime the plugin uses the StructureData. Then add the properties setting. * - Adding the developer docs - Adding some bugfixing for the empty StructureData init. * Added user guide with examples * Fixing #16 * This solves issue #10 This is implemented following the pydantic PR for aiida-core: aiidateam/aiida-core#6255 * New implementation, following the pydantic PR. adding the `charges` properties, in the same way as we have the cell and pbc: hard-coding the getter and setter methods. For now, no data validation is there. * Added the support to read charges also from ASE Atoms object. Added the map_kinds method, which return the mapping for the kinds as contained in the sites attribute. * Reverting back to non-pydantic implementation (Just removing the Model attribute from the StructureData class, nothing else changed) Added also the charges in the get_ase method. * Attaching the properties to the Sites. - code refactored: now `StructureData` and `Site` classes are in separated files, as well as the utils; `Kind` class is removed - `Site` class now contains the properties: `symbol`, `position`, `mass`, `charge`, `magnetization`, `kind_name`; `kind_name` should be removed. The properties are defined via the `property` decorator - `from_ase`, `to_ase` methods are added - `from_pymatgen`, `to_pymatgen` methods are added (**TOBE fixed to add also the magnetization**) - `from_file`, `to_file` methods are added. They rely on ASE `io.read` and `io.write` functions. - the `orm.StructureData` (the old one) now has a method `to_atomistic` - the `utils.py` file contains also the `to_kinds` and `get_kinds` functions, to automatically generate kinds. The user should only use the `get_kinds` function. **TOBE refined: kind names, and remove the other kinds methods from the StructureData class.** - the `StructureData` constructor now works in this way: we provide the pbc, cell. Then we should use the `append_atom` method afterwards. **TOBE discussed, I guess we cann add lists contaning site-wise value for properties** - slicing of structure data: defined the __getitem__ method, to obtain a sliced structuredata instance from the initial one. - TOBE added: `to_supercell` method. * adding the possiblity to provide lists in the constructor BUT we are gonna change this by providing the possibility to provide list of sites (each of them is a dictionary), in the same format as they are represented in the database * Moving the get_kinds and _to_kinds (now a private method) into the StructureData class. * Removed kind related methods which now are no more needed in the StructureData * Utilities for cell and sites update * New StructureData and StructureDataMutable both inheriting from StructureDataCore. Added properties: - charge - magmom Relevant added methods: - from/to ASE/Pymatgen - to_legacy - to_structuredata - to_mutable_structuredata Added single page for API tutorial. Added tests for basic functionalities of both Structure Classes. * Allowing also magmom to be provided as floats: they are then stored as [magmom,0,0] * Improving mutability and minor changes. Mutability is improved by removing all the setter methods, both in StructureDataCore and Site classes. Moreover, lists and tuples are returned as np.array with flags.writeable=False, meaning that we cannot even modify the internal arrays/lists. Other minor changes: - Now in StructureDataCore we have to define the pbc, cell and sites keywords to initialise it (and so the other subclasses), this is helpful to then document the inputs. - `get_kinds` routine fixed to work also with magmoms (arrays) - docu and tests updated with respect to these changes. * Adding type hinting and minor fixes removing mutable attribute in Site class adding a get_global_properties method in StructureDataCore, not used now but in the future to have a list of the properties in the database (not only attached to the single sites), to make easy to query wrt a property. * Improving flexibility, automatic get_kinds and pymatgen fixings. Providing full flexibility in the StructureDataMutable while preserving same data structure of StructureData. - adding the parent and index when we initialise the Site class (i.e. when we access structure.sites). This is helpful to be able to change directly properties in the StructureDataMutable class. - in the same direction, we implemented the ObservedArray class which basically allows to trigger the setter methods of arrays/lists even if we modify only one element (for example: structure.sites[0].position[0] = 5). In this way, changes are written into the structure._data, which represents the real data stored in the instance. `get_kinds` method now is stable. Fixing pymatgen problems with oxi_state and magmom. * Adding in the tutorial markdown also the get_kinds method, and minimal update of the pytests.
9816200
to
d4c4d94
Compare
prefixes = ('aiida.orm.nodes.',) | ||
prefixes = ('aiida.orm.nodes.', 'aiida.orm.core.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related to the changes of adding the aiida.orm
entry point group and making it possible to load the ORM classes through an entry point. But I will have to go back to the PR and actually test to see if this is still necessary
@property | ||
def sealed(self) -> bool: | ||
return self.base.attributes.get(self.SEALED_KEY, False) # type: ignore[attr-defined] | ||
|
||
@property | ||
def is_sealed(self) -> bool: | ||
"""Returns whether the node is sealed, i.e. whether the sealed attribute has been set to True.""" | ||
return self.base.attributes.get(self.SEALED_KEY, False) # type: ignore[attr-defined] | ||
return self.sealed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping both for backwards compatibility, or is there a different reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, keeping it for backwards compatibility. In the model, I define the field as sealed
so the (de)serialization code expects a sealed
property, which is why I add it. I could have also chosen to not add the property and just name the field is_sealed
instead. Or, we could keep sealed
as the field and then define the model_to_orm
attribute as model_to_orm: lambda model: model.is_sealed
.
SEALED_KEY = 'sealed' | ||
|
||
__qb_fields__ = [ | ||
add_field( | ||
SEALED_KEY, | ||
dtype=bool, | ||
doc='Whether the node is sealed', | ||
), | ||
] | ||
class Model(pydantic.BaseModel): | ||
sealed: bool = MetadataField(description='Whether the node is sealed') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the SEALED_KEY
still in use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is now used in the new sealed
property
@@ -375,7 +377,7 @@ def set_file(self, file, filename=None): | |||
from aiida.common.exceptions import ParsingError | |||
from aiida.common.files import md5_file, md5_from_filelike | |||
|
|||
parsed_data = parse_upf(file) | |||
parsed_data = parse_upf(file, check_filename=self.CHECK_FILENAME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
Thanks for the review @edan-bainglass . I see that most your comments are questions. This is fair enough the PR doesn't provide any description of the changes and the reasoning behind it. Since most of your questions all refer to the same concept and so would be answered by the same answer, instead of going through them one by one, I will answer it in general here. The whole point of this PR is to provide a way that any ORM class can automatically be serialized to JSON and the same instance can be recovered from that serialized JSON payload. This is implemented in the Instead of manually implementing JSON (de)serializing for all classes, we use For all of this to work, we need to be able to clean map an instance of a I have been thinking whether those changes should be in separate commits. If we take the example of the The changes to the entry point handling, and adding |
@sphuber first, thank you so much for the detailed explanation of the PR. The scope/depth of change + uncharacteristic lack of explanation was a recipe for confusion. Note that at the time of you writing the above, I was only "half"-way through my review, as I had started it quite late in the day. I'm wrapping up the review at the moment, but with the above clarifications, the PR is now mostly crystal clear! And I think you can skip PR refactoring. It all makes sense now as a whole. Thanks for implementing this! It is pertinent to my current use case (query builder web app), which requires JSON (de)serialization of nodes via REST API calls. I believe you had mentioned before when we worked on the fields PR that this would precisely be where these changes would make an impact. Looking forward to this dropping! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sphuber thanks again for implementing this. As mentioned in my other comment, it is mostly clear. Left a few comments after your post that I think would still benefit from clarification. Other than that, LGTM!
:param model_to_orm: Optional callable to convert the value of a field from a model instance to an ORM instance. | ||
:param exclude_to_orm: When set to ``True``, this field value will not be passed to the ORM entity constructor | ||
through ``Entity.from_model``. | ||
:param exclude_to_orm: When set to ``True``, this field value will not be exposed on the CLI command that is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exclude_from_cli
node: int = MetadataField( | ||
description='Node PK that the comment is attached to', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
node
as an int
type might be confusing, no? One might expect a Node
type. Was node_pk
, so I guess you had a reason to go with just node
?
user: int = MetadataField( | ||
description='The PK of the user', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Repeating a comment from comments.py
, but again, user
here is the PK, but in the __init__
, user
comes in as User
and when PK is needed -> user.pk
. What was your reasoning for dropping _pk
in the models?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another generic "problem" that arises due to the discrepancies in field types when comparing an ORM instance and its equivalent pydantic model. The Python ORM deals with instances of the ORM, i.e. an AuthInfo
takes a User
instance, even though in the database layer the link is represented by the pk of the User
instance. Similarly, when serializing an AuthInfo
to JSON, the User
field also needs to be serialized, which we do by returning its pk as an integer.
From the ORM's perspective, it makes more sense to work with ORM instances, i.e. it is easier for the user (I would argue) for the constructor to request a User
instance for the user
argument as opposed to its pk. Same goes for other ORM classes like Computer
etc. But the pydantic
model is outward facing where the User
class makes no sense. So the pydantic
model instead defines the user
field as its serialized data type, i.e. an int
. When a user will create a new AuthInfo
over a REST API, they will be specifying its user
by its pk and not through a User
instance.
This discrepancy between the types of the ORM class constructor and its pydantic model will pop up in various places. That is why these orm_class
and orm_to_model
attributes are there to allow the (de)serialization code in Entity
to convert between the two. We could choose to make it more consistent by changing the constructors to not take ORM class instances but always work with their pks. However this carries the downsides:
- it makes the ORM less intuitive to work with (IMHO)
- it is backwards incompatible
For the last point, it is perhaps possible to provide a deprecation pathway but I think this will still introduce a lot of hassle for users. That is why I opted for the current solution and keep the slight discrepancy between pydantic model and the ORM class constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally understand, though I was more commenting on the model key names - user
instead of user_pk
- and not on the types - int
vs. User
, the latter I understand from your comment 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was more commenting on the model key names - user instead of user_pk - and not on the types - int vs. User
I see. The reason I didn't name it user_pk
is because the serializer on the pydantic model will already convert it to an actual user. So if you do pydantic_comment = Comment.Model(user=1)
then assert isinstance(pydantic_comment.user, User)
will pass. In other words, the user
field on a pydantic model instance will return a User
instance and not the integer, but an integer is required to construct the model instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thanks for clarifying 🙂
@@ -88,29 +88,25 @@ def get_command(self, ctx: click.Context, cmd_name: str) -> click.Command | None | |||
command = super().get_command(ctx, cmd_name) | |||
return command | |||
|
|||
def call_command(self, ctx, cls, **kwargs): | |||
def call_command(self, ctx, cls, non_interactive, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you briefly explain non_interactive
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a generic option that is used in all interactive verdi
commands. It is the -n/--non-interactive
flag, and when specified it forces the command to never prompt, which is necessary for use in non-interactive scripts for example.
@edan-bainglass one reason why I hadn't (yet) written an extensive documentation in the commits here, is because I actually put this in an AEP. There you will find an extensive description of the goal, design, implementation and considerations. I also provide examples of how these changes make it then trivial to write a REST API around it. I provide a working example with FastAPI. As I mentioned in a comment elsewhere, I will quickly revisit this and see if the changes to the plugin loading system are still required for the REST API use case. If not, I will revert those. I would encourage yourself to also start testing this PR out with your envisioned use case to see what works and what doesn't and what might be missing. |
No description provided.