-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Integration and test of RSS code #84
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work here 😄
I've left a few comments regarding style - most of my points are applicable across the PR but I've made them in just one representative case.
The nequip_fitting
and the default parameters you have selected seem reasonable to me!
autoplex/data/common/jobs.py
Outdated
random_seed: int = None, | ||
): | ||
""" | ||
Job to sample training configurations from trajs of MD/RSS. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you consider changing instances of "trajs" to "trajectories"? To me, the former is a needless contraction, particularly in a docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sounds good! It has been changed now.
autoplex/data/common/jobs.py
Outdated
|
||
Parameters | ||
---------- | ||
selection_method : str, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you are expecting one of several options, why not type hint this with Literal["cur", "bcur", ...]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using type hint is indeed better. I have implemented that change.
autoplex/data/common/jobs.py
Outdated
- 'cur': Pure CUR selection. | ||
- 'bcur': Boltzmann flat histogram in enthalpy, then CUR. | ||
- 'random': Random selection. | ||
- 'uniform': Uniform selection. Default is None. If None, then default to random. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why default to None
here, when you could just default to "random"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let‘s make it even simpler.
"energy_label": "energy", | ||
} | ||
|
||
if bcur_params is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the user specifies bcur_param={"soap_params": {"atom_sigma": 0.5}}
- do you want to keep the rest of the default soap params? Or require the user to provide all of these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then it will only change the value of "atom_sigma". The rest of the default soap params will be kept. The user does not need to provide all parameters.
List of structures for sampling. Default is None. | ||
|
||
traj_info : list, optional | ||
List of dictionaries containing trajectory information. Each dictionary should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type hint should be list[dict]
here as a minimum I think. If you want to give more info, you could consider creating e.g. a TypedDict
to indicate to the the typing tools that you are expecting these keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. This is done by traj_info: list[dict[str, Union[str, float]]] = None
.
autoplex/data/common/utils.py
Outdated
if recursive: | ||
for element in atoms_object: | ||
if isinstance(element, Iterable) and not isinstance( | ||
element, (str, bytes, ase.atoms.Atoms, ase.Atoms) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic here is not mirrored in the docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just made some changes.
@@ -634,6 +650,9 @@ def m3gnet_fitting( | |||
max_n: int = 4, | |||
device: str = "cuda", | |||
test_equal_to_val: bool = True, | |||
ref_energy_name: str = "REF_energy", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thi nequip_fitting
functions appears to do what it says on the tin, but is quite hacky. Rather than writing a raw text file to a file, have you considered yaml dumping a dictionary of the config instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still have a few rather hacky parts in the fitting part of autoplex. Any further suggestions or pull requests would be appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the short term, I think this is fine to be merged.
In the medium term, we could consider using the functions exposed in the nequip.scripts.train/deploy
submodules to just run things from within python.
In the long term, a new fitting framework is being made that unifies a lot of these model architectures - integrating this will simplify things here significantly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very much agreed.
We should try to get a minimum viable product out first and then subsequently improve it. (I also very much hope this whole pytorch version and conflict mess between different fitting frameworks won't keep us too busy in the future. We might have to think about other installation strategies then.... )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point and agree with all views. For nequip, we definitely can switch to yaml. But for J-ACE, I don't find a good solution.
@jla-gardner, do you think the hyperparameters listed for nequip are the most important ones? Besides these, do you think there's any other hyperparameter that might have been overlooked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These hyperparameters are definitely the most important. Looks good to me!
autoplex/data/common/jobs.py
Outdated
|
||
import logging | ||
|
||
logging.basicConfig(level=logging.DEBUG, format='[%(levelname)s] %(message)s') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, this will do. In a subsequent PR we should extract this into its own file, and adopt its use across the whole repo.
Most things have been cleared up. I will check the code to see if it still runs as intended. Still, to give my part of the approval to this PR, the following (EDIT!) three and an optional fourth criteria have to be met:
|
@YuanbinLiu I agree with @QuantumChemist here on the required changes.
|
One slight addition for the unit tests: I suggest having a common test file for all fitting-related unit tests. I think adding all tests to the RSS-related fitting tests is not a perfect solution, as I would not have expected all other tests for the fitting code to be in there as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just marked one of my requests now in the pull request itself.
tests/rss/test_ml_fitting.py
Outdated
from __future__ import annotations | ||
from autoplex.fitting.common.flows import MLIPFitMaker | ||
import shutil | ||
from pathlib import Path | ||
from jobflow import run_locally | ||
|
||
|
||
def test_gap_fit_maker(test_dir, memory_jobstore): | ||
|
||
database_dir = test_dir / "fitting/rss_training_dataset/" | ||
|
||
gapfit = MLIPFitMaker().make( | ||
auto_delta=False, | ||
glue_xml=False, | ||
twob={"delta": 2.0, "cutoff": 4}, | ||
threeb={"n_sparse": 10}, | ||
preprocessing_data=False, | ||
database_dir=database_dir | ||
) | ||
|
||
responses = run_locally( | ||
gapfit, ensure_success=True, create_folders=True, store=memory_jobstore | ||
) | ||
|
||
assert Path(gapfit.output["mlip_path"].resolve(memory_jobstore)).exists() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest moving these tests to the general fitting part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion! It is done.
Another thing that is also missing that I notice now are type hints for every function.that isn't a Flow or job object. |
This could be the problem for the documentation generation, btw |
autoplex/fitting/common/utils.py
Outdated
@@ -213,7 +217,7 @@ def gap_fitting( | |||
title="Data error metrics", | |||
energy_limit=0.005, | |||
force_limit=0.1, | |||
species_list=species_list, | |||
species_list=species_list, # species list is required here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
species_list is required here to post_process GAP data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest that maybe a new argument like pairplot: bool = None is better. If pairplot is used, we can use the existing function to get species_list instead of passing it (the process is quick). Because species_list does not seem related to all model fitting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add this to our list of potential todos within an issue? Maybe to reach consensus here fast, we can keep the species_list for now and improve this later? @YuanbinLiu would this work for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, sounds sensible! Let's keep species_list at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add an Issue with your suggestion and then we can discuss this there! It can also be a more general issue if you have additional points!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Species_list will get related to all models, once such analysis plots are implemented for them as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just discuss this in a separate issue how to handle this. For now, let's be pragmatic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some docstrings missing
ref_energy_name: str = "REF_energy", | ||
ref_force_name: str = "REF_forces", | ||
ref_virial_name: str = "REF_virial", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docstrings are missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, they are added now.
ref_energy_name: str = "REF_energy", | ||
ref_force_name: str = "REF_forces", | ||
ref_virial_name: str = "REF_virial", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docstrings are missing
ref_energy_name: str = "REF_energy", | ||
ref_force_name: str = "REF_forces", | ||
ref_virial_name: str = "REF_virial", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docstrings missing
ref_energy_name: str = "REF_energy", | ||
ref_force_name: str = "REF_forces", | ||
ref_virial_name: str = "REF_virial", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docstrings missing
I have now carefully checked and adjusted the code where needed. The only issues that are left now are the ones addressed here. Of course, as always, I'm happy to help with some of the changes, like e.g. the unit tests, if it's not possible for you to restore them by one click in the IDE etc. 😄 |
I'd like to kindly remind everyone about our contribution guidelines as well regarding variable names, type-hints etc.: Guidelines for contributions
General code structure
Formatting requirements
Commit guidelines
|
I will temporarily set aside the final suggestion in this PR for now, as we will consider it in the upcoming modifications to MLIPFitMaker. |
As I am on vacation, @QuantumChemist could you shortly check if everything looks good from your perspective and if you have further suggestions with regard to the three open points? Thank you! |
@YuanbinLiu the docs are still failing. Could you fix this as well? |
This looks really good at first glance, but Yuanbin has run out of Action time. Would you prefer me to merge this PR for now and start the fixes in a new PR or should I rather merge Yuanbin's rss branch into one of my branches and push that back to this PR once the last problems are fixed? |
I can merge but the other issues need to be addressed in a subsequent pull request in the following days. Only in this way we will avoid any issues with subsequent pull requests |
@YuanbinLiu Could you please adress the open points in a subsequent pull request? Maybe using a fork from @QuantumChemist or @naik-aakash ? Thank you! |
I made a new branch https://github.com/QuantumChemist/autoplex/tree/rss_fixes and will start a new PR then tomorrow. |
Yes, for sure |
@YuanbinLiu thank you 😃 |
This PR primarily integrates the RSS code with the phonon code.
Todo list