Note: This EOSNet package is inherited from the CGCNN framework, with the following major changes:
- Using atomic-centered Gaussian Overlap Matrix (GOM) Fingerprint vectors as atomic features
- Switch reading pymatgen structures from CIF to POSCAR
- Add `drop_last` option in `get_train_val_test_loader`
- Take data imbalance into account for classification job
- Clip `lfp` (Long FP) and `sfp` (Contracted FP) length for arbitrary crystal structures
- Add MPS support to accelerate training on macOS; for details see PyTorch MPS Backend and Apple Metal acceleration

  Note: For classification jobs you may need to modify line 227 in `WeightedRandomSampler` to `weights_tensor = torch.as_tensor(weights, dtype=torch.float32 if weights.device.type == "mps" else torch.float64)` when using the MPS backend. To maximize training efficiency with the MPS backend, you may want to use only a single CPU core (`--workers 0`) to load the dataset.
- Add `save_to_disk` option, `disable_mps` option and FP related arg options
- Introduce `IdTargetData` class to do efficient sampling on the dataset
- Update `collate_pool` to handle both `IdTargetData` and `StructData` type datasets
- Add residual and skip connection when called in `n_conv` loops
- Use `CosineAnnealingLR` instead of linear LR schedulers
- Use `get_neighbor_info` to correctly handle padding of neighbors with given cutoff
- Add `--update-bond` option to use `BondConvLayer` to update `nbr_fea`
- `CrystalGraphConvNet` has been completely restructured and renamed to `EosNet`
- Add `--attention` and related options to use `AttentionReadout` for crystal feature pooling
- Completely reform `StructData` with batch loading and processing, and add `dataset.clear_cache()` to release memory
- Move instantiation of `StructData` to `train()` and `validate()` separately instead of in `main()`
- Use `IdTargetData` for `get_train_val_test_loader()`, and get `struct_data` from `StructData` by batches
- Save the `processed_data` to multiple `npz` files under the `saved_npz_files` directory instead of one big file
- Save both `train_results.csv` and `test_results.csv` at the end of training
- Switch from the Python3 implementation of the Fingerprint Library to the C implementation to improve speed.
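The class-imbalance item in the list above can be illustrated with inverse-frequency sample weights, a common way to feed a sampler such as PyTorch's `WeightedRandomSampler`. This is a minimal sketch in plain Python and not necessarily the weighting EOSNet uses; the function name `inverse_frequency_weights` and the toy labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one sampling weight per sample: 1 / (number of samples in its class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# Toy classification targets (made up): class 0 is four times as common as class 1.
labels = [0, 0, 0, 0, 1]
weights = inverse_frequency_weights(labels)

# With these weights, every class carries the same total sampling weight.
print(weights)
```

Weights like these could be passed as the `weights` argument of `torch.utils.data.WeightedRandomSampler`, so that rare classes are drawn about as often as common ones.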
(Optional) Modify the `setup.py` in `fplib` if you use `conda` to install LAPACK:

    lapack_dir=["$CONDA_PREFIX/lib"]
    lapack_lib=['openblas']
    extra_link_args = ["-Wl,-rpath,$CONDA_PREFIX/lib"]
    ...
    include_dirs = [source_dir, "$CONDA_PREFIX/include"]

If you use `brew` to install LAPACK:

    lapack_dir=["$HOMEBREW_PREFIX/opt/openblas/lib"]
    lapack_lib=['openblas']
    extra_link_args = ['-framework', 'Accelerate', "-Wl,-rpath,$HOMEBREW_PREFIX/opt/openblas/lib"]
    ...
    include_dirs = [source_dir, "$HOMEBREW_PREFIX/opt/openblas/include"]

You probably need to modify your `~/.bashrc` file for the compiler to find the correct LAPACK library:

    # If you use `conda install conda-forge::lapack`
    export DYLD_LIBRARY_PATH="$CONDA_PREFIX/lib:$DYLD_LIBRARY_PATH"
    # If you use `brew install openblas`
    export CFLAGS="-I/opt/homebrew/opt/openblas/include $CFLAGS"
    export LDFLAGS="-L/opt/homebrew/opt/openblas/lib $LDFLAGS"

Then install the Fingerprint library (snapshot of `fplib-3.1.2`):

    conda create -n fplibenv python=3.10 pip ; conda activate fplibenv
    python3 -m pip install -U pip setuptools wheel
    git clone https://github.com/Tack-Tau/fplib.git
    cd fplib ; git checkout fplib_3.1.2
    python3 -m pip install .

For the remaining EOSNet dependencies, follow the original instructions.

Note: Currently only `lmax=0` is supported in the C version.
This package is based on Crystal Graph Convolutional Neural Networks, which take an arbitrary crystal structure and predict material properties.
The package provides two major functions:
- Train an EOSNet model with a customized dataset.
- Predict material properties of new crystals with a pre-trained EOSNet model.
This package requires:
- fplib
- PyTorch
- scikit-learn
- pymatgen
- ASE
- Numba (no longer needed, since we are switching from `fplib3` to `fplib_c`)
If you are new to Python, please use conda to manage Python packages and environments.

    conda activate fplibenv
    python3 -m pip install "numpy>=1.21.4" "scipy>=1.8.0" ase==3.22.1
    python3 -m pip install scikit-learn torch==2.2.2 torchvision==0.17.2 pymatgen==2024.3.1

The version specifiers containing `>=` are quoted so that the shell does not treat `>` as an output redirection. The above environment has been tested and found stable on both M-chip macOS and CentOS clusters.
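After installation, it can help to confirm that the required packages are visible to the active interpreter. The sketch below uses only the standard library; the helper name `missing_modules` is hypothetical, and the list uses import names (scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the packages listed above (scikit-learn imports as "sklearn").
REQUIRED_MODULES = ["fplib", "torch", "sklearn", "pymatgen", "ase"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

This only checks that each module can be located; it does not verify versions or that the compiled `fplib` extension links correctly against LAPACK.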
To catch erroneous POSCAR files, you can use the following `check_fp.py` in the `root_dir`:
    #!/usr/bin/env python3
    import os
    from functools import reduce

    import numpy as np
    import fplib
    from ase.io import read as ase_read


    def get_ixyz(lat, cutoff):
        """Estimate how many periodic images per axis to scan for a given cutoff."""
        lat = np.ascontiguousarray(lat)
        lat2 = np.dot(lat, np.transpose(lat))
        vec = np.linalg.eigvals(lat2)
        ixyz = int(np.sqrt(1.0/max(vec))*cutoff) + 1
        ixyz = np.int32(ixyz)
        return ixyz


    def check_n_sphere(rxyz, lat, cutoff, natx):
        """Raise ValueError if any atom has more than natx atoms within the cutoff sphere."""
        ixyzf = get_ixyz(lat, cutoff)
        ixyz = int(ixyzf) + 1
        nat = len(rxyz)
        cutoff2 = cutoff**2

        for iat in range(nat):
            xi, yi, zi = rxyz[iat]
            n_sphere = 0
            for jat in range(nat):
                # Loop over periodic images of atom jat
                for ix in range(-ixyz, ixyz+1):
                    for iy in range(-ixyz, ixyz+1):
                        for iz in range(-ixyz, ixyz+1):
                            xj = rxyz[jat][0] + ix*lat[0][0] + iy*lat[1][0] + iz*lat[2][0]
                            yj = rxyz[jat][1] + ix*lat[0][1] + iy*lat[1][1] + iz*lat[2][1]
                            zj = rxyz[jat][2] + ix*lat[0][2] + iy*lat[1][2] + iz*lat[2][2]
                            d2 = (xj-xi)**2 + (yj-yi)**2 + (zj-zi)**2
                            if d2 <= cutoff2:
                                n_sphere += 1
                                if n_sphere > natx:
                                    raise ValueError()


    def read_types(cell_file):
        """Read per-atom type indices (1-based) from the atom-count line of a POSCAR file."""
        buff = []
        with open(cell_file) as f:
            for line in f:
                buff.append(line.split())
        try:
            typt = np.array(buff[5], int)
        except ValueError:
            # Line 6 holds element symbols (VASP 5 format); the counts are on the next line
            del(buff[5])
            typt = np.array(buff[5], int)
        types = []
        for i in range(len(typt)):
            types += [i+1]*typt[i]
        types = np.array(types, int)
        return types


    if __name__ == "__main__":
        current_dir = './'
        for filename in os.listdir(current_dir):
            f = os.path.join(current_dir, filename)
            if os.path.isfile(f) and os.path.splitext(f)[-1].lower() == '.vasp':
                atoms = ase_read(f)
                lat = atoms.cell[:]
                rxyz = atoms.get_positions()
                chem_nums = list(atoms.numbers)
                # Unique atomic numbers, in order of first appearance
                znucl_list = reduce(lambda re, x: re+[x] if x not in re else re, chem_nums, [])
                typ = len(znucl_list)
                znucl = np.array(znucl_list, int)
                types = read_types(f)
                cell = (lat, rxyz, types, znucl)

                natx = int(256)
                lmax = int(0)
                cutoff = np.float64(int(np.sqrt(8.0))*3) # Shorter cutoff for GOM

                # Too many atoms inside the cutoff sphere
                try:
                    check_n_sphere(rxyz, lat, cutoff, natx)
                except ValueError:
                    print(str(filename) + " is glitchy !")

                # Atom counts or element types inconsistent with the POSCAR header
                if len(rxyz) != len(types) or len(set(types)) != len(znucl):
                    print(str(filename) + " is glitchy !")
                else:
                    fp = fplib.get_lfp(cell, cutoff=cutoff, natx=natx, log=False) # Long Fingerprint
                    # fp = fplib.get_sfp(cell, cutoff=cutoff, natx=natx, log=False) # Contracted Fingerprint
To input crystal structures to EOSNet, you will need to define a customized dataset. Note that this is required for both training and predicting.
Before defining a customized dataset, you will need:
- POSCAR files recording the structure of the crystals that you are interested in
- The target properties for each crystal (not needed for predicting, but you need to put some random numbers in `id_prop.csv`)
You can create a customized dataset by creating a directory `root_dir` with the following files:
- `id_prop.csv`: a CSV file with two columns. The first column records a unique `ID` for each crystal, and the second column records the value of the target property. If you want to predict material properties with `predict.py`, you can put any number in the second column. (The second column is still needed.)
- `ID.vasp`: a POSCAR file that records the crystal structure, where `ID` is the unique `ID` for the crystal.

The structure of the `root_dir` should be:
root_dir
├── id_prop.csv
├── atom_init.json
├── id0.vasp
├── id1.vasp
├── ...
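Before training, it can be useful to confirm that every `ID` in `id_prop.csv` has a matching `ID.vasp` file and vice versa. The sketch below is one way to do that with the standard library; the helper name `check_dataset` is hypothetical, it assumes `id_prop.csv` has no header row, and the demo files are placeholders rather than real POSCAR files.

```python
import csv
import tempfile
from pathlib import Path


def check_dataset(root_dir):
    """Return (IDs without a .vasp file, .vasp files without an ID) for root_dir."""
    root = Path(root_dir)
    with open(root / "id_prop.csv", newline="") as f:
        # First column is the ID; blank rows are skipped. Assumes no header row.
        listed = {row[0].strip() for row in csv.reader(f) if row}
    on_disk = {p.stem for p in root.glob("*.vasp")}
    return sorted(listed - on_disk), sorted(on_disk - listed)


# Demo on a throwaway directory with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "id_prop.csv").write_text("id0,1.23\nid1,4.56\n")
    (root / "id0.vasp").write_text("placeholder, not a real POSCAR\n")
    (root / "id2.vasp").write_text("placeholder, not a real POSCAR\n")
    missing_files, unlisted_files = check_dataset(root)

print("IDs without a structure file:", missing_files)
print("Structure files without an ID:", unlisted_files)
```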
Before training a new GNN model, you will need to:
- Define a customized dataset at `root_dir` to store the structure-property relations of interest.
Then, in the `EOSNet` directory, you can train a GNN model for your customized dataset by:

    python3 train.py root_dir

For detailed information on the available options, you can run:

    python3 train.py -h

    python3 train.py root_dir --save_to_disk true --disable-mps --task regression --workers 7 --epochs 500 --batch-size 64 --optim 'Adam' --train-ratio 0.8 --val-ratio 0.1 --test-ratio 0.1 --n-conv 3 --n-h 1 --lr 1e-3 --warmup-epochs 20 --lr-milestones 100 200 400 --weight-decay 0 | tee EOSNet_log.txt

To resume from a previous checkpoint:

    python3 train.py root_dir --save_to_disk false --disable-mps --resume ./checkpoint.pth.tar --task regression --workers 7 --epochs 500 --batch-size 64 --optim 'Adam' --train-ratio 0.8 --val-ratio 0.1 --test-ratio 0.1 --n-conv 3 --n-h 1 --lr 1e-3 --warmup-epochs 20 --lr-milestones 100 200 400 --weight-decay 0 >> EOSNet_log.txt
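To get a feel for how `--lr`, `--warmup-epochs`, and `--epochs` shape the learning rate under cosine annealing, the sketch below evaluates the closed-form cosine schedule documented for PyTorch's `CosineAnnealingLR`. How EOSNet combines warmup with the cosine schedule is not stated here, so the linear warmup, the `eta_min=0` floor, and the function name `lr_at_epoch` are all assumptions made for illustration.

```python
import math


def lr_at_epoch(epoch, base_lr=1e-3, warmup_epochs=20, total_epochs=500, eta_min=0.0):
    """Learning rate at a 0-based epoch: assumed linear warmup, then cosine annealing."""
    if epoch < warmup_epochs:
        # Assumption: ramp linearly from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr toward eta_min over the remaining epochs.
    t = epoch - warmup_epochs
    t_max = total_epochs - warmup_epochs
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * t / t_max))


# Defaults mirror the example command: --lr 1e-3 --warmup-epochs 20 --epochs 500
for epoch in (0, 19, 20, 260, 499):
    print(epoch, f"{lr_at_epoch(epoch):.2e}")
```

Under these assumptions the rate peaks at `base_lr` once warmup ends, falls to half of `base_lr` midway through the remaining epochs, and approaches `eta_min` at the end of training.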
After training, you will get three files in the `EOSNet` directory.
- `model_best.pth.tar`: stores the GNN model with the best validation accuracy.
- `checkpoint.pth.tar`: stores the GNN model at the last epoch.
- `test_results.csv`: stores the `ID`, target value, and predicted value for each crystal in the test set.
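For a regression task, the test-set mean absolute error can be recomputed from `test_results.csv`. The sketch below assumes the three columns appear in the order described above (`ID`, target, prediction) with no header row; the helper name `mean_absolute_error` and the demo values are made up for illustration.

```python
import csv
import os
import tempfile


def mean_absolute_error(csv_path):
    """Compute MAE from rows of (ID, target, prediction); assumes no header row."""
    errors = []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 3:
                continue  # skip blank or malformed rows
            errors.append(abs(float(row[1]) - float(row[2])))
    if not errors:
        raise ValueError(f"no usable rows in {csv_path}")
    return sum(errors) / len(errors)


# Demo on a throwaway file with made-up numbers.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_results.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([["id0", 1.0, 1.5], ["id1", 2.0, 1.0]])
    mae = mean_absolute_error(path)

print(f"MAE = {mae:.3f}")
```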
In the `EOSNet` directory, you can predict the properties of the crystals in `root_dir`:

    python predict.py pre-trained.pth.tar --save_to_disk false --test root_dir

Note: you need to put some random numbers in `id_prop.csv`, and the `struct_id`s are the structures you want to predict.
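When only predictions are needed, the placeholder `id_prop.csv` can be generated from the structure files themselves. The following is a minimal sketch under the assumption that any numeric value is acceptable in the second column; the helper name `write_placeholder_id_prop` is hypothetical, and the demo files are placeholders rather than real POSCAR files.

```python
import csv
import tempfile
from pathlib import Path


def write_placeholder_id_prop(root_dir, dummy_target=0.0):
    """Write id_prop.csv listing every *.vasp file in root_dir with a dummy target.

    Note: this overwrites any existing id_prop.csv in root_dir.
    """
    root = Path(root_dir)
    ids = sorted(p.stem for p in root.glob("*.vasp"))
    with open(root / "id_prop.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for struct_id in ids:
            writer.writerow([struct_id, dummy_target])
    return ids


# Demo on a throwaway directory; non-.vasp files are ignored.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("idB.vasp", "idA.vasp", "notes.txt"):
        (Path(tmp) / name).write_text("placeholder\n")
    written_ids = write_placeholder_id_prop(tmp)
    csv_lines = (Path(tmp) / "id_prop.csv").read_text().splitlines()

print(written_ids)
print(csv_lines)
```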
Please cite the following work if you want to use EOSNet:
For CGCNN framework, please cite:
@article{PhysRevLett.120.145301,
title = {Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties},
author = {Xie, Tian and Grossman, Jeffrey C.},
journal = {Phys. Rev. Lett.},
volume = {120},
issue = {14},
pages = {145301},
numpages = {6},
year = {2018},
month = {Apr},
publisher = {American Physical Society},
doi = {10.1103/PhysRevLett.120.145301},
url = {https://link.aps.org/doi/10.1103/PhysRevLett.120.145301}
}
If you use Python3 implementation of the Fingerprint Library, please cite:
@article{taoAcceleratingStructuralOptimization2024,
title = {Accelerating Structural Optimization through Fingerprinting Space Integration on the Potential Energy Surface},
author = {Tao, Shuo and Shao, Xuecheng and Zhu, Li},
year = {2024},
month = mar,
journal = {J. Phys. Chem. Lett.},
volume = {15},
number = {11},
pages = {3185--3190},
doi = {10.1021/acs.jpclett.4c00275},
url = {https://pubs.acs.org/doi/10.1021/acs.jpclett.4c00275}
}
If you use C implementation of the Fingerprint Library, please cite:
@article{zhuFingerprintBasedMetric2016,
title = {A Fingerprint Based Metric for Measuring Similarities of Crystalline Structures},
author = {Zhu, Li and Amsler, Maximilian and Fuhrer, Tobias and Schaefer, Bastian and Faraji, Somayeh and Rostami, Samare and Ghasemi, S. Alireza and Sadeghi, Ali and Grauzinyte, Migle and Wolverton, Chris and Goedecker, Stefan},
year = {2016},
month = jan,
journal = {The Journal of Chemical Physics},
volume = {144},
number = {3},
pages = {034203},
doi = {10.1063/1.4940026},
url = {https://doi.org/10.1063/1.4940026}
}
For GOM Fingerprint methodology, please cite:
@article{sadeghiMetricsMeasuringDistances2013,
title = {Metrics for Measuring Distances in Configuration Spaces},
author = {Sadeghi, Ali and Ghasemi, S. Alireza and Schaefer, Bastian and Mohr, Stephan and Lill, Markus A. and Goedecker, Stefan},
year = {2013},
month = nov,
journal = {The Journal of Chemical Physics},
volume = {139},
number = {18},
pages = {184118},
doi = {10.1063/1.4828704},
url = {https://pubs.aip.org/aip/jcp/article/317391}
}