EOSNet: Embedded Overlap Structures for Graph Neural Networks

Note: EOSNet is derived from the CGCNN framework, with the major changes listed in the change log below.

Change log

Use atom-centered Gaussian Overlap Matrix (GOM) fingerprint vectors as atomic features
Switch reading pymatgen structures from CIF to POSCAR
Add drop_last option in get_train_val_test_loader
Take data imbalance into account for classification job
Clip lfp (Long FP) and sfp (Contracted FP) length for arbitrary crystal structures
Add MPS support to accelerate training on macOS; for details, see PyTorch MPS Backend and Apple Metal acceleration
Note: For classification jobs with the MPS backend, you may need to modify line 227 in WeightedRandomSampler to weights_tensor = torch.as_tensor(weights, dtype=torch.float32 if weights.device.type == "mps" else torch.float64). To maximize training efficiency with the MPS backend, you may want to load the dataset with a single CPU core (--workers 0).
Add save_to_disk option, disable_mps option and FP related arg options
Introduce IdTargetData class to do efficient sampling on the dataset
Update collate_pool to handle both IdTargetData and StructData type dataset
Add residual and skip connection when called in n_conv loops
Use CosineAnnealingLR instead of linear LR schedulers
Use get_neighbor_info to correctly handle padding of neighbors with given cutoff
Add --update-bond option to use BondConvLayer to update nbr_fea
CrystalGraphConvNet has been completely restructured and renamed to EosNet
Add --attention and related options to use AttentionReadout for crystal feature pooling
Completely reform StructData with batch loading and processing, and add dataset.clear_cache() to release memory
Move the instantiation of StructData into train() and validate() separately instead of main()
Use IdTargetData for get_train_val_test_loader(), and get struct_data from StructData by batches
Saving the processed_data to multiple npz files under saved_npz_files directory instead of one big file
Save both train_results.csv and test_results.csv at the end of training
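
The change log above mentions accounting for class imbalance and passing weights to WeightedRandomSampler. As a rough illustration only (not necessarily how EOSNet computes them), per-sample weights are often taken as the inverse of each class's frequency. The function name inverse_frequency_weights and the labels below are hypothetical.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return one weight per sample, inversely proportional to its class count."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]

# Hypothetical binary labels: four samples of class 0 and one of class 1.
labels = [0, 0, 0, 0, 1]
weights = inverse_frequency_weights(labels)
print(weights)
```

With weights like these, each class contributes the same total weight, so a weighted sampler draws the rare class about as often as the common one. On the MPS backend, which does not support float64, such a list would be converted with dtype=torch.float32, as the note above describes.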

Switching from Python3 implementation of the Fingerprint Library to C implementation to improve speed.
(Optional) Modify the setup.py in fplib if you use conda to install LAPACK:

lapack_dir=["$CONDA_PREFIX/lib"]
lapack_lib=['openblas']
extra_link_args = ["-Wl,-rpath,$CONDA_PREFIX/lib"]
.
.
.
include_dirs = [source_dir, "$CONDA_PREFIX/include"]

If you use brew to install LAPACK:

lapack_dir=["$HOMEBREW_PREFIX/opt/openblas/lib"]
lapack_lib=['openblas']
extra_link_args = ['-framework', 'Accelerate',
                   "-Wl,-rpath,$HOMEBREW_PREFIX/opt/openblas/lib"]
.
.
.
include_dirs = [source_dir, "$HOMEBREW_PREFIX/opt/openblas/include"]

You will probably need to modify your ~/.bashrc file so that the compiler can find the correct LAPACK library:

# If you use `conda install conda-forge::lapack`
export DYLD_LIBRARY_PATH="$CONDA_PREFIX/lib:$DYLD_LIBRARY_PATH"
# If you use `brew install openblas`
export CFLAGS="-I/opt/homebrew/opt/openblas/include $CFLAGS"
export LDFLAGS="-L/opt/homebrew/opt/openblas/lib $LDFLAGS"

Then install the Fingerprint library (snapshot of fplib-3.1.2):

conda create -n fplibenv python=3.10 pip ; conda activate fplibenv
python3 -m pip install -U pip setuptools wheel
git clone https://github.com/Tack-Tau/fplib.git
cd fplib ; git checkout fplib_3.1.2
python3 -m pip install .

For the remaining EOSNet dependencies, follow the original instructions.
Note: ~~Currently only lmax=0 is supported in the C version~~

This package is based on the Crystal Graph Convolutional Neural Networks (CGCNN) framework, which takes an arbitrary crystal structure and predicts its material properties.

The package provides two major functions:

Train an EOSNet model with a customized dataset.
Predict material properties of new crystals with a pre-trained EOSNet model.

Dependencies

This package requires:

fplib
PyTorch
scikit-learn
pymatgen
ASE
~~Numba~~ (Numba is no longer needed since we are switching from fplib3 to fplib_c)

If you are new to Python, please use conda to manage Python packages and environments.

conda activate fplibenv
python3 -m pip install "numpy>=1.21.4" "scipy>=1.8.0" ase==3.22.1
python3 -m pip install scikit-learn torch==2.2.2 torchvision==0.17.2 pymatgen==2024.3.1

The above environment has been tested and found stable on both Apple silicon (M-series) macOS machines and CentOS clusters.
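
After installing, a quick check that the listed packages can be located may save a failed training run. The following is a minimal sketch using only the standard library; the helper name find_missing is hypothetical, and the import names are assumed (scikit-learn is imported as sklearn).

```python
import importlib.util

# Import names assumed for the packages listed above
# (scikit-learn is imported as "sklearn").
REQUIRED = ["fplib", "torch", "sklearn", "pymatgen", "ase"]

def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

missing = find_missing(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

This only confirms that each package can be found, not that its version matches the pins above.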

Check your structure files before using EOSNet

To catch erroneous POSCAR files, you can run the following check_fp.py in root_dir:

#!/usr/bin/env python3

import os
import sys
import numpy as np
from functools import reduce
import fplib
from ase.io import read as ase_read

def get_ixyz(lat, cutoff):
    lat = np.ascontiguousarray(lat)
    lat2 = np.dot(lat, np.transpose(lat))
    vec = np.linalg.eigvals(lat2)
    ixyz = int(np.sqrt(1.0/max(vec))*cutoff) + 1
    ixyz = np.int32(ixyz)
    return ixyz

def check_n_sphere(rxyz, lat, cutoff, natx):
    
    ixyzf = get_ixyz(lat, cutoff)
    ixyz = int(ixyzf) + 1
    nat = len(rxyz)
    cutoff2 = cutoff**2

    for iat in range(nat):
        xi, yi, zi = rxyz[iat]
        n_sphere = 0
        for jat in range(nat):
            for ix in range(-ixyz, ixyz+1):
                for iy in range(-ixyz, ixyz+1):
                    for iz in range(-ixyz, ixyz+1):
                        xj = rxyz[jat][0] + ix*lat[0][0] + iy*lat[1][0] + iz*lat[2][0]
                        yj = rxyz[jat][1] + ix*lat[0][1] + iy*lat[1][1] + iz*lat[2][1]
                        zj = rxyz[jat][2] + ix*lat[0][2] + iy*lat[1][2] + iz*lat[2][2]
                        d2 = (xj-xi)**2 + (yj-yi)**2 + (zj-zi)**2
                        if d2 <= cutoff2:
                            n_sphere += 1
                            if n_sphere > natx:
                                raise ValueError("more than natx atoms found within the cutoff sphere")


def read_types(cell_file):
    buff = []
    with open(cell_file) as f:
        for line in f:
            buff.append(line.split())
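    try:
        # VASP 4 layout: line 6 holds the per-species atom counts
        typt = np.array(buff[5], int)
    except ValueError:
        # VASP 5 layout: line 6 holds element symbols, so drop it and read the counts
        del buff[5]
        typt = np.array(buff[5], int)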
    types = []
    for i in range(len(typt)):
        types += [i+1]*typt[i]
    types = np.array(types, int)
    return types

if __name__ == "__main__":
    current_dir = './'
    for filename in os.listdir(current_dir):
        f = os.path.join(current_dir, filename)
        if os.path.isfile(f) and os.path.splitext(f)[-1].lower() == '.vasp':
            atoms = ase_read(f)
            lat = atoms.cell[:]
            rxyz = atoms.get_positions()
            chem_nums = list(atoms.numbers)
            znucl_list = reduce(lambda re, x: re+[x] if x not in re else re, chem_nums, [])
            typ = len(znucl_list)
            znucl = np.array(znucl_list, int)
            types = read_types(f)
            cell = (lat, rxyz, types, znucl)

            natx = int(256)
            lmax = int(0)
            cutoff = np.float64(int(np.sqrt(8.0))*3) # Shorter cutoff for GOM
            
            try:
                check_n_sphere(rxyz, lat, cutoff, natx)
            except ValueError:
                print(str(filename) + " is glitchy !")
                continue  # skip the fingerprint call for this structure

            if len(rxyz) != len(types) or len(set(types)) != len(znucl):
                print(str(filename) + " is glitchy !")
            else:
                fp = fplib.get_lfp(cell, cutoff=cutoff, natx=natx, log=False) # Long Fingerprint
                # fp = fplib.get_sfp(cell, cutoff=cutoff, natx=natx, log=False)   # Contracted Fingerprint
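
The read_types helper above copes with two POSCAR layouts: either line 6 already holds the per-species atom counts, or it holds element symbols and the counts follow on line 7. The sketch below isolates that logic on in-memory text so it can be tried without fplib or ASE; the function name parse_atom_counts and the NaCl example cell are hypothetical.

```python
def parse_atom_counts(poscar_text):
    """Return per-species atom counts from POSCAR text (either layout)."""
    lines = poscar_text.splitlines()
    tokens = lines[5].split()
    try:
        # Line 6 already holds the counts.
        return [int(tok) for tok in tokens]
    except ValueError:
        # Line 6 holds element symbols, so the counts are on line 7.
        return [int(tok) for tok in lines[6].split()]

# Hypothetical two-atom NaCl cell, used only to exercise the parser.
example = """NaCl
1.0
5.6 0.0 0.0
0.0 5.6 0.0
0.0 0.0 5.6
Na Cl
1 1
Direct
0.0 0.0 0.0
0.5 0.5 0.5
"""
counts = parse_atom_counts(example)
# Expand the counts into one 1-based type index per atom, as read_types does.
types = [i + 1 for i, n in enumerate(counts) for _ in range(n)]
print(counts, types)
```

The length of types should equal the number of atoms read from the file, which is one of the consistency checks check_fp.py performs.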

Usage

Define a customized dataset

To input crystal structures to EOSNet, you will need to define a customized dataset. Note that this is required for both training and predicting.

Before defining a customized dataset, you will need:

POSCAR files recording the structure of the crystals that you are interested in
The target properties for each crystal (not needed for predicting, but you need to put some random numbers in id_prop.csv)

You can create a customized dataset by creating a directory root_dir with the following files:

id_prop.csv: a CSV file with two columns. The first column records a unique ID for each crystal, and the second column records the value of the target property. If you want to predict material properties with predict.py, you can put any number in the second column. (The second column is still needed.)
ID.vasp: a POSCAR file that records the crystal structure, where ID is the unique ID for the crystal.

The structure of the root_dir should be:

root_dir
├── id_prop.csv
├── atom_init.json
├── id0.vasp
├── id1.vasp
├── ...
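
For prediction-only use, id_prop.csv still needs a second column even though its values are ignored. The following is a minimal sketch of one way to generate such a file from the ID.vasp files in a directory. The function name write_placeholder_id_prop is hypothetical, and the file is assumed to have no header row.

```python
import csv
from pathlib import Path

def write_placeholder_id_prop(root_dir, placeholder=0.0):
    """Write id_prop.csv listing every ID.vasp file with a dummy target value."""
    root = Path(root_dir)
    # The ID is the file name without the .vasp extension.
    ids = sorted(p.stem for p in root.glob("*.vasp"))
    with open(root / "id_prop.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for struct_id in ids:
            writer.writerow([struct_id, placeholder])
    return ids
```

Calling write_placeholder_id_prop("root_dir") would list id0.vasp, id1.vasp, and so on as rows id0,0.0 and id1,0.0. For training, the placeholder column must be replaced with real target values.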

Train a GNN model

Before training a new GNN model, you will need to:

Define a customized dataset at root_dir to store the structure-property relations of interest.

Then, in directory EOSNet, you can train a GNN model for your customized dataset by:

python3 train.py root_dir

For details on the available options, run:

python3 train.py -h

python3 train.py root_dir --save_to_disk true --disable-mps --task regression --workers 7 --epochs 500 --batch-size 64 --optim 'Adam' --train-ratio 0.8 --val-ratio 0.1 --test-ratio 0.1 --n-conv 3 --n-h 1 --lr 1e-3 --warmup-epochs 20 --lr-milestones 100 200 400 --weight-decay 0 | tee EOSNet_log.txt

To resume from a previous checkpoint

python3 train.py root_dir --save_to_disk false --disable-mps --resume ./checkpoint.pth.tar --task regression --workers 7 --epochs 500 --batch-size 64 --optim 'Adam' --train-ratio 0.8 --val-ratio 0.1 --test-ratio 0.1 --n-conv 3 --n-h 1 --lr 1e-3 --warmup-epochs 20 --lr-milestones 100 200 400 --weight-decay 0 >> EOSNet_log.txt

After training, you will get three files in the EOSNet directory:

model_best.pth.tar: stores the GNN model with the best validation accuracy.
checkpoint.pth.tar: stores the GNN model at the last epoch.
test_results.csv: stores the ID, target value, and predicted value for each crystal in test set.
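
For a regression task, the three columns of test_results.csv are enough to compute a summary error. The following is a minimal sketch, assuming the file has no header row and the columns appear in the order listed above (ID, target, prediction); the function name mean_absolute_error is hypothetical.

```python
import csv

def mean_absolute_error(csv_path):
    """Compute the MAE from rows of (ID, target, prediction), assuming no header row."""
    errors = []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 3:
                continue  # skip blank or malformed rows
            _, target, predicted = row[:3]
            errors.append(abs(float(target) - float(predicted)))
    if not errors:
        raise ValueError("no usable rows found in " + str(csv_path))
    return sum(errors) / len(errors)
```

If the file does contain a header row, the float conversion will fail on it, so that row would need to be skipped first.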

Predict material properties with a pre-trained GNN model

In directory EOSNet, you can predict the properties of the crystals in root_dir:

python predict.py pre-trained.pth.tar --save_to_disk false --test root_dir

Note: id_prop.csv must still contain a second column (any numbers will do); the IDs in its first column determine which structures are predicted.

How to cite

Please cite the following work if you want to use EOSNet:

For CGCNN framework, please cite:

@article{PhysRevLett.120.145301,
  title = {Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties},
  author = {Xie, Tian and Grossman, Jeffrey C.},
  journal = {Phys. Rev. Lett.},
  volume = {120},
  issue = {14},
  pages = {145301},
  numpages = {6},
  year = {2018},
  month = {Apr},
  publisher = {American Physical Society},
  doi = {10.1103/PhysRevLett.120.145301},
  url = {https://link.aps.org/doi/10.1103/PhysRevLett.120.145301}
}

If you use the Python3 implementation of the Fingerprint Library, please cite:

@article{taoAcceleratingStructuralOptimization2024,
  title = {Accelerating Structural Optimization through Fingerprinting Space Integration on the Potential Energy Surface},
  author = {Tao, Shuo and Shao, Xuecheng and Zhu, Li},
  year = {2024},
  month = mar,
  journal = {J. Phys. Chem. Lett.},
  volume = {15},
  number = {11},
  pages = {3185--3190},
  doi = {10.1021/acs.jpclett.4c00275},
  url = {https://pubs.acs.org/doi/10.1021/acs.jpclett.4c00275}
}

If you use the C implementation of the Fingerprint Library, please cite:

@article{zhuFingerprintBasedMetric2016,
  title = {A Fingerprint Based Metric for Measuring Similarities of Crystalline Structures},
  author = {Zhu, Li and Amsler, Maximilian and Fuhrer, Tobias and Schaefer, Bastian and Faraji, Somayeh and Rostami, Samare and Ghasemi, S. Alireza and Sadeghi, Ali and Grauzinyte, Migle and Wolverton, Chris and Goedecker, Stefan},
  year = {2016},
  month = jan,
  journal = {The Journal of Chemical Physics},
  volume = {144},
  number = {3},
  pages = {034203},
  doi = {10.1063/1.4940026},
  url = {https://doi.org/10.1063/1.4940026}
}

For GOM Fingerprint methodology, please cite:

@article{sadeghiMetricsMeasuringDistances2013,
  title = {Metrics for Measuring Distances in Configuration Spaces},
  author = {Sadeghi, Ali and Ghasemi, S. Alireza and Schaefer, Bastian and Mohr, Stephan and Lill, Markus A. and Goedecker, Stefan},
  year = {2013},
  month = nov,
  journal = {The Journal of Chemical Physics},
  volume = {139},
  number = {18},
  pages = {184118},
  doi = {10.1063/1.4828704},
  url = {https://pubs.aip.org/aip/jcp/article/317391}
}
