Developer version of EosNet for fixing bugs & adding features

Tack-Tau/EosNet-dev

 
 

EOSNet: Embedded Overlap Structures for Graph Neural Networks

Note: This EOSNet package is derived from the CGCNN framework, with some major changes.

Change log

  • Using atomic-centered Gaussian Overlap Matrix (GOM) Fingerprint vectors as atomic features
  • Switch reading pymatgen structures from CIF to POSCAR
  • Add drop_last option in get_train_val_test_loader
  • Take data imbalance into account for classification job
  • Clip lfp (Long FP) and sfp (Contracted FP) length for arbitrary crystal structures
  • Add MPS support to accelerate training on MacOS, for details see PyTorch MPS Backend and Apple Metal acceleration
    Note: For classification jobs on the MPS backend, you may need to modify line 227 in WeightedRandomSampler to weights_tensor = torch.as_tensor(weights, dtype=torch.float32 if weights.device.type == "mps" else torch.float64). To maximize training efficiency on the MPS backend, you may want to use only a single CPU core (--workers 0) to load the dataset.
  • Add save_to_disk option, disable_mps option and FP related arg options
  • Introduce IdTargetData class to do efficient sampling on the dataset
  • Update collate_pool to handle both IdTargetData and StructData type dataset
  • Add residual and skip connection when called in n_conv loops
  • Use CosineAnnealingLR instead of linear LR schedulers
  • Use get_neighbor_info to correctly handle padding of neighbors with given cutoff
  • Add --update-bond option to use BondConvLayer to update nbr_fea
  • CrystalGraphConvNet has been completely restructured and renamed to EosNet
  • Add --attention and related options to use AttentionReadout for crystal feature pooling
  • Completely reform StructData with batch loading and processing, and add dataset.clear_cache() to release memory
  • Move instantiation of StructData to train() and validate() separately instead of in main()
  • Use IdTargetData for get_train_val_test_loader(), and get struct_data from StructData by batches
  • Saving the processed_data to multiple npz files under saved_npz_files directory instead of one big file
  • Save both train_results.csv and test_results.csv at the end of training
  • Switch from the Python3 implementation of the Fingerprint Library to the C implementation to improve speed.
    (Optional) Modify the setup.py in fplib if you use conda to install LAPACK:
    lapack_dir=["$CONDA_PREFIX/lib"]
    lapack_lib=['openblas']
    extra_link_args = ["-Wl,-rpath,$CONDA_PREFIX/lib"]
    .
    .
    .
    include_dirs = [source_dir, "$CONDA_PREFIX/include"]
    if you use brew to install LAPACK:
    lapack_dir=["$HOMEBREW_PREFIX/opt/openblas/lib"]
    lapack_lib=['openblas']
    extra_link_args = ['-framework', 'Accelerate',
                       "-Wl,-rpath,$HOMEBREW_PREFIX/opt/openblas/lib"]
    .
    .
    .
    include_dirs = [source_dir, "$HOMEBREW_PREFIX/opt/openblas/include"]
    You will probably need to modify your ~/.bashrc file so that the compiler can find the correct LAPACK library:
    # If you use `conda install conda-forge::lapack`
    export DYLD_LIBRARY_PATH="$CONDA_PREFIX/lib:$DYLD_LIBRARY_PATH"
    # If you use `brew install openblas`
    export CFLAGS="-I/opt/homebrew/opt/openblas/include $CFLAGS"
    export LDFLAGS="-L/opt/homebrew/opt/openblas/lib $LDFLAGS"
    Then install the Fingerprint library (Snapshot of fplib-3.1.2 ):
    conda create -n fplibenv python=3.10 pip ; conda activate fplibenv
    python3 -m pip install -U pip setuptools wheel
    git clone https://github.com/Tack-Tau/fplib.git
    cd fplib ; git checkout fplib_3.1.2
    python3 -m pip install .
    For the remaining EOSNet dependencies, follow the original instructions.
    Note: Currently only lmax=0 is supported in the C version
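For the data-imbalance item in the change log above, one common way to build per-sample weights for a weighted sampler is inverse class frequency. The sketch below is illustrative only: `inverse_frequency_weights` is a hypothetical helper, not a function from EOSNet, and the package's actual weighting may differ.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per sample, inversely proportional to its class count.

    Hypothetical helper for illustration; not part of EOSNet.
    """
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels)
# Every class ends up with the same total weight, so samples from rare
# classes are drawn more often than samples from common ones.
print(weights)
```

A list like this could then be converted to a tensor (float32 on MPS, as in the note above) and passed to torch.utils.data.WeightedRandomSampler.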

This package is based on the Crystal Graph Convolutional Neural Networks framework, which takes an arbitrary crystal structure and predicts material properties.

The package provides two major functions:

  • Train an EOSNet model with a customized dataset.
  • Predict material properties of new crystals with a pre-trained EOSNet model.

Dependencies

This package requires:

If you are new to Python, please use conda to manage Python packages and environments.

conda activate fplibenv
python3 -m pip install 'numpy>=1.21.4' 'scipy>=1.8.0' ase==3.22.1
python3 -m pip install scikit-learn torch==2.2.2 torchvision==0.17.2 pymatgen==2024.3.1

The above environment has been tested and found stable on both M-chip MacOS machines and CentOS clusters.
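To confirm that the exactly pinned packages from the commands above are what is actually installed, a small check along these lines may help. It is a sketch, not part of EOSNet: `check_pins` and `PINNED` are hypothetical names, and only the `==` pins are checked. Local build suffixes such as `+cpu` are ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins copied from the install commands above.
PINNED = {
    "ase": "3.22.1",
    "torch": "2.2.2",
    "torchvision": "0.17.2",
    "pymatgen": "2024.3.1",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            # Drop a local version suffix such as "+cpu" before comparing.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```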

Check your structure files before using EOSNet

To catch erroneous POSCAR files, you can use the following check_fp.py in the root_dir:

#!/usr/bin/env python3

import os
import sys
import numpy as np
from functools import reduce
import fplib
from ase.io import read as ase_read

def get_ixyz(lat, cutoff):
    lat = np.ascontiguousarray(lat)
    lat2 = np.dot(lat, np.transpose(lat))
    vec = np.linalg.eigvals(lat2)
    ixyz = int(np.sqrt(1.0/max(vec))*cutoff) + 1
    ixyz = np.int32(ixyz)
    return ixyz

def check_n_sphere(rxyz, lat, cutoff, natx):
    
    ixyzf = get_ixyz(lat, cutoff)
    ixyz = int(ixyzf) + 1
    nat = len(rxyz)
    cutoff2 = cutoff**2

    for iat in range(nat):
        xi, yi, zi = rxyz[iat]
        n_sphere = 0
        for jat in range(nat):
            for ix in range(-ixyz, ixyz+1):
                for iy in range(-ixyz, ixyz+1):
                    for iz in range(-ixyz, ixyz+1):
                        xj = rxyz[jat][0] + ix*lat[0][0] + iy*lat[1][0] + iz*lat[2][0]
                        yj = rxyz[jat][1] + ix*lat[0][1] + iy*lat[1][1] + iz*lat[2][1]
                        zj = rxyz[jat][2] + ix*lat[0][2] + iy*lat[1][2] + iz*lat[2][2]
                        d2 = (xj-xi)**2 + (yj-yi)**2 + (zj-zi)**2
                        if d2 <= cutoff2:
                            n_sphere += 1
                            if n_sphere > natx:
                                raise ValueError()


def read_types(cell_file):
    buff = []
    with open(cell_file) as f:
        for line in f:
            buff.append(line.split())
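    try:
        # VASP 4 format: line 6 holds the atom counts
        typt = np.array(buff[5], int)
    except ValueError:
        # VASP 5 format: line 6 holds element symbols, so drop it and
        # read the atom counts from the following line
        del buff[5]
        typt = np.array(buff[5], int)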
    types = []
    for i in range(len(typt)):
        types += [i+1]*typt[i]
    types = np.array(types, int)
    return types

if __name__ == "__main__":
    current_dir = './'
    for filename in os.listdir(current_dir):
        f = os.path.join(current_dir, filename)
        if os.path.isfile(f) and os.path.splitext(f)[-1].lower() == '.vasp':
            atoms = ase_read(f)
            lat = atoms.cell[:]
            rxyz = atoms.get_positions()
            chem_nums = list(atoms.numbers)
            znucl_list = reduce(lambda re, x: re+[x] if x not in re else re, chem_nums, [])
            typ = len(znucl_list)
            znucl = np.array(znucl_list, int)
            types = read_types(f)
            cell = (lat, rxyz, types, znucl)

            natx = int(256)
            lmax = int(0)
            cutoff = np.float64(int(np.sqrt(8.0))*3) # Shorter cutoff for GOM
            
            try:
                check_n_sphere(rxyz, lat, cutoff, natx)
            except ValueError:
                # More than natx atoms fall inside the cutoff sphere
                print(str(filename) + " is glitchy !")
                continue

            if len(rxyz) != len(types) or len(set(types)) != len(znucl):
                # Atom or species counts disagree with the POSCAR header
                print(str(filename) + " is glitchy !")
            else:
                fp = fplib.get_lfp(cell, cutoff=cutoff, natx=natx, log=False) # Long Fingerprint
                # fp = fplib.get_sfp(cell, cutoff=cutoff, natx=natx, log=False)   # Contracted Fingerprint

Usage

Define a customized dataset

To input crystal structures to EOSNet, you will need to define a customized dataset. Note that this is required for both training and predicting.

Before defining a customized dataset, you will need:

  • POSCAR files recording the structure of the crystals that you are interested in
  • The target properties for each crystal (not needed for predicting, but you need to put some random numbers in id_prop.csv)

You can create a customized dataset by creating a directory root_dir with the following files:

  1. id_prop.csv: a CSV file with two columns. The first column records a unique ID for each crystal, and the second column records the value of the target property. If you want to predict material properties with predict.py, you can put any number in the second column. (The second column is still needed.)

  2. ID.vasp: a POSCAR file that records the crystal structure, where ID is the unique ID for the crystal.

The structure of the root_dir should be:

root_dir
├── id_prop.csv
├── atom_init.json
├── id0.vasp
├── id1.vasp
├── ...
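Before training, it can be worth checking that every ID in id_prop.csv has a matching structure file. The following is a minimal sketch under two assumptions: id_prop.csv has no header row, and structure files are named exactly ID.vasp. `missing_structures` is a hypothetical helper and not part of EOSNet.

```python
import csv
from pathlib import Path


def missing_structures(root_dir):
    """Return IDs listed in id_prop.csv that have no matching ID.vasp file."""
    root = Path(root_dir)
    missing = []
    with open(root / "id_prop.csv", newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            struct_id = row[0].strip()
            if not (root / f"{struct_id}.vasp").is_file():
                missing.append(struct_id)
    return missing
```

Calling `missing_structures("root_dir")` returns an empty list when the directory is consistent.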

Train a GNN model

Before training a new GNN model, you will need to:

  • Define a customized dataset at root_dir to store the structure-property relations of interest.

Then, in directory EOSNet, you can train a GNN model for your customized dataset by:

python3 train.py root_dir

For details on the available options, run:

python3 train.py -h

For example, a full regression run that also writes the output to a log file:

python3 train.py root_dir --save_to_disk true --disable-mps --task regression --workers 7 --epochs 500 --batch-size 64 --optim 'Adam' --train-ratio 0.8 --val-ratio 0.1 --test-ratio 0.1 --n-conv 3 --n-h 1 --lr 1e-3 --warmup-epochs 20 --lr-milestones 100 200 400 --weight-decay 0 | tee EOSNet_log.txt

To resume from a previous checkpoint:

python3 train.py root_dir --save_to_disk false --disable-mps --resume ./checkpoint.pth.tar --task regression --workers 7 --epochs 500 --batch-size 64 --optim 'Adam' --train-ratio 0.8 --val-ratio 0.1 --test-ratio 0.1 --n-conv 3 --n-h 1 --lr 1e-3 --warmup-epochs 20 --lr-milestones 100 200 400 --weight-decay 0 >> EOSNet_log.txt

After training, you will get three files in the EOSNet directory.

  • model_best.pth.tar: stores the GNN model with the best validation accuracy.
  • checkpoint.pth.tar: stores the GNN model at the last epoch.
  • test_results.csv: stores the ID, target value, and predicted value for each crystal in test set.
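For a regression task, the rows of test_results.csv can be summarized as a mean absolute error. This sketch assumes the file has exactly the three columns described above (ID, target value, predicted value) and no header row; `mean_absolute_error` is a hypothetical helper, so adjust it if your file differs.

```python
import csv


def mean_absolute_error(results_csv):
    """Compute the MAE from rows of (ID, target, prediction).

    Assumes three columns and no header row.
    """
    errors = []
    with open(results_csv, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 3:
                continue  # skip blank or incomplete rows
            errors.append(abs(float(row[1]) - float(row[2])))
    if not errors:
        raise ValueError(f"no usable rows in {results_csv}")
    return sum(errors) / len(errors)
```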

Predict material properties with a pre-trained GNN model

In directory EOSNet, you can predict the properties of the crystals in root_dir:

python predict.py pre-trained.pth.tar --save_to_disk false --test root_dir

Note: id_prop.csv must still contain a number (any placeholder value) in its second column, and the struct_ids in its first column are the structures you want to predict.
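When only predictions are needed, the placeholder id_prop.csv can be generated from the structure files themselves. The sketch below is an illustration, not part of EOSNet: `write_placeholder_id_prop` is a hypothetical helper, it assumes files are named ID.vasp, and it overwrites any existing id_prop.csv in the directory.

```python
import csv
from pathlib import Path


def write_placeholder_id_prop(root_dir, placeholder=0.0):
    """Write id_prop.csv listing every *.vasp file with a dummy target value."""
    root = Path(root_dir)
    # The file stem ("id0" for "id0.vasp") is used as the structure ID.
    ids = sorted(p.stem for p in root.glob("*.vasp"))
    with open(root / "id_prop.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for struct_id in ids:
            writer.writerow([struct_id, placeholder])
    return ids
```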

How to cite

Please cite the following work if you want to use EOSNet:

For CGCNN framework, please cite:

@article{PhysRevLett.120.145301,
  title = {Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties},
  author = {Xie, Tian and Grossman, Jeffrey C.},
  journal = {Phys. Rev. Lett.},
  volume = {120},
  issue = {14},
  pages = {145301},
  numpages = {6},
  year = {2018},
  month = {Apr},
  publisher = {American Physical Society},
  doi = {10.1103/PhysRevLett.120.145301},
  url = {https://link.aps.org/doi/10.1103/PhysRevLett.120.145301}
}

If you use the Python3 implementation of the Fingerprint Library, please cite:

@article{taoAcceleratingStructuralOptimization2024,
  title = {Accelerating Structural Optimization through Fingerprinting Space Integration on the Potential Energy Surface},
  author = {Tao, Shuo and Shao, Xuecheng and Zhu, Li},
  year = {2024},
  month = mar,
  journal = {J. Phys. Chem. Lett.},
  volume = {15},
  number = {11},
  pages = {3185--3190},
  doi = {10.1021/acs.jpclett.4c00275},
  url = {https://pubs.acs.org/doi/10.1021/acs.jpclett.4c00275}
}

If you use the C implementation of the Fingerprint Library, please cite:

@article{zhuFingerprintBasedMetric2016,
  title = {A Fingerprint Based Metric for Measuring Similarities of Crystalline Structures},
  author = {Zhu, Li and Amsler, Maximilian and Fuhrer, Tobias and Schaefer, Bastian and Faraji, Somayeh and Rostami, Samare and Ghasemi, S. Alireza and Sadeghi, Ali and Grauzinyte, Migle and Wolverton, Chris and Goedecker, Stefan},
  year = {2016},
  month = jan,
  journal = {The Journal of Chemical Physics},
  volume = {144},
  number = {3},
  pages = {034203},
  doi = {10.1063/1.4940026},
  url = {https://doi.org/10.1063/1.4940026}
}

For GOM Fingerprint methodology, please cite:

@article{sadeghiMetricsMeasuringDistances2013,
  title = {Metrics for Measuring Distances in Configuration Spaces},
  author = {Sadeghi, Ali and Ghasemi, S. Alireza and Schaefer, Bastian and Mohr, Stephan and Lill, Markus A. and Goedecker, Stefan},
  year = {2013},
  month = nov,
  journal = {The Journal of Chemical Physics},
  volume = {139},
  number = {18},
  pages = {184118},
  doi = {10.1063/1.4828704},
  url = {https://pubs.aip.org/aip/jcp/article/317391}
}
