Showcase of RI-CCSD with Rust

This program evaluates the restricted RI-CCSD energy in pure Rust code.

To non-chemists: RI-CCSD can be seen as a group of dense numerical computations on 2- to 4-dimensional tensors. Most tasks in RI-CCSD can be converted to matrix multiplication, so it is mostly compute-bound.
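As a rough illustration of how a 4-index contraction becomes a matrix multiplication, here is a minimal sketch in plain Rust. It is not taken from this project: the function names (`matmul`, `contract_as_gemm`, `contract_loops`) and the chosen contraction $R_{ijab} = \sum_{cd} T_{ijcd} W_{cdab}$ are illustrative assumptions, and the naive `matmul` only stands in for a BLAS GEMM call.

```rust
/// Naive row-major matrix product C (m x n) = A (m x k) * B (k x n).
/// Stand-in for a BLAS GEMM call (assumption: real code would use OpenBLAS).
fn matmul(a: &[f64], b: &[f64], m: usize, k: usize, n: usize) -> Vec<f64> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![0.0; m * n];
    for i in 0..m {
        for p in 0..k {
            let aip = a[i * k + p];
            for j in 0..n {
                c[i * n + j] += aip * b[p * n + j];
            }
        }
    }
    c
}

/// R[i,j,a,b] = sum_{c,d} T[i,j,c,d] * W[c,d,a,b], all tensors row-major.
/// Fusing (i,j), (c,d) and (a,b) into compound indices turns the contraction
/// into one (no*no x nv*nv) * (nv*nv x nv*nv) matrix product.
fn contract_as_gemm(t: &[f64], w: &[f64], no: usize, nv: usize) -> Vec<f64> {
    matmul(t, w, no * no, nv * nv, nv * nv)
}

/// Reference implementation with explicit loops over all six indices.
fn contract_loops(t: &[f64], w: &[f64], no: usize, nv: usize) -> Vec<f64> {
    let mut r = vec![0.0; no * no * nv * nv];
    for i in 0..no {
        for j in 0..no {
            for a in 0..nv {
                for b in 0..nv {
                    let mut acc = 0.0;
                    for c in 0..nv {
                        for d in 0..nv {
                            acc += t[((i * no + j) * nv + c) * nv + d]
                                * w[((c * nv + d) * nv + a) * nv + b];
                        }
                    }
                    r[((i * no + j) * nv + a) * nv + b] = acc;
                }
            }
        }
    }
    r
}

fn main() {
    // Toy dimensions and small integer-valued entries, so both paths are exact.
    let (no, nv) = (2usize, 3usize);
    let t: Vec<f64> = (0..no * no * nv * nv).map(|x| (x % 7) as f64 - 3.0).collect();
    let w: Vec<f64> = (0..nv * nv * nv * nv).map(|x| (x % 5) as f64 - 2.0).collect();
    let same = contract_as_gemm(&t, &w, no, nv) == contract_loops(&t, &w, no, nv);
    println!("GEMM form agrees with loop form: {same}");
}
```

No data is moved to go from the tensor view to the matrix view here, because the fused indices are already adjacent in memory. Contractions whose summed indices are not adjacent need a transpose first, which is where a library that handles higher-dimensional tensors directly becomes convenient.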

RI-CCSD is a 4-dimensional problem by nature; coding it with libraries that focus on 2-dimensional matrices is usually inconvenient, if not impossible.

Interested readers may refer to this incomplete list of articles (Q-Chem, Psi4 fnocc, FHI-aims, GAMESS US, to name a few).

Efficiency demonstration

Computation device information:

personal computer
AMD Ryzen 7945HX, 16 physical cores
only one NUMA node
2 x 32 GB memory (5600 MT/s)

System information:

(H2O)₁₀ cluster, PP5 structure from 10.1021/jp104865w.
basis: cc-pVDZ
auxiliary basis: cc-pVDZ-ri (used for both SCF and CCSD; this is not recommended for real-world evaluation, but this project is only an efficiency benchmark)
$n_\mathrm{occ} = 40$ (frozen core), $n_\mathrm{vir} = 190$, $n_\mathrm{aux} = 820$.

| | this showcase | Psi4 | PySCF |
|--|--|--|--|
| correlation energy (a.u.) | -2.1735512 | -2.1735494 | -2.1735499 |
| time per iteration (s) | ~ 18.5 | ~ 19.0 | ~ 29.5 |
| version | - | 1.9.1 (conda-forge) | 2.7.0 (pypi) |
| math library | OpenBLAS (compiled) | Intel OneAPI (conda-forge) | OpenBLAS (pypi) |
| math library threading | pthread | TBB | serial |
| algorithms | DF | DF (fnocc) | Conv |

DF refers to density-fitted integrals and algorithms; Conv refers to conventional integrals.

Some important notes:

Psi4 uses Intel OneAPI (MKL), which is not very efficient on AMD CPUs, so this is actually not a fair comparison. An estimated 20% efficiency boost is expected if OpenBLAS is used instead.
Psi4 has multiple CC engines. FNOCC is more efficient, while OCC has more functionality.
Compared with conventional-integral algorithms, density fitting (RI-CCSD) actually increases the FLOP count; its only benefit is a lower memory footprint for large species (if I understand both the RI-CCSD and Conv-CCSD algorithms correctly).
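The memory side of that trade-off can be made concrete with a minimal sketch, assuming the usual density-fitting factorization $(ab|cd) \approx \sum_P B_{P,ab} B_{P,cd}$ with a 3-index factor $B$ of shape $(n_\mathrm{aux}, n, n)$. The function names `df_eri` and `storage_elements` are hypothetical and are not part of this project.

```rust
/// Rebuild the approximate four-index integral
///   (ab|cd) ~= sum_P B[P,ab] * B[P,cd]
/// from a three-index factor B stored row-major with shape (naux, n*n).
/// The result is a row-major (n*n x n*n) matrix indexed by (ab, cd).
fn df_eri(b: &[f64], naux: usize, n: usize) -> Vec<f64> {
    let nn = n * n;
    assert_eq!(b.len(), naux * nn);
    let mut eri = vec![0.0; nn * nn];
    for p in 0..naux {
        let row = &b[p * nn..(p + 1) * nn];
        for (ab, &x) in row.iter().enumerate() {
            for (cd, &y) in row.iter().enumerate() {
                eri[ab * nn + cd] += x * y;
            }
        }
    }
    eri
}

/// Number of f64 elements held by (three-index factor, full four-index tensor).
fn storage_elements(n: usize, naux: usize) -> (usize, usize) {
    (naux * n * n, n * n * n * n)
}

fn main() {
    // Toy factor with naux = 2 and n = 2; the values are arbitrary.
    let b = vec![1.0, 2.0, 2.0, 3.0, 0.0, 1.0, 1.0, -1.0];
    let eri = df_eri(&b, 2, 2);
    // Flat index 3 is (ab, cd) = (0, 3), i.e. the element (00|11).
    println!("(00|11) = {}", eri[3]);

    // Virtual-virtual block with the dimensions quoted in this README.
    let (three, four) = storage_elements(190, 820);
    println!("3-index: {} elements, {:.2} GB", three, (three * 8) as f64 / 1e9);
    println!("4-index: {} elements, {:.2} GB", four, (four * 8) as f64 / 1e9);
}
```

Keeping only $B$ avoids storing the $n^4$ tensor, at the price of rebuilding blocks of $(ab|cd)$ on the fly, which is where the extra FLOPs come from.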

Details of Efficiency

Time of each iter: ~ 18.5 sec
$O(n_\mathrm{occ}^3 n_\mathrm{vir}^3)$ term: ~ 6.0 sec
- FLOPs estimation: slightly larger than $2 \times 4 \times n_\mathrm{occ}^3 n_\mathrm{vir}^3 = 3.27 \ \mathrm{T}$
- ~ 540 GFLOP/sec, 48% CPU maximum L1 bandwidth
$O(n_\mathrm{occ}^2 n_\mathrm{vir}^4)$ term (pp-Ladder): ~ 8.8 sec
- FLOPs estimation: slightly larger than $2 \times (n_\mathrm{vir}^4 n_\mathrm{aux} + 0.5 \times n_\mathrm{occ}^2 n_\mathrm{vir}^4) = 3.98 \ \mathrm{T}$
- ~ 450 GFLOP/sec, 40% CPU maximum L1 bandwidth

We expect that 50% efficiency is achievable, but that requires more finely tuned code.

This project has not been optimized for a low memory footprint. The code accepts duplicating some $O(n_\mathrm{occ}^2 n_\mathrm{vir}^2)$ and $O(n_\mathrm{vir}^2 n_\mathrm{aux})$ tensors. This project also does not use an advanced iteration driver (DIIS), so more iterations than usual are expected.

To reproduce

This project is only a demonstration of efficiency and code style. Usability is not the first concern.

A binary is available for this showcase; see the release page.

To use this binary, some preparation is required:

export RAYON_NUM_THREADS=16     # number of parallel threads
export RUST_MIN_STACK=16777216  # increase if a stack overflow occurs
export LD_LIBRARY_PATH=<your pthread openblas directory>:$LD_LIBRARY_PATH

./showcase_rust_riccsd_glibc_2.17 <directory of your npy files>

This project requires the user to do the following (more details in env.sh or the VS Code settings):

Provide libopenblas.so (pthread scheme) in $LD_LIBRARY_PATH. Due to how Rust's FFI works, an OpenBLAS build compiled with OpenMP does not work.
Provide *.npy files in PySCF convention (see python_scripts for details):
- mo_coeff.npy (C-contiguous, shape (nao, nmo))
- mo_energy.npy
- cderi.npy (AO basis, lower-triangular packed)
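For readers unfamiliar with lower-triangular packed storage, the sketch below expands one packed slice into a full symmetric matrix. It assumes the row-major lower-triangle order, where element $(i, j)$ with $i \ge j$ sits at offset $i(i+1)/2 + j$, which to my knowledge is the order PySCF uses. The function name `unpack_tril` is hypothetical here, the code does not read .npy files, and it is not this project's actual loader.

```rust
/// Expand one lower-triangular packed slice (length n*(n+1)/2) into a full
/// symmetric n x n row-major matrix.
/// Assumed layout: element (i, j) with i >= j is stored at i*(i+1)/2 + j.
fn unpack_tril(packed: &[f64], n: usize) -> Vec<f64> {
    assert_eq!(packed.len(), n * (n + 1) / 2);
    let mut full = vec![0.0; n * n];
    for i in 0..n {
        for j in 0..=i {
            let v = packed[i * (i + 1) / 2 + j];
            full[i * n + j] = v; // lower triangle
            full[j * n + i] = v; // mirrored upper triangle
        }
    }
    full
}

fn main() {
    // Packed order for n = 3: (0,0), (1,0), (1,1), (2,0), (2,1), (2,2).
    let packed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let full = unpack_tril(&packed, 3);
    for row in full.chunks(3) {
        println!("{:?}", row);
    }
}
```

For a packed array of shape (naux, nao*(nao+1)/2), the same routine would be applied to each auxiliary index separately.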

ajz34/showcase_rust_riccsd