add two center integral of overlap computation #311

floatingCatty merged 2 commits into deepmodeling:main from
Conversation
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@dptb/postprocess/ovp2c.py`:
- Around line 24-31: The code accesses data[_keys.EDGE_CELL_SHIFT_KEY] without
guarding against its absence; update the postprocessing block (where edge_index,
atom_types, edge_vectors and cell_shifts are created) to check for
_keys.EDGE_CELL_SHIFT_KEY in data and if missing create a default zero-tensor of
the correct shape (matching edge_index.shape or edge_vectors.shape[0]) and
dtype/device before calling .cpu(); ensure this fallback is applied in the same
place as with_edge_vectors(data) is used so downstream consumers expect a
cell_shifts tensor.
- Around line 27-52: The overlap tensors are created on CPU (because inputs are
.cpu()) so ovp_dict and what block_to_feature builds end up on CPU even when the
input data is CUDA; capture the original device (e.g., device =
data[_keys.EDGE_INDEX_KEY].device or data.device) before the .cpu() calls and
restore that device for generated overlap tensors (move S or the final tensors
in ovp_dict back with .to(device) or create them directly on device) so that
ovp_dict entries (EDGE_OVERLAP_KEY / NODE_OVERLAP_KEY created via
block_to_feature) match the device of data; update the code around
edge_index/atom_types/edge_vectors/cell_shifts, op.overlap calls and the
ovp_dict inserts to ensure all created tensors are on the original device.
- Around line 1-20: The module currently imports op2c at top-level which breaks
environments without that optional dependency; remove the top-level "import
op2c" and instead perform a lazy import inside compute_overlap: wrap "import
op2c" in a try/except ImportError within compute_overlap, and on ImportError
raise a clear RuntimeError (or similar) explaining that op2c is optional and
must be installed to call compute_overlap; keep the rest of compute_overlap
(Op2C instantiation) unchanged so callers get the explicit error only when they
invoke compute_overlap.
```python
import op2c
import torch
from typing import Union

from dptb.data.transforms import OrbitalMapper
from dptb.data import AtomicDataDict
from dptb.data import _keys
from dptb.data.interfaces import block_to_feature
from dptb.data.AtomicDataDict import with_edge_vectors


def compute_overlap(data: AtomicDataDict, idp: OrbitalMapper, orb_dir, orb_names):
    ntype = idp.num_types
    op = op2c.Op2C(
        ntype=ntype,
        nspin=1,  # for current usage
        lspinorb=False,
        orb_dir=orb_dir,
        orb_name=orb_names,
        log_file="op2c.log"
    )
```
🧩 Analysis chain

🏁 Script executed:
```bash
# Check imports and usages of ovp2c and compute_overlap
rg -n "ovp2c" -C 2
rg -n "compute_overlap" -C 2
```

🏁 Script executed:
```bash
# Check for pytest.importorskip patterns related to op2c
rg -n "importorskip.*op2c" -C 2
rg -n "pytest.importorskip" -C 2
```

🏁 Script executed:
```bash
# Check dependency declarations for op2c (optional vs required)
find . -name "setup.py" -o -name "pyproject.toml" -o -name "setup.cfg" | head -5 | xargs grep -l "op2c" 2>/dev/null
```

🏁 Script executed:
```bash
# Check if op2c is imported elsewhere and how it's handled
rg -n "import op2c" -C 2
rg -n "from op2c" -C 2
```
Move the op2c import inside compute_overlap to keep the module usable without the optional dependency.
Line 1 imports `op2c` at module scope, which breaks in environments where it isn't installed, even if `compute_overlap` is never called. The test file uses `pytest.importorskip("op2c")`, confirming that `op2c` is optional. Move the import inside `compute_overlap` and raise a clear error if the dependency is missing.
🔧 Suggested change (lazy import with a clear error)

```diff
-import op2c
 import torch
 from typing import Union
@@
 def compute_overlap(data: AtomicDataDict, idp: OrbitalMapper, orb_dir, orb_names):
+    try:
+        import op2c
+    except ModuleNotFoundError as e:
+        raise ImportError(
+            "op2c is required for compute_overlap. Please install it to use this feature."
+        ) from e
```
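The suggested pattern can be exercised standalone; below is a minimal sketch where the helper name `load_op2c` is illustrative, not the PR's actual function:

```python
def load_op2c():
    """Lazily import the optional op2c dependency, failing with a clear message.

    Mirrors the review suggestion: the module imports fine without op2c,
    and only this call raises if the dependency is absent.
    """
    try:
        import op2c  # optional dependency, only needed for overlap computation
    except ModuleNotFoundError as e:
        raise ImportError(
            "op2c is required for compute_overlap. Please install it to use this feature."
        ) from e
    return op2c

# Importing the enclosing module never fails; only the call does.
try:
    load_op2c()
    print("op2c available")
except ImportError as err:
    print("op2c missing:", err)
```

Callers who never touch the overlap path pay no cost, which is exactly what `pytest.importorskip` expects.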
```python
    if _keys.EDGE_VECTORS_KEY not in data:
        data = with_edge_vectors(data)

    edge_index = data[_keys.EDGE_INDEX_KEY].cpu()
    atom_types = data[_keys.ATOM_TYPE_KEY].cpu()
    edge_vectors = data[_keys.EDGE_VECTORS_KEY].cpu()
    cell_shifts = data[_keys.EDGE_CELL_SHIFT_KEY].cpu()
```
Guard against missing EDGE_CELL_SHIFT_KEY to avoid KeyError.
At Line 30, EDGE_CELL_SHIFT_KEY is accessed unconditionally. Non-periodic datasets may omit it, which will crash. Consider defaulting to zeros when the key is absent.
🛡️ Suggested fallback

```diff
-    cell_shifts = data[_keys.EDGE_CELL_SHIFT_KEY].cpu()
+    if _keys.EDGE_CELL_SHIFT_KEY in data:
+        cell_shifts = data[_keys.EDGE_CELL_SHIFT_KEY].cpu()
+    else:
+        cell_shifts = torch.zeros((edge_index.shape[1], 3), dtype=edge_vectors.dtype)
```
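Stripped of torch, the fallback logic amounts to "one zero shift per edge"; a dependency-light sketch using plain lists (the helper name and key string are illustrative):

```python
def get_cell_shifts(data, n_edges):
    """Return per-edge cell shifts, defaulting to (0, 0, 0) for non-periodic data."""
    if "edge_cell_shift" in data:
        return data["edge_cell_shift"]
    # non-periodic: every edge stays in the home cell
    return [[0, 0, 0] for _ in range(n_edges)]

shifts = get_cell_shifts({}, 3)
print(len(shifts), shifts[0])  # 3 [0, 0, 0]
```

The torch version in the diff above does the same thing with `torch.zeros((edge_index.shape[1], 3), ...)`.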
```python
    edge_index = data[_keys.EDGE_INDEX_KEY].cpu()
    atom_types = data[_keys.ATOM_TYPE_KEY].cpu()
    edge_vectors = data[_keys.EDGE_VECTORS_KEY].cpu()
    cell_shifts = data[_keys.EDGE_CELL_SHIFT_KEY].cpu()

    for k in range(edge_index.shape[1]):
        i, j = edge_index[:, k].tolist()
        itype, jtype = atom_types[i].item(), atom_types[j].item()

        Rij = edge_vectors[k] * 1.8897259886  # angstrom to bohr
        Rij = Rij.tolist()
        Rvec = cell_shifts[k].int().tolist()

        inorb = idp.atom_norb[itype]
        jnorb = idp.atom_norb[jtype]

        S = op.overlap(itype, jtype, Rij, is_transpose=False)
        ovp_dict["{}_{}_{}_{}_{}".format(i, j, *Rvec)] = S.reshape(inorb, jnorb)

    for k in range(atom_types.shape[0]):
        itype = atom_types[k].item()
        S = op.overlap(itype, itype, [0, 0, 0], is_transpose=False)
        inorb = idp.atom_norb[itype]
        ovp_dict["{}_{}_{}_{}_{}".format(k, k, 0, 0, 0)] = S.reshape(inorb, inorb)

    block_to_feature(data, idp, False, ovp_dict, False)
```
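The dictionary keys built above combine the two atom indices with the integer cell shift; a tiny self-contained illustration of that scheme:

```python
def overlap_key(i, j, rvec):
    """Build the 'i_j_Rx_Ry_Rz' key used to index overlap blocks in ovp_dict."""
    return "{}_{}_{}_{}_{}".format(i, j, *rvec)

# edge block between atoms 0 and 1, image cell shifted by (0, 0, 1)
print(overlap_key(0, 1, [0, 0, 1]))  # 0_1_0_0_1
# on-site block for atom 2 in the home cell
print(overlap_key(2, 2, [0, 0, 0]))  # 2_2_0_0_0
```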
Avoid CPU/GPU device mismatches for generated overlap tensors.
At Line 27 and Line 52, inputs are copied to CPU and block_to_feature creates CPU tensors, so EDGE_OVERLAP_KEY (and possibly NODE_OVERLAP_KEY) will be on CPU even if data is on CUDA. This can break downstream ops that expect a single device. Consider restoring the original device (or explicitly erroring if CUDA is used).
✅ Suggested fix (restore tensors to the original device)

```diff
-    edge_index = data[_keys.EDGE_INDEX_KEY].cpu()
+    edge_index = data[_keys.EDGE_INDEX_KEY]
+    device = edge_index.device
+    edge_index = edge_index.cpu()
@@
-    block_to_feature(data, idp, False, ovp_dict, False)
+    block_to_feature(data, idp, False, ovp_dict, False)
+    if device.type != "cpu":
+        data[_keys.EDGE_OVERLAP_KEY] = data[_keys.EDGE_OVERLAP_KEY].to(device)
+        if _keys.NODE_OVERLAP_KEY in data:
+            data[_keys.NODE_OVERLAP_KEY] = data[_keys.NODE_OVERLAP_KEY].to(device)
```
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@dptb/postprocess/ovp2c.py`:
- Around line 55-56: The calls data[_keys.NODE_OVERLAP_KEY].to(device) and
data[_keys.EDGE_OVERLAP_KEY].to(device) are no-ops because Tensor.to() returns a
new tensor; update the code so the moved tensors are stored back (e.g., assign
the result to data[_keys.NODE_OVERLAP_KEY] and data[_keys.EDGE_OVERLAP_KEY]) or
use the in-place variant (.to_(device)) to ensure the overlap tensors are
actually moved to device; make the change where these symbols are used in
ovp2c.py.
🧹 Nitpick comments (3)

dptb/postprocess/ovp2c.py (3)

34-46: Python-level per-edge loop will be a bottleneck on large graphs. Each edge invokes `op.overlap` individually. For production-scale systems with tens of thousands of edges this will be slow. Consider batching the overlap calls if `op2c` supports it, or at minimum document the scaling limitation.

38: Extract the Ångström-to-Bohr magic number into a named constant. `1.8897259886` is used without explanation. A module-level constant (e.g., `ANGSTROM_TO_BOHR = 1.8897259886`) improves readability and avoids silent duplication if used elsewhere.

48-52: Potential key collision with self-loop edges. If `edge_index` contains self-loop entries `(k, k)` with `cell_shift = [0, 0, 0]`, the edge loop (Line 46) writes the same key that the atom loop (Line 52) later overwrites. The values should be identical, so this is benign, but it's worth being aware of if the data ever includes self-loops.
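The named-constant nitpick is trivial to apply; a sketch using the constant name proposed above and the conversion factor from the PR's code (the `to_bohr` helper is illustrative):

```python
ANGSTROM_TO_BOHR = 1.8897259886  # conversion factor used in the PR code

def to_bohr(r_angstrom):
    """Convert a Cartesian vector from angstrom to bohr."""
    return [x * ANGSTROM_TO_BOHR for x in r_angstrom]

print(to_bohr([1.0, 0.0, 0.0]))  # [1.8897259886, 0.0, 0.0]
```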
```python
    data[_keys.NODE_OVERLAP_KEY].to(device)
    data[_keys.EDGE_OVERLAP_KEY].to(device)
```
.to(device) is not in-place — results are discarded, tensors stay on CPU.
Tensor.to() returns a new tensor; it does not mutate in place. These two lines are no-ops, so the device-mismatch issue from the earlier review is still present.
Proposed fix

```diff
-    data[_keys.NODE_OVERLAP_KEY].to(device)
-    data[_keys.EDGE_OVERLAP_KEY].to(device)
+    data[_keys.NODE_OVERLAP_KEY] = data[_keys.NODE_OVERLAP_KEY].to(device)
+    data[_keys.EDGE_OVERLAP_KEY] = data[_keys.EDGE_OVERLAP_KEY].to(device)
```
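The pitfall is easy to reproduce without torch; a stand-in class mimicking `Tensor.to()`'s copy-returning semantics (all names here are illustrative):

```python
class FakeTensor:
    """Minimal stand-in: like torch.Tensor, .to() returns a new object."""
    def __init__(self, device):
        self.device = device

    def to(self, device):
        return FakeTensor(device)  # new object; self is NOT mutated

data = {"edge_overlap": FakeTensor("cpu")}

data["edge_overlap"].to("cuda")  # result discarded: a silent no-op
print(data["edge_overlap"].device)  # cpu

data["edge_overlap"] = data["edge_overlap"].to("cuda")  # store the result back
print(data["edge_overlap"].device)  # cuda
```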
Add a two-centre integral overlap computation and test for verification.
Summary by CodeRabbit
New Features
Tests
Data