
add two center integral of overlap computation #311

Merged
floatingCatty merged 2 commits into deepmodeling:main from floatingCatty:two-center-integrals
Feb 9, 2026

Conversation

@floatingCatty
Member

@floatingCatty floatingCatty commented Feb 5, 2026

Add a two-center integral overlap computation and a test for verification.

Summary by CodeRabbit

  • New Features

    • Added orbital overlap computation capability for atomic data processing workflows.
  • Tests

    • Added end-to-end test validating overlap computation using silicon orbital data.
  • Data

    • Included silicon orbital dataset to support overlap calculations and testing.

@coderabbitai
Contributor

coderabbitai bot commented Feb 5, 2026

📝 Walkthrough
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'add two center integral of overlap computation' directly describes the main change — a new module and function (compute_overlap) implementing two-center integral overlap computation.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@dptb/postprocess/ovp2c.py`:
- Around line 24-31: The code accesses data[_keys.EDGE_CELL_SHIFT_KEY] without
guarding against its absence; update the postprocessing block (where edge_index,
atom_types, edge_vectors and cell_shifts are created) to check for
_keys.EDGE_CELL_SHIFT_KEY in data and if missing create a default zero-tensor of
the correct shape (matching edge_index.shape or edge_vectors.shape[0]) and
dtype/device before calling .cpu(); ensure this fallback is applied in the same
place as with_edge_vectors(data) is used so downstream consumers expect a
cell_shifts tensor.
- Around line 27-52: The overlap tensors are created on CPU (because inputs are
.cpu()) so ovp_dict and what block_to_feature builds end up on CPU even when the
input data is CUDA; capture the original device (e.g., device =
data[_keys.EDGE_INDEX_KEY].device or data.device) before the .cpu() calls and
restore that device for generated overlap tensors (move S or the final tensors
in ovp_dict back with .to(device) or create them directly on device) so that
ovp_dict entries (EDGE_OVERLAP_KEY / NODE_OVERLAP_KEY created via
block_to_feature) match the device of data; update the code around
edge_index/atom_types/edge_vectors/cell_shifts, op.overlap calls and the
ovp_dict inserts to ensure all created tensors are on the original device.
- Around line 1-20: The module currently imports op2c at top-level which breaks
environments without that optional dependency; remove the top-level "import
op2c" and instead perform a lazy import inside compute_overlap: wrap "import
op2c" in a try/except ImportError within compute_overlap, and on ImportError
raise a clear RuntimeError (or similar) explaining that op2c is optional and
must be installed to call compute_overlap; keep the rest of compute_overlap
(Op2C instantiation) unchanged so callers get the explicit error only when they
invoke compute_overlap.

Comment on lines 1 to 20

```python
import op2c
import torch
from typing import Union

from dptb.data.transforms import OrbitalMapper
from dptb.data import AtomicDataDict
from dptb.data import _keys
from dptb.data.interfaces import block_to_feature
from dptb.data.AtomicDataDict import with_edge_vectors

def compute_overlap(data: AtomicDataDict, idp: OrbitalMapper, orb_dir, orb_names):
    ntype = idp.num_types
    op = op2c.Op2C(
        ntype=ntype,
        nspin=1,  # for current usage
        lspinorb=False,
        orb_dir=orb_dir,
        orb_name=orb_names,
        log_file="op2c.log"
    )
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Check imports and usages of ovp2c and compute_overlap
rg -n "ovp2c" -C 2
rg -n "compute_overlap" -C 2
```

Repository: deepmodeling/DeePTB — Length of output: 3079

🏁 Script executed:

```shell
#!/bin/bash
# Check for pytest.importorskip patterns related to op2c
rg -n "importorskip.*op2c" -C 2
rg -n "pytest.importorskip" -C 2
```

Repository: deepmodeling/DeePTB — Length of output: 528

🏁 Script executed:

```shell
#!/bin/bash
# Check dependency declarations for op2c (optional vs required)
find . -name "setup.py" -o -name "pyproject.toml" -o -name "setup.cfg" | head -5 | xargs grep -l "op2c" 2>/dev/null
```

Repository: deepmodeling/DeePTB — Length of output: 47

🏁 Script executed:

```shell
#!/bin/bash
# Check if op2c is imported elsewhere and how it's handled
rg -n "import op2c" -C 2
rg -n "from op2c" -C 2
```

Repository: deepmodeling/DeePTB — Length of output: 196


Move the op2c import inside compute_overlap to keep the module usable without the optional dependency.

Line 1 imports op2c at module scope, which breaks in environments where it isn't installed, even if compute_overlap is never called. The test file uses pytest.importorskip("op2c"), confirming that op2c is optional. Move the import inside compute_overlap and raise a clear error if the dependency is missing.

🔧 Suggested change (lazy import with a clear error)

```diff
-import op2c
 import torch
 from typing import Union
@@
 def compute_overlap(data: AtomicDataDict, idp: OrbitalMapper, orb_dir, orb_names):
+    try:
+        import op2c
+    except ModuleNotFoundError as e:
+        raise ImportError(
+            "op2c is required for compute_overlap. Please install it to use this feature."
+        ) from e
```
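Outside the diff context, the lazy-import pattern the review suggests can be sketched as a self-contained example (the module name `some_optional_dep` and function `compute_thing` are hypothetical stand-ins, not DeePTB API):

```python
def compute_thing():
    # Defer the optional import to call time, so importing this module
    # never fails in environments without the dependency.
    try:
        import some_optional_dep  # hypothetical optional package
    except ModuleNotFoundError as e:
        raise ImportError(
            "some_optional_dep is required for compute_thing; please install it."
        ) from e
    return some_optional_dep.run()

# The error surfaces only when the function is actually invoked:
try:
    compute_thing()
except ImportError as err:
    print("caught:", err)
```

This mirrors what `pytest.importorskip("op2c")` in the test file already assumes: the dependency is optional at import time and required only on use.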

Comment on lines 24 to 31

```python
    if _keys.EDGE_VECTORS_KEY not in data:
        data = with_edge_vectors(data)

    edge_index = data[_keys.EDGE_INDEX_KEY].cpu()
    atom_types = data[_keys.ATOM_TYPE_KEY].cpu()
    edge_vectors = data[_keys.EDGE_VECTORS_KEY].cpu()
    cell_shifts = data[_keys.EDGE_CELL_SHIFT_KEY].cpu()
```


⚠️ Potential issue | 🟡 Minor

Guard against missing EDGE_CELL_SHIFT_KEY to avoid KeyError.

At Line 30, EDGE_CELL_SHIFT_KEY is accessed unconditionally. Non-periodic datasets may omit it, which will crash. Consider defaulting to zeros when the key is absent.

🛡️ Suggested fallback

```diff
-    cell_shifts = data[_keys.EDGE_CELL_SHIFT_KEY].cpu()
+    if _keys.EDGE_CELL_SHIFT_KEY in data:
+        cell_shifts = data[_keys.EDGE_CELL_SHIFT_KEY].cpu()
+    else:
+        cell_shifts = torch.zeros((edge_index.shape[1], 3), dtype=edge_vectors.dtype)
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```python
    if _keys.EDGE_VECTORS_KEY not in data:
        data = with_edge_vectors(data)

    edge_index = data[_keys.EDGE_INDEX_KEY].cpu()
    atom_types = data[_keys.ATOM_TYPE_KEY].cpu()
    edge_vectors = data[_keys.EDGE_VECTORS_KEY].cpu()
    if _keys.EDGE_CELL_SHIFT_KEY in data:
        cell_shifts = data[_keys.EDGE_CELL_SHIFT_KEY].cpu()
    else:
        cell_shifts = torch.zeros((edge_index.shape[1], 3), dtype=edge_vectors.dtype)
```
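The fallback logic itself can be checked in isolation without torch; a minimal sketch with plain lists, where the key name `"edge_cell_shift"` and the helper are illustrative stand-ins for `_keys.EDGE_CELL_SHIFT_KEY` and the in-function branch:

```python
def get_cell_shifts(data, n_edges):
    # Periodic datasets carry per-edge cell shifts; non-periodic ones may
    # omit the key entirely, in which case all shifts default to zero.
    if "edge_cell_shift" in data:
        return data["edge_cell_shift"]
    return [[0, 0, 0] for _ in range(n_edges)]

periodic = {"edge_cell_shift": [[1, 0, 0], [0, -1, 0]]}
molecule = {}  # non-periodic: key absent

print(get_cell_shifts(periodic, 2))  # → [[1, 0, 0], [0, -1, 0]]
print(get_cell_shifts(molecule, 2))  # → [[0, 0, 0], [0, 0, 0]]
```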

Comment on lines +27 to +52

```python
    edge_index = data[_keys.EDGE_INDEX_KEY].cpu()
    atom_types = data[_keys.ATOM_TYPE_KEY].cpu()
    edge_vectors = data[_keys.EDGE_VECTORS_KEY].cpu()
    cell_shifts = data[_keys.EDGE_CELL_SHIFT_KEY].cpu()

    for k in range(edge_index.shape[1]):
        i, j = edge_index[:, k].tolist()
        itype, jtype = atom_types[i].item(), atom_types[j].item()

        Rij = edge_vectors[k] * 1.8897259886  # angstrom to bohr
        Rij = Rij.tolist()
        Rvec = cell_shifts[k].int().tolist()

        inorb = idp.atom_norb[itype]
        jnorb = idp.atom_norb[jtype]

        S = op.overlap(itype, jtype, Rij, is_transpose=False)
        ovp_dict["{}_{}_{}_{}_{}".format(i, j, *Rvec)] = S.reshape(inorb, jnorb)

    for k in range(atom_types.shape[0]):
        itype = atom_types[k].item()
        S = op.overlap(itype, itype, [0, 0, 0], is_transpose=False)
        inorb = idp.atom_norb[itype]
        ovp_dict["{}_{}_{}_{}_{}".format(k, k, 0, 0, 0)] = S.reshape(inorb, inorb)

    block_to_feature(data, idp, False, ovp_dict, False)
```

⚠️ Potential issue | 🟠 Major

Avoid CPU/GPU device mismatches for generated overlap tensors.

At Line 27 and Line 52, inputs are copied to CPU and block_to_feature creates CPU tensors, so EDGE_OVERLAP_KEY (and possibly NODE_OVERLAP_KEY) will be on CPU even if data is on CUDA. This can break downstream ops that expect a single device. Consider restoring the original device (or explicitly erroring if CUDA is used).

✅ Suggested fix (restore tensors to the original device)

```diff
-    edge_index = data[_keys.EDGE_INDEX_KEY].cpu()
+    edge_index = data[_keys.EDGE_INDEX_KEY]
+    device = edge_index.device
+    edge_index = edge_index.cpu()
@@
-    block_to_feature(data, idp, False, ovp_dict, False)
+    block_to_feature(data, idp, False, ovp_dict, False)
+    if device.type != "cpu":
+        data[_keys.EDGE_OVERLAP_KEY] = data[_keys.EDGE_OVERLAP_KEY].to(device)
+        if _keys.NODE_OVERLAP_KEY in data:
+            data[_keys.NODE_OVERLAP_KEY] = data[_keys.NODE_OVERLAP_KEY].to(device)
```

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@dptb/postprocess/ovp2c.py`:
- Around line 55-56: The calls data[_keys.NODE_OVERLAP_KEY].to(device) and
data[_keys.EDGE_OVERLAP_KEY].to(device) are no-ops because Tensor.to() returns a
new tensor; update the code so the moved tensors are stored back (e.g., assign
the result to data[_keys.NODE_OVERLAP_KEY] and data[_keys.EDGE_OVERLAP_KEY]) or
use the in-place variant (.to_(device)) to ensure the overlap tensors are
actually moved to device; make the change where these symbols are used in
ovp2c.py.
🧹 Nitpick comments (3)
dptb/postprocess/ovp2c.py (3)

34-46: Python-level per-edge loop will be a bottleneck on large graphs.

Each edge invokes op.overlap individually. For production-scale systems with tens of thousands of edges this will be slow. Consider batching the overlap calls if op2c supports it, or at minimum document the scaling limitation.


38-38: Extract the Ångström-to-Bohr magic number into a named constant.

1.8897259886 is used without explanation. A module-level constant (e.g., ANGSTROM_TO_BOHR = 1.8897259886) improves readability and avoids silent duplication if used elsewhere.


48-52: Potential key collision with self-loop edges.

If edge_index contains self-loop entries (k, k) with cell_shift = [0, 0, 0], the edge loop (Line 46) writes the same key that the atom loop (Line 52) later overwrites. The values should be identical so this is benign, but it's worth being aware of if the data ever includes self-loops.
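The magic-number nitpick above is straightforward to apply; a minimal sketch, where the constant value is taken from the edge loop and the helper name is hypothetical:

```python
ANGSTROM_TO_BOHR = 1.8897259886  # named module-level constant instead of an inline literal

def angstrom_to_bohr(r_ang):
    """Convert a displacement vector from angstrom to bohr."""
    return [x * ANGSTROM_TO_BOHR for x in r_ang]

print(angstrom_to_bohr([1.0, 0.0, 2.0]))
```

A named constant documents the unit conversion at the definition site and avoids silent duplication if the factor is needed elsewhere.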

Comment on lines +55 to +56

```python
    data[_keys.NODE_OVERLAP_KEY].to(device)
    data[_keys.EDGE_OVERLAP_KEY].to(device)
```

⚠️ Potential issue | 🔴 Critical

.to(device) is not in-place — results are discarded, tensors stay on CPU.

Tensor.to() returns a new tensor; it does not mutate in place. These two lines are no-ops, so the device-mismatch issue from the earlier review is still present.

Proposed fix

```diff
-    data[_keys.NODE_OVERLAP_KEY].to(device)
-    data[_keys.EDGE_OVERLAP_KEY].to(device)
+    data[_keys.NODE_OVERLAP_KEY] = data[_keys.NODE_OVERLAP_KEY].to(device)
+    data[_keys.EDGE_OVERLAP_KEY] = data[_keys.EDGE_OVERLAP_KEY].to(device)
```
📝 Committable suggestion

Suggested change

```python
    data[_keys.NODE_OVERLAP_KEY] = data[_keys.NODE_OVERLAP_KEY].to(device)
    data[_keys.EDGE_OVERLAP_KEY] = data[_keys.EDGE_OVERLAP_KEY].to(device)
```
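The non-mutating semantics are easy to demonstrate without a GPU; a torch-free sketch in which `FakeTensor` mimics `Tensor.to` returning a new object (all names here are illustrative, not DeePTB API):

```python
class FakeTensor:
    # Stand-in for torch.Tensor: .to() returns a NEW object and leaves
    # the original untouched, just like Tensor.to(device).
    def __init__(self, device):
        self.device = device

    def to(self, device):
        return FakeTensor(device)

data = {"node_overlap": FakeTensor("cpu")}

data["node_overlap"].to("cuda")     # result discarded: a silent no-op
print(data["node_overlap"].device)  # → cpu

data["node_overlap"] = data["node_overlap"].to("cuda")  # rebind the result
print(data["node_overlap"].device)  # → cuda
```

Rebinding the returned tensor (or moving the whole structure at once) is the only way the device change takes effect.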
🤖 Prompt for AI Agents
In `@dptb/postprocess/ovp2c.py` around lines 55 - 56, The calls
data[_keys.NODE_OVERLAP_KEY].to(device) and
data[_keys.EDGE_OVERLAP_KEY].to(device) are no-ops because Tensor.to() returns a
new tensor; update the code so the moved tensors are stored back (e.g., assign
the result to data[_keys.NODE_OVERLAP_KEY] and data[_keys.EDGE_OVERLAP_KEY]) or
use the in-place variant (.to_(device)) to ensure the overlap tensors are
actually moved to device; make the change where these symbols are used in
ovp2c.py.

@floatingCatty floatingCatty merged commit 6e1a544 into deepmodeling:main Feb 9, 2026
4 checks passed