add two center integral of overlap computation #311

floatingCatty merged 2 commits into deepmodeling:main from
Conversation
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@dptb/postprocess/ovp2c.py`:
- Around line 24-31: The code accesses data[_keys.EDGE_CELL_SHIFT_KEY] without
guarding against its absence; update the postprocessing block (where edge_index,
atom_types, edge_vectors and cell_shifts are created) to check for
_keys.EDGE_CELL_SHIFT_KEY in data and if missing create a default zero-tensor of
the correct shape (matching edge_index.shape or edge_vectors.shape[0]) and
dtype/device before calling .cpu(); ensure this fallback is applied in the same
place as with_edge_vectors(data) is used so downstream consumers expect a
cell_shifts tensor.
- Around line 27-52: The overlap tensors are created on CPU (because inputs are
.cpu()) so ovp_dict and what block_to_feature builds end up on CPU even when the
input data is CUDA; capture the original device (e.g., device =
data[_keys.EDGE_INDEX_KEY].device or data.device) before the .cpu() calls and
restore that device for generated overlap tensors (move S or the final tensors
in ovp_dict back with .to(device) or create them directly on device) so that
ovp_dict entries (EDGE_OVERLAP_KEY / NODE_OVERLAP_KEY created via
block_to_feature) match the device of data; update the code around
edge_index/atom_types/edge_vectors/cell_shifts, op.overlap calls and the
ovp_dict inserts to ensure all created tensors are on the original device.
- Around line 1-20: The module currently imports op2c at top-level which breaks
environments without that optional dependency; remove the top-level "import
op2c" and instead perform a lazy import inside compute_overlap: wrap "import
op2c" in a try/except ImportError within compute_overlap, and on ImportError
raise a clear RuntimeError (or similar) explaining that op2c is optional and
must be installed to call compute_overlap; keep the rest of compute_overlap
(Op2C instantiation) unchanged so callers get the explicit error only when they
invoke compute_overlap.
```python
import op2c
import torch
from typing import Union

from dptb.data.transforms import OrbitalMapper
from dptb.data import AtomicDataDict
from dptb.data import _keys
from dptb.data.interfaces import block_to_feature
from dptb.data.AtomicDataDict import with_edge_vectors


def compute_overlap(data: AtomicDataDict, idp: OrbitalMapper, orb_dir, orb_names):
    ntype = idp.num_types
    op = op2c.Op2C(
        ntype=ntype,
        nspin=1,  # for current usage
        lspinorb=False,
        orb_dir=orb_dir,
        orb_name=orb_names,
        log_file="op2c.log"
    )
```
🧩 Analysis chain

🏁 Script executed:
```bash
# Check imports and usages of ovp2c and compute_overlap
rg -n "ovp2c" -C 2
rg -n "compute_overlap" -C 2
```

🏁 Script executed:
```bash
# Check for pytest.importorskip patterns related to op2c
rg -n "importorskip.*op2c" -C 2
rg -n "pytest.importorskip" -C 2
```

🏁 Script executed:
```bash
# Check dependency declarations for op2c (optional vs required)
find . -name "setup.py" -o -name "pyproject.toml" -o -name "setup.cfg" | head -5 | xargs grep -l "op2c" 2>/dev/null
```

🏁 Script executed:
```bash
# Check if op2c is imported elsewhere and how it's handled
rg -n "import op2c" -C 2
rg -n "from op2c" -C 2
```
Move the op2c import inside compute_overlap to keep the module usable without the optional dependency.
Line 1 imports `op2c` at module scope, which breaks in environments where it isn't installed, even if `compute_overlap` is never called. The test file uses `pytest.importorskip("op2c")`, confirming that `op2c` is optional. Move the import inside `compute_overlap` and raise a clear error if the dependency is missing.
🔧 Suggested change (lazy import with a clear error)

```diff
-import op2c
 import torch
 from typing import Union
@@
 def compute_overlap(data: AtomicDataDict, idp: OrbitalMapper, orb_dir, orb_names):
+    try:
+        import op2c
+    except ModuleNotFoundError as e:
+        raise ImportError(
+            "op2c is required for compute_overlap. Please install it to use this feature."
+        ) from e
```
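The suggested pattern can be exercised standalone; below is a minimal sketch where the helper name `load_op2c` is illustrative, not the PR's actual function:

```python
def load_op2c():
    """Lazily import the optional op2c dependency, failing with a clear message.

    Mirrors the review suggestion: the module imports fine without op2c,
    and only this call raises if the dependency is absent.
    """
    try:
        import op2c  # optional dependency, only needed for overlap computation
    except ModuleNotFoundError as e:
        raise ImportError(
            "op2c is required for compute_overlap. Please install it to use this feature."
        ) from e
    return op2c

# Importing the enclosing module never fails; only the call does.
try:
    load_op2c()
    print("op2c available")
except ImportError as err:
    print("op2c missing:", err)
```

Callers who never touch the overlap path pay no cost, which is exactly what `pytest.importorskip` expects.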
```python
    if _keys.EDGE_VECTORS_KEY not in data:
        data = with_edge_vectors(data)

    edge_index = data[_keys.EDGE_INDEX_KEY].cpu()
    atom_types = data[_keys.ATOM_TYPE_KEY].cpu()
    edge_vectors = data[_keys.EDGE_VECTORS_KEY].cpu()
    cell_shifts = data[_keys.EDGE_CELL_SHIFT_KEY].cpu()
```
Guard against missing EDGE_CELL_SHIFT_KEY to avoid KeyError.
At Line 30, EDGE_CELL_SHIFT_KEY is accessed unconditionally. Non-periodic datasets may omit it, which will crash. Consider defaulting to zeros when the key is absent.
🛡️ Suggested fallback

```diff
-    cell_shifts = data[_keys.EDGE_CELL_SHIFT_KEY].cpu()
+    if _keys.EDGE_CELL_SHIFT_KEY in data:
+        cell_shifts = data[_keys.EDGE_CELL_SHIFT_KEY].cpu()
+    else:
+        cell_shifts = torch.zeros((edge_index.shape[1], 3), dtype=edge_vectors.dtype)
```
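Stripped of torch, the fallback logic amounts to "one zero shift per edge"; a dependency-light sketch using plain lists (the helper name and key string are illustrative):

```python
def get_cell_shifts(data, n_edges):
    """Return per-edge cell shifts, defaulting to (0, 0, 0) for non-periodic data."""
    if "edge_cell_shift" in data:
        return data["edge_cell_shift"]
    # non-periodic: every edge stays in the home cell
    return [[0, 0, 0] for _ in range(n_edges)]

shifts = get_cell_shifts({}, 3)
print(len(shifts), shifts[0])  # 3 [0, 0, 0]
```

The torch version in the diff above does the same thing with `torch.zeros((edge_index.shape[1], 3), ...)`.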
```python
    edge_index = data[_keys.EDGE_INDEX_KEY].cpu()
    atom_types = data[_keys.ATOM_TYPE_KEY].cpu()
    edge_vectors = data[_keys.EDGE_VECTORS_KEY].cpu()
    cell_shifts = data[_keys.EDGE_CELL_SHIFT_KEY].cpu()

    for k in range(edge_index.shape[1]):
        i, j = edge_index[:, k].tolist()
        itype, jtype = atom_types[i].item(), atom_types[j].item()

        Rij = edge_vectors[k] * 1.8897259886  # angstrom to bohr
        Rij = Rij.tolist()
        Rvec = cell_shifts[k].int().tolist()

        inorb = idp.atom_norb[itype]
        jnorb = idp.atom_norb[jtype]

        S = op.overlap(itype, jtype, Rij, is_transpose=False)
        ovp_dict["{}_{}_{}_{}_{}".format(i, j, *Rvec)] = S.reshape(inorb, jnorb)

    for k in range(atom_types.shape[0]):
        itype = atom_types[k].item()
        S = op.overlap(itype, itype, [0, 0, 0], is_transpose=False)
        inorb = idp.atom_norb[itype]
        ovp_dict["{}_{}_{}_{}_{}".format(k, k, 0, 0, 0)] = S.reshape(inorb, inorb)

    block_to_feature(data, idp, False, ovp_dict, False)
```
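The dictionary keys built above combine the two atom indices with the integer cell shift; a tiny self-contained illustration of that scheme:

```python
def overlap_key(i, j, rvec):
    """Build the 'i_j_Rx_Ry_Rz' key used to index overlap blocks in ovp_dict."""
    return "{}_{}_{}_{}_{}".format(i, j, *rvec)

# edge block between atoms 0 and 1, image cell shifted by (0, 0, 1)
print(overlap_key(0, 1, [0, 0, 1]))  # 0_1_0_0_1
# on-site block for atom 2 in the home cell
print(overlap_key(2, 2, [0, 0, 0]))  # 2_2_0_0_0
```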
Avoid CPU/GPU device mismatches for generated overlap tensors.
At Line 27 and Line 52, inputs are copied to CPU and block_to_feature creates CPU tensors, so EDGE_OVERLAP_KEY (and possibly NODE_OVERLAP_KEY) will be on CPU even if data is on CUDA. This can break downstream ops that expect a single device. Consider restoring the original device (or explicitly erroring if CUDA is used).
✅ Suggested fix (restore tensors to the original device)

```diff
-    edge_index = data[_keys.EDGE_INDEX_KEY].cpu()
+    edge_index = data[_keys.EDGE_INDEX_KEY]
+    device = edge_index.device
+    edge_index = edge_index.cpu()
@@
-    block_to_feature(data, idp, False, ovp_dict, False)
+    block_to_feature(data, idp, False, ovp_dict, False)
+    if device.type != "cpu":
+        data[_keys.EDGE_OVERLAP_KEY] = data[_keys.EDGE_OVERLAP_KEY].to(device)
+        if _keys.NODE_OVERLAP_KEY in data:
+            data[_keys.NODE_OVERLAP_KEY] = data[_keys.NODE_OVERLAP_KEY].to(device)
```
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@dptb/postprocess/ovp2c.py`:
- Around line 55-56: The calls data[_keys.NODE_OVERLAP_KEY].to(device) and
data[_keys.EDGE_OVERLAP_KEY].to(device) are no-ops because Tensor.to() returns a
new tensor; update the code so the moved tensors are stored back (e.g., assign
the result to data[_keys.NODE_OVERLAP_KEY] and data[_keys.EDGE_OVERLAP_KEY]) or
use the in-place variant (.to_(device)) to ensure the overlap tensors are
actually moved to device; make the change where these symbols are used in
ovp2c.py.
🧹 Nitpick comments (3)

dptb/postprocess/ovp2c.py (3)

34-46: Python-level per-edge loop will be a bottleneck on large graphs. Each edge invokes `op.overlap` individually. For production-scale systems with tens of thousands of edges this will be slow. Consider batching the overlap calls if `op2c` supports it, or at minimum document the scaling limitation.

38: Extract the Ångström-to-Bohr magic number into a named constant. `1.8897259886` is used without explanation. A module-level constant (e.g., `ANGSTROM_TO_BOHR = 1.8897259886`) improves readability and avoids silent duplication if used elsewhere.

48-52: Potential key collision with self-loop edges. If `edge_index` contains self-loop entries `(k, k)` with `cell_shift = [0, 0, 0]`, the edge loop (Line 46) writes the same key that the atom loop (Line 52) later overwrites. The values should be identical, so this is benign, but it's worth being aware of if the data ever includes self-loops.
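The named-constant nitpick is trivial to apply; a sketch using the constant name proposed above and the conversion factor from the PR's code (the `to_bohr` helper is illustrative):

```python
ANGSTROM_TO_BOHR = 1.8897259886  # conversion factor used in the PR code

def to_bohr(r_angstrom):
    """Convert a Cartesian vector from angstrom to bohr."""
    return [x * ANGSTROM_TO_BOHR for x in r_angstrom]

print(to_bohr([1.0, 0.0, 0.0]))  # [1.8897259886, 0.0, 0.0]
```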
```python
    data[_keys.NODE_OVERLAP_KEY].to(device)
    data[_keys.EDGE_OVERLAP_KEY].to(device)
```
.to(device) is not in-place — results are discarded, tensors stay on CPU.
Tensor.to() returns a new tensor; it does not mutate in place. These two lines are no-ops, so the device-mismatch issue from the earlier review is still present.
Proposed fix

```diff
-    data[_keys.NODE_OVERLAP_KEY].to(device)
-    data[_keys.EDGE_OVERLAP_KEY].to(device)
+    data[_keys.NODE_OVERLAP_KEY] = data[_keys.NODE_OVERLAP_KEY].to(device)
+    data[_keys.EDGE_OVERLAP_KEY] = data[_keys.EDGE_OVERLAP_KEY].to(device)
```
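The pitfall is easy to reproduce without torch; a stand-in class mimicking `Tensor.to()`'s copy-returning semantics (all names here are illustrative):

```python
class FakeTensor:
    """Minimal stand-in: like torch.Tensor, .to() returns a new object."""
    def __init__(self, device):
        self.device = device

    def to(self, device):
        return FakeTensor(device)  # new object; self is NOT mutated

data = {"edge_overlap": FakeTensor("cpu")}

data["edge_overlap"].to("cuda")  # result discarded: a silent no-op
print(data["edge_overlap"].device)  # cpu

data["edge_overlap"] = data["edge_overlap"].to("cuda")  # store the result back
print(data["edge_overlap"].device)  # cuda
```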
Add a two-centre integral overlap computation and test for verification.
Summary by CodeRabbit
New Features
Tests
Data