Conversation

Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Adds Hr2HR to convert distributed orbital features into a vbcsr ImageContainer (onsite + bondwise, optional SOC); introduces compute_overlap_image; integrates orbital-file parsing into train; adds a dtype utility; adds MKL-backed FEAST/PARDISO wrappers and eigen-solver classes; and adds multiple tests exercising these paths.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Data as AtomicDataDict
    participant HR as Hr2HR
    participant OrbMap as OrbitalMapper
    participant VB as vbcsr.AtomicData_vbcsr
    participant Img as vbcsr.ImageContainer
    Data->>HR: __call__(data)
    activate HR
    HR->>OrbMap: ensure mapping / query basis & pair indices
    OrbMap-->>HR: orbital sizes, masks, pair indices
    loop build blocks
        HR->>HR: construct onsite blocks (CPU tensors)
        HR->>HR: construct bondwise hopping blocks (CPU tensors)
        opt SOC enabled
            HR->>HR: construct SOC blocks and edge shifts
        end
    end
    HR->>VB: AtomicData_vbcsr.from_distributed(...)
    VB->>Img: ImageContainer(...) create
    HR->>Img: populate masked onsite + bondwise blocks (with shifts)
    Img-->>HR: assembled container
    HR-->>Data: return ImageContainer
    deactivate HR
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@dptb/nn/hr2hR.py`:
- Around line 112-113: The CPU conversion calls (onsite_block.to("cpu"),
bondwise_hopping.to("cpu"), soc_upup_block.to("cpu"), soc_updn_block.to("cpu"))
only assign the CPU tensors to attributes (self.onsite_block,
self.bondwise_hopping, self.soc_upup_block, self.soc_updn_block) but the
subsequent loops still use the original local variables (onsite_block,
bondwise_hopping, soc_upup_block, soc_updn_block) which may be on GPU; reassign
those local variables to the CPU copies (e.g., onsite_block = self.onsite_block,
bondwise_hopping = self.bondwise_hopping, soc_upup_block = self.soc_upup_block,
soc_updn_block = self.soc_updn_block) before the loops so the numpy operations
inside the loops operate on CPU tensors only.
- Line 50: The __call__ method is annotated to return AtomicDataDict.Type but
actually returns an ImageContainer (see the return at line where ImageContainer
is constructed); update the signature of def __call__(self, data:
AtomicDataDict.Type) -> ImageContainer to the correct return type (or to a more
general union if it can sometimes return other types), and adjust any related
docstring/type comments and imports so static type checkers and callers reflect
the ImageContainer return.
- Around line 62-64: Remove the two no-op calls after creating bondwise_hopping:
the expressions bondwise_hopping.to(self.device) and
bondwise_hopping.type(self.dtype) are non-mutating and their return values are
discarded; since bondwise_hopping is already constructed with dtype=self.dtype
and device=self.device in the torch.zeros call (using self.idp.full_basis_norb
for shape), delete those two lines so only the initialized tensor assignment
remains.
In `@dptb/postprocess/ovp2c.py`:
- Line 98: The code uses np.float64 in the call to ImageContainer(adata,
np.float64) but numpy is not imported; add the missing import by adding "import
numpy as np" at the top of the module so np.float64 is defined and
ImageContainer receives a valid dtype.
- Around line 78-79: Remove the unused leftover variables by deleting the
assignments to device and ovp_dict in the compute_overlap_image function (these
were copied from compute_overlap but never used); ensure no other code in
compute_overlap_image references _keys.EDGE_INDEX_KEY.device or ovp_dict so the
function remains clean and compiles.
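The hr2hR.py rebinding issue in the first fix above comes from conversion methods that return a new object instead of mutating in place. A minimal pure-Python sketch of the bug and the fix; `FakeTensor` and `Holder` are illustrative stand-ins, not the real torch or DeePTB classes:

```python
class FakeTensor:
    """Stand-in for a tensor whose .to() returns a converted copy,
    mirroring torch's non-mutating Tensor.to (illustration only)."""
    def __init__(self, device):
        self.device = device

    def to(self, device):
        return FakeTensor(device)  # original object is left untouched

class Holder:
    def move_block(self):
        onsite_block = FakeTensor("cuda")
        # Bug pattern: only the attribute receives the CPU copy; the
        # local name still points at the original "GPU" object.
        self.onsite_block = onsite_block.to("cpu")
        # Fix: rebind the local before any loop that consumes it.
        onsite_block = self.onsite_block
        return onsite_block.device

print(Holder().move_block())  # -> cpu
```

The same rebinding would be applied to each of the four block variables named in the comment above.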
🧹 Nitpick comments (6)
dptb/tests/test_hr2hr.py (2)

36-38: Test has no assertions — it only verifies the pipeline doesn't crash.

`vb` is assigned but never checked. Without assertions on shape, dtype, or values, regressions in the numerical output will go undetected. Consider adding at least basic sanity checks:

💡 Suggested assertions

```diff
     adata = model(adata)
     image_c = hr2hr(adata)
     vb = image_c.sample_k([0,0,0])
+    assert vb is not None
+    # e.g., check shape, dtype, or that diagonal elements are non-zero
```
1-7: Unused import: `Band` is imported but never referenced.

Line 1 imports `Band` from `dptb.postprocess.bandstructure.band`, but it is not used anywhere in the test.

Proposed fix

```diff
-from dptb.postprocess.bandstructure.band import Band
 from dptb.utils.tools import j_loader
 from dptb.nn.build import build_model
 from dptb.data import build_dataset, AtomicData, AtomicDataDict
 from dptb.nn.hr2hR import Hr2HR
 import torch
 import pytest
```

dptb/postprocess/ovp2c.py (1)
8-17: Guarded imports silently swallow errors via `print()` — use `logging.warning` and consider the failure mode.

Using bare `print()` for import failures is not ideal in a library module. More importantly, if `vbcsr` or `op2c` are not installed, the functions that depend on them (`compute_overlap_image`, `compute_overlap`) will fail with an unrelated `NameError` at call time rather than a clear message.

Also note the inconsistency: `dptb/nn/hr2hR.py` raises `ImportError` immediately for missing `vbcsr`, while this file silently continues.

Consider either:
- Using `logging.warning()` and raising a clear error inside the functions that need these deps, or
- Raising `ImportError` at import time (matching `hr2hR.py`).

dptb/nn/hr2hR.py (3)
9-13: Chain the original exception for better debugging context.

Per PEP 3134 / B904, use `raise ... from err` so the traceback preserves the original `ImportError`.

Proposed fix

```diff
 try:
     from vbcsr import ImageContainer
     from vbcsr import AtomicData as AtomicData_vbcsr
-except ImportError:
-    raise ImportError("Please install vbcsr first: pip install vbcsr")
+except ImportError as err:
+    raise ImportError("Please install vbcsr first: pip install vbcsr") from err
```
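The effect of `raise ... from err` can be verified in isolation: the original exception survives on `__cause__`. The module name below is deliberately nonexistent so the import always fails (illustration only):

```python
def load_backend():
    try:
        import vbcsr_nonexistent_backend_for_demo  # hypothetical module name
    except ImportError as err:
        # Chaining keeps the original failure attached to the new error.
        raise ImportError("Please install vbcsr first: pip install vbcsr") from err

cause = None
try:
    load_backend()
except ImportError as exc:
    cause = exc.__cause__  # the original ModuleNotFoundError

print(type(cause).__name__)  # -> ModuleNotFoundError
```

Without `from err`, `__cause__` would be `None` and the traceback would show only the re-raised error.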
18: `basis` parameter should use an `Optional` type hint.

`basis` defaults to `None` but the type hint is `Dict[str, Union[str, list]]`. PEP 484 requires explicit `Optional` when `None` is a valid value.

```diff
-        basis: Dict[str, Union[str, list]]=None,
+        basis: Optional[Dict[str, Union[str, list]]]=None,
```

(Also add `Optional` to the typing imports on line 3.)
71: `tdtype2ndtype` is imported but unused; `ndtype` is hardcoded.

Line 7 imports `tdtype2ndtype` from `dptb.utils.tools`, but `ndtype` is hardcoded to `np.float64` (line 71) or `np.complex128` (line 80) rather than derived from `self.dtype`. Consider using the utility:

```diff
-        ndtype = np.float64
+        ndtype = tdtype2ndtype(self.dtype)
```

This would also correctly map `torch.float32` → `np.float32` if someone passes `dtype=torch.float32`, rather than always defaulting to `float64`.

Also applies to: 7-7
dptb/nn/hr2hR.py (outdated)

```python
        self.edge_field = edge_field
        self.node_field = node_field

    def __call__(self, data: AtomicDataDict.Type) -> AtomicDataDict.Type:
```

Return type annotation is incorrect — returns ImageContainer, not AtomicDataDict.Type.

The method returns an `ImageContainer` (line 165), but the signature declares `-> AtomicDataDict.Type`. This will mislead callers and type checkers.

```diff
-    def __call__(self, data: AtomicDataDict.Type) -> AtomicDataDict.Type:
+    def __call__(self, data: AtomicDataDict.Type) -> "ImageContainer":
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
In `@dptb/nn/hr2hR.py` at line 50, The __call__ method is annotated to return
AtomicDataDict.Type but actually returns an ImageContainer (see the return at
line where ImageContainer is constructed); update the signature of def
__call__(self, data: AtomicDataDict.Type) -> ImageContainer to the correct
return type (or to a more general union if it can sometimes return other types),
and adjust any related docstring/type comments and imports so static type
checkers and callers reflect the ImageContainer return.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@dptb/entrypoints/train.py`:
- Around line 105-106: The current check uses "if pred_opts and
pred_opts.get('method', 'e3tb') != 'e3tb':" which skips validation when
pred_opts is an empty dict; change it to always validate the method (e.g.,
ensure pred_opts is a dict via pred_opts = pred_opts or {} or remove the
truthiness guard) and then check pred_opts.get('method', 'e3tb') != 'e3tb' and
raise the ValueError accordingly so orbital files in 'basis' are rejected for
non-'e3tb' methods; reference the pred_opts variable, its .get(...) call, and
the surrounding logic in train.py (normalize() may already populate
prediction.method) when making this change.
In `@dptb/tests/test_train_orb.py`:
- Around line 79-102: The test test_train_orbital_parsing_wrong_method leaves
output directories and log files behind because it calls train(...,
output="dummy_out", ...) without mocking filesystem or logging helpers; update
the test to either (a) mock Path.mkdir, os.makedirs and set_log_handles (same
mocks used in the other test) so no directories/log files are created, or (b)
call train with output=None to skip directory creation and also ensure
set_log_handles is mocked or safe to call when both output and log_path are None
so the code path still raises the intended ValueError in train(); reference
train(), set_log_handles, Path.mkdir and os.makedirs when making the change.
🧹 Nitpick comments (3)
dptb/utils/orbital_parser.py (1)

21-25: Minor TOCTOU between `os.path.exists` and `open`.

The file could be removed between the existence check and the `open` call. Since `open` itself raises `FileNotFoundError`, you could simplify by just catching the exception from `open`, eliminating both the race and the redundant check.

♻️ Suggested simplification

```diff
-    if not os.path.exists(filepath):
-        raise FileNotFoundError(f"Orbital file not found: {filepath}")
-
-    with open(filepath, 'r') as f:
-        content = f.read()
+    try:
+        with open(filepath, 'r') as f:
+            content = f.read()
+    except FileNotFoundError:
+        raise FileNotFoundError(f"Orbital file not found: {filepath}")
```

dptb/tests/test_train_orb.py (2)
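The race-free pattern for `orbital_parser.py` suggested above can be exercised end-to-end; `read_orbital_file` here is a standalone sketch, not the project's actual parser:

```python
import os
import tempfile

def read_orbital_file(filepath):
    # Let open() detect the missing file itself: with no exists() pre-check,
    # there is no window for the file to vanish between check and open.
    try:
        with open(filepath, 'r') as f:
            return f.read()
    except FileNotFoundError:
        raise FileNotFoundError(f"Orbital file not found: {filepath}")

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "Si.orb")
    with open(path, "w") as f:
        f.write("orbital data")
    content = read_orbital_file(path)
    try:
        read_orbital_file(os.path.join(d, "missing.orb"))
        err_msg = ""
    except FileNotFoundError as e:
        err_msg = str(e)

print(content)                              # -> orbital data
print("Orbital file not found" in err_msg)  # -> True
```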
19-77: Test relies on a hardcoded relative path and has unnecessary boilerplate.

The hardcoded path `"./dptb/tests/data/e3_band/data/Si_gga_7au_100Ry_2s2p1d.orb"` (line 21) breaks if tests run from a different working directory. Consider using `os.path.dirname(__file__)` to build the path relative to the test file.

The `finally: pass` block (lines 76–77) is a no-op and can be removed.

♻️ Use a path relative to the test file

```diff
-    orb_file_path = "./dptb/tests/data/e3_band/data/Si_gga_7au_100Ry_2s2p1d.orb"
+    orb_file_path = os.path.join(
+        os.path.dirname(__file__), "data", "e3_band", "data", "Si_gga_7au_100Ry_2s2p1d.orb"
+    )
```
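A runnable sketch of the `__file__`-relative path construction suggested above, with a CWD fallback added for contexts where `__file__` is undefined (e.g. a REPL) — the fallback is an addition for this illustration, not part of the suggested fix:

```python
import os

# Resolve the data file relative to this file instead of the current
# working directory, so the path survives running tests from anywhere.
_here = globals().get("__file__")
base_dir = os.path.dirname(os.path.abspath(_here)) if _here else os.getcwd()
orb_file_path = os.path.join(base_dir, "data", "e3_band", "data",
                             "Si_gga_7au_100Ry_2s2p1d.orb")
print(os.path.isabs(orb_file_path))  # -> True
```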
46-57: Fragile test pattern: using an `InterruptedError` side-effect to halt execution mid-flow.

The test relies on `build_model` being called after the orbital parsing block to capture the modified config. If the call order in `train()` changes, this test will silently pass or fail for the wrong reason. Consider adding an explicit assertion that `mock_build_model` was actually called (e.g., `mock_build_model.assert_called_once()`), so the test fails clearly if the control flow changes.

Also, the `except Exception` catch-all on line 56 will swallow genuine failures — e.g., if the orbital parsing itself raises unexpectedly, it gets reported as a generic `self.fail` message rather than the real traceback.
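A self-contained sketch of the explicit-assertion pattern suggested above, with a hypothetical `run_pipeline` standing in for the real `train()` control flow:

```python
from unittest.mock import MagicMock

def run_pipeline(build_model):
    # Hypothetical stand-in for train(): parse config, then build the model.
    config = {"common_options": {"basis": {"Si": "parsed"}}}
    build_model(config)  # raises via the mock's side_effect, halting the flow

mock_build_model = MagicMock(side_effect=InterruptedError("halt"))
try:
    run_pipeline(mock_build_model)
except InterruptedError:
    pass

# The explicit assertion fails loudly if build_model is never reached,
# instead of the test passing for the wrong reason.
mock_build_model.assert_called_once()
captured = mock_build_model.call_args[0][0]
print(captured["common_options"]["basis"])  # -> {'Si': 'parsed'}
```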
dptb/entrypoints/train.py (outdated)

```python
        if pred_opts and pred_opts.get("method", "e3tb") != "e3tb":
            raise ValueError(f"Orbital files in 'basis' are only supported for the 'e3tb' method. Found method: {pred_opts.get('method')}")
```
Method validation is silently skipped when pred_opts is an empty dict.
if pred_opts and pred_opts.get(...) — since bool({}) is False, an empty prediction dict (or absent model_options) will bypass validation entirely. If normalize() is guaranteed to populate prediction.method by this point, the pred_opts truthiness guard is unnecessary and misleading. If it's not guaranteed, orbital files could slip through for non-e3tb methods.
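The short-circuit behaviour described above is easy to demonstrate in isolation:

```python
# bool({}) is False, so `if pred_opts and ...` short-circuits before the
# method check ever runs for an empty prediction dict.
pred_opts = {}
guard_fired = bool(pred_opts and pred_opts.get("method", "e3tb") != "e3tb")
print(bool(pred_opts))  # -> False (the truthiness guard stops here)
print(guard_fired)      # -> False

# Dropping the truthiness guard always evaluates the method check:
normalized = pred_opts or {}
print(normalized.get("method", "e3tb") != "e3tb")          # -> False ('e3tb' default)
print({"method": "sktb"}.get("method", "e3tb") != "e3tb")  # -> True (would raise)
```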
🧰 Tools
🪛 Ruff (0.15.0)
[warning] 106-106: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In `@dptb/entrypoints/train.py` around lines 105 - 106, The current check uses "if
pred_opts and pred_opts.get('method', 'e3tb') != 'e3tb':" which skips validation
when pred_opts is an empty dict; change it to always validate the method (e.g.,
ensure pred_opts is a dict via pred_opts = pred_opts or {} or remove the
truthiness guard) and then check pred_opts.get('method', 'e3tb') != 'e3tb' and
raise the ValueError accordingly so orbital files in 'basis' are rejected for
non-'e3tb' methods; reference the pred_opts variable, its .get(...) call, and
the surrounding logic in train.py (normalize() may already populate
prediction.method) when making this change.
dptb/tests/test_train_orb.py

```python
    @patch('dptb.entrypoints.train.j_loader')
    @patch('dptb.entrypoints.train.normalize')
    def test_train_orbital_parsing_wrong_method(self, mock_normalize, mock_j_loader):
        with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.orb') as tmp_orb:
            tmp_orb_path = tmp_orb.name
        try:
            mock_config = {
                "common_options": {
                    "basis": {"Si": tmp_orb_path},
                    "dtype": "float32"
                },
                "model_options": {
                    "prediction": {"method": "sktb"}  # Not e3tb
                }
            }
            mock_j_loader.return_value = mock_config
            mock_normalize.return_value = mock_config

            with self.assertRaisesRegex(ValueError, "only supported for the 'e3tb' method"):
                train(INPUT="dummy.json", init_model=None, restart=None, output="dummy_out", log_level=20, log_path=None)

        finally:
            if os.path.exists(tmp_orb_path):
                os.remove(tmp_orb_path)
```
test_train_orbital_parsing_wrong_method creates real directories and log files.
Unlike the first test, this test doesn't mock Path.mkdir, os.makedirs, or set_log_handles. Calling train(..., output="dummy_out", ...) will create dummy_out/, dummy_out/checkpoint/, and a log file on disk before the ValueError is raised. The finally block only cleans up the temp .orb file, leaving these artifacts behind.
Either add the same filesystem mocks as in the first test, or pass output=None (which skips directory creation in train()).
🐛 Simplest fix: pass output=None

```diff
         with self.assertRaisesRegex(ValueError, "only supported for the 'e3tb' method"):
-            train(INPUT="dummy.json", init_model=None, restart=None, output="dummy_out", log_level=20, log_path=None)
+            train(INPUT="dummy.json", init_model=None, restart=None, output=None, log_level=20, log_path=None)
```

Note: you'll also need to mock or handle `set_log_handles` since `log_path` will be None and `output` is None; verify this doesn't cause issues in the code path before the ValueError is raised.
🤖 Prompt for AI Agents
In `@dptb/tests/test_train_orb.py` around lines 79 - 102, The test
test_train_orbital_parsing_wrong_method leaves output directories and log files
behind because it calls train(..., output="dummy_out", ...) without mocking
filesystem or logging helpers; update the test to either (a) mock Path.mkdir,
os.makedirs and set_log_handles (same mocks used in the other test) so no
directories/log files are created, or (b) call train with output=None to skip
directory creation and also ensure set_log_handles is mocked or safe to call
when both output and log_path are None so the code path still raises the
intended ValueError in train(); reference train(), set_log_handles, Path.mkdir
and os.makedirs when making the change.
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In `@dptb/nn/energy.py`:
- Around line 133-137: Constructor contains a duplicate assignment to self.mode;
remove the redundant line so self.mode is only assigned once (keep the first
assignment near self.sigma/self.neig and delete the second self.mode = mode).
Update the initializer in the class (the method that sets self.sigma, self.neig,
self.mode) to only assign self.mode a single time to avoid redundancy.
- Around line 14-19: Replace the bare except by setting sentinel None values for
the optionally-imported symbols (PyPardisoSolver, FeastSolver, eigsh,
LinearOperator) so they're defined even if imports fail, and update the
constructors PardisoEig.__init__ and FEASTEig.__init__ to check those sentinels
and raise informative ImportError messages (e.g., PardisoEig should check
PyPardisoSolver and eigsh and raise "PardisoEig requires MKL (pypardiso) and
scipy.sparse.linalg"; FEASTEig should check FeastSolver and LinearOperator
similarly) so users get a clear dependency error instead of a NameError.
In `@dptb/tests/test_pardisoeig.py`:
- Around line 1-8: The test imports PardisoEig which depends on
PyPardisoSolver/MKL; add a guard to skip the test when PyPardisoSolver (or MKL)
is not available similar to the FEAST tests — e.g., at the top of
dptb/tests/test_pardisoeig.py call pytest.importorskip("pypardiso") or attempt
to import PyPardisoSolver and call pytest.skip() on ImportError so
PardisoEig-related tests are not executed in MKL-less environments; ensure the
skip is placed before any usage or import of PardisoEig so the NameError cannot
occur.
- Line 50: Remove the unused variable h by either deleting the assignment or
turning it into an assertion; specifically, in the test where h is set via
image_h.sample_k([0,0,0], symm=True).to_scipy(format="csr"), either remove that
line entirely if it was a leftover debug, or replace it with an assertion that
validates the returned object (e.g., check type, shape, non-emptiness, or
expected values) so the call is exercised; locate the assignment to h in the
test_pardisoeig test and apply the chosen change.
- Around line 37-48: The test constructs sr2sr incorrectly: change the Hr2HR
instantiation named sr2sr to use overlap=True and set
edge_field=AtomicDataDict.EDGE_OVERLAP_KEY and
node_field=AtomicDataDict.NODE_OVERLAP_KEY so sr2sr builds the S (overlap)
blocks rather than a second Hamiltonian; also add the pytest skip marker
`@pytest.mark.skipif`(_MKL_RT is None, reason="MKL runtime not found") to the test
function (PardisoEig requires MKL/PARDISO) and remove the unused variable h on
line 50 to clean up the test.
In `@dptb/tests/test_postprocess_hr.py`:
- Around line 31-34: The test calls system.get_hR() and then unconditionally
calls s.sample_k(...), but get_hR can return (h_container, None) when there is
no overlap; guard against s being None before calling sample_k by checking "s is
not None" (or assert/skip the sample step) and only call s.sample_k when
present; update the test in dptb/tests/test_postprocess_hr.py around the get_hR
usage to handle the None case (e.g., conditional branch, early return, or
informative assertion) referencing get_hR, the returned variable s, and the
method sample_k.
- Around line 22-43: The test test_get_hR_real_model currently only prints
results; add assertions to catch regressions: assert that h is not None and has
a non-empty shape (e.g., hasattr(h, "shape") and h.shape[0] > 0) or appropriate
length if it's a list, assert that s is not None and exposes sample_k
(hasattr(s, "sample_k")), call sK = s.sample_k([0,0,0], sym=True) and assert sK
is not None and that sK.to_dense() has non-zero size (e.g., hasattr(...,
"shape") and shape sum > 0) to validate types/contents returned by
TBSystem.get_hR.
In `@dptb/utils/pardiso_wrapper.py`:
- Around line 209-213: In _is_already_factorized, replace the direct type
comparison "type(self.factorized_A) == str" with an isinstance check to handle
str subclasses properly; i.e. use isinstance(self.factorized_A, str) in the
conditional so the branch that returns self._hash_csr_matrix(A) ==
self.factorized_A executes for any string-like factorized_A while leaving the
else branch (self._csr_matrix_equal) unchanged.
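The sentinel-import pattern prescribed for energy.py above can be sketched as follows; the module name is deliberately nonexistent so the fallback path always runs, and `PardisoEigSketch` is an illustrative stand-in for the real class:

```python
# Optional symbols default to None at import time; the constructor raises
# a clear ImportError at use time instead of a confusing NameError.
try:
    from nonexistent_mkl_demo_pkg import PyPardisoSolver  # hypothetical
except ImportError:
    PyPardisoSolver = None

class PardisoEigSketch:
    def __init__(self):
        if PyPardisoSolver is None:
            raise ImportError(
                "PardisoEig requires MKL (pypardiso) and scipy.sparse.linalg")
        self.solver = PyPardisoSolver()

msg = ""
try:
    PardisoEigSketch()
except ImportError as e:
    msg = str(e)
print(msg)  # -> PardisoEig requires MKL (pypardiso) and scipy.sparse.linalg
```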
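The `isinstance` fix for `_is_already_factorized` above rests on how Python compares types: a `str` subclass fails an exact type comparison but passes `isinstance`. `MatrixFingerprint` below is a hypothetical subclass for illustration:

```python
class MatrixFingerprint(str):
    """Hypothetical str subclass standing in for a stored hash string."""

fp = MatrixFingerprint("abc123")
print(type(fp) == str)      # -> False: exact type comparison misses the subclass
print(isinstance(fp, str))  # -> True: the string branch now runs for subclasses too
```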
🧹 Nitpick comments (9)
dptb/postprocess/unified/calculator.py (1)

172-197: Inconsistent indentation inside `get_hR`.

Lines 173–181 have an extra level of indentation compared to the rest of the method body (lines 191–197). This appears to be a formatting mistake.

Also, `Hr2HR` is re-instantiated on every call, which involves `get_orbpair_maps()` and `get_orbpair_soc_maps()` internally. Consider caching these converters (e.g., in `__init__` or lazily) if `get_hR` is called repeatedly.

Fix indentation

```diff
     def get_hR(self, atomic_data):
-            # Initialize hR converters
-            h2R = Hr2HR(
-                idp=self.model.idp,
-                edge_field=AtomicDataDict.EDGE_FEATURES_KEY,
-                node_field=AtomicDataDict.NODE_FEATURES_KEY,
-                overlap=False,
-                dtype=self.model.dtype,
-                device=self.device
-            )
+        # Initialize hR converters
+        h2R = Hr2HR(
+            idp=self.model.idp,
+            edge_field=AtomicDataDict.EDGE_FEATURES_KEY,
+            node_field=AtomicDataDict.NODE_FEATURES_KEY,
+            overlap=False,
+            dtype=self.model.dtype,
+            device=self.device
+        )
```

dptb/tests/test_postprocess_hr.py (1)
1-16: Remove commented-out mock/patch code.

Lines 4, 6–7, and 14–15 contain commented-out imports and decorators that add noise. If mocking is planned for the future, track it in an issue rather than leaving dead comments in the test file.

dptb/utils/feast_wrapper.py (4)
dptb/utils/feast_wrapper.py (4)
12-35: Duplicated_find_mkl_rt— extract into a shared module.This function is identical to the one in
dptb/utils/pardiso_wrapper.py(lines 20–50). Having two copies means bug fixes or search-path changes must be applied in both places. Consider moving it to a shared utility (e.g.,dptb/utils/mkl_utils.py) and importing from there.
174-193: Add `stacklevel=2` to `warnings.warn`.

Without `stacklevel=2`, the warning will point to line 178 inside `_prepare_matrix` rather than to the caller's line, making it harder to diagnose. This was also flagged by static analysis (B028).

Proposed fix

```diff
         if mat.dtype != dtype:
-            warnings.warn(f"Converting {name} to {dtype} for FEAST")
+            warnings.warn(f"Converting {name} to {dtype} for FEAST", stacklevel=2)
             mat = mat.astype(dtype)
```
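The effect of `stacklevel=2` can be observed with the stdlib warning machinery alone; this is a sketch, not the project's `_prepare_matrix`:

```python
import warnings

def convert(name, with_stacklevel):
    if with_stacklevel:
        warnings.warn(f"Converting {name}", stacklevel=2)  # reported at caller
    else:
        warnings.warn(f"Converting {name}")                # reported at this line

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    convert("H", with_stacklevel=False)  # warning points inside convert()
    convert("H", with_stacklevel=True)   # warning points at this call site

# The two recorded warnings carry different line numbers: the second one
# points at the caller, which is what makes diagnosis easier.
print(caught[0].lineno != caught[1].lineno)  # -> True
```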
269-348: Replace `print()` with `warnings.warn` or `logging` for retry messages.

Line 346 uses `print()` for the info=3 retry message. In a library/solver context, `print()` is uncontrollable by callers; `warnings.warn` or `logging.info` would be more appropriate. Also, the loop variable `attempt` (line 270) is unused — rename to `_` per convention.

Proposed fix

```diff
-    for attempt in range(max_refinement + 1):
+    for _attempt in range(max_refinement + 1):
```

```diff
-                print(f"FEAST info=3 (Subspace too small/bad estimate). MKL suggested {m0_c.value}. Increasing m0 from {m0} to {new_m0} and retrying.")
+                warnings.warn(
+                    f"FEAST info=3: subspace too small. Increasing m0 from {m0} to {new_m0} and retrying.",
+                    stacklevel=2,
+                )
```
301-327: Move invariant M preparations outside the retry loop.

`m_data`, `ib`, and `jb` (lines 303–305) are derived from `M_triu`, which doesn't change across retry iterations. Computing them once before the loop avoids redundant work.

Proposed fix sketch

```diff
+    if M is not None:
+        m_data = M_triu.data
+        ib = M_triu.indptr.astype(np.int32) + 1
+        jb = M_triu.indices.astype(np.int32) + 1
+
     # Retry loop for m0 refinement
-    for attempt in range(max_refinement + 1):
+    for _attempt in range(max_refinement + 1):
         ...
-        else:
-            m_data = M_triu.data
-            ib = M_triu.indptr.astype(np.int32) + 1
-            jb = M_triu.indices.astype(np.int32) + 1
         fn_gen(
```

dptb/utils/pardiso_wrapper.py (1)
220-223: Consider replacing `sha1` with `sha256` to silence security linter (S324).

While this is used for matrix fingerprinting (not a security-sensitive context), swapping to `sha256` is a trivial change that eliminates the linter warnings with no performance impact at this scale.

Proposed fix

```diff
     def _hash_csr_matrix(self, matrix):
-        return (hashlib.sha1(matrix.indices).hexdigest() +
-                hashlib.sha1(matrix.indptr).hexdigest() +
-                hashlib.sha1(matrix.data).hexdigest())
+        return (hashlib.sha256(matrix.indices).hexdigest() +
+                hashlib.sha256(matrix.indptr).hexdigest() +
+                hashlib.sha256(matrix.data).hexdigest())
```

dptb/nn/energy.py (2)
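Returning to the `pardiso_wrapper.py` sha256 suggestion above: the same three-buffer fingerprint scheme can be checked with plain byte buffers standing in for the CSR arrays (sketch, not the real wrapper):

```python
import hashlib

def fingerprint(indices: bytes, indptr: bytes, data: bytes) -> str:
    # Each sha256 hexdigest is 64 characters, so the concatenated
    # fingerprint of the three buffers is 192 characters long.
    return (hashlib.sha256(indices).hexdigest() +
            hashlib.sha256(indptr).hexdigest() +
            hashlib.sha256(data).hexdigest())

fp = fingerprint(b"\x00\x01", b"\x00\x02", b"\x01\x02\x03")
print(len(fp))  # -> 192
# Deterministic: identical buffers always yield the same fingerprint.
print(fp == fingerprint(b"\x00\x01", b"\x00\x02", b"\x01\x02\x03"))  # -> True
```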
160-195: Closure captures loop variables `solver` and `A` (B023) — bind explicitly.

While this currently works because `matvec` is consumed synchronously within the same iteration, it's fragile. Any refactor that defers `eigsh` execution (e.g., async, batching) would silently use the wrong solver/matrix. Bind via default arguments to make the intent explicit.

Also, the broad `except Exception` on line 191 could mask real bugs (e.g., memory errors). Consider catching `ArpackNoConvergence` specifically.

Proposed fix for closure binding

```diff
-            def matvec(b):
-                return solver.solve(A, b)
+            def matvec(b, _solver=solver, _A=A):
+                return _solver.solve(_A, b)
```
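The late-binding pitfall behind the `matvec` fix above is reproducible with plain lambdas:

```python
# Late binding: every closure sees the loop variable's final value.
late = [lambda: i for i in range(3)]
print([f() for f in late])   # -> [2, 2, 2]

# Default-argument binding freezes the value at definition time, which is
# what `def matvec(b, _solver=solver, _A=A)` achieves per loop iteration.
bound = [lambda _i=i: _i for i in range(3)]
print([f() for f in bound])  # -> [0, 1, 2]
```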
227-232: Chain exceptions with `raise ... from e` for proper tracebacks.

Lines 230 and 232 re-raise without chaining, which suppresses the original traceback and makes debugging harder.

Proposed fix

```diff
         try:
             self.solver = FeastSolver()
         except ImportError as e:
-            raise ImportError(f"FEAST solver not available: {e}")
+            raise ImportError(f"FEAST solver not available: {e}") from e
         except Exception as e:
-            raise RuntimeError(f"Failed to initialize FeastSolver: {e}")
+            raise RuntimeError(f"Failed to initialize FeastSolver: {e}") from e
```
dptb/tests/test_pardisoeig.py

```python
from dptb.postprocess.bandstructure.band import Band
from dptb.utils.tools import j_loader
from dptb.nn.build import build_model
from dptb.data import build_dataset, AtomicData, AtomicDataDict
from dptb.nn.hr2hR import Hr2HR
from dptb.nn.energy import PardisoEig
import torch
import pytest
```
Missing MKL availability guard — test will fail with `NameError` if MKL is not installed.

`PardisoEig` requires `PyPardisoSolver`, which needs MKL. Without a skip condition, this test will crash in environments without MKL (e.g., CI runners). Add a skip marker consistent with the FEAST tests.

Proposed fix

```diff
 from dptb.nn.energy import PardisoEig
 import torch
 import pytest
+from dptb.utils.pardiso_wrapper import _MKL_RT_HANDLE

+@pytest.mark.skipif(_MKL_RT_HANDLE is None, reason="MKL runtime not found")
 def test_hr2hr(root_directory):
```

Also applies to: 15-15
🤖 Prompt for AI Agents
In `@dptb/tests/test_pardisoeig.py` around lines 1 - 8, The test imports
PardisoEig which depends on PyPardisoSolver/MKL; add a guard to skip the test
when PyPardisoSolver (or MKL) is not available similar to the FEAST tests —
e.g., at the top of dptb/tests/test_pardisoeig.py call
pytest.importorskip("pypardiso") or attempt to import PyPardisoSolver and call
pytest.skip() on ImportError so PardisoEig-related tests are not executed in
MKL-less environments; ensure the skip is placed before any usage or import of
PardisoEig so the NameError cannot occur.
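A stdlib variant of the availability probe described in the prompt above, using `importlib` to check for the module before anything imports `PardisoEig`; this sketches the idea behind `pytest.importorskip("pypardiso")` without depending on pytest:

```python
import importlib.util

def mkl_stack_available() -> bool:
    # find_spec returns None when the module cannot be located, without
    # actually importing it (so a broken install cannot raise here).
    return importlib.util.find_spec("pypardiso") is not None

available = mkl_stack_available()
if not available:
    print("would skip PardisoEig tests: pypardiso not installed")
else:
    print("pypardiso present; PardisoEig tests can run")
```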
```python
    sr2sr = Hr2HR(
        idp=model.idp,
        edge_field=AtomicDataDict.EDGE_FEATURES_KEY,
        node_field=AtomicDataDict.NODE_FEATURES_KEY,
        overlap=False,
        dtype=torch.float32,
        device=torch.device("cpu")
    )

    adata = model(adata)
    image_h = hr2hr(adata)
    image_s = sr2sr(adata)
```
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find Hr2HR class definition and understand overlap parameter usage
find . -type f -name "*.py" | xargs rg -l "class Hr2HR" | head -5Repository: deepmodeling/DeePTB
Length of output: 81
🏁 Script executed:
#!/bin/bash
# Get Hr2HR class implementation and overlap parameter handling
rg -A 30 "class Hr2HR" --type=py | head -60Repository: deepmodeling/DeePTB
Length of output: 1802
🏁 Script executed:
#!/bin/bash
# Search for overlap parameter usage in Hr2HR to understand semantics
rg -n "overlap" --type=py -C 3 | grep -E "Hr2HR|overlap" | head -40Repository: deepmodeling/DeePTB
Length of output: 3172
🏁 Script executed:
#!/bin/bash
# Get more of Hr2HR class to find __call__ or forward method
rg -A 100 "class Hr2HR" --type=py dptb/nn/hr2hR.py | head -150Repository: deepmodeling/DeePTB
Length of output: 4831
🏁 Script executed:
#!/bin/bash
# Also check the test file to see how sr2sr and hr2hr are used
cat -n dptb/tests/test_pardisoeig.py | head -70Repository: deepmodeling/DeePTB
Length of output: 2500
🏁 Script executed:
#!/bin/bash
# Verify that PardisoEig.solve expects both H and S matrices
rg -A 10 "class PardisoEig" --type=py dptb/nn/energy.py | head -30Repository: deepmodeling/DeePTB
Length of output: 468
🏁 Script executed:
#!/bin/bash
# Check if there are any MKL-related imports or dependencies in the energy module
rg -n "mkl|MKL|pardiso" --type=py dptb/nn/energy.pyRepository: deepmodeling/DeePTB
Length of output: 211
🏁 Script executed:
#!/bin/bash
# Look for other test patterns with MKL skip decorators
rg -B 3 "@pytest.mark.skipif" --type=py dptb/tests/ | head -30Repository: deepmodeling/DeePTB
Length of output: 615
🏁 Script executed:
#!/bin/bash
# Check if there are separate overlap feature fields expected
rg -n "OVERLAP|overlap" --type=py dptb/data/ | grep -E "KEY|FEATURES" | head -20Repository: deepmodeling/DeePTB
Length of output: 2245
🏁 Script executed:
#!/bin/bash
# Search for how overlap matrices are handled - check data dictionary keys
rg "OVERLAP.*KEY|overlap.*features" --type=py -C 2 | head -40Repository: deepmodeling/DeePTB
Length of output: 2850
🏁 Script executed:
#!/bin/bash
# Check PardisoEig.solve signature to understand parameter order
rg -A 15 "def solve" --type=py dptb/nn/energy.py | head -40Repository: deepmodeling/DeePTB
Length of output: 1531
🏁 Script executed:
#!/bin/bash
# Check if the model generates overlap features automatically
rg -n "EDGE_OVERLAP|NODE_OVERLAP" --type=py dptb/nn/build.py | head -10Repository: deepmodeling/DeePTB
Length of output: 45
🏁 Script executed:
#!/bin/bash
# Look for examples of how to properly create both H and S matrices
rg -B 5 -A 5 "sr2sr\|image_s" --type=py dptb/tests/ | head -50Repository: deepmodeling/DeePTB
Length of output: 45
🏁 Script executed:
#!/bin/bash
# Check if PardisoEig requires MKL like FeastSolver does
rg -n "PyPardisoSolver\|pardiso_wrapper" --type=py dptb/nn/energy.pyRepository: deepmodeling/DeePTB
Length of output: 45
🏁 Script executed:
#!/bin/bash
# Check if the model generates overlap features and what the actual structure looks like
cat -n dptb/tests/test_pardisoeig.py | sed -n '40,65p'Repository: deepmodeling/DeePTB
Length of output: 883
🏁 Script executed:
#!/bin/bash
# Look at the model structure to understand if it generates both H and S
rg -n "overlap_param\|NODE_OVERLAP\|EDGE_OVERLAP" --type=py dptb/nn/ | head -20Repository: deepmodeling/DeePTB
Length of output: 45
🏁 Script executed:
#!/bin/bash
# Check what happens when we call model(adata) - does it add overlap features?
rg -B 2 -A 8 "def forward" --type=py dptb/nn/build.py | head -40Repository: deepmodeling/DeePTB
Length of output: 45
🏁 Script executed:
#!/bin/bash
# Search for what build_model returns and if it has overlap handling
rg -A 30 "def build_model" --type=py dptb/nn/build.py | head -50Repository: deepmodeling/DeePTB
Length of output: 1316
🏁 Script executed:
#!/bin/bash
# Check what fields the model outputs when called
rg -n "class.*Model\|def forward" --type=py dptb/nn/network.py | head -20Repository: deepmodeling/DeePTB
Length of output: 121
🏁 Script executed:
#!/bin/bash
# Look at how overlap=True should be used - find examples
rg -A 3 "overlap=True" --type=py dptb/ | head -30
Length of output: 1952
🏁 Script executed:
#!/bin/bash
# Confirm the field names needed for overlap by looking at the calculator example
rg -B 2 -A 5 "edge_field=AtomicDataDict.EDGE_OVERLAP_KEY" --type=py dptb/postprocess/unified/calculator.py
Length of output: 766
🏁 Script executed:
#!/bin/bash
# Check if PardisoEig needs MKL skip like FeastSolver does
rg -B 5 -A 15 "class PardisoEig" --type=py dptb/nn/energy.py
Length of output: 651
🏁 Script executed:
#!/bin/bash
# Check if PardisoEig imports or uses MKL/Pardiso solver
rg -n "import\|from" --type=py dptb/nn/energy.py | head -30
Length of output: 45
🏁 Script executed:
#!/bin/bash
# Check what happens with s_container if it's built the same way as h_container
rg -B 5 -A 15 "def solve" --type=py dptb/nn/energy.py | grep -A 15 "def solve"
Length of output: 1531
🏁 Script executed:
#!/bin/bash
# Get the full PardisoEig implementation to see if it uses Pardiso/MKL
rg -A 100 "class PardisoEig:" --type=py dptb/nn/energy.py | head -150
Length of output: 4078
🏁 Script executed:
#!/bin/bash
# Check for any MKL dependencies or imports in the energy module
cat dptb/nn/energy.py | head -40
Length of output: 1677
🏁 Script executed:
#!/bin/bash
# Check if PyPardisoSolver requires MKL or has fallbacks
rg -A 20 "class PyPardisoSolver" --type=py dptb/utils/pardiso_wrapper.py | head -30
Length of output: 1090
🏁 Script executed:
#!/bin/bash
# Check how FeastSolver handles the MKL runtime requirement
rg -B 5 "_MKL_RT" --type=py dptb/utils/feast_wrapper.py | head -20
Length of output: 598
🏁 Script executed:
#!/bin/bash
# Check how the model is trained - does it have overlap features?
rg -n "overlap_param\|has_overlap" --type=py dptb/tests/test_pardisoeig.py
Length of output: 45
🏁 Script executed:
#!/bin/bash
# Check what the test model checkpoint contains
ls -lh dptb/tests/data/e3_band/ref_model/
Length of output: 139
🏁 Script executed:
#!/bin/bash
# Verify that the test model was NOT trained with overlap
rg -C 3 "nnenv.ep1474" --type=py dptb/tests/
Length of output: 1952
🏁 Script executed:
#!/bin/bash
# Check if the comment in test says it's testing with an overlap model
head -20 dptb/tests/test_pardisoeig.py
Length of output: 734
🏁 Script executed:
#!/bin/bash
# Verify that model outputs both EDGE_FEATURES and EDGE_OVERLAP when overlap model exists
rg -A 30 "build_dataset.from_model" --type=py dptb/tests/ | head -50
Length of output: 2706
🏁 Script executed:
#!/bin/bash
# Check if the model was trained with overlap capability
rg -n "overlap" --type=py dptb/tests/test_postprocess_hr.py | head -10
Length of output: 138
sr2sr is constructed with overlap=False — should be overlap=True with overlap field keys.
sr2sr is created with identical parameters to hr2hr (including overlap=False), but is then used as the overlap container (image_s) passed to PardisoEig.solve for a generalized eigenvalue problem. With overlap=False, Hr2HR builds Hamiltonian blocks, not overlap blocks, so image_s becomes a second copy of the Hamiltonian rather than the S matrix. This produces physically incorrect eigenvalues.
To fix, sr2sr needs:
- overlap=True
- edge_field=AtomicDataDict.EDGE_OVERLAP_KEY
- node_field=AtomicDataDict.NODE_OVERLAP_KEY
Additionally, the test lacks @pytest.mark.skipif(_MKL_RT is None, reason="MKL runtime not found") since PardisoEig depends on MKL's PARDISO solver (see line 59 and dptb/nn/energy.py imports). Also, line 50's unused variable h should be removed.
🤖 Prompt for AI Agents
In `@dptb/tests/test_pardisoeig.py` around lines 37 - 48, The test constructs
sr2sr incorrectly: change the Hr2HR instantiation named sr2sr to use
overlap=True and set edge_field=AtomicDataDict.EDGE_OVERLAP_KEY and
node_field=AtomicDataDict.NODE_OVERLAP_KEY so sr2sr builds the S (overlap)
blocks rather than a second Hamiltonian; also add the pytest skip marker
`@pytest.mark.skipif`(_MKL_RT is None, reason="MKL runtime not found") to the test
function (PardisoEig requires MKL/PARDISO) and remove the unused variable h on
line 50 to clean up the test.
image_h = hr2hr(adata)
image_s = sr2sr(adata)

h = image_h.sample_k([0,0,0], symm=True).to_scipy(format="csr")
Remove unused variable h.
h is assigned on line 50 but never used (F841). If this was a debug check, either remove it or fold it into an assertion.
Proposed fix
- h = image_h.sample_k([0,0,0], symm=True).to_scipy(format="csr")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🧰 Tools
🪛 Ruff (0.15.0)
[error] 50-50: Local variable h is assigned to but never used
Remove assignment to unused variable h
(F841)
🤖 Prompt for AI Agents
In `@dptb/tests/test_pardisoeig.py` at line 50, Remove the unused variable h by
either deleting the assignment or turning it into an assertion; specifically, in
the test where h is set via image_h.sample_k([0,0,0],
symm=True).to_scipy(format="csr"), either remove that line entirely if it was a
leftover debug, or replace it with an assertion that validates the returned
object (e.g., check type, shape, non-emptiness, or expected values) so the call
is exercised; locate the assignment to h in the test_pardisoeig test and apply
the chosen change.
def test_get_hR_real_model(self):
    # Initialize System with real model
    system = TBSystem(
        data=self.struc_path,
        calculator=self.model_path,
        device='cpu'
    )

    # Call get_hR
    h, s = system.get_hR()

    sK = s.sample_k([0,0,0], sym=True)
    print(sK.to_dense())

    # Verification
    # The model in test_to_pardiso might or might not have overlap.
    # Based on name "nnsk.iter_ovp0.000.pth", it likely does? Or maybe not.
    # Let's check if s is None or not based on model config.
    # But at least the code ran and returned what Hr2HR produced.
Test has no assertions — it won't catch regressions.
This test only prints output without asserting any conditions. At minimum, assert that h is not None and validate shape/type expectations. A test without assertions silently passes even when the code is broken.
Suggested assertions
# Call get_hR
h, s = system.get_hR()
- sK = s.sample_k([0,0,0], sym=True)
- print(sK.to_dense())
-
- # Verification
- # The model in test_to_pardiso might or might not have overlap.
- # Based on name "nnsk.iter_ovp0.000.pth", it likely does? Or maybe not.
- # Let's check if s is None or not based on model config.
- # But at least the code ran and returned what Hr2HR produced.
-
-
+ self.assertIsNotNone(h, "H container should not be None")
+ # s can be None if the model has no overlap
+ if s is not None:
+ sK = s.sample_k([0, 0, 0], sym=True)
+        self.assertIsNotNone(sK)
🧰 Tools
🪛 Ruff (0.15.0)
[warning] 31-31: Unpacked variable h is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
In `@dptb/tests/test_postprocess_hr.py` around lines 22 - 43, The test
test_get_hR_real_model currently only prints results; add assertions to catch
regressions: assert that h is not None and has a non-empty shape (e.g.,
hasattr(h, "shape") and h.shape[0] > 0) or appropriate length if it's a list,
assert that s is not None and exposes sample_k (hasattr(s, "sample_k")), call sK
= s.sample_k([0,0,0], sym=True) and assert sK is not None and that sK.to_dense()
has non-zero size (e.g., hasattr(..., "shape") and shape sum > 0) to validate
types/contents returned by TBSystem.get_hR.
h, s = system.get_hR()

sK = s.sample_k([0,0,0], sym=True)
print(sK.to_dense())
s can be None, causing AttributeError on s.sample_k(...).
If the model doesn't have overlap, get_hR() returns (h_container, None). Line 33 will then crash with AttributeError: 'NoneType' object has no attribute 'sample_k'. The comment on line 37 even acknowledges this uncertainty.
Proposed fix
# Call get_hR
h, s = system.get_hR()
- sK = s.sample_k([0,0,0], sym=True)
- print(sK.to_dense())
+ if s is not None:
+ sK = s.sample_k([0, 0, 0], sym=True)
+        print(sK.to_dense())
🧰 Tools
🪛 Ruff (0.15.0)
[warning] 31-31: Unpacked variable h is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
In `@dptb/tests/test_postprocess_hr.py` around lines 31 - 34, The test calls
system.get_hR() and then unconditionally calls s.sample_k(...), but get_hR can
return (h_container, None) when there is no overlap; guard against s being None
before calling sample_k by checking "s is not None" (or assert/skip the sample
step) and only call s.sample_k when present; update the test in
dptb/tests/test_postprocess_hr.py around the get_hR usage to handle the None
case (e.g., conditional branch, early return, or informative assertion)
referencing get_hR, the returned variable s, and the method sample_k.
def _is_already_factorized(self, A):
    if type(self.factorized_A) == str:
        return self._hash_csr_matrix(A) == self.factorized_A
    else:
        return self._csr_matrix_equal(A, self.factorized_A)
Use isinstance() for type check instead of ==.
type(self.factorized_A) == str fails for str subclasses and violates PEP 8 / E721. Use isinstance instead.
Proposed fix
def _is_already_factorized(self, A):
- if type(self.factorized_A) == str:
+ if isinstance(self.factorized_A, str):
return self._hash_csr_matrix(A) == self.factorized_A
else:
        return self._csr_matrix_equal(A, self.factorized_A)
🧰 Tools
🪛 Ruff (0.15.0)
[error] 210-210: Use is and is not for type comparisons, or isinstance() for isinstance checks
(E721)
🤖 Prompt for AI Agents
In `@dptb/utils/pardiso_wrapper.py` around lines 209 - 213, In
_is_already_factorized, replace the direct type comparison
"type(self.factorized_A) == str" with an isinstance check to handle str
subclasses properly; i.e. use isinstance(self.factorized_A, str) in the
conditional so the branch that returns self._hash_csr_matrix(A) ==
self.factorized_A executes for any string-like factorized_A while leaving the
else branch (self._csr_matrix_equal) unchanged.
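For readers unfamiliar with the pitfall, it reproduces in a few lines; the Label class below is hypothetical, standing in for any str subclass that could end up stored in factorized_A:

```python
class Label(str):
    """Hypothetical str subclass, standing in for a tagged hash string."""
    pass

tag = Label("a1b2c3")

# Exact-type comparison rejects subclasses...
print(type(tag) == str)      # False

# ...while isinstance accepts them, which is what the hash-comparison
# branch of _is_already_factorized actually wants.
print(isinstance(tag, str))  # True
```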
Actionable comments posted: 9
🤖 Fix all issues with AI agents
In `@dptb/entrypoints/train.py`:
- Around line 106-122: The snippet has inconsistent indentation causing
IndentationError; align the try/except and inner statements to the surrounding
if/for blocks: ensure the "try:" block, the calls to parse_orbital_file(value),
the with open(...) block populating orbital_files_content[value], the
jdata["common_options"]["basis"][elem] = parsed_basis assignment, and the
log.info(...) line are indented uniformly at the same level inside the for/if
scope, and ensure the except: and its raise ValueError(...) match the try's
indentation; finally make sure the final "if orbital_files_content:" and its
body jdata["common_options"]["orbital_files_content"] = orbital_files_content
are indented one level inside the outer function/block. This will fix the
off-by-one indentation issues around parse_orbital_file, orbital_files_content,
jdata["common_options"]["basis"], and log.info.
In `@dptb/nn/energy.py`:
- Around line 193-201: The broad except around the eigsh call masks real errors;
change it to catch scipy.sparse.linalg.ArpackNoConvergence (or ArpackError)
specifically when retrying, and ensure the original exception is logged before
retrying so non-ARPACK errors are not swallowed. Locate the eigsh invocation
(vars: hk, M, Op, self.neig, self.sigma, mode, ncv) and replace the bare "except
Exception:" with a targeted "except ArpackNoConvergence/ArpackError as err:"
(importing the symbol), call the logger to record err, then perform the
larger-ncv retry; leave other exceptions to propagate.
- Around line 237-242: The except blocks around FeastSolver initialization
should chain the original exceptions so tracebacks are preserved; modify the two
re-raises in the try/except that set self.solver = FeastSolver() to use "raise
ImportError(... ) from e" for the ImportError branch and "raise RuntimeError(...
) from e" for the generic Exception branch, keeping the same messages but adding
"from e" to both raise statements referencing FeastSolver and self.solver
initialization.
- Around line 166-201: The code is double-shifting the matrix: hk is mutated to
(H − σS) or (H − σ) before being passed as A to eigsh while eigsh is also given
sigma=self.sigma and OPinv built from the shifted factorization, causing (H −
2σS) mismatch. Fix by preserving an unshifted copy (e.g., hk_orig = hk) to pass
as A to eigsh, then create a separate shifted matrix only for Pardiso
factorization (use solver.factorize on the shifted matrix), build Op from that
factorization, and pass M as a scipy CSR (convert sk via
sk.to_scipy(format="csr") when s_container is present) so eigsh receives the
original H (hk_orig), M as CSR, OPinv from the factorized shifted matrix, and
sigma=self.sigma.
In `@dptb/nn/hr2hR.py`:
- Line 69: The ndtype is hardcoded to np.float64; change it to derive from the
object's torch dtype by using the imported converter tdtype2ndtype so numpy
arrays match self.dtype (e.g., replace the line setting ndtype = np.float64 with
ndtype = tdtype2ndtype(self.dtype)); ensure this change is made in the same
scope where ndtype is used (in hr2hR.py near the ndtype initialization) so numpy
arrays respect the instance's precision.
- Around line 73-78: The code sets ndtype = np.complex128 unconditionally in the
soc branch which can mismatch self.ctype and waste memory; change the assignment
to derive ndtype from self.ctype by calling tdtype2ndtype(self.ctype) so the
NumPy dtype matches the torch tensor dtype used for orbpair_soc, soc_upup_block
and soc_updn_block (these variables and the soc branch that reads
AtomicDataDict.NODE_SOC_KEY should be updated accordingly).
- Around line 144-159: iatom and jatom are taken as 0-d tensors from
data[AtomicDataDict.EDGE_INDEX_KEY] and then used for indexing and passed to
image_container.add_block, which is fragile and inconsistent; convert those
tensor scalars to Python ints using .item() (e.g., set iatom =
data[AtomicDataDict.EDGE_INDEX_KEY][0][i].item() and jatom =
data[AtomicDataDict.EDGE_INDEX_KEY][1][i].item()) before using them in mask
lookups, array slicing, and the call to image_container.add_block (also ensure
any edge_shift_vec indexing uses the int values). This keeps usage consistent
with bondwise_hopping loop index i and matches patterns like
dptb/postprocess/ovp2c.py.
- Around line 118-123: The call to AtomicData_vbcsr.from_distributed is invalid
(method doesn't exist) and will raise AttributeError; update the code that
constructs adata (the AtomicData_vbcsr usage in this block) to use a supported
constructor (e.g., AtomicData_vbcsr.from_points or AtomicData_vbcsr.from_ase) or
add the missing wrapper function if a custom from_distributed was intended, and
adjust the arguments accordingly so data[AtomicDataDict.ATOM_TYPE_KEY],
data[AtomicDataDict.EDGE_INDEX_KEY], data[AtomicDataDict.EDGE_CELL_SHIFT_KEY],
data[AtomicDataDict.CELL_KEY], data[AtomicDataDict.POSITIONS_KEY], natom, nedge
and self.idp.atom_norb map to the chosen constructor’s signature; also ensure
vbcsr is declared as a dependency in pyproject.toml and/or guard the import so
runtime errors are avoided when vbcsr is absent (symbols to locate:
AtomicData_vbcsr, AtomicDataDict, ImageContainer, adata).
In `@dptb/postprocess/ovp2c.py`:
- Around line 91-96: AtomicData_vbcsr.from_distributed is being fed tensors that
may remain on CUDA because data[_keys.CELL_KEY] and data[_keys.POSITIONS_KEY]
are not moved to CPU like edge_index, atom_types, edge_vectors, and cell_shifts;
update the call site so you explicitly call .cpu() (or .to('cpu')) on
data[_keys.CELL_KEY] and data[_keys.POSITIONS_KEY] before passing them into
AtomicData_vbcsr.from_distributed (and ensure any subsequent use, e.g. when
constructing ImageContainer(adata, ...), expects CPU tensors).
🧹 Nitpick comments (7)
dptb/entrypoints/train.py (3)
99-106: Method-compatibility check is redundantly re-evaluated on every basis entry.
Lines 102–106 rebuild model_opts/pred_opts and re-check the method for each element whose basis value is a file path. Hoist this check above the for loop so it runs once.
Proposed refactor (sketch)

  if jdata.get("common_options") and jdata["common_options"].get("basis"):
+     model_opts = jdata.get("model_options", {})
+     pred_opts = model_opts.get("prediction", {})
+     if pred_opts.get("method", "e3tb") != "e3tb":
+         raise ValueError(
+             f"Orbital files in 'basis' are only supported for the 'e3tb' method. "
+             f"Found method: {pred_opts.get('method')}"
+         )
      orbital_files_content = {}
      for elem, value in jdata["common_options"]["basis"].items():
          if isinstance(value, str) and os.path.isfile(value):
-             model_opts = jdata.get("model_options", {})
-             pred_opts = model_opts.get("prediction", {})
-             if pred_opts.get("method", "e3tb") != "e3tb":
-                 raise ValueError(...)
              try:

Note: this also means the error fires as soon as any basis entry is a file path, rather than only when the loop happens to reach one, which is the desired semantics.
118-119: Use raise ... from e and narrow the exception type.
except Exception is overly broad (catches KeyboardInterrupt-adjacent things on some runtimes) and the re-raise discards the original traceback chain.
Proposed fix

- except Exception as e:
-     raise ValueError(f"Failed to parse orbital file {value} for element {elem}: {e}")
+ except (ValueError, FileNotFoundError, OSError) as e:
+     raise ValueError(
+         f"Failed to parse orbital file {value} for element {elem}: {e}"
+     ) from e
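The value of from e is easy to show in isolation; this stand-alone sketch uses a hypothetical load_basis in place of the real parsing path:

```python
def load_basis(path):
    try:
        int("not-a-number")  # stand-in for a failing parse_orbital_file call
    except ValueError as e:
        # "from e" records the original error as __cause__, so the full
        # traceback chain survives the re-raise.
        raise ValueError(f"Failed to parse orbital file {path}: {e}") from e

try:
    load_basis("Si.orb")
except ValueError as err:
    print(isinstance(err.__cause__, ValueError))  # True
```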
111-114: File is read twice: once inside parse_orbital_file and again here.
parse_orbital_file(value) already opens and reads the file (see orbital_parser.py Line 25). Lines 113–114 re-open the same file to store its content. Consider having parse_orbital_file return both the basis string and the raw content, or accept the already-read content, to avoid the double I/O.
dptb/postprocess/ovp2c.py (1)
9-18: Prefer warnings.warn or logging over print() at import time.
Unconditional print() at import time pollutes stdout for every user who imports this module, even if they never call compute_overlap_image. Using warnings.warn(...) or a module-level logger gives callers control over message visibility.
Suggested change

+ import warnings
+
  try:
      from vbcsr import ImageContainer
      from vbcsr import AtomicData as AtomicData_vbcsr
  except ImportError:
-     print("VBCSR is not installed, therefore, the compute_overlap_image is not supported.")
+     warnings.warn("VBCSR is not installed; compute_overlap_image is not supported.", ImportWarning)
  try:
      import op2c
  except ImportError:
-     print("OP2C is not installed, therefore, the compute_overlap is not supported.")
+     warnings.warn("OP2C is not installed; compute_overlap is not supported.", ImportWarning)

dptb/nn/energy.py (1)
188-189: Closure captures loop variables solver and A by reference (B023).
While this works today because eigsh is invoked within the same iteration, it's a known Python footgun. Bind via default arguments to make the intent explicit and prevent future breakage if the code is refactored.
Proposed fix

- def matvec(b):
-     return solver.solve(A, b)
+ def matvec(b, _solver=solver, _A=A):
+     return _solver.solve(_A, b)
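The B023 footgun this comment refers to can be reproduced without any solver; a minimal sketch:

```python
# Late binding: every closure looks up i when called, after the loop has ended.
fns = [lambda: i for i in range(3)]
print([f() for f in fns])        # [2, 2, 2]

# Default arguments are evaluated at definition time, freezing each value,
# which is what the matvec(b, _solver=solver, _A=A) pattern relies on.
fns = [lambda i=i: i for i in range(3)]
print([f() for f in fns])        # [0, 1, 2]
```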
9-13: Chain the exception with raise ... from err.
The original traceback is lost, making debugging harder if vbcsr installation is broken (vs. missing).
Proposed fix

  try:
      from vbcsr import ImageContainer
      from vbcsr import AtomicData as AtomicData_vbcsr
- except ImportError:
-     raise ImportError("Please install vbcsr first: pip install vbcsr")
+ except ImportError as err:
+     raise ImportError("Please install vbcsr first: pip install vbcsr") from err
16-25: Minor signature hygiene issues flagged by static analysis.

- Line 18: basis can be None but isn't typed as Optional[...], violating PEP 484.
- Line 24: torch.device("cpu") is a function call in a default argument (B008). While immutable, it's idiomatic to use None and resolve inside the body.

Proposed fix

+ from typing import Optional
+
  class Hr2HR:
      def __init__(
          self,
-         basis: Dict[str, Union[str, list]]=None,
+         basis: Optional[Dict[str, Union[str, list]]]=None,
          idp: Union[OrbitalMapper, None]=None,
          edge_field: str = AtomicDataDict.EDGE_FEATURES_KEY,
          node_field: str = AtomicDataDict.NODE_FEATURES_KEY,
          overlap: bool = False,
          dtype: Union[str, torch.dtype] = torch.float32,
-         device: Union[str, torch.device] = torch.device("cpu")
+         device: Union[str, torch.device] = "cpu",
      ):
raise ValueError(f"Orbital files in 'basis' are only supported for the 'e3tb' method. Found method: {pred_opts.get('method')}")

# Also checking if we are in a mixed model which might be different,
# but usually orbital files for basis imply the main basis handling.
# transform logic
try:
    parsed_basis = parse_orbital_file(value)
    with open(value, 'r') as f:
        orbital_files_content[value] = f.read()

    jdata["common_options"]["basis"][elem] = parsed_basis
    log.info(f"Parsed orbital file for {elem}: {value} -> {parsed_basis}")
except Exception as e:
    raise ValueError(f"Failed to parse orbital file {value} for element {elem}: {e}")

if orbital_files_content:
    jdata["common_options"]["orbital_files_content"] = orbital_files_content
Inconsistent indentation will cause IndentationError at runtime.
Several lines have off-by-one (or more) indentation that will prevent this module from loading:
- Line 106: 21 spaces (expected 20 to be inside the if on Line 105).
- Lines 108–110: 17 spaces (expected 16, same level as the if/try).
- Line 122: 13 spaces (expected 12, inside the outer if but outside the for).
Proposed fix
- if pred_opts.get("method", "e3tb") != "e3tb":
- raise ValueError(f"Orbital files in 'basis' are only supported for the 'e3tb' method. Found method: {pred_opts.get('method')}")
-
- # Also checking if we are in a mixed model which might be different,
- # but usually orbital files for basis imply the main basis handling.
- # transform logic
+ if pred_opts.get("method", "e3tb") != "e3tb":
+ raise ValueError(f"Orbital files in 'basis' are only supported for the 'e3tb' method. Found method: {pred_opts.get('method')}")
+
+ # Also checking if we are in a mixed model which might be different,
+ # but usually orbital files for basis imply the main basis handling.
+     # transform logic
-             jdata["common_options"]["orbital_files_content"] = orbital_files_content
+         jdata["common_options"]["orbital_files_content"] = orbital_files_content
🧰 Tools
🪛 Ruff (0.15.0)
[warning] 106-106: Avoid specifying long messages outside the exception class
(TRY003)
[warning] 118-118: Do not catch blind exception: Exception
(BLE001)
[warning] 119-119: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
[warning] 119-119: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In `@dptb/entrypoints/train.py` around lines 106 - 122, The snippet has
inconsistent indentation causing IndentationError; align the try/except and
inner statements to the surrounding if/for blocks: ensure the "try:" block, the
calls to parse_orbital_file(value), the with open(...) block populating
orbital_files_content[value], the jdata["common_options"]["basis"][elem] =
parsed_basis assignment, and the log.info(...) line are indented uniformly at
the same level inside the for/if scope, and ensure the except: and its raise
ValueError(...) match the try's indentation; finally make sure the final "if
orbital_files_content:" and its body
jdata["common_options"]["orbital_files_content"] = orbital_files_content are
indented one level inside the outer function/block. This will fix the off-by-one
indentation issues around parse_orbital_file, orbital_files_content,
jdata["common_options"]["basis"], and log.info.
for k in kpoints:
    hk = h_container.sample_k(k, symm=True)

    if s_container is not None:
        sk = s_container.sample_k(k, symm=True)
        hk -= self.sigma * sk
        A = hk.to_scipy(format="csr")
        M = sk
    else:
        hk.shift(-self.sigma)
        A = hk.to_scipy(format="csr")
        M = None

    A.sort_indices()
    A.sum_duplicates()
    N = A.shape[0]

    # Try PARDISO first, fall back to scipy SuperLU if PARDISO fails
    # (MKL PARDISO has a known bug with certain block-structured patterns)
    solver = PyPardisoSolver(mtype=13)
    solver.factorize(A)

    def matvec(b):
        return solver.solve(A, b)

    Op = LinearOperator((N, N), matvec=matvec, dtype=A.dtype)

    try:
        # Use larger NCV to help convergence, especially for clustered eigenvalues
        ncv = max(2*self.neig + 1, 20)
        vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=self.sigma, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
    except Exception:
        # Retry with larger NCV if ARPACK fails (e.g. error 3: No shifts could be applied)
        # This often happens when eigenvalues are clustered near the shift
        ncv = max(5*self.neig, 50)
        vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=self.sigma, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
🧩 Analysis chain
🌐 Web query:
scipy sparse linalg eigsh sigma OPinv shift-invert mode documentation
💡 Result:
In scipy.sparse.linalg.eigsh, shift-invert is activated by passing a real sigma (the target shift). ARPACK then works with a transformed eigenproblem so it can efficiently return eigenvalues of A x = w M x near sigma. (docs.scipy.org)
What sigma means (shift-invert)
With sigma set, eigsh needs repeated solves of the linear system
[
(A - \sigma M),x = b \quad (\text{with } M = I \text{ if omitted})
]
SciPy will form/solve this internally (e.g., sparse LU for explicit sparse matrices), or you can provide that inverse action yourself via OPinv. (docs.scipy.org)
What OPinv must be
If you pass OPinv, it must represent the operator
[
OPinv \equiv (A - \sigma M)^{-1}
]
i.e., it must apply b ↦ (A - sigma*M)^{-1} b. SciPy notes this is the required form regardless of mode. (docs.scipy.org)
How mode changes what ARPACK selects (which)
When sigma is specified, SciPy emphasizes that which no longer refers directly to w, but to shifted Ritz values w':
- mode='normal': w' = 1/(w - \sigma)
- mode='cayley': w' = (w + \sigma)/(w - \sigma)
- mode='buckling': w' = w/(w - \sigma)
So (commonly) to get eigenvalues closest to sigma, you typically use sigma=... with which='LM' (largest magnitude in the transformed problem). (docs.scipy.org)
Minimal example (providing OPinv)
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as sla
A = sp.random(10000, 10000, density=1e-4, format="csc")
A = (A + A.T) * 0.5 # make symmetric
sigma = 0.1
# Factor (A - sigma*I) once, then expose inverse-apply as OPinv
lu = sla.splu(A - sigma * sp.eye(A.shape[0], format="csc"))
OPinv = sla.LinearOperator(A.shape, matvec=lu.solve)
w, v = sla.eigsh(A, k=6, sigma=sigma, which="LM", OPinv=OPinv, mode="normal")

(If you don't supply OPinv, SciPy will choose an internal approach to apply (A - \sigma M)^{-1} for you.) (docs.scipy.org)
Citations:
- https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.eigsh.html
🏁 Script executed:
# Examine the actual code file to understand the implementation
cat -n dptb/nn/energy.py | sed -n '150,210p'
Length of output: 2996
🏁 Script executed:
# Check the vbcsr type and its methods
fd -e py -x grep -l "class vbcsr\|def shift\|def sample_k" dptb/
Length of output: 7134
🏁 Script executed:
# Find vbcsr definition to understand type compatibility
fd -e py -x grep -l "class vbcsr" {} \;
Length of output: 45
🏁 Script executed:
# Check imports in energy.py to see what vbcsr is
head -50 dptb/nn/energy.py | grep -E "import|from"
Length of output: 464
🏁 Script executed:
# Check if vbcsr has scipy-compatible methods
rg "def to_scipy|def __matmul__|def __mul__" -A 2 dptb/
Length of output: 221
🏁 Script executed:
# Find where vbcsr comes from - check dptb imports and structure
rg "vbcsr" dptb/nn/energy.py -B 3 -A 1Repository: deepmodeling/DeePTB
Length of output: 315
🏁 Script executed:
# Search more broadly for vbcsr definition
rg "class vbcsr" dptb/Repository: deepmodeling/DeePTB
Length of output: 45
🏁 Script executed:
# Check the actual type used in h_container.sample_k
rg "def sample_k" dptb/ -A 3Repository: deepmodeling/DeePTB
Length of output: 322
🏁 Script executed:
# Search for vbcsr imports
rg "from.*vbcsr|import.*vbcsr" dptb/Repository: deepmodeling/DeePTB
Length of output: 448
🏁 Script executed:
# Check if vbcsr is external package
fd -e py -x grep -l "ImageContainer" {} \;Repository: deepmodeling/DeePTB
Length of output: 243
🏁 Script executed:
# Look for what happens with hk when passed to eigsh - check if it's converted
rg "eigsh.*A=" dptb/ -B 3 -A 3Repository: deepmodeling/DeePTB
Length of output: 1025
🌐 Web query:
vbcsr Python library scipy compatible sparse matrix
💡 Result:
vbcsr is SciPy-compatible in the practical sense (works with SciPy sparse workflows), but it is not a scipy.sparse.spmatrix subclass.
- Convert from SciPy sparse → VBCSR:
VBCSR.from_scipy(A_scipy)(the docs mention SciPy BSR or CSR inputs). [1] - Use in SciPy iterative solvers:
vbcsr.VBCSRimplements thescipy.sparse.linalg.LinearOperatorinterface, so you can pass aVBCSRmatrix directly to solvers likecg,gmres, etc. [1] - Install:
pip install vbcsr(latest release shown on PyPI is 0.2.0, Feb 10, 2026). [1]
Sources: [1]
🏁 Script executed:
# Check the vbcsr type more specifically - see if it has scipy interoperability
rg "\.shift\(" dptb/ -B 2 -A 2Repository: deepmodeling/DeePTB
Length of output: 302
🏁 Script executed:
# Check what methods vbcsr objects have
rg "to_scipy|__matmul__" dptb/ -B 1 -A 3Repository: deepmodeling/DeePTB
Length of output: 2583
eigsh receives already-shifted hk together with sigma — this causes a double-shift bug.
When eigsh is called with sigma=self.sigma and OPinv, it internally computes (A - sigma*M)^{-1}. However, on line 171 (branch with overlap), hk is already shifted to (H − σS), and on line 175 (branch without overlap), hk is shifted by −σ.
By line 196, when eigsh(A=hk, M=M, sigma=self.sigma, OPinv=Op, ...) is called:
- eigsh sees A = (H − σS) and sigma = σ
- It attempts to compute ((H − σS) − σS)^{-1} = (H − 2σS)^{-1}
- But OPinv was factorized as only (H − σS)^{-1}
- Result: wrong operator, wrong eigenvalues
Fix: Keep the original Hamiltonian for eigsh's A argument; use the shifted matrix only for the Pardiso factorization.
Additionally, on line 173, M = sk is a vbcsr object. While vbcsr implements the LinearOperator interface, convert it to scipy format with sk.to_scipy(format="csr") for consistency and to ensure full compatibility with eigsh.
Proposed fix (sketch)
for k in kpoints:
hk = h_container.sample_k(k, symm=True)
+ hk_scipy = hk.to_scipy(format="csr") # original H(k) for eigsh
if s_container is not None:
sk = s_container.sample_k(k, symm=True)
- hk -= self.sigma * sk
- A = hk.to_scipy(format="csr")
- M = sk
+ sk_scipy = sk.to_scipy(format="csr")
+ A_shifted = hk_scipy - self.sigma * sk_scipy
+ M = sk_scipy
else:
- hk.shift(-self.sigma)
- A = hk.to_scipy(format="csr")
+ from scipy.sparse import eye
+ A_shifted = hk_scipy - self.sigma * eye(hk_scipy.shape[0], format="csr")
M = None
- A.sort_indices()
- A.sum_duplicates()
- N = A.shape[0]
+ A_shifted.sort_indices()
+ A_shifted.sum_duplicates()
+ N = A_shifted.shape[0]
solver = PyPardisoSolver(mtype=13)
- solver.factorize(A)
+ solver.factorize(A_shifted)
def matvec(b):
- return solver.solve(A, b)
+ return solver.solve(A_shifted, b)
- Op = LinearOperator((N, N), matvec=matvec, dtype=A.dtype)
+ Op = LinearOperator((N, N), matvec=matvec, dtype=A_shifted.dtype)
try:
ncv = max(2*self.neig + 1, 20)
- vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=self.sigma, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
+ vals, vecs = eigsh(A=hk_scipy, M=M, k=self.neig, sigma=self.sigma, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
except Exception:
ncv = max(5*self.neig, 50)
- vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=self.sigma, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
+ vals, vecs = eigsh(A=hk_scipy, M=M, k=self.neig, sigma=self.sigma, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)

🧰 Tools
🪛 Ruff (0.15.0)
[warning] 189-189: Function definition does not bind loop variable solver
(B023)
[warning] 189-189: Function definition does not bind loop variable A
(B023)
[warning] 197-197: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In `@dptb/nn/energy.py` around lines 166 - 201, The code is double-shifting the
matrix: hk is mutated to (H − σS) or (H − σ) before being passed as A to eigsh
while eigsh is also given sigma=self.sigma and OPinv built from the shifted
factorization, causing (H − 2σS) mismatch. Fix by preserving an unshifted copy
(e.g., hk_orig = hk) to pass as A to eigsh, then create a separate shifted
matrix only for Pardiso factorization (use solver.factorize on the shifted
matrix), build Op from that factorization, and pass M as a scipy CSR (convert sk
via sk.to_scipy(format="csr") when s_container is present) so eigsh receives the
original H (hk_orig), M as CSR, OPinv from the factorized shifted matrix, and
sigma=self.sigma.
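As a sanity check on the pattern the fix sketch recommends (factor the shifted matrix for OPinv, but hand eigsh the unshifted matrix as A), here is a small self-contained example; the 60×60 matrix and its spectrum are invented purely for illustration:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, eigsh, splu

# Small symmetric matrix with a known spectrum (so the result can be checked)
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((60, 60)))
true_vals = np.linspace(-1.0, 1.0, 60)
A = sp.csc_matrix(Q @ np.diag(true_vals) @ Q.T)

sigma = 0.05
# Factor the SHIFTED matrix once and expose its inverse as OPinv ...
lu = splu((A - sigma * sp.eye(60, format="csc")).tocsc())
OPinv = LinearOperator(A.shape, matvec=lu.solve, dtype=A.dtype)

# ... but pass the UNSHIFTED A to eigsh together with sigma
vals, vecs = eigsh(A, k=4, sigma=sigma, which="LM", OPinv=OPinv)

# The four eigenvalues closest to sigma are recovered
print(np.sort(vals))
```

Shifting A a second time before the call is exactly the mismatch the comment above describes.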
try:
    # Use larger NCV to help convergence, especially for clustered eigenvalues
    ncv = max(2*self.neig + 1, 20)
    vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=self.sigma, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
except Exception:
    # Retry with larger NCV if ARPACK fails (e.g. error 3: No shifts could be applied)
    # This often happens when eigenvalues are clustered near the shift
    ncv = max(5*self.neig, 50)
    vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=self.sigma, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
Broad except Exception silently retries — real errors will be masked.
If eigsh raises for a reason other than ARPACK convergence (e.g., incompatible matrix types, memory error), the retry will likely also fail, but the original traceback is lost. Catch scipy.sparse.linalg.ArpackNoConvergence (or at least ArpackError) specifically, or at minimum log the first exception.
🧰 Tools
🪛 Ruff (0.15.0)
[warning] 197-197: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In `@dptb/nn/energy.py` around lines 193 - 201, The broad except around the eigsh
call masks real errors; change it to catch
scipy.sparse.linalg.ArpackNoConvergence (or ArpackError) specifically when
retrying, and ensure the original exception is logged before retrying so
non-ARPACK errors are not swallowed. Locate the eigsh invocation (vars: hk, M,
Op, self.neig, self.sigma, mode, ncv) and replace the bare "except Exception:"
with a targeted "except ArpackNoConvergence/ArpackError as err:" (importing the
symbol), call the logger to record err, then perform the larger-ncv retry; leave
other exceptions to propagate.
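A minimal sketch of the targeted retry, assuming a module-level logger; eigsh_with_retry and the ncv values are illustrative names, not DeePTB API:

```python
import logging
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh, ArpackNoConvergence

log = logging.getLogger(__name__)

def eigsh_with_retry(A, k, ncv_first, ncv_retry, **kw):
    """Retry eigsh with a larger Krylov basis only on ARPACK non-convergence."""
    try:
        return eigsh(A, k=k, ncv=ncv_first, **kw)
    except ArpackNoConvergence as err:
        # Log the original failure instead of swallowing it silently
        log.warning("ARPACK did not converge (ncv=%d): %s; retrying with ncv=%d",
                    ncv_first, err, ncv_retry)
        return eigsh(A, k=k, ncv=ncv_retry, **kw)

# Usage on an easy diagonal matrix (converges on the first attempt)
A = sp.diags(np.arange(1.0, 101.0)).tocsr()
vals, vecs = eigsh_with_retry(A, k=3, ncv_first=20, ncv_retry=50, which="LA")
print(np.sort(vals))
```

Any non-ARPACK failure (type errors, memory errors) now propagates with its original traceback.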
try:
    self.solver = FeastSolver()
except ImportError as e:
    raise ImportError(f"FEAST solver not available: {e}")
except Exception as e:
    raise RuntimeError(f"Failed to initialize FeastSolver: {e}")
🛠️ Refactor suggestion | 🟠 Major
Chain exceptions with from e to preserve the original traceback.
Per B904, re-raised exceptions inside except blocks should use raise ... from e so the caller sees the full chain.
Proposed fix
try:
self.solver = FeastSolver()
except ImportError as e:
- raise ImportError(f"FEAST solver not available: {e}")
+ raise ImportError(f"FEAST solver not available: {e}") from e
except Exception as e:
- raise RuntimeError(f"Failed to initialize FeastSolver: {e}")
+ raise RuntimeError(f"Failed to initialize FeastSolver: {e}") from e

🧰 Tools
🪛 Ruff (0.15.0)
[warning] 240-240: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
[warning] 240-240: Avoid specifying long messages outside the exception class
(TRY003)
[warning] 241-241: Do not catch blind exception: Exception
(BLE001)
[warning] 242-242: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
[warning] 242-242: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In `@dptb/nn/energy.py` around lines 237 - 242, The except blocks around
FeastSolver initialization should chain the original exceptions so tracebacks
are preserved; modify the two re-raises in the try/except that set self.solver =
FeastSolver() to use "raise ImportError(... ) from e" for the ImportError branch
and "raise RuntimeError(... ) from e" for the generic Exception branch, keeping
the same messages but adding "from e" to both raise statements referencing
FeastSolver and self.solver initialization.
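A tiny demonstration of why `from e` matters; the FeastSolver stub below is hypothetical and only simulates the import failure:

```python
# Hypothetical stand-in for FeastSolver that fails like a missing MKL backend
class FeastSolver:
    def __init__(self):
        raise ImportError("libfeast shared library not found")

try:
    try:
        solver = FeastSolver()
    except ImportError as e:
        # "from e" records the original ImportError as __cause__,
        # so the full chain appears in the traceback
        raise RuntimeError(f"Failed to initialize FeastSolver: {e}") from e
except RuntimeError as err:
    cause = err.__cause__
    print(type(cause).__name__)
```

Without `from e`, Python only shows the implicit "during handling of the above exception" context, and tooling that inspects `__cause__` sees nothing.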
edge_shift_vec = data[AtomicDataDict.EDGE_CELL_SHIFT_KEY]

soc = data.get(AtomicDataDict.NODE_SOC_SWITCH_KEY, False)
ndtype = np.float64
ndtype is hardcoded to np.float64, ignoring self.dtype.
When self.dtype is torch.float32, the numpy blocks will still be float64, causing a silent precision/memory mismatch. This should derive from self.dtype.
Proposed fix
- ndtype = np.float64
+ ndtype = tdtype2ndtype(self.dtype)

You already import tdtype2ndtype on line 7 but never use it — this is likely its intended use site.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- ndtype = np.float64
+ ndtype = tdtype2ndtype(self.dtype)
🤖 Prompt for AI Agents
In `@dptb/nn/hr2hR.py` at line 69, The ndtype is hardcoded to np.float64; change
it to derive from the object's torch dtype by using the imported converter
tdtype2ndtype so numpy arrays match self.dtype (e.g., replace the line setting
ndtype = np.float64 with ndtype = tdtype2ndtype(self.dtype)); ensure this change
is made in the same scope where ndtype is used (in hr2hR.py near the ndtype
initialization) so numpy arrays respect the instance's precision.
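A numpy-only sketch of the mismatch being flagged: a hardcoded float64 buffer silently upcasts float32 blocks, which is exactly what matching ndtype to self.dtype (via tdtype2ndtype) avoids:

```python
import numpy as np

# Blocks produced by a float32 model arrive as float32 arrays...
block32 = np.zeros((4, 4), dtype=np.float32)

# ...but a hardcoded float64 buffer doubles memory per element
buf64 = np.zeros((4, 4), dtype=np.float64)
buf64[:, :] = block32          # silent upcast on assignment

# Any arithmetic mixing the two also promotes to float64
mixed = block32 + buf64
print(mixed.dtype, buf64.itemsize, block32.itemsize)
```

The promotion never raises, so the precision/memory drift goes unnoticed until profiling.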
if soc:
    # this soc only support sktb.
    orbpair_soc = data[AtomicDataDict.NODE_SOC_KEY]
    soc_upup_block = torch.zeros((natom, self.idp.full_basis_norb, self.idp.full_basis_norb), dtype=self.ctype, device=self.device)
    soc_updn_block = torch.zeros((natom, self.idp.full_basis_norb, self.idp.full_basis_norb), dtype=self.ctype, device=self.device)
    ndtype = np.complex128
SOC ndtype is also hardcoded to np.complex128, ignoring self.ctype.
When soc is true, line 78 forces ndtype = np.complex128 regardless of self.dtype. For float32/complex64 workflows, this wastes memory and creates a type mismatch with the torch tensors. Use tdtype2ndtype(self.ctype) to stay consistent.
Proposed fix
if soc:
orbpair_soc = data[AtomicDataDict.NODE_SOC_KEY]
soc_upup_block = torch.zeros((natom, self.idp.full_basis_norb, self.idp.full_basis_norb), dtype=self.ctype, device=self.device)
soc_updn_block = torch.zeros((natom, self.idp.full_basis_norb, self.idp.full_basis_norb), dtype=self.ctype, device=self.device)
- ndtype = np.complex128
+ ndtype = tdtype2ndtype(self.ctype)

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if soc:
    # this soc only support sktb.
    orbpair_soc = data[AtomicDataDict.NODE_SOC_KEY]
    soc_upup_block = torch.zeros((natom, self.idp.full_basis_norb, self.idp.full_basis_norb), dtype=self.ctype, device=self.device)
    soc_updn_block = torch.zeros((natom, self.idp.full_basis_norb, self.idp.full_basis_norb), dtype=self.ctype, device=self.device)
-   ndtype = np.complex128
+   ndtype = tdtype2ndtype(self.ctype)
🤖 Prompt for AI Agents
In `@dptb/nn/hr2hR.py` around lines 73 - 78, The code sets ndtype = np.complex128
unconditionally in the soc branch which can mismatch self.ctype and waste
memory; change the assignment to derive ndtype from self.ctype by calling
tdtype2ndtype(self.ctype) so the NumPy dtype matches the torch tensor dtype used
for orbpair_soc, soc_upup_block and soc_updn_block (these variables and the soc
branch that reads AtomicDataDict.NODE_SOC_KEY should be updated accordingly).
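For the complex case the same pairing rules apply; a short numpy check (illustrative only):

```python
import numpy as np

# complex64 is the complex partner of float32; complex128 of float64
print(np.result_type(np.float32, np.complex64))    # complex64
print(np.result_type(np.float32, np.complex128))   # complex128 (silent widening)

# A hardcoded complex128 buffer doubles memory versus complex64
buf64 = np.zeros((8, 8), dtype=np.complex64)
buf128 = np.zeros((8, 8), dtype=np.complex128)
print(buf128.nbytes // buf64.nbytes)  # 2
```

Deriving ndtype from self.ctype keeps the numpy side on the narrow complex type whenever the torch side is.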
adata = AtomicData_vbcsr.from_distributed(
    natom, natom, 0, nedge, nedge,
    list(range(natom)), data[AtomicDataDict.ATOM_TYPE_KEY], data[AtomicDataDict.EDGE_INDEX_KEY].T, self.idp.atom_norb, data[AtomicDataDict.EDGE_CELL_SHIFT_KEY],
    data[AtomicDataDict.CELL_KEY], data[AtomicDataDict.POSITIONS_KEY]
)
image_container = ImageContainer(adata, ndtype)
🧩 Analysis chain
🏁 Script executed:
# First, find where AtomicData_vbcsr is imported from in the file
rg "AtomicData_vbcsr|from_distributed" dptb/nn/hr2hR.py -A 2 -B 2
Length of output: 569

🏁 Script executed:
# Search for imports in the file
head -50 dptb/nn/hr2hR.py
Length of output: 1901

🏁 Script executed:
# Check if AtomicData_vbcsr is defined elsewhere in the codebase
fd -e py -x rg -l "class AtomicData_vbcsr" {} \;
Length of output: 45

🏁 Script executed:
# Look for the vbcsr module or imports of it
rg "vbcsr|AtomicData" --type py | head -30
Length of output: 2922

🏁 Script executed:
# Try to inspect vbcsr.AtomicData.from_distributed method signature
pip install vbcsr 2>&1 | tail -5
python3 << 'EOF'
try:
    from vbcsr import AtomicData
    import inspect
    # Get the signature of from_distributed
    sig = inspect.signature(AtomicData.from_distributed)
    print("Signature of AtomicData.from_distributed:")
    print(f"  {sig}")
    # Get detailed parameter info
    print("\nParameters:")
    for i, (param_name, param) in enumerate(sig.parameters.items(), 1):
        print(f"  {i}. {param_name}: {param.annotation if param.annotation != inspect.Parameter.empty else 'no annotation'}")
    # Also try to get docstring
    print("\nDocstring:")
    print(AtomicData.from_distributed.__doc__)
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()
EOF
Length of output: 963
🌐 Web query:
vbcsr AtomicData from_distributed documentation
💡 Result:
I couldn’t find any public vbcsr documentation (PyPI page, etc.) that defines an AtomicData type or an AtomicData.from_distributed(...) method/function. The published vbcsr docs/examples focus on the VBCSR matrix class and distributed creation via methods like VBCSR.create_distributed(...) (plus from_scipy, assemble, mult, spmm, etc.). [1]
If you meant DeePTB/NequIP-style AtomicData, DeePTB documents AtomicData.from_ase(...) and AtomicData.from_points(...), but I didn’t see from_distributed mentioned there either. [2]
Most likely next step: check where you saw AtomicData.from_distributed (a code snippet, error message, or a specific repo/branch) and share that link or snippet—then I can track down the exact definition and its intended usage.
Sources
[1] vbcsr PyPI project description & examples (distributed usage via VBCSR.create_distributed) https://pypi.org/project/vbcsr/
[2] DeePTB docs: AtomicData.from_ase / AtomicData.from_points https://deeptb.readthedocs.io/en/latest/quick_start/basic_api.html
🏁 Script executed:
# Search the entire DeePTB codebase for from_distributed definition
rg "def from_distributed" --type py
Length of output: 45

🏁 Script executed:
# Check if AtomicData_vbcsr is extended/monkey-patched anywhere
rg "AtomicData_vbcsr\s*\." --type py | head -20
Length of output: 203

🏁 Script executed:
# Also check for "from_distributed" as a general string in py files
rg "from_distributed" --type py -B 3 -A 3
Length of output: 1100

🏁 Script executed:
# Check the full context of the method call and any comments around it
sed -n '110,130p' dptb/nn/hr2hR.py
Length of output: 1188

🏁 Script executed:
# Check setup.py, requirements, and pyproject.toml for vbcsr version requirements
find . -name "setup.py" -o -name "setup.cfg" -o -name "requirements*.txt" -o -name "pyproject.toml" | xargs grep -l vbcsr 2>/dev/null
Length of output: 47

🏁 Script executed:
# Show the vbcsr version requirements
rg "vbcsr" setup.py setup.cfg pyproject.toml requirements*.txt -A 1 -B 1
Length of output: 219

🏁 Script executed:
# Check if there's any custom extension or monkey-patching of AtomicData
rg "AtomicData_vbcsr\s*=" --type py -A 5
Length of output: 45

🏁 Script executed:
# Search for any imports or dynamic methods added to AtomicData
rg "setattr.*AtomicData|AtomicData.*from_distributed\s*=" --type py
Length of output: 45

🏁 Script executed:
# Find all config files in the repo
fd -e "toml" -e "cfg" -e "txt" | grep -E "(setup|requirements|pyproject)" | head -20
Length of output: 99
🏁 Script executed:
# Check for pyproject.toml specifically
cat pyproject.toml 2>/dev/null | grep -A 10 -B 2 "vbcsr"
Length of output: 45

🏁 Script executed:
# Check for any requirements files
find . -maxdepth 2 -type f -name "*requirements*" -exec cat {} \; 2>/dev/null | grep vbcsr
Length of output: 45

🏁 Script executed:
# List all top-level config files
ls -la | grep -E "setup|requirements|pyproject|poetry"
Length of output: 129

🏁 Script executed:
# Read pyproject.toml to find vbcsr requirements
cat pyproject.toml
Length of output: 2459

🏁 Script executed:
# Check if vbcsr is mentioned anywhere in the project
rg "vbcsr" --type py -l
Length of output: 261

🏁 Script executed:
# Check the imports and any __init__.py files that might extend vbcsr
cat dptb/__init__.py 2>/dev/null | head -50
Length of output: 138

🏁 Script executed:
# Check if there's any monkey-patching or extension in the vbcsr module usage area
rg -B 5 -A 5 "from vbcsr" dptb/nn/hr2hR.py
Length of output: 403

🏁 Script executed:
# Check git history or comments about from_distributed
rg "from_distributed" -B 5 -A 5 --type py
Length of output: 1641

🏁 Script executed:
# Check if vbcsr is in pyproject.toml at all (might be undeclared)
cat pyproject.toml | grep -i vbcsr
Length of output: 45

🏁 Script executed:
# Check for any fork or custom vbcsr implementation
find . -name "vbcsr*" -o -name "*vbcsr*" 2>/dev/null
Length of output: 45

🏁 Script executed:
# Check git for any recent changes or commits mentioning from_distributed
git log --all --oneline --grep="from_distributed" 2>/dev/null || echo "No git history available"
Length of output: 45

🏁 Script executed:
# Look at the tests to understand how from_distributed is supposed to work
cat dptb/tests/test_postprocess_hr.py 2>/dev/null | head -100
Length of output: 1613

🏁 Script executed:
# Check the unified system to see if there's more context on vbcsr usage
rg "from_distributed\|AtomicData_vbcsr" dptb/postprocess/unified/system.py -B 3 -A 10
Length of output: 45

🏁 Script executed:
# Double-check if there's a vbcsr fork or custom version in requirements
find . -name "*.txt" -o -name "*.toml" -o -name "*.cfg" | xargs grep -i vbcsr 2>/dev/null
Length of output: 47

🏁 Script executed:
# Check if vbcsr is installed as an optional dependency elsewhere
rg "vbcsr" docs/requirements.txt 2>/dev/null || echo "Not found"
Length of output: 72
🏁 Script executed:
# Verify the actual vbcsr package once more - check all available methods on AtomicData
python3 << 'EOF'
from vbcsr import AtomicData
import inspect
print("Available methods and attributes on vbcsr.AtomicData:")
members = inspect.getmembers(AtomicData)
for name, _ in members:
    if not name.startswith('_'):
        print(f"  - {name}")
EOF
Length of output: 318

🏁 Script executed:
# Check all vbcsr versions available
pip index versions vbcsr 2>&1 | head -20
Length of output: 273

🏁 Script executed:
# Try installing an older version to see if from_distributed exists
pip install "vbcsr<0.2.0" 2>&1 | tail -10
Length of output: 601

🏁 Script executed:
# Check if there's any custom implementation or wrapper in DeePTB
rg "class.*AtomicData|def.*from_distributed" --type py
Length of output: 829

🏁 Script executed:
# Check if vbcsr is being extended somewhere in __init__ or conftest
find . -name "conftest.py" -o -name "__init__.py" | xargs rg "from_distributed\|AtomicData" 2>/dev/null
Length of output: 47
🏁 Script executed:
# Check the latest vbcsr on PyPI for method availability
python3 << 'EOF'
import subprocess
import json
# Get package info from PyPI
try:
    result = subprocess.run(
        ["pip", "index", "versions", "vbcsr"],
        capture_output=True,
        text=True,
        timeout=10
    )
    print("Available vbcsr versions:")
    print(result.stdout)
except Exception as e:
    print(f"Could not fetch versions: {e}")
# Also check current vbcsr.__version__
try:
    import vbcsr
    print(f"\nCurrent vbcsr version: {vbcsr.__version__ if hasattr(vbcsr, '__version__') else 'unknown'}")
except Exception as e:
    print(f"Error: {e}")
EOF
Length of output: 204

🏁 Script executed:
# Check if vbcsr 0.1.1 has from_distributed
python3 << 'EOF'
from vbcsr import AtomicData
import inspect
print("Checking vbcsr.AtomicData for from_distributed:")
if hasattr(AtomicData, 'from_distributed'):
    print("✓ from_distributed method found!")
    sig = inspect.signature(AtomicData.from_distributed)
    print(f"Signature: {sig}")
    print("\nParameters:")
    for i, (param_name, param) in enumerate(sig.parameters.items(), 1):
        print(f"  {i}. {param_name}: {param.annotation if param.annotation != inspect.Parameter.empty else 'no annotation'}")
    print("\nDocstring:")
    print(AtomicData.from_distributed.__doc__)
else:
    print("✗ from_distributed method NOT found")
    print("\nAvailable methods:")
    members = inspect.getmembers(AtomicData, predicate=inspect.ismethod)
    for name, _ in members:
        if not name.startswith('_'):
            print(f"  - {name}")
EOF
Length of output: 264
🏁 Script executed:
# Check what vbcsr 0.1.1 actually exports
python3 << 'EOF'
import vbcsr
print("vbcsr.__version__:", vbcsr.__version__)
print("\nAvailable in vbcsr:")
print(dir(vbcsr))
EOF
Length of output: 400

🏁 Script executed:
# Go back to 0.2.0 and check when from_distributed was added
pip install "vbcsr==0.2.0" -q
python3 << 'EOF'
import vbcsr
print("vbcsr.__version__:", vbcsr.__version__)
print("\nAvailable in vbcsr:")
for item in dir(vbcsr):
    if not item.startswith('_'):
        print(f"  - {item}")
# Check if AtomicData exists and what it has
try:
    from vbcsr import AtomicData
    print("\nAtomicData found in vbcsr 0.2.0")
    print("Methods on AtomicData:")
    for attr in dir(AtomicData):
        if not attr.startswith('_') and 'from' in attr:
            print(f"  - {attr}")
except ImportError as e:
    print(f"\nAtomicData NOT found: {e}")
EOF
Length of output: 436

🏁 Script executed:
# Check git history for when from_distributed was introduced
git log -p --all -- "dptb/nn/hr2hR.py" | grep -A 5 -B 5 "from_distributed" | head -50 2>/dev/null || echo "Could not get git history"
Length of output: 45

🏁 Script executed:
# Check if there's a fork or special vbcsr version referenced anywhere
git remote -v 2>/dev/null || echo "Not a git repo or no remotes"
Length of output: 189

🏁 Script executed:
# Look for any comments or TODOs about from_distributed
rg "from_distributed|TODO|FIXME" dptb/nn/hr2hR.py dptb/postprocess/ovp2c.py -B 3 -A 3
Length of output: 1100

🏁 Script executed:
# Check if vbcsr 0.2.0 has from_distributed as a function rather than method
python3 << 'EOF'
import vbcsr
import inspect
# Check if from_distributed is a top-level function
if hasattr(vbcsr, 'from_distributed'):
    print("Found vbcsr.from_distributed!")
    sig = inspect.signature(vbcsr.from_distributed)
    print(f"Signature: {sig}")
else:
    print("from_distributed is NOT a top-level vbcsr function")
# Check AtomicData class more thoroughly
from vbcsr import AtomicData
print("\nAll AtomicData attributes/methods:")
for attr in sorted(dir(AtomicData)):
    print(f"  {attr}")
EOF
Length of output: 662
AtomicData_vbcsr.from_distributed does not exist in vbcsr — this call will fail at runtime.
The vbcsr library (0.2.0, the latest version) does not provide a from_distributed method on AtomicData. The available constructors are from_ase, from_file, and from_points. This method is not defined anywhere in the DeePTB codebase either. Additionally, vbcsr is not declared as a dependency in pyproject.toml, though the code unconditionally imports it. This call at lines 118–123 will raise AttributeError at runtime.
Verify the correct method name and vbcsr version constraints with the vbcsr maintainers or check if a custom extension is intended but missing.
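If vbcsr stays an optional dependency, a guarded import keeps the rest of the package importable even when it is absent; a minimal sketch (the module name passed in the demo call is deliberately fake so the fallback path runs):

```python
import importlib

def optional_import(name):
    """Return the module if installed, else None so callers can degrade gracefully."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None

# 'vbcsr_nonexistent_demo' is a fake name used only for illustration
mod = optional_import("vbcsr_nonexistent_demo")
if mod is None:
    print("vbcsr-backed path disabled")
```

Hr2HR could then raise a clear error at construction time instead of an AttributeError deep inside __call__.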
🤖 Prompt for AI Agents
In `@dptb/nn/hr2hR.py` around lines 118 - 123, The call to
AtomicData_vbcsr.from_distributed is invalid (method doesn't exist) and will
raise AttributeError; update the code that constructs adata (the
AtomicData_vbcsr usage in this block) to use a supported constructor (e.g.,
AtomicData_vbcsr.from_points or AtomicData_vbcsr.from_ase) or add the missing
wrapper function if a custom from_distributed was intended, and adjust the
arguments accordingly so data[AtomicDataDict.ATOM_TYPE_KEY],
data[AtomicDataDict.EDGE_INDEX_KEY], data[AtomicDataDict.EDGE_CELL_SHIFT_KEY],
data[AtomicDataDict.CELL_KEY], data[AtomicDataDict.POSITIONS_KEY], natom, nedge
and self.idp.atom_norb map to the chosen constructor’s signature; also ensure
vbcsr is declared as a dependency in pyproject.toml and/or guard the import so
runtime errors are avoided when vbcsr is absent (symbols to locate:
AtomicData_vbcsr, AtomicDataDict, ImageContainer, adata).
for i, hblock in enumerate(bondwise_hopping):
    iatom = data[AtomicDataDict.EDGE_INDEX_KEY][0][i]
    jatom = data[AtomicDataDict.EDGE_INDEX_KEY][1][i]
    imask = self.idp.mask_to_basis[atom_type[iatom]]
    jmask = self.idp.mask_to_basis[atom_type[jatom]]
    masked_hblock = hblock[imask][:,jmask]
    nrol, ncol = masked_hblock.shape
    full_block = np.zeros([nrol*spin_factor, ncol*spin_factor], dtype=ndtype)
    if soc:
        full_block[:nrol,:ncol] = masked_hblock
        full_block[nrol:,ncol:] = masked_hblock
    else:
        full_block[:nrol,:ncol] = masked_hblock
    full_block = np.ascontiguousarray(full_block)

    image_container.add_block(g_row=iatom, g_col=jatom, data=full_block, R=edge_shift_vec[i], mode="insert")
🧩 Analysis chain
🏁 Script executed:
cat -n dptb/nn/hr2hR.py | head -170 | tail -40Repository: deepmodeling/DeePTB
Length of output: 2355
🏁 Script executed:
# Search for add_block method definition and usages
rg -n "def add_block|\.add_block" --type py -B 2 -A 5Repository: deepmodeling/DeePTB
Length of output: 1892
🏁 Script executed:
# Check the vbcsr module import and understand what it is
rg -n "vbcsr|from_distributed" dptb/nn/hr2hR.py | head -20Repository: deepmodeling/DeePTB
Length of output: 287
🏁 Script executed:
# Look at the data structure and EDGE_INDEX_KEY to understand its type
rg -n "EDGE_INDEX_KEY\|AtomicDataDict" dptb/nn/hr2hR.py -B 1 -A 1 | head -30Repository: deepmodeling/DeePTB
Length of output: 45
🏁 Script executed:
# Check the data structure - what is EDGE_INDEX_KEY and atom_type
rg -n "EDGE_INDEX_KEY|ATOM_TYPE_KEY" dptb/nn/hr2hR.py -B 2 -A 2 | head -40Repository: deepmodeling/DeePTB
Length of output: 1640
🏁 Script executed:
# Look earlier in the function to see how atom_type is created and data is processed
sed -n '1,160p' dptb/nn/hr2hR.py | tail -60Repository: deepmodeling/DeePTB
Length of output: 3441
🏁 Script executed:
# Check how edge_shift_vec is constructed
rg -n "edge_shift_vec" dptb/nn/hr2hR.py -B 3 -A 1Repository: deepmodeling/DeePTB
Length of output: 621
🏁 Script executed:
# Check the type of data coming from AtomicDataDict - are these torch tensors?
rg -n "class AtomicDataDict\|EDGE_INDEX_KEY\s*=" --type py -A 5 | head -40Repository: deepmodeling/DeePTB
Length of output: 45
🏁 Script executed:
# Check if mask_to_basis can handle tensor keys or if it's a list
rg -n "mask_to_basis\s*=" --type py -B 2 -A 2 | head -40Repository: deepmodeling/DeePTB
Length of output: 3028
🏁 Script executed:
# Look for type hints or documentation on add_block
fd "\.py$" -x grep -l "ImageContainer\|add_block" {} \; | head -5 | xargs grep -n "class ImageContainer\|def add_block" 2>/dev/nullRepository: deepmodeling/DeePTB
Length of output: 47
🏁 Script executed:
# Check if there are actual test files or usage patterns that show what types add_block expects
rg -n "add_block.*g_row|add_block.*g_col" --type py -B 1 -A 1Repository: deepmodeling/DeePTB
Length of output: 444
🏁 Script executed:
# Look for how vbcsr.ImageContainer is defined (might be in external lib)
# Let's check if there's any documentation or type hints in the repo
rg -n "ImageContainer" --type py -B 5 -A 10 | grep -A 15 "class ImageContainer" | head -20Repository: deepmodeling/DeePTB
Length of output: 45
🏁 Script executed:
# Check the difference between line 142 (onsite) and 159 (bondwise) more carefully
# Line 142 uses loop counter i (int), line 159 uses iatom (tensor)
# Let's verify atom_type indexing behavior with tensor keys
python3 << 'EOF'
import torch
# Simulate what happens at lines 145-148
EDGE_INDEX = torch.tensor([[0, 1, 2], [1, 2, 0]]) # shape (2, 3)
atom_type = torch.tensor([0, 1, 2]) # shape (3,)
mask_to_basis = torch.tensor([[True, True], [False, True], [True, False]]) # shape (3, 2)
i = 0
iatom = EDGE_INDEX[0][i] # This is a 0-d tensor
print(f"iatom type: {type(iatom)}, value: {iatom}, is tensor: {isinstance(iatom, torch.Tensor)}")
print(f"iatom.item(): {iatom.item()}")
# Try indexing atom_type with 0-d tensor
atom_type_val = atom_type[iatom]
print(f"atom_type[iatom] type: {type(atom_type_val)}, value: {atom_type_val}")
# Try indexing mask_to_basis with this value
try:
imask = mask_to_basis[atom_type_val]
print(f"mask_to_basis[atom_type[iatom]] works: shape={imask.shape}, value={imask}")
except Exception as e:
print(f"mask_to_basis[atom_type[iatom]] fails: {e}")
# Try with .item()
try:
imask = mask_to_basis[atom_type[iatom].item()]
print(f"mask_to_basis[atom_type[iatom.item()]] works: shape={imask.shape}, value={imask}")
except Exception as e:
print(f"mask_to_basis[atom_type[iatom.item()]] fails: {e}")
EOF
Length of output: 180
🏁 Script executed:
# Check if there are any issues or TODOs in the file related to tensor conversion
rg -n "TODO|FIXME|BUG|tensor|device|cpu|gpu" dptb/nn/hr2hR.py | head -30Repository: deepmodeling/DeePTB
Length of output: 1035
🏁 Script executed:
# Check more context around how vbcsr is used - look at the from_distributed call
sed -n '118,122p' dptb/nn/hr2hR.py
Length of output: 413
🏁 Script executed:
# Check if there's a vbcsr documentation or setup.py mentioning it
find . -name "*.md" -o -name "*.rst" -o -name "setup.py" -o -name "requirements*.txt" | xargs grep -l "vbcsr" 2>/dev/null | head -5
Length of output: 45
🏁 Script executed:
# Look at the complete context again - lines 110-120 for CPU transfer mentioned in original comment
sed -n '110,122p' dptb/nn/hr2hR.py
Length of output: 790
Convert tensor indices to Python ints for consistency and clarity.
Lines 145-146 extract 0-d tensors from EDGE_INDEX_KEY and pass them directly to add_block(). This is inconsistent with line 142 (which uses plain int i) and with dptb/postprocess/ovp2c.py (which explicitly converts tensor values with .item()). Additionally, line 147 chains this with tensor indexing, creating fragile code. Use .item() to extract scalar values.
Proposed fix
for i, hblock in enumerate(bondwise_hopping):
- iatom = data[AtomicDataDict.EDGE_INDEX_KEY][0][i]
- jatom = data[AtomicDataDict.EDGE_INDEX_KEY][1][i]
+ iatom = data[AtomicDataDict.EDGE_INDEX_KEY][0][i].item()
+    jatom = data[AtomicDataDict.EDGE_INDEX_KEY][1][i].item()
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
for i, hblock in enumerate(bondwise_hopping):
    iatom = data[AtomicDataDict.EDGE_INDEX_KEY][0][i].item()
    jatom = data[AtomicDataDict.EDGE_INDEX_KEY][1][i].item()
    imask = self.idp.mask_to_basis[atom_type[iatom]]
    jmask = self.idp.mask_to_basis[atom_type[jatom]]
    masked_hblock = hblock[imask][:,jmask]
    nrol, ncol = masked_hblock.shape
    full_block = np.zeros([nrol*spin_factor, ncol*spin_factor], dtype=ndtype)
    if soc:
        full_block[:nrol,:ncol] = masked_hblock
        full_block[nrol:,ncol:] = masked_hblock
    else:
        full_block[:nrol,:ncol] = masked_hblock
    full_block = np.ascontiguousarray(full_block)
    image_container.add_block(g_row=iatom, g_col=jatom, data=full_block, R=edge_shift_vec[i], mode="insert")
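As a side note on the spin_factor block assembly in the loop above, the same layout can be sketched standalone in NumPy (the embed_block helper name is illustrative, not part of the codebase): without SOC the masked block is stored as-is; with SOC it is duplicated on the spin diagonal of a doubled block, leaving the spin-off-diagonal quadrants zero.

```python
import numpy as np

def embed_block(masked_hblock, soc):
    # Mirror of the spin_factor logic: double the block on the spin
    # diagonal when SOC is on, otherwise keep the original size.
    nrow, ncol = masked_hblock.shape
    s = 2 if soc else 1
    full = np.zeros((nrow * s, ncol * s), dtype=masked_hblock.dtype)
    full[:nrow, :ncol] = masked_hblock
    if soc:
        full[nrow:, ncol:] = masked_hblock
    return np.ascontiguousarray(full)

b = np.array([[1.0, 2.0], [3.0, 4.0]])
no_soc = embed_block(b, soc=False)   # shape (2, 2)
with_soc = embed_block(b, soc=True)  # shape (4, 4), spin-off-diagonal blocks zero
```

The np.ascontiguousarray call matters because the container's add_block receives the raw buffer; slicing alone does not guarantee C-contiguity.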
🤖 Prompt for AI Agents
In `@dptb/nn/hr2hR.py` around lines 144 - 159, iatom and jatom are taken as 0-d
tensors from data[AtomicDataDict.EDGE_INDEX_KEY] and then used for indexing and
passed to image_container.add_block, which is fragile and inconsistent; convert
those tensor scalars to Python ints using .item() (e.g., set iatom =
data[AtomicDataDict.EDGE_INDEX_KEY][0][i].item() and jatom =
data[AtomicDataDict.EDGE_INDEX_KEY][1][i].item()) before using them in mask
lookups, array slicing, and the call to image_container.add_block (also ensure
any edge_shift_vec indexing uses the int values). This keeps usage consistent
with bondwise_hopping loop index i and matches patterns like
dptb/postprocess/ovp2c.py.
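To illustrate the review's point outside torch (the distinction is identical for 0-d torch tensors), here is a minimal NumPy sketch: scalar indexing returns a NumPy scalar that still carries array machinery, while .item() yields a plain Python int that is safe to hand to non-array APIs.

```python
import numpy as np

edge_index = np.array([[0, 1, 2], [1, 2, 0]])

iatom_scalar = edge_index[0][1]       # NumPy scalar, not a plain int
iatom_int = edge_index[0][1].item()   # plain Python int

# Both work for further array indexing, but only the int is a clean
# scalar for dict keys, C extensions, or container APIs like add_block.
atom_type = np.array([0, 1, 0])
assert atom_type[iatom_scalar] == atom_type[iatom_int]
assert isinstance(iatom_int, int)
assert type(iatom_scalar).__module__ == "numpy"
```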
adata = AtomicData_vbcsr.from_distributed(
    natom, natom, 0, nedge, nedge,
    list(range(natom)), atom_types, edge_index.T, idp.atom_norb, cell_shifts,
    data[_keys.CELL_KEY], data[_keys.POSITIONS_KEY]
)
image_container = ImageContainer(adata, np.float64)
Potential device mismatch: CELL_KEY and POSITIONS_KEY are not moved to CPU.
Lines 84–87 explicitly call .cpu() on edge_index, atom_types, edge_vectors, and cell_shifts, but data[_keys.CELL_KEY] and data[_keys.POSITIONS_KEY] on line 94 are passed directly. If the input data resides on GPU, these will remain as CUDA tensors, likely causing a device mismatch or error inside AtomicData_vbcsr.from_distributed.
Suggested fix
adata = AtomicData_vbcsr.from_distributed(
natom, natom, 0, nedge, nedge,
list(range(natom)), atom_types, edge_index.T, idp.atom_norb, cell_shifts,
- data[_keys.CELL_KEY], data[_keys.POSITIONS_KEY]
+ data[_keys.CELL_KEY].cpu(), data[_keys.POSITIONS_KEY].cpu()
)
🤖 Prompt for AI Agents
In `@dptb/postprocess/ovp2c.py` around lines 91 - 96,
AtomicData_vbcsr.from_distributed is being fed tensors that may remain on CUDA
because data[_keys.CELL_KEY] and data[_keys.POSITIONS_KEY] are not moved to CPU
like edge_index, atom_types, edge_vectors, and cell_shifts; update the call site
so you explicitly call .cpu() (or .to('cpu')) on data[_keys.CELL_KEY] and
data[_keys.POSITIONS_KEY] before passing them into
AtomicData_vbcsr.from_distributed (and ensure any subsequent use, e.g. when
constructing ImageContainer(adata, ...), expects CPU tensors).
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@dptb/entrypoints/train.py`:
- Around line 118-119: The current except block in dptb/entrypoints/train.py
that reads "except Exception as e: raise ValueError(f'Failed to parse orbital
file {value} for element {elem}: {e}')" drops the original traceback; change it
to chain the original exception by using "raise ValueError(f'Failed to parse
orbital file {value} for element {elem}: {e}') from e" (and, if feasible, narrow
the catch to specific exceptions instead of bare Exception) so the original
error context for value and elem is preserved.
In `@dptb/postprocess/unified/calculator.py`:
- Line 177: The call to os.remove(orb_name) will raise NameError because the
module never imports os; add "import os" to the top import section of
dptb/postprocess/unified/calculator.py (near the other standard-library imports)
so os.remove referenced in the code (search for os.remove(orb_name)) resolves
correctly.
- Line 180: The call to log.warning in unified.calculator (the undefined symbol
"log") will raise NameError; fix by adding a logger variable and import: import
the logging module near the top of the file and create a module logger (e.g.,
assign log = logging.getLogger(__name__)) so calls like log.warning(...) and any
other log.* uses reference a defined logger.
- Around line 166-177: The temporary file handling around the overlap branch
(when self.overlap and hasattr(self.model, "orbital_files_content")) is unsafe;
replace the manual writes to "./temp_{sym}.orb" with a
tempfile.TemporaryDirectory context so files are created inside an isolated temp
dir (unique per invocation) and automatically cleaned up even if compute_overlap
raises; write each orbital string from self.model.orbital_files_content into
files under that temp dir, collect their basenames/paths into orb_names, call
compute_overlap(atomic_data, self.model.idp, temp_dir_path, orb_names) (i.e.
pass the temp dir instead of "./"), and remove the manual os.remove loop — using
the TemporaryDirectory context ensures no concurrency collisions, CWD
independence, and guaranteed cleanup on error.
🧹 Nitpick comments (2)
dptb/entrypoints/train.py (1)
111-114: Double file read: parse_orbital_file already reads the file content.
parse_orbital_file(value) reads the entire file (see orbital_parser.py line 25), and then lines 113–114 open and read the same file again. Consider having parse_orbital_file return both the parsed basis string and the raw content, or cache the content from the first read.
dptb/postprocess/unified/calculator.py (1)
194-219: Inconsistent indentation inside get_hR. Lines 195–203 have extra leading whitespace compared to every other method in this class, which uses 8 spaces (2-level indent). The comment on line 195 is indented at 16 spaces while the h2R = ... on line 196 starts at 8; it looks like a mix of indent levels got pasted in.
✏️ Proposed fix: align indentation
    def get_hR(self, atomic_data):
-                # Initialize hR converters
-        h2R = Hr2HR(
+        # Initialize hR converters
+        h2R = Hr2HR(
except Exception as e:
    raise ValueError(f"Failed to parse orbital file {value} for element {elem}: {e}")
Chain the exception to preserve traceback context.
except Exception is overly broad, and the re-raised ValueError drops the original traceback. Use raise ... from e so the cause is visible during debugging.
- except Exception as e:
- raise ValueError(f"Failed to parse orbital file {value} for element {elem}: {e}")
+ except (ValueError, FileNotFoundError, OSError) as e:
+ raise ValueError(f"Failed to parse orbital file {value} for element {elem}: {e}") from e📝 Committable suggestion
except (ValueError, FileNotFoundError, OSError) as e:
    raise ValueError(f"Failed to parse orbital file {value} for element {elem}: {e}") from e
🧰 Tools
🪛 Ruff (0.15.0)
[warning] 118-118: Do not catch blind exception: Exception
(BLE001)
[warning] 119-119: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
[warning] 119-119: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In `@dptb/entrypoints/train.py` around lines 118 - 119, The current except block
in dptb/entrypoints/train.py that reads "except Exception as e: raise
ValueError(f'Failed to parse orbital file {value} for element {elem}: {e}')"
drops the original traceback; change it to chain the original exception by using
"raise ValueError(f'Failed to parse orbital file {value} for element {elem}:
{e}') from e" (and, if feasible, narrow the catch to specific exceptions instead
of bare Exception) so the original error context for value and elem is
preserved.
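The benefit of chaining can be demonstrated in isolation (parse and load here are stand-ins, not the real parser): the original exception survives as __cause__, so debugging shows both tracebacks instead of only the re-raised ValueError.

```python
def parse(path):
    # Stand-in for the real orbital-file parser.
    raise OSError(f"cannot read {path}")

def load(path, elem):
    try:
        return parse(path)
    except OSError as e:
        # Chaining preserves the original traceback as __cause__.
        raise ValueError(f"Failed to parse orbital file {path} for element {elem}: {e}") from e

try:
    load("Si.orb", "Si")
except ValueError as err:
    chained = err

# The OSError is retained, not discarded.
assert isinstance(chained.__cause__, OSError)
```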
elif self.overlap and hasattr(self.model, "orbital_files_content"):
    from dptb.postprocess.ovp2c import compute_overlap
    # write the orbital files content to a temporary file
    orb_names = []
    for sym in self.model.idp.type_names:
        with open(f"./temp_{sym}.orb", "w") as f:
            f.write(self.model.orbital_files_content[sym])
        orb_names.append(f"temp_{sym}.orb")
    atomic_data = compute_overlap(atomic_data, self.model.idp, "./", orb_names)
    # remove the temporary files
    for orb_name in orb_names:
        os.remove(orb_name)
Temporary file handling is fragile and unsafe.
Three concerns with the current approach:
- No cleanup on error: if compute_overlap (line 174) raises, the temp files are leaked because there's no try/finally.
- Concurrency collision: writing to ./temp_{sym}.orb in CWD means parallel processes with the same atom types will clobber each other's files.
- CWD dependency: the hardcoded "./" path assumes the process CWD is writable and appropriate.
Use tempfile.TemporaryDirectory (or tempfile.mkdtemp) to isolate files and guarantee cleanup:
♻️ Proposed fix using tempfile
+ import tempfile
+ import os
elif self.overlap and hasattr(self.model, "orbital_files_content"):
from dptb.postprocess.ovp2c import compute_overlap
- # write the orbital files content to a temporary file
- orb_names = []
- for sym in self.model.idp.type_names:
- with open(f"./temp_{sym}.orb", "w") as f:
- f.write(self.model.orbital_files_content[sym])
- orb_names.append(f"temp_{sym}.orb")
- atomic_data = compute_overlap(atomic_data, self.model.idp, "./", orb_names)
- # remove the temporary files
- for orb_name in orb_names:
- os.remove(orb_name)
+ with tempfile.TemporaryDirectory() as tmpdir:
+ orb_names = []
+ for sym in self.model.idp.type_names:
+ fname = f"temp_{sym}.orb"
+ with open(os.path.join(tmpdir, fname), "w") as f:
+ f.write(self.model.orbital_files_content[sym])
+ orb_names.append(fname)
+            atomic_data = compute_overlap(atomic_data, self.model.idp, tmpdir, orb_names)
📝 Committable suggestion
import tempfile
import os

elif self.overlap and hasattr(self.model, "orbital_files_content"):
    from dptb.postprocess.ovp2c import compute_overlap
    with tempfile.TemporaryDirectory() as tmpdir:
        orb_names = []
        for sym in self.model.idp.type_names:
            fname = f"temp_{sym}.orb"
            with open(os.path.join(tmpdir, fname), "w") as f:
                f.write(self.model.orbital_files_content[sym])
            orb_names.append(fname)
        atomic_data = compute_overlap(atomic_data, self.model.idp, tmpdir, orb_names)
🧰 Tools
🪛 Ruff (0.15.0)
[error] 177-177: Undefined name os
(F821)
🤖 Prompt for AI Agents
In `@dptb/postprocess/unified/calculator.py` around lines 166 - 177, The temporary
file handling around the overlap branch (when self.overlap and
hasattr(self.model, "orbital_files_content")) is unsafe; replace the manual
writes to "./temp_{sym}.orb" with a tempfile.TemporaryDirectory context so files
are created inside an isolated temp dir (unique per invocation) and
automatically cleaned up even if compute_overlap raises; write each orbital
string from self.model.orbital_files_content into files under that temp dir,
collect their basenames/paths into orb_names, call compute_overlap(atomic_data,
self.model.idp, temp_dir_path, orb_names) (i.e. pass the temp dir instead of
"./"), and remove the manual os.remove loop — using the TemporaryDirectory
context ensures no concurrency collisions, CWD independence, and guaranteed
cleanup on error.
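The cleanup guarantee can be shown in isolation (write_orbs_then_fail and the contents dict are illustrative stand-ins): even when the body raises, mimicking a failure inside compute_overlap, TemporaryDirectory removes the directory and its files on exit.

```python
import os
import tempfile

def write_orbs_then_fail(contents, seen):
    # Write per-element .orb files into an isolated temp dir, then fail,
    # simulating an exception raised inside compute_overlap.
    with tempfile.TemporaryDirectory() as tmpdir:
        seen["dir"] = tmpdir
        for sym, text in contents.items():
            with open(os.path.join(tmpdir, f"temp_{sym}.orb"), "w") as f:
                f.write(text)
        raise RuntimeError("simulated failure")

seen = {}
try:
    write_orbs_then_fail({"Si": "orbital data"}, seen)
except RuntimeError:
    pass

# The directory (and the files in it) are gone despite the exception.
assert not os.path.exists(seen["dir"])
```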
    atomic_data = compute_overlap(atomic_data, self.model.idp, "./", orb_names)
    # remove the temporary files
    for orb_name in orb_names:
        os.remove(orb_name)
os is not imported — NameError at runtime.
os.remove(orb_name) will crash because os is never imported in this file. This is confirmed by static analysis (Ruff F821).
🐛 Proposed fix: add missing import
Add at the top of the file:
from typing import Union, Tuple, Optional, Any
from abc import ABC, abstractmethod
+import os
import torch🧰 Tools
🪛 Ruff (0.15.0)
[error] 177-177: Undefined name os
(F821)
🤖 Prompt for AI Agents
In `@dptb/postprocess/unified/calculator.py` at line 177, The call to
os.remove(orb_name) will raise NameError because the module never imports os;
add "import os" to the top import section of
dptb/postprocess/unified/calculator.py (near the other standard-library imports)
so os.remove referenced in the code (search for os.remove(orb_name)) resolves
correctly.
        os.remove(orb_name)
elif self.overlap:
    if self.model.method == "e3tb":
        log.warning("The overlap inferenced from model is not stable in singular basis, please ensure this is what you want.")
log is not defined — NameError at runtime.
log.warning(...) will crash because no logger is defined or imported. This is confirmed by static analysis (Ruff F821).
🐛 Proposed fix: add logger
Add near the top of the file:
+import logging
+log = logging.getLogger(__name__)
🧰 Tools
🪛 Ruff (0.15.0)
[error] 180-180: Undefined name log
(F821)
🤖 Prompt for AI Agents
In `@dptb/postprocess/unified/calculator.py` at line 180, The call to log.warning
in unified.calculator (the undefined symbol "log") will raise NameError; fix by
adding a logger variable and import: import the logging module near the top of
the file and create a module logger (e.g., assign log =
logging.getLogger(__name__)) so calls like log.warning(...) and any other log.*
uses reference a defined logger.
Add support for constructing the image-container type HR: a dict mapping each lattice shift R to a VBCSR matrix.
The image container has an intrinsic sample_K function that transforms the R-space Hamiltonian to K-space, supporting two types of position conventions.
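The R -> K transform behind such a sample_K routine can be sketched as follows. The sample_k function, the dict layout, and the phase convention here are illustrative assumptions, not the vbcsr API; the two position conventions typically differ in whether the phase uses only the cell shifts R or also the intra-cell atomic positions (this sketch uses the cell-origin convention).

```python
import numpy as np

def sample_k(hR, k):
    # H(k) = sum_R exp(2*pi*i k . R) H(R), with k in fractional coordinates.
    # Cell-origin ("R-only") phase convention.
    n = next(iter(hR.values())).shape[0]
    Hk = np.zeros((n, n), dtype=complex)
    for R, block in hR.items():
        Hk += np.exp(2j * np.pi * np.dot(k, R)) * block
    return Hk

# Tiny 1-band chain: onsite 0, hopping t = 1 to neighbors at R = +1 and -1,
# so H(k) = 2*cos(2*pi*k).
hR = {(0,): np.array([[0.0]]),
      (1,): np.array([[1.0]]),
      (-1,): np.array([[1.0]])}
Hk = sample_k(hR, np.array([0.25]))  # 2*cos(pi/2) = 0
```

The atom-position convention multiplies each block by an additional phase depending on the positions of the two atoms within the cell; eigenvalues at each k are identical, only the eigenvector gauge differs.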
This allows for: