Optimize the calculation speed of `InterRDF_s` #5073

gitzhangch · 2025-06-29T06:55:32Z

Refactored bin index calculation in the _single_frame function of InterRDF_s to optimize redundant np.histogram calls.

Changes made in this Pull Request:

Refactored bin index calculation in InterRDF_s._single_frame to avoid multiple redundant np.histogram calls, improving performance.

PR Checklist

Issue raised/referenced?
Tests updated/added?
Documentation updated/added?
package/CHANGELOG file updated?
Is your name in package/AUTHORS? (If it is not, add it!)

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.

📚 Documentation preview 📚: https://mdanalysis--5073.org.readthedocs.build/en/5073/

…erRDF_s to optimize redundant np.histogram calls. Additionally, divided by N in _conclude to make the computation consistent with InterRDF.

github-actions

Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on GitHub Discussions so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS as part of this PR.

orbeckst

Thank you for the contribution! With any code rewrite for performance, the first thing is to make sure that the existing tests still pass. Have a look at the test output (you can run the tests locally, too) and modify your code so that it produces the same results as before.

If you have any reason to belief that the new code produces correct output while the old code is incorrect, please start a discussion.

codecov · 2025-07-01T15:18:39Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.69%. Comparing base (58b872b) to head (0e617a6).
⚠️ Report is 1 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #5073      +/-   ##
===========================================
+ Coverage    92.68%   92.69%   +0.01%     
===========================================
  Files          180      180              
  Lines        22446    22452       +6     
  Branches      3187     3186       -1     
===========================================
+ Hits         20804    20812       +8     
+ Misses        1167     1166       -1     
+ Partials       475      474       -1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

orbeckst

Do you have some minimal performance data (before/after)?

Can you add an entry to CHANGELOG where you summarize your changes? It looks to me that the current version does not change normalization and simply addresses performance? (That's fine and makes it easier to merge the PR... if results change then that's more tricky.)

package/MDAnalysis/analysis/rdf.py

Copilot

Pull Request Overview

This PR optimizes the calculation speed of InterRDF_s by refactoring bin index calculation to avoid redundant np.histogram calls and ensures result consistency with InterRDF by adding normalization.

Replaced multiple np.histogram calls with direct bin index calculation in _single_frame
Added normalization by dividing by N in _conclude method to match InterRDF behavior

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

package/MDAnalysis/analysis/rdf.py

orbeckst

See comments, in particular can you rewrite with np.digitize?

package/MDAnalysis/analysis/rdf.py

gitzhangch · 2025-10-04T04:08:41Z

See comments, in particular can you rewrite with np.digitize?

@orbeckst
np.digitize is more readable but slower; manual calculation is faster for equal-width binning.

In [1]: import numpy as np

In [2]: arr = np.random.uniform(0, 10, size=1_000_000)

In [3]: timeit np.digitize(arr, np.linspace(0, 10, 101), right=False) - 1
34.3 ms ± 945 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [4]: timeit ((arr - 0) * 100 / (10 -0)).astype(np.int64)
3.37 ms ± 52.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

gitzhangch · 2025-10-04T07:53:50Z

@orbeckst

performance

Version	Time per 10 frame	Speedup
Original (for loop + np.histogram)	27.1 s ± 55.7 ms	1x
Optimized (NumPy vectorized)	158 ms ± 513 µs	170x

test code

>>> rdf1 = before(u, [[ag1, ag2]]) # alias InterRDF_s as before and after
>>> %timeit rdf1.run(start=0,stop=10)
27.1 s ± 55.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> rdf2 = after(u, [[ag1, ag2]])
>>> %timeit rdf2.run(start=0,stop=10)
158 ms ± 513 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Added new contributors and optimized performance of RDF calculation.

package/CHANGELOG

package/MDAnalysis/analysis/rdf.py

…and update CHANGELOG.

orbeckst

Can you please adjust variable names to make the code more readable?
I think we can boost the speed further by replacing np.isin().

See comments below. I think that's really it.

Can you do a quick benchmark of the final code version and post the result?

package/MDAnalysis/analysis/rdf.py

orbeckst · 2025-10-08T21:26:52Z

We just merged #4884 that enables parallelization. We do want to check that your optimized code also works in the parallel context.

…to gitzhangch-develop

- corrected version changed dates for PR #4884 in analysis.rdf to 2.10.0 - moved entry in CHANGELOG to 2.10.0

orbeckst · 2025-10-09T00:09:42Z

@gitzhangch when continuing on this PR, do a git pull first because I merged the latest develop and I also made some updates.

orbeckst

Looks great. Thanks for addressing my last comments.

Have you done a quick before/after benchmark? I'd like to see the improvement before I'll merge.

gitzhangch · 2025-10-10T03:20:31Z

before/after benchmark

@orbeckst
I performed a simple and quick parameter setup test. Since I don't use conda, the dev version of the package was installed in a separate test environment to benchmark the code separately before and after the update; for performance details, please refer to the following.

Because the atomic pair size set in the benchmark is relatively small, the performance is not as striking as before. The larger the natoms**2, the greater the computational difference between the old and new code.

Additionally, is it necessary to update the corresponding files in the benchmark folder?

code

import MDAnalysis

try:
    from MDAnalysisTests.datafiles import TPR, XTC
except:
    pass

try:
    from MDAnalysis.analysis.rdf import InterRDF_s
except:
    pass


class SimpleRdfBench(object):
    """Benchmarks for MDAnalysis.analysis.rdf"""

    params = ([20, 75, 200], [[0, 5], [0, 15], [0, 20]], [10, 100], [1, 3, 9])

    param_names = ["nbins", "range_val", "natoms", "npairs"]

    def setup(self, nbins, range_val, natoms, npairs):

        self.sel_str = "name OW"

        self.u = MDAnalysis.Universe(TPR, XTC)

        try:
            self.sel = self.u.select_atoms(self.sel_str)[:natoms]
        except AttributeError:
            self.sel = self.u.selectAtoms(self.sel_str)[:natoms]

        ags = [[self.sel, self.sel]] * npairs
        self.rdfs = InterRDF_s(u=self.u, ags=ags, nbins=nbins, range=range_val)

    def time_interrdf_s(self, nbins, range_val, natoms, npairs):
        """Benchmark a full trajectory parse
        by MDAnalysis.analysis.rdf.InterRDF_s
        """
        self.rdfs.run(stop=5)

result

before

[100.00%] ··· ======= =========== ======== ============= ============= =============
              --                                             npairs
              ---------------------------- -----------------------------------------
               nbins   range_val   natoms        1             3             9
              ======= =========== ======== ============= ============= =============
                 20      [0, 5]      10     16.2±0.03ms    25.1±0.1ms    52.0±0.5ms
                 20      [0, 5]     100      54.4±0.3ms     141±1ms       399±5ms
                 20     [0, 15]      10     18.0±0.08ms   30.7±0.04ms    68.5±0.2ms
                 20     [0, 15]     100       218±1ms       632±2ms      1.86±0.01s
                 20     [0, 20]      10     20.3±0.04ms    37.7±0.1ms    90.2±0.7ms
                 20     [0, 20]     100       426±3ms       1.25±0s       3.74±0s
                 75      [0, 5]      10     16.2±0.01ms   25.3±0.07ms    52.8±0.4ms
                 75      [0, 5]     100       56.6±1ms      150±4ms       426±20ms
                 75     [0, 15]      10     18.1±0.03ms    30.9±0.1ms    69.1±0.2ms
                 75     [0, 15]     100      220±0.8ms      643±6ms      1.89±0.02s
                 75     [0, 20]      10      20.4±0.1ms    37.8±0.1ms    89.7±0.3ms
                 75     [0, 20]     100       433±3ms      1.27±0.01s    3.79±0.03s
                200      [0, 5]      10      16.4±0.2ms    25.8±0.2ms   53.5±0.06ms
                200      [0, 5]     100       60.1±1ms      161±7ms       461±10ms
                200     [0, 15]      10     18.2±0.07ms   31.4±0.08ms    70.5±0.3ms
                200     [0, 15]     100       232±2ms       677±7ms      1.99±0.03s
                200     [0, 20]      10      20.8±0.2ms    38.7±0.1ms    92.7±0.9ms
                200     [0, 20]     100       455±5ms      1.33±0.01s    3.99±0.06s
              ======= =========== ======== ============= ============= =============

after

[100.00%] ··· ======= =========== ======== ============= ============= =============
              --                                             npairs
              ---------------------------- -----------------------------------------
               nbins   range_val   natoms        1             3             9
              ======= =========== ======== ============= ============= =============
                 20      [0, 5]      10     12.3±0.09ms   14.0±0.08ms   19.0±0.07ms
                 20      [0, 5]     100      14.6±0.1ms    20.5±0.3ms    39.1±0.8ms
                 20     [0, 15]      10     12.5±0.08ms   14.2±0.07ms    19.5±0.1ms
                 20     [0, 15]     100      19.2±0.1ms    34.5±0.4ms     79.8±2ms
                 20     [0, 20]      10     12.3±0.04ms   13.7±0.05ms   18.0±0.06ms
                 20     [0, 20]     100      17.7±0.2ms    29.7±0.5ms     66.9±2ms
                 75      [0, 5]      10      12.5±0.2ms    14.2±0.2ms    19.3±0.1ms
                 75      [0, 5]     100      16.0±0.3ms    25.2±0.7ms     52.3±4ms
                 75     [0, 15]      10      12.5±0.2ms    14.4±0.2ms    19.6±0.2ms
                 75     [0, 15]     100      20.7±0.3ms     38.8±1ms      100±10ms
                 75     [0, 20]      10     12.3±0.07ms    13.8±0.1ms    18.2±0.2ms
                 75     [0, 20]     100      19.4±0.3ms    34.2±0.7ms    82.7±10ms
                200      [0, 5]      10      12.5±0.1ms    14.3±0.2ms    19.5±0.1ms
                200      [0, 5]     100     18.7±0.04ms    35.2±0.9ms     86.0±6ms
                200     [0, 15]      10      12.5±0.1ms    14.4±0.1ms    19.8±0.2ms
                200     [0, 15]     100      24.7±0.9ms     48.4±3ms      152±20ms
                200     [0, 20]      10      12.4±0.1ms   13.7±0.01ms   18.2±0.03ms
                200     [0, 20]     100      22.4±0.3ms    42.2±0.7ms     122±20ms
              ======= =========== ======== ============= ============= =============

orbeckst

Could you please also add the timing benchmark to benchmarks/analysis/rdf.py? That's a great addition!

Please check that you're also in AUTHORS – I didn't look.

Thank you!

package/CHANGELOG

orbeckst · 2025-10-13T16:16:45Z

I toned down the claim of "100-fold" speedup to "> 20-fold", simply because that's what you demonstrate here. It may well be much faster for larger N and that's great but let's be conservative.

I really like that you created a new benchmark case for InterRDF_s and I really like you to add it to the code base, please. It's really good practice, especially for anything performance-related.

I'll just wait with merging until you had a chance to look at the PR and I'll do it as soon as you're done here. Excellent work @gitzhangch !

orbeckst · 2025-10-13T16:49:09Z

@gitzhangch in order to get these changes into the next release, would you be able to respond by tomorrow, Tuesday?

gitzhangch · 2025-10-13T17:47:45Z

@orbeckst I update AUTHORS and rdf benchmark.

orbeckst · 2025-10-13T17:49:02Z

Fantastic, will merge when CI is all green.

orbeckst · 2025-10-13T22:47:27Z

Thank you for your contribution @gitzhangch , it will be in release 2.10.0! 🎉

Refactored bin index calculation in the _single_frame function of Int…

8965375

…erRDF_s to optimize redundant np.histogram calls. Additionally, divided by N in _conclude to make the computation consistent with InterRDF.

github-actions bot reviewed Jun 29, 2025

View reviewed changes

orbeckst requested changes Jun 30, 2025

View reviewed changes

Adjust the code to ensure the test passes.

2760210

gitzhangch and others added 7 commits July 3, 2025 16:17

Fewer revisions

375ecc9

formt revisions

fa271c2

Merge branch 'MDAnalysis:develop' into develop

fe17228

fix for test

83c6815

fix for test

1cc5a92

fix for test

9ab47e0

fix for test

5c1a551

gitzhangch marked this pull request as ready for review September 29, 2025 14:36

gitzhangch requested a review from orbeckst September 29, 2025 14:44

orbeckst requested changes Sep 30, 2025

View reviewed changes

package/MDAnalysis/analysis/rdf.py Outdated Show resolved Hide resolved

package/MDAnalysis/analysis/rdf.py Outdated Show resolved Hide resolved

orbeckst self-assigned this Sep 30, 2025

orbeckst requested a review from Copilot October 2, 2025 18:08

Copilot AI reviewed Oct 2, 2025

View reviewed changes

package/MDAnalysis/analysis/rdf.py Outdated Show resolved Hide resolved

package/MDAnalysis/analysis/rdf.py Outdated Show resolved Hide resolved

package/MDAnalysis/analysis/rdf.py Outdated Show resolved Hide resolved

orbeckst requested changes Oct 2, 2025

View reviewed changes

package/MDAnalysis/analysis/rdf.py Outdated Show resolved Hide resolved

gitzhangch and others added 2 commits October 4, 2025 00:14

remove for loop

e8e131f

fix

c81faf5

Update CHANGELOG

1b587ec

Added new contributors and optimized performance of RDF calculation.

gitzhangch requested a review from orbeckst October 4, 2025 15:19

orbeckst requested changes Oct 4, 2025

View reviewed changes

package/CHANGELOG Outdated Show resolved Hide resolved

package/MDAnalysis/analysis/rdf.py Outdated Show resolved Hide resolved

Improve performance of InterRDF_s with vectorized NumPy calculations …

9272632

…and update CHANGELOG.

gitzhangch requested a review from orbeckst October 6, 2025 08:08

orbeckst requested changes Oct 7, 2025

View reviewed changes

package/MDAnalysis/analysis/rdf.py Outdated Show resolved Hide resolved

package/MDAnalysis/analysis/rdf.py Outdated Show resolved Hide resolved

package/MDAnalysis/analysis/rdf.py Outdated Show resolved Hide resolved

package/MDAnalysis/analysis/rdf.py Outdated Show resolved Hide resolved

orbeckst added 2 commits October 8, 2025 16:58

Merge branch 'develop' of https://github.com/gitzhangch/mdanalysis in…

d236532

…to gitzhangch-develop

fixed change entries for PR #4884 on RDF parallelization

477b0e2

- corrected version changed dates for PR #4884 in analysis.rdf to 2.10.0 - moved entry in CHANGELOG to 2.10.0

orbeckst mentioned this pull request Oct 9, 2025

Parallelizes MDAnalysis.analysis.InterRDF and MDAnalysis.analysis.InterRDF_s #4884

Merged

5 tasks

gitzhangch added 3 commits October 9, 2025 21:18

update code, comment and variables

50f60f3

fix format

684ce0d

update format

cd59688

orbeckst approved these changes Oct 9, 2025

View reviewed changes

gitzhangch changed the title ~~Optimize the calculation speed and logic of InterRDF_s to ensure that the results are consistent with InterRDF~~ Optimize the calculation speed of InterRDF_s Oct 11, 2025

gitzhangch and others added 3 commits October 13, 2025 12:56

solve CHANGELOG conflict

7671841

update CHANGELOG

c03cafa

Merge branch 'develop' into develop

49204d7

orbeckst reviewed Oct 13, 2025

View reviewed changes

package/CHANGELOG Outdated Show resolved Hide resolved

package/CHANGELOG Show resolved Hide resolved

more conservative speedup claim in CHANGELOG

5233116

gitzhangch added 3 commits October 14, 2025 00:57

update AUTHORS and rdf benchmark

6f64b1b

Merge branch 'develop' of github.com:gitzhangch/mdanalysis into develop

b621acc

update benchmark for InterRDF_s

7d021d3

IAlibay mentioned this pull request Oct 13, 2025

Release v2.10 #5124

Closed

Merge branch 'develop' into develop

0e617a6

orbeckst merged commit 19e675b into MDAnalysis:develop Oct 13, 2025
24 checks passed

Optimize the calculation speed of InterRDF_s #5073

Optimize the calculation speed of InterRDF_s #5073

Uh oh!

Conversation

gitzhangch commented Jun 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Checklist

Developers Certificate of Origin

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

orbeckst left a comment

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Jul 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

orbeckst left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

orbeckst left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gitzhangch commented Oct 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gitzhangch commented Oct 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

performance

test code

Uh oh!

Uh oh!

Uh oh!

orbeckst left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

orbeckst commented Oct 8, 2025

Uh oh!

orbeckst commented Oct 9, 2025

Uh oh!

orbeckst left a comment

Choose a reason for hiding this comment

Uh oh!

gitzhangch commented Oct 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

before/after benchmark

code

result

Uh oh!

orbeckst left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

orbeckst commented Oct 13, 2025

Uh oh!

orbeckst commented Oct 13, 2025

Uh oh!

gitzhangch commented Oct 13, 2025

Uh oh!

orbeckst commented Oct 13, 2025

Uh oh!

Uh oh!

orbeckst commented Oct 13, 2025

Optimize the calculation speed of `InterRDF_s` #5073

Optimize the calculation speed of `InterRDF_s` #5073

gitzhangch commented Jun 29, 2025 •

edited

Loading

codecov bot commented Jul 1, 2025 •

edited

Loading

gitzhangch commented Oct 4, 2025 •

edited

Loading

gitzhangch commented Oct 4, 2025 •

edited

Loading

gitzhangch commented Oct 10, 2025 •

edited

Loading