Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize compute_kda_ma #745

Merged
merged 3 commits into from
Sep 21, 2022

Conversation

liuzhenqi77
Copy link
Contributor

Changes proposed in this pull request:

  • Optimize compute_kda_ma

I was trying to speedup MKDAChi2() and made some changes to the matrix operations in compute_kda_ma, which could half the running time. Please feel free to make any revisions, as I could only test under my specific case.

  • nonzero_idx is not sorted after the change, but I guess it's not an issue.
  • I did not fully test against sum_overlap=True, in which counts could be a list.

I also found that parallelize this loop with joblib could speedup the process by another small percent. So, I was wondering if there is a way to pre-calculate ma_maps for all the files in a dataset and reuse them. I saw that many functions support using saved MA maps internally through self.inputs_["ma_maps"], but I did not find a way to patch into the input.

Thank you.

@welcome
Copy link

welcome bot commented Sep 21, 2022

Thanks for opening this pull request! We have detected this is the first time you have contributed to NiMARE. Please check out our contributing guidelines.
We invite you to list yourself as a NiMARE contributor, so if your name is not already mentioned, please modify the .zenodo.json file with your data right above Angie's entry. Example:

{
  "name": "Contributor, New",
  "affiliation": "Department of Psychology, Some University",
  "orcid": "<your id>"
},
{
  "name": "Laird, Angela R.",
  "affiliation": "Florida International University",
  "orcid": "0000-0003-3379-8744"
},

Of course, if you want to opt out this time there is no problem at all with adding your name later. You will be always welcome to add it in the future whenever you feel it should be listed.

@liuzhenqi77
Copy link
Contributor Author

Tried to fix the sum_overlap=True case. Unittests passed for me locally.

@tsalo tsalo requested a review from JulioAPeraza September 21, 2022 14:20
@codecov
Copy link

codecov bot commented Sep 21, 2022

Codecov Report

Base: 88.56% // Head: 88.55% // Decreases project coverage by -0.00% ⚠️

Coverage data is based on head (7fc1122) compared to base (28bbda5).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #745      +/-   ##
==========================================
- Coverage   88.56%   88.55%   -0.01%     
==========================================
  Files          38       38              
  Lines        4371     4370       -1     
==========================================
- Hits         3871     3870       -1     
  Misses        500      500              
Impacted Files Coverage Δ
nimare/meta/utils.py 95.73% <100.00%> (-0.03%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@JulioAPeraza JulioAPeraza added the refactoring Requesting changes to the code which do not impact behavior label Sep 21, 2022
Copy link
Collaborator

@JulioAPeraza JulioAPeraza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! These changes look good to me. I just have two small suggestions. I was able to reproduce the profiling results on different datasets.

I also found that parallelize this loop with joblib could speedup the process by another small percent.

That's great. We are currently profiling MKDAChi2() to see how we better handle the memory while computing multiple ma_maps in parallel.
Thanks!

if sum_overlap:
nonzero_to_append = counts[sphere_idx_inside_mask]
else:
nonzero_to_append = np.ones((len(sphere_idx_inside_mask),)) * counts
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
nonzero_to_append = np.ones((len(sphere_idx_inside_mask),)) * counts
nonzero_to_append = np.ones((len(sphere_idx_inside_mask),)) * value

With these changes, I think here we could use value and remove counts = value from line 125 so that counts will always be a list.

Comment on lines 134 to 135
if sum_overlap:
counts = counts[idx]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be merged with (or removed in favor of) the if sum_overlap: you have below.

Co-authored-by: Julio A. Peraza <52050407+JulioAPeraza@users.noreply.github.com>
@JulioAPeraza JulioAPeraza merged commit b7ba275 into neurostuff:main Sep 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring Requesting changes to the code which do not impact behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants