Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Fluxion match performance #1007

Merged
merged 2 commits into from
Feb 16, 2023
Merged

Conversation

milroy
Copy link
Member

@milroy milroy commented Feb 8, 2023

As reported in issue #1001 flux mini submit experiences pronounced slowdown for most of the Fluxion match policies except first. With the debugging help of @grondo and @trws we traced one source to an un-inlined accumulate operation.
Adding the inline hint to the function allows the compiler to properly inline it and performance improves by a factor of >100 for all policies except lonodex and hinodex (or, e.g., lonode with --exclusive -N1.
Another hotspot we encountered in debugging is the accumulate operation in calc_multilevel_scores:

return std::accumulate (m_multilevel_scores.begin (),

This O(n) accumulate operation is called for each finish_vtx, and even worse, currently scales linearly with the number of matches. This is because the accumulated vector isn't cleared after each traversal. We can replace the accumulate operation by a single int64_t which stores the prefix value.

While these changes improve performance for all policies, lonodex and hinodex are still slow. Future PRs will further improve performance by optimizing graph filtration and lookups on edge iteration.

Copy link
Member

@jameshcorbett jameshcorbett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Edit: sorry I didn't see all the slack conversation still ongoing, so review is a bit premature.

@milroy milroy changed the title WIP: Fix Fluxion match performance degradation Fix Fluxion match performance degradation Feb 15, 2023
@milroy milroy force-pushed the perf-fix branch 3 times, most recently from 5b06eeb to 5177b81 Compare February 15, 2023 03:09
@milroy
Copy link
Member Author

milroy commented Feb 16, 2023

@grondo and @garlick : @trws, @tpatki, @JaeseungYeom, and Zeke discussed perf traces and further performance improvements today.

We decided that those improvements will take longer to implement and will create further PRs. The contributions here will speed up most policies and as such can be reviewed.

@milroy milroy requested a review from grondo February 16, 2023 02:21
Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM and already tested without issue on hetchy.

FYI - I pulled these 2 patches into our latest RPM build for TOSS 4 so we'll get these into the next release.

Just noticed a formatting error in the 2nd commit message, but otherwise this should get merged, IMO.

Perhaps we should also reword the PR title before merging since the title will be included in release notes (since this doesn't fully fix the perf degradation)

resource/policies/dfu_match_multilevel_id.hpp Show resolved Hide resolved
@milroy
Copy link
Member Author

milroy commented Feb 16, 2023

Perhaps we should also reword the PR title before merging since the title will be included in release notes (since this doesn't fully fix the perf degradation)

I'll rename to "Improve Fluxion match performance" since the changes improve performance for every policy I tried. Of course, the nodex policies are still slow and need to be improved further.

@milroy milroy changed the title Fix Fluxion match performance degradation Improve Fluxion match performance Feb 16, 2023
Problem: `perf` profiling together with a test replacement of `plus`
with an in-place addition reveals that the `plus` function is not
inlined by the compiler.

Add inline keyword to coerce the compiler to inline the function and
others in the namespace. The change improves match performance across a
run of the issue reproducer by a factor of >100 for several policies.
Problem: the current design of the multilevel ID match policy makes use
of a vector of factors (`m_multilevel_factors`) to keep track of all
the added factors and accumulates them to generate a prefix for the
overall vertex score. The O(n) operation is repeated for each vertex.
Furthermore, the vector is not cleared between match traversals, so the
operation becomes increasingly expensive.

Replace the vector with a single `int64_t` which contains the sum of all
vertex factors. Instead of popping the current vertex's factor from a
vector, simply subtract its factor stored in `score_factor_t`. The
change accelerates exclusive matching by a factor of about 6.
@codecov
Copy link

codecov bot commented Feb 16, 2023

Codecov Report

Merging #1007 (7aeee01) into master (f7bdcb5) will decrease coverage by 0.1%.
The diff coverage is 90.9%.

❗ Current head 7aeee01 differs from pull request most recent head b21c605. Consider uploading reports for the commit b21c605 to get more accurate results

@@           Coverage Diff            @@
##           master   #1007     +/-   ##
========================================
- Coverage    74.4%   74.4%   -0.1%     
========================================
  Files          86      86             
  Lines        9429    9434      +5     
========================================
+ Hits         7019    7021      +2     
- Misses       2410    2413      +3     
Impacted Files Coverage Δ
resource/evaluators/fold.hpp 14.2% <0.0%> (ø)
resource/policies/dfu_match_multilevel_id.hpp 66.6% <ø> (ø)
resource/policies/dfu_match_multilevel_id_impl.hpp 82.1% <100.0%> (-2.7%) ⬇️

@grondo
Copy link
Contributor

grondo commented Feb 16, 2023

Thanks! Feel free to set MWP when you are ready.

@milroy milroy added the merge-when-passing mergify.io - merge PR automatically once CI passes label Feb 16, 2023
@mergify mergify bot merged commit 35d3c96 into flux-framework:master Feb 16, 2023
@milroy milroy deleted the perf-fix branch February 16, 2023 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-when-passing mergify.io - merge PR automatically once CI passes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants