Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HGEdge memory management / speed #453

Merged
merged 2 commits into from
Sep 12, 2021
Merged

Conversation

yakra
Copy link
Contributor

@yakra yakra commented Sep 12, 2021

closes yakra#175
closes yakra#176
closes yakra#182
Yes, there's still room to squeeze some more efficiency out of writing subgraphs.
Good RAM savings. Speedwise, we have several nickel-and-dime improvements that add up to like a quarter. ;P
Code reorganization better sets us up for potential future enhancements.
Most everything squashed down into one commit for cleaner commit history, as successive commits edited and re-edited the same code lines.

All affected tasks together:
all_Total

Broken down into the individual subtasks:
all_ctor1
all_ctor2
all_dtor
all_Subg

More detailed comments about the individual changes will be in the posts below.

* HGEdge memory management
* area graph speedup
* nix route_names_and_systems
* Args::numthreads cleanup

interactive rebase d04ac12af81e27d8e9886345b57ee5533cb9722e
@yakra
Copy link
Contributor Author

yakra commented Sep 12, 2021

HGEdge memory management

per #375 (comment)

  • no-build (left bars, light colors):
    1aedacb
  • "_80a" (middle bars, medium colors):
    The more straightforward "Operation Valgrind" stuff:
    • Fix memory leak in s_written arrays.
    • Don't create c_written or t_written arrays when not needed...
    • ...and then delete them as soon as no longer needed, rather than wait till the HGEdge itself is deleted.
    • Lose the custom destructor; handle deletions via HGEdge::detach instead.
  • "_80b" (right bars, dark colors):
    Combine all 3 arrays into one written array + a bitmask, for reduced RAM use & increased speed.

Only implementing this in C++, because:

  • Python has garbage collection, so no pressing need to worry about manually managing memory.
  • Implementing the bitmask in Python reduces performance, probably due to having to check at runtime whether a reference can even be used with the & or | operators. Waste of RAM notwithstanding, I'm leaving as-is, to keep a cleaner commit history & avoid hurting speed. May reevaluate this in the future though.

vg8_ctor1
_80a is unchanged from 1aedacb here. The different times we see are a result of noise in the measurements.
_80b is consistently a bit faster due to allocating & initializing 1/3 as many arrays.

vg8_ctor2
_80a is slowed down a bit by deleting 598319 arrays now rather than when HighwayGraph::clear() is called. Of course, this is offset by HighwayGraph::clear() being that much faster. Additionally, having negligible effect on speed, a conditional (slower!) avoids allocating & initializing (faster!) unnecessary c_written or t_written arrays. How many times? Probably low triple digits total. Again, we're pretty close to margin of error, as evidenced by lab3 appearing faster.
_80b is faster due to having 1/2 the allocations & inits of no-build, as well as 1/2 as many deletions for edges that only temporarily exist during the compression process.

vg8_dtor
_80a had those 598319 deletions offloaded to the compression process instead; that offsets this. We're slowed down some more however by deleting 822196 s_written arrays that didn't get deleted at all before. Why are faster times shown for some machines? I'll chalk that up to no longer having the overhead of the HGEdge destructor and a wee bit of extraneous code that came a long with it. A bit surprising the effect is this pronounced though.
_80b is faster due to deleting only 1 array now, not 2 or 3. Plus losing whatever overhead was associated with the old dtor.

vg8_Subg
Note how FreeBSD doesn't scale above 2 threads here. :( So it's not just noreaster and all the other processes it's running. This isn't a function of the hard disk (SSD!) being used; the "files" are being written to /dev/null. Well that's disappointing!
_80a is unchanged from 1aedacb here.
_80b is barely perceptibly faster through the noise. Instead of pulling 3 separate arrays in from main memory, we read 1 most of the time, and then it's in the CPU's cache.

vg8_Total
Sum of all 4 of the above tasks together.


Early dead branches:
• valgrind8A a653b47
• valgrind8B 029ce2d 07ae0bf

@yakra
Copy link
Contributor Author

yakra commented Sep 12, 2021

area graph edge-matching speed

closes yakra#176

A decent improvement for area graph speed -- although area graphs are already so fast it's barely any difference in the big picture.
Python required some changes left out of #404 that were a small speedup in their own right after all.

C++:
176_area

Python:

box before after
BT 3.220 1.910
lab1 1.910 1.130
lab1.5 1.460 0.900
lab2 2.290 1.350
lab3 2.980 1.810
lab4 2.200 1.380
BSDlab 3.8 2.27

@yakra
Copy link
Contributor Author

yakra commented Sep 12, 2021

nix route_names_and_systems

closes yakra#175

  • Python: Notwithstanding saving ~138 MiB RAM, once again this hurts speed, so going no-build & keeping a cleaner commit history; may reevaluate in the future.
  • C++:
    • RAM: Saves ~90 MiB.
    • Speed: Barely any change to subgraph writing itself, but once we factor in both HGEdge ctors and the dtor running faster, it's a win.

175_ctor1
175_ctor2
175_dtor
175_Subg
175_Total
left: before / right: after

@yakra
Copy link
Contributor Author

yakra commented Sep 12, 2021

Close yakra#182

Minor code organization.

@yakra
Copy link
Contributor Author

yakra commented Sep 12, 2021

Bit shifts in clinchedby_code

C++:
bs_Subg

Python:

all subgraphs:

box before after
BT 103.150 99.930
lab1 65.470 61.330
lab1.5 52.830 50.640
lab2 77.810 73.480
lab3 103.730 97.410
lab4 78.400 74.510
BSDlab 131.61 127.06

area graphs:

box before after
BT 1.910 1.720
lab1 1.130 0.970
lab1.5 0.900 0.820
lab2 1.350 1.220
lab3 1.810 1.570
lab4 1.380 1.250
BSDlab 2.27 2.02

@yakra yakra changed the title HGEdge efficiency HGEdge memory management / speed Sep 12, 2021
@jteresco jteresco merged commit cf41525 into TravelMapping:master Sep 12, 2021
@jteresco
Copy link
Contributor

Pulling this onto noreaster to use starting with tonight's update. Thanks!

@yakra yakra deleted the dreamcast branch September 17, 2021 17:19
@yakra yakra mentioned this pull request Apr 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants