HGEdge memory management / speed #453

yakra · 2021-09-12T04:29:20Z

closes yakra#175
closes yakra#176
closes yakra#182
Yes, there's still room to squeeze some more efficiency out of writing subgraphs.
Good RAM savings. Speedwise, we have several nickel-and-dime improvements that add up to like a quarter. ;P
Code reorganization better sets us up for potential future enhancements.
Most everything squashed down into one commit for cleaner commit history, as successive commits edited and re-edited the same code lines.

All affected tasks together:

Broken down into the individual subtasks:

More detailed comments about the individual changes will be in the posts below.

* HGEdge memory management * area graph speedup * nix route_names_and_systems * Args::numthreads cleanup interactive rebase d04ac12af81e27d8e9886345b57ee5533cb9722e

yakra · 2021-09-12T04:30:06Z

`HGEdge` memory management

per #375 (comment)

no-build (left bars, light colors):
1aedacb
"_80a" (middle bars, medium colors):
The more straightforward "Operation Valgrind" stuff:
- Fix memory leak in s_written arrays.
- Don't create c_written or t_written arrays when not needed...
- ...and then delete them as soon as no longer needed, rather than wait till the HGEdge itself is deleted.
- Lose the custom destructor; handle deletions via HGEdge::detach instead.
"_80b" (right bars, dark colors):
Combine all 3 arrays into one written array + a bitmask, for reduced RAM use & increased speed.

Only implementing this in C++, because:

Python has garbage collection, so no pressing need to worry about manually managing memory.
Implementing the bitmask in Python reduces performance, probably due to having to check at runtime whether a reference can even be used with the & or | operators. Waste of RAM notwithstanding, I'm leaving as-is, to keep a cleaner commit history & avoid hurting speed. May reevaluate this in the future though.

_80a is unchanged from 1aedacb here. The different times we see are a result of noise in the measurements.
_80b is consistently a bit faster due to allocating & initializing 1/3 as many arrays.

_80a is slowed down a bit by deleting 598319 arrays now rather than when HighwayGraph::clear() is called. Of course, this is offset by HighwayGraph::clear() being that much faster. Additionally, having negligible effect on speed, a conditional (slower!) avoids allocating & initializing (faster!) unnecessary c_written or t_written arrays. How many times? Probably low triple digits total. Again, we're pretty close to margin of error, as evidenced by lab3 appearing faster.
_80b is faster due to having 1/2 the allocations & inits of no-build, as well as 1/2 as many deletions for edges that only temporarily exist during the compression process.

_80a had those 598319 deletions offloaded to the compression process instead; that offsets this. We're slowed down some more however by deleting 822196 s_written arrays that didn't get deleted at all before. Why are faster times shown for some machines? I'll chalk that up to no longer having the overhead of the HGEdge destructor and a wee bit of extraneous code that came a long with it. A bit surprising the effect is this pronounced though.
_80b is faster due to deleting only 1 array now, not 2 or 3. Plus losing whatever overhead was associated with the old dtor.

Note how FreeBSD doesn't scale above 2 threads here. :( So it's not just noreaster and all the other processes it's running. This isn't a function of the hard disk (SSD!) being used; the "files" are being written to /dev/null. Well that's disappointing!
_80a is unchanged from 1aedacb here.
_80b is barely perceptibly faster through the noise. Instead of pulling 3 separate arrays in from main memory, we read 1 most of the time, and then it's in the CPU's cache.

Sum of all 4 of the above tasks together.

Early dead branches:
• valgrind8A a653b47
• valgrind8B 029ce2d 07ae0bf

yakra · 2021-09-12T04:30:50Z

area graph edge-matching speed

closes yakra#176

A decent improvement for area graph speed -- although area graphs are already so fast it's barely any difference in the big picture.
Python required some changes left out of #404 that were a small speedup in their own right after all.

C++:

Python:

box	before	after
BT	3.220	1.910
lab1	1.910	1.130
lab1.5	1.460	0.900
lab2	2.290	1.350
lab3	2.980	1.810
lab4	2.200	1.380
BSDlab	3.8	2.27

yakra · 2021-09-12T04:31:15Z

nix route_names_and_systems

closes yakra#175

Python: Notwithstanding saving ~138 MiB RAM, once again this hurts speed, so going no-build & keeping a cleaner commit history; may reevaluate in the future.
C++:
- RAM: Saves ~90 MiB.
- Speed: Barely any change to subgraph writing itself, but once we factor in both HGEdge ctors and the dtor running faster, it's a win.

left: before / right: after

yakra · 2021-09-12T04:31:24Z

Close yakra#182

Minor code organization.

yakra · 2021-09-12T04:31:47Z

Bit shifts in `clinchedby_code`

C++:

Python:

all subgraphs:

box	before	after
BT	103.150	99.930
lab1	65.470	61.330
lab1.5	52.830	50.640
lab2	77.810	73.480
lab3	103.730	97.410
lab4	78.400	74.510
BSDlab	131.61	127.06

area graphs:

box	before	after
BT	1.910	1.720
lab1	1.130	0.970
lab1.5	0.900	0.820
lab2	1.350	1.220
lab3	1.810	1.570
lab4	1.380	1.250
BSDlab	2.27	2.02

jteresco · 2021-09-12T16:27:10Z

Pulling this onto noreaster to use starting with tonight's update. Thanks!

yakra added 2 commits September 7, 2021 12:45

HGEdge efficiency

082e993

* HGEdge memory management * area graph speedup * nix route_names_and_systems * Args::numthreads cleanup interactive rebase d04ac12af81e27d8e9886345b57ee5533cb9722e

bit shift in clinchedby_code

f2c3ed2

yakra added graph generation code organization speed RAM labels Sep 12, 2021

yakra changed the title ~~HGEdge efficiency~~ HGEdge memory management / speed Sep 12, 2021

jteresco merged commit cf41525 into TravelMapping:master Sep 12, 2021

yakra deleted the dreamcast branch September 17, 2021 17:19

yakra mentioned this pull request Oct 25, 2021

concurrency detection / graphs #451

Merged

yakra mentioned this pull request Apr 13, 2022

graph generation speedups #515

Merged

yakra mentioned this pull request Feb 5, 2023

poor graph generation performance on FreeBSD #518

Closed

15 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HGEdge memory management / speed #453

HGEdge memory management / speed #453

yakra commented Sep 12, 2021 •

edited

Loading

yakra commented Sep 12, 2021 •

edited

Loading

yakra commented Sep 12, 2021

yakra commented Sep 12, 2021

yakra commented Sep 12, 2021

yakra commented Sep 12, 2021 •

edited

Loading

jteresco commented Sep 12, 2021

HGEdge memory management / speed #453

HGEdge memory management / speed #453

Conversation

yakra commented Sep 12, 2021 • edited Loading

yakra commented Sep 12, 2021 • edited Loading

HGEdge memory management

yakra commented Sep 12, 2021

area graph edge-matching speed

yakra commented Sep 12, 2021

nix route_names_and_systems

yakra commented Sep 12, 2021

Close yakra#182

yakra commented Sep 12, 2021 • edited Loading

Bit shifts in clinchedby_code

jteresco commented Sep 12, 2021

yakra commented Sep 12, 2021 •

edited

Loading

yakra commented Sep 12, 2021 •

edited

Loading

`HGEdge` memory management

yakra commented Sep 12, 2021 •

edited

Loading

Bit shifts in `clinchedby_code`