Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.list processing & DB speedups #619

Merged
merged 4 commits into from
Apr 1, 2024
Merged

Conversation

yakra
Copy link
Contributor

@yakra yakra commented Mar 31, 2024

Closes yakra#253
Closes #466
Closes yakra#278
Closes yakra#213

Squash 85b2a32, 794a0be, 90f08a1
* Nix HighwaySegment::clin_mtx; use Route::mtx
* Lock/unlock once per store_traveled_segments call
* Consolidate lock/unlock ops w/calls to mark_label(s)_in_use
Squash 41fdc3a (cp 39db917), 4b4813e (cp f968c5b), 1d34eba
* get_trim_line() only as needed; remember it
* std::string fields
* Nix orig_line; use intact lines[l]
* Fix bug that flagged tab delimiters as invalid chars in userlog errors for invalid fields
@yakra
Copy link
Contributor Author

yakra commented Mar 31, 2024

44d62f0 ReadListThread mutex efficiency

Before:
Every HighwaySegment had a mutex, which was locked when adding a traveler to its clinched_by set during .list processing.
Overkill.
With > 1.2 M segments in the system, the odds of 2 being credited for the same traveler concurrently are vanishingly small, even if many routes aren't traveled at all & others are used much more frequently.
With ~ 101 K routes in the system, the odds of 2 being credited for the same traveler concurrently are still very small, even if yadda yadda.

After:
Locking & unlocking a mutex for every segment is expensive.
Instead, handle this at the route level: Lock Route::mtx, call Route::store_traveled_segments & credit the traveler for all their segments on that chopped route in the .list line, then unlock Route::mtx.
This reduces the number of lock/unlock operations by a factor of 15.9.
Route::mtx is already used for locking labels_in_use & unused_alt_labels, but reusing this rather than adding a new one for locking HighwaySegment::clinched_by has little to no effect on contention.
Besides saving RAM, another advantage is we can keep on consolidating, and put the calls to store_traveled_segments inside the same lock as mark_label(s)_in_use. This last bit doesn't save much time though.

RAM:
This lets us scrap the HighwaySegment mutex completely, reducing sizeof(HighwaySegment).
1228371 segments...
* sizeof(std::mutex) of 8 on FreeBSD = 9.4 MB.
* sizeof(std::mutex) of 40 on Linux = 47.9 MB.

List_mtx
Big improvement for FreeBSD! It's competitive with CentOS now, even outperforming it on the low end.

Secondary benefits:
Even where the code itself doesn't change, smaller HighwaySegment objects can yield modest performance boosts, I presume due to probability of different data we need being on the same cache line.
This effect is most noticeable on BiggaTomato, my oldest machine with the least memory bandwidth.
A couple selected tasks where the effect is most visible:

@yakra
Copy link
Contributor Author

yakra commented Mar 31, 2024

f5a4e85 reduce string construction

3 problems:

  1. trim_line is a copy of the .list line with leading & trailing whitespace trimmed off, for use in userlog messages.
    This is always constructed, but only needed a small percentage of the time.
  2. fields consists of pointers into the original char buffer, with inter-field whitespace zeroed out.
    The label hash tables and labels_in_use, unused_alt_labels & duplicate_labels sets use std::strings for keys.
    This means any lookups, additions or deletions involve constructing a std::string out of a char*, which gets a bit redundant.
  3. orig_line is a carbon copy of the original .list line, with all whitespace intact, for use in splitregion .list files.
    We could use the original char* buffer for this, but that's altered & used for fields instead.

3 solutions:

  1. Use get_trim_line() to allocate once, as soon as needed. If needed > once, get_trim_line() remembers that a string was created previously, and returns it.
  2. Create fields as std::strings from the get-go, avoiding the unnecessary conversions. This leaves the original char buffer intact, meaning...
  3. The C string from the original buffer can be used for splitregion .list files. No need to copy into orig_line anymore.

Bugfix:
Along the way, fixed a bug that flagged tabs as invalid characters in userlog error messages for invalid fields.

List_str_zoom
A couple variations of the same chart:

@yakra
Copy link
Contributor Author

yakra commented Mar 31, 2024

Both commits combined, before & after

List_both

@jteresco
Copy link
Contributor

Ready to merge, or is more on the way?

@yakra
Copy link
Contributor Author

yakra commented Mar 31, 2024

Ready to merge.

@yakra
Copy link
Contributor Author

yakra commented Mar 31, 2024

I do have a couple more commits on the way now.

@yakra yakra changed the title .list processing speedups .list processing & DB speedups Mar 31, 2024
@yakra
Copy link
Contributor Author

yakra commented Mar 31, 2024

DB table speedups

waypoints (5e8ebb8)

First, let's get rid of Waypoint::point_num entirely; it's not needed.
While this doesn't reduce sizeof(Waypoint) due to alignment requirements, it does lay the foundation for smaller Waypoints in the future.

The only change to the waypoints table routine is we no longer write point_num to memory in the Waypoint object.
While this is objectively faster, in practice it's not enough to notice. The effect is obscured by noise in the data.

w
There's no difference between 5e8ebb8 & 0ff0af1. I almost left 0ff0af1 off this chart, but the FreeBSD example especially illustrates what the margin of error can be.


segments

phase 1 (5e8ebb8)

  • With point_num no longer stored in the Waypoint object, we can use an induction variable that's kept synced up as we iterate through Routes.
  • This allows us to not only avoid looking up point_num from a Waypoint, but also looking up waypoint1 and waypoint2 from a HighwaySegment. This is a Good Thing; I'm looking at removing those variables in the future.
  • Scrap the HighwaySegment::csv_line function completely, in favor of inserting its constituent components into sqlfile stream.
    • Writing 6 strings & 3 integers is more expensive than writing one string...
    • ...but this is outweighed by not having to construct 11 temporary std::string objects.

phase 2 (0ff0af1)

When constructing clinched_list for the clinched table afterward, similarly avoid all the std::string construction & concatenation, and instead create a std::pair out of segment_num and t->traveler_name.data().

s


clinched (0ff0af1)

This table takes 1.5-2x as long to write now, but that's the price we pay for greatly speeding up the segments table.
The slowdown here + the speedup in the segments table hints at the pros and cons of avoiding string construction and doing more ofstream insertions, the combined effects of which we saw in segments phase 1.
It's not an exact comparison though. Different numbers of string constructions & file writes, and we weren't creating a vector of pairs there.

  • Before: Write 3 strings to sqlfile.
    (It could have easily been one string, with the parentheses & newline included when originally constructed, but hey. *shrug*)
  • After: Write 4 strings + 1 integer that has to be converted to string as we do so.

c
Same vertical scale as the clinched table above and the 2 tables combined below, for context.
Same thing zoomed in. Similar to the waypoints table, 5e8ebb8 had no effect on the clinched table, and we can see the noise in the data in the first 2 commits.


segments + clinched combined time

Total

@yakra
Copy link
Contributor Author

yakra commented Mar 31, 2024

Ready to merge.

@yakra yakra added the database label Mar 31, 2024
@jteresco jteresco merged commit fd96ac6 into TravelMapping:master Apr 1, 2024
@jteresco
Copy link
Contributor

jteresco commented Apr 1, 2024

will use this for tonight's site update, thanks @yakra as always for all the work on this!

@rickmastfan67
Copy link

@yakra when you get the chance, can ya check your PM's on the forum. Thanks. :)

@yakra
Copy link
Contributor Author

yakra commented Apr 1, 2024

Haha, been thinking about how I need to get back there soon.

@rickmastfan67
Copy link

Haha, been thinking about how I need to get back there soon.

hehehe. Some people over there think you've passed along to the next life already. :P (j/k of course)

@yakra yakra deleted the y253ll branch April 1, 2024 06:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants