Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graph generation: how many/which users have clinched each segment #196

Closed
jteresco opened this issue Mar 12, 2019 · 40 comments
Closed

Graph generation: how many/which users have clinched each segment #196

jteresco opened this issue Mar 12, 2019 · 40 comments

Comments

@jteresco
Copy link
Contributor

This could be a high-priority project for me in the next week or so if it's not any harder than I'm thinking. I'd like my students to be able to use it very soon.

I'd like to enable more algorithms and variations on algorithms to use the graphs. One additional bit of data that could be very interesting to work with could be the number of users or better yet a set of users who have clinched the highway segment(s) represented by each edge. This could allow all kinds of new problems to be solved interesting academically, to TM project users, or hopefully both.

From a first quick look at this, it looks like it's not hard to remember the user list for each edge, as the edges are initially created from segments, segments have a list of who clinched them, and in almost all cases, a collapsed edge should have the same list of users who have clinched each original edge. So then when the graphs are being written, the extra information could be written with each edge entry. The simplest way would be to list, maybe comma-separated, the users, but that's a lot of extra text in the files. A better way might be to assign each user a number 0 to n-1. Then the clinched info in the graphs is anonymous unless you have the mapping of users to numbers. But then each number really just represents if the user with that number has clinched the segment/edge in question. One bit of information. This means that same info could be stored in a much more compact form: a hex number with n bits, so ceiling(n/4) hex digits, where each bit corresponds to whether a specific user has clinched that segment.

@jteresco
Copy link
Contributor Author

I should bring in @yakra's recent site update changes first and go from that code.

@jteresco
Copy link
Contributor Author

This would be a new version number in the header of TMG files, and a new graph type.

I'd only do this for collapsed graphs. I am rarely using the simple format graphs anymore.

@yakra
Copy link
Contributor

yakra commented Mar 12, 2019

I'd only do this for collapsed graphs. I am rarely using the simple format graphs anymore.

#67 (comment)

@yakra wrote:

Collapsed graphs could be tricky:
http://tm.teresco.org/hb/index.php?units=miles&u=terescoj&r=nh.us004
Zoom in on NH120_N & NH120_S

@jteresco wrote:

Good point. Each segment would have to be all or nothing.

@yakra wrote:

Not a big deal, AIUI now. A hidden vertex can become a visible one, as already done with VISIBLE_HIDDEN_COLOC or HIDDEN_JUNCTION cases.

Thankfully, cases of this happening in the real world should be relatively rare, and we won't be saddled with bloated & ugly graphs.

How do we do this efficiently? Spitballing...

  • Store a canonical HighwaySegment for each HighwayGraphEdgeInfo, the same one passed to its constructor. Simple Simon.
  • When first constructed, collapsed edges are simply a copy of the corresponding simple edges. For each HighwayGraphCollapsedEdgeInfo, we can store a canonical HighwaySegment, or a clinched_by set, whatever's clever. So far so good.
  • Compressing edges: When we find a hidden vertex with 2 incident collapsed edges, compare the edges' respective clinched_by sets. Sets of different size/length -> mismatch. If same size, we have a mismatch as soon we find a traveler in one set who's not in the other.
    • If mismatch, mark vertex as visible and continue without collapsing.
    • If match, collapse edges normally.

@yakra
Copy link
Contributor

yakra commented Mar 12, 2019

One bit of information. This means that same info could be stored in a much more compact form: a hex number with n bits, so ceiling(n/4) hex digits, where each bit corresponds to whether a specific user has clinched that segment.

  • 0-9 + A-F = 16 bits. Ceiling(238/4) = 60 chars.
  • 0-9 + A-V = 32 bits. Ceiling(238/5) = 48 chars. 12 B saved * 838089 lines = 10057068 B saved in tm-master.tmg.
  • 0-9 + A-V + a-v + )-( etc. = 64 bits. Ceiling(238/6) = 40 chars. 20 B saved * 838089 lines = 16761780 B saved in tm-master.tmg.

On the downside:

  • The 2nd & even more so the 3rd options are less human readable, for anyone looking at the TMG code & trying to decipher it. Hex is simply more intuitive.
  • The 3rd option, depending on how it's implemented, could start to make things a little more computationally intensive.

A better way might be to assign each user a number 0 to n-1. Then the clinched info in the graphs is anonymous unless you have the mapping of users to numbers.

Should be easy to include. In a traditional TMG file, our number of lines is known, 2+v+e. Add one more line after all this simply listing the corresponding user IDs in order.


It makes sense to me to include the hex string before the shaping points' coords, as there are an indeterminate number of shaping points before end of line.
For example:
383842 383839 SH43 1596dee5feeef29cb1e4220eeb77425e791cf180b2d0445faeb5972236fb -39.322691 174.412524 -39.31532 174.455826 -39.294067 174.482262

@yakra
Copy link
Contributor

yakra commented Mar 12, 2019

OK, I've got a draft implemented in C++.

MEX-QROO-region.tmg:

TMG 2.0 collapsed
3 2
MEX180D@MEX180 21.097814 -86.975434
MEX180D@MEX307 21.017955 -86.854477
MEX180D@YUC/QROO 20.878541 -87.608242
0 2 MEX180D 000000000000000001004000000000000000200000000000020000010000 20.97242 -87.208314 20.950859 -87.293158 20.897907 -87.441688
0 1 MEX180D 000000000000000001004000000000000000000000000000000000000000
1 1995hoo 25or6to4 7_8 Aseluit Based8 Beerman ChrisZwolle DJCane JamesMD Qiviuq ThelwallViaduct Ugnaught2 a3 aaroads afarina along angelsfreeek atsiatas barefoot_driver baugh17 bejacob benjamingrove47 bhemphill bickendan bm7 bobcobb bogdymol boris bubaclex bulldog1979 cabiness42 capn chaddean charliezeb chesspi cinx cl94 clark1mt clong codyg1985 compdude787 cougar1989 crosboro7 cyhighways dantheman dave1693 dcharlie dcm55343 dctransit deathtopumpkins dfilpus dgolub dharwood dlsterner dnthrox dognm16 dough4872 dougtone drebbin37 drfrankenstein dsaneus dscarroll dsm4cy duke87 dylucktrocket eidderf electronym epzik8 es92 eth eucitizen extremebandman finnedude foresthills93 formulanone froggie georgiaroadgeek gimmicks gman2337 golubcar gpw griffith happy5214 harvey highplainstrvlr hoopitypoop hoss6884 htm_duke hurricanehink ibexto imgoph inagaddadavida intelati49 intheam iowahighways jackgaynor jayhawkco jbum86MASTER jeffm jerseyman4 jimvette jjbers johninkingwood jonfmorse jordancarrpeterson jpinyan jstalica julmac justjake jweeks jwood keithcavey kjslaughter klundgren kramer ks0stm kurzov lakewy17 lamsalfl lkefct lowenbrau mapcat mapmikey mariethefoxy markkos1992 master_son mc49399 mefailenglish meoberto mepaulrye michih mikeandkristie mirage45331 mojavenc morriswa moshitea mr_matt_k mvak36 mwasleski myselfalso navigator neilbert neroute2 new_friends_gr ngrier niels njroadhorse noelbotevera norheim nrm1221 ntallyn ohroadscholar okroads opspe oscar osu97gp osu_lsu paddy pahighways panda80 paulthemapguy peperodriguez2710 philimon philman pianocello130 pnrrth polabare presidentman quidditch33 radison ran4sh rebelgtp regoarrarr rgentry rickmastfan67 riehlj riiga rlee roadgeek99 roadgeek_adam roadguy2 robringstad roukan route56 rschen7754 sammi sbeaver44 scheidler scott_stieglitz sercamaro sfisher si404 sipes23 skatcher skoutr1 slorydn1 sneezy snowedin snowmatt8 sounderbruce spinoza squatchis ssgtcrusty superblu2007 synkdorbit tag42481 tarheel61581 tbwillie tckma terescoj terminal_moraine the_spui_ninja thefxexpert thehighwayman3561 thing342 tikester tjj25 tomhart twinsfan87 ua747sp usaidhello valedc03ls vcap36 vdeane verruckte_dan verumsolum vespertine vinoman64 voyager105 w9tam wadsteckel whopperman wphiii xcvxcvxcv xorodejon yakra yipcoyote 

To break down the hex string:

  • Each character stores info for traveler n thru traveler n+3.
  • The first character stores traveler 0 thru traveler 3,
  • The second character stores traveler 4 thru traveler 7, etc.
  • For each character, the low-order bit stores traveler n, and the high bit traveler n+3.
  • We only have 238 travelers, so the final character stores travelers 236 & 237 in the lower bits, with the higher two bits set to zero.

Parsing the first one above, for example:

  • The first segment has more non-0 characters, and more travelers clinching it. This segment, connecting vertex 0 & vertex 2, represents MEX-QROO MEX180D YUC/QROO MEX180.
  • There's a 1 in the position storing travelers 68 - 71. Ones bit on, stores traveler 68, epzik8.
  • There's a 4 in the position storing travelers 80 - 83. Fours bit on, stores traveler 82, griffith.
  • There's a 2 in the position storing travelers 144 - 147. Twos bit on, stores traveler 145, ngrier.
  • There's a 2 in the position storing travelers 196 - 199. Twos bit on, stores traveler 197, sneezy.
  • There's a 1 in the position storing travelers 220 - 223. Ones bit on, stores traveler 220, ua747sp.
  • The same five travelers we see at http://travelmapping.net/user/region.php?units=miles&rg=MEX-QROO
Character in array
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 1 0 0 0 0
-----------------------------------------------------------------------------------------------------------------------
Travelers stored
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
0 0 0 1 1 2 2 2 3 3 4 4 4 5 5 6 6 6 7 7 8 8 8 9 9 0 0 0 1 1 2 2 2 3 3 4 4 4 5 5 6 6 6 7 7 8 8 8 9 9 0 0 0 1 1 2 2 2 3 3
0 4 8 2 6 0 4 8 2 6 0 4 8 2 6 0 4 8 2 6 0 4 8 2 6 0 4 8 2 6 0 4 8 2 6 0 4 8 2 6 0 4 8 2 6 0 4 8 2 6 0 4 8 2 6 0 4 8 2 6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
0 0 1 1 1 2 2 3 3 3 4 4 5 5 5 6 6 7 7 7 8 8 9 9 9 0 0 1 1 1 2 2 3 3 3 4 4 5 5 5 6 6 7 7 7 8 8 9 9 9 0 0 1 1 1 2 2 3 3 3
3 7 1 5 9 3 7 1 5 9 3 7 1 5 9 3 7 1 5 9 3 7 1 5 9 3 7 1 5 9 3 7 1 5 9 3 7 1 5 9 3 7 1 5 9 3 7 1 5 9 3 7 1 5 9 3 7 1 5 9
-----------------------------------------------------------------------------------------------------------------------
                                  ^     ^                               ^                         ^           ^
                                  |     |                               |                         |           |
                                  |     |                               |                         |           \__ 220	ua747sp
                                  |     |                               |                         |
                                  |     |                               |                         \______________ 197	sneezy
                                  |     |                               |
                                  |     |                               \________________________________________ 145	ngrier
                                  |     |
                                  |     \________________________________________________________________________ 82	griffith
                                  |
                                  \______________________________________________________________________________ 68	epzik8

NH-region.tmg:

TMG 2.0 collapsed
2044 2318
...
NH120/US4@+x4plex 43.64186 -72.252298        #line 1407, vertex 1404, visible
...
1295 1404 US4,NH120 200400004200400110010000000200900004028000000000200020000006
...
1404 1194 US4,NH120 200400004200400110010000000200900004028000000000200000000006

There's a diff in the character representing travelers 208-211. Twos bit stores traveler 209: terescoj.

Notes:

  • This involved changing HighwaySegment::clinched_by from a forward_list to an unordered_set, which resulted in a significant performance hit while Augmenting travelers for detected concurrent segments. Going back to a list would mean slightly more involved coding, and a potential performance hit when compressing edges. How much of a potential performance hit is an as-yet unanswered question.
  • The compression portion of graph setup still takes only 1.4s on BiggaTomato. Not sure what it took before, but... Not bad.
  • Subgraph generation still goes by blazingly fast. This when computing the clinchedby_code for each edge as it's written. I haven't checked on how fast it's going now, or how fast it was going before, but a possible speed enhancement would be to compute & store a clinchedby_code once for each HighwaySegment after Augmenting travelers for detected concurrent segments. This code can be retrieved (rather than recalculated) and written when its corresponding edge is written.
  • The bad news: I'm experiencing occasional crashes during multithreaded subgraph writing. I'll have to go back & see if this was a problem in the existing code, or newly introduced, and then see what needs fixing.
  • Worse news: Upon closer inspection, I see I'm getting non-deterministic hex codes. There's a bug... somewhere. Maybe in edge compression, maybe in calculating the codes themselves, maybe somewhere else...

@yakra
Copy link
Contributor

yakra commented Mar 12, 2019

significant performance hit while Augmenting travelers for detected concurrent segments.

This was due to a bug in the reworked HighwaySegment::add_clinched_by. Fixed.

@yakra
Copy link
Contributor

yakra commented Mar 13, 2019

Grumble. Hex codes are looking more deterministic, but I can tell the final character frequently gets mangled. With 238 list files, we should only have the ones bit or twos bit set here, and thus have nothing > 3. But I'm seeing lots of 6s
I'm not credited for NH NH121A HamRd FreRd, but I am credited for NH NH121A OdeRd HamRd...
Mumble. Edge compression, calculating codes, assigning traveler numbers... there's a lot of moving parts here...

Buggy code at: https://github.com/yakra/DataProcessing/tree/tmg2_cpp
Diffs from the current master C++: https://github.com/yakra/DataProcessing/compare/tmg2_cpp

@yakra
Copy link
Contributor

yakra commented Mar 13, 2019

std::string HighwaySegment::clinchedby_code(std::list<TravelerList*> *traveler_lists)
{	std::string code;
	std::vector<unsigned char> clinch_vector(ceil(traveler_lists->size()/4)*4, 0);
	for (TravelerList* t : clinched_by)
		clinch_vector[t->traveler_num] = 1;
	for (unsigned short t = 0; t < traveler_lists->size(); t += 4)
	{	unsigned char nibble = 0;
		nibble |= clinch_vector[t];
		nibble |= clinch_vector[t+1]*2;
		nibble |= clinch_vector[t+2]*4;
		nibble |= clinch_vector[t+3]*8;
		if (nibble < 10) nibble += '0';
		else nibble += 55;
		code.push_back(nibble);
	}
	return code;
}

->

std::string HighwaySegment::clinchedby_code(std::list<TravelerList*> *traveler_lists)
{	std::string code;
	std::vector<unsigned char> clinch_vector(ceil(traveler_lists->size()/4)*4, 0);
	for (TravelerList* t : clinched_by)
		clinch_vector.at(t->traveler_num) = 1;
	for (unsigned short t = 0; t < traveler_lists->size(); t += 4)
	{	unsigned char nibble = 0;
		nibble |= clinch_vector.at(t);
		nibble |= clinch_vector.at(t+1)*2;
		nibble |= clinch_vector.at(t+2)*4;
		nibble |= clinch_vector.at(t+3)*8;
		if (nibble < 10) nibble += '0';
		else nibble += 55;
		code.push_back(nibble);
	}
	return code;
}

...and now I can see stuff breaking:

[83.2] Writing master TM collapsed graph file, tm-master.tmg.
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 236) >= this->size() (which is 236)

OK, now I'm getting somewhere...

Oddly, in the same session, we have processed 238 traveler list files....

@yakra
Copy link
Contributor

yakra commented Mar 13, 2019

traveler_lists->size() -> double(traveler_lists->size())

@jteresco
Copy link
Contributor Author

Wow, thanks @yakra for jumping on this! It would be really great to get a preliminary set of graphs, even if not perfect, generated this week so I could share the files and file format with my students who might want to work with this data for an upcoming project.

@yakra
Copy link
Contributor

yakra commented Mar 13, 2019

Just committed that last bugfix; graphs are looking good so far.

Next up, uploading the latest code to noreaster, compiling, and producing some graphs there.

@yakra
Copy link
Contributor

yakra commented Mar 13, 2019

@jteresco
Copy link
Contributor Author

This is fantastic.

A couple quick requests if you have time to make it happen.

  • let's only generate the collapsed format for these
  • let's give the format a new name, like "traveled"

@yakra
Copy link
Contributor

yakra commented Mar 13, 2019

let's only generate the collapsed format for these

Already done; I never changed over the simple format.
The new format did replace the existing collapsed format, though,
Are you thinking, retain the existing collapsed format, and add the new "traveled" one as well, making three versions of each graph instead of two?

The tricky bit:

  • Hidden traveler-segment endpoints such as NH US4 +NH120 would be hidden in a traditional collapsed graph, and visible in the new traveled format.
    • bool is_hidden could change to int visibility
  • We'd need two sets of collapsed edges, one collapsed around these vertices and one not.
    • This would take a bit more brainstorming, but is not impossible.
    • Unless we decide it's OK to leave these visible in collapsed graphs, but that seems imperfect & inelegant, going against the original purpose of collapsed graphs.

let's give the format a new name, like "traveled"

NH-region-traveled.tmg, containing TMG 2.0 traveled in the header?
Do the new graphs look OK otherwise?

@jteresco
Copy link
Contributor Author

Yes, I'm thinking 3 for each instead of 2. I don't want to just replace the old ones since there's a lot of code, HDX and some of my Java code for class stuff, that I don't want to have to rewrite to ignore the new information.

I'll worry later about getting the graphs download page to find the new stuff.

I hope to look at the new graphs and try some things out with them in the next day or two.

@jteresco
Copy link
Contributor Author

Quick thought that you are free to ignore, @yakra. Many of our graphs will have lots of segments with very few travelers, meaning many 0 digits. I wonder about an alternate "sparse" format for the codes in those cases, where we instead have a list of actual traveler numbers, maybe again in hex to save a few characters. Plus side: save a little space. Negative side: complication of our format and the need to calculate both possibilities on graph generation and the ability to decode both possibilities on any code that loads in the graphs.

@yakra
Copy link
Contributor

yakra commented Mar 13, 2019

  • We'd need two sets of collapsed edges, one collapsed around these vertices and one not.
    • This would take a bit more brainstorming, but is not impossible.

On the plus side, I recently upgraded BiggaTomato to 8 GB, so I've less to worry about if we're introducing more data.

But, on to more brainstorming...
A HighwayGraphEdgeInfo and a HighwayGraphCollapsedEdgeInfo are largely the same data structure, just that a HighwayGraphCollapsedEdgeInfo has the added intermediate_points list. Which can of course be empty. Indeed, a collapsed edge with no intermediate points is written the same as its corresponding simple edge once it makes it into the .tmg.

An idea to save RAM, that could also simplify (?) things if we start to get into 3+ graph data sets:
Get rid of the simple edge class, and just use collapsed edge objects.

  • Original, uncollapsed edges would stay in the simple dataset, and also exist in the collapsed set where appropriate.
  • When compressing edges, the new collapsed edges would be added to the new collapsed/traveled datasets. The original edges would be removed from these datasets, and remain in the simple dataset.
  • Exact mechanics TBD.

Yes, I'm thinking 3 for each instead of 2. I don't want to just replace the old ones since there's a lot of code, HDX and some of my Java code for class stuff, that I don't want to have to rewrite to ignore the new information.

I left the original collapsed_tmg_line function in; it's just unused at the moment.
Just have to sort out how to work with the 3 datasets.

Quick thought that you are free to ignore, @yakra. Many of our graphs will have lots of segments with very few travelers, meaning many 0 digits. I wonder about an alternate "sparse" format for the codes in those cases,

One thought I had was run-length encoding for zeros. Follow a 0 by a character (0-F?) encoding how many (additional) zeros there are.

where we instead have a list of actual traveler numbers, maybe again in hex to save a few characters. Plus side: save a little space. Negative side: complication of our format and the need to calculate both possibilities on graph generation and the ability to decode both possibilities on any code that loads in the graphs.

Thinking about this some more, I bet the RLE scheme might save more space, and remove the need to determine which encoding scheme to read/write.

@jteresco
Copy link
Contributor Author

For the moment, it would be really great and get me through the initial usage this semester if you could generate the set again but with "traveled" instead of "collapsed" in both the filenames and the first line of each file. Then we can think about how best to proceed longer-term.

@jteresco
Copy link
Contributor Author

Also, don't feel any obligation to generate new graphs based on my previous comment. The ones there can easily be used by my students. I might just rename them and change the one word at the top and go with them.

I can't wait to have time to enhance HDX to support the new format and experiment with ways to visualize this info.

@yakra
Copy link
Contributor

yakra commented Mar 14, 2019

run-length 0 encoding

First draft: http://yakra.teresco.org/tmtools_demos/tmg_2.0a/ (same link as above)
RLE encoded: http://yakra.teresco.org/tmtools_demos/tmg_2.0a_rle/

When a 0 is encountered in a clinchedby_code string, the following character encodes the number of additional 0s to follow. Thus
1000 998 TCHPEI,PE1 20B2021001403200101404200C0B207 46.190675 -63.37085
expands to
1000 998 TCHPEI,PE1 20000000000002000101400002010040000020C000000000000200000000 46.190675 -63.37085
(same example, with color-coding for a visual aid, on the forum)

Run lengths of up to 16 (encoded as 0F) are supported. So right now, with 238 travelers, a segment clinched by nobody
000000000000000000000000000000000000000000000000000000000000
compresses to
0F0F0F0B.

Sure, we could try to encoder longer strings of zeros, but that would probably involve the use of a delimiter (unless we move beyond standard hex digits).
• This would use more space for strings of 1-16 zeros,
• and yield no savings for strings of 17-32 zeros (0F00 -> 010| ... 0F0F -> 01F|).
• We'd only save space on strings > 32 zeros long.
Untraveled (by anyone) segments aside, how many of cases these are there likely to be? They could possibly even outweighed by the shorter strings requiring a longer encoding.
Space savings could be minimal, maybe even slightly increased.
And then, things start to get a little more computationally complex too.
KISS principle.

Overall graph size is reduced by ~35%.
The only file to get larger (due to more 0->00 conversions than strings of 3+ zeros) was usai-system.tmg, which grew from 2188732 to 2235543 B.

@jteresco
Copy link
Contributor Author

I have what I need for now - I grabbed the graphs in tmg_2.0a, renamed the collapsed files and replaced "collapsed" with "traveled". Many, many thanks for getting this off the ground so quickly.

@jteresco
Copy link
Contributor Author

Regarding RLE vs. the hex string, I can see advantages to either. The hex string is a little simpler to deal with, but 35% savings in graph size is something to consider.

On the RLE side, I agree limiting to a single hex digit per "run" makes a lot of sense to me.

As far as usability for my purposes, I think the hex string will be easier to explain and a tad easier to use.

I'm going to build a Java program today to read in these graphs as a starting point for my students' projects.

@yakra
Copy link
Contributor

yakra commented Mar 15, 2019

As far as usability for my purposes, I think the hex string will be easier to explain and a tad easier to use.

OK, let's go with that then. If the students are just starting to learn about this, let's not go confusing matters with extra info or procedures.

I'm going to build a Java program today to read in these graphs as a starting point for my students' projects.

I can't wait to have time to enhance HDX to support the new format and experiment with ways to visualize this info.

Same here; it'll be fun to see the results.

I have what I need for now - I grabbed the graphs in tmg_2.0a, renamed the collapsed files and replaced "collapsed" with "traveled".

OK, good to hear. I'll save that change for when I make the changes for 3 graph sets, and avoid cluttering the commit history with short-lived edits that get undone in the next version.

Odds & ends

  • You only said "a new version number in the header"; I made the "2.0" assumption. Would you prefer a different number, or is 2.0 fine?
  • Thoughts on adding "-collapsed" to TMG filenames, E.G. AB-region-collapsed.tmg? More descriptive as to what type of graph the one with the vanilla filename is, and more convenient if I want to mv *collapsed.tmg foo/, or rm *collapsed.tmg, etc.

@yakra
Copy link
Contributor

yakra commented Mar 15, 2019

Just dumping this here in case I want to refer to it in the future.

std::string HighwaySegment::clinchedby_code(std::list<TravelerList*> *traveler_lists)
{	// Return a hexadecimal string encoding which travelers have clinched this segment, for use in "traveled" graph files
	// Each character stores info for traveler #n thru traveler #n+3
	// The first character stores traveler 0 thru traveler 3,
	// The second character stores traveler 4 thru traveler 7, etc.
	// For each character, the low-order bit stores traveler n, and the high bit traveler n+3.
	std::string code;
	std::vector<unsigned char> clinch_vector(ceil(double(traveler_lists->size())/4)*4, 0);
	for (TravelerList* t : clinched_by)
		clinch_vector.at(t->traveler_num) = 1;

	bool prev_zero = 0;
	for (unsigned short t = 0; t < traveler_lists->size(); t += 4)
	{	unsigned char nibble = 0;
		nibble |= clinch_vector.at(t);
		nibble |= clinch_vector.at(t+1)*2;
		nibble |= clinch_vector.at(t+2)*4;
		nibble |= clinch_vector.at(t+3)*8;
		if (nibble < 10) nibble += '0';
		else nibble += 55;
		// run-length encoding for zeros
		if (nibble == '0')
		     {	if (code.empty() || code.back() == 'F')
				prev_zero = 0;
			if (prev_zero)
				if (code.back() == '9') code.back() = 'A';
				else code.back()++;
			else {	prev_zero = 1;
				code += "00";
			     }
		     }
		else {	prev_zero = 0;
			code.push_back(nibble);
		     }
	}
	return code;
}

More efficient to move the "int"->printable char conversion into the final else block, but that's all academic if this won't be implemented anyway.

@yakra
Copy link
Contributor

yakra commented Mar 17, 2019

Note to self, remember to:

  • double-check & make sure I'm using the visibility variable correctly.
  • update GraphListEntry as needed, and step thru graph_vector by threes.
  • write_master_tmg_traveled

@yakra
Copy link
Contributor

yakra commented Mar 18, 2019

Progress report:

  • Using only one edge class now. HighwayGraphEdgeInfo is gone; HighwayGraphCollapsedEdgeInfo renamed to HGEdge.
  • Vertices now have 3 sets of incident edges:
    std::list<HGEdge*> incident_s_edges; // simple
    std::list<HGEdge*> incident_c_edges; // collapsed
    std::list<HGEdge*> incident_t_edges; // traveled

siteupdate.cpp compiles, but will not run properly yet, as I still need to address the items in the list above.

@yakra
Copy link
Contributor

yakra commented Mar 20, 2019

The new code is mostly in place. There are several bugs to sort out, and the first one is a doozy.
Before I get into detail on what's happening, I'll just note what I've been using for testing:
https://github.com/TravelMapping/HighwayData/tree/23c9c806d0432a796bf5fcfd90b3f7d5b4cce9d4
https://github.com/TravelMapping/UserData/tree/2be0b9f0e13c14d05b399cf5c76f9c3e0097258f

@yakra
Copy link
Contributor

yakra commented Mar 22, 2019

Current code is at https://github.com/TravelMapping/DataProcessing/tree/tmg2trav_bugs <--branch deleted
or for a more permanent link, here here. <--link still works
tmg2_cpp branch, also deleted, was at a6b53ef

I think I'm writing, and maybe also reading, thru a rogue pointer.
The symptoms are varied, and... interesting.
I'll skip the detailed description unless anyone really wants to know, but suffice it to say that things are pretty broken and unusable right now.
I have some ideas where to start looking, but fixing this should take a bit of time & a good amount of new debug code.

It may be more productive in the near-term to try an implementation in siteupdate.py
I may bounce back & forth between that & debugging this...

@yakra
Copy link
Contributor

yakra commented Mar 24, 2019

HGEdge::HGEdge(HGVertex *vertex)
{ // build by collapsing two existing edges around a common
// hidden vertex waypoint, whose information is given in
// vertex
written = 0;
// we know there are exactly 2 incident edges, as we
// checked for that, and we will replace these two
// with the single edge we are constructing here
HGEdge *edge1 = vertex->incident_c_edges.front();
HGEdge *edge2 = vertex->incident_c_edges.back();

Cue the "One cannot simply" meme.
We'll need separate construction processes for collapsed and traveled edges. When creating a new traveled edge, we need to look at vertex->incident_t_edges instead, because incident collapsed edges & incident traveled edges can differ.

I don't believe this bug is responsible for the full set of symptoms I was seeing; I think it should only result in garbage graph data. Still needs to be fixed though.

How to most efficiently do this will require a bit of thought, for C++ and Python alike.


Meanwhile, I've been working on the Python implementation.
Taking a more staged, step-by-step approach.

  • Several class & variable names changed, in preparation for the eventual full implementation.
  • Simple & collapsed edges combined into a single HGEdge class.
    • The simple edge constructor (with a few modifications) replaces the initial collapsed edge constructor that copied over the data from the corresponding simple edge.
  • Nothing is broken so far... TMK ;)
    • All graphs have the correct number of vertices and edges in the header.
    • I have yet to inspect graphs in the HDX.
  • Implementation of the traveled graphs has not begun yet. Gotta make sure the changes leading up to it don't break anything first.
  • https://github.com/yakra/DataProcessing/tree/tmg2_Python

@yakra
Copy link
Contributor

yakra commented Mar 25, 2019

Hidden vertices not collapsed in traveled graphs

These have exactly two incident edges, whose sets of clinched_by travelers are not identical.
(HighwayData @ a113955992c373c22e7d5e053b4c26912fe0ee97)
(UserData @ bab03f1477df35fcbe6c2684423b0cf967a3e6e6)

route
@
waypoint
segment 1 cl.
by
only on
segment 1
segment 2 cl.
by
only on
segment 2
ak.ak002
@
+SuzAve
AK AK2 HagAve +SuzAve 6 bhemphill AK AK2 +SuzAve +X235384 5  
ca.i605
@
+10A
CA I-605 10 +10A 37 johninkingwood CA I-605 +10A 11 36  
co.co145
@
+X05
CO CO145 ManAve +X05 10   CO CO145 +X05 CR578 11 oscar
ct.ct012
@
+GunRd
CT CT12 CT184 +GunRd 17 froggie CT CT12 +GunRd CryLakRd 16  
eng.a0057
@
+X07
ENG A57 +X06 +X07 0   ENG A57 +X07 +X08 1 si404
eng.a0551
@
+X02
ENG A551 A554_E +X02 0   ENG A551 +X02 A553 1 si404
eng.a3051
@
+X01
ENG A3051 A334 +X01 2 si404 ENG A3051 +X01 SwaLn 1  
ky.ky2373
@
+End
KY KY2373 RigRd +End 2   KY KY2373 +End KY3076 3 ohroadscholar
mi.us023
@
+MI108
MI US23 HisMillCrkSP +MI108 14   MI US23 +MI108 I-75(338) 16 afarina bhemphill
nh.nh120
@
+x4plex
NH US4 NH120_N +NH120 16 terescoj NH US4 +NH120 NH120_S 15  
nj.i080
@
+47
NJ I-80 47A +47 83 jayhawkco mr_matt_k NJ I-80 +47 47B 81  
nj.i676
@
+5A
NJ I-676 5 +5A 69   NJ I-676 +5A 5B 71 justjake rlee
nl.nl510
@
+X442854
NL NL510 +X797607 +X442854 1 oscar NL NL510 +X442854 +X721066 0  
nl.nl510
@
+X721066
NL NL510 +X442854 +X721066 0   NL NL510 +X721066 +X614767 1 oscar
sc.i077
@
+9
SC I-77 9A +9 67 rlee SC I-77 +9 9B 66  
svk.d001
@
+1A
SVK E58 1(D1) +1A(D1) 10 cinx SVK E58 +1A(D1) 4(D1) 9  
tn.us011
@
+I-40(380)
TN US11 MonRd +I-40(380) 13 Ugnaught2 TN US11 +I-40(380) BucDr 13 jonfmorse
wa.wa020
@
+x05
WA WA20 MainSt_Cou +x05 15 chaddean WA WA20 +x05 LibRd 15 mapcat
wa.wa542
@
+x110
WA WA542 +x108 +x110 8 terescoj WA WA542 +x110 +x114 7  
yt.yt010
@
+DolVarCrk
YT YT10 +X195719 +DolVarCrk 1 oscar YT YT10 +DolVarCrk +X553466 0  

I'll give these a look over, and for the ones in my regions, see about either unhiding them or collapsing into visible points.

@yakra
Copy link
Contributor

yakra commented Mar 27, 2019

Python progress report:

  • Successfully (I think) split up the creation of the 3 (simple/collapsed/traveled) vertex/edge sets.
  • tm-master-traveled.tmg has 20 more vertices & edges than tm-master-collapsed.tmg.
  • The 20 extra vertices are the ones listed in the previous comment, per a DIFF of just the vertex TMG lines.
  • I didn't check the differences in the edges, as that's not so quickly & easily inspected. But the fact that we have 20 more checks out. Presuming this is OK.
  • traveler_num and hex codes are not yet implemented.
  • subgraphs are not yet implemented.
  • https://github.com/yakra/DataProcessing/commits/tmg2_Python/siteupdate

@yakra
Copy link
Contributor

yakra commented Mar 27, 2019

subgraphs & traveler_num implemented.
I'm finding creating the hex codes a bit more awkward in Python. Giving my latest attempt a try.

@yakra
Copy link
Contributor

yakra commented Mar 27, 2019

Hex codes are in place. The first couple I checked out look OK, but doing a little more QA is still a good idea.
Afterwards...

  • a bit of code cleanup
  • time trials
  • potential speed improvements

Once the Python changes get merged into master (or I suppose even before), I'll start the C++ implementation over, mirroring the changes commit-by-commit.

@yakra
Copy link
Contributor

yakra commented Mar 29, 2019

Did more QA. Looks good.

Time trials on BiggaTomato:
Graph generation altogether takes ~ 48-50% more time.

  • Subgraph generation takes ~ 54-57% more time.
    • Area graphs, already fairly slow on Python, take ~5% more time.
    • Other subgraphs take between 111% more (continents) & 180% more (system graphs).
  • Writing tm-master-traveled.tmg takes ~7.6% more time than writing tm-master-simple.tmg & tm-master.tmg combined.
  • Some of this is due to the simple fact of writing a new file for each trio of graphs. Some of this is due to generating all the hex codes.
  • Creating edges started out at 17.9s in a7a3b3d. Getting rid of the separate simple edge class & simplifying initial edge construction (15c7bd6) got us down to 11.1s. With the introduction of the new set of traveled edges, we're back up to 16.0s.
  • Compressing collapsed edges: 2.1s -> 3.4s.

Potential speed improvements:

Subgraph generation still goes by blazingly fast. This when computing the clinchedby_code for each edge as it's written. I haven't checked on how fast it's going now, or how fast it was going before, but a possible speed enhancement would be to compute & store a clinchedby_code once for each HighwaySegment ... This code can be retrieved (rather than recalculated) and written when its corresponding edge is written.

This conflicts with #199. We can have one enhancement or the other, but not both.
I'll still code this anyway, in a separate branch, because I'm curious to see the results.

@yakra
Copy link
Contributor

yakra commented Mar 29, 2019

Storing & retrieving clinchedby_codes yields roughly 10% time savings, or 18.2s on noreaster.

Another way of looking at it is that full TMG 2.0 traveled implementation added 66s of processing time vice 15c7bd6, so we've managed to trim back > 27% of that. But still... outside this context, 18.2s savings altogether is a bit underwhelming.

Edit: Add in > 28 MB of RAM for the strings, and I'm less than wild about this.
Permanent links, for when I delete the branch:
4728328
yakra@769b7eb

I'll have a crack at #199, and see what filesize savings and processing time penalty it yields.
Edit: #199 takes only 4.7s (3%) more on BiggaTomato. Let's go that route...

@yakra
Copy link
Contributor

yakra commented Apr 3, 2019

The C++ redo is going well.
Just squished an obscure bug that resulted in traveled edges with missing intermediate points.
Same stage of development as siteupdate.py was in #196 (comment):

* Raw collapsed or traveled TMGs are not directly DIFFable, because vertices, edges and intermediate points are not ordered deterministically. This required an intermediate step, a separate program to sort & reorder TMGs into a "canonical" file before DIFFing.

@yakra
Copy link
Contributor

yakra commented Apr 4, 2019

C++:

  • traveler_num & hex codes implemented.
  • tm-master-traveled.tmg looks good, with no DIFF from the version produced by siteupdate.py in traveled graph implementation (Python) #201.
  • But about maybe 6% of the time, the program will crash while writing tm-master-traveled.tmg, with nothing output to the shell to tell me what's happening. This will require some marathon debugging.
    • Edit: In multi-threaded operation, The traveler_lists container could get corrupted, due to being written without a mutex. Fixed.
    • Right now, running a shell script that keeps executing siteupdate in a loop until it crashes. SFSG with 2 successful runs. I'll leave it running while at work; if all goes well I'll get ~50 runs out of it.
    • After that, a bunch of extra debug output will need to be removed or commented out.
  • Oh, yes. Home from work to find the shell script still chugging away, with >56 complete runs, > 11:20:49.5 wall time, with no crashes.

@yakra
Copy link
Contributor

yakra commented Apr 24, 2019

HidClPtsLog

Dead-end branch, to be deleted.
Temporary log file used to make sure I was on the right track during Python development.
It can be automatically merged back to into yakra/tmg2_Python, but won't work without small modifications, as the is_hidden variable no longer exists.

tmg2_
Python
HidCl
PtsLog
commit
fbd4d60 Merge pull request #71 from yakra/tmg2_Python
71bbd81 comment: region -> segment
6c30ad4 hidden_clinch_points.log
4669828 HGEdge: store canonical HighwaySegment instead of region

Rather than have this as its own discrete loop, it could be integrated into edge compression.
How much extra time does it take up?

@yakra
Copy link
Contributor

yakra commented Jun 20, 2019

I might have left this open till both #201 & #209 got merged, IIRC.
@jteresco, OK to close this?

@jteresco
Copy link
Contributor Author

Yes, I think so. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants