Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

false negatives for DISCONNECTED_ROUTE #610

Closed
yakra opened this issue Oct 29, 2023 · 9 comments · Fixed by #617
Closed

false negatives for DISCONNECTED_ROUTE #610

yakra opened this issue Oct 29, 2023 · 9 comments · Fixed by #617

Comments

@yakra
Copy link
Contributor

yakra commented Oct 29, 2023

HighwayData @ 0d775809a0456ed81bbfa5a62632b7fa6d72477c
UserData @ bc50e000f29148e60761613879ca41e84bc4a0ea
DataProcessing @ 03a36cab6f56c4f060bc41ce4739707a03969a15

https://forum.travelmapping.net/index.php?topic=3512.msg33027#msg33027

This brings us to an even bigger problem...
I'd expect that this would be throwing up DISCONNECTED_ROUTE errors all over the place.
But it's not. There are only 2 outside the US.
This means there's a bug in the datacheck routines and/or a scenario in the underlying highway data that I didn't anticipate. The debug session for this may be a pretty deep dive, one I don't have the time to get into right away.

First thoughts:

  • A connection that initially passes the datacheck is later invalidated after a Route is reversed. Do case studies.
  • Taking note of when we attempt to reverse routes that are already reversed will be instructive.
  • flippedroutes.log will be valuable. Lots of cases to study.
@yakra
Copy link
Contributor Author

yakra commented Feb 28, 2024

#7239 may or may not have changed some of the symptoms. Check before & after.

@yakra
Copy link
Contributor Author

yakra commented Mar 2, 2024

  • A connection that initially passes the datacheck is later invalidated after a Route is reversed. Do case studies.

This happens.
https://github.com/TravelMapping/HighwayData/blob/2decc64/hwy_data/_systems/indnh_con.csv#L14

  1. First, we try to connect indup.nh044jaj & indup.nh044-e.
    The 1st one's end matches to the 2nd one's beginning, and we know both Routes are in the right direction for the ConnectedRoute; nothing needs to be reversed. So far so good.
  2. Next, we try to connect indup.nh044-e & indup.nh044-w.
    As noted on the forum, bad data; these are 2 halves of a split alignment like I-35E & I-35W.
    The 1st one's end does not match the 2nd one's beginning, but both have the same starting & ending points.
    Either route could in theory be reversed, yielding a loop.
    In practice, the beginnings are checked before the ends for matching coords, so indup.nh044-e gets reversed.
    Thereby breaking the first connection, with indup.nh044jaj. There's our false negative.

  • Taking note of when we attempt to reverse routes that are already reversed will be instructive.

I was semi-expecting this to happen, but adding some debug text shows it doesn't.
(I could create some scenarios where it does...)
It makes sense -- each chopped route other than the 1st & last is checked twice:
• With the route before it
• With the route after it
Under normal conditions, once a route is checked the 1st time, we know what direction it should be in, forward or reversed, based on its relationship with the previous route. When we check it the 2nd time, either it doesn't need to be reversed or already has been.

Nope:
Check & potentially reverse chopped route 0 by itself, then only reverse the 2nd route in each pair.
Won't work when Route Q & Route R have an NMP at their shared border point. If R should be reversed, we have no way of knowing, even if it has a good connection with Route S.

Not quite:

auto flag = [&]()
{	Datacheck::add(r, r->con_beg()->label, "", "",
		       "DISCONNECTED_ROUTE", q->con_end()->root_at_label());
	Datacheck::add(q, q->con_end()->label, "", "",
		       "DISCONNECTED_ROUTE", r->con_beg()->root_at_label());
	cr.disconnected = 1;
	q->set_disconnected();
	r->set_disconnected();
};
r->con_mismatch();
if ( q->points.size > 1 && r->points.size > 1 && !r->con_beg()->same_coords(q->con_end()) )
    if      ( q->con_beg()->same_coords(r->con_beg()) )
	if  ( q->is_disconnected() )
		q->set_reversed();
	else	flag();
    else if ( q->con_end()->same_coords(r->con_end()) )
	r->set_reversed();
    else if ( q->con_beg()->same_coords(r->con_end()) )
    {	if  ( q->is_disconnected() )
		q->set_reversed();
	else	flag();
	r->set_reversed();
    }
    else	flag();
  • 36 new errors (18 pairs) in 27 (mostly unexpected, incl. eastern US) regions.
  • Any diffs if I use HighwayData @ 069aa8a? In db on lab2. Never mind.
  • To make searching easier, modify no-build & build so that only Q gets the datacheck added.
  • Review all 18 new errors & make sure everything makes sense.
    Nope. False positives for routes @ beginning, e.g. va.us033.
  • r->con_beg() -> r->points.begin() and r->con_end() -> &r->points.back()?

Maybe?

  • if ( q->is_disconnected() ) -> if ( q->is_disconnected() || q == cr.routes[0])
  • or better yet reversed for optimal short-circuiting
  • Review all 5 new errors & make sure everything makes sense.
  • Or, keep as above, with a separate check for chopped route 0 by itself?
    Nah, don't bother. Would only save time when Q is reversed under no-build (IE, when q->con_beg() has a coord match). This happens only 19 times @ 2decc64. This would increase bloat & maintenance hassles, decrease readability, and probably be bad for instruction cache.

More?
Could store signed char Route::direction where -1 = reverse, 0 = unknown/TBD, and 1 = forward.
Probably won't be necessary though. The proposal above should work. Hopefully. Haven't gotten into the gory details yet.


ToDo: Finish looking at India.

  • flipped routes
    egrep -f <(grep ^ind _387c/logs/flippedroutes.log | sed -r 's~(.*)~\1,|\1$~') ~/tm/HighwayData/hwy_data/_systems/indnh_con.csv
    The ones othe than NH44 make sense in the HB; that's all the attention I'm giving them.
  • loops / split alignments
    if	( q->con_beg()->same_coords(r->con_beg()) )
    {	if ( q->con_end()->same_coords(r->con_end()) )
    		std::cout << "DEBUG\tDisco\tLoop\t" << q->root << '\t' << r->root << std::endl;
    	q->set_reversed();
    }
    DEBUG	Disco	Loop	indmh.nh044-w	indmh.nh044-e
    DEBUG	Disco	Loop	indmp.nh044-w	indmp.nh044-e
    DEBUG	Disco	Loop	indup.nh044-e	indup.nh044-w <--Done, case study above.
    

@yakra
Copy link
Contributor Author

yakra commented Mar 3, 2024

5 new chopped routes in 3 connected routes

  • chng;G4213;;;chnhb.g4213,chnsn.g4213,chnhb.g4213fan
    Erroneously T-shaped route.
    chnhb.g4213 end = chnsn.g4213 begin: Both stay forward.
    chnsn.g4213 begin = chnhb.g4213fan end: Both reversed, breaking chnsn.g4213.
  • indnh;NH44;;;indtn.nh044,indka.nh044,indap.nh044,indtg.nh044,indmh.nh044-w,indmh.nh044-e,indmh.nh044ama,indmp.nh044seo,indmp.nh044-wseo,indmp.nh044-eseo,indmp.nh044,indup.nh044,indmp.nh044cha,indup.nh044til,indmp.nh044ban,indup.nh044jha,indmp.nh044dat,indmp.nh044-w,indmp.nh044-e,indmp.nh044gwa
    See below
  • indnh;NH44;;;indup.nh044jaj,indup.nh044-e,indup.nh044-w,indup.nh044mat,indhr.nh044far,inddl.nh044,indhr.nh044,indpb.nh044,indhp.nh044cha,indpb.nh044kau,indhp.nh044moh,indpb.nh044pat
    See aboveand below

@yakra
Copy link
Contributor Author

yakra commented Mar 3, 2024

ver descr log diffs objdump diffs
q lambda; temporarily disable Datacheck::add(r,... only report q for each pair don't care
q2 if (q->is_disconnected()) 18 new errors, incl FPs @ 1st Route, e.g. va.us033 don't care
q3 or q == cr.roots[0] no FPs, just 5 new errors now don't care
q3r if (q == cr.roots[0] or q->is_disconnected()) none don't care
q4 if reversing both, reverse r iff q chnhb.g4213fan not flipped don't care
q5 combine conditionals none Some MOV or LEA operands differ by 8 or 16 B. No more or fewer operations though; no different size. Weird.
q6 r->con_beg() -> r->points.begin() and r->con_end() -> &r->points.back() none TBD
Lose the extraneous ternaries and setting up the function call. Can only be a Good Thing.
Call to begin() most likely optimized away.
q7 q_beg & q_end via ternary, not function call LOL don't bother none don't care
q8 reordered for optimized short-circuiting see below don't care
should be the same # ops as q6 though
q9 Q-xor-R cases inspect S none don't care

@yakra
Copy link
Contributor Author

yakra commented Mar 5, 2024

debug text

//r->con_mismatch();
std::cout << "DEBUG: "; // This only works single-threaded
std::cout << (( q->points.size > 1 && r->points.size > 1 && !r->points.begin()->same_coords(q->con_end()) ) ? '@' : '!');
std::cout << (( q->con_beg()->same_coords(r->points.begin()) ) ? 'Q'  : '_');
std::cout << (( q->con_end()->same_coords(&r->points.back()) ) ? 'R'  : '_');
std::cout << (( q->con_beg()->same_coords(&r->points.back()) ) ? "B " : "_ ");
std::cout << q->root << '\t' << r->root << std::endl;

//if ( q->points.size > 1...
grep 'DEBUG: @' ST/logs/siteupdate.log | egrep -v 'Q__|_R_|__B'
DEBUG: @___ wy.nentrd	mt.nentrd
DEBUG: @___ indtg.nh044	indmh.nh044-w
DEBUG: @QR_ indmh.nh044-w	indmh.nh044-e
DEBUG: @QR_ indmp.nh044-w	indmp.nh044-e
DEBUG: @QR_ indup.nh044-e	indup.nh044-w

Top 2: Disconnected in the truest sense: gap in the ConnectedRoute, sans chopped Route
Bottom 3: W-E split, as above

grep 'DEBUG: !' ST/logs/siteupdate.log | egrep -v '___'
DEBUG: !__B deubb.a010	deube.a010
DEBUG: !__B ks.i435	mo.i435
DEBUG: !__B indmp.nh044-wseo	indmp.nh044-eseo

Top 2: Full beltways in 2 regions. Not errors.
Bottom 1: Would have been @QR except indmp.nh044-wseo got reversed previously, making a W-E split look like a beltway.

@yakra
Copy link
Contributor Author

yakra commented Mar 5, 2024

_q8: reordered for optimized short-circuiting

if ( q->points.size > 1 && r->points.size > 1 && !q->con_end()->same_coords(r->points.begin()) )
	if	( q->con_end()->same_coords(&r->points.back()) )
		r->set_reversed();
	else if ( q->con_beg()->same_coords(&r->points.back()) )
	  if	( q == cr.roots[0] || q->is_disconnected() )
	  {	q->set_reversed();
		r->set_reversed();
	  }
	  else	flag();
	else if ( q->con_beg()->same_coords(r->points.begin())
	     && ( q == cr.roots[0] || q->is_disconnected() ))
		q->set_reversed();
	else	flag();

~/diff _q5/logs/flippedroutes.log _q8/logs/flippedroutes.log

35c35
< indmh.nh044-w
---
> indmh.nh044-e
45a46,47
> indmp.nh044-e
> indup.nh044-w
  1. Able to reverse indmh.nh044-w xor indmh.nh044-e. First one tested gets the nod.
  2. indmp.nh044-e can't connect to the routes before & after at the same time. It'll cause an error, whether as R or Q.
    • As above, either this xor indmp.nh044-w is to be reversed.
    • _q5 first attempts to reverse Q, sees it's not reversible, and flags an error.
    • _q8 first reverses R, then moves on to the next Q+R pair. It then attempts to reverse both Q and R, sees Q is not reversible, and flags an error.
      The same error happens; it just takes one step longer to detect, during which time we get one more reversal.
  3. indup.nh044-w works the same way, except that during the 2nd step _q8 only tries to reverse Q alone rather than both Q & R.

~/diff _q5/logs/datacheck.log _q8/logs/datacheck.log

5722d5721
< indmh.nh044-e;NH44_S;;;DISCONNECTED_ROUTE;indmh.nh044ama@MH/MP
5791a5791
> indmp.nh044-e;NH44_S;;;DISCONNECTED_ROUTE;indmp.nh044gwa@NH44-W/44-E
5795d5794
< indmp.nh044-w;NH44_N;;;DISCONNECTED_ROUTE;indmp.nh044-e@NH44_S
6111d6109
< indup.nh044-e;NH19/44-W;;;DISCONNECTED_ROUTE;indup.nh044-w@NH44/44-E
6113a6112
> indup.nh044-w;NH44/44-E;;;DISCONNECTED_ROUTE;indup.nh044mat@NH19
  • indup.nh044-e and indup.nh044-w are described in no.3 for the diffs above.
  • indmp.nh044-e and indmp.nh044-w are described in no.2 for the diffs above.
  • For the indmh.nh044-e & -w pair, we're able to reverse either Q xor R.
    Q specifically because in the previous iteration, it was flagged as disconnected, and thus doesn't have any specific direction "locked in", just as if it were the beginning of the route.
    • _q5 first reverses Q (indmh.nh044-w), then moves on to the next Q+R pair.
      Our new Q (indmh.nh044-e) needs to be reversed to link up with R (indmh.nh044ama), but we can't because its direction was locked in by the previous iteration.
      Thus an error is flagged.
    • _q8 first reverses R (indmh.nh044-e), leaving Q as-is.
      We have a loop route, and move on to the next Q+R pair.
      Our new Q has just been reversed, and can now link up with R. R gets flipped, and on we go.
      We're fine until the next error in the play-by-play, which is handled the same way by both siteupdate_q5 & _q8.
    • If the chopped routes were to have their points in the opposite order, the reverse would be the case:
      _q5 would be fine, and
      _q8 would flag an error.

That last one's an interesting case.
Why should point order affect whether an error is flagged or not?
Shouldn't siteupdate try to connect what it can, and only flag an error when it truly can't?
More on this in a future comment below.

@yakra
Copy link
Contributor Author

yakra commented Mar 5, 2024

play-by-play

Bold: W/E split alignments causing errors
🔴 diffs: in flippedroutes.log / datacheck.log for _q5, not _q8
🟢 diffs: in flippedroutes.log / datacheck.log for _q8, not _q5

Q R   no-build   _q5   _q8
indtn.nh044 indka.nh044   fwd match   fwd match   fwd match
indka.nh044 indap.nh044 1 R reversed 1 R reversed 1 R reversed
indap.nh044 indtg.nh044 2 R reversed 2 R reversed 2 R reversed
indtg.nh044 indmh.nh044-w   no conn;
FlagErr
  no conn;
FlagErr
  no conn;
FlagErr
🔴indmh.nh044-w 🟢indmh.nh044-e 3 Q reversed

3 Q reversed
(disc.)
3 R reversed

indmh.nh044-e indmh.nh044ama 4 Q reversed   Q not 0/disc;
🔴FlagErr🔴
4 R reversed
indmh.nh044ama indmp.nh044seo 6 both reversed

5 both reversed
(Q disc.)
5 R reversed

indmp.nh044seo indmp.nh044-wseo 7 R reversed 6 R reversed 6 R reversed
indmp.nh044-wseo indmp.nh044-eseo   match   match   match
indmp.nh044-eseo indmp.nh044 9 both reversed   Q not 0/disc;
FlagErr
  Q not 0/disc;
FlagErr
indmp.nh044 indup.nh044 10 R reversed

8 both reversed
(Q disc.)
8 both reversed
(Q disc.)
indup.nh044 indmp.nh044cha 11 R reversed 9 R reversed 9 R reversed
indmp.nh044cha indup.nh044til 12 R reversed 10 R reversed 10 R reversed
indup.nh044til indmp.nh044ban 13 R reversed 11 R reversed 11 R reversed
indmp.nh044ban indup.nh044jha 14 R reversed 12 R reversed 12 R reversed
indup.nh044jha indmp.nh044dat 15 R reversed 13 R reversed 13 R reversed
indmp.nh044dat indmp.nh044-w   match   match   match
indmp.nh044-w 🟢indmp.nh044-e 16 Q reversed   Q not 0/disc;
🔴FlagErr🔴
14 R reversed
indmp.nh044-e indmp.nh044gwa   fwd match   fwd match   Q not 0/disc;
🟢FlagErr🟢
Q R   no-build   _q5   _q8
indup.nh044jaj indup.nh044-e   fwd match   fwd match   fwd match
indup.nh044-e 🟢indup.nh044-w 1 Q reversed   Q not 0/disc;
🔴FlagErr🔴
1 R reversed
indup.nh044-w indup.nh044mat   fwd match   fwd match   Q not 0/disc;
🟢FlagErr🟢
... ...   fwd match   fwd match   fwd match

@yakra
Copy link
Contributor Author

yakra commented Mar 8, 2024

When either Q xor R can be reversed

Things get... interesting.
Unpredictable, unless you know the algorithm. Counterintuitive.
Case study: indmh.nh044-e, Nagpur, Maharashtra, at the bottom of this post.

What are Q and R?

Q and R refer to a pair of two adjacent chopped routes in a _con.csv line, to be inspected for connectivity.

usame;ME113;;;me.me113,nh.me113con,me.me113fry,nh.me113cha,me.me113whi
======================+===========+===========+===========+===========+
1st iteration____Q____|_____R_____|___________|___________|___________|
2nd iteration_________|_____Q_____|_____R_____|___________|___________|
3rd iteration_________|___________|_____Q_____|_____R_____|___________|
4th iteration_________|___________|___________|_____Q_____|_____R_____|

What happens?

When Q xor R can be reversed so their endpoints line up, the code I've been testing thus far just... picks one.
Which one can vary depending on:
• whether Q or R is inspected first, and
• point order in the .wpt files.

When?

When both routes'
• starting points have the same coords, and
• ending points have the same coords.

Does it matter?

In the Nagpur example, this happens because E & W split alignments are erroneously included in the same ConnectedRoute, like trying to include I-35E & I-35W along with the rest of I-35.
How much does it matter if there's one extra datacheck error pointing to the same underlying error in the data?

The matching coords could happen in real life, in a two-region beltway. Two of these exist, though this doesn't come into play as both have consistent point ordering in both chopped routes. Consistent with the Route-in-ConnectedRoute order too, so both pass the connectivity check with nothing needing to be reversed at all.
For cases like these, the "just pick one" approach produces working ConnectedRoutes, even if the final point order in the HB isn't what the contributor intended. Without a 3rd chopped route, there's nothing to fail to connect to.

But what if...?

Shouldn't siteupdate try to connect what it can, and only flag an error when it truly can't?

Could a legitimate ConnectedRoute be affected by the wrong chopped Route being selected for reversal?
In Nagpur...
The ConnectedRoute is "restarting" after a gap caused by a missing chopped Route. This is functionally the same as the beginning of a route in terms of what can & can't be reversed. So let's imagine we have a route starting off with the same loop shape.
In TM...
Loop routes do exist. Multi-region routes exist too, obviously.
Put them together and there's our extremely-unlikely-but-still-technically-possible scenario:
What if three different regions intersect perfectly where 3 legs of a loop route do, at a point like this or this?
Yielding a scenario like:

Upriver Oblast			  '	 Outback Oblast

				  .
		ALN-UR GT42
	     _.-------------._
	  .-'		      `-. '
	 /			 \     ALN-OB GT42
 .   .  0  .   .   .   .   .   .  0-------------------0
	 \			 / 
	  `-._		     _.-' .
	      `-------------'
		ALN-DR GT42
				  '

Downriver Oblast		  .

Note that problems only arise if the loop is at the beginning of the route. West-to-east in the picture.
Going east-to-west, the "tail" locks in the directions of the subsequent routes, and we're all good.

Answering the "Do we reverse Q or R?" question means finding out which direction R needs to go to link up to the next chopped route (if there is one), yielding code looking something like:

if ( q->points.size > 1 && r->points.size > 1 && !r->points.begin()->same_coords(q->con_end()) )
	if	( q->con_end()->same_coords(&r->points.back()) )	// R can be reversed
	  if	( q->con_beg()->same_coords(r->points.begin())		// Can Q be reversed instead?
	  &&	( q == cr.roots[0] || q->is_disconnected() )		// Is Q not locked into one direction?
	  &&	  i+1 < cr.roots.size()					// Is there another chopped route after R?
	  &&	(    r->points.back().same_coords(cr.roots[i+1]->points.begin())	// And does its beginning
		  || r->points.back().same_coords(&cr.roots[i+1]->points.back()) ))	// or end link to R as-is?
		q->set_reversed();
	  else	r->set_reversed();
	else if ( q->con_beg()->same_coords(&r->points.back()) )	// Q & R can both be reversed together
	  if	( q == cr.roots[0] || q->is_disconnected() )		// as long as Q's direction is unknown
	  {	q->set_reversed();
		r->set_reversed();
	  }
	  else	flag();
	else if ( q->con_beg()->same_coords(r->points.begin())		// Only Q can be reversed
	     && ( q == cr.roots[0] || q->is_disconnected() ))		// as long as its direction is unknown
		q->set_reversed();
	else	flag();

Meh.
This part of the code goes from 10 to 13 to 19 lines. :(
Gives me a little heartburn to include all these extra lines for something of so little impact, that'll probably never occur in the real world.
But it's The Right Thing To Do for the sake of robustness. For preventing inconsistent datachecks depending on point order and whether Q or R is inspected first, and preventing potential datacheck errors that make contributors scratch their heads and say "What the? Disconnected here? These 2 routes should connect just fine!"

Thankfully, this extra check will happen very rarely.
It's not in a critical inner loop or anything.
Of about 100k* routes in the system, there are about 3300* connections to inspect.
Of these, the Q-xor-R check will only happen 28* times.
All but 3* short-circuit immediately upon seeing Q can't be reversed, only R.
Of these, the Dabra & Agra splits short-circuit right after that because Q's direction is locked in.
Only Nagpur makes it to the end of the big conditional, to find R must be reversed to link to "S".

*Working from an old HighwayData revision from Oct 30. Newer revisions will have more connections to check and fewer (zero?) Q-xor-R cases, as indnh_con.csv got cleaned up bit.

Anyway.

  • I'll compile & run this.
  • I expect no diffs from siteupdate_q8's output.
  • The big compound conditional is probably already ordered optimally but check it out anyway.

O HAY! All this reminds me!

@yakra yakra added the pending label Mar 9, 2024
@yakra
Copy link
Contributor Author

yakra commented Mar 14, 2024

A solution for this is ready. Holding off until #615 is merged.

yakra added a commit to yakra/DataProcessing that referenced this issue Mar 20, 2024
rebase 49288bc
itself a rebase of c627aec
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant