Comparisons on source locations yield incorrect results #1203

msteindorfer · 2018-07-27T08:30:25Z

According to the documentation < on source locations is assumed to exhibit the following behaviour (although it doesn't on Rascal Version: 0.11.1-SNAPSHOT):

Yields true if the location value of Exp1 is strictly textually contained in the location value of Exp2, and false otherwise.

rascal>|file:///a/b| < |file:///a|
bool: true
rascal>|file:///a| < |file:///a/b|
bool: false

rascal>|file:///a/b| < |file:///a/c| // assumed to yield `false` since not textually contained
bool: true
rascal>|file:///a/c| < |file:///a/b|
bool: false

The text was updated successfully, but these errors were encountered:

DavyLandman · 2018-07-27T10:41:56Z

The comparison operators on source locations are a bit confusing. We have discussed changing the semantics, but haven't found a good general solution.

To summarize how they currently work:

Difference uri: string compare on the uri
Same uri: containment of offsets and length.

Personally the second case is confusing, as what happens when the are in the same file, but not in the same set of lines?

For most cases, I've ended up manually implementing the right comparison.

msteindorfer · 2018-07-27T11:19:07Z

@DavyLandman thanks summary / status report.

Regarding 2: containment of offset/length is quite useful e.g. when reasoning over M3 models. Say I'd like to find function calls within a function body, I can query (somewhat simplified) call < body.

Regarding 1: this issue is about "difference uri" not working. |file:///a/b| < |file:///a/c| should never go through, since there are is no offset/length information, but the URIs differ. Right, that is a bug?

DavyLandman · 2018-07-27T13:37:44Z

regarding 2: yup, it's useful, but if you want to see if a location occurs before another location it doesn't work.

regarding 1: it does a textual comparison of the uri part of the string. so the same as "file://a/b" < "file:///a/c". We have chosen at the time to give it an order, so that you can always call compare on it. This behavior is useful for sorting for example.

eh-adblockplus-org · 2018-08-04T16:18:41Z

Pardon the interjection. The underlying problem is that < is being used for two fundamentally different kinds of orderings. File order is lexicographic and is a total order. Locations are a kind of interval with lower bound and upper bound. Location order is thus a particular kind of subset relationship and is a partial order. Furthermore, there's another partial order for intervals that makes sense, which is that the upper bound of the lower interval is less than the lower bound of the upper interval.

If it were me, I'd use in for location-intervals, since that's the operator already used for subset relationship for sets and lists. I'd also deprecate < for this sense.

jurgenvinju · 2019-03-30T21:21:02Z

Plus one on the in suggestion.

mahills · 2019-03-30T21:39:27Z

I like that idea as well

PaulKlint · 2019-03-30T21:58:27Z

I think there is confusion here:

we use< for subset on two sets, for sublist on two lists and the like.
we use in for is-element-in

I have been thinking about a library module for source locations that has function like:

before(loc a, loc): a and b are in the same file and text of a comes before that of b.
after(loc a, lob): similar
overlaps(loca , loc b): textual overlap
contains(loc a, loc b) text of b is contained in that of a
disjoint(loc a, loc b): no textual overlap

Would this not lead to more precise and more easily understandable operations on locations?

mahills · 2019-03-30T22:04:32Z

I think that’s fine as well. I know that the operator syntax for location comparisons was added later, and I’m fine with this being supported by a library instead. That may be better, since it prevents users from importing their own view of what “less than” means (for instance). Regardless, the current semantics are confusing, so I think it would even be an improvement to just fix that in the short term.

PaulKlint · 2019-03-30T22:07:32Z

I fully agree that the current situation needs improvement.

jurgenvinju · 2019-04-03T14:10:43Z

We should definitely be a lot more clear about the Semantics of < in the documentation.

It is only total on numbers and strings, or tuples and lists which can reach eventually only numbers and strings and lists and tuples. For the other datatypes < is partial. We had to adapt the sort function to properly deal with partial orderings.

What a "natural ordering" for source locations is, is debatable. The URI part can be totally ordered lexicographically, which makes sense. The offset-length thing will always be a point of discussion. as they can be considered to be "ranges" it makes sense to order them via an inclusion lattice just like we did with sets.

However, other applications want to strictly know that something started before something else and that is not the Semantics of range inclusion.

Since this has generated so much confusion over the years I'm inclined to propose a complete lexicographic ordering where the numbers of offsets and lengths are simple ordered pairwise like numbers and from left to right: offset first, then length. This is pretty meaningless but consistent. For more Semantics when sorting, one can project the offset and length fields and create one's own ordering function to pass to sort.

msteindorfer added bug rascal-shell labels Jul 27, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comparisons on source locations yield incorrect results #1203

Comparisons on source locations yield incorrect results #1203

msteindorfer commented Jul 27, 2018

DavyLandman commented Jul 27, 2018

msteindorfer commented Jul 27, 2018

DavyLandman commented Jul 27, 2018

eh-adblockplus-org commented Aug 4, 2018

jurgenvinju commented Mar 30, 2019

mahills commented Mar 30, 2019

PaulKlint commented Mar 30, 2019

mahills commented Mar 30, 2019

PaulKlint commented Mar 30, 2019

jurgenvinju commented Apr 3, 2019

Comparisons on source locations yield incorrect results #1203

Comparisons on source locations yield incorrect results #1203

Comments

msteindorfer commented Jul 27, 2018

DavyLandman commented Jul 27, 2018

msteindorfer commented Jul 27, 2018

DavyLandman commented Jul 27, 2018

eh-adblockplus-org commented Aug 4, 2018

jurgenvinju commented Mar 30, 2019

mahills commented Mar 30, 2019

PaulKlint commented Mar 30, 2019

mahills commented Mar 30, 2019

PaulKlint commented Mar 30, 2019

jurgenvinju commented Apr 3, 2019