Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comparisons on source locations yield incorrect results #1203

Open
msteindorfer opened this issue Jul 27, 2018 · 10 comments
Open

Comparisons on source locations yield incorrect results #1203

msteindorfer opened this issue Jul 27, 2018 · 10 comments

Comments

@msteindorfer
Copy link
Member

According to the documentation < on source locations is assumed to exhibit the following behaviour (although it doesn't on Rascal Version: 0.11.1-SNAPSHOT):

Yields true if the location value of Exp1 is strictly textually contained in the location value of Exp2, and false otherwise.

rascal>|file:///a/b| < |file:///a|
bool: true
rascal>|file:///a| < |file:///a/b|
bool: false

rascal>|file:///a/b| < |file:///a/c| // assumed to yield `false` since not textually contained
bool: true
rascal>|file:///a/c| < |file:///a/b|
bool: false
@DavyLandman
Copy link
Member

The comparison operators on source locations are a bit confusing. We have discussed changing the semantics, but haven't found a good general solution.

To summarize how they currently work:

  1. Difference uri: string compare on the uri
  2. Same uri: containment of offsets and length.

Personally the second case is confusing, as what happens when the are in the same file, but not in the same set of lines?

For most cases, I've ended up manually implementing the right comparison.

@msteindorfer
Copy link
Member Author

@DavyLandman thanks summary / status report.

Regarding 2: containment of offset/length is quite useful e.g. when reasoning over M3 models. Say I'd like to find function calls within a function body, I can query (somewhat simplified) call < body.

Regarding 1: this issue is about "difference uri" not working. |file:///a/b| < |file:///a/c| should never go through, since there are is no offset/length information, but the URIs differ. Right, that is a bug?

@DavyLandman
Copy link
Member

regarding 2: yup, it's useful, but if you want to see if a location occurs before another location it doesn't work.

regarding 1: it does a textual comparison of the uri part of the string. so the same as "file://a/b" < "file:///a/c". We have chosen at the time to give it an order, so that you can always call compare on it. This behavior is useful for sorting for example.

@eh-adblockplus-org
Copy link

Pardon the interjection. The underlying problem is that < is being used for two fundamentally different kinds of orderings. File order is lexicographic and is a total order. Locations are a kind of interval with lower bound and upper bound. Location order is thus a particular kind of subset relationship and is a partial order. Furthermore, there's another partial order for intervals that makes sense, which is that the upper bound of the lower interval is less than the lower bound of the upper interval.

If it were me, I'd use in for location-intervals, since that's the operator already used for subset relationship for sets and lists. I'd also deprecate < for this sense.

@jurgenvinju
Copy link
Member

Plus one on the in suggestion.

@mahills
Copy link
Member

mahills commented Mar 30, 2019

I like that idea as well

@PaulKlint
Copy link
Member

I think there is confusion here:

  • we use< for subset on two sets, for sublist on two lists and the like.
  • we use in for is-element-in

I have been thinking about a library module for source locations that has function like:

  • before(loc a, loc): a and b are in the same file and text of a comes before that of b.
  • after(loc a, lob): similar
  • overlaps(loca , loc b): textual overlap
  • contains(loc a, loc b) text of b is contained in that of a
  • disjoint(loc a, loc b): no textual overlap

Would this not lead to more precise and more easily understandable operations on locations?

@mahills
Copy link
Member

mahills commented Mar 30, 2019

I think that’s fine as well. I know that the operator syntax for location comparisons was added later, and I’m fine with this being supported by a library instead. That may be better, since it prevents users from importing their own view of what “less than” means (for instance). Regardless, the current semantics are confusing, so I think it would even be an improvement to just fix that in the short term.

@PaulKlint
Copy link
Member

I fully agree that the current situation needs improvement.

@jurgenvinju
Copy link
Member

We should definitely be a lot more clear about the Semantics of < in the documentation.

It is only total on numbers and strings, or tuples and lists which can reach eventually only numbers and strings and lists and tuples. For the other datatypes < is partial. We had to adapt the sort function to properly deal with partial orderings.

What a "natural ordering" for source locations is, is debatable. The URI part can be totally ordered lexicographically, which makes sense. The offset-length thing will always be a point of discussion. as they can be considered to be "ranges" it makes sense to order them via an inclusion lattice just like we did with sets.

However, other applications want to strictly know that something started before something else and that is not the Semantics of range inclusion.

Since this has generated so much confusion over the years I'm inclined to propose a complete lexicographic ordering where the numbers of offsets and lengths are simple ordered pairwise like numbers and from left to right: offset first, then length. This is pretty meaningless but consistent. For more Semantics when sorting, one can project the offset and length fields and create one's own ordering function to pass to sort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants