syntax: fixed ICEs and incorrect line nums when reporting Spans at the end of the file. #12670

dmski · 2014-03-03T12:07:57Z

CodeMap.span_to_* perform a lookup of a BytePos(sp.hi), which lands into the next filemap if the last byte of range denoted by Span is also the last byte of the filemap, which results in ICEs or incorrect error reports.

    Example:
        ````

        pub fn main() {
            let mut num = 3;
            let refe = &mut num;
            *refe = 5;
            println!("{}", num);
        }````

(note the empty line in the beginning and the absence of newline at the end)

The above would have caused ICE when trying to report where "refe" borrow ends.
The above without an empty line in the beginning would have reported borrow end to be the first line.

Most probably, this is also responsible for (at least some occurrences of) issue #8256.

The issue is fixed by always adding a newline at the end of non-empty filemaps in case there isn't a new line there already.

alexcrichton · 2014-03-04T08:19:45Z

Can you amend the title and commit message to be a little more specific in the first phrase? "converting Spans to stuff" makes it difficult to know what's being fixed.

Also, cc @nick29581

nrc · 2014-03-04T08:32:20Z

Is this only ever an issue when a file ends without a newline? (It has been bothering me for a while that we require a newline, so thanks for looking into it). I am pretty tired and will take a proper look tomorrow. In the mean time, could we avoid all this by just adding a newline character to all filemaps that don't end with one? Or some similar but nicer hack?

dmski · 2014-03-04T13:49:40Z

@alexcrichton: done.

dmski · 2014-03-04T14:16:49Z

@nick29581: The specific issue that motivated this fix would have been fixed by adding a newline character to every filemap without one, and indeed this was my original idea for the fix too. Dismissed it as being a hack, however, as this will still leave span_to_* slightly broken in that they'll still treat Spans in a way different from what lexer/parser think and will always try to look up 1 byte further than needed when looking up line nums and char positions. This might cause some issue with unicode in the future (it doesn't seem to cause them now however, and i can't at the moment think of unicode example which will break with current code).

My second idea for a nicer hack was to just subtract BytePos(1) from sp.hi when looking up positions (so that look up lands into the byte range of given Span instead of one byte after it), but that had problems with multi-byte characters and caused unreasonable test breakages.

The current fix is the nicest i could come up with so far, and it isn't too bad in my opinion (obviously :) ). It doesn't change signatures of any existing public methods for codemap (most of changes are in private methods), only changes the behaviour of span_to_* (to look up up to sp.hi index, rather than at sp.hi index). It can also be considered a proper fix and not a hack as it makes span_to_* treat Spans the same way lexer and parser do, which can help preventing any issues that might arise from the disconnect between what lexer/parser think of spans and what codemap does.

nrc · 2014-03-06T01:17:41Z

We discussed this a bit. You are quite right that your second idea won't work due to MBCs. I think we still prefer the first 'hack' to this solution though. I don't understand your point "this will still leave span_to_* slightly broken in that they'll still treat Spans in a way different from what lexer/parser think" - shouldn't we always have half-open intervals? In general, I think functions which deal with a 'point' shouldn't know about or deal with open vs closed ranges and all functions which deal with ranges should assume half-open ranges (unless I'm missing a use case).

dmski · 2014-03-06T17:26:59Z

On 6 Mar 2014, at 5:17, Nick Cameron wrote:

We discussed this a bit. You are quite right that your second idea
won't work due to MBCs. I think we still prefer the first 'hack' to
this solution though. I don't understand your point "this will still
leave span_to_* slightly broken in that they'll still treat Spans in a
way different from what lexer/parser think" - shouldn't we always have
half-open intervals?
We should! What i was trying to say is that by looking up at
BytePos(sp.hi) span_to_* tried to look at what's clearly outside of the
range denoted by span.
In general, I think functions which deal with a 'point' shouldn't know
about or deal with open vs closed ranges and all functions which deal
with ranges should assume half-open ranges (unless I'm missing a use
case).
I see the point - the distinction between open vs. closed ranges is
really only used when looking up line numbers and filemap indices now.
Lines are naturally padded by newlines, so not affected, and filemap
indices are fixed by adding a newline (and open vs closed ranges on
point lookups can be implemented if there's ever a use case for that).

Have now amended PR description and commit to implement the fix with
adding a newline.

nrc · 2014-03-07T00:26:03Z

src/libsyntax/codemap.rs

        assert_eq!(cp4, CharPos(15));
    }
+
+    #[test]


Could you add comments stating what these tests are testing please?

Added. Have also removed two tests (from the 5 added by me earlier) - they've been redundant, basically testing what t6 does (i.e. multibyte chars).

nrc · 2014-03-07T00:31:12Z

I fear this will not work - the lexer will call next_line with the byte offsets it expects, then Codemap adds a char and so the byte offsets for the following filemap will be wrong. I think you need for the lexer to add the newlines.

dmski · 2014-03-07T21:02:54Z

When codemap::StringReader is constructed, it's provided with the filemap that would have been already constructed in CodeMap::new_filemap and padded with newline if needed. StringReader.pos is then initially set to filemap.start_pos (lexer.rs:79), so if filemap.start_pos is correct then lexer's next_line calls will also use correct offsets.

filemap.start_pos is calculated in CodeMap::new_filemap by adding previous filemap's source length to previous filemap's start_pos (codemap.rs:274), so this takes into account any padding that the previous filemap might've had.

So, unless i'm missing something, padding with newlines this way won't result in incorrect offsets for newlines. I've also checked that newline offsets are correct manually, just to be sure: https://gist.github.com/dmski/9418492 (which outputs https://gist.github.com/dmski/9420111 for me).
Not sure if that should be added as a test somewhere - what do you think?

nrc · 2014-03-07T23:40:30Z

So, I think this is OK then. BUT you must add a comment to new_line because this is pretty scary. The input to new_line is relative to the start of the Codemap, but taking into account the fact that the Codemap might have been manipulated. So the only safe way to call the method is with a value calculated from what codemap thinks is the start of the filemap plus what the client thinks is the start of the line. I.e., you are providing are value relative to the start of the codemap, but you can only rely on your knowledge to be correct relative to the start of the filemap.

Also, I think in the tests, you change the new_line calls which is correct, but the change to fm3.next_line(BytePos(34)); should be to fm3.next_line(BytePos(35)); because two newline chars have been added.

alexcrichton · 2014-03-09T19:53:57Z

@nick29581 does this look good to you? I'd be willing to merge this as-is, but you have more experience in this area than I do.

…e end of the file. CodeMap.span_to_* perform a lookup of a BytePos(sp.hi), which lands into the next filemap if the last byte of range denoted by Span is also the last byte of the filemap, which results in ICEs or incorrect error reports. Example: ```` pub fn main() { let mut num = 3; let refe = &mut num; *refe = 5; println!("{}", num); }```` (note the empty line in the beginning and the absence of newline at the end) The above would have caused ICE when trying to report where "refe" borrow ends. The above without an empty line in the beginning would have reported borrow end to be the first line. Most probably, this is also responsible for (at least some occurrences of) issue rust-lang#8256. The issue is fixed by always adding a newline at the end of non-empty filemaps in case there isn't a new line there already.

dmski · 2014-03-09T22:33:10Z

Have now added a warning on FileMap.next_line.
As for next_line calls in tests - they should be correct, as the second
filemap in test data is empty (i thought it'd be a good idea to leave empty filemaps empty), and new_filemap will only add a newline
to non-empty filemaps, so all offsets are +1 after the first filemap.

On 8 Mar 2014, at 3:40, Nick Cameron wrote:

So, I think this is OK then. BUT you must add a comment to new_line
because this is pretty scary. The input to new_line is relative to
the start of the Codemap, but taking into account the fact that the
Codemap might have been manipulated. So the only safe way to call the
method is with a value calculated from what codemap thinks is the
start of the filemap plus what the client thinks is the start of the
line. I.e., you are providing are value relative to the start of the
codemap, but you can only rely on your knowledge to be correct
relative to the start of the filemap.

Also, I think in the tests, you change the new_line calls which is
correct, but the change to fm3.next_line(BytePos(34)); should be to
fm3.next_line(BytePos(35)); because two newline chars have been
added.

Reply to this email directly or view it on GitHub:
#12670 (comment)

nrc · 2014-03-10T02:24:53Z

r+

@dmski aha! I had missed that you did not add a newline to empty filemaps. I agree that that is the right thing to do. Thanks for iterating over the patch with me, I appreciate the effort and I'm glad this is fixed.

@alexcrichton could you give this your magic r+ please? I fear my r+ is not enough for Bors.

CodeMap.span_to_* perform a lookup of a BytePos(sp.hi), which lands into the next filemap if the last byte of range denoted by Span is also the last byte of the filemap, which results in ICEs or incorrect error reports. Example: ```` pub fn main() { let mut num = 3; let refe = &mut num; *refe = 5; println!("{}", num); }```` (note the empty line in the beginning and the absence of newline at the end) The above would have caused ICE when trying to report where "refe" borrow ends. The above without an empty line in the beginning would have reported borrow end to be the first line. Most probably, this is also responsible for (at least some occurrences of) issue #8256. The issue is fixed by always adding a newline at the end of non-empty filemaps in case there isn't a new line there already.

fix: Fix attribute macros on assoc items being discarded with disabled proc macros Fixes rust-lang/rust-analyzer#12669

@Jarcho

…is uncertain (rust-lang#13889) fixes rust-lang#12670 Continuation of rust-lang#12688. r? @Jarcho if you don't mind? changelog: [`manual_unwrap_or_default`] fix wrong suggestions when condition type is uncertain

nrc reviewed Mar 7, 2014
View reviewed changes

bors closed this Mar 10, 2014

bors merged commit 43a8f7b into rust-lang:master Mar 10, 2014

bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 25, 2022

Auto merge of rust-lang#12670 - Veykril:assoc-attrs, r=Veykril

8489cd7

fix: Fix attribute macros on assoc items being discarded with disabled proc macros Fixes rust-lang/rust-analyzer#12669

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

syntax: fixed ICEs and incorrect line nums when reporting Spans at the end of the file. #12670

syntax: fixed ICEs and incorrect line nums when reporting Spans at the end of the file. #12670

Uh oh!

dmski commented Mar 3, 2014

Uh oh!

alexcrichton commented Mar 4, 2014

Uh oh!

nrc commented Mar 4, 2014

Uh oh!

dmski commented Mar 4, 2014

Uh oh!

dmski commented Mar 4, 2014

Uh oh!

nrc commented Mar 6, 2014

Uh oh!

dmski commented Mar 6, 2014

Uh oh!

nrc Mar 7, 2014

Uh oh!

dmski Mar 7, 2014

Uh oh!

nrc commented Mar 7, 2014

Uh oh!

dmski commented Mar 7, 2014

Uh oh!

nrc commented Mar 7, 2014

Uh oh!

alexcrichton commented Mar 9, 2014

Uh oh!

dmski commented Mar 9, 2014

Uh oh!

nrc commented Mar 10, 2014

Uh oh!

Uh oh!

syntax: fixed ICEs and incorrect line nums when reporting Spans at the end of the file. #12670

syntax: fixed ICEs and incorrect line nums when reporting Spans at the end of the file. #12670

Uh oh!

Conversation

dmski commented Mar 3, 2014

Uh oh!

alexcrichton commented Mar 4, 2014

Uh oh!

nrc commented Mar 4, 2014

Uh oh!

dmski commented Mar 4, 2014

Uh oh!

dmski commented Mar 4, 2014

Uh oh!

nrc commented Mar 6, 2014

Uh oh!

dmski commented Mar 6, 2014

Uh oh!

nrc Mar 7, 2014

Choose a reason for hiding this comment

Uh oh!

dmski Mar 7, 2014

Choose a reason for hiding this comment

Uh oh!

nrc commented Mar 7, 2014

Uh oh!

dmski commented Mar 7, 2014

Uh oh!

nrc commented Mar 7, 2014

Uh oh!

alexcrichton commented Mar 9, 2014

Uh oh!

dmski commented Mar 9, 2014

Uh oh!

nrc commented Mar 10, 2014

Uh oh!

Uh oh!