Skip to content

syntax: fixed ICEs and incorrect line nums when reporting Spans at the end of the file. #12670

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 10, 2014

Conversation

dmski
Copy link
Contributor

@dmski dmski commented Mar 3, 2014

CodeMap.span_to_* perform a lookup of a BytePos(sp.hi), which lands into the next filemap if the last byte of range denoted by Span is also the last byte of the filemap, which results in ICEs or incorrect error reports.

    Example:
        ````

        pub fn main() {
            let mut num = 3;
            let refe = &mut num;
            *refe = 5;
            println!("{}", num);
        }````

(note the empty line in the beginning and the absence of newline at the end)

The above would have caused ICE when trying to report where "refe" borrow ends.
The above without an empty line in the beginning would have reported borrow end to be the first line.

Most probably, this is also responsible for (at least some occurrences of) issue #8256.

The issue is fixed by always adding a newline at the end of non-empty filemaps in case there isn't a new line there already.

@alexcrichton
Copy link
Member

Can you amend the title and commit message to be a little more specific in the first phrase? "converting Spans to stuff" makes it difficult to know what's being fixed.

Also, cc @nick29581

@nrc
Copy link
Member

nrc commented Mar 4, 2014

Is this only ever an issue when a file ends without a newline? (It has been bothering me for a while that we require a newline, so thanks for looking into it). I am pretty tired and will take a proper look tomorrow. In the mean time, could we avoid all this by just adding a newline character to all filemaps that don't end with one? Or some similar but nicer hack?

@dmski
Copy link
Contributor Author

dmski commented Mar 4, 2014

@alexcrichton: done.

@dmski
Copy link
Contributor Author

dmski commented Mar 4, 2014

@nick29581: The specific issue that motivated this fix would have been fixed by adding a newline character to every filemap without one, and indeed this was my original idea for the fix too. Dismissed it as being a hack, however, as this will still leave span_to_* slightly broken in that they'll still treat Spans in a way different from what lexer/parser think and will always try to look up 1 byte further than needed when looking up line nums and char positions. This might cause some issue with unicode in the future (it doesn't seem to cause them now however, and i can't at the moment think of unicode example which will break with current code).

My second idea for a nicer hack was to just subtract BytePos(1) from sp.hi when looking up positions (so that look up lands into the byte range of given Span instead of one byte after it), but that had problems with multi-byte characters and caused unreasonable test breakages.

The current fix is the nicest i could come up with so far, and it isn't too bad in my opinion (obviously :) ). It doesn't change signatures of any existing public methods for codemap (most of changes are in private methods), only changes the behaviour of span_to_* (to look up up to sp.hi index, rather than at sp.hi index). It can also be considered a proper fix and not a hack as it makes span_to_* treat Spans the same way lexer and parser do, which can help preventing any issues that might arise from the disconnect between what lexer/parser think of spans and what codemap does.

@nrc
Copy link
Member

nrc commented Mar 6, 2014

We discussed this a bit. You are quite right that your second idea won't work due to MBCs. I think we still prefer the first 'hack' to this solution though. I don't understand your point "this will still leave span_to_* slightly broken in that they'll still treat Spans in a way different from what lexer/parser think" - shouldn't we always have half-open intervals? In general, I think functions which deal with a 'point' shouldn't know about or deal with open vs closed ranges and all functions which deal with ranges should assume half-open ranges (unless I'm missing a use case).

@dmski
Copy link
Contributor Author

dmski commented Mar 6, 2014

On 6 Mar 2014, at 5:17, Nick Cameron wrote:

We discussed this a bit. You are quite right that your second idea
won't work due to MBCs. I think we still prefer the first 'hack' to
this solution though. I don't understand your point "this will still
leave span_to_* slightly broken in that they'll still treat Spans in a
way different from what lexer/parser think" - shouldn't we always have
half-open intervals?
We should! What i was trying to say is that by looking up at
BytePos(sp.hi) span_to_* tried to look at what's clearly outside of the
range denoted by span.
In general, I think functions which deal with a 'point' shouldn't know
about or deal with open vs closed ranges and all functions which deal
with ranges should assume half-open ranges (unless I'm missing a use
case).
I see the point - the distinction between open vs. closed ranges is
really only used when looking up line numbers and filemap indices now.
Lines are naturally padded by newlines, so not affected, and filemap
indices are fixed by adding a newline (and open vs closed ranges on
point lookups can be implemented if there's ever a use case for that).

Have now amended PR description and commit to implement the fix with
adding a newline.

assert_eq!(cp4, CharPos(15));
}

#[test]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add comments stating what these tests are testing please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. Have also removed two tests (from the 5 added by me earlier) - they've been redundant, basically testing what t6 does (i.e. multibyte chars).

@nrc
Copy link
Member

nrc commented Mar 7, 2014

I fear this will not work - the lexer will call next_line with the byte offsets it expects, then Codemap adds a char and so the byte offsets for the following filemap will be wrong. I think you need for the lexer to add the newlines.

@dmski
Copy link
Contributor Author

dmski commented Mar 7, 2014

When codemap::StringReader is constructed, it's provided with the filemap that would have been already constructed in CodeMap::new_filemap and padded with newline if needed. StringReader.pos is then initially set to filemap.start_pos (lexer.rs:79), so if filemap.start_pos is correct then lexer's next_line calls will also use correct offsets.

filemap.start_pos is calculated in CodeMap::new_filemap by adding previous filemap's source length to previous filemap's start_pos (codemap.rs:274), so this takes into account any padding that the previous filemap might've had.

So, unless i'm missing something, padding with newlines this way won't result in incorrect offsets for newlines. I've also checked that newline offsets are correct manually, just to be sure: https://gist.github.com/dmski/9418492 (which outputs https://gist.github.com/dmski/9420111 for me).
Not sure if that should be added as a test somewhere - what do you think?

@nrc
Copy link
Member

nrc commented Mar 7, 2014

So, I think this is OK then. BUT you must add a comment to new_line because this is pretty scary. The input to new_line is relative to the start of the Codemap, but taking into account the fact that the Codemap might have been manipulated. So the only safe way to call the method is with a value calculated from what codemap thinks is the start of the filemap plus what the client thinks is the start of the line. I.e., you are providing are value relative to the start of the codemap, but you can only rely on your knowledge to be correct relative to the start of the filemap.

Also, I think in the tests, you change the new_line calls which is correct, but the change to fm3.next_line(BytePos(34)); should be to fm3.next_line(BytePos(35)); because two newline chars have been added.

@alexcrichton
Copy link
Member

@nick29581 does this look good to you? I'd be willing to merge this as-is, but you have more experience in this area than I do.

…e end of the file.

CodeMap.span_to_* perform a lookup of a BytePos(sp.hi), which lands into the next filemap if the last byte of range denoted by Span is also the last byte of the filemap, which results in ICEs or incorrect error reports.

    Example:
        ````

        pub fn main() {
            let mut num = 3;
            let refe = &mut num;
            *refe = 5;
            println!("{}", num);
        }````

(note the empty line in the beginning and the absence of newline at the end)

The above would have caused ICE when trying to report where "refe" borrow ends.
The above without an empty line in the beginning would have reported borrow end to be the first line.

Most probably, this is also responsible for (at least some occurrences of) issue rust-lang#8256.

The issue is fixed by always adding a newline at the end of non-empty filemaps in case there isn't a new line there already.
@dmski
Copy link
Contributor Author

dmski commented Mar 9, 2014

Have now added a warning on FileMap.next_line.
As for next_line calls in tests - they should be correct, as the second
filemap in test data is empty (i thought it'd be a good idea to leave empty filemaps empty), and new_filemap will only add a newline
to non-empty filemaps, so all offsets are +1 after the first filemap.

On 8 Mar 2014, at 3:40, Nick Cameron wrote:

So, I think this is OK then. BUT you must add a comment to new_line
because this is pretty scary. The input to new_line is relative to
the start of the Codemap, but taking into account the fact that the
Codemap might have been manipulated. So the only safe way to call the
method is with a value calculated from what codemap thinks is the
start of the filemap plus what the client thinks is the start of the
line. I.e., you are providing are value relative to the start of the
codemap, but you can only rely on your knowledge to be correct
relative to the start of the filemap.

Also, I think in the tests, you change the new_line calls which is
correct, but the change to fm3.next_line(BytePos(34)); should be to
fm3.next_line(BytePos(35)); because two newline chars have been
added.


Reply to this email directly or view it on GitHub:
#12670 (comment)

@nrc
Copy link
Member

nrc commented Mar 10, 2014

r+

@dmski aha! I had missed that you did not add a newline to empty filemaps. I agree that that is the right thing to do. Thanks for iterating over the patch with me, I appreciate the effort and I'm glad this is fixed.

@alexcrichton could you give this your magic r+ please? I fear my r+ is not enough for Bors.

bors added a commit that referenced this pull request Mar 10, 2014
CodeMap.span_to_* perform a lookup of a BytePos(sp.hi), which lands into the next filemap if the last byte of range denoted by Span is also the last byte of the filemap, which results in ICEs or incorrect error reports.

        Example:
            ````

            pub fn main() {
                let mut num = 3;
                let refe = &mut num;
                *refe = 5;
                println!("{}", num);
            }````

(note the empty line in the beginning and the absence of newline at the end)

The above would have caused ICE when trying to report where "refe" borrow ends.
The above without an empty line in the beginning would have reported borrow end to be the first line.

Most probably, this is also responsible for (at least some occurrences of) issue #8256.

The issue is fixed by always adding a newline at the end of non-empty filemaps in case there isn't a new line there already.
@bors bors closed this Mar 10, 2014
@bors bors merged commit 43a8f7b into rust-lang:master Mar 10, 2014
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 25, 2022
fix: Fix attribute macros on assoc items being discarded with disabled proc macros

Fixes rust-lang/rust-analyzer#12669
flip1995 pushed a commit to flip1995/rust that referenced this pull request Feb 6, 2025
…is uncertain (rust-lang#13889)

fixes rust-lang#12670

Continuation of rust-lang#12688. r? @Jarcho if you don't mind?

changelog: [`manual_unwrap_or_default`] fix wrong suggestions when
condition type is uncertain
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants