Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle CRLF properly in the lexer #14400

Merged
merged 3 commits into from
Jun 19, 2014
Merged

Conversation

lilyball
Copy link
Contributor

The lexer already ignores CRLF in between tokens, but it doesn't
properly handle carriage returns inside strings and doc comments. Teach
it to treat CRLF as LF inside these tokens, and to disallow carriage
returns that are not followed by linefeeds. This includes handling an
escaped CRLF inside a regular string token the same way it handles an
escaped LF.

This is technically a breaking change, as bare carriage returns are no
longer allowed, and CRLF sequences are now treated as LF inside strings
and doc comments, but it's very unlikely to actually affect any
real-world code.

This change is necessary to have Rust code compile on Windows the same
way it does on Unix. The mozilla/rust repository explicitly sets eol=lf
for Rust source files, but other Rust repositories don't. Notably,
rust-http cannot be compiled on Windows without converting the CRLF line
endings back to LF.

[breaking-change]

@lilyball
Copy link
Contributor Author

The alternative to treating a bare CR as an error is to treat a bare CR as if it were an LF. I went with treating it as an error because I thought a bare CR was more likely to be a mistake (since the two common EOL formats are LF and CRLF).

@sfackler
Copy link
Member

tidy is unhappy: https://travis-ci.org/mozilla/rust/jobs/25928956

@lilyball
Copy link
Contributor Author

What the... those files have the same license all the other test files do. And those CR characters are supposed to be ignored.

Edit: s/tho/those/. Wat.

@lilyball
Copy link
Contributor Author

Tidy has some nice bugs regarding CRLF handling in a repo without core.autocrlf set.

@emberian
Copy link
Member

I hate having to support CRLF. But this looks good to me.

@lilyball
Copy link
Contributor Author

So do I. It's an abomination.

@alexcrichton
Copy link
Member

Closing due to inactivity, but it seems like this is something that definitely needs to be fixed!

cc #14979

@lilyball
Copy link
Contributor Author

@alexcrichton How about reviewing it instead of closing it? That's all it's waiting for.

@lilyball lilyball reopened this Jun 18, 2014
@emberian
Copy link
Member

r=me with rebase, thanks a lot for this! Sorry it took so long :(

Teach StringReader how to emit errors for arbitrary spans, so we don't
need to modify peek_span. This allows for emitting errors without having
a &mut borrow of the StringReader.
@lilyball
Copy link
Contributor Author

Rebased (which largely means rewritten, due to the intervening refactoring). Already passed make check-stage1-syntax and make check-stage1-{rpass,cfail}. Currently running the full make check suite.

@lilyball
Copy link
Contributor Author

Failure:

src/test/run-pass-fulldeps/lexer-crlf-doc-comments-syntax.rs:15:1: 15:21 error: can't find crate for `syntax`
src/test/run-pass-fulldeps/lexer-crlf-doc-comments-syntax.rs:15 extern crate syntax;
                                                                ^~~~~~~~~~~~~~~~~~~~

Why is this a failure? It's a fulldeps test. Shouldn't the syntax crate exist?

@lilyball
Copy link
Contributor Author

I'm guessing this is an issue with cross-compiling.

@sfackler
Copy link
Member

Fulldeps tests don't work on android since it's cross-compiled. Add in a //ignore-android or whatever it is.

@lilyball
Copy link
Contributor Author

@sfackler I could probably also just turn it into a regular #[test] in libsyntax.

The lexer already ignores CRLF in between tokens, but it doesn't
properly handle carriage returns inside strings and doc comments. Teach
it to treat CRLF as LF inside these tokens, and to disallow carriage
returns that are not followed by linefeeds. This includes handling an
escaped CRLF inside a regular string token the same way it handles an
escaped LF.

This is technically a breaking change, as bare carriage returns are no
longer allowed, and CRLF sequences are now treated as LF inside strings
and doc comments, but it's very unlikely to actually affect any
real-world code.

This change is necessary to have Rust code compile on Windows the same
way it does on Unix. The mozilla/rust repository explicitly sets eol=lf
for Rust source files, but other Rust repositories don't. Notably,
rust-http cannot be compiled on Windows without converting the CRLF line
endings back to LF.

[breaking-change]
bors added a commit that referenced this pull request Jun 19, 2014
The lexer already ignores CRLF in between tokens, but it doesn't
properly handle carriage returns inside strings and doc comments. Teach
it to treat CRLF as LF inside these tokens, and to disallow carriage
returns that are not followed by linefeeds. This includes handling an
escaped CRLF inside a regular string token the same way it handles an
escaped LF.

This is technically a breaking change, as bare carriage returns are no
longer allowed, and CRLF sequences are now treated as LF inside strings
and doc comments, but it's very unlikely to actually affect any
real-world code.

This change is necessary to have Rust code compile on Windows the same
way it does on Unix. The mozilla/rust repository explicitly sets eol=lf
for Rust source files, but other Rust repositories don't. Notably,
rust-http cannot be compiled on Windows without converting the CRLF line
endings back to LF.

[breaking-change]
@bors bors merged commit 8a8e497 into rust-lang:master Jun 19, 2014
@lilyball lilyball deleted the lexer_crlf_handling branch June 19, 2014 07:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants