Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move line continuation handling to io.c #2215

Merged
merged 3 commits into from
Apr 11, 2018

Conversation

fingolfin
Copy link
Member

@fingolfin fingolfin commented Feb 27, 2018

Resolve #2201.

@fingolfin fingolfin added do not merge PRs which are not yet ready to be merged (e.g. submitted for discussion, or test results) topic: kernel release notes: not needed PRs introducing changes that are wholly irrelevant to the release notes labels Feb 27, 2018
@fingolfin
Copy link
Member Author

A test was failing, which did this:

gap> EvalString("1234\\\r\n567");
1234567

The problem came from a difference in behavior between (kernel) string streams and file streams in io.c. For the former, both CR and LF were treated as line terminators; for the latter, it depends on the OS, so Unix, only LF is a line terminator. When reading a CRLF from a file, we thus see \\\r\n\0 in the line buffer, and can correctly treat this as a single ending in a line continuation. But when reading from the string, the buffer would end with \\\r\0; to correctly deal with this, one needs to call GetLine again to get the missing \n. The old line continuation code sometimes did that, sometimes not.

To fix this, I changed the string stream code to ignore all \r and only accept \n as line terminator. This is a bit of a hack, and if a file uses isolated CRs, it could lead to some changed error behavior, but otherwise I think it's fine.

This still is not quite perfect, as regular file streams could in border cases also produce a line buffer ending in \\\r\0, namely when we see a very long (~32k) line which ends in a CRLF, where the CR (plus a trailing zero) just fits into the buffer, but the LF does not. That's not great, but OTOH it only causes a problem if this super long line ends in a line continuation, too. And the bad result then is an unexpected error, i.e. no silent data loss, so I am OK with it. We need to look closer at lines overflowing the buffer at some point anyway.

Another fix is that I change the prompt when we do an explicit line continuation (which is much easier now that we handle them in a single place). As a result, this example (in master)

gap> 123\
gap> 456;
123456
gap> "123\
gap> "456";
Syntax error: ; expected
"456";
   ^

becomes this:

gap> 123\
> 456;
123456
gap> "123\
> 456";
"123456"

@fingolfin
Copy link
Member Author

BTW, I am aware that this PR is missing tests, I'll work on that.

Copy link
Member

@markuspf markuspf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for the missing tests, I think this is ok. The only thing that irked me a bit was the #\ change in lib/stbcckt.gi, but after playing around a bit I realised that the change in the scanner that causes this change to be necessary makes it more consistent.

@codecov
Copy link

codecov bot commented Mar 29, 2018

Codecov Report

Merging #2215 into master will increase coverage by 0.12%.
The diff coverage is 83.33%.

@@            Coverage Diff             @@
##           master    #2215      +/-   ##
==========================================
+ Coverage   73.13%   73.26%   +0.12%     
==========================================
  Files         481      482       +1     
  Lines      246510   265819   +19309     
==========================================
+ Hits       180286   194739   +14453     
- Misses      66224    71080    +4856
Impacted Files Coverage Δ
src/scanner.c 91% <66.66%> (+0.11%) ⬆️
src/io.c 71.58% <93.93%> (-0.82%) ⬇️
src/compiler.c 59.06% <0%> (-28.24%) ⬇️
src/saveload.c 46.11% <0%> (-19.34%) ⬇️
src/iostream.c 45.84% <0%> (-12.27%) ⬇️
src/vecffe.c 66.87% <0%> (-5.19%) ⬇️
src/records.c 63.14% <0%> (-5.13%) ⬇️
src/vars.c 81.49% <0%> (-4.8%) ⬇️
src/exprs.c 89.51% <0%> (-4.1%) ⬇️
src/listfunc.c 77.93% <0%> (-3.74%) ⬇️
... and 48 more

@fingolfin
Copy link
Member Author

I have now changed this, as discussed on issue #2201, to not handle line continuations inside of comments, which thus made it possible to drop the changes to lib/stbcckt.gi again. This means that the current behavior of GAP is not changed. I figure this is the best way to go: If we really want to change how we deal with line continuations, fine, we can do it; but let's not do it as part of this PR, but rather "intentionally", in another PR.

In any case, this PR here still lacks explicit test cases; I'll work on them when I am back from vacation. Also, it contains a ton of refactoring to the scanner code which was enabled by the (re)moved line continuation code. I could put that into a separate PR, to simplify reviewing, too.

@fingolfin fingolfin removed the do not merge PRs which are not yet ready to be merged (e.g. submitted for discussion, or test results) label Apr 9, 2018
@fingolfin
Copy link
Member Author

I have now added tests. Indeed, now the first commit adds some tests for line continuations. Then, as mentioned in #2201, a new helper function IGNORE_REST_OF_LINE is added to io.c, which the next commit uses to move line continuation handling to io.c; in the same commit, the new test file is adjusted: first off, the behaviour of triple quotes with line continuations changes; but as discussed in #2201, this is deemed to be an improvement (or even a fix?).
Also, a few more tests are added, which showcase uses of line continuation which were not supported before, such as breaking keywords and multi character operators such as := along a line continuation (this is of course not something I'd recommend people should do, but it opens the possibility of simplifying line wrapping rules for width constrained GAP output, e.g. in tests...)

The rest then is a ton of refactoring and simplification in the scanner, which is enabled by the code simplification thanks to the moving of the line continuation code.

Another improvement is that now the line continuation code always changesthe GAP prompt to the line continuation prompt >. I think this will fix some bugs and perhaps allow further simplifications...

Thing is, all these extra commits likely make reviewing this harder than it is. I'll look into moving them into one or multiple additional PRs, and reducing this PR here to the bare minimum, but that'll have to wait for another day.

In the meantime, feel free to already review parts or all of this PR.

src/io.c Outdated
if (!*STATE(In))
GetLine();
else if (STATE(In)[1] == '\n') {
STATE(In) += 2;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment here which indicates that a line continuation was detected?

Also: the term "line continuation" should be documented somewhere (resp. be replaced by a different, documented name, if somebody has a good suggestion?)

src/io.c Outdated
STATE(Prompt) = "";
}
else if (STATE(In)[1] == '\r')
STATE(In) += (STATE(In)[2] == '\n') ? 3 : 2;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention here that this is about handling Windows/DOS style CR+LF?!?

src/io.c Outdated
@@ -220,6 +253,14 @@ Char PEEK_CURR_CHAR(void)
return *STATE(In);
}

void IGNORE_REST_OF_LINE(void)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestioins for a better name? SKIP_TO_END_OF_LINE? SKIP_LINE? ...?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SKIP_TO_END_OF_LINE sounds good to me

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to SKIP_TO_END_OF_LINE

src/scanner.c Outdated
/* handle escape sequences */
if ( c == '\\' ) {
c = GET_NEXT_CHAR();
/* if next is another '\\' followed by '\n' it must be ignored */
while ( c == '\\' && PEEK_NEXT_CHAR() == '\n' ) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR gets rid of two out of three uses of PEEK_NEXT_CHAR. The only remaining one is in GetNumber and is used to disambiguate floats from range expressions, i.e. [1.4] vs. [1..4].

This greatly simplifies dealing with CR+LF Windows line ends, and also
removes a behavioral difference between string streams and regular
streams: the former treated an isolate CR as a line terminator, and
stopped filling the buffer after encountering one, so for a CRLF, one
would get a buffer ending in just CR; while for a regular file stream,
one would see a buffer ending CR followed by LF.
@fingolfin
Copy link
Member Author

I have now reduced this PR to its bare minimum. Further refactoring and fixes will come in separate PRs.

src/io.c Outdated

// handle line continuation, i.e., backslash followed by new line or CRLF;
// and also the end of a line in general
while (*STATE(In) == '\\' || *STATE(In) == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of interest, why do we handle null characters like backslashes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't, we just have to keep reading for both: for null chars, we have to read more data via GetLine(), then start over; and for backslash, we have to see if next comes a lineend, and if so, skip the backslash and the lineend. And in theory, either of these can happen after the other, hence the loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that make sense? Would a better comment help? Suggestions welcome!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this again, I didn't realise that the second part of 201 and 202 are the same case. Maybe change 202 to if(*STATE(In) == 0), which would make it more obvious that is matches the second condition in 201?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do, and will try to write some comments, too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@fingolfin fingolfin force-pushed the mh/line-continuation branch 2 times, most recently from 6a13163 to 40e08f5 Compare April 10, 2018 20:34
... and not in the scanner. This is much simpler, and also ensures
uniform treatment of line continuations everywhere. Several new test
cases are added to demonstrate this.

This leads to one change in behavior: line continuations inside of
triple quoted strings are now handled, while before they would just
insert a backslash followed by a newline into the string. This change
is intentional. A test case is adjusted accordingly.
@ChrisJefferson ChrisJefferson merged commit c926389 into gap-system:master Apr 11, 2018
@fingolfin fingolfin deleted the mh/line-continuation branch April 11, 2018 12:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes: not needed PRs introducing changes that are wholly irrelevant to the release notes topic: kernel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants