Skip to content

Conversation

@aepsilon
Copy link
Contributor

Rewrote the scalar dumper with an emphasis on rigor, based on a
careful reading of the YAML 1.2 spec and testing against LibYAML.

Corrections

Block style chomping accounts for the extra newline
added by subsequent elements or at the end.
Fixes #238.

Top-level block scalars are indented.
Fixes dumper side of #221.

Folded style should be reliable now. The old dumper was not aware of
the subtle rules for more-indented lines.
More-indented lines no longer cause extra newlines to be added.
(The old dumper also had an off-by-one error that dropped
the last character from a long first word on a line.)
Fixes #215, #222. Removes #217 workaround.

Literal and folded styles can represent arbitrary "printable" strings,
including leading/trailing whitespace.
(Hopefully this improves readability for many use cases.)

NB. For the 'construct-string-types' dumper test, the strings are
dumped correctly. The test fails because a loader bug
loses a newline when using the keep "+" chomping indicator.

Additional improvements

Added test suite covering several former bugs, new edge cases,
and new expected behaviors. All previous applicable tests are included.

Styles are chosen in a well-defined order.

  • Plain and single are preferred for single lines under the width limit.
  • Folded is preferred when a line is longer than the width limit
    (if enabled) and does not begin with a space.
  • Literal is used for all other multi-line "printable" strings.
  • Double-quoted is only used when a string is unrepresentable otherwise.

Plain style is relaxed to include more characters and strings.
Progress on #183.

As indent increases, line width decreases down to min(lineWidth, 40).
Setting lineWidth to -1 disables wrapping.
Previously there was unexpected behavior past 40 indent:
narrow lineWidth suddenly jumped to 40,
and -1 (no limit) suddenly enforced a limit of 40.

Rewrote the scalar dumper with an emphasis on rigor, based on a
careful reading of the YAML 1.2 spec and testing against LibYAML.

Corrections:

Block style chomping accounts for added the extra newline
added by subsequent elements or at the end.
Fixes nodeca#238.

Top-level block scalars are indented.
Fixes dumper side of nodeca#221.

Folded style should be reliable now. The old dumper was not aware of
the subtle rules for more-indented lines.
More-indented lines no longer cause extra newlines to be added.
(The old dumper also had an off-by-one error that dropped
the last character from a long first word on a line.)
Fixes nodeca#215, nodeca#222. Removes nodeca#217 workaround.

Literal and folded styles can represent arbitrary "printable" strings,
including leading/trailing whitespace.

NB. For the 'construct-string-types' dumper test, the strings are
dumped correctly. The test fails because a loader bug
loses a newline when using the keep "+" chomping indicator.

Additional:

Added test suite covering several former bugs, new edge cases,
and new expected behaviors. All previous applicable tests are included.

Styles are chosen in a well-defined order.
Plain and single are preferred for single lines under the width limit.
Folded is preferred when a line is longer than the width limit
  (if enabled) and does not begin with a space.
Literal is used for all other multi-line "printable" strings.
Double-quoted is only used when a string is unrepresentable otherwise.

Plain style is relaxed to include more characters and strings.
Progress on nodeca#183.

As indent increases, line width decreases down to min(lineWidth, 40).
Setting lineWidth to -1 disables wrapping.
Previously there was unexpected behavior past 40 indent:
narrow lineWidth suddenly jumped to 40,
and -1 (no limit) suddenly enforced a limit of 40.


test('Javascript functions are not folded', function () {
test.skip('Javascript functions are not folded', function () {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are those tests skipped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @dervus expected, the actual problem was the folded style dumper was buggy. It tried to break more-indented lines, which resulted in newlines being added. Folding more-indented lines is not allowed in YAML.

Now that the folder is fixed it should work for all (applicable) string data regardless of content.
This line tests for a workaround that doesn't fix the root problem (see #222).

Although I think it's better that the root problem is fixed, I can add the workaround back in if desired.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, tests were added to check that fns & regexes are not broken. I don't mean tests should stay intact, but those should continue to work. There are no need to disable those cimpletely, for historic reasons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naturally, we want to make sure they continue to work.

The dumper doesn't differentiate between tags anymore. The current test demands no folded style, even though the data is dumped correctly. Keeping the test as-is means putting the workaround back in the dumper.

I could change the test to dump then load several examples of functions and regexes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, update tests to reflect your changes. No workaround in dumper needed.

@puzrin
Copy link
Member

puzrin commented Mar 31, 2016

Very impressing. This week surprised me so much - 2 PRs (here & in pako), impossible without deep understanding of subject by authors.

@dervus could you take a look?

@dervus
Copy link
Collaborator

dervus commented Mar 31, 2016

I can take a look on the weekend. Very busy right now, sorry.

Updated test for nodeca#217.
It now checks that folded style preserves JS functions and regexps.

Replaced String.prototype.repeat with a helper function,
for compatibility with node 0.10.
@aepsilon
Copy link
Contributor Author

aepsilon commented Apr 1, 2016

Thanks for the compliment and the quick responses. I've updated the tests.

Regarding the loader bug, it seems to occur whenever the scalar is anything but trailing empty lines:

> yaml.safeLoad('a: |+\n\n\n')
{ a: '\n\n' }
> yaml.safeLoad('a: |+\n  \n\n')
{ a: '\n\n' }
> yaml.safeLoad('a: |+\n b\n\n')
{ a: 'b\n' }

> yaml.safeLoad('a: >+\n\n\n')
{ a: '\n\n' }
> yaml.safeLoad('a: >+\n  \n\n')
{ a: '\n\n' }
> yaml.safeLoad('a: >+\n b\n\n')
{ a: 'b\n' }

var CHAR_TAB = 0x09; /* Tab */
var CHAR_LINE_FEED = 0x0A; /* LF */
var CHAR_CARRIAGE_RETURN = 0x0D; /* CR */
// var CHAR_CARRIAGE_RETURN = 0x0D; /* CR */ // unused
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason to keep it in any form here if it's not in use. It can be just re-added in future if needed.

@dervus
Copy link
Collaborator

dervus commented Apr 3, 2016

Also assert.equal in tests could be replaced with assert.strictEqual for consistency. But I don't really care about it.

Overall that's very good piece of work. Thank you, @aepsilon!

@puzrin, it's good for merge. However, we need to fix that loader bug before releasing this. I'll try to look into it during next week.

@puzrin puzrin merged commit 4644c6f into nodeca:master Apr 4, 2016
@puzrin
Copy link
Member

puzrin commented Apr 4, 2016

e0c4e4e - updated asserts use everywhere.

@aepsilon aepsilon deleted the rewrite-scalar-dump branch April 4, 2016 06:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants