Skip to content

Conversation

@sekur
Copy link

@sekur sekur commented Oct 24, 2015

fixes an issue where javascript string literals get folded and causing eval issues when some of the lines exceed the hard-coded max line length of 80

…g issues when some of the lines exceed the hard-coded max line length of 80
@puzrin
Copy link
Member

puzrin commented Oct 24, 2015

Could you add test?

@dervus
Copy link
Collaborator

dervus commented Oct 24, 2015

The dumper must know nothing about javascript literals. JS function type as any other non-core type is an optional plugin. I don't like idea of adding such hacks. This must be solved via type metadata.

@dervus
Copy link
Collaborator

dervus commented Oct 24, 2015

Also, I don't understand how folding may cause eval issues. Dumber must serialize data in the way that will be reconstructed without differences, no matter of scalar style. If it doesn't, then it's general serialization bug of folding scalars, no relation to js/function type.

@sekur
Copy link
Author

sekur commented Oct 24, 2015

In my view the max line length of 80 characters being hard-coded in the dumper code without any ability for customizing that value is the real problem. But that will require more extensive changes.

This simply bypasses that logic for serializing js/function strings which was already serialized properly by the Type but arbitrarily "folded" by the dumper logic to some X column width.

I will provide the test scenario as requested but the issue basically happens when a serialized output from Function.toString() contains lines that exceed 80 character length constraint and the folder then arbitrarily wraps that function line to the next. Sometimes it gets lucky but other times it does strange things.

Simplest example is:

var some_long_string = "this is a long string that will eventually exceed the fixed 80 character length";

The fold logic will cut this line into two. JavaScript does not have multi-line string continuation so it will become invalid JavaScript.

Not all scalars can be folded in same ways without it likely breaking syntax. What if you had another YAML scalar as a string in there?

Anyhow I don't see how dumper can properly deal with scalar data without introspecting the type it represents.

@puzrin
Copy link
Member

puzrin commented Oct 24, 2015

It worth checking dumped types to allow folding for strings only, because current method is not safe. For example, it can fail on long RegExp.

@puzrin
Copy link
Member

puzrin commented Oct 24, 2015

Anyway, it worth to make test to demonstrate problem and disable it via it.skip() (to keep suite passing). That can be merged separately.

@dervus
Copy link
Collaborator

dervus commented Oct 24, 2015

Again. Dumper has nothing to do with types. It works with sequences of characters, and it always serializes those without any damage. Yes, for case of js/funciton there could be unpleasant styling damage, but it doesn't affect the correctness of YAML document at all. I agree that some specific types may not want to be represented using some scalar styles, but this must be specified via type metadata, not hardcoded in the dumper.

Also, line length for folded scalars should be configurable. If not, then it seems I overlooked that when I was reviewing the pull request...

@puzrin
Copy link
Member

puzrin commented Oct 24, 2015

@dervus https://github.com/nodeca/js-yaml/blob/master/lib/js-yaml/dumper.js#L259 it's hardcoded, because nobody needed to change it. I postponed api change until someone can prove that it's really needed.

@dervus
Copy link
Collaborator

dervus commented Oct 24, 2015

Hm. I see. Probably I was disagree back then. :D I personally prefer 120 line length limit for example.

@puzrin
Copy link
Member

puzrin commented Oct 24, 2015

I personally prefer 120 line length limit for example.

I doubt you work with yaml files :)

@sekur
Copy link
Author

sekur commented Oct 24, 2015

@dervus - I agree this particular hack to explicitly exclude custom type js/function would not be the ideal resolution but it IS a custom-tag inherently supported by js-YAML currently and the one most affected by this folding logic. And most minimalistic change.

Ideally, the Type definition can have a flag to either opt-in/out to allow wrap during dump. But that will require all custom-types in present and future to define that. I would go for Type to explicitly opt-in to be wrappable rather than otherwise.

But the dump routine (particularly writeScalar) only has state/object/level etc and the object already contains the type.represent so only other variable at hand to use was state.tag

Another option would be to allow dump API to accept custom serialization function.

@dervus
Copy link
Collaborator

dervus commented Oct 24, 2015

Representer function doesn't control how data will be serialized further. It just converts an arbitrary JS object into a YAML primitive: sequence, mapping, or scalar.

@puzrin
Copy link
Member

puzrin commented Nov 11, 2015

@dervus closing as rejected?

@sekur
Copy link
Author

sekur commented Nov 11, 2015

Whether or not this PR itself is rejected the underlying bug should be addressed.

We need a way to have JavaScript functions bypass the folding logic during dump.

I'd be happy to submit a different PR but would like some guidance on how we can make it work without this type of hackish way.

Peter Lee

On Nov 10, 2015, at 10:59 PM, Vitaly Puzrin notifications@github.com wrote:

@dervus closing as rejected?


Reply to this email directly or view it on GitHub.

@puzrin
Copy link
Member

puzrin commented Nov 11, 2015

I'd create a new ticket to cleanup. Rejecting PR does not means rejecting everything - it's a technical process of keeping tracker clean, nothing else. AFAIK, you need to dump FNs as !!js/function type, and make sure that folding is applied to string type only.

@dervus could you help?

@dervus
Copy link
Collaborator

dervus commented Nov 11, 2015

I already describe the solution twice. Type class should be extended with new option to restrict scalar styles on dump. Such option should be used in js/function type definition.

Since solution proposed here is unacceptable — yes, it's closed as rejected.

@dervus dervus closed this Nov 11, 2015
@sekur
Copy link
Author

sekur commented Nov 11, 2015

If we create a new Type option to explicitly specify NOT to apply styling during dump, then the assumption is that by default the folding styling will be applied to other custom Types.

I don't think that's the right assumption - we should expect that the proper string serialization output would be produced by the custom Type "represent" function.

I agree more with Vitaly on this point that the style/folding is only safe with "string" and that's the only type that such folding/style should be applied.

If we DO add a new meta option to Type, then it should be to state that the new custom Type is OK to apply folding/style, not the other way.

Peter Lee

On Nov 11, 2015, at 3:42 AM, Vladimir Zapparov notifications@github.com wrote:

I already describe the solution twice. Type class should be extended with new option to restrict scalar styles on dump. Such option should be used in js/function type definition.

Since solution proposed here is unacceptable — yes, it's closed as rejected.


Reply to this email directly or view it on GitHub.

@dervus
Copy link
Collaborator

dervus commented Nov 11, 2015

I don't think that's the right assumption - we should expect that the proper string serialization output would be produced by the custom Type "represent" function.

No. Represent function is not about styling.

I agree more with Vitaly on this point that the style/folding is only safe with "string" and that's the only type that such folding/style should be applied.

For what reason? Styling does not affect data correctness. Types way be different. Doing any assumptions which styles suites types better would be wrong. For example there could be markup language custom types which prefer folded style while being non-string objects. For that reason we will never make any assumptions. Why? Because dumper must not care about presenting you data pretty. It only care about presenting it correct. Not correct in terms of any custom types, but it terms of YAML. Because we're writing YAML document, not JavaScript, not Python, not Brainfuck, not whatever else.

@sekur
Copy link
Author

sekur commented Nov 11, 2015

Let me ask another way. Why are we applying styling in "dump" for YAML to begin with? What is the reason for folding into some X size column? For readability? But readability should NOT affect the ability to parse it back to the correct form.

If the dump "styling" affects ability to chain output of dump into parse and NOT get the original back, then it IS affecting data correctness.

As you say, yes, it produces "correct" YAML output but nevertheless it still breaks js-yaml parse so if anything I would say don't fold data for style reasons at all because it creates an unnecessary problem.

If users want dump to apply "style" on the output, they should pass that in as an option request when calling dump and the routine should not assume to apply style as a default. As you've stated, we should NOT assume that users calling dump wants any styling to begin with.

Peter Lee

On Nov 11, 2015, at 11:29 AM, Vladimir Zapparov notifications@github.com wrote:

I don't think that's the right assumption - we should expect that the proper string serialization output would be produced by the custom Type "represent" function.

No. Represent function is not about styling.

I agree more with Vitaly on this point that the style/folding is only safe with "string" and that's the only type that such folding/style should be applied.

For what reason? Styling does not affect data correctness. Types way be different. Doing any assumptions which styles suites types better would be wrong. For example there could be markup language custom types which prefer folded style while being non-string objects. For that reason we will never make any assumptions. Why? Because dumper must not care about presenting you data pretty. It only care about presenting it correct. Not correct in terms of any custom types, but it terms of YAML. Because we're writing YAML document, not JavaScript, not Python, not Brainfuck, not whatever else.


Reply to this email directly or view it on GitHub.

@puzrin
Copy link
Member

puzrin commented Nov 11, 2015

I've created #217. It seems enougth to restrict folding to strings only. No need to overcomplicate things.

@sekur
Copy link
Author

sekur commented Nov 11, 2015

@puzrin, thanks for getting this issue resolved. This will allow me to unbundle js-yaml dep inside my module and get back to life. :-)

aepsilon added a commit to aepsilon/js-yaml that referenced this pull request Mar 31, 2016
Rewrote the scalar dumper with an emphasis on rigor, based on a
careful reading of the YAML 1.2 spec and testing against LibYAML.

Corrections:

Block style chomping accounts for added the extra newline
added by subsequent elements or at the end.
Fixes nodeca#238.

Top-level block scalars are indented.
Fixes dumper side of nodeca#221.

Folded style should be reliable now. The old dumper was not aware of
the subtle rules for more-indented lines.
More-indented lines no longer cause extra newlines to be added.
(The old dumper also had an off-by-one error that dropped
the last character from a long first word on a line.)
Fixes nodeca#215, nodeca#222. Removes nodeca#217 workaround.

Literal and folded styles can represent arbitrary "printable" strings,
including leading/trailing whitespace.

NB. For the 'construct-string-types' dumper test, the strings are
dumped correctly. The test fails because a loader bug
loses a newline when using the keep "+" chomping indicator.

Additional:

Added test suite covering several former bugs, new edge cases,
and new expected behaviors. All previous applicable tests are included.

Styles are chosen in a well-defined order.
Plain and single are preferred for single lines under the width limit.
Folded is preferred when a line is longer than the width limit
  (if enabled) and does not begin with a space.
Literal is used for all other multi-line "printable" strings.
Double-quoted is only used when a string is unrepresentable otherwise.

Plain style is relaxed to include more characters and strings.
Progress on nodeca#183.

As indent increases, line width decreases down to min(lineWidth, 40).
Setting lineWidth to -1 disables wrapping.
Previously there was unexpected behavior past 40 indent:
narrow lineWidth suddenly jumped to 40,
and -1 (no limit) suddenly enforced a limit of 40.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants