Skip to content

Improve JSON payload handling #4128

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 28, 2022
Merged

Conversation

mrkishi
Copy link
Member

@mrkishi mrkishi commented Feb 25, 2022

This fixes security issues related to script elements carrying JSON payloads, while refactoring the internal API so it's less likely to be used incorrectly.

Changes to the internal API

Since escape_json_in_html worked in chunks of data, it was easy to mistake it for a safe method regardless of context. However, the transformed payload wouldn't be useable everywhere in a document. The only place it could safely be placed is inside a <script> tag.

Therefore, instead of escaping values ad-hoc and later interpolating strings to assemble the final script tag, payloads are now kept as objects until the time to render them to HTML. At that point, calling render_json_payload_script will render the whole script element to html, at once.

This change is intended to minimize the risk that escaped values are accidentally mishandled. As a matter of fact, we previously forgot to escape statusText and headers, current attack vectors.

It also uses sveltekit:data-{property} as attribute names for the payload's associated key-value data. While this is not final and is open to bikeshed, it avoids conflicts with user attributes as was previously the case.

Changes to escaped characters

These changes are not strictly required, but since we're now ensuring the payload is in a script tag, it is possible to reduce the amount of escaped characters. Some information:

  • Escaping < and > is not strictly necessary. These have no meaning to HTML parsers inside script elements, with the exception of the literal sequence </script, which could be handled by escaping any of those characters. [1]

  • \u2028 and \u2029 are blips of history. They used to cause issues because they weren't valid characters inside JS strings. However, not only were they always valid in non-js script elements, they are also valid in JS string since the "Subsume JSON" proposal was standardized in ES2019. [2]

  • Care must be taken about HTML comments. Even though <script> is famously a raw text element that will parse "</script>" as a valid closing tag, this is not true inside HTML comments. A rogue <!-- can deactivate parsing until the next -->, after which text will continue to be consumed by the script element. [3]

With this in mind, we can limit the escaped characters to <, but \u2028 and \u2029 are escaped for the sake of older browsers.

[1] https://html.spec.whatwg.org/multipage/syntax.html#cdata-rcdata-restrictions
[2] https://github.com/tc39/proposal-json-superset
[3] https://html.spec.whatwg.org/multipage/parsing.html#script-data-double-escaped-state

Please don't delete this checklist! Before submitting the PR, please make sure you do the following:

  • It's really useful if your PR references an issue where it is discussed ahead of time. In many cases, features are absent for a reason. For large changes, please create an RFC: https://github.com/sveltejs/rfcs
  • This message body should clearly illustrate what problems it solves.
  • Ideally, include a test that fails without this PR but passes with it.

Tests

  • Run the tests with pnpm test and lint the project with pnpm lint and pnpm check

Changesets

  • If your PR makes a change that should be noted in one or more packages' changelogs, generate a changeset by running pnpx changeset and following the prompts. All changesets should be patch until SvelteKit 1.0

This fixes security issues related to script elements carrying JSON
payloads, while refactoring the internal API so it's less likely to be
used incorrectly.

Changes to the internal API
===========================

Since `escape_json_in_html` worked in chunks of data, it was easy to
mistake it for a safe method regardless of context. However, the
transformed payload wouldn't be useable everywhere in a document. The
only place it could safely be placed is inside a `<script>` tag.

Therefore, instead of escaping values ad-hoc and later interpolating
strings to assemble the final script tag, payloads are now kept as
objects until the time to render them to HTML. At that point, calling
`render_json_payload_script` will render the whole script element to
html, at once.

This change is intended to minimize the risk that escaped values are
accidentally mishandled. As a matter of fact, we previously forgot to
escape `statusText` and `headers`, current attack vectors.

It also uses `sveltekit:data-{property}` as attribute names for the
payload's associated key-value data. While this is not final and is open
to bikeshed, it avoids conflicts with user attributes as was previously
the case.

Changes to escaped characters
=============================

These changes are not strictly required, but since we're now ensuring
the payload is in a script tag, it is possible to reduce the amount of
escaped characters. Some information:

 - Escaping `<` and `>` is not strictly necessary. These have no meaning
   to HTML parsers inside script elements, with the exception of the
   literal sequence `</script`, which could be handled by escaping any
   of those characters. [1]

 - `\u2028` and `\u2029` are blips of history. They used to cause issues
   because they weren't valid characters inside JS strings. However, not
   only were they always valid in non-js script elements, they are also
   valid in JS string since the "Subsume JSON" proposal was standardized
   in ES2019. [2]

 - Care must be taken about HTML comments. Even though `<script>` is
   famously a raw text element that will parse `"</script>"` as a valid
   closing tag, this is not true inside HTML comments. A rogue `<!--`
   can deactivate parsing until the next `-->`, after which text will
   continue to be consumed by the script element. [3]

With this in mind, we can limit the escaped characters to `<`.

[1] https://html.spec.whatwg.org/multipage/syntax.html#cdata-rcdata-restrictions
[2] https://github.com/tc39/proposal-json-superset
[3] https://html.spec.whatwg.org/multipage/parsing.html#script-data-double-escaped-state
@changeset-bot
Copy link

changeset-bot bot commented Feb 25, 2022

🦋 Changeset detected

Latest commit: b2fd95e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@sveltejs/kit Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@@ -21,4 +50,5 @@ attr('escapes invalid surrogates', () => {
assert.equal(escape_html_attr('\ud800\ud800\udc00\udc00'), '"&#55296;\ud800\udc00&#56320;"');
});

json.run();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw @Rich-Harris comment on a recent PR that it didn't need to create a test suite. The test setup has changed a lot and I'm not sure under which circumstances a test suite is needed and when it is not needed. Flagging this for his attention and my education 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought suites were just informational groups to help test navigation—without them, all tests go into a single anonymous suite. I'll remove them if that causes issues!

@Rich-Harris Rich-Harris merged commit 915691f into sveltejs:master Feb 28, 2022
@mrkishi mrkishi deleted the json-payload branch February 28, 2022 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants