Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[patch][bug] Escape __PRELOADED_STATE__. Fix for #441 #442

Merged
merged 1 commit into from
Jun 29, 2017

Conversation

bodawei
Copy link
Contributor

@bodawei bodawei commented Jun 27, 2017

This is a fix for issue 441.

This solves the issue by replacing "<"'s in the PRELOADED_STATE string with \u003C . The original blog post that alerted me to this problem pointed at a package ( https://github.com/yahoo/serialize-javascript ) that would do the escaping. It did other things that weren't needed here, and rather than bringing in a 100 line package that did what 5 lines could do, I implemented what was needed here. In the process, I also noted that they were solving another problem, which is that the \u2028 and \u2029 line and paragraph endings will also mess up the expected results, and so I added that here as well.

If you would prefer to pull in the whole serialize package, I'm happy to do that as an alternative.

=============================== Coverage summary ===============================
Statements   : 100% ( 79/79 )
Branches     : 100% ( 46/46 )
Functions    : 100% ( 16/16 )
Lines        : 100% ( 78/78 )
================================================================================

@bodawei
Copy link
Contributor Author

bodawei commented Jun 27, 2017

I think I see what you are saying. I would like to discuss this more.

If I understand you right, you are saying that your model is that people should be escaping these values in their reducers, in case that ends up in the preloaded state. Is that what you mean?

If so, that seems really error prone to me. That is, easy for someone to forget in some circumstance. That is, catching this in one place and handling it universally seems much safer to me.

As for the scripts within the store: Even if they want to do that, wouldn't they still need to be escaping those in some way? Which i think takes one back to this same problem. That is, unless I'm missing something, that doesn't really change the issue here.

I'd not thought of doing an encodeURIComponent() here. The thing I like about going with the \u escape pattern is that one doesn't need to do any manual decoding of the data on the receiving side. the \uXXXX just turns into the character in question when one access the string, so it is transparent to the app..

(I just checked "< script" in safari, and it didn't recognize it as a script tag. But, I agree with you it might be universally safest to allow for that, though it isn't technically valid)

Thanks for your swift response here!

@jchip
Copy link
Member

jchip commented Jun 27, 2017

I agree it should be done in the default stringify this module provides, but it should not alter the data, so encoding is the way to go. Adding an explicit note in the doc about this so users are aware of it would help.

@bodawei
Copy link
Contributor Author

bodawei commented Jun 28, 2017

@jchip I'm not quite sure I understand your comment "should not alter the data". Or, put another way, I'm not sure why using an escape is altering the data, but encoding is not?

Let me step back and explain what I'm interpreting. Let's say I do an encoding scheme:

   const replacers = {
     "<script": encodeURIComponent("<script"),
     "</script>": encodeURIComponent("</script>")
   };
   const encodeScript = function encodeScript(str) {
     const rg = new RegExp("<script|</script>", "g");
     return str.replace(rg, (m) => replacers[m]);
   };
   return `window.__WML_REDUX_INITIAL_STATE__ = ${encodeScript(JSON.stringify(storeState))};`;

and let's say storeState is:

{
  "todos": [
    {
      "id": 0,
      "text": "1",
      "completed": false
    },
    {
      "id": 2,
      "text": "</script><script>console.log('gotcha')</script>",
      "completed": false
    }
  ]
}

doesn't this mean any client components are going to need to do explicitly decode this? e.g.

function MyComponent({ todo }) {
   const usableText = decodeURIComponent(todo.text);  // todo.text[0] === '%'
   ...
}

(or some other portion of the code needs to do the same, and I guess encodeScript() needs to also escape any %'s that are already in there).

To compare, with the escape:

   const replacers = {
     "<script": "\u003cscript",
     "</script>": "\u003c/script>"
   };
   const encodeScript = function encodeScript(str) {
     const rg = new RegExp("<script|</script>", "g");
     return str.replace(rg, (m) => replacers[m]);
   };
   return `window.__WML_REDUX_INITIAL_STATE__ = ${encodeScript(JSON.stringify(storeState))};`;

With that same storeState as above.

Then components/reducers/consumers do not need to do anything, since the <'s are (as soon as the browser parses the page) already <'s

function MyComponent({ todo }) {
   const usableText = todo.text;   // todo.text[0] === '<'
   ...
}

@jchip
Copy link
Member

jchip commented Jun 28, 2017

Hi, thanks for the explanation. I misunderstood your changes. Your code is actually just using double backslash to escape < as a unicode char. That may be better than encoding.

Of course you are right that we have to decode the scripts back when we get them to the client.

I am not clear about escaping the \u2028 and \u2029 part still. Can you give me an example how that mess up the result please?

@bodawei
Copy link
Contributor Author

bodawei commented Jun 28, 2017

Hey, no problem. The double slash is a bit weird at first.. escaping the escape so it gets undone in that order.

As for the \u2028 and \u2029 parts... these are unicode line ending and paragraph ending characters. Evidently, when the browser is parsing the text from the server, JSON.stringify() doesn't escape them, and so the browser interprets these as line breaks and so will break the strings in a way that is illegal. This stack overflow question/answer says it better than I can: https://stackoverflow.com/questions/2965293/javascript-parse-error-on-u2028-unicode-character In practice, for me, if I let a \u2028 or \u2029 through unescaped, I get an error in the browser console on the line where the problematic character is.

@jchip
Copy link
Member

jchip commented Jun 29, 2017

awesome. thank you for your contribution.

@jchip jchip merged commit 161fb83 into electrode-io:master Jun 29, 2017
@bodawei
Copy link
Contributor Author

bodawei commented Jun 29, 2017

And thank you for giving me something I can contribute to! :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants