-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve stringify_string performance #67
base: main
Are you sure you want to change the base?
Conversation
if (!escape_chars.test(str)){ | ||
return `"${str}"`; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remove this part, some test fails. But the reason is because JSON.stringify
escapes more characters and uses lower case escapes (e.g. \udc00
instead of \uDC00
). There should be no essential problem.
The generated string would be slightly longer but I guess that's fine because in the previous version of the tests, it was expecting the escaped ones (changed by 5530e84#diff-a561630bb56b82342bc66697aee2ad96efddcbc9d150665abd6fb7ecb7c0ab2f).
Thank you! The performance is very dependent on the inputs. If I try shorter strings, this PR makes things much slower: https://jsbench.me/9zlv6vfa6c/2. I'm not sure if there's a way to get the best of both worlds (without introducing a ton of complexity) but I'm wary of accidentally making things slower for a lot of people. On the flip side of course we probably need to care about making existing slower cases faster more than we need to care about regressions in existing fast cases. But it would be good to have a more concrete grasp of the trade-offs as they relate to typical workloads. |
That's a good point. I've took some benchmarks with different input lengths. If the input length is longer than 100, the new algorithm is faster than the original one (except for the input that only contains benchmark results
In my workload, the input string doesn't contain const re = /[<\u2028\u2029]/
export function stringify_string(str) {
if (str.length < 100 || re.test(str)) {
return original_stringify_string(str)
}
return new_stringify_string(str)
} |
@sapphi-red return reg.test(str) ? JSON.stringify(str)
.replaceAll('<', '\\u003C')
.replaceAll('\u2028', '\\u2028')
.replaceAll('\u2029', '\\u2029')
: `"${str}"`; I think someone on the v8 team said that the callback based replace is much slower than a string. |
It seems the non-callback based replace is faster for long strings but slower for short strings. benchmark resultusing callback based replace
not using callback based replace
|
Ahh, sorry, replacing the '<' first increases the length of the string 5x in the pathological case so that compounds the slowdown
|
We probably need to keep both the original function and this one around, because stringify_string is also used to escape object keys, which I expect would be in the Edit: or not, because object keys usually don't have chars that need this escaping |
Ah, didn't think about that. Yeah that improves the perf. benchmark result
I assume yes. |
This PR improves
stringify_string
performance.Description
The ECMAScript spec specifies that
JSON.stringify
will escape"
,\\
,\n
,\r
,\t
,\b
,\f
, and chars less than space.https://tc39.es/ecma262/multipage/structured-data.html#sec-quotejsonstring
This is also specified in the ES5.1 so the compatibility should be fine.
benchmark numbers
(This PR / Original)
'long'.repeat(1000)
'long2\n'.repeat(1000)
'"\n"'.repeat(1000)
'<'.repeat(1000)
'long'.repeat(1000) + '\n'
(ops/s)
https://jsbench.me/25lm0vndix/1
This PR degrades performance on the
'<'.repeat(1000)
case but I assume a string like that won't appear frequently.I guess this PR would only be slow if a string contained
<
/\u2028
/\u2029
many times.