noteflakes
The existing html_escape implementation always allocates buffer space (6 times
the length of the input string), even when the input string does not contain any
character that needs to be escaped.

This PR modifies the implementation of optimized_escape_html so that it no
longer pre-allocates an output buffer, and instead allocates it on the first
occurrence of a character that needs escaping. In addition, instead of copying
non-escaped characters one by one to the output buffer, contiguous segments of
non-escaped characters are copied using memcpy.
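
For readers who want to see the idea outside the extension, here is a minimal standalone sketch of the two techniques (lazy allocation plus segment-wise memcpy). It is not the code in this PR: `escape_html_lazy` and `escape_for` are hypothetical names, it uses plain `malloc` rather than Ruby's allocation helpers, and the escape table (`&`, `<`, `>`, `"`, `'`) is an assumption for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the replacement for c and stores its length
 * in *len, or returns NULL if c does not need escaping. */
static const char *escape_for(char c, size_t *len)
{
    switch (c) {
      case '&':  *len = 5; return "&amp;";
      case '<':  *len = 4; return "&lt;";
      case '>':  *len = 4; return "&gt;";
      case '"':  *len = 6; return "&quot;";
      case '\'': *len = 5; return "&#39;";
      default:   return NULL;
    }
}

/* Hypothetical sketch, not the ERB function.
 * Returns 0 if nothing needed escaping (*out is NULL, no allocation made),
 * 1 if *out holds a malloc'd, NUL-terminated escaped copy of length *out_len,
 * or -1 if allocation failed. Overflow of len * 6 is not checked here. */
int escape_html_lazy(const char *src, size_t len, char **out, size_t *out_len)
{
    char *buf = NULL;      /* allocated only when the first escapable char is seen */
    char *dest = NULL;     /* write cursor into buf */
    size_t seg_start = 0;  /* start of the pending non-escaped segment */

    for (size_t i = 0; i < len; i++) {
        size_t esc_len = 0;
        const char *esc = escape_for(src[i], &esc_len);
        if (esc == NULL) continue;  /* keep extending the current segment */

        if (buf == NULL) {
            /* Worst case: every byte expands to 6 bytes, plus the NUL. */
            buf = malloc(len * 6 + 1);
            if (buf == NULL) return -1;
            dest = buf;
        }

        /* Copy the non-escaped segment before this character in one call. */
        size_t seg_len = i - seg_start;
        memcpy(dest, src + seg_start, seg_len);
        dest += seg_len;

        /* Append the replacement for the escaped character. */
        memcpy(dest, esc, esc_len);
        dest += esc_len;
        seg_start = i + 1;
    }

    if (buf == NULL) {
        /* Fast path: no escapable character, so no buffer was allocated. */
        *out = NULL;
        *out_len = len;
        return 0;
    }

    /* Copy the trailing non-escaped segment and terminate. */
    memcpy(dest, src + seg_start, len - seg_start);
    dest += len - seg_start;
    *dest = '\0';
    *out = buf;
    *out_len = (size_t)(dest - buf);
    return 1;
}
```

A caller would treat a return value of 0 as "reuse the input string as-is" and free `*out` only when the function returns 1. In this sketch the fast path costs a single scan of the input and no allocation.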

A synthetic benchmark employing the input strings used in the test_html_escape
method in test/test_erb.rb shows the modified implementation to be about 35%
faster than the original:

ruby 3.5.0preview1 (2025-04-18 master d06ec25be4) +YJIT +PRISM [x86_64-linux]
Warming up --------------------------------------
          escape old   273.773k i/100ms
          escape new   369.558k i/100ms
Calculating -------------------------------------
          escape old      2.766M (± 1.6%) i/s  (361.48 ns/i) -     13.962M in   5.048625s
          escape new      3.765M (± 2.0%) i/s  (265.58 ns/i) -     18.847M in   5.007869s

Comparison:
          escape old:  2766396.0 i/s
          escape new:  3765317.7 i/s - 1.36x  faster

…aracter

This change reduces allocations and makes `html_escape` ~35% faster in a
benchmark using the input strings from the `test_html_escape` test in
`test/test_erb.rb`.

- Perform buffer allocation on first instance of escapable character.
- Instead of copying characters one at a time, copy unescaped segments using
  `memcpy`.