Use SWAR for parsing integers on little endian machines. #878

samyron · 2025-10-31T02:33:02Z

This PR uses SWAR on little-endian architectures to recognize and parse eight consecutive ASCII digits at a time. In the benchmarks below, this improves performance when parsing long (but not too long) integers and floats.
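For readers unfamiliar with the technique, here is a minimal sketch of how eight ASCII digits can be converted with three multiplications. It is the widely circulated SWAR formulation, not necessarily the exact code in this PR; the function name `swar_parse_eight_digits` is hypothetical, and the sketch assumes a little-endian target and that the caller has already verified all eight bytes are digits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from the PR. Assumes little-endian byte
 * order and that p points at eight bytes in '0'..'9'. */
static inline uint32_t swar_parse_eight_digits(const char *p) {
    assert(p != NULL);
    uint64_t val;
    memcpy(&val, p, sizeof(val));

    /* Keep the low nibble of each byte (the digit value), then merge
     * neighbouring digits into two-digit numbers: 10 * d0 + d1, ... */
    val = ((val & 0x0F0F0F0F0F0F0F0FULL) * 2561) >> 8;
    /* Merge neighbouring two-digit numbers into four-digit numbers. */
    val = ((val & 0x00FF00FF00FF00FFULL) * 6553601) >> 16;
    /* Merge the two four-digit halves into the final eight-digit value. */
    val = ((val & 0x0000FFFF0000FFFFULL) * 42949672960001ULL) >> 32;
    return (uint32_t)val;
}
```

Each multiplier has the form `base * 2^k + 1` (2561 = 10 * 256 + 1, 6553601 = 100 * 65536 + 1, 42949672960001 = 10000 * 2^32 + 1), so one multiply both scales the more significant lane and adds its neighbour. For example, `swar_parse_eight_digits("12345678")` yields 12345678.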

The unsigned 32-bit integer parsing data was created with the following:

File.write("integers-rand-unsigned-32bits.json", JSON.generate((1..10000).map { rand(4294967295) }))

The integer parsing data was created with the following:

File.write("integers.json", JSON.generate((1..10000).map { rand(18446744073709551615) }))

The benchmarks below were run on a MacBook Air M1.

This branch compared to master

Run 1

== Parsing float parsing (2251051 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    17.000 i/100ms
Calculating -------------------------------------
               after    186.020 (± 1.6%) i/s    (5.38 ms/i) -    935.000 in   5.027877s

Comparison:
              before:      164.3 i/s
               after:      186.0 i/s - 1.13x  faster


== Parsing integer parsing (204025 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   123.000 i/100ms
Calculating -------------------------------------
               after      1.248k (± 0.9%) i/s  (801.04 μs/i) -      6.273k in   5.025278s

Comparison:
              before:     1167.0 i/s
               after:     1248.4 i/s - 1.07x  faster


== Parsing unsigned 32 bit integer parsing (107355 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   862.000 i/100ms
Calculating -------------------------------------
               after      8.593k (± 1.4%) i/s  (116.38 μs/i) -     43.100k in   5.016926s

Comparison:
              before:     6805.6 i/s
               after:     8592.6 i/s - 1.26x  faster

Run 2

== Parsing float parsing (2251051 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    17.000 i/100ms
Calculating -------------------------------------
               after    183.029 (± 2.2%) i/s    (5.46 ms/i) -    918.000 in   5.017972s

Comparison:
              before:      160.3 i/s
               after:      183.0 i/s - 1.14x  faster


== Parsing integer parsing (204025 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   123.000 i/100ms
Calculating -------------------------------------
               after      1.234k (± 5.4%) i/s  (810.62 μs/i) -      6.150k in   5.006831s

Comparison:
              before:     1139.1 i/s
               after:     1233.6 i/s - same-ish: difference falls within error


== Parsing unsigned 32 bit integer parsing (107355 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after   847.000 i/100ms
Calculating -------------------------------------
               after      8.534k (± 1.7%) i/s  (117.17 μs/i) -     43.197k in   5.063015s

Comparison:
              before:     6760.0 i/s
               after:     8534.5 i/s - 1.26x  faster

This branch compared to other libraries

== Parsing float parsing (2251051 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json    17.000 i/100ms
          json_coder    18.000 i/100ms
                  oj     2.000 i/100ms
          Oj::Parser     2.000 i/100ms
           rapidjson    20.000 i/100ms
Calculating -------------------------------------
                json    182.670 (± 0.5%) i/s    (5.47 ms/i) -    918.000 in   5.025684s
          json_coder    189.703 (± 0.5%) i/s    (5.27 ms/i) -    954.000 in   5.029077s
                  oj     24.049 (± 0.0%) i/s   (41.58 ms/i) -    122.000 in   5.073862s
          Oj::Parser     28.384 (± 0.0%) i/s   (35.23 ms/i) -    142.000 in   5.003921s
           rapidjson    219.182 (± 0.9%) i/s    (4.56 ms/i) -      1.100k in   5.019120s

Comparison:
                json:      182.7 i/s
           rapidjson:      219.2 i/s - 1.20x  faster
          json_coder:      189.7 i/s - 1.04x  faster
          Oj::Parser:       28.4 i/s - 6.44x  slower
                  oj:       24.0 i/s - 7.60x  slower


== Parsing integer parsing (204025 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   122.000 i/100ms
          json_coder   124.000 i/100ms
                  oj    97.000 i/100ms
          Oj::Parser    28.000 i/100ms
           rapidjson   214.000 i/100ms
Calculating -------------------------------------
                json      1.233k (± 0.6%) i/s  (811.23 μs/i) -      6.222k in   5.047603s
          json_coder      1.237k (± 0.7%) i/s  (808.46 μs/i) -      6.200k in   5.012715s
                  oj    971.156 (± 0.6%) i/s    (1.03 ms/i) -      4.947k in   5.094138s
          Oj::Parser    280.331 (± 1.4%) i/s    (3.57 ms/i) -      1.428k in   5.094857s
           rapidjson      2.166k (± 0.6%) i/s  (461.78 μs/i) -     10.914k in   5.040056s

Comparison:
                json:     1232.7 i/s
           rapidjson:     2165.5 i/s - 1.76x  faster
          json_coder:     1236.9 i/s - same-ish: difference falls within error
                  oj:      971.2 i/s - 1.27x  slower
          Oj::Parser:      280.3 i/s - 4.40x  slower


== Parsing unsigned 32 bit integer parsing (107355 bytes)
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [arm64-darwin24]
Warming up --------------------------------------
                json   860.000 i/100ms
          json_coder   868.000 i/100ms
                  oj   390.000 i/100ms
          Oj::Parser   737.000 i/100ms
           rapidjson   565.000 i/100ms
Calculating -------------------------------------
                json      8.353k (± 5.2%) i/s  (119.72 μs/i) -     42.140k in   5.061476s
          json_coder      8.508k (± 3.2%) i/s  (117.54 μs/i) -     43.400k in   5.106714s
                  oj      3.789k (± 2.4%) i/s  (263.90 μs/i) -     19.110k in   5.046051s
          Oj::Parser      7.611k (± 4.0%) i/s  (131.39 μs/i) -     38.324k in   5.043984s
           rapidjson      5.673k (± 1.8%) i/s  (176.26 μs/i) -     28.815k in   5.080640s

Comparison:
                json:     8352.7 i/s
          json_coder:     8507.9 i/s - same-ish: difference falls within error
          Oj::Parser:     7610.9 i/s - 1.10x  slower
           rapidjson:     5673.3 i/s - 1.47x  slower
                  oj:     3789.3 i/s - 2.20x  slower

byroot

I like this a lot. But I think I'll refactor it a bit to reduce duplication, and potentially improve code generation.

byroot · 2025-10-31T07:40:06Z

ext/json/ext/parser/parser.c

+static inline int has_eight_consecutive_digits(const char *p) {
+    uint64_t val;
+    memcpy(&val, p, sizeof(uint64_t));
+    return (((val & 0xF0F0F0F0F0F0F0F0) | (((val + 0x0606060606060606) & 0xF0F0F0F0F0F0F0F0) >> 4)) == 0x3333333333333333);


I'm thinking we could use that trick combined with clz (or similar) to know how many consecutive digits we have.
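A rough sketch of that idea, under some assumptions: the helper name `count_leading_digits8` is hypothetical, the target is little-endian, at least 8 bytes are readable at `p`, and the compiler provides the GCC/Clang builtin `__builtin_ctzll`. On little-endian the first character sits in the least significant byte, so the sketch counts trailing zeros rather than leading zeros.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the PR: number of leading ASCII digits
 * (0..8) in the 8 bytes at p. */
static inline int count_leading_digits8(const char *p) {
    assert(p != NULL);
    uint64_t val;
    memcpy(&val, p, sizeof(val));

    /* Per byte b: high nibble of b, then high nibble of b + 6.
     * Both are 3 only when b is in '0'..'9' (0x30..0x39). */
    const uint64_t comp = (val & 0xF0F0F0F0F0F0F0F0ULL) |
        (((val + 0x0606060606060606ULL) & 0xF0F0F0F0F0F0F0F0ULL) >> 4);

    /* Any non-zero byte in diff marks a non-digit character. */
    const uint64_t diff = comp ^ 0x3333333333333333ULL;
    if (diff == 0) return 8;

    /* The lowest non-zero byte is the first non-digit. */
    return __builtin_ctzll(diff) / 8;
}
```

The addition can only carry out of a byte that is 0xFA or larger, which is already a non-digit, and carries only move toward higher bytes, so the position of the first mismatch is not affected. For instance, `count_leading_digits8("123,4567")` returns 3.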

I suspect 8 consecutive digits aren't that common, but if we also had a 4-digit (uint32_t) version and a fast dispatch, that could help on more benchmarks.

Actually, I think we can simply do (comp & 0xFFFFFFFF) == 0x33333333 to check for 4 consecutive digits, where comp is the combined value compared against 0x3333333333333333 above.
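To illustrate, here is a small sketch that computes the combined value once and reuses it for both widths. The helper name `leading_digit_block` and its 8/4/0 return convention are assumptions made for this example, not part of the PR; it also assumes a little-endian target and 8 readable bytes at `p`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical dispatch helper: returns 8 if all eight bytes at p are
 * ASCII digits, 4 if only the first four are known to be, 0 otherwise. */
static inline int leading_digit_block(const char *p) {
    assert(p != NULL);
    uint64_t val;
    memcpy(&val, p, sizeof(val));

    /* Same combined value as in has_eight_consecutive_digits. */
    const uint64_t comp = (val & 0xF0F0F0F0F0F0F0F0ULL) |
        (((val + 0x0606060606060606ULL) & 0xF0F0F0F0F0F0F0F0ULL) >> 4);

    if (comp == 0x3333333333333333ULL) return 8;
    /* On little-endian, the low 32 bits hold the first four characters. */
    if ((comp & 0xFFFFFFFFULL) == 0x33333333ULL) return 4;
    return 0;
}
```

A caller could use the result to pick between an 8-digit parse, a 4-digit parse, and a byte-at-a-time fallback. Note that a return of 4 only says the first four bytes are digits; bytes five to seven may still be digits as well.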

4-byte version:

static inline uint32_t parse_four_digits_unrolled(const char *p) {
    uint64_t large_val;
    memcpy(&large_val, p, sizeof(uint64_t));
    uint32_t val = (uint32_t)large_val;

    const uint32_t mask = 0x000000FF;
    const uint32_t mul1 = 100;
    val -= 0x30303030;
    val = (val * 10) + (val >> 8); // val = (val * 2561) >> 8;
    val = ((val & mask) * mul1) + (((val >> 16) & mask));
    return (uint32_t)val;
}

Closes: ruby#878

```
== Parsing float parsing (2251051 bytes)
ruby 3.4.6 (2025-09-16 revision dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    23.000 i/100ms
Calculating -------------------------------------
               after    214.382 (± 0.5%) i/s    (4.66 ms/i) -      1.081k in   5.042555s

Comparison:
              before:      189.5 i/s
               after:      214.4 i/s - 1.13x  faster
```

Co-Authored-By: Scott Myron <samyron@gmail.com>

Closes: ruby/json#878

```
== Parsing float parsing (2251051 bytes)
ruby 3.4.6 (2025-09-16 revision ruby/json@dbd83256b1) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    23.000 i/100ms
Calculating -------------------------------------
               after    214.382 (± 0.5%) i/s    (4.66 ms/i) -      1.081k in   5.042555s

Comparison:
              before:      189.5 i/s
               after:      214.4 i/s - 1.13x  faster
```

ruby/json@6348ff0891

Co-Authored-By: Scott Myron <samyron@gmail.com>

samyron and others added 2 commits October 31, 2025 08:24

Use SWAR for parsing integers on little endian machines.

fefb3ce

Add test coverage for T_BIGNUM parsing

5b76dc4

byroot force-pushed the sm/swar-integer-parsing branch from a9c9a22 to 5b76dc4 Compare October 31, 2025 07:31

byroot reviewed Oct 31, 2025


byroot mentioned this pull request Nov 1, 2025

Refactor number parsing #883

Merged

byroot mentioned this pull request Nov 1, 2025

Use SWAR for parsing integers on little endian machines #885

Merged

byroot closed this in #885 Nov 1, 2025
