Improve JSON str encode performance #459

Merged (1 commit), Jun 28, 2023
@jcrist (Owner) commented Jun 28, 2023

This improves the performance of encoding JSON strings, mostly by rearranging some work and unrolling some loops.
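As a rough illustration of the techniques named above, here is a table-driven JSON string encoder with an unrolled scan loop. This is a hypothetical pure-Python sketch, not msgspec's C implementation. `ESCAPE_TABLE`, `_escape`, and `encode_str` are names invented for it, and unrolling gains nothing in pure Python. It only shows the structure: copy runs of safe characters in bulk, check several characters per iteration, and fall back to one character at a time near an escape.

```python
# Hypothetical sketch, NOT msgspec's implementation: a table-driven JSON
# string encoder with a manually unrolled scan loop.

# ESCAPE_TABLE[c] is None if ASCII code point c can be copied verbatim,
# otherwise the escape sequence JSON requires in its place.
ESCAPE_TABLE = [None] * 128
for _c in range(0x20):
    ESCAPE_TABLE[_c] = "\\u%04x" % _c  # generic \u00XX form for control chars
ESCAPE_TABLE[ord("\n")] = "\\n"
ESCAPE_TABLE[ord("\r")] = "\\r"
ESCAPE_TABLE[ord("\t")] = "\\t"
ESCAPE_TABLE[ord("\b")] = "\\b"
ESCAPE_TABLE[ord("\f")] = "\\f"
ESCAPE_TABLE[ord('"')] = '\\"'
ESCAPE_TABLE[ord("\\")] = "\\\\"


def _escape(ch):
    """Return the escape sequence for ch, or None if it needs no escaping."""
    o = ord(ch)
    # Non-ASCII characters are emitted as-is (UTF-8 output, no \u escapes).
    return ESCAPE_TABLE[o] if o < 128 else None


def encode_str(s: str) -> str:
    out = ['"']
    start = 0  # start of the current run of characters needing no escape
    i = 0
    n = len(s)
    while i < n:
        # Unrolled: test 4 characters per iteration while 4 remain.
        if i + 4 <= n:
            if (_escape(s[i]) is None and _escape(s[i + 1]) is None
                    and _escape(s[i + 2]) is None and _escape(s[i + 3]) is None):
                i += 4
                continue
        # Otherwise advance one character at a time.
        esc = _escape(s[i])
        if esc is None:
            i += 1
            continue
        # Flush the pending run of safe characters, then emit the escape.
        out.append(s[start:i])
        out.append(esc)
        i += 1
        start = i
    out.append(s[start:])  # trailing run of safe characters
    out.append('"')
    return "".join(out)
```

For these inputs the output matches the standard library's `json.dumps(s, ensure_ascii=False)`. For example, `encode_str('a\nb"c')` returns `'"a\\nb\\"c"'`.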

@jcrist (Owner, Author) commented Jun 28, 2023

A quick benchmark of string encoding performance, comparing msgspec against orjson:

```python
import random
import string
from time import perf_counter_ns

from orjson import dumps
from msgspec.json import encode

# Strings drawn only from ASCII letters never need escaping.
CHARS = string.ascii_letters
# Adding \x01, \n, and " means some characters must be escaped.
CHARS_WITH_ESCAPES = "\x01\n'\"" + string.ascii_letters


def randstr(chars):
    # Random string of 2-64 characters drawn from `chars`
    k = random.randint(2, 64)
    return "".join(random.choices(chars, k=k))


N = 1000  # strings per encoded list
M = 10000  # encode calls per timing loop
random.seed(42)

for header, chars in [("No Escapes:", CHARS), ("With Escapes:", CHARS_WITH_ESCAPES)]:
    print(header)
    data = [randstr(chars) for _ in range(N)]

    start = perf_counter_ns()
    for _ in range(M):
        encode(data)
    stop = perf_counter_ns()
    # Average time per call, converted from ns to us
    print(f"- msgspec: {(stop - start) / (1000 * M):.1f} us")

    start = perf_counter_ns()
    for _ in range(M):
        dumps(data)
    stop = perf_counter_ns()
    print(f"- orjson: {(stop - start) / (1000 * M):.1f} us")
```

Before

```
$ python bench.py
No Escapes:
- msgspec: 29.7 us
- orjson: 21.5 us
With Escapes:
- msgspec: 53.1 us
- orjson: 67.0 us
```

This PR

```
$ python bench.py
No Escapes:
- msgspec: 18.6 us
- orjson: 22.0 us
With Escapes:
- msgspec: 34.2 us
- orjson: 68.1 us
```
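The relative speedups can be recomputed directly from the two runs above. This small sketch takes orjson's timings from the second run, since orjson's numbers vary slightly between runs (21.5 vs. 22.0 us, 67.0 vs. 68.1 us). `results` and `speedups` are illustrative names, not part of either library.

```python
# Timings (microseconds per encode call) copied from the benchmark runs above.
results = {
    "No Escapes": {"msgspec_before": 29.7, "msgspec_after": 18.6, "orjson": 22.0},
    "With Escapes": {"msgspec_before": 53.1, "msgspec_after": 34.2, "orjson": 68.1},
}


def speedups(r):
    """Return (msgspec after-vs-before speedup, orjson time / new msgspec time)."""
    return r["msgspec_before"] / r["msgspec_after"], r["orjson"] / r["msgspec_after"]


for name, r in results.items():
    self_speedup, vs_orjson = speedups(r)
    print(f"{name}: {self_speedup:.2f}x vs. before, {vs_orjson:.2f}x vs. orjson")
```

This prints 1.60x and 1.18x for the no-escape case, and 1.55x and 1.99x for the escape case.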

Notes:

  • This PR makes msgspec ~1.6x faster for encoding all strings (from 29.7 us to 18.6 us with no escapes, and from 53.1 us to 34.2 us with escapes)
  • orjson is optimized towards the common case of encoding strings that don't contain any characters needing escaping (", \, and the control characters \x00 - \x1f, such as \n, \t, \r, \f, and \b). For strings that do need escape codes, orjson's performance tanks. msgspec was already faster than orjson for encoding strings that require escape codes; now we're ~2x faster in that case (34.2 us vs. 68.1 us). We're also now faster in the common no-escape case (18.6 us vs. 22.0 us).
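A minimal sketch of what "optimized towards the common case" can look like: first check whether the string contains anything that needs escaping and, if not, copy it through verbatim; otherwise take a slower per-character path. This is a generic Python illustration, not orjson's or msgspec's actual code. `encode_str_fast_path`, `_escape_match`, and the other names are hypothetical.

```python
import re

# Characters JSON (RFC 8259) requires to be escaped inside a string:
# the double quote, the backslash, and the control characters U+0000-U+001F.
_NEEDS_ESCAPE = re.compile(r'["\\\x00-\x1f]')

# Short escape forms; any other control character uses \u00XX.
_SHORT = {'"': '\\"', "\\": "\\\\", "\n": "\\n", "\r": "\\r",
          "\t": "\\t", "\b": "\\b", "\f": "\\f"}


def _escape_match(m):
    ch = m.group(0)
    return _SHORT.get(ch, "\\u%04x" % ord(ch))


def encode_str_fast_path(s: str) -> str:
    """Hypothetical encoder structured around the common no-escape case."""
    if _NEEDS_ESCAPE.search(s) is None:
        # Fast path: nothing to escape, so copy the string verbatim.
        return '"' + s + '"'
    # Slow path: replace each offending character with its escape.
    return '"' + _NEEDS_ESCAPE.sub(_escape_match, s) + '"'
```

A single quote such as the one in `"it's"` takes the fast path, since JSON never requires it to be escaped.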

Even though this PR unrolls some loops, it actually results in a (negligible) decrease in binary size since we delete a non-inlined helper function for handling the escape code case.

@jcrist merged commit e793b50 into main on Jun 28, 2023
@jcrist deleted the optimize-json-str-encoding branch on June 28, 2023 03:10