Add Data.Aeson.Decoding.Text, decodeStrictText :: Text -> ... #1072

Merged: 1 commit, Oct 3, 2023

@phadej (Collaborator) commented Oct 3, 2023

Rather than doing `decode . TE.encodeUtf8`, we work on the `Text` value directly, which avoids an intermediate `ByteString` copy. Since we know the input is valid Unicode (UTF-8 or UTF-16), we can also take some shortcuts.
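To illustrate the kind of shortcut that working on `Text` directly allows, here is a minimal sketch using a hypothetical `scanSimpleString` helper (not aeson's actual code or API). It scans an escape-free JSON string literal straight from a `Text`: because the input is already valid Unicode, no UTF-8 validation is needed, and the result is a slice of the input rather than a fresh copy. Literals with escapes are simply rejected here, where a real parser would fall back to an unescaping path.

```haskell
import qualified Data.Text as T
import Data.Text (Text)

-- | Hypothetical helper (not part of aeson): scan a JSON string literal that
-- contains no escape sequences, working directly on the 'Text' input.
--
-- Since the input is already valid Unicode, there is no UTF-8 validation or
-- decoding step: the string contents are returned as a slice of the input
-- (sharing its underlying array), together with the remaining input.
-- Literals containing a backslash or an unescaped control character are
-- rejected; a real parser would fall back to a slower unescaping path.
scanSimpleString :: Text -> Maybe (Text, Text)
scanSimpleString input = do
  body <- T.stripPrefix (T.pack "\"") input
  -- Stop at the closing quote, an escape, or a control character.
  let (contents, after) = T.break (\c -> c == '"' || c == '\\' || c < '\x20') body
  -- Only a closing quote is accepted here.
  rest <- T.stripPrefix (T.pack "\"") after
  pure (contents, rest)
```

For example, `scanSimpleString (T.pack "\"key\": 1")` evaluates to `Just ("key", ": 1")`. The returned `"key"` is a slice, so it keeps the whole input array alive, which is exactly the retention issue described next.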

One gotcha is that internal `Text` values (in `Key`s or `String` values) most likely retain the original input `Text` value (its underlying `Array`). This shouldn't be an issue if the `Value` is actually decoded further, so that these `Text` values disappear; but if not (e.g. `Object` keys survive), users might want to use `Data.Text.copy`.
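A minimal sketch of the `Data.Text.copy` workaround, assuming for illustration that the surviving object is a plain `Map Text v`. aeson's own `Object` is a `KeyMap` of `Key`s, so the `ObjectLike` synonym and `compactKeys` function below are hypothetical stand-ins, not aeson API.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Text as T
import Data.Text (Text)

-- | Hypothetical stand-in for a decoded JSON object whose keys may be
-- slices of the original input 'Text' (aeson itself uses 'KeyMap'/'Key').
type ObjectLike v = Map.Map Text v

-- | Give every key its own compact array, so that keeping the object alive
-- no longer keeps the whole (possibly large) input 'Text' alive.
-- 'T.copy' does not change a key's value, so key order is preserved and
-- 'Map.mapKeysMonotonic' (which skips re-sorting) is safe to use.
compactKeys :: ObjectLike v -> ObjectLike v
compactKeys = Map.mapKeysMonotonic T.copy
```

Copying costs one allocation per key, so it only pays off when the keys outlive the input they were parsed from.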

With GHC 9.6.2 (text-2.0.2; UTF-8) the speedup is not huge, but noticeable (`aeson/text` takes roughly 14% less time than `aeson/strict` and 16% less than `aeson/text-via-bs`):

    aeson/strict:                 OK (0.26s)
      462  μs ±  23 μs
    aeson/text:                   OK (0.22s)
      399  μs ±  25 μs
    aeson/text-via-bs:            OK (0.14s)
      473  μs ±  45 μs

With GHC 8.6.5 (text-1.2.3.0; UTF-16) the relative speedup is larger (roughly 28% less time than `aeson/strict` and 32% less than `aeson/text-via-bs`):

    aeson/strict:                 OK (0.22s)
      819  μs ±  74 μs
    aeson/text:                   OK (0.17s)
      593  μs ±  46 μs
    aeson/text-via-bs:            OK (0.23s)
      875  μs ±  62 μs

@phadej merged commit b438e32 into master on Oct 3, 2023 (9 checks passed).
@phadej deleted the decoding-text branch on October 3, 2023 19:18.