
Speed up parsing notation LLSD #1


Merged: 11 commits into secondlife:main, Nov 17, 2022

Conversation

@SaladDais (Contributor) commented Sep 28, 2022

Thanks for splitting out this code!

As part of working with the LEAP API, I need to parse very large notation LLSD payloads containing a large number of strings. Parsing those payloads is currently heavily bottlenecked by the code handling delimited strings.

For example, invoking the LLWindowListener::getPaths() method over LEAP returns a 5.5 MB blob of LLSD (assuming you're logged in with a relatively bare inventory), and that payload takes 2.5s to parse with llsd.parse(). Since latency is important when dealing with LEAP, I looked at how ujson implements its delimited string handling to see how llsd's string handling logic could be improved.

I ended up:

  • Removing use of self._getc() and reading from self._buffer directly. This was the most significant improvement: since we need to read a byte at a time, creating single-byte byte strings is inefficient. Not going through a potentially overridden self._getc() shouldn't be an issue, since LLSDBinaryParser has done manual manipulation of self._index for a long time now. Basically, self._getc() must operate on self._buffer and self._index for the parsers to work correctly, so bypassing the abstraction in this particular case shouldn't break any existing uses.
  • Removing self references inside the loop to avoid the attribute lookup cost
  • Slightly reordering the logic to do fewer comparisons per iteration in the common case of non-special characters
  • Using a pre-allocated decode buffer for all string parses, only switching to an expanded, dynamically-allocated buffer when the string's size exceeds that of the shared buffer (see the sketch below this list). This avoids the costly dynamic allocations associated with doing my_str.append(char) in a hot loop.
  • Precalculating as much as possible to avoid function calls in the hot loop (the value of ord('\\'), etc.)
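
Taken together, here's a rough, hypothetical sketch of what the hot loop can look like with those changes combined (names like _parse_string_delim and the abbreviated escape table are illustrative stand-ins, not the exact code in this PR):

# Simplified, hypothetical sketch of the optimized delimited-string loop.
# The real parser handles the full escape table, \xNN sequences, and parse
# errors; this only shows the buffer/lookup strategy described above.
_DECODE_BUFF_ALLOC_SIZE = 1024
_BACKSLASH_ORD = ord("\\")
_SIMPLE_ESCAPES = {ord("n"): ord("\n"), ord("t"): ord("\t")}  # abbreviated

def _parse_string_delim(buffer, index, delim):
    """Parse a delimited string from `buffer` starting at `index`."""
    # Locals only inside the loop: no self._getc() calls, no repeated
    # attribute lookups, and a pre-allocated scratch buffer for the output.
    decode_buff = bytearray(_DECODE_BUFF_ALLOC_SIZE)
    insert_idx = 0
    delim_ord = ord(delim)

    while True:
        cc = buffer[index]
        index += 1
        if cc == _BACKSLASH_ORD:
            # Consume the escaped byte immediately rather than re-checking an
            # "in escape" flag on every iteration.
            cc = _SIMPLE_ESCAPES.get(buffer[index], buffer[index])
            index += 1
        elif cc == delim_ord:
            break
        if insert_idx >= len(decode_buff):
            # The string outgrew the shared buffer; fall back to a larger one.
            decode_buff.extend(b"\x00" * _DECODE_BUFF_ALLOC_SIZE)
        decode_buff[insert_idx] = cc
        insert_idx += 1

    return decode_buff[:insert_idx].decode("utf-8"), index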

Given a script like

import llsd
with open("large_getPaths_resp.llsd", "rb") as f:
    llsd.parse(f.read())

the script now takes ~0.7s to complete rather than the 2.5s it previously took.

Even bigger improvements are possible, but they would either change how invalid escapes like \p are handled or would require native code.

Checking if we're currently within an escape sequence every single
loop iteration is wasteful.
This avoids an unnecessary alloc in the common case of strings
under 1024 chars.
@SaladDais (Contributor Author):

A slightly more typical example: parsing a 5.5 MB notation LLSD serialized inventory file went from 2.7s to 1.8s, mostly due to the map keys parsing more quickly.

@SaladDais (Contributor Author) commented Sep 28, 2022

Looks like there are some Py2.7 failures; I'll check in a local pipenv.

@nat-goodspeed (Contributor):

Tangential, but of interest to me: what if LEAP used binary LLSD format instead of notation format? Would that be faster, or a wash? (If it's not yet faster, could LLSD binary parsing be optimized further than notation parsing?)

Even without a viewer that emits that, you could test by converting your large_getPaths_resp.llsd or other test files to binary LLSD format.
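
For anyone wanting to try that comparison, one quick way to produce a binary-format copy of a notation test file is to round-trip it through python-llsd (the file names here are placeholders; llsd.parse and llsd.format_binary are the public functions involved):

import llsd

# Round-trip the notation payload into binary LLSD for a timing comparison.
with open("large_getPaths_resp.llsd", "rb") as f:
    data = llsd.parse(f.read())

with open("large_getPaths_resp.binary.llsd", "wb") as f:
    f.write(llsd.format_binary(data))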

@SaladDais (Contributor Author):

> what if LEAP used binary LLSD format instead of notation format? Would that be faster, or a wash?

Oh absolutely faster. A binary LLSD serialized version of the large_getPaths_resp.llsd takes about 0.1s to parse on my PC. I think in this specific case that's mostly due to the fact that the delimited string form isn't really used in binary LLSD even though it's technically allowed there.

Even if you keep using notation form but make the viewer's formatter use the s(len)"val" length-prefixed string form, python-llsd can parse that in 0.15s. Still, notation isn't ideal for potentially large and frequent payloads due to all the complex branching on individual bytes required (parsing numbers is still complicated), so binary LEAP could be very helpful perf-wise.
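
To make the two forms concrete, here's a tiny hypothetical payload showing the same map written with both notation string forms (both should parse to the same value):

import llsd

# 'some string' is the delimited form (scanned byte-by-byte for escapes);
# s(11)"some string" is the length-prefixed form (read with a single slice).
delimited = b"{'key':'some string'}"
prefixed = b"{'key':s(11)\"some string\"}"
assert llsd.parse_notation(delimited) == llsd.parse_notation(prefixed)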

`bytes` on PY2 are really `str`s, so `val[n]` returns a character
rather than an integer. Wrapping bytes values on PY2 helps us
preserve semantics without requiring branching on Python version
or version-specific lambdas to peek from `self._buffer`.
@SaladDais (Contributor Author) commented Sep 28, 2022

That should fix the PY2 issues; I forgot that bytes_val[n] returns something different there. I considered instead creating helper lambdas for peek()ing inside _parsed_delimited_string() based on the Python version, but that code path is so hot that doing so added 300ms to a 1.2s parse time. The wrapper I added shouldn't have any effect other than a slightly useless initial copy under PY2.
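
For context, a minimal sketch of the kind of wrapper being described (illustrative only; the helper name is made up, not the actual code in this PR):

import sys

def _to_int_indexable(buffer):
    # On PY2, bytes is str and buffer[n] yields a 1-char str; copying into a
    # bytearray gives PY3-style integer indexing at the cost of one copy.
    if sys.version_info[0] == 2:
        return bytearray(buffer)
    return buffer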

# Any additions may now overflow the decoding buffer, make
# a new expanded buffer containing the existing contents.
decode_buff = bytearray(decode_buff)
decode_buff.extend(b"\x00" * _DECODE_BUFF_ALLOC_SIZE)
@nat-goodspeed (Contributor):

Would it be faster still to do something like:

self._decode_buffs = [bytearray(_DECODE_BUFF_ALLOC_SIZE)]
# ...
buff_idx = 0
decode_buff = self._decode_buffs[buff_idx]
# ...
try:
    decode_buff[insert_idx] = cc
except IndexError:
    buff_idx += 1
    try:
        decode_buff = self._decode_buffs[buff_idx]
    except IndexError:
        decode_buff = bytearray(_DECODE_BUFF_ALLOC_SIZE)
        self._decode_buffs.append(decode_buff)
    insert_idx = 0
    decode_buff[insert_idx] = cc
insert_idx += 1
# ...
try:
    return b''.join(self._decode_buffs[:-1] + [decode_buff[:insert_idx]]).decode('utf-8')
except UnicodeDecodeError as exc:
    # ...

@nat-goodspeed (Contributor):

I guess it depends on how many oversized strings you expect in the input stream. The code you submitted leaves you with a single larger buffer, prepared to handle many oversized strings without further allocations. My suggestion above should expand faster the first time, but requires consolidating multiple bytearrays for each oversized string. Plus which it's admittedly more complex.

I still suggest the code you wrote would be improved by catching IndexError on decode_buff assignment, though, rather than testing insert_idx as shown. Also I suspect there's a bug in your test.

Say you're working on the second oversized string in the same input stream, so len(self._decode_buff) == 2*_DECODE_BUFF_ALLOC_SIZE. You just inserted the byte at (_DECODE_BUFF_ALLOC_SIZE - 1) and incremented insert_idx. Now, although insert_idx == len(decode_buff)/2, insert_idx % _DECODE_BUFF_ALLOC_SIZE in fact equals 0, so you extend decode_buff -- resulting in _DECODE_BUFF_ALLOC_SIZE bytes of garbage in the buffer.

@nat-goodspeed (Contributor):

> The code you submitted leaves you with a single larger buffer, prepared to handle many oversized strings without further allocations.

Whoops, not true: your expanded decode_buff isn't stored in self._decode_buff, so subsequent oversized strings would in fact require a new expanded decode_buff.

@SaladDais (Contributor Author):

> Whoops, not true: your expanded decode_buff isn't stored in self._decode_buff, so subsequent oversized strings would in fact require a new expanded decode_buff.

Yep, my reasoning there was that I sometimes use long-lived LLSD*Parser objects, and I was concerned about keeping a large scratch buffer around just because I'd previously parsed a large string.

> except IndexError:

That is a good point! I forgot that in Python it's much less expensive to handle the (potential) IndexError than to check whether we're about to overflow; the internal IndexError check is priced in and you don't really get to opt out anyway. Doing an opportunistic assignment and doing my buffer copy in the except IndexError handler shaves off about 100ms! I think this strategy ends up being easier to read as well (I didn't feel great about the %).

The buffer copy itself isn't terribly expensive relative to juggling + concatenating multiple buffers though. Buffer concatenation was a tiny bit slower for payloads containing strings mostly under 1024 bytes.
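
For reference, the shape of that opportunistic-assignment pattern (a hypothetical excerpt of the loop body; decode_buff, insert_idx, cc, and _DECODE_BUFF_ALLOC_SIZE follow the names used earlier in this thread, not the exact merged code):

try:
    # Optimistically write; the common case never overflows.
    decode_buff[insert_idx] = cc
except IndexError:
    # Rare path: copy into a larger scratch buffer, then retry the write.
    decode_buff = bytearray(decode_buff)
    decode_buff.extend(b"\x00" * _DECODE_BUFF_ALLOC_SIZE)
    decode_buff[insert_idx] = cc
insert_idx += 1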

@bennettgoble added the enhancement label on Sep 28, 2022
@bennettgoble (Member) commented Oct 27, 2022

Hello, @SaladDais. Thanks for this improvement! Could you pull the latest changes from main in order to pick up our newly added CLA behavior?

@github-actions bot commented Oct 27, 2022

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

@SaladDais (Contributor Author):

I have read the CLA Document and I hereby sign the CLA

@SaladDais (Contributor Author):

recheck

@bennettgoble (Member):

@nat-goodspeed are there any unresolved threads here?

@nat-goodspeed (Contributor) left a review:

The CLA was the only blocker. Good to go!

@bennettgoble merged commit eea1b9a into secondlife:main on Nov 17, 2022
@github-actions bot locked and limited conversation to collaborators on Nov 17, 2022
@bennettgoble (Member):

This has shipped in v1.1.0. 🎊
