Speed up parsing notation LLSD #1
Checking if we're currently within an escape sequence every single loop iteration is wasteful.
This avoids an unnecessary alloc in the common case of strings under 1024 chars.
A slightly more typical example: parsing a 5.5mb notation LLSD serialized inventory file went from 2.7s to 1.8s, mostly due to the map keys parsing more quickly.
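As a rough illustration of the technique described above (a sketch only; `_DECODE_BUFF_ALLOC_SIZE`, the function name, and the escape set are assumptions, not the library's actual code), scanning raw bytes into a preallocated `bytearray` and branching only when an escape byte is actually encountered looks something like:

```python
# Sketch: scan raw bytes into a preallocated scratch buffer, handling escape
# sequences only when a backslash is actually seen, and decode once at the end.
_DECODE_BUFF_ALLOC_SIZE = 1024  # illustrative default size

def parse_delimited(buffer: bytes, index: int, delim: int):
    """Parse a delimited string starting at `index`; return (value, new_index)."""
    decode_buff = bytearray(_DECODE_BUFF_ALLOC_SIZE)
    insert_idx = 0
    while True:
        cc = buffer[index]
        index += 1
        if cc == ord("\\"):
            # Rare path: resolve the escape, then fall through to the store.
            cc = buffer[index]
            index += 1
            if cc == ord("n"):
                cc = ord("\n")
            # (other escapes elided in this sketch)
        elif cc == delim:
            break
        if insert_idx >= len(decode_buff):
            # String exceeded the default scratch buffer: grow it in chunks.
            decode_buff.extend(b"\x00" * _DECODE_BUFF_ALLOC_SIZE)
        decode_buff[insert_idx] = cc
        insert_idx += 1
    return decode_buff[:insert_idx].decode("utf-8"), index
```

Strings under the default buffer size never allocate beyond the initial scratch buffer, which matches the "common case of strings under 1024 chars" claim above.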
Looks like there are some Py2.7 failures; I'll check in a local pipenv.
Tangential, but of interest to me: what if LEAP used binary LLSD format instead of notation format? Would that be faster, or a wash? (If it's not yet faster, could LLSD binary parsing be optimized further than notation parsing?) Even without a viewer that emits that, you could test by converting your
Oh, absolutely faster. A binary LLSD serialized version of the same payload would parse more quickly. Even if you continue using notation form, you could just make the viewer's formatter use the
`bytes` on PY2 are really `str`s, so `val[n]` returns a character rather than an integer. Wrapping bytes values on PY2 helps us preserve semantics without requiring branching on Python version or version-specific lambdas to peek from `self._buffer`.
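The difference can't be demonstrated on Python 3 alone, but one portable way to get integer indexing semantics on both versions (a sketch of the general idea, not necessarily the wrapper this change uses) relies on `bytearray`, which indexes to ints on both PY2 and PY3:

```python
import sys

def as_int_indexable(buf):
    """Return an object where buf[n] is an int on both Python 2 and 3."""
    if sys.version_info[0] == 2:
        # PY2: bytes is str, so indexing yields 1-char strings; bytearray
        # indexing yields ints on both major versions.
        return bytearray(buf)
    return buf  # PY3: bytes already index to ints

view = as_int_indexable(b"abc")
assert view[0] == 97  # ord('a'), regardless of Python version
```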
That should fix the PY2 issues; I forgot that.
```python
# Any additions may now overflow the decoding buffer, make
# a new expanded buffer containing the existing contents.
decode_buff = bytearray(decode_buff)
decode_buff.extend(b"\x00" * _DECODE_BUFF_ALLOC_SIZE)
```
Would it be faster still to do something like:
```python
self._decode_buffs = [bytearray(_DECODE_BUFF_ALLOC_SIZE)]
# ...
buff_idx = 0
decode_buff = self._decode_buffs[buff_idx]
# ...
try:
    decode_buff[insert_idx] = cc
except IndexError:
    buff_idx += 1
    try:
        decode_buff = self._decode_buffs[buff_idx]
    except IndexError:
        decode_buff = bytearray(_DECODE_BUFF_ALLOC_SIZE)
        self._decode_buffs.append(decode_buff)
    insert_idx = 0
    decode_buff[insert_idx] = cc
insert_idx += 1
# ...
try:
    return b''.join(self._decode_buffs[:-1] + [decode_buff[:insert_idx]]).decode('utf-8')
except UnicodeDecodeError as exc:
    # ...
```
I guess it depends on how many oversized strings you expect in the input stream. The code you submitted leaves you with a single larger buffer, prepared to handle many oversized strings without further allocations. My suggestion above should expand faster the first time, but requires consolidating multiple `bytearray`s for each oversized string. Plus which it's admittedly more complex.

I still suggest the code you wrote would be improved by catching `IndexError` on `decode_buff` assignment, though, rather than testing `insert_idx` as shown. Also I suspect there's a bug in your test.
Say you're working on the second oversized string in the same input stream, so `len(self._decode_buff) == 2*_DECODE_BUFF_ALLOC_SIZE`. You just inserted the byte at `(_DECODE_BUFF_ALLOC_SIZE - 1)` and incremented `insert_idx`. Now, although `insert_idx == len(decode_buff)/2`, `insert_idx % _DECODE_BUFF_ALLOC_SIZE` in fact equals 0, so you extend `decode_buff` -- resulting in `_DECODE_BUFF_ALLOC_SIZE` bytes of garbage in the buffer.
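The suspicious trigger can be shown concretely; `SIZE` and the modulo growth test below are assumptions mirroring the description above, not the PR's literal code:

```python
SIZE = 1024  # stand-in for _DECODE_BUFF_ALLOC_SIZE

# Scratch buffer left over from a previous oversized string in the same stream:
decode_buff = bytearray(2 * SIZE)
# We just wrote the byte at (SIZE - 1) and incremented the insert position:
insert_idx = SIZE

assert insert_idx < len(decode_buff)  # there is still SIZE bytes of headroom...
assert insert_idx % SIZE == 0         # ...yet the modulo test fires anyway
if insert_idx % SIZE == 0:
    decode_buff.extend(b"\x00" * SIZE)  # spurious growth to 3*SIZE
assert len(decode_buff) == 3 * SIZE
```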
> The code you submitted leaves you with a single larger buffer, prepared to handle many oversized strings without further allocations.

Whoops, not true: your expanded `decode_buff` isn't stored in `self._decode_buff`, so subsequent oversized strings would in fact require a new expanded `decode_buff`.
> Whoops, not true: your expanded decode_buff isn't stored in self._decode_buff, so subsequent oversized strings would in fact require a new expanded decode_buff.

Yep, my reasoning there was that I sometimes use long-lived `LLSD*Parser` objects and I was concerned about keeping a large scratch buffer around just because I'd previously parsed a large string.
> `except IndexError:`

That is a good point! I forgot that in Python it's much less expensive to handle the (potential) `IndexError` than to check if we're about to overflow. The internal `IndexError` check is priced in and you don't really get to opt out anyway. Doing an opportunistic assignment and doing my buffer copy in the `except IndexError` handler shaves off about 100ms! I think this strategy ends up being easier to read as well (I didn't feel great about the `%`.)
The buffer copy itself isn't terribly expensive relative to juggling + concatenating multiple buffers though. Buffer concatenation was a tiny bit slower for payloads containing strings mostly under 1024 bytes.
Shaves 100ms off my testcase's runtime.
Hello, @SaladDais. Thanks for this improvement! Could you pull the latest changes from

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

I have read the CLA Document and I hereby sign the CLA

recheck

@nat-goodspeed are there any unresolved threads here?
The CLA was the only blocker. Good to go!
This has shipped in v1.1.0. 🎊
Thanks for splitting out this code!
As part of working with the LEAP API, I need to parse very large notation LLSD payloads containing a large number of strings. The parsing of those payloads is currently heavily bottle-necked by the code handling delimited strings.

For example, invoking the `LLWindowListener::getPaths()` method over LEAP returns a 5.5mb blob of LLSD (assuming you're logged in with a relatively bare inventory), and that payload takes 2.5s to parse with `llsd.parse()`. Since latency is important when dealing with LEAP, I looked at how `ujson` implemented their delimited string handling to see how `llsd`'s string handling logic could be improved.

I ended up:

- Removing calls to `self._getc()` and reading from `self._buffer` directly; this was the most significant improvement. Since we need to read a byte at a time, getting single-byte byte strings is inefficient. The function not using a potentially overridden `self._getc()` shouldn't be an issue, since `LLSDBinaryParser` has done manual manipulation of `self._index` for a long time now. Basically, `self._getc()` must operate on `self._buffer` and `self._index` for the parsers to work correctly, so not using the abstraction in this particular case shouldn't break any existing uses.
- Avoiding repeated `self` references inside the loop, to avoid the attribute lookup cost of `my_str.append(char)` in a hot loop.
- Pre-computing integer constants for byte comparisons (`ord('\\')`, etc.)
Given a script like the one I used for benchmarking, parsing now takes ~0.7s to complete rather than the 2.5s it previously took.

Much better improvements are also possible, but they would either change how invalid escapes like `\p` are handled, or would require native code.
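The attribute-lookup hoisting and pre-computed byte constants mentioned in the bullet list above can be illustrated with a toy example (class and method names here are hypothetical, not `llsd`'s actual code):

```python
BACKSLASH = ord("\\")  # pre-computed byte constant, compared as an int

class ToyParser:
    def __init__(self, data: bytes):
        self._buffer = data
        self._index = 0

    def count_escapes(self) -> int:
        # Hoist attributes into locals: one lookup each, instead of one per byte.
        buffer = self._buffer
        index = self._index
        end = len(buffer)
        count = 0
        while index < end:
            if buffer[index] == BACKSLASH:
                count += 1
            index += 1
        self._index = index  # write the position back for other methods
        return count

assert ToyParser(rb"a\b\c").count_escapes() == 2
```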