Speed up parsing notation LLSD #1
Would it be faster still to do something like:
I guess it depends on how many oversized strings you expect in the input stream. The code you submitted leaves you with a single larger buffer, prepared to handle many oversized strings without further allocations. My suggestion above should expand faster the first time, but requires consolidating multiple `bytearray`s for each oversized string. Plus which it's admittedly more complex.

I still suggest the code you wrote would be improved by catching `IndexError` on `decode_buff` assignment, though, rather than testing `insert_idx` as shown. Also I suspect there's a bug in your test.

Say you're working on the second oversized string in the same input stream, so `len(self._decode_buff) == 2*_DECODE_BUFF_ALLOC_SIZE`. You just inserted the byte at `(_DECODE_BUFF_ALLOC_SIZE - 1)` and incremented `insert_idx`. Now, although `insert_idx == len(decode_buff)/2`, `insert_idx % _DECODE_BUFF_ALLOC_SIZE` in fact equals 0, so you extend `decode_buff` -- resulting in `_DECODE_BUFF_ALLOC_SIZE` bytes of garbage in the buffer.
Whoops, not true: your expanded `decode_buff` isn't stored in `self._decode_buff`, so subsequent oversized strings would in fact require a new expanded `decode_buff`.
Yep, my reasoning there was I sometimes use long-lived `LLSD*Parser` objects and I was concerned about keeping a large scratch buffer around just because I'd previously parsed a large string.

That is a good point! I forgot that in Python it's much less expensive to handle the (potential) `IndexError` than check if we're about to overflow. The internal `IndexError` check is priced in and you don't really get to opt out anyway. Doing an opportunistic assignment and doing my buffer copy in the `except IndexError` shaves off about 100ms! I think this strategy ends up being easier to read as well (I didn't feel great about the `%`.)

The buffer copy itself isn't terribly expensive relative to juggling + concatenating multiple buffers though. Buffer concatenation was a tiny bit slower for payloads containing strings mostly under 1024 bytes.