SRV-439 - performance optimizations for string handling in xml formatting #15

roxanneskelly · 2023-06-28T18:01:22Z

A number of small perf optimizations:

* Use 'translate' to translate all XML characters at once instead of doing multiple string translation passes.
* Construct strings inline using writelines instead of doing it through a function, in order to save function call cost.

Also includes SL-19700 - limit of 200 on depth for formatting and parsing (all types).

codecov · 2023-06-28T20:51:53Z

Codecov Report

Merging #15 (6586308) into main (a63abbe) will decrease coverage by 0.27%.
The diff coverage is 95.78%.

@@            Coverage Diff             @@
##             main      #15      +/-   ##
==========================================
- Coverage   90.33%   90.07%   -0.27%     
==========================================
  Files           6        6              
  Lines         848      856       +8     
==========================================
+ Hits          766      771       +5     
- Misses         82       85       +3

Files Changed	Coverage Δ
llsd/serde_binary.py	`93.95% <88.88%> (-0.33%)`	⬇️
llsd/serde_notation.py	`91.27% <93.75%> (+0.15%)`	⬆️
llsd/base.py	`86.89% <94.73%> (-0.57%)`	⬇️
llsd/serde_xml.py	`97.27% <100.00%> (+0.32%)`	⬆️

roxanneskelly · 2023-07-10T17:05:17Z

.github/workflows/ci.yaml

@@ -13,21 +13,23 @@ jobs:
      matrix:
        python-version: ['2.7', '3.7', '3.8', '3.10']
    runs-on: [ubuntu-latest]
+    container:


Because setup-python no longer supports Python 2.7 as of June 19th (the underlying support was removed, so pinning an earlier version of setup-python does not help), we need to run the Python build in a container.

Just for testing?

roxanneskelly · 2023-07-10T17:05:54Z

.github/workflows/ci.yaml

      - name: Install python dependencies
        run: |
-          pip install wheel build tox
-          pip install .[dev]
+          apt-get update


The python container requires some jumping through hoops due to user account and git differences.

roxanneskelly · 2023-07-10T17:08:50Z

llsd/serde_xml.py

@@ -14,7 +13,21 @@
 INVALID_XML_RE = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')


+XML_ESC_TRANS = {}


We use 'translate', which in Python 3 replaces all of the characters that need escaping in a single pass, instead of calling replace multiple times. This is a significant speedup.
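As a rough illustration of the single-pass idea, here is a minimal Python 3 sketch. The diff only shows `XML_ESC_TRANS = {}`, so the table contents below are an assumption, and `escape_with_translate` / `escape_with_replace` are hypothetical helper names, not functions from the library.

```python
# Hypothetical escape table: the real contents of XML_ESC_TRANS are not shown in the diff.
XML_ESC_TRANS = {
    ord("&"): "&amp;",
    ord("<"): "&lt;",
    ord(">"): "&gt;",
}


def escape_with_translate(text):
    # One pass over the string; each code point is looked up in the table.
    return text.translate(XML_ESC_TRANS)


def escape_with_replace(text):
    # Three passes; '&' must be handled first so that the entities
    # produced by the later replacements are not escaped again.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


sample = "a < b && c > d"
# Both approaches should agree on the result; only the number of passes differs.
assert escape_with_translate(sample) == escape_with_replace(sample)
print(escape_with_translate(sample))
```

Because the table maps each source character directly to its entity, ordering concerns such as escaping '&' first do not arise with `translate`.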

roxanneskelly · 2023-07-10T17:10:28Z

llsd/serde_xml.py

+        "Construct a serializer."
+        # Call the super class constructor so that we have the type map
+        super(LLSDXMLFormatter, self).__init__()
+        self._indent_atom = b''


Some of the 'pretty' formatting code is moved into the base formatter for code sharing. As _indent is a no-op here, this has almost no impact.

nat-goodspeed · 2023-07-20T15:20:32Z

.github/workflows/ci.yaml

@@ -13,21 +13,23 @@ jobs:
      matrix:
        python-version: ['2.7', '3.7', '3.8', '3.10']
    runs-on: [ubuntu-latest]
+    container:


Just for testing?

nat-goodspeed · 2023-07-20T15:39:44Z

llsd/serde_binary.py

@@ -110,16 +114,19 @@ def _parse_map(self):
            cc = self._getc()
        if cc != b'}':
            self._error("invalid map close token")
+        self._depth = self._depth - 1


I'm curious: what's the performance hit from covering this _depth - 1 statement (and the one below) with try/finally?

again, vaguely remember an issue, but it's really not a problem. Fixing.
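For reference, the try/finally pattern being discussed might look like the sketch below. `DepthTrackingParser` is a hypothetical toy that walks nested Python lists, not the binary parser from the PR; it only shows that the counter unwinds even when parsing fails partway through.

```python
class DepthTrackingParser(object):
    """Toy parser over nested lists; a stand-in for the real binary parser."""

    def __init__(self):
        self._depth = 0

    def parse(self, value):
        if not isinstance(value, list):
            if value is None:
                # Simulate a malformed token somewhere inside the structure.
                raise ValueError("invalid token")
            return value
        self._depth += 1
        try:
            return [self.parse(item) for item in value]
        finally:
            # Runs on normal return and on error, so the counter always unwinds.
            self._depth -= 1


parser = DepthTrackingParser()
try:
    parser.parse([1, [2, [None]]])
except ValueError:
    pass
print(parser._depth)
```

Without the `finally`, an exception raised three levels down would leave `_depth` at 3, which matters if the same parser object is reused afterwards.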

nat-goodspeed · 2023-07-20T15:43:08Z

llsd/base.py

    "Convert node to a python object."
-    return NODE_HANDLERS[node.tag](node)
+    if depth > MAX_PARSE_DEPTH:
+        raise LLSDParseError("Cannot serialize depth of more than %d" % MAX_FORMAT_DEPTH)


Wouldn't this be "parse" rather than "serialize," with MAX_PARSE_DEPTH rather than MAX_FORMAT_DEPTH?

And as it's serializing, it'd be MAX_FORMAT_DEPTH
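To make the naming question concrete, here is a sketch of a depth-limited conversion in which the check, the exception type, and the message all refer to parsing. The node model (nested lists), `to_python`, and `nest` are assumptions for illustration; `LLSDParseError` is redefined locally as a stand-in, and the limit of 200 comes from the PR description.

```python
MAX_PARSE_DEPTH = 200  # limit stated in the PR description


class LLSDParseError(Exception):
    """Local stand-in for the library's parse error type."""


def to_python(node, depth=0):
    # Hypothetical node model: lists are containers, anything else is a leaf.
    if depth > MAX_PARSE_DEPTH:
        raise LLSDParseError("Cannot parse depth of more than %d" % MAX_PARSE_DEPTH)
    if isinstance(node, list):
        return [to_python(child, depth + 1) for child in node]
    return node


def nest(levels):
    # Build a leaf wrapped in `levels` nested lists.
    node = "leaf"
    for _ in range(levels):
        node = [node]
    return node


to_python(nest(200))  # within the limit
try:
    to_python(nest(201))
except LLSDParseError as exc:
    print(exc)
```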

nat-goodspeed · 2023-07-20T15:48:42Z

llsd/serde_binary.py

@@ -97,6 +100,7 @@ def _parse_map(self):
        count = 0
        cc = self._getc()
        key = b''
+        self._depth = self._depth + 1


We can hope that the Python compiler optimizes this to a single _depth lookup, but then Python isn't known for its optimizations. Maybe self._depth += 1, and so forth for other _depth adjustments?

Hmm, I vaguely remember running into an issue with this, but maybe it was something else. Will fix

nat-goodspeed · 2023-07-20T15:58:57Z

llsd/serde_xml.py

+        # as that results in another function call and is slightly less performant
+        if PY2:    # pragma: no cover
+            return self.stream.writelines([b'<string>', _str_to_bytes(xml_esc(v)), b'</string>', self._eol])
+        self.stream.writelines([b'<string>', v.translate(XML_ESC_TRANS).encode('utf-8'), b'</string>', self._eol])


I still think we could squeeze out even a touch more performance by moving the PY2 test to class-definition time instead of runtime:

    if PY2:
        def _STRING(self, v):
            return self.stream.writelines([b'<string>', _str_to_bytes(xml_esc(v)), b'</string>', self._eol])
    else:
        def _STRING(self, v):
            self.stream.writelines([b'<string>', v.translate(XML_ESC_TRANS).encode('utf-8'), b'</string>', self._eol])

I still think so, at least in _STRING() and _URI() where we restate the whole implementation anyway.

I stipulate that the runtime PY2 test in _MAP() may be more efficient than having _MAP() call a conditionally-defined helper function, and I wouldn't want you to restate the common parts of the _MAP() implementation.
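A self-contained version of the class-definition-time idea could look like the following. This is only a sketch: `StringWriter` is a hypothetical class, the escape table contents are assumed, and the Python 2 branch uses chained `replace` calls in place of the library's `xml_esc` and `_str_to_bytes` helpers, which are not shown in this conversation.

```python
import io
import sys

PY2 = sys.version_info[0] == 2

# Hypothetical escape table; the real XML_ESC_TRANS contents are not shown in the diff.
XML_ESC_TRANS = {ord("&"): "&amp;", ord("<"): "&lt;", ord(">"): "&gt;"}


class StringWriter(object):
    def __init__(self, stream):
        self.stream = stream
        self._eol = b"\n"

    # The PY2 test runs once, while the class body executes,
    # rather than on every call to _STRING().
    if PY2:
        def _STRING(self, v):
            escaped = v.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
            self.stream.writelines(
                [b"<string>", escaped.encode("utf-8"), b"</string>", self._eol])
    else:
        def _STRING(self, v):
            self.stream.writelines(
                [b"<string>", v.translate(XML_ESC_TRANS).encode("utf-8"), b"</string>", self._eol])


buf = io.BytesIO()
StringWriter(buf)._STRING(u"a<b")
print(buf.getvalue())
```

Only one of the two definitions ends up bound on the class, so each call pays for neither the branch nor an extra helper call.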

LogLinden · 2023-08-18T17:41:53Z

I just wanted to point out the coverage percent test is currently failing.

Sorry, I was trying to cancel a comment and closed the whole PR!

nat-goodspeed

In general I like the changes you made. I still think module-scope _to_python() needs tweaking, and there remain a couple unaddressed suggestions from my previous review.

nat-goodspeed · 2023-09-07T18:26:58Z

llsd/base.py

@@ -317,7 +317,7 @@ def _array_to_python(node, depth=0):
 def _to_python(node, depth=0):
    "Convert node to a python object."
    if depth > MAX_PARSE_DEPTH:
-        raise LLSDParseError("Cannot serialize depth of more than %d" % MAX_FORMAT_DEPTH)
+        raise LLSDSerializationError("Cannot serialize depth of more than %d" % MAX_FORMAT_DEPTH)


Isn't _to_python() used by parsers? The docstring seems to suggest that we're deserializing rather than serializing. Even if I'm wrong about that, though, lines 319 and 320 are still inconsistent.

nat-goodspeed · 2023-09-07T18:41:03Z

llsd/serde_xml.py

+        # as that results in another function call and is slightly less performant
+        if PY2:    # pragma: no cover
+            return self.stream.writelines([b'<string>', _str_to_bytes(xml_esc(v)), b'</string>', self._eol])
+        self.stream.writelines([b'<string>', v.translate(XML_ESC_TRANS).encode('utf-8'), b'</string>', self._eol])


I still think so, at least in _STRING() and _URI() where we restate the whole implementation anyway.

I stipulate that the runtime PY2 test in _MAP() may be more efficient than having _MAP() call a conditionally-defined helper function, and I wouldn't want you to restate the common parts of the _MAP() implementation.

roxanneskelly · 2023-09-07T20:02:46Z

Performance optimizations are merged.

roxanneskelly added 30 commits June 28, 2023 17:56

Build in python containers, so we can get 2.7

7f1b5ae

fix syntax error

e44367b

try a different form for containers

437db6b

test

c6ca424

Spacing change

18dfa77

Another fix

2059a06

Sudo pip installs (in container)

a7a9ca3

upgrade pip

58ee348

Update scm version settings

a844d05

Try alpine container

5c6f302

Don't upgrade pip

30808b4

Don't use alpine

21b1945

Show tags

f0e0082

what user are we running under?

67712a2

Try sudo

4c2cfaf

Do we have a .git

ab0c5f6

Possibly don't need action to install python

0f8cb0e

show tags

466bd57

mark directory as safe

95671ab

try direct safe directory

e201ac4

Make all directories safe

b573bf8

more diagnostics

fc959dc

more diagnostics

f288dc6

show current user

d01b6ad

Try buster

308045d

try venv

8bf587e

be more assertive

b8745bb

Try again

91431ae

venv didn't work

5be2d14

roxanneskelly added 2 commits June 28, 2023 20:29

more diagnostics

d3c22a3

fixup setup

c6e1e8c

roxanneskelly added 3 commits June 28, 2023 20:54

cleanup

1994a95

Remove iterative xml parsing

00a4c7d

Improve test code coverage

95584f6

roxanneskelly added the enhancement label Jun 29, 2023

roxanneskelly requested a review from nat-goodspeed July 5, 2023 21:17

roxanneskelly added 2 commits July 7, 2023 17:55

SL-19707 throw an error if we exceed 200 depth in formatting or parsing

e00171d

SL-19707 - maximum parse depth is now 200

78601c0

roxanneskelly changed the title ~~SRV-439 - performance optimizations for string handling in xml format…~~ SRV-439 - performance optimizations for string handling in xml formatting Jul 7, 2023

roxanneskelly commented Jul 10, 2023

roxanneskelly requested review from bennettgoble and LogLinden July 10, 2023 17:12

nat-goodspeed suggested changes Jul 20, 2023

CR changes

6586308

LogLinden closed this Aug 18, 2023

LogLinden reopened this Aug 18, 2023

github-actions bot locked and limited conversation to collaborators Aug 18, 2023

nat-goodspeed suggested changes Sep 7, 2023

roxanneskelly added 2 commits September 7, 2023 19:13

CR fixes

c5e41c8

CR fixes

2432466

nat-goodspeed approved these changes Sep 7, 2023

roxanneskelly merged commit b703873 into main Sep 7, 2023

roxanneskelly deleted the SRV-439 branch September 7, 2023 20:13
