Skip to content

SL-19314: Recast llsd serialization to write to a stream. #5

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Mar 20, 2023

Conversation

nat-goodspeed
Copy link
Contributor

For large data, this is a huge win over constructing a contiguous bytes string in memory.

It turns out that _format_datestr() is always called in a context where we do
want a value, rather than directly writing to a stream.
Replace format() method with a new write(stream, something) method that sets a
new stream attribute on the formatter instance. Change (most) internal methods
that used to return a bytes string to instead write to self.stream. This
eliminates all serde_xml.py references to the B(...) hack.

Introduce module-scope serde_xml.write_xml() and write_pretty_xml() functions
to engage LLSDXMLFormatter.write() and LLSDXMLPrettyFormatter.write(),
respectively. Since write_xml() and format_xml() both want to use module-scope
_g_xml_formatter, extract its initialization to a new _get_xml_formatter()
function.

Enhance LLSDXMLFormatter._elt() method to accept a callable, and call it
between the open and close element tags. This allows an easy migration path
from
_elt(b'name', self.method(something))  # method() returns a bytes string
to
_elt(b'name', lambda: self.method(something)) # method() writes to self.stream

Removing LLSDXML[Pretty]Formatter.format() lets these classes inherit the new
LLSDBaseFormatter.format() method, which passes an io.BytesIO instance to the
subclass write() method and returns the contents of that BytesIO.

Rename LLSDXMLPrettyFormatter.PRETTY_ARRAY and PRETTY_MAP to ARRAY and MAP,
respectively, which lets LLSDBaseFormatter.__init__() find the right methods
without having to patch self.type_map in the subclass constructor.
We want every subclass write() operation to set self.stream and clear it when
done, so give that responsibility to the base-class write() method. write()
calls subclass _write(), which then need only accept the 'something' to
serialize. Update LLSDXMLFormatter and LLSDXMLPrettyFormatter accordingly.
Add serde_notation.write_notation(stream, something) module-scope method to
invoke LLSDNotationFormatter().write(stream, something).
@nat-goodspeed
Copy link
Contributor Author

Apparent "failures" are due to codecov upload failure, not logic errors.

@codecov-commenter
Copy link

codecov-commenter commented Mar 6, 2023

Codecov Report

Merging #5 (2a19caf) into main (8f5faba) will increase coverage by 0.06%.
The diff coverage is 95.97%.

@@            Coverage Diff             @@
##             main       #5      +/-   ##
==========================================
+ Coverage   89.83%   89.90%   +0.06%     
==========================================
  Files           6        6              
  Lines         797      822      +25     
==========================================
+ Hits          716      739      +23     
- Misses         81       83       +2     
Impacted Files Coverage Δ
llsd/serde_binary.py 94.32% <93.33%> (+0.70%) ⬆️
llsd/base.py 87.40% <94.73%> (+0.68%) ⬆️
llsd/serde_xml.py 97.56% <96.42%> (-1.64%) ⬇️
llsd/serde_notation.py 89.69% <97.56%> (+0.10%) ⬆️
llsd/__init__.py 93.54% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

That is, add serde_binary.write_binary(stream, something) and make
format_binary() call it with an internal io.BytesIO instance.

_format_binary_recurse(something) => _write_binary_recurse(stream, something)
that writes serialized 'something' to the passed stream.
@nat-goodspeed nat-goodspeed marked this pull request as ready for review March 6, 2023 20:57
@nat-goodspeed
Copy link
Contributor Author

Please fix or disable codecov: the "failing" run does show that all tests pass.

That is, publish the new write_binary(), write_notation(), write_xml() and
write_pretty_xml() functions to package scope.
global _g_xml_formatter
if _g_xml_formatter is None:
_g_xml_formatter = LLSDXMLFormatter()
return _g_xml_formatter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this class construction expensive enough to merit lazy creation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps not. LLSDBaseFormatter does construct type_map, requiring 19 attribute lookups, but that's not a lot of initialization. I simply extracted earlier logic from format_xml() to make it available to write_xml() as well. If you'd rather the functionality vanish entirely, I won't be distraught.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I'd prefer to eat the one-time import-time cost and simplify.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More importantly: it's profoundly unsafe now that self.stream is carrying state. Threads, coroutines, etc., all break with shared instances.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, monty, should've caught that myself.

@bennettgoble bennettgoble added the enhancement New feature or request label Mar 15, 2023
Comment on lines +446 to +458
def write_notation(stream, something):
"""
Serialize to passed binary 'stream' a python object 'something' as
application/llsd+notation.

:param stream: a binary stream open for writing.
:param something: a python object (typically a dict) to be serialized.

See http://wiki.secondlife.com/wiki/LLSD#Notation_Serialization
"""
return LLSDNotationFormatter().write(stream, something)


Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did this code get duplicated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why yes! git is good, but clearly not perfect.

Copy link

@monty-linden monty-linden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions, some must fixes...

_write_binary_recurse(stream, something)


def _write_binary_recurse(stream, something):
"Binary formatter workhorse."
def _format_list(something):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing this definition binds is 'stream' which is a reference and so is really just an argument. A single 'def _format_list(stream, something):' at the same level is better, yes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is. Of course I inherited that nested definition from older code, but it's a change worth making.

return b'true'
else:
return b'false'
self.stream.write(b'true' if v else b'false')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naive Q since I'm probably missing a lot of context. These were publicish methods that someone might use to generate fragments of notation format for a value before. Did nobody actually use them for such a purpose because this would break every such use? Should these all be renamed "_" (they're modifying an internal object)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were publicish methods that someone might use to generate fragments of notation format for a value before. Did nobody actually use them for such a purpose because this would break every such use? Should these all be renamed "_" (they're modifying an internal object)?

I have always assumed that the historical names of these methods were chosen to emphasize the communication between LLSDBaseFormatter and its subclasses. I've never seen any llsd consumer directly reference any LLSDMumbleFormatter class, never mind call its type-specific methods.

That said, yes, those methods are intended solely for internal use by that class hierarchy. Perhaps they should be prefixed with an underscore. If we're going to break hypothetical consumers, better to break them overtly with AttributeError than by quietly returning None.

@@ -92,18 +101,17 @@ def DATE(self, v):
def ARRAY(self, v):
return self._elt(
b'array',
b''.join([self._generate(item) for item in v]))
lambda: [self._generate(item) for item in v])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with the previous class, these will always be returning None (return value of _elt()), yes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I left the return statements in place because they're not wrong, albeit slightly misleading now.

global _g_xml_formatter
if _g_xml_formatter is None:
_g_xml_formatter = LLSDXMLFormatter()
return _g_xml_formatter

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More importantly: it's profoundly unsafe now that self.stream is carrying state. Threads, coroutines, etc., all break with shared instances.

Remove serde_xml._g_xml_formatter and its getter function: now that
LLSDBaseFormatter stores a stream, sharing a global LLSDXMLFormatter instance
is Wrong. Every call to format_xml() and write_xml() now gets a distinct
instance.

Rename LLSDBaseFormatter type-specific methods with leading underscore: they
were never intended for public consumption. Moreover, since they all now
return None, any hypothetical external caller will break. Better to break with
AttributeError than being silently, disastrously wrong.

Extract nested _format_list() function from serde_binary._write_binary_recurse()
to module-scope _write_list(), explicitly accepting 'stream' parameter instead
of binding it from enclosing scope.

Eliminate duplicate serde_notation.write_notation(), evidently a git merge
glitch.
@nat-goodspeed nat-goodspeed merged commit 0adae95 into main Mar 20, 2023
@nat-goodspeed nat-goodspeed deleted the sl-19314 branch March 20, 2023 18:41
@github-actions github-actions bot locked and limited conversation to collaborators Mar 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants