-
Notifications
You must be signed in to change notification settings - Fork 1
SL-19314: Recast llsd serialization to write to a stream. #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It turns out that _format_datestr() is always called in a context where we do want a value, rather than directly writing to a stream.
Replace format() method with a new write(stream, something) method that sets a new stream attribute on the formatter instance. Change (most) internal methods that used to return a bytes string to instead write to self.stream. This eliminates all serde_xml.py references to the B(...) hack. Introduce module-scope serde_xml.write_xml() and write_pretty_xml() functions to engage LLSDXMLFormatter.write() and LLSDXMLPrettyFormatter.write(), respectively. Since write_xml() and format_xml() both want to use module-scope _g_xml_formatter, extract its initialization to a new _get_xml_formatter() function. Enhance LLSDXMLFormatter._elt() method to accept a callable, and call it between the open and close element tags. This allows an easy migration path from _elt(b'name', self.method(something)) # method() returns a bytes string to _elt(b'name', lambda: self.method(something)) # method() writes to self.stream Removing LLSDXML[Pretty]Formatter.format() lets these classes inherit the new LLSDBaseFormatter.format() method, which passes an io.BytesIO instance to the subclass write() method and returns the contents of that BytesIO. Rename LLSDXMLPrettyFormatter.PRETTY_ARRAY and PRETTY_MAP to ARRAY and MAP, respectively, which lets LLSDBaseFormatter.__init__() find the right methods without having to patch self.type_map in the subclass constructor.
We want every subclass write() operation to set self.stream and clear it when done, so give that responsibility to the base-class write() method. write() calls subclass _write(), which then need only accept the 'something' to serialize. Update LLSDXMLFormatter and LLSDXMLPrettyFormatter accordingly.
Add serde_notation.write_notation(stream, something) module-scope method to invoke LLSDNotationFormatter().write(stream, something).
Apparent "failures" are due to codecov upload failure, not logic errors. |
Codecov Report
@@ Coverage Diff @@
## main #5 +/- ##
==========================================
+ Coverage 89.83% 89.90% +0.06%
==========================================
Files 6 6
Lines 797 822 +25
==========================================
+ Hits 716 739 +23
- Misses 81 83 +2
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
That is, add serde_binary.write_binary(stream, something) and make format_binary() call it with an internal io.BytesIO instance. _format_binary_recurse(something) => _write_binary_recurse(stream, something) that writes serialized 'something' to the passed stream.
Please fix or disable codecov: the "failing" run does show that all tests pass. |
That is, publish the new write_binary(), write_notation(), write_xml() and write_pretty_xml() functions to package scope.
llsd/serde_xml.py
Outdated
global _g_xml_formatter | ||
if _g_xml_formatter is None: | ||
_g_xml_formatter = LLSDXMLFormatter() | ||
return _g_xml_formatter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this class construction expensive enough to merit lazy creation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps not. LLSDBaseFormatter
does construct type_map
, requiring 19 attribute lookups, but that's not a lot of initialization. I simply extracted earlier logic from format_xml()
to make it available to write_xml()
as well. If you'd rather the functionality vanish entirely, I won't be distraught.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case I'd prefer to eat the one-time import-time cost and simplify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More importantly: it's profoundly unsafe now that self.stream is carrying state. Threads, coroutines, etc., all break with shared instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, monty, should've caught that myself.
llsd/serde_notation.py
Outdated
def write_notation(stream, something): | ||
""" | ||
Serialize to passed binary 'stream' a python object 'something' as | ||
application/llsd+notation. | ||
|
||
:param stream: a binary stream open for writing. | ||
:param something: a python object (typically a dict) to be serialized. | ||
|
||
See http://wiki.secondlife.com/wiki/LLSD#Notation_Serialization | ||
""" | ||
return LLSDNotationFormatter().write(stream, something) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did this code get duplicated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why yes! git is good, but clearly not perfect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions, some must fixes...
llsd/serde_binary.py
Outdated
_write_binary_recurse(stream, something) | ||
|
||
|
||
def _write_binary_recurse(stream, something): | ||
"Binary formatter workhorse." | ||
def _format_list(something): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing this definition binds is 'stream' which is a reference and so is really just an argument. A single 'def _format_list(stream, something):' at the same level is better, yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is. Of course I inherited that nested definition from older code, but it's a change worth making.
return b'true' | ||
else: | ||
return b'false' | ||
self.stream.write(b'true' if v else b'false') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naive Q since I'm probably missing a lot of context. These were publicish methods that someone might use to generate fragments of notation format for a value before. Did nobody actually use them for such a purpose because this would break every such use? Should these all be renamed "_" (they're modifying an internal object)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were publicish methods that someone might use to generate fragments of notation format for a value before. Did nobody actually use them for such a purpose because this would break every such use? Should these all be renamed "_" (they're modifying an internal object)?
I have always assumed that the historical names of these methods were chosen to emphasize the communication between LLSDBaseFormatter and its subclasses. I've never seen any llsd
consumer directly reference any LLSDMumbleFormatter
class, never mind call its type-specific methods.
That said, yes, those methods are intended solely for internal use by that class hierarchy. Perhaps they should be prefixed with an underscore. If we're going to break hypothetical consumers, better to break them overtly with AttributeError
than by quietly returning None
.
@@ -92,18 +101,17 @@ def DATE(self, v): | |||
def ARRAY(self, v): | |||
return self._elt( | |||
b'array', | |||
b''.join([self._generate(item) for item in v])) | |||
lambda: [self._generate(item) for item in v]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with the previous class, these will always be returning None (return value of _elt()), yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I left the return
statements in place because they're not wrong, albeit slightly misleading now.
llsd/serde_xml.py
Outdated
global _g_xml_formatter | ||
if _g_xml_formatter is None: | ||
_g_xml_formatter = LLSDXMLFormatter() | ||
return _g_xml_formatter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More importantly: it's profoundly unsafe now that self.stream is carrying state. Threads, coroutines, etc., all break with shared instances.
Remove serde_xml._g_xml_formatter and its getter function: now that LLSDBaseFormatter stores a stream, sharing a global LLSDXMLFormatter instance is Wrong. Every call to format_xml() and write_xml() now gets a distinct instance. Rename LLSDBaseFormatter type-specific methods with leading underscore: they were never intended for public consumption. Moreover, since they all now return None, any hypothetical external caller will break. Better to break with AttributeError than being silently, disastrously wrong. Extract nested _format_list() function from serde_binary._write_binary_recurse() to module-scope _write_list(), explicitly accepting 'stream' parameter instead of binding it from enclosing scope. Eliminate duplicate serde_notation.write_notation(), evidently a git merge glitch.
For large data, this is a huge win over constructing a contiguous bytes string in memory.