Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-36673: Implement comment/PI parsing support for the TreeBuilder in ElementTree. #12883

Merged
merged 3 commits into from
May 1, 2019

Conversation

scoder
Copy link
Contributor

@scoder scoder commented Apr 20, 2019

@scoder
Copy link
Contributor Author

scoder commented Apr 20, 2019

I thought about this some more and also took another look at the respective interface in lxml. I originally thought it would be nice to have something simple and fast for the event iteration (i.e. the plain comment text for comments and a (target, data) pair for PIs), but that doesn't fit the (event, element) interface of the pull parser. I'll change this again to return Comment and ProcessingInstruction instances instead. That's also what lxml does, and I think it's the right thing to do by default.

@@ -1172,6 +1202,10 @@ XMLPullParser Objects
parser. The iterator yields ``(event, elem)`` pairs, where *event* is a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename elem to data or payload or whatever (since it is not always an element) and document its meaning uniformly for all events (maybe using an unordered list)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that it's not ideal, but with the change that I'm currently working up, it's going to be an Element-like object (Element, Comment, PI) most of the time. For many use cases, it's really only an only Element, especially in the default end-only-events case. Also, elem, i.e. element, is a fairly generic term, even though its meaning is tighter in the context of ElementTree.

I've also thought of using a more structured way of presenting it, and I think I'll do that.

…r in "_elementtree" to make it use the same factories as the ElementTree module, and to make it explicit when the comments/PIs are inserted into the tree and when they are not (which is the default).
@scoder scoder force-pushed the bpo-36673_etree_comments branch from c17eca6 to 2d2df11 Compare April 20, 2019 20:55
@scoder
Copy link
Contributor Author

scoder commented Apr 20, 2019

Turns out, it's not that easy to get the Comment/PI factories into _elementtree. It doesn't have its own classes, and the double-module tests are written in a way that makes it tricky to have the same factory for both. When the Python module gets reloaded, the functions are no longer the same.

I solved this by adding a "private" _set_factories() function to the module and make ElementTree (and the tests) call it explicitly with the right factories, and by making it explicit in the TreeBuilder which factory it uses and if the created Comment and PI objects get added to the tree (which they currently don't, since ET has always discarded them on the way in).

@scoder scoder merged commit 43851a2 into python:master May 1, 2019
@bedevere-bot
Copy link

@scoder: Please replace # with GH- in the commit message next time. Thanks!

@@ -2925,7 +3147,7 @@ expat_set_error(enum XML_Error error_code, Py_ssize_t line, Py_ssize_t column,
if (errmsg == NULL)
return;

error = PyObject_CallFunctionObjArgs(st->parseerror_obj, errmsg, NULL);
error = _PyObject_FastCall(st->parseerror_obj, &errmsg, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I'm not sure if these _PyObject_FastCall() additions in _elementtree.c should have been made as part of this PR -- they are quite unrelated to the BPO issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Granted. I made them for the new code I wrote, so I consider it a bit of a consistency cleanup to use them in all similar places. Note that I also tried to apply the general implementation pattern to all the handler functions that either dispatches to the treebuilder or to a Python function. Given how repetitive this code becomes in C, I think avoiding surprises in the implementation is important to simplify the future maintenance.

@scoder scoder deleted the bpo-36673_etree_comments branch May 10, 2019 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants