Implement iteration cache, item cache and length cache for node list iteration #11330
I think this approach makes sense; I'm just confused by part of it, where I've left some comments.
ext/dom/nodelist.c (outdated)
```c
nodep = dom_object_get_node(objmap->cached_obj);
/* The node cannot be NULL if the cache is valid. If it is NULL, then it means we
 * forgot an invalidation somewhere. Take the defensive programming approach and invalidate
 * it here if it's NULL (except in debug mode where we would want to catch this). */
if (UNEXPECTED(nodep == NULL)) {
#if ZEND_DEBUG
	ZEND_UNREACHABLE();
#endif
	reset_objmap_cache(objmap);
```
I don't think I've ever seen us do something like this instead of just asserting. But this can make sense; @iluuu1994, what is your opinion on doing this?
The definition of ZEND_ASSERT() is surprising and unsafe. Doing ZEND_ASSERT(nodep) here in non-debug mode would mark the whole branch as unreachable and the compiler would optimise away the reset_objmap_cache() call. I confirmed this using gcc -S -O2. So it seems reasonable to me to avoid calling it.
A runtime check in non-debug mode also seems warranted since I have found a way to hit it, and maybe there are more ways that I have missed.
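To make the hazard concrete, here is a minimal standalone sketch (assuming a release-mode ZEND_ASSERT that reduces to an "assume" hint built on __builtin_unreachable(), as in GCC/Clang builds; demo() and reset_objmap_cache_stub() are illustrative names, not code from this PR):

```c
#include <stddef.h>

void reset_objmap_cache_stub(void); /* stand-in for the real reset helper */

void demo(void *nodep)
{
	/* In non-debug builds, ZEND_ASSERT(nodep) boils down to roughly this
	 * hint, so the compiler may treat nodep == NULL as impossible. */
	if (__builtin_expect(nodep == NULL, 0)) __builtin_unreachable();

	/* Dead code under the assumption above: gcc -O2 is entitled to delete
	 * this whole branch, so the defensive reset would never run. */
	if (nodep == NULL) {
		reset_objmap_cache_stub();
	}
}
```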
I agree with this; the runtime check is pretty cheap too. And in debug mode the assertion is valuable for finding out when it goes wrong, but we don't want to crash in production.
Btw, you can write [skip ci] to avoid running CI on comment-only changes in the future. :)
It's possible to break it with simplexml_import_dom():

```php
$doc = new DOMDocument;
$doc->loadXML('<root><e i="1"/><e i="2"/><e i="3"/><e i="4"/><e i="5"/><e i="6"/><e i="7"/><e i="8"/><e i="9"/><e i="10"/></root>');
$list = $doc->getElementsByTagName('e');
print $list->item(5)->getAttribute('i')."\n";
$s = simplexml_import_dom($doc->documentElement);
unset($s->e[5]);
print $list->item(5)->getAttribute('i')."\n";
```

This hits the reset_objmap_cache() call. DOMDocument::xinclude also seems suspicious, since it calls php_dom_remove_xinclude_nodes() -> php_dom_free_xinclude_node(). If there is any userspace code execution between xmlUnlinkNode() and php_dom_invalidate_node_list_cache(), such as destructors, then that could also cause problems.
dom_node_node_value_write() was missed. It calls php_libxml_node_free_list(), which calls xmlUnlinkNode(). I haven't been able to get it to crash, though; it just gives outdated return values. Test case:

```php
$doc = new DOMDocument;
$doc->loadXML('<root><a></a></root>');
$list = $doc->getElementsByTagName('a');
print $list->item(0)->tagName . "\n";
$doc->documentElement->nodeValue = '';
print $list->item(0)->tagName . "\n";
print $doc->saveXML();
```
Something here has broken with my real-world test case. Will try to isolate.
The problem is that when using DOMNodeList::item() over the whole document with consecutive indexes, it can only return children and siblings of the cached node; it can't go up the stack to look at siblings of parents and other ancestors. I don't think it's necessary to actually store a stack, since there's only one way tree traversal can proceed from the current node. Reduced test case:

```php
$doc = new DOMDocument;
$doc->loadXML('<root><a><b i="1"/></a><b i="2"/></root>');
$list = $doc->getElementsByTagName('b');
$n = $list->length;
for ($i = 0; $i < $n; $i++) {
    print $list->item($i)->getAttribute('i') . "\n";
}
```

It should find two nodes, but it fails at the second node, giving "Call to a member function getAttribute() on null".
I can work on this again tomorrow. Thanks Tim for checking this! The issues you found seem reasonably simple to fix. I also found a subtle issue: I missed invalidating when writing to the prefix field too. I'll do a thorough double-check of all field writes to see if any invalidations are missed.
I can already reply concretely to one thing that was brought up:

> If there is any userspace code execution between xmlUnlinkNode() and php_dom_invalidate_node_list_cache(), such as destructors, then that could also cause problems.

This shouldn't be a problem AFAIK. You can't really add custom objects to the DOM tree. The destructor will be called when the element is added to the tree, not when the element is removed. See https://3v4l.org/FsNuQ for example.
As a side note, it might also make sense to let DOMNodeList::item and DOMNamedNodeMap::item use fast ZPP, since they're likely to be called in a hot loop (though I haven't measured anything yet). I suspect the binary size increase will be small, because only one parameter needs parsing (again, not measured yet).
I just want to mention that the existing recursion in dom_get_elements_by_tag_name_ns_raw() is apparently unnecessary. The next node to consider for filtering is always: the first child, if there is one; otherwise the next sibling, if there is one; otherwise the next sibling of the nearest ancestor that has one, stopping at the base node.

Googling for preorder traversal, I found a lot of pages claiming that you need a stack, but the stack in those algorithms is just compensating for the lack of parent and sibling pointers.
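A sketch of that successor computation (my reading of the approach, not the actual ext/dom code; it relies only on libxml2's first-child, next-sibling and parent pointers):

```c
#include <libxml/tree.h>

/* Preorder successor of `cur` within the subtree rooted at `base`,
 * using only child/sibling/parent pointers -- no explicit stack. */
static xmlNodePtr preorder_next(xmlNodePtr cur, xmlNodePtr base)
{
	if (cur->children) {
		return cur->children;   /* 1. first child, if there is one */
	}
	while (cur != NULL && cur != base) {
		if (cur->next) {
			return cur->next;   /* 2. otherwise the next sibling */
		}
		cur = cur->parent;      /* 3. otherwise climb and retry */
	}
	return NULL;                /* reached the base: traversal done */
}
```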
Everything should use fast ZPP; I'm not aware of any downsides. Just as a matter of style, it would be easier to read the code if there were only one way to pass parameters. But that seems like an issue for a separate PR.
You can overwrite the objects DOM creates for its nodes by calling https://www.php.net/manual/en/domdocument.registernodeclass. I'm using this obscure feature (which static analysers get utterly confused by) in my docbook renderer project: https://gitlab.com/Girgias/docbook-renderer/-/blob/master/src/DOMRenderingDocument.php. So maybe it actually needs to be tested what happens if such a class sets a destructor?
…iteration

The current implementation follows the spec requirement that the list must be "live". This means that changes in the document must be reflected in the existing node lists without requiring the user to refetch the node list.

The consequence is that getting any item, or the length of the list, always starts searching from the root element of the node list. This results in O(n) time to get any item or the length. If there's a for loop over the node list, the iterations will take O(n²) time in total. This causes real-world performance issues with potential for downtime (see phpGH-11308 and its references for details).

We fix this by introducing a caching strategy. We cache the last iterated object in the iterator, the last requested item in the node list, and the last length computation. To invalidate the cache, we simply count the number of modifications made to the containing document. If the modification number does not match what it was at caching time, we know the document has been modified and the cache is invalid. If the counter ever overflows, we saturate it and don't do any caching anymore. Note that we don't check for overflow on 64-bit systems because it would take hundreds of years to overflow.

Fixes phpGH-11308.
Test was co-authored by tstarling.
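A condensed sketch of the counting scheme described in the commit message (struct and field names here are illustrative, not the actual ext/dom ones):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
	uint64_t modification_nr;   /* bumped on every document mutation */
} doc_state;

typedef struct {
	uint64_t snapshot_nr;       /* modification_nr at caching time */
	size_t   cached_length;
	bool     valid;
} length_cache;

/* Called on every tree mutation; saturates instead of wrapping, so a
 * full counter permanently disables caching rather than risking a
 * stale cache hit after wrap-around. */
static void doc_mutated(doc_state *doc)
{
	if (doc->modification_nr != UINT64_MAX) {
		doc->modification_nr++;
	}
}

static bool cache_usable(const doc_state *doc, const length_cache *c)
{
	return c->valid
		&& doc->modification_nr != UINT64_MAX /* saturated: never cache */
		&& c->snapshot_nr == doc->modification_nr;
}

static void cache_store(const doc_state *doc, length_cache *c, size_t len)
{
	c->snapshot_nr = doc->modification_nr;
	c->cached_length = len;
	c->valid = true;
}
```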
I rebased, fixed the reported issues, and added more tests. I also double-checked the field writes, and I think I got them all now. It also turns out that the prefix write does not need to invalidate, because the namespace is kept. I still have to do some additional tests with trying to remove namespace declarations etc. I did not test Tim's real-world test case from #11330 (comment) yet, but I plan to do that.
Yes, but this still cannot cause a problem, I think. Check out https://3v4l.org/TA3MV. Here we see the strange behaviour of destructors when using custom elements. The destructor only gets executed when the PHP wrapper object is destroyed. For example, the first time you see
I ran it, and it works; it reduces the benchmark time from 8.6s to 0.7s (an average of three runs). If you want to run it, the full procedure is something like:

The script should produce the result hash c94d6ea84fa23572e2abc66879384cb5. Incidentally, our workaround, which replaced getElementsByTagName() with XPath, gives a benchmark time of 4.4s, so your fix is around 6 times faster than the userspace workaround and 12 times faster than the original code.
ext/dom/php_dom.c (outdated)
```c
/* Go upwards, until we find a parent node with a next sibling, or until we hit the base. */
do {
	nodep = nodep->parent;
	ZEND_ASSERT(nodep != NULL);
```
It's fine, I guess. It just seems like a lot of faith to have in your code and the code of future developers. If I simulate a forgotten cache invalidation, say by commenting out the one in DOMNode::removeChild(), two of the tests crash here. Something like the following suggestion would be safer, would follow existing conventions in the file, and would add minimal overhead.
```diff
-ZEND_ASSERT(nodep != NULL);
+if (UNEXPECTED(!nodep)) {
+	zend_throw_error(NULL, "Current node is not in a document");
+	return NULL;
+}
```
I would rather have the assertion, as this implies a bug within the implementation, and having something crash is a very good indicator that we need to fix it.

Having a random Error thrown at users is not really helpful IMHO, especially as I bet most people will not think to report an issue to the project.
There are three options I see right now:
- Assert. This will cause a crash in production which is bad, but it can result in a bug report.
- Just return NULL. This will not result in a crash, but it causes wrong results and might not result in a bug report.
- Throw an error and return NULL. The error might confuse users and might not result in a bug report.
There seems to be no good option here, tbh. I'd like to avoid a crash in production too, though... We could write something in the exception like "please report the bug at ...", but this has never been done before in PHP AFAIK.
We did have such a bug report error for a corrupted timezone database, so it wouldn't be unheard of. So feel free to do something like this.
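For concreteness, the pattern being agreed on here might look something like this (a sketch only, not the final committed code; the message wording is illustrative):

```c
if (UNEXPECTED(nodep == NULL)) {
#if ZEND_DEBUG
	/* Debug builds: crash loudly so a missed invalidation is caught
	 * during development. */
	ZEND_UNREACHABLE();
#endif
	/* Release builds: fail safely and point users at the bug tracker
	 * instead of crashing in production. */
	zend_throw_error(NULL,
		"Node list cache is in an inconsistent state; please report "
		"this bug at https://github.com/php/php-src/issues");
	return NULL;
}
```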
Thanks for testing.
I also checked, and adding fast ZPP won't do much, because parameter parsing is only ~1.5% of the profile. It might still be worthwhile to add it to some of the methods, imo.
I forgot to reply to this. AFAIK this isn't done for all functions, but only for functions where the overhead is large, because of the code size increase and pressure on the instruction cache. But I'm not too sure that's the whole truth.
No? That has never been the case, nor the intention of fast ZPP. It is used more in PHP 8, as it's the only sensible way to implement union types in ZPP, but it remains the case that non-critical functions should use the traditional ZPP mechanism to reduce code size. So, as @nielsdos said, the main reason for using fast ZPP in PHP 7 was to reduce the overhead of calling an external function; the main additional case for it in PHP 8 is ZPP-supported common union types.
Sorry, I just meant that that's how I think it should be: get rid of zend_parse_parameters() and speed up PHP for all use cases. You can't tell in advance which functions will be heavily called, and the long tail of function calls probably accounts for a lot of overhead when added together. I don't think source code size should take precedence over performance and readability. In 2016-2017, Sara Golemon migrated a bunch of extensions to fast ZPP (standard, bcmath, ctype, date, posix, pdo, json, libxml, tokenizer) with no apparent filter for call frequency. I think we should finish the job. If you mean binary size, well, I have some thoughts about that too, which I'd be happy to share in a more appropriate place if you're interested.
I don't really agree with this, considering ZPP reimplements the same thing over and over again in each function, as it is a macro. Yes, it is more readable, but… In any case, this is a discussion for either a new issue or an internals email thread.
This looks fine to me, just some minor comments/questions, and we need to figure out the bit where we should either assert or throw an Error.
```diff
-local = xmlCharStrndup(name, name_len);
-nsuri = xmlCharStrndup(uri ? uri : "", uri_len);
-dom_namednode_iter(intern, 0, namednode, NULL, local, nsuri);
+dom_namednode_iter(intern, 0, namednode, NULL, name, name_len, uri ? uri : "", uri_len);
```
Isn't uri_len potentially undefined here? Or does ZPP set the value to 0 even if the uri pointer is set to NULL?
uri_len will be 0 if uri is not provided. Check out zend_parse_arg_string(): if the string is NULL, then the length gets set to 0.
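For illustration, the same guarantee holds in fast-ZPP form (a hedged sketch with illustrative variable names; the real methods use classic ZPP with an "s!"-style specifier for the nullable URI):

```c
char *uri = NULL, *name = NULL;
size_t uri_len = 0, name_len = 0;

ZEND_PARSE_PARAMETERS_START(2, 2)
	/* NULL is accepted for the URI; when it is passed, ZPP leaves
	 * uri == NULL and sets uri_len to 0, so uri_len is never left
	 * undefined afterwards. */
	Z_PARAM_STRING_OR_NULL(uri, uri_len)
	Z_PARAM_STRING(name, name_len)
ZEND_PARSE_PARAMETERS_END();
```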
```diff
-local = xmlCharStrndup(name, name_len);
-nsuri = xmlCharStrndup(uri ? uri : "", uri_len);
-dom_namednode_iter(intern, 0, namednode, NULL, local, nsuri);
+dom_namednode_iter(intern, 0, namednode, NULL, name, name_len, uri ? uri : "", uri_len);
```
Ditto uri_len comment
OK, so one practical thing I can say about this is that the overhead is proportional to the number of arguments, and is small when the arguments have simple types. I tried this patch:

```diff
diff --git a/ext/dom/nodelist.c b/ext/dom/nodelist.c
index 20e3b18bee..55073b2550 100644
--- a/ext/dom/nodelist.c
+++ b/ext/dom/nodelist.c
@@ -154,9 +154,9 @@ PHP_METHOD(DOMNodeList, item)
 	int count = 0;
 
 	id = ZEND_THIS;
-	if (zend_parse_parameters(ZEND_NUM_ARGS(), "l", &index) == FAILURE) {
-		RETURN_THROWS();
-	}
+	ZEND_PARSE_PARAMETERS_START(1, 1)
+		Z_PARAM_LONG(index)
+	ZEND_PARSE_PARAMETERS_END();
 
 	if (index >= 0) {
 		intern = Z_DOMOBJ_P(id);
```
Prior to this patch, this function compiles to 867 bytes. After the patch, it is 884 bytes (an increase of 17 bytes), plus a cold section of 31 bytes, which is placed in a separate segment and won't affect instruction cache pressure. 17 bytes may be an overestimate of the effect of the patch, because unexpected branches are placed at the end of the function, after the ret.

Note that code can only have an impact on instruction cache pressure when it is actually executed. That's the paradox of inlining hot code: the hot functions are the ones with the worst impact on the instruction cache, because they are executed frequently and will evict other things every time they run. I have the 2017 version of the Intel Optimization Reference Manual handy, and it has some hints on this kind of problem:

Not easily applicable to us, but you get the idea that the critical thing to worry about is the size of the body of a hot loop.
Changed the code to throw an exception if the NULL pointer is encountered. RE Tim's ZPP comment:
FWIW, I get an increase of more than 17 bytes: I get a 96-byte increase. Still looks okay though.
Great to see it merged, thanks for this!
I ran it again and I still get 17 bytes. I'm measuring it like this:
In debug mode I get a difference of 599 bytes, so it's definitely sensitive to compiler optimization level.
I basically did the same thing, but on the