Implement iteration cache, item cache and length cache for node list iteration #11330

Merged: 18 commits merged into php:master on Jun 2, 2023

Conversation

@nielsdos (Member) commented May 27, 2023

The current implementation follows the spec requirement that the list must be "live". This means that changes in the document must be reflected in the existing node lists without requiring the user to refetch the node list.
The consequence is that getting any item, or the length of the list, always starts searching from the root element of the node list. This results in O(n) time to get any item or the length. If there's a for loop over the node list, this means the iterations will take O(n²) time in total. This causes real-world performance issues with potential for downtime (see GH-11308 and its references for details).

We fix this by introducing a caching strategy. We cache the last iterated object in the iterator, the last requested item in the node list, and the last length computation. To invalidate the cache, we simply count the number of modifications made to the containing document. If the modification number does not match the number recorded at caching time, we know the document has been modified and the cache is invalid. If the counter ever overflows, we saturate it and stop caching altogether. Note that we don't check for overflow on 64-bit systems because it would take hundreds of years to overflow.

Fixes GH-11308.
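
A minimal sketch of the invalidation idea, with hypothetical struct, field, and function names (the real php-src structures differ):

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the php-src structs. */
typedef struct {
	uint64_t modification_nr; /* bumped on every mutation of the document tree */
} doc_cache_tag;

typedef struct {
	void    *cached_item;  /* e.g. the node returned for the last requested index */
	size_t   cached_index;
	uint64_t cached_tag;   /* doc->modification_nr observed when the cache was filled */
} item_cache;

static void note_document_modified(doc_cache_tag *doc)
{
	/* Saturate instead of wrapping; once saturated, caches never validate again.
	 * (With a 64-bit counter this is practically unreachable.) */
	if (doc->modification_nr != UINT64_MAX) {
		doc->modification_nr++;
	}
}

static bool item_cache_is_valid(const doc_cache_tag *doc, const item_cache *cache)
{
	return cache->cached_item != NULL
		&& doc->modification_nr != UINT64_MAX
		&& cache->cached_tag == doc->modification_nr;
}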

Benchmarks based on Tim's sample: (benchmark results omitted)

@Girgias (Member) left a comment:

I think this approach makes sense; I'm just confused by part of it, where I've left some comments.

Comment on lines 187 to 199
	nodep = dom_object_get_node(objmap->cached_obj);
	/* The node cannot be NULL if the cache is valid. If it is NULL, then it means we
	 * forgot an invalidation somewhere. Take the defensive programming approach and invalidate
	 * it here if it's NULL (except in debug mode where we would want to catch this). */
	if (UNEXPECTED(nodep == NULL)) {
#if ZEND_DEBUG
		ZEND_UNREACHABLE();
#endif
		reset_objmap_cache(objmap);
Member:

I don't think I've ever seen us do something like this instead of just asserting. But this can make sense. @iluuu1994, what is your opinion on doing this?

Contributor:

The definition of ZEND_ASSERT() is surprising and unsafe. Doing ZEND_ASSERT(nodep) here in non-debug mode would mark the whole branch as unreachable and the compiler would optimise away the reset_objmap_cache() call. I confirmed this using gcc -S -O2. So it seems reasonable to me to avoid calling it.

A runtime check in non-debug mode also seems warranted since I have found a way to hit it, and maybe there are more ways that I have missed.
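
To make the effect concrete, here is a standalone sketch using a stand-in macro (MY_ASSERT is hypothetical, not the real ZEND_ASSERT()/ZEND_ASSUME() definitions) that shows how an assume-style assert lets gcc -O2 delete the guarded branch:

#include <assert.h>
#include <stdio.h>

/* MY_ASSERT is a stand-in for illustration only. */
#ifdef NDEBUG
# define MY_ASSERT(c) do { if (!(c)) __builtin_unreachable(); } while (0)
#else
# define MY_ASSERT(c) assert(c)
#endif

static int cache = 42;

static void reset_cache(void) { cache = 0; }

int check(const int *nodep)
{
	MY_ASSERT(nodep != NULL);
	if (nodep == NULL) {
		/* In an NDEBUG build, MY_ASSERT told the compiler this point is
		 * unreachable, so gcc -O2 is free to drop this whole branch,
		 * including the reset_cache() call. */
		reset_cache();
		return 0;
	}
	return *nodep + cache;
}

int main(void)
{
	int value = 1;
	printf("%d\n", check(&value));
	return 0;
}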

Member Author:

I agree with this; the runtime check is pretty cheap too. In debug mode the assertion is valuable for knowing when something goes wrong, but we don't want to crash in production.

@Girgias (Member) left a comment:

Btw, you can write [skip ci] to skip CI runs for comment-only changes in the future. :)

@tstarling (Contributor):

It's possible to break it with simplexml_import_dom():

$doc = new DOMDocument;
$doc->loadXML( '<root><e i="1"/><e i="2"/><e i="3"/><e i="4"/><e i="5"/><e i="6"/><e i="7"/><e i="8"/><e i="9"/><e i="10"/></root>' );
$list = $doc->getElementsByTagName('e');
print $list->item(5)->getAttribute('i')."\n";
$s = simplexml_import_dom($doc->documentElement);
unset($s->e[5]);
print $list->item(5)->getAttribute('i')."\n";

This hits the reset_objmap_cache() call, and restart is true, but it still segfaults afterwards: objmap->nodetype is 0, so it dereferences nodep on line 213.

DOMDocument::xinclude also seems suspicious, calling php_dom_remove_xinclude_nodes -> php_dom_free_xinclude_node().

If there is any userspace code execution between xmlUnlinkNode() and php_dom_invalidate_node_list_cache(), such as destructors, then that could also cause problems.

@tstarling (Contributor):

dom_node_node_value_write() was missed. It calls php_libxml_node_free_list(), which calls xmlUnlinkNode(). I haven't been able to get it to crash, though; it just gives outdated return values. Test case:

$doc = new DOMDocument;
$doc->loadXML( '<root><a></a></root>');
$list = $doc->getElementsByTagName('a');
print $list->item(0)->tagName . "\n";
$doc->documentElement->nodeValue = '';
print $list->item(0)->tagName . "\n";
print $doc->saveXML();

@tstarling (Contributor):

Something here has broken with my real-world test case. Will try to isolate.

@tstarling (Contributor):

The problem is that when using DOMNodeList::item() over the whole document with consecutive indexes, it can only return children and siblings of the cached node; it can't go up the stack to look at siblings of parents and other ancestors.

I don't think it's necessary to actually store a stack; there's only one way tree traversal can proceed from the current node.

Reduced test case:

$doc = new DOMDocument;
$doc->loadXML('<root><a><b i="1"/></a><b i="2"/></root>');
$list = $doc->getElementsByTagName('b');
$n = $list->length;
for ($i = 0; $i < $n; $i++) {
	print $list->item($i)->getAttribute('i') . "\n";
}

It should find two nodes, but it fails at the second node, giving "Call to a member function getAttribute() on null".

@nielsdos (Member Author) left a comment:

I can work on this again tomorrow. Thanks, Tim, for checking this! The issues you found seem reasonably simple to fix. I also found a subtle issue: I missed invalidating when writing to the prefix field. I'll do a thorough double-check of all field writes to see whether any invalidations are missed.

I can already reply concretely to one thing that's brought up:

If there is any userspace code execution between xmlUnlinkNode() and php_dom_invalidate_node_list_cache(), such as destructors, then that could also cause problems.

This shouldn't be a problem AFAIK. You can't really add custom objects to the DOM tree. The destructor will be called when the element is added to the tree, not when the element is removed. See https://3v4l.org/FsNuQ for example.

EDIT: huh, GH placed the comment about the assertion both here as well as a reply, oops 😅


@nielsdos (Member Author):

As a side note, it might also make sense to let DOMNodeList::item and DOMNamedNodeMap::item use fast ZPP, since they're likely to be called in a hot loop (but I haven't measured anything yet). I suspect the binary size increase will not be much because only one parameter needs parsing (again, nothing measured yet).

@tstarling (Contributor):

I don't think it's necessary to actually store a stack; there's only one way tree traversal can proceed from the current node.

I just want to mention that the existing recursion in dom_get_elements_by_tag_name_ns_raw() is apparently unnecessary. The next node to consider for filtering is always (see the sketch below the list):

  • The first child of the current node, or if there is none,
  • The next sibling of the current node, or if there is none,
  • The next sibling of the nearest ancestor that has such a sibling, terminating when the base node of the search is reached.

Googling for preorder traversal, I am finding a lot of pages that claim you need a stack, but the stack in those algorithms is just being used to compensate for the lack of parent and sibling pointers.
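
A sketch of that stackless pre-order step over a hypothetical node struct (the struct and helper names are made up here; libxml2's xmlNode has the same parent/children/next shape, but this is not the php-src code):

#include <stddef.h>

typedef struct node {
	struct node *parent;
	struct node *children; /* first child */
	struct node *next;     /* next sibling */
} node;

/* Returns the next node in pre-order after cur, or NULL once the subtree
 * rooted at base has been fully visited. */
static node *next_in_preorder(node *cur, const node *base)
{
	if (cur->children) {
		return cur->children;   /* 1. descend to the first child */
	}
	while (cur != base) {
		if (cur->next) {
			return cur->next;   /* 2. next sibling of cur or of an ancestor */
		}
		cur = cur->parent;      /* 3. climb towards the base */
	}
	return NULL;                /* reached the base: traversal is done */
}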

As a side note, it might also make sense to let DOMNodeList::item and DOMNamedNodeMap::item use fast zpp

Everything should use fast ZPP; I'm not aware of any downsides. Just as a matter of style, it would be easier to read the code if there were only one way to pass parameters. But it seems like an issue for a separate PR.

@Girgias (Member) commented May 31, 2023

This shouldn't be a problem AFAIK. You can't really add custom objects to the DOM tree.

You can overwrite the objects DOM creates for its nodes by calling: https://www.php.net/manual/en/domdocument.registernodeclass

I'm using this obscure feature (which static analysers get utterly confused by) in my docbook renderer project: https://gitlab.com/Girgias/docbook-renderer/-/blob/master/src/DOMRenderingDocument.php

So maybe that actually needs to be tested, in case such a class sets a destructor?

nielsdos added 11 commits May 31, 2023 19:22
Test was Co-authored-by: tstarling
@nielsdos (Member Author):

I rebased, fixed the reported issues, and added more tests. I also double-checked the field writes, and I think I got them all now. It also turns out that the prefix write does not need to invalidate, because the namespace is kept.

I still have to do some additional tests with trying to remove namespace declarations etc.

I did not test Tim's real-world test case of #11330 (comment) yet, I plan to do that though.

You can overwrite the objects DOM creates for its nodes by calling: https://www.php.net/manual/en/domdocument.registernodeclass

Yes, but I think this still cannot cause a problem. Check out https://3v4l.org/TA3MV. Here we see the strange behaviour of destructors when using custom elements. The destructor only gets executed when the PHP wrapper object is destroyed. For example, the first time you see int(1) is when the temporary argument of appendChild gets destroyed (this is the same thing I showed earlier).
The int(1) at the end is the return value of remove() being destroyed, followed by the destruction of $doc. We won't execute the destructor during tree manipulation.

@tstarling (Contributor):

I did not test Tim's real-world test case of #11330 (comment) yet, I plan to do that though.

I ran it, and it works, and I find that it reduces the benchmark time from 8.6s to 0.7s (an average of three runs). If you want to run it, the full procedure is something like:

git clone https://github.com/wikimedia/svgtranslate.git
cd svgtranslate/

# Get version from before xpath workaround
git checkout eb9d5a230452fb588fbcabd6957e20c19cca8e74
composer update --no-scripts

# Create CLI test script
cat > analyze-svg.php
<?php
require __DIR__.'/vendor/autoload.php';
$f = new \App\Model\Svg\SvgFile( $argv[1] );
$trans = $f->getInFileTranslations();
print md5( json_encode( $trans ) ) . "\n";
^D

# Get large SVG
wget -O ukmap.svg https://upload.wikimedia.org/wikipedia/commons/archive/4/4f/20230525015255%212022_Russian_invasion_of_Ukraine.svg

time php analyze-svg.php ukmap.svg

The script should produce the result hash c94d6ea84fa23572e2abc66879384cb5.

Incidentally, our workaround which replaced getElementsByTagName() with XPath gives a benchmark time of 4.4s, so your fix is around 6 times faster than the userspace workaround, and 12 times faster than the original code.

	/* Go upwards, until we find a parent node with a next sibling, or until we hit the base. */
	do {
		nodep = nodep->parent;
		ZEND_ASSERT(nodep != NULL);
Contributor:

It's fine, I guess. It just seems like a lot of faith to have in your code and the code of future developers. If I simulate a forgotten cache invalidation, say by commenting out the one in DOMNode::removeChild(), two of the tests crash here. Something like the following suggestion would be safer, would follow existing conventions in the file, and would add minimal overhead.

Suggested change:
-		ZEND_ASSERT(nodep != NULL);
+		if (UNEXPECTED(!nodep)) {
+			zend_throw_error(NULL, "Current node is not in a document");
+			return NULL;
+		}

Member:

I would rather have the assertion, as this implies a bug within the implementation, and having something crash is a very good indicator that we need to fix it.

Having a random Error thrown at users is not really helpful IMHO, especially as I bet most people will not think to report an issue to the project.

Member Author:

There are three options I see right now:

  • Assert. This will cause a crash in production, which is bad, but it can result in a bug report.
  • Just return NULL. This will not result in a crash, but it causes wrong results and might not result in a bug report.
  • Throw an error and return NULL. The error might confuse users and might not result in a bug report.

There seems to be no good option here, to be honest. I'd like to avoid a crash in production too, though... We could write something in the exception like "please report the bug at ...", but this has never been done before in PHP AFAIK.
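
For concreteness, one possible shape of option 3, combining the existing debug assertion with the zend_throw_error() from the suggestion above (a hedged sketch, not necessarily what was merged):

if (UNEXPECTED(nodep == NULL)) {
#if ZEND_DEBUG
	ZEND_UNREACHABLE();
#endif
	/* Wording is illustrative; the idea is to nudge users towards filing a bug report. */
	zend_throw_error(NULL, "Node list cache is in an inconsistent state, please report this as a bug");
	return NULL;
}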

Member:

We did have such a bug report error for a corrupted timezone database, so it wouldn't be unheard of. So feel free to do something like this.

@nielsdos (Member Author) commented Jun 1, 2023

Thanks for testing.
I also benchmarked and made some tiny improvements. Most notably, I now fall back to the strings that libxml2 interned, if they are available. However, this did not result in a further big speedup (for this benchmark) because it only helps in the case where the strings are equal...
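
A sketch of the interned-string fast path being referred to here: libxml2 typically interns node names in a per-document dictionary, so equal names are often the same pointer, and a pointer compare can short-circuit the full comparison (the helper name below is hypothetical, not the php-src function):

#include <libxml/xmlstring.h>

/* Hypothetical helper for illustration. */
static int dom_names_equal(const xmlChar *a, const xmlChar *b)
{
	/* Pointer equality catches the interned case cheaply; otherwise fall back
	 * to a byte-wise comparison. This only helps when the names are in fact equal. */
	return a == b || xmlStrEqual(a, b);
}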

@nielsdos nielsdos requested a review from Girgias June 1, 2023 19:47
@nielsdos (Member Author) commented Jun 1, 2023

I also checked, and adding fast ZPP won't do much, because parameter parsing is only ~1.5% of the profile. It might still be worthwhile to add it to some of the methods, IMO.

Everything should use fast ZPP; I'm not aware of any downsides. Just as a matter of style, it would be easier to read the code if there were only one way to pass parameters. But it seems like an issue for a separate PR.

I forgot to reply to this. AFAIK this isn't done for all functions, but only for functions where the overhead is large, because of the code size increase and pressure on the instruction cache. But I'm not too sure if this is the whole truth.

@Girgias (Member) commented Jun 1, 2023

Everything should use fast ZPP; I'm not aware of any downsides. Just as a matter of style, it would be easier to read the code if there were only one way to pass parameters. But it seems like an issue for a separate PR.

No? That has never been the case, nor the intention of fast ZPP. It is used more in PHP 8 as it's the only sensible way to implement union types in ZPP, but it remains the case that non-critical functions should use the traditional ZPP mechanism to reduce code size.

So, as @nielsdos said, the main reason for using fast ZPP in PHP 7 was to reduce the overhead of calling an external function; the main case now in PHP 8 for using fast ZPP, other than that, is for the common union types that ZPP supports.

@tstarling (Contributor):

No? That has never been the case, nor the intention of fast ZPP. It is used more in PHP 8 as it's the only sensible way to implement union types in ZPP, but it remains the case that non-critical functions should use the traditional ZPP mechanism to reduce code size.

Sorry, I just meant that's how I think it should be. Get rid of zend_parse_parameters(), speed up PHP for all use cases. You can't tell in advance which functions will be heavily called, and the long tail of function calls probably accounts for a lot of overhead when added together. I don't think source code size should take precedence over performance and readability. In 2016-2017, Sara Golemon migrated a bunch of extensions to fast ZPP (standard, bcmath, ctype, date, posix, pdo, json, libxml, tokenizer) with no apparent filter for call frequency. I think we should finish the job.

If you mean binary size, well, I have some thoughts about that too, which I'd be happy to share in some more appropriate place if you're interested.

@Girgias (Member) commented Jun 2, 2023

No? That has never been the case, nor the intention of fast ZPP. It is used more in PHP 8 as it's the only sensible way to implement union types in ZPP, but it remains the case that non-critical functions should use the traditional ZPP mechanism to reduce code size.

Sorry, I just meant that's how I think it should be. Get rid of zend_parse_parameters(), speed up PHP for all use cases. You can't tell in advance which functions will be heavily called, and the long tail of function calls probably accounts for a lot of overhead when added together. I don't think source code size should take precedence over performance and readability. In 2016-2017, Sara Golemon migrated a bunch of extensions to fast ZPP (standard, bcmath, ctype, date, posix, pdo, json, libxml, tokenizer) with no apparent filter for call frequency. I think we should finish the job.

If you mean binary size, well, I have some thoughts about that too, which I'd be happy to share in some more appropriate place if you're interested.

I don't really agree with this, considering that fast ZPP reimplements the same thing over and over again in each function, as it is a macro. Yes, it is more readable, but I don't really think zend_parse_parameters() is that unreadable; the main problem is support for generic type signatures, which is already an issue with fast ZPP, and it would be great to have something else.

But in any case, this is a discussion for either a new issue or an internals email thread.

@Girgias (Member) left a comment:

This looks fine to me; just some minor comments/questions, and we need to figure out the bit where we should either assert or throw an Error.

-	local = xmlCharStrndup(name, name_len);
-	nsuri = xmlCharStrndup(uri ? uri : "", uri_len);
-	dom_namednode_iter(intern, 0, namednode, NULL, local, nsuri);
+	dom_namednode_iter(intern, 0, namednode, NULL, name, name_len, uri ? uri : "", uri_len);
Member:

Isn't uri_len potentially undefined here? Or does ZPP set the value to 0 even if the uri pointer is set to NULL?

Member Author:

uri_len will be 0 if uri is not provided. Check out zend_parse_arg_string(): if the string is NULL, then the length gets set to 0.
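
A tiny standalone sketch of the contract being described (illustrative only, not zend_parse_arg_string() itself): when a nullable string argument is absent, both the pointer and the length come back zeroed, so uri_len is well-defined.

#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the nullable-string parsing behaviour. */
static void parse_nullable_string(const char *input, char **dest, size_t *dest_len)
{
	if (input == NULL) {
		*dest = NULL;
		*dest_len = 0; /* length is zeroed together with the pointer */
	} else {
		*dest = (char *)input;
		*dest_len = strlen(input);
	}
}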

-	local = xmlCharStrndup(name, name_len);
-	nsuri = xmlCharStrndup(uri ? uri : "", uri_len);
-	dom_namednode_iter(intern, 0, namednode, NULL, local, nsuri);
+	dom_namednode_iter(intern, 0, namednode, NULL, name, name_len, uri ? uri : "", uri_len);
Member:

Ditto uri_len comment

@tstarling (Contributor):

AFAIK this isn't done for all functions but only for functions where the overhead is large because of code size increase and pressure on the instruction cache. But I'm not too sure if this is the whole truth.

OK, so one practical thing I can say about this is that the overhead is proportional to the number of arguments, and it is small when the arguments have simple types. I tried this patch:

diff --git a/ext/dom/nodelist.c b/ext/dom/nodelist.c
index 20e3b18bee..55073b2550 100644
--- a/ext/dom/nodelist.c
+++ b/ext/dom/nodelist.c
@@ -154,9 +154,9 @@ PHP_METHOD(DOMNodeList, item)
        int count = 0;
 
        id = ZEND_THIS;
-       if (zend_parse_parameters(ZEND_NUM_ARGS(), "l", &index) == FAILURE) {
-               RETURN_THROWS();
-       }
+       ZEND_PARSE_PARAMETERS_START(1, 1)
+               Z_PARAM_LONG(index)
+       ZEND_PARSE_PARAMETERS_END();
 
        if (index >= 0) {
                intern = Z_DOMOBJ_P(id);

Prior to this patch, this function compiles to 867 bytes. After the patch, it is 884 bytes (an increase of 17 bytes), plus a cold section of 31 bytes, which is placed in a separate segment and won't affect instruction cache pressure. 17 bytes may be an overestimate of the effect of the patch, because unexpected branches are placed at the end of the function, after the ret.

Note that code can only have an impact on instruction cache pressure when it is actually executed. That's the paradox of inlining hot code. The hot functions are the ones with the worst impact on the instruction cache, because they are executed frequently and will evict other things every time they are run.

I have the 2017 version of the Intel Optimization Reference Manual handy, and it has some hints on this kind of problem:

To ensure “hot” code is feeding from the decoded ICache:
• Make sure each hot code block is less than about 500 instructions. Specifically, do not unroll to more than 500 instructions in a loop. This should enable Decoded ICache residency even when hyper-threading is enabled.
• For applications with very large blocks of calculations inside a loop, consider loop-fission: split the loop into multiple loops that fit in the Decoded ICache, rather than a single loop that overflows.
• If an application can be sure to run with only one thread per core, it can increase hot code block size to about 1000 instructions.

Not easily applicable to us, but you get the idea that the critical thing to worry about is the size of the body of a hot loop.

@nielsdos (Member Author) commented Jun 2, 2023

Changed the code to throw an exception if the NULL pointer is encountered.

RE Tim's ZPP comment:
Interesting, I didn't realise the difference is so small for a single parameter. It's basically a no-brainer then for the ext/dom methods with a single parameter. I'll take a look sometime soon at converting some of these functions.

@nielsdos merged commit c3f0797 into php:master on Jun 2, 2023
@nielsdos (Member Author) commented Jun 2, 2023

FWIW, I get an increase of more than 17 bytes: I see a 96-byte increase. Still looks okay, though.

@tstarling (Contributor):

Great to see it merged, thanks for this!

FWIW, I get an increase of more than 17 bytes: I see a 96-byte increase. Still looks okay, though.

I ran it again and I still get 17 bytes. I'm measuring it like this:

$ objdump -t ext/dom/nodelist.o | grep zim_DOMNodeList_item
0000000000000220 g     F .text  0000000000000363 .hidden zim_DOMNodeList_item
$ make
...
$ objdump -t ext/dom/nodelist.o | grep zim_DOMNodeList_item
0000000000000013 l     F .text.unlikely 000000000000001f zim_DOMNodeList_item.cold
0000000000000220 g     F .text  0000000000000374 .hidden zim_DOMNodeList_item
$ php -r 'print 0x374-0x363 . "\n";'
17

In debug mode I get a difference of 599 bytes, so it's definitely sensitive to compiler optimization level.

@nielsdos (Member Author) commented Jun 3, 2023

I basically did the same thing, but on the sapi/cli/php binary. I guess the difference is due to a different compiler (& version).

Successfully merging this pull request may close these issues: getElementsByTagName() is O(N^2)