Description
Description
DOMDocument::getElementsByTagName() and DOMElement::getElementsByTagName() return an iterator which does a linear time search for the current position, each time the position is moved to the next node. So completing the iteration of the node list takes O(N^2) time where N is the number of nodes in the list.
For example:
for ( $n = 1000; $n <= 16000; $n *= 2 ) {
$doc = new DOMDocument;
$root = $doc->createElement( 'root' );
$doc->appendChild( $root );
for ( $i = 0; $i < $n; $i++ ) {
$root->appendChild( $doc->createElement( 'e' ) );
}
$t = -microtime( true );
foreach ( $doc->getElementsByTagName( 'e' ) as $node );
$t += microtime( true );
print "$n\t$t\n";
}
resulted in this output:
1000 0.0031359195709229
2000 0.012067079544067
4000 0.050027132034302
8000 0.20300078392029
16000 0.76775693893433
It was likely faster prior to 3084e72. Since that commit in 2003, getElementsByTagName() is similar to the following pseudocode:
function getElementsByTagName( $root, $name ) {
$i = 0;
do {
$node = findNextNode( $root, $name );
for ( $j = 0; $node && $j < $i; $j++ ) {
$node = findNextNode( $node, $name );
}
if ( $node ) {
yield $node;
}
$i++;
} while ( $node );
}
That commit was motivated by standards compliance. The standard says that the node list returned by getElementsByTagName() is live, that is, it must immediately reflect changes to the document. The standard has no concept of iteration, it only has an integer offset which can be passed to item(). The pseudocode above does correctly reproduce the quirks implied by the standard. For example:
$doc = new DOMDocument;
$doc->loadXML( '<root><e i="1"/><e i="2"/><e i="3"/><e i="4"/><e i="5"/><e i="6"/><e i="7"/><e i="8"/><e i="9"/><e i="10"/></root>' );
$root = $doc->documentElement;
foreach ( $doc->getElementsByTagName( 'e' ) as $node ) {
print $node->getAttribute( 'i' ) . ' ';
$root->removeChild( $node );
}
print "\n";
produces
1 3 5 7 9
WebKit implements the standard efficiently by having an item cache and a length cache. The item cache allows efficient retrieval of an item which has an index close to the previously requested item. These caches are invalidated on tree mutation.
PHP has no concept of tree mutation event callbacks, so implementing this in PHP may be somewhat tedious.
PHP Version
PHP 8.2.6
Operating System
No response