getElementsByTagName() is O(N^2)

### Description

DOMDocument::getElementsByTagName() and DOMElement::getElementsByTagName() return an iterator which does a linear time search for the current position, each time the position is moved to the next node. So completing the iteration of the node list takes O(N^2) time where N is the number of nodes in the list.

For example:

```php
for ( $n = 1000; $n <= 16000; $n *= 2 ) {
	$doc = new DOMDocument;
	$root = $doc->createElement( 'root' );
	$doc->appendChild( $root );
	for ( $i = 0; $i < $n; $i++ ) {
		$root->appendChild( $doc->createElement( 'e' ) );
	}
	$t = -microtime( true );
	foreach ( $doc->getElementsByTagName( 'e' ) as $node );
	$t += microtime( true );
	print "$n\t$t\n";
}
```

resulted in this output:
```
1000    0.0031359195709229
2000    0.012067079544067
4000    0.050027132034302
8000    0.20300078392029
16000   0.76775693893433
```

It was likely faster prior to 3084e72ef1053b8ead468e66baf55d5256e7e9af. Since that commit in 2003, getElementsByTagName() is similar to the following pseudocode:

```php
function getElementsByTagName( $root, $name ) {
	$i = 0;
	do {
		$node = findNextNode( $root, $name );
		for ( $j = 0; $node && $j < $i; $j++ ) {
			$node = findNextNode( $node, $name );
		}
		
		if ( $node ) {
			yield $node;
		}
		$i++;
	} while ( $node );
}
```

That commit was motivated by standards compliance. The standard says that the node list returned by getElementsByTagName() is live, that is, it must immediately reflect changes to the document. The standard has no concept of iteration, it only has an integer offset which can be passed to item(). The pseudocode above does correctly reproduce the quirks implied by the standard. For example:

```php
$doc = new DOMDocument;
$doc->loadXML( '<root><e i="1"/><e i="2"/><e i="3"/><e i="4"/><e i="5"/><e i="6"/><e i="7"/><e i="8"/><e i="9"/><e i="10"/></root>' );
$root = $doc->documentElement;
foreach ( $doc->getElementsByTagName( 'e' ) as $node ) {
	print $node->getAttribute( 'i' ) . ' ';
	$root->removeChild( $node );
}
print "\n";
```

produces

```
1 3 5 7 9 
```

WebKit implements the standard efficiently by having an item cache and a length cache. The item cache allows efficient retrieval of an item which has an index close to the previously requested item. These caches are invalidated on tree mutation.

PHP has no concept of tree mutation event callbacks, so implementing this in PHP may be somewhat tedious. 

### PHP Version

PHP 8.2.6

### Operating System

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

getElementsByTagName() is O(N^2) #11308

Description

PHP Version

Operating System

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

getElementsByTagName() is O(N^2) #11308

Description

Description

PHP Version

Operating System

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions