Avoid allocations when getting content of DOM nodes if possible #11543

nielsdos · 2023-06-26T20:07:34Z

Every time text content, a node value, or something similar is requested, we currently allocate memory twice: once when libxml2 hands us a copy of the data, and again when we allocate a zend_string to copy that data into.
We can very often skip the first allocation. Let's do that, because getting e.g. the textContent of a node is quite common when processing documents.
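
To illustrate the idea outside of php-src, here is a minimal, self-contained sketch. It does not use libxml2 or the Zend API: the `node` struct, `counted_malloc`, and all function names below are hypothetical stand-ins, and the real patch may structure its checks differently. The sketch only shows why reading directly from a node's stored content saves one allocation, and that nodes without directly stored content still take the two-allocation path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; not libxml2 or Zend types. */
typedef enum { NODE_TEXT, NODE_ELEMENT } node_kind;

typedef struct node {
    node_kind kind;
    const char *content;     /* used by NODE_TEXT only */
    struct node **children;  /* used by NODE_ELEMENT only */
    size_t child_count;
} node;

/* Counts allocations so the two paths can be compared. */
static size_t alloc_count = 0;

static void *counted_malloc(size_t size)
{
    alloc_count++;
    return malloc(size);
}

static size_t content_length(const node *n)
{
    if (n->kind == NODE_TEXT) {
        return strlen(n->content);
    }
    size_t len = 0;
    for (size_t i = 0; i < n->child_count; i++) {
        len += content_length(n->children[i]);
    }
    return len;
}

static char *append_content(const node *n, char *out)
{
    if (n->kind == NODE_TEXT) {
        size_t len = strlen(n->content);
        memcpy(out, n->content, len);
        return out + len;
    }
    for (size_t i = 0; i < n->child_count; i++) {
        out = append_content(n->children[i], out);
    }
    return out;
}

/* Library-style getter: always returns a fresh copy the caller must free. */
static char *node_get_content_copy(const node *n)
{
    assert(n != NULL);
    char *buf = counted_malloc(content_length(n) + 1);
    if (buf == NULL) {
        return NULL;
    }
    *append_content(n, buf) = '\0';
    return buf;
}

/* Old path: library copy first, then a second copy into the caller's string. */
static char *get_content_two_allocs(const node *n)
{
    char *tmp = node_get_content_copy(n);
    if (tmp == NULL) {
        return NULL;
    }
    size_t len = strlen(tmp);
    char *result = counted_malloc(len + 1);
    if (result != NULL) {
        memcpy(result, tmp, len + 1);
    }
    free(tmp);
    return result;
}

/* New path: copy straight from the stored content when the node holds it
 * directly; otherwise fall back to the two-allocation path. */
static char *get_content_fast(const node *n)
{
    assert(n != NULL);
    if (n->kind == NODE_TEXT) {
        size_t len = strlen(n->content);
        char *result = counted_malloc(len + 1);
        if (result != NULL) {
            memcpy(result, n->content, len + 1);
        }
        return result;
    }
    return get_content_two_allocs(n);
}
```

Calling `get_content_fast` on a text node performs a single `counted_malloc`, while an element node still goes through the fallback; the extra `kind` check is the only added cost on that slower path.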

Here's a microbenchmark with time measurements (only for textContent, but the results for nodeValue etc. are similar).
The extra checks cause no noticeable slowdown in the cases where we cannot avoid an allocation.
Based on these results I'd say it's worth it:

<?php

$doc = new DOMDocument;
$doc->loadXML('<?xml version="1.0"?><div/>');
$doc->documentElement->append(str_repeat('hello world', 10));
$el = $doc->documentElement->firstChild;

for ($i = 0; $i < 1000000; $i++) {
    $el->textContent;
}

Bench results:

Benchmark 1: ./sapi/cli/php dom.php
  Time (mean ± σ):      43.1 ms ±   2.2 ms    [User: 40.4 ms, System: 2.6 ms]
  Range (min … max):    40.3 ms …  51.5 ms    62 runs
 
Benchmark 2: ./sapi/cli/php_old dom.php
  Time (mean ± σ):     107.6 ms ±   1.6 ms    [User: 104.5 ms, System: 2.9 ms]
  Range (min … max):   105.4 ms … 112.1 ms    27 runs
 
Summary
  ./sapi/cli/php dom.php ran
    2.50 ± 0.13 times faster than ./sapi/cli/php_old dom.php

Girgias

LGTM

UPGRADING.INTERNALS

Girgias · 2023-06-27T13:04:00Z

ext/dom/characterdata.c

@@ -38,19 +38,13 @@ URL: http://www.w3.org/TR/2003/WD-DOM-Level-3-Core-20030226/DOM3-Core.html#core-
 int dom_characterdata_data_read(dom_object *obj, zval *retval)


Complete side note: changing the return type of the various functions that only return SUCCESS or FAILURE from int to zend_result might be a good idea as a follow-up.

nielsdos added 2 commits June 26, 2023 21:34

Avoid allocation when getting the node content, if possible

ba030c0

[ci skip] Update UPGRADING.INTERNALS

5b83529

github-actions bot added the Extension: dom label Jun 26, 2023

Fix CI

c853e08

Girgias approved these changes Jun 27, 2023

nielsdos closed this in 941a7e5 Jun 27, 2023
