Commit 44dd16c

Improve performances by inlining the TextVisitor
1 parent 5784bf3 commit 44dd16c

3 files changed, 140 additions and 1 deletion

benchmark/fixture.html

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
1+
<h1><a id="user-content-creating-an-extension-to-allow-custom-tags" class="anchor" aria-hidden="true" href="#creating-an-extension-to-allow-custom-tags"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Creating an extension to allow custom tags</h1>
2+
<p>If you want to use tags other than the ones provided by the sanitizer core extensions, you can create your
3+
own extension.</p>
4+
<p>There are two steps in the creation of an extension to handle additional tags: creating the node visitor which
5+
will handle the custom tag, and registering it using an extension.</p>
6+
<p>To better understand how to create an extension suited to your needs, you can also have a look at the
7+
<a href="https://github.com/tgalopin/html-sanitizer/tree/master/src/Extension/Image">Image extension</a>
8+
which shows the different features available.</p>
9+
<h2><a id="user-content-creating-a-node-and-a-node-visitor" class="anchor" aria-hidden="true" href="#creating-a-node-and-a-node-visitor"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Creating a node and a node visitor</h2>
10+
<p>A node visitor is a class able to handle DOMNode instances of a certain type. It needs to implement the
11+
<code>HtmlSanitizer\Visitor\NodeVisitorInterface</code>.</p>
12+
<p>A node visitor is responsible for adding a node to the tree of safe HTML by filtering the DOMNode
13+
it's given. Thus, for an example <code>my-tag</code> custom tag, we need to create two classes: a Node and
14+
a NodeVisitor.</p>
15+
<p>The node could look like this:</p>
16+
<div class="highlight highlight-text-html-php"><pre><span class="pl-s1"><span class="pl-k">namespace</span> <span class="pl-en">App\Sanitizer</span>;</span>
17+
<span class="pl-s1"></span>
18+
<span class="pl-s1"><span class="pl-k">use</span> <span class="pl-c1">HtmlSanitizer\Node\AbstractTagNode</span>;</span>
19+
<span class="pl-s1"><span class="pl-k">use</span> <span class="pl-c1">HtmlSanitizer\Node\HasChildrenTrait</span>;</span>
20+
<span class="pl-s1"></span>
21+
<span class="pl-s1"><span class="pl-k">class</span> <span class="pl-en">MyTagNode</span> <span class="pl-k">extends</span> <span class="pl-e">AbstractTagNode</span></span>
22+
<span class="pl-s1">{</span>
23+
<span class="pl-s1"> <span class="pl-k">use</span> <span class="pl-c1">HasChildrenTrait</span>; <span class="pl-c"><span class="pl-c">//</span> Or IsChildlessTrait</span></span>
24+
<span class="pl-s1"></span>
25+
<span class="pl-s1"> <span class="pl-k">public</span> <span class="pl-k">function</span> <span class="pl-en">getTagName</span>(): <span class="pl-k">string</span></span>
26+
<span class="pl-s1"> {</span>
27+
<span class="pl-s1"> <span class="pl-k">return</span> <span class="pl-s"><span class="pl-pds">'</span>my-tag<span class="pl-pds">'</span></span>;</span>
28+
<span class="pl-s1"> }</span>
29+
<span class="pl-s1">}</span></pre></div>
30+
<p>A simple visitor for a <code>my-tag</code> custom tag could look like this:</p>
31+
<div class="highlight highlight-text-html-php"><pre><span class="pl-s1"><span class="pl-k">namespace</span> <span class="pl-en">App\Sanitizer</span>;</span>
32+
<span class="pl-s1"></span>
33+
<span class="pl-s1"><span class="pl-k">use</span> <span class="pl-c1">HtmlSanitizer\Model\Cursor</span>;</span>
34+
<span class="pl-s1"><span class="pl-k">use</span> <span class="pl-c1">HtmlSanitizer\Node\NodeInterface</span>;</span>
35+
<span class="pl-s1"><span class="pl-k">use</span> <span class="pl-c1">HtmlSanitizer\Visitor\AbstractNodeVisitor</span>;</span>
36+
<span class="pl-s1"><span class="pl-k">use</span> <span class="pl-c1">HtmlSanitizer\Visitor\HasChildrenNodeVisitorTrait</span>;</span>
37+
<span class="pl-s1"></span>
38+
<span class="pl-s1"><span class="pl-k">class</span> <span class="pl-en">MyTagNodeVisitor</span> <span class="pl-k">extends</span> <span class="pl-e">AbstractNodeVisitor</span></span>
39+
<span class="pl-s1">{</span>
40+
<span class="pl-s1"> <span class="pl-k">use</span> <span class="pl-c1">HasChildrenNodeVisitorTrait</span>; <span class="pl-c"><span class="pl-c">//</span> Or IsChildlessTagVisitorTrait</span></span>
41+
<span class="pl-s1"></span>
42+
<span class="pl-s1"> <span class="pl-k">protected</span> <span class="pl-k">function</span> <span class="pl-en">getDomNodeName</span>(): <span class="pl-k">string</span></span>
43+
<span class="pl-s1"> {</span>
44+
<span class="pl-s1"> <span class="pl-k">return</span> <span class="pl-s"><span class="pl-pds">'</span>my-tag<span class="pl-pds">'</span></span>;</span>
45+
<span class="pl-s1"> }</span>
46+
<span class="pl-s1"></span>
47+
<span class="pl-s1"> <span class="pl-k">public</span> <span class="pl-k">function</span> <span class="pl-en">getDefaultAllowedAttributes</span>(): <span class="pl-k">array</span></span>
48+
<span class="pl-s1"> {</span>
49+
<span class="pl-s1"> <span class="pl-k">return</span> [</span>
50+
<span class="pl-s1"> <span class="pl-s"><span class="pl-pds">'</span>class<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>width<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>height<span class="pl-pds">'</span></span></span>
51+
<span class="pl-s1"> ];</span>
52+
<span class="pl-s1"> }</span>
53+
<span class="pl-s1"></span>
54+
<span class="pl-s1"> <span class="pl-k">public</span> <span class="pl-k">function</span> <span class="pl-en">getDefaultConfiguration</span>(): <span class="pl-k">array</span></span>
55+
<span class="pl-s1"> {</span>
56+
<span class="pl-s1"> <span class="pl-k">return</span> [</span>
57+
<span class="pl-s1"> <span class="pl-s"><span class="pl-pds">'</span>custom_config<span class="pl-pds">'</span></span> <span class="pl-k">=&gt;</span> <span class="pl-c1">null</span>,</span>
58+
<span class="pl-s1"> ];</span>
59+
<span class="pl-s1"> }</span>
60+
<span class="pl-s1"></span>
61+
<span class="pl-s1"> <span class="pl-k">protected</span> <span class="pl-k">function</span> <span class="pl-en">createNode</span>(<span class="pl-c1">\DOMNode</span> <span class="pl-smi">$domNode</span>, <span class="pl-c1">Cursor</span> <span class="pl-smi">$cursor</span>): <span class="pl-c1">NodeInterface</span></span>
62+
<span class="pl-s1"> {</span>
63+
<span class="pl-s1"> <span class="pl-c"><span class="pl-c">//</span> You need to pass the current node as your node parent</span></span>
64+
<span class="pl-s1"> <span class="pl-smi">$node</span> <span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-c1">MyTagNode</span>(<span class="pl-smi">$cursor</span><span class="pl-k">-&gt;</span><span class="pl-smi">node</span>);</span>
65+
<span class="pl-s1"> </span>
66+
<span class="pl-s1"> <span class="pl-c"><span class="pl-c">//</span> You can use $this-&gt;config['custom_config'] to access the user-defined configuration</span></span>
67+
<span class="pl-s1"></span>
68+
<span class="pl-s1"> <span class="pl-k">return</span> <span class="pl-smi">$node</span>;</span>
69+
<span class="pl-s1"> }</span>
70+
<span class="pl-s1">}</span></pre></div>
71+
<h2><a id="user-content-registering-the-node-visitor-with-an-extension" class="anchor" aria-hidden="true" href="#registering-the-node-visitor-with-an-extension"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Registering the node visitor with an extension</h2>
72+
<p>Once you have created a node and a node visitor, you need to use an extension to register the visitor in the
73+
sanitizer.</p>
74+
<p>An extension is a class implementing the <code>HtmlSanitizer\Extension\ExtensionInterface</code> interface, which requires
75+
two methods:</p>
76+
<ul>
77+
<li><code>getName()</code>, which should return the name to use in the configuration (<code>basic</code>, <code>list</code>, etc.);</li>
78+
<li>and <code>createNodeVisitors()</code>, which should return a list of node visitors, each keyed by the tag it visits.</li>
79+
</ul>
80+
<p>For our node visitor, this could look like this:</p>
81+
<div class="highlight highlight-text-html-php"><pre><span class="pl-s1"><span class="pl-k">namespace</span> <span class="pl-en">App\Sanitizer</span>;</span>
82+
<span class="pl-s1"></span>
83+
<span class="pl-s1"><span class="pl-k">use</span> <span class="pl-c1">HtmlSanitizer\Extension\ExtensionInterface</span>;</span>
84+
<span class="pl-s1"></span>
85+
<span class="pl-s1"><span class="pl-k">class</span> <span class="pl-en">MyTagExtension</span> <span class="pl-k">implements</span> <span class="pl-e">ExtensionInterface</span></span>
86+
<span class="pl-s1">{</span>
87+
<span class="pl-s1"> <span class="pl-k">public</span> <span class="pl-k">function</span> <span class="pl-en">getName</span>(): <span class="pl-k">string</span></span>
88+
<span class="pl-s1"> {</span>
89+
<span class="pl-s1"> <span class="pl-k">return</span> <span class="pl-s"><span class="pl-pds">'</span>my-tag<span class="pl-pds">'</span></span>;</span>
90+
<span class="pl-s1"> }</span>
91+
<span class="pl-s1"></span>
92+
<span class="pl-s1"> <span class="pl-k">public</span> <span class="pl-k">function</span> <span class="pl-en">createNodeVisitors</span>(<span class="pl-k">array</span> <span class="pl-smi">$config</span> <span class="pl-k">=</span> []): <span class="pl-k">array</span></span>
93+
<span class="pl-s1"> {</span>
94+
<span class="pl-s1"> <span class="pl-k">return</span> [</span>
95+
<span class="pl-s1"> <span class="pl-s"><span class="pl-pds">'</span>my-tag<span class="pl-pds">'</span></span> <span class="pl-k">=&gt;</span> <span class="pl-k">new</span> <span class="pl-c1">MyTagNodeVisitor</span>(<span class="pl-smi">$config</span>[<span class="pl-s"><span class="pl-pds">'</span>tags<span class="pl-pds">'</span></span>][<span class="pl-s"><span class="pl-pds">'</span>my-tag<span class="pl-pds">'</span></span>] ?? []),</span>
96+
<span class="pl-s1"> </span>
97+
<span class="pl-s1"> <span class="pl-c"><span class="pl-c">//</span> You can also override previous extensions tags here, for instance:</span></span>
98+
<span class="pl-s1"> <span class="pl-c"><span class="pl-c">//</span> 'img' =&gt; new MyCustomImgVisitor(),</span></span>
99+
<span class="pl-s1"> ];</span>
100+
<span class="pl-s1"> }</span>
101+
<span class="pl-s1">}</span></pre></div>
102+
<p>Then, you can use the builder to create a Sanitizer that includes this extension:</p>
103+
<div class="highlight highlight-text-html-php"><pre><span class="pl-s1"><span class="pl-smi">$builder</span> <span class="pl-k">=</span> <span class="pl-k">new</span> <span class="pl-c1">HtmlSanitizer\</span><span class="pl-c1">SanitizerBuilder</span>();</span>
104+
<span class="pl-s1"><span class="pl-smi">$builder</span><span class="pl-k">-&gt;</span>registerExtension(<span class="pl-k">new</span> <span class="pl-c1">HtmlSanitizer\Extension\</span><span class="pl-c1">BasicExtension</span>());</span>
105+
<span class="pl-s1"><span class="pl-smi">$builder</span><span class="pl-k">-&gt;</span>registerExtension(<span class="pl-k">new</span> <span class="pl-c1">HtmlSanitizer\Extension\</span><span class="pl-c1">ListExtension</span>());</span>
106+
<span class="pl-s1"><span class="pl-c"><span class="pl-c">//</span> Add the other core ones you need</span></span>
107+
<span class="pl-s1"></span>
108+
<span class="pl-s1"><span class="pl-smi">$builder</span><span class="pl-k">-&gt;</span>registerExtension(<span class="pl-k">new</span> <span class="pl-c1">App\Sanitizer\</span><span class="pl-c1">MyTagExtension</span>());</span>
109+
<span class="pl-s1"></span>
110+
<span class="pl-s1"><span class="pl-smi">$sanitizer</span> <span class="pl-k">=</span> <span class="pl-smi">$builder</span><span class="pl-k">-&gt;</span>build([</span>
111+
<span class="pl-s1"> <span class="pl-s"><span class="pl-pds">'</span>extensions<span class="pl-pds">'</span></span> <span class="pl-k">=&gt;</span> [<span class="pl-s"><span class="pl-pds">'</span>basic<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>list<span class="pl-pds">'</span></span>, <span class="pl-s"><span class="pl-pds">'</span>my-tag<span class="pl-pds">'</span></span>],</span>
112+
<span class="pl-s1">]);</span></pre></div>

benchmark/run.php

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
+<?php
+
+require __DIR__.'/../vendor/autoload.php';
+
+$sanitizer = \HtmlSanitizer\Sanitizer::create(['extensions' => ['basic', 'code', 'image', 'list', 'table', 'extra']]);
+
+$input = file_get_contents(__DIR__.'/fixture.html');
+$times = 100;
+$time = microtime(true);
+
+echo "Running...\n";
+
+for ($i = 0; $i < $times; $i++) {
+    $output = $sanitizer->sanitize($input);
+}
+
+$total = (microtime(true) - $time) * 1000;
+
+echo 'Total for '.$times.' loops: '.round($total, 2)."ms\n";
+echo 'Time per loop: '.round($total / $times, 2)."ms\n";
+echo "\n";
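
The benchmark script measures wall-clock time with `microtime(true)` and includes the first iteration in the total. A possible variant, sketched under the assumption of PHP 7.3 or later, uses the monotonic `hrtime()` clock and runs one untimed warm-up call. `benchmark()` is a hypothetical helper written for illustration and is not part of this commit; the workload below is a stand-in, not the sanitizer.

```php
<?php

// Hypothetical helper: runs $fn $times times and returns timings in milliseconds.
function benchmark(callable $fn, int $times): array
{
    $fn(); // warm-up call, not timed

    $start = hrtime(true); // monotonic clock, nanoseconds as an integer
    for ($i = 0; $i < $times; $i++) {
        $fn();
    }
    $totalMs = (hrtime(true) - $start) / 1e6;

    return ['total_ms' => $totalMs, 'per_loop_ms' => $totalMs / $times];
}

// Stand-in workload; in run.php this would be $sanitizer->sanitize($input).
$result = benchmark(function () {
    strrev(str_repeat('x', 1000));
}, 100);

echo 'Total for 100 loops: '.round($result['total_ms'], 2)."ms\n";
echo 'Time per loop: '.round($result['per_loop_ms'], 2)."ms\n";
```

A monotonic clock is not affected by system time adjustments, which can matter when the measured differences are small.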

src/DomVisitor.php

Lines changed: 7 additions & 1 deletion
@@ -13,6 +13,7 @@
 
 use HtmlSanitizer\Model\Cursor;
 use HtmlSanitizer\Node\DocumentNode;
+use HtmlSanitizer\Node\TextNode;
 use HtmlSanitizer\Visitor\NodeVisitorInterface;
 
 /**
@@ -58,7 +59,12 @@ private function visitNode(\DOMNode $node, Cursor $cursor)
         }
 
         foreach ($node->childNodes ?? [] as $k => $child) {
-            $this->visitNode($child, $cursor);
+            if ('#text' === $child->nodeName) {
+                $cursor->node->addChild(new TextNode($cursor->node, $child->nodeValue));
+            } elseif (!$child instanceof \DOMText) {
+                // Ignore CDATA sections
+                $this->visitNode($child, $cursor);
+            }
         }
 
         foreach ($this->reversedVisitors as $visitor) {
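
The new branch in `visitNode()` relies on how PHP's DOM extension names its nodes: plain text nodes report `#text` as their `nodeName`, while CDATA sections are also `\DOMText` instances (through `\DOMCdataSection`) but report `#cdata-section`. The standalone sketch below reproduces that three-way classification outside the library. `classifyChild()` is a hypothetical helper written for illustration; it is not part of HtmlSanitizer.

```php
<?php

// Hypothetical helper mirroring the three-way branch added to visitNode().
function classifyChild(\DOMNode $child): string
{
    if ('#text' === $child->nodeName) {
        return 'text';    // inlined path: the commit creates a TextNode here
    }
    if ($child instanceof \DOMText) {
        return 'ignored'; // CDATA sections extend \DOMText and are skipped
    }

    return 'visit';       // everything else is passed to visitNode()
}

$doc = new \DOMDocument();
$root = $doc->appendChild($doc->createElement('div'));
$root->appendChild($doc->createTextNode('hello'));
$root->appendChild($doc->createCDATASection('raw <b>data</b>'));
$root->appendChild($doc->createElement('p'));

$kinds = [];
foreach ($root->childNodes as $child) {
    $kinds[] = classifyChild($child);
}

echo implode(', ', $kinds), "\n"; // prints: text, ignored, visit
```

Checking `nodeName` first keeps the common case (plain text) to a single string comparison, and the `instanceof \DOMText` check then filters out the remaining text-like nodes before recursing.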
