Commit 2a663ce

deanrasheed authored and dutow committed
doc: Warn that ts_headline() output is not HTML-safe.
Add a documentation warning to ts_headline() pointing out that, when working with untrusted input documents, the output is not guaranteed to be safe for direct inclusion in web pages. This is because, while it does remove some XML tags from the input, it doesn't remove all HTML markup, and so the result may be unsafe (e.g., it might permit XSS attacks). To guard against that, all HTML markup should be removed from the input, making it plain text, or the output should be passed through an HTML sanitizer.

In addition, document precisely what the default text search parser recognises as valid XML tags, since that's what determines which XML tags ts_headline() will remove.

Reported-by: Richard Neill <richard.neill@telos.digital>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Backpatch-through: 13
1 parent 8104115 commit 2a663ce
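
To make the commit message's advice about treating the output concrete, here is a minimal Python sketch. It is not the approach prescribed by the commit, which only says to strip markup from the input or to run the output through an HTML sanitizer. Instead it escapes the entire headline and then restores only the highlight tags. The function names `make_markers` and `render_headline` and the marker format are hypothetical. The sketch assumes the application passes the two markers to ts_headline as its `StartSel` and `StopSel` options in place of the default `<b>` and `</b>`.

```python
import html
import secrets


def make_markers():
    """Return a (start, stop) pair of hard-to-guess alphanumeric markers.

    Hypothetical helper: the markers are meant to be passed to ts_headline
    as the StartSel and StopSel options. They contain only ASCII letters
    and hex digits, so html.escape() leaves them unchanged.
    """
    token = secrets.token_hex(8)
    return f"HLS{token}", f"HLE{token}"


def render_headline(raw_headline, start_marker, stop_marker):
    """Escape the whole headline, then turn the markers into <b>...</b>.

    Any markup that survived in the untrusted document is rendered as
    literal text; only the marker pair becomes real HTML.
    """
    escaped = html.escape(raw_headline, quote=True)
    return escaped.replace(start_marker, "<b>").replace(stop_marker, "</b>")


if __name__ == "__main__":
    start, stop = make_markers()
    # Made-up string standing in for ts_headline output; no database is used.
    raw = f"<script >alert(1)</script > the {start}quick{stop} fox"
    print(render_headline(raw, start, stop))
```

Because the markers are generated per request, an untrusted document is unlikely to contain them, and even if it did, the only HTML it could introduce is a `<b>` or `</b>` tag. Applications that need to keep other markup from the document would still need a real HTML sanitizer, as the commit message says.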

1 file changed: +28 -1 lines changed

doc/src/sgml/textsearch.sgml

Lines changed: 28 additions & 1 deletion
@@ -1342,7 +1342,7 @@ ts_headline(<optional> <replaceable class="parameter">config</replaceable> <type
 document, to distinguish them from other excerpted words. The
 default values are <quote><literal>&lt;b&gt;</literal></quote> and
 <quote><literal>&lt;/b&gt;</literal></quote>, which can be suitable
-for HTML output.
+for HTML output (but see the warning below).
 </para>
 </listitem>
 <listitem>
@@ -1354,6 +1354,21 @@ ts_headline(<optional> <replaceable class="parameter">config</replaceable> <type
 </listitem>
 </itemizedlist>
 
+<warning>
+<title>Warning: Cross-site scripting (XSS) safety</title>
+<para>
+The output from <function>ts_headline</function> is not guaranteed to
+be safe for direct inclusion in web pages. When
+<literal>HighlightAll</literal> is <literal>false</literal> (the
+default), some simple XML tags are removed from the document, but this
+is not guaranteed to remove all HTML markup. Therefore, this does not
+provide an effective defense against attacks such as cross-site
+scripting (XSS) attacks, when working with untrusted input. To guard
+against such attacks, all HTML markup should be removed from the input
+document, or an HTML sanitizer should be used on the output.
+</para>
+</warning>
+
 These option names are recognized case-insensitively.
 You must double-quote string values if they contain spaces or commas.
 </para>
@@ -2225,6 +2240,18 @@ LIMIT 10;
 Specifically, the only non-alphanumeric characters supported for
 email user names are period, dash, and underscore.
 </para>
+
+<para>
+<literal>tag</literal> does not support all valid tag names as defined by
+<ulink url="https://www.w3.org/TR/xml/">W3C Recommendation, XML</ulink>.
+Specifically, the only tag names supported are those starting with an
+ASCII letter, underscore, or colon, and containing only letters, digits,
+hyphens, underscores, periods, and colons. <literal>tag</literal> also
+includes XML comments starting with <literal>&lt;!--</literal> and ending
+with <literal>--&gt;</literal>, and XML declarations (but note that this
+includes anything starting with <literal>&lt;?x</literal> and ending with
+<literal>&gt;</literal>).
+</para>
 </note>
 
 <para>
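
The tag-name rule added in the last hunk can be paraphrased as a regular expression. The Python sketch below is an illustration of the documented wording only, not the text search parser's actual implementation, and it does not model attributes or the rest of a tag. The helper names are hypothetical. It assumes that "letters" and "digits" mean ASCII letters and digits throughout, which the added paragraph states explicitly only for the first character.

```python
import re

# First character: ASCII letter, underscore, or colon.
# Remaining characters: letters, digits, hyphens, underscores, periods, colons.
# Assumption: "letters" and "digits" are read as ASCII-only here.
TAG_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:\-]*")


def is_supported_tag_name(name):
    """Return True if name fits the documented tag-name rule."""
    return TAG_NAME.fullmatch(name) is not None


def looks_like_comment(text):
    """Documented comment shape: starts with '<!--' and ends with '-->'."""
    return len(text) >= 7 and text.startswith("<!--") and text.endswith("-->")


def looks_like_declaration(text):
    """Documented declaration shape: starts with '<?x' and ends with '>'."""
    return text.startswith("<?x") and text.endswith(">")


if __name__ == "__main__":
    for candidate in ["b", "svg:rect", "_x-1.y", "1abc", "-a"]:
        print(candidate, is_supported_tag_name(candidate))
```

The declaration check shows why the paragraph adds its caveat: under this reading, a string such as `<?xyz>` has the same shape as `<?xml version="1.0"?>`.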
