I'm trying to parse a feed and render its contents on a website. The feed sometimes contains HTML code blocks (think tutorial posts explaining how to do something in HTML, like this).
Take this example feed for instance:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<item>
<content:encoded><![CDATA[
<pre><code>&lt;div class="wrapper"&gt;Lorem ipsum dolor sit amet&lt;/div&gt;</code></pre>
]]></content:encoded>
</item>
</channel>
</rss>
Intuitively, I expected that parseFeed(xml).items[0].content would return something like:
<pre><code>&lt;div class="wrapper"&gt;Lorem ipsum dolor sit amet&lt;/div&gt;</code></pre>
However, the text for content gets unescaped (RSS, Atom), and this is returned instead:
<pre><code><div class="wrapper">Lorem ipsum dolor sit amet</div></code></pre>
While I do want the outer <pre> and <code> tags to be rendered as proper HTML tags on the final page, I want to keep the inner div verbatim, i.e. &lt;div class="wrapper"&gt;, so that it is rendered as text on the final website.
I made the changes to suit my needs in this commit, including some tests. I was unable to get most of the integration tests to pass, since the feedparser library (used to process feeds in tests) seems to unescape HTML in the same way, with no option to turn it off.
The way I did it would also be a breaking change. To avoid that, assuming you even want to support this use case, perhaps we could add an options parameter to the parseFeed function to opt out of unescaping?
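To make the proposal concrete, here is a rough, self-contained sketch of how an opt-out could behave. It does not use the library's real internals: readContent, decodeEntities, and the unescapeContent option name are all made up for illustration, and the default of true is only an assumption meant to keep the current behaviour and avoid a breaking change.

```javascript
// Hypothetical sketch only: none of these names exist in the library.
const ENTITIES = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

// Decode the five predefined XML entities in a single pass,
// so that "&amp;lt;" becomes "&lt;" rather than "<".
function decodeEntities(text) {
  return text.replace(/&(lt|gt|quot|apos|amp);/g, (match, name) => ENTITIES[name]);
}

// `unescapeContent` is an assumed option name. Defaulting to true
// mirrors the unescaping described above, so existing callers are unaffected.
function readContent(rawText, { unescapeContent = true } = {}) {
  const trimmed = rawText.trim();
  return unescapeContent ? decodeEntities(trimmed) : trimmed;
}

// The CDATA payload from the example feed.
const raw = '\n<pre><code>&lt;div class="wrapper"&gt;Lorem ipsum dolor sit amet&lt;/div&gt;</code></pre>\n';

// Default: entities are decoded, so the inner div becomes real markup.
console.log(readContent(raw));

// Opt-out: entities are preserved, so the inner div renders as text.
console.log(readContent(raw, { unescapeContent: false }));
```

With an option along these lines, the call in my use case might look like parseFeed(xml, { unescapeContent: false }), while existing callers would see no change.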