Simplify loop for unwrapping noscript
RadhiFadlillah authored and gijsk committed Apr 3, 2020
1 parent adc6acc commit 6fed286
Showing 3 changed files with 44 additions and 42 deletions.
57 changes: 27 additions & 30 deletions Readability.js
@@ -1316,6 +1316,19 @@ Readability.prototype = {
     return metadata;
   },

+  /**
+   * Check if node is image, or if node contains exactly only one image
+   * whether as a direct child or as its descendants.
+   *
+   * @param Element
+   **/
+  _isSingleImage: function(node) {
+    if (node.tagName === "IMG") return true;
+    if (node.children.length !== 1) return false;
+    if (node.textContent.trim() !== "") return false;
+    return this._isSingleImage(node.children[0]);
+  },
+
   /**
    * Find all <noscript> that are located after <img> nodes, and which contain only one
    * <img> element. Replace the first image with the image from inside the <noscript> tag,
@@ -1325,50 +1338,34 @@ Readability.prototype = {
    * @param Element
    **/
   _unwrapNoscriptImages: function(doc) {
-    // First, find div which only contains single img element, then put it out.
-    var divs = doc.getElementsByTagName("div");
-    this._forEachNode(divs, function(div) {
-      if (div.children.length == 1 && div.children[0].tagName === "IMG") {
-        div.parentNode.replaceChild(div.children[0], div);
-      }
-    });
-
-    // Next find img without source, and remove it. This is done to
-    // prevent a placeholder img is replaced by img from noscript in next step.
+    // Find img without source and remove it. This is done to prevent a placeholder
+    // img is replaced by img from noscript in next step.
     var imgs = doc.getElementsByTagName("img");
     this._forEachNode(imgs, function(img) {
-      var src = img.getAttribute("src") || "",
-          srcset = img.getAttribute("srcset") || "",
-          dataSrc = img.getAttribute("data-src") || "",
-          dataSrcset = img.getAttribute("data-srcset") || "";
+      var src = img.getAttribute("src");
+      var srcset = img.getAttribute("srcset");
+      var dataSrc = img.getAttribute("data-src");
+      var dataSrcset = img.getAttribute("data-srcset");

-      if (src === "" && srcset === "" && dataSrc === "" && dataSrcset === "") {
+      if (!src && !srcset && !dataSrc && !dataSrcset) {
         img.parentNode.removeChild(img);
       }
     });

     // Next find noscript and try to extract its image
     var noscripts = doc.getElementsByTagName("noscript");
     this._forEachNode(noscripts, function(noscript) {
-      // Make sure prev sibling exists and it's image
-      var prevElement = noscript.previousElementSibling;
-      if (prevElement == null || prevElement.tagName !== "IMG") {
-        return;
-      }
-
-      // In spec-compliant browser, content of noscript is treated as
-      // string so here we parse it.
+      // Parse content of noscript and make sure it only contains image
       var tmp = doc.createElement("div");
       tmp.innerHTML = noscript.innerHTML;
+      if (!this._isSingleImage(tmp)) return;

-      // Make sure noscript only has one child, and it's <img> element
-      var children = tmp.children;
-      if (children.length != 1 || children[0].tagName !== "IMG") {
-        return;
+      // If noscript has previous sibling and it only contains image,
+      // replace it with noscript content.
+      var prevElement = noscript.previousElementSibling;
+      if (prevElement && this._isSingleImage(prevElement)) {
+        noscript.parentNode.replaceChild(tmp.children[0], prevElement);
       }
-
-      // At this point, just replace the previous img with img from noscript.
-      noscript.parentNode.replaceChild(children[0], prevElement);
     });
   },
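
A minimal sketch of how the new `_isSingleImage` helper behaves, written as a standalone function over plain-object stand-ins for DOM nodes so it runs without a browser. The `el` factory and the `oldCheck` function are hypothetical names used only for illustration; `oldCheck` mirrors the tag test the removed code applied to the previous sibling.

```javascript
// Hypothetical stand-in for a DOM element: it only carries the three
// properties that _isSingleImage reads (tagName, children, textContent).
function el(tagName, children, text) {
  children = children || [];
  var ownText = text || "";
  return {
    tagName: tagName,
    children: children,
    // Like the DOM, textContent includes the text of all descendants.
    textContent: ownText + children.map(function(c) {
      return c.textContent;
    }).join("")
  };
}

// Standalone copy of the helper added in this commit.
function isSingleImage(node) {
  if (node.tagName === "IMG") return true;
  if (node.children.length !== 1) return false;
  if (node.textContent.trim() !== "") return false;
  return isSingleImage(node.children[0]);
}

// The test the removed code applied to the previous sibling.
function oldCheck(node) {
  return node != null && node.tagName === "IMG";
}

var bare = el("IMG");                                 // <img>
var nested = el("DIV", [el("P", [el("IMG")])]);       // <div><p><img></p></div>
var captioned = el("DIV", [el("IMG")], "A caption");  // <div>A caption<img></div>
var twoImages = el("DIV", [el("IMG"), el("IMG")]);    // <div><img><img></div>

console.log(isSingleImage(bare), oldCheck(bare));     // true true
console.log(isSingleImage(nested), oldCheck(nested)); // true false
console.log(isSingleImage(captioned));                // false
console.log(isSingleImage(twoImages));                // false
```

The nested case is the one the removed loop could not handle: its first pass only unwrapped a `<div>` whose direct and only child was an `<img>`, so a deeper wrapper was never matched.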

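The placeholder check in `_unwrapNoscriptImages` also changed form: the `|| ""` defaults followed by `=== ""` comparisons became plain falsy tests. The sketch below suggests the two forms agree for missing and empty attributes. It assumes `getAttribute` returns `null` for an absent attribute, as `Element.getAttribute` does in browsers; `fakeImg`, `SOURCE_ATTRS`, `hasNoSourceOld`, and `hasNoSourceNew` are hypothetical names, not part of Readability.js.

```javascript
// Hypothetical stand-in for an <img>: getAttribute returns null when the
// attribute is absent (assumed to match Element.getAttribute in browsers).
function fakeImg(attrs) {
  return {
    getAttribute: function(name) {
      return Object.prototype.hasOwnProperty.call(attrs, name) ? attrs[name] : null;
    }
  };
}

var SOURCE_ATTRS = ["src", "srcset", "data-src", "data-srcset"];

// Removed form: default each attribute to "" and compare against "".
function hasNoSourceOld(img) {
  return SOURCE_ATTRS.every(function(name) {
    return (img.getAttribute(name) || "") === "";
  });
}

// Added form: treat any falsy value (null or "") as missing.
function hasNoSourceNew(img) {
  return SOURCE_ATTRS.every(function(name) {
    return !img.getAttribute(name);
  });
}

var samples = [
  fakeImg({}),                                // no source attributes at all
  fakeImg({ src: "" }),                       // empty src only
  fakeImg({ "data-src": "a.jpg" }),           // lazy-load source present
  fakeImg({ src: "", srcset: "a.jpg 2x" })    // empty src, real srcset
];

samples.forEach(function(img) {
  console.log(hasNoSourceOld(img), hasNoSourceNew(img));
});
// true true
// true true
// false false
// false false
```
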
25 changes: 15 additions & 10 deletions test/test-pages/bug-1255978/expected.html
@@ -6,8 +6,9 @@
 <p>But even luxury hotels aren’t always cleaned as often as they should be.</p>
 <p>Here are some of the secrets that the receptionist will never tell you when you check in, according to answers posted on <a href="https://www.quora.com/What-are-the-things-we-dont-know-about-hotel-rooms" target="_blank">Quora</a>.</p>
 <div>
-<p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2014/03/18/10/bandb2.jpg" alt="bandb2.jpg" title="bandb2.jpg" width="564" height="423" />
-</p>
+<div>
+<p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2014/03/18/10/bandb2.jpg" alt="bandb2.jpg" title="bandb2.jpg" width="564" height="423" /></p>
+</div>
 <p>Even posh hotels might not wash a blanket in between stays </p>
 </div>
 <p>1. Take any blankets or duvets off the bed</p>
@@ -16,22 +17,25 @@
 <p>Video shows bed bug infestation at New York hotel</p>
 </div>
 <div>
-<p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2015/05/26/11/hotel-door-getty.jpg" alt="hotel-door-getty.jpg" title="hotel-door-getty.jpg" width="564" height="423" />
-</p>
+<div>
+<p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2015/05/26/11/hotel-door-getty.jpg" alt="hotel-door-getty.jpg" title="hotel-door-getty.jpg" width="564" height="423" /></p>
+</div>
 <p>Forrest Jones advised stuffing the peep hole with a strip of rolled up notepaper when not in use. </p>
 </div>
 <p>2. Check the peep hole has not been tampered with</p>
 <p>This is not common, but can happen, Forrest Jones said. He advised stuffing the peep hole with a strip of rolled up notepaper when not in use. When someone knocks on the door, the paper can be removed to check who is there. If no one is visible, he recommends calling the front desk immediately. “I look forward to the day when I can tell you to choose only hotels where every employee who has access to guestroom keys is subjected to a complete public records background check, prior to hire, and every year or two thereafter. But for now, I can't,” he said.</p>
 <div>
-<p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2013/07/31/15/luggage-3.jpg" alt="luggage-3.jpg" title="luggage-3.jpg" width="564" height="423" />
-</p>
+<div>
+<p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2013/07/31/15/luggage-3.jpg" alt="luggage-3.jpg" title="luggage-3.jpg" width="564" height="423" /></p>
+</div>
 <p>Put luggage on the floor </p>
 </div>
 <p>3. Don’t use a wooden luggage rack</p>
 <p>Bedbugs love wood. Even though a wooden luggage rack might look nicer and more expensive than a metal one, it’s a breeding ground for bugs. Forrest Jones says guests should put the items they plan to take from bags on other pieces of furniture and leave the bag on the floor.</p>
 <div>
-<p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2015/04/13/11/Lifestyle-hotels.jpg" alt="Lifestyle-hotels.jpg" title="Lifestyle-hotels.jpg" width="564" height="423" />
-</p>
+<div>
+<p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2015/04/13/11/Lifestyle-hotels.jpg" alt="Lifestyle-hotels.jpg" title="Lifestyle-hotels.jpg" width="564" height="423" /></p>
+</div>
 <p>The old rule of thumb is that for every 00 invested in a room, the hotel should charge in average daily rate </p>
 </div>
 <p>4. Hotel rooms are priced according to how expensive they were to build</p>
@@ -48,8 +52,9 @@ <h2><span></span>Business news in pictures</h2>
 <h3>6. Mini bars almost always lose money</h3>
 <p>Despite the snacks in the minibar seeming like the most overpriced food you have ever seen, hotel owners are still struggling to make a profit from those snacks. "Minibars almost always lose money, even when they charge $10 for a Diet Coke,” Sharon said.</p>
 <div>
-<p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2014/03/13/16/agenda7.jpg" alt="agenda7.jpg" title="agenda7.jpg" width="564" height="423" />
-</p>
+<div>
+<p><img src="https://static.independent.co.uk/s3fs-public/styles/story_medium/public/thumbnails/image/2014/03/13/16/agenda7.jpg" alt="agenda7.jpg" title="agenda7.jpg" width="564" height="423" /></p>
+</div>
 <p>Towels should always be cleaned between stays </p>
 </div>
 <p>7. Always made sure the hand towels are clean when you arrive</p>
4 changes: 2 additions & 2 deletions test/test-pages/mozilla-1/expected.html
@@ -33,8 +33,8 @@ <h3>Themes</h3>
 <br /> <a rel="external" href="https://support.mozilla.org/kb/use-themes-change-look-of-firefox">Learn more</a>
 </p>
 </div>
-<p><a href="#add-ons" role="button">Next</a>
-<img id="theme-demo" src="http://mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/theme-red.61611c5734ab.png" alt="Preview of the currently selected theme" />
+<p><a href="#add-ons" role="button">Next</a></p>
+<p><img id="theme-demo" src="http://mozorg.cdn.mozilla.net/media/img/firefox/desktop/customize/theme-red.61611c5734ab.png" alt="Preview of the currently selected theme" />
 </p>
 </div>
 </section>