Currently, the partition_html function returns JavaScript code for some HTML documents. The goal of this issue is to update our partitioning logic so that this JavaScript code does not come through for the example document.
For the document in question, it looks like the offending JavaScript is actually coming back inside a <td> tag. is_possible_narrative_text is also flagging this block as narrative text, which isn't right. I think we are actually already skipping the script tags.
'(function(d){\n var js, id = \'facebook-jssdk\'; if (d.getElementById(id)) {return;}\n js = d.createElement(\'script\'); js.id = id; js.async = true;\n js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";\n d.getElementsByTagName(\'head\')[0].appendChild(js);\n}(document));'
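Skipping <script> tags alone cannot catch this case, because the code arrives as ordinary text inside a <td>. The sketch below illustrates the difference using only Python's standard-library html.parser, not unstructured's own parsing logic. The TextCollector class, the SKIP_TAGS set, and the sample HTML are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser

# Tags whose contents are never treated as document text (an assumption for this sketch).
SKIP_TAGS = {"script", "style"}


class TextCollector(HTMLParser):
    """Hypothetical collector: gathers text nodes, ignoring anything inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.depth_in_skipped = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.depth_in_skipped += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.depth_in_skipped > 0:
            self.depth_in_skipped -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are not inside a skipped tag.
        text = data.strip()
        if text and self.depth_in_skipped == 0:
            self.chunks.append(text)


html = (
    "<script>var hidden = 1;</script>"
    "<table><tr><td>(function(d){ var js = d.createElement('script'); }(document));</td></tr></table>"
)
collector = TextCollector()
collector.feed(html)
collector.close()
print(collector.chunks)
```

The contents of the real <script> element are dropped, but the JavaScript inside the <td> is still collected as text, which matches the behavior described above.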
Updating the issue description to reflect the last comment
MthwRobinson changed the title from "partition_html is returning javascript code from <script> tags" to "partition_html is returning javascript code from some HTML documents" on Feb 15, 2023.
Steps to reproduce
You should see the following JavaScript code in elements[1].text:
'(function(d){\n var js, id = \'facebook-jssdk\'; if (d.getElementById(id)) {return;}\n js = d.createElement(\'script\'); js.id = id; js.async = true;\n js.src = "//connect.facebook.net/en_US/all.js#xfbml=1";\n d.getElementsByTagName(\'head\')[0].appendChild(js);\n}(document));'
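One possible direction, sketched under assumptions: reject text that looks like code before treating it as narrative. The looks_like_javascript function, its regular expressions, and the min_hits threshold are all hypothetical and are not part of unstructured; a real fix may take a different approach (for example, changing is_possible_narrative_text itself).

```python
import re

# Hypothetical heuristic, not part of unstructured: patterns that rarely appear in prose.
_JS_PATTERNS = [
    re.compile(r"\bfunction\s*\("),            # function expressions such as function(d)
    re.compile(r"\b(?:var|let|const)\s+\w+"),  # variable declarations
    re.compile(r"\bdocument\b"),               # references to the DOM document object
    re.compile(r"\.\w+\s*\("),                 # method calls such as d.createElement(
    re.compile(r"[{};]\s*$", re.MULTILINE),    # lines ending in braces or semicolons
]


def looks_like_javascript(text: str, min_hits: int = 3) -> bool:
    """Return True if `text` matches at least `min_hits` distinct patterns above."""
    hits = sum(1 for pattern in _JS_PATTERNS if pattern.search(text))
    return hits >= min_hits
```

Applied to partition_html output, a filter along the lines of `[el for el in elements if not looks_like_javascript(el.text)]` would drop elements whose text resembles the snippet above. A threshold of several independent signals is used so that a single word such as "document" in ordinary prose does not trigger it.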