Derive "safeHtml" from all "bodyHtml" values #3168
  source_url = proxy.first("sourceUrl", quiet=True)
  encoding = proxy.first("encoding", quiet=True)
- entity["safeHtml"] = sanitize_html(html, source_url, encoding=encoding)
+ entity["safeHtml"] = [
+     sanitize_html(value, source_url, encoding=encoding) for value in html
Is this a utility function that we wrote, or one that a third party created?
This is a utility we wrote wrapping the Cleaner class from lxml:
Looks good to me!
We need this fix to preserve the original order of email parts: alephdata/followthemoney#1148
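For illustration, here is a minimal sketch of the behaviour this PR is after: one sanitized value per "bodyHtml" value, in the original order. The `sanitize_html` below is a hypothetical stand-in that only strips `<script>` blocks with a regex. It is not the project's lxml-based utility and is not a real sanitizer. `derive_safe_html` is likewise an assumed name used only for this example.

```python
import re
from typing import List, Optional

# Hypothetical stand-in for the project's sanitize_html helper.
# It only drops <script> blocks so the example runs without lxml;
# do not treat it as a safe HTML sanitizer.
_SCRIPT_RE = re.compile(r"<script\b.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def sanitize_html(
    html: str, base_url: Optional[str] = None, encoding: Optional[str] = None
) -> str:
    # base_url and encoding are accepted to mirror the call in the diff,
    # but this stand-in ignores them.
    return _SCRIPT_RE.sub("", html)


def derive_safe_html(
    body_html: List[str],
    source_url: Optional[str] = None,
    encoding: Optional[str] = None,
) -> List[str]:
    # Sanitize every value instead of a single one; a list comprehension
    # keeps the output in the same order as the input.
    return [
        sanitize_html(value, source_url, encoding=encoding) for value in body_html
    ]


parts = ["<p>first</p><script>alert(1)</script>", "<p>second</p>"]
safe = derive_safe_html(parts)
```

The output order only matches the original email parts if the input values arrive in that order, which is what the linked followthemoney change is meant to guarantee.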
Please do not merge this yet. It is blocked until alephdata/followthemoney#1148 is merged and released.
Fixes #3163.