-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: add escape parameter to to_html() #2919
Conversation
Good addition, but I think there are actually 3 cases to be handled, and Escaping <> by default is important to prevent XSS. your PR Running the following example in qtconsole/ipnb (or opening the HTML in a browser,I guess) import pandas as pd
class AsHtml(object):
def _repr_html_(self):
return pd.DataFrame(["<b>str<ing1 & < &a</b>"]).to_html()
AsHtml() currently produces
and with your PR it produces:
It seems to me that when a user has HTML in his frame and is using So there are 3 cases:
I think 1 should stay the default, and 2/3 should become possible by |
Thanks for the feedback. It's interesting because I've come to embrace black/white escaping rules (i.e., text is markup or not) based on experience with web frameworks. Mixing In web frameworks, the general practice is to explicitly identify markup and treat all other text as suspect. Below are two examples of how you'd add markup to non-markup text with
Furthermore, one must explicitly identify strings as Markup to prevent double-escaping.
I'd be interested to hear your feedback on the following patterns and whether there is a better API for pandas users to safely markup DataFrame content.
|
That's nice, but I'm not sure that addresses the problem I mentioned. Let me put it this way: suppose a user of pandas is happy with the way If you're aware of security issues that make merely <> escaping ineffective, Also note that GH displays '&' when you write '&' in a comment, and GH |
I'm going 180 on this. In a nutshell, you can't tell what markup context the output might somehow end up embedded in It seems an unlikely attack vector, but common experience shows that's bad reasoning. |
Merge status? |
it's the right thing to do but i'm afraid it'd really inconvenience some users. |
For strict compatibility with 0.10.1, either a third escape mode can be added (e.g., 'compat' in addition to True/False), in which & is not escaped, or & escaping can be removed from this PR. The main goal was to add an escape parm to give users the ability to restore 0.10.0 behavior. |
I'm comfortable with you merging as long as you put something in the What's New so we can point to it |
Rebased to master and added a few words to RELEASE.rst and v0.11.0.txt. Let me know if I only need to update one or the other. |
merged, thanks! |
Failing in master on p3.3
|
Shucks, I can look into it later today. |
FYI These are failing in 3.2
|
Ok, |
cherry picked in master. Try and get travis-ci installed, it'll do the py3 testing for you |
Treating DataFrame content as plain text, rather than HTML markup, by escaping
everything (#2617) seems like the right default for
to_html()
, however, if a DataFrame contains HTML (example) or text already HTML escaped, it results in either unwanted escaping or double-escaping.Changes in this PR:
to_html()
parameter namedescape
(default True), allowing users to restore oldto_html()
behavior (<=0.10.0) by settingescape=False
.&
to the list of HTML chars escaped, so strings that happen to containHTML escape sequences or reserved entities, such as
"<"
, are displayedproperly.