-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Strip markup from app_name if instance_name_has_markup = True #28894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Strip markup from app_name if instance_name_has_markup = True #28894
Conversation
airflow/www/app.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to also un-escape the value? (for & etc) Maybe it’s better to use an HTML parser for this. Or maybe that’s an overkill. No idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to also un-escape the value? (for & etc)
Something like https://docs.python.org/3/library/html.html#html.unescape ?
From the docs:
Convert all named and numeric character references (e.g.
>,>,>) in the string s to the corresponding Unicode characters. This function uses the rules defined by the HTML 5 standard for both valid and invalid character references, and the list of HTML 5 named character references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parsing html with regex can be tricky. Please see this answer https://stackoverflow.com/a/4869782/2610955 and also https://blog.codinghorror.com/parsing-html-the-cthulhu-way/. The question has several options and also discusses handling & . Since beautifulsoup is already a dependency maybe it can be used.
python
Python 3.10.6 (main, Aug 2 2022, 15:11:28) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bs4
>>> from bs4 import BeautifulSoup
>>> b = BeautifulSoup("<b>Bold Site Title Test</b>")
>>> b.get_text()
'Bold Site Title Test'There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I had a devel installation and beautifulsoup is a devel dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's why I wonder if this would be an overkill. It's not terribly difficult to implement text extraction with html.parser (stdlib) but whether it's worthwhile is still susceptive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it is easier than I thought!
import html.parser
def strip_tags(inp: str) -> str:
parts: list[str] = []
class TagStripParser(html.parser.HTMLParser):
def handle_data(self, d: str) -> None:
parts.append(d)
TagStripParser().feed(inp)
return "".join(parts)>>> strip_tags("<b>Bold Site <span>Title</span> Test &</b>")
'Bold Site Title Test &'There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used @uranusjr as suggestion for bc21a02. I really didn't know where to put the method. Is there a better place?
ashb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use Markup() class from markupsafe instead?
@ashb We can't. Markup only escapes html text, which is already done by Flask (see tests). The goal of my PR would be to remove HTML tags prior to being escaped by Flask, so it doesn't look ugly. An alternative would be to have something like |
If you put markup in the attribute and don't want it escaped: don't put markup in the attribute. Stripping didn't seem worth it when the user can just do it selves. |
|
Oh sorry I see. We have it on title and on the page as the h1 |
|
Exactly, we use the same attribute for both, which makes this a bit tricky |
798d17b to
51bdd9b
Compare
|
Isn't striptags() better to use here? Instead of implementing our own... |
664924b to
3f38004
Compare
a1df630 to
dc9dd28
Compare
dc9dd28 to
46177e2
Compare
|
I switched the stripping implementation to use Markupsafe. One less thing to maintain. |
appbuilder.app_name if instance_name_has_markup = TrueCo-authored-by: Tzu-ping Chung <uranusjr@gmail.com> (cherry picked from commit 971e322)
Closes #28888
if
webserver.instance_name_has_markup = True, we strip markup fromflask_app.config["APP_NAME"], preventing it to show up as a sanitized HTML in the<title>tag.