Strip markup from app_name if instance_name_has_markup = True #28894

IAL32 · 2023-01-12T16:37:46Z

If webserver.instance_name_has_markup = True, we strip markup from flask_app.config["APP_NAME"], preventing it from showing up as sanitized HTML in the <title> tag.

uranusjr · 2023-01-13T07:25:10Z

airflow/www/app.py

Do we want to also un-escape the value? (for & etc) Maybe it’s better to use an HTML parser for this. Or maybe that’s overkill. No idea.

Do we want to also un-escape the value? (for & etc)

Something like https://docs.python.org/3/library/html.html#html.unescape ?

From the docs:

Convert all named and numeric character references (e.g. &gt;, &#62;, &#x3e;) in the string s to the corresponding Unicode characters. This function uses the rules defined by the HTML 5 standard for both valid and invalid character references, and the list of HTML 5 named character references.
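For illustration, a minimal sketch of what html.unescape does with the three reference forms quoted above. The sample strings are made up for this example and do not come from the PR.

```python
import html

# Named, decimal and hexadecimal references for ">", plus an escaped ampersand.
samples = ["&gt;", "&#62;", "&#x3e;", "Tom &amp; Jerry"]

# html.unescape turns each character reference into the character it names.
decoded = [html.unescape(s) for s in samples]

for raw, text in zip(samples, decoded):
    print(repr(raw), "->", repr(text))
```

Note that this only decodes character references; it does not remove tags, so it would be one half of the job discussed here.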

Parsing HTML with regex can be tricky. Please see this answer https://stackoverflow.com/a/4869782/2610955 and also https://blog.codinghorror.com/parsing-html-the-cthulhu-way/. The question has several options and also discusses handling &. Since BeautifulSoup is already a dependency, maybe it can be used.

```python
Python 3.10.6 (main, Aug 2 2022, 15:11:28) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bs4
>>> from bs4 import BeautifulSoup
>>> b = BeautifulSoup("<b>Bold Site Title Test</b>")
>>> b.get_text()
'Bold Site Title Test'
```

Sorry, I had a devel installation and beautifulsoup is a devel dependency.

Yeah, that's why I wonder if this would be overkill. It's not terribly difficult to implement text extraction with html.parser (stdlib), but whether it's worthwhile is still questionable.

Actually it is easier than I thought!

```python
import html.parser


def strip_tags(inp: str) -> str:
    parts: list[str] = []

    class TagStripParser(html.parser.HTMLParser):
        def handle_data(self, d: str) -> None:
            parts.append(d)

    TagStripParser().feed(inp)
    return "".join(parts)
```

```python
>>> strip_tags("<b>Bold Site Title Test</b> &amp;")
'Bold Site Title Test &'
```
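A possible variant of the same idea, sketched under a few assumptions: the names TextExtractor and strip_tags_flushed are hypothetical, and this is not the code that ended up in the PR. It keeps the collected text on the parser instance and calls close() after feed(), since HTMLParser may hold trailing text in its buffer (for example after an unterminated "&") until it is told the input has ended.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags (hypothetical helper name)."""

    def __init__(self) -> None:
        # convert_charrefs=True (the default) decodes references such as &amp;
        # before the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.parts: list = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_tags_flushed(inp: str) -> str:
    parser = TextExtractor()
    parser.feed(inp)
    parser.close()  # force processing of any text still buffered by the parser
    return "".join(parser.parts)


print(strip_tags_flushed("<b>Bold Site Title Test</b> &amp; more"))
```

Because convert_charrefs is on, this both drops the tags and un-escapes references in one pass, which covers the question raised at the top of the thread.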

Used @uranusjr's suggestion in bc21a02. I really didn't know where to put the method. Is there a better place?

ashb

Can we use Markup() class from markupsafe instead?

IAL32 · 2023-01-14T09:46:33Z

Can we use Markup() class from markupsafe instead?

@ashb We can't. Markup only escapes HTML text, which Flask already does (see tests).

The goal of my PR would be to remove HTML tags prior to being escaped by Flask, so it doesn't look ugly.

An alternative would be to have something like instance_title which does not contain any markup, but I imagine that's a whole other config entry that we might not want to have.
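To make the "ugly" part concrete, here is a small sketch of what escaping does to a name that contains markup. It uses the stdlib html.escape as a stand-in for the template auto-escaping, which is an assumption made for this illustration, and the instance name is an example value.

```python
import html

# Example value for webserver.instance_name with markup enabled (illustrative).
instance_name = "<b>Bold Site Title Test</b>"

# Escaping keeps the tags as literal text, so they show up in the <title>.
escaped_title = html.escape(instance_name)
print(escaped_title)
```

Stripping the tags before this step would leave only the plain text to be escaped.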

ashb · 2023-01-14T10:06:46Z

The goal of my PR would be to remove HTML tags prior to being escaped by Flask, so it doesn't look ugly.

If you put markup in the attribute and don't want it escaped: don't put markup in the attribute.

Stripping doesn't seem worth it when users can just do it themselves.

ashb · 2023-01-14T10:07:28Z

Oh sorry, I see. We have it in the title and on the page as the h1.

IAL32 · 2023-01-14T10:09:27Z

Exactly, we use the same attribute for both, which makes this a bit tricky

airflow/www/app.py

BasPH · 2023-02-13T11:47:53Z

Isn't striptags() better to use here? Instead of implementing our own...


uranusjr · 2023-03-16T08:59:59Z

I switched the stripping implementation to use Markupsafe. One less thing to maintain.
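A rough sketch of what stripping with Markupsafe can look like. This is not the merged diff: the function name title_from_instance_name and its parameters are hypothetical, and only Markup.striptags() is the library call being illustrated.

```python
from markupsafe import Markup


def title_from_instance_name(instance_name: str, has_markup: bool) -> str:
    """Return a plain-text title (hypothetical helper, not the Airflow code)."""
    if has_markup:
        # striptags() removes tags, un-escapes references and collapses whitespace.
        return Markup(instance_name).striptags()
    return instance_name


print(title_from_instance_name("<b>Bold Site Title Test</b> &amp;", True))
```

With has_markup set to False the value is returned untouched, so names that merely contain angle brackets are not altered.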


IAL32 requested review from ashb, bbovenzi and ryanahamilton as code owners January 12, 2023 16:37

boring-cyborg bot added the area:webserver Webserver related Issues label Jan 12, 2023

eladkal added this to the Airflow 2.5.2 milestone Jan 13, 2023

uranusjr reviewed Jan 13, 2023


ashb reviewed Jan 14, 2023


uranusjr reviewed Jan 19, 2023


potiuk force-pushed the ac/28888-instance-name-title-markup branch from 798d17b to 51bdd9b on January 20, 2023 22:28

uranusjr approved these changes Jan 31, 2023


Taragolis force-pushed the ac/28888-instance-name-title-markup branch from 664924b to 3f38004 on February 18, 2023 17:46

pierrejeambrun modified the milestones: Airflow 2.5.2, Airflow 2.5.3 Feb 28, 2023

pierrejeambrun added the type:bug-fix Changelog: Bug Fixes label Mar 1, 2023

ephraimbuddy modified the milestones: Airflow 2.5.2, Airflow 2.5.3 Mar 10, 2023

IAL32 mentioned this pull request Mar 15, 2023

webserver.instance_name shows markup text in <title> tag #28888

Closed

2 tasks

potiuk approved these changes Mar 15, 2023


potiuk force-pushed the ac/28888-instance-name-title-markup branch 2 times, most recently from a1df630 to dc9dd28 on March 16, 2023 00:57

IAL32 added 3 commits March 16, 2023 09:54

Strip Markup from appbuilder.app_name if `instance_name_has_markup = True`

f7be5e2

Fix

daefbf2

Added suggestion

a1a5eb6

IAL32 added 2 commits March 16, 2023 09:54

Fix

3f4d313

Fix

46177e2

potiuk force-pushed the ac/28888-instance-name-title-markup branch from dc9dd28 to 46177e2 on March 16, 2023 08:54

Use Markupsafe to strip tags

715eaca

uranusjr changed the title ~~Strip markup from appbuilder.app_name if instance_name_has_markup = True~~ Strip markup from app_name if instance_name_has_markup = True Mar 16, 2023

uranusjr merged commit 971e322 into apache:main Mar 16, 2023

IAL32 deleted the ac/28888-instance-name-title-markup branch March 16, 2023 12:03

pierrejeambrun pushed a commit that referenced this pull request Mar 23, 2023

Strip markup from app_name if instance_name_has_markup = True (#28894)

46f36db

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com> (cherry picked from commit 971e322)

pierrejeambrun mentioned this pull request Mar 27, 2023

Status of testing of Apache Airflow 2.5.3rc2 #30337

Closed

28 tasks


Review comments, in page order: uranusjr (Jan 13, 2023), IAL32 (Jan 13, 2023), tirkarthi (Jan 14, 2023, two comments), uranusjr (Jan 16, 2023, two comments, both edited), IAL32 (Jan 18, 2023).


9 participants
