Fixes for recently reported issues by frenzymadness · Pull Request #28 · fedora-python/lxml_html_clean

frenzymadness · 2026-02-26T08:57:15Z

This is kinda urgent, so I'm requesting review from multiple teammates at the same time.

As members of fedora-python org, you should have access to the privately reported vulnerabilities that this is going to solve. We can also discuss the issues and the solution proposed here in person, if you want to.

Cc: @uug4na

Unicode escapes in CSS were not properly decoded before security checks. They are now decoded first, which prevents attackers from bypassing filters using escape sequences.
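
To illustrate the kind of decoding described above, here is a minimal sketch using only the standard library. It is not the code from this PR: the function name `decode_css_hex_escapes` and the regular expression are assumptions made for the example, following the CSS rule that an escape is a backslash, one to six hex digits, and an optional single whitespace terminator.

```python
import re

# A backslash, 1-6 hex digits, then an optional single whitespace
# terminator (CRLF counts as one). Hypothetical pattern for illustration.
_HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?")


def decode_css_hex_escapes(css: str) -> str:
    """Replace CSS hex escapes with the characters they denote."""

    def _replace(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        # Null, surrogates and out-of-range values map to U+FFFD.
        if (
            code_point == 0
            or 0xD800 <= code_point <= 0xDFFF
            or code_point > 0x10FFFF
        ):
            return "\ufffd"
        return chr(code_point)

    return _HEX_ESCAPE.sub(_replace, css)


print(decode_css_hex_escapes(r"@\69mport url(evil.css)"))      # @import url(evil.css)
print(decode_css_hex_escapes(r"@\000069mport url(evil.css)"))  # @import url(evil.css)
```

Once the escapes are decoded, a plain substring check for `@import` sees the same text a browser would.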

<base> tags are now automatically removed whenever <head> is removed to prevent URL hijacking attacks. According to the HTML spec, <base> must be in <head>, but browsers may still honor misplaced <base> tags, allowing attackers to redirect all relative URLs to malicious servers.
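
The risk can be sketched with the standard library alone. The example below is only a detector, not the removal logic from this PR, and `BaseTagFinder` and `find_base_hrefs` are hypothetical names. It shows that a <base> tag outside <head> is still easy to find in the markup, and how a relative link would resolve if a browser honored it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseTagFinder(HTMLParser):
    """Collect the href of every <base> tag, wherever it appears."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <BASE> is caught as well.
        if tag == "base":
            self.hrefs.append(dict(attrs).get("href") or "")


def find_base_hrefs(markup: str) -> list:
    finder = BaseTagFinder()
    finder.feed(markup)
    finder.close()
    return finder.hrefs


markup = '<div><base href="https://evil.invalid/"><a href="login">Log in</a></div>'
hrefs = find_base_hrefs(markup)
print(hrefs)                       # ['https://evil.invalid/']
print(urljoin(hrefs[0], "login"))  # https://evil.invalid/login
```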

uug4na

Thanks for the quick fix - decoding hex escapes covers the @\69mport / \000069 case nicely.

One small concern: CSS escapes aren’t only hex-based. A backslash can also escape a single character (e.g. \i -> i), so variants like @\impor\t or @\i mport may still be interpreted by browsers as @import. Since backslashes are no longer stripped, those might slip past _has_sneaky_javascript().

Maybe worth either fully normalizing CSS escapes or handling remaining backslashes after hex decoding. I can add test cases like <style>@\impor\t url(evil.css)</style> if helpful.

frenzymadness · 2026-02-26T10:42:29Z

I thought about it and tested cases like:

<div style="background:url(ja\vascript:alert(1));">INLINE TEST 1</div>

or

<div style="@\import url(https://evil.invalid/x.css);">INLINE TEST 2</div>

but they were neither rendered without the backslash nor executed by my browser. But sure, I can put the removal of all backslash characters back and run it after the escape sequences are decoded.

uug4na · 2026-02-26T11:54:32Z

Appreciate you testing it. +1 to re-adding backslash removal after decoding, to cover non-hex escapes too.
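
The order of the two steps matters, which the following sketch tries to show. It is an assumption-laden illustration rather than the PR's code: `strip_only` and `decode_then_strip` are hypothetical helpers, and the escape pattern is simplified.

```python
import re

# Simplified hex-escape pattern, assumed for this example only.
_HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})[ \t\r\n\f]?")


def _decode(match: re.Match) -> str:
    code_point = int(match.group(1), 16)
    # Values chr() cannot represent become U+FFFD.
    return chr(code_point) if code_point <= 0x10FFFF else "\ufffd"


def strip_only(css: str) -> str:
    """Drop backslashes without decoding: hex digits are left behind."""
    return css.replace("\\", "")


def decode_then_strip(css: str) -> str:
    """Decode hex escapes first, then drop whatever backslashes remain."""
    return _HEX_ESCAPE.sub(_decode, css).replace("\\", "")


for payload in (r"@\69mport", r"@\impor\t", r"@\69mpor\t"):
    print(payload, "->", strip_only(payload), "|", decode_then_strip(payload))
```

Stripping alone turns `@\69mport` into `@69mport`, which a check for `@import` would miss. Decoding first and stripping afterwards yields `@import` for all three payloads.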

befeleme

Maint commits: ✔️
sanitation - code-wise, functionality-wise: ✔️
CSS @import sanitizer: code-wise, functionality-wise: ✔️

I'm not familiar with the topic well enough to figure out if there are missing use cases, but for the ones listed in the report and covered in tests, I verified that the vulnerability existed and is remedied with the code from this PR.
If you're going to add more robust backslash handling, I'll be happy to re-review.

frenzymadness · 2026-02-26T15:52:24Z

I've restored the old behavior and added more tests in two new commits for easier review.

befeleme · 2026-02-26T16:02:01Z

tests/test_clean.py

+        test_cases = [
+            # Tab after escape
+            ('<div style="@\\69\tmport url(evil.css)">test</div>', '<div>test</div>'),
+            # Newline after escape (note: actual newline, not \n)


The comment says "(note: actual newline, not \n)", but I see \n just a line below, so I don't understand the comment.

frenzymadness · 2026-02-26

I read it as: the string contains a real newline character, so it's neither r"\n" nor "\\n".
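
The distinction under discussion is ordinary Python string-literal behavior and can be checked directly. The variable names below are made up for the example. The last line reuses the fragment from the test case above, where `\\` is a literal backslash and `\t` is a real tab.

```python
actual_newline = "\n"        # one character: U+000A
raw_backslash_n = r"\n"      # two characters: a backslash, then "n"
escaped_backslash_n = "\\n"  # the same two characters, written without r""

print(len(actual_newline), len(raw_backslash_n))  # 1 2
print(raw_backslash_n == escaped_backslash_n)     # True
print(actual_newline == raw_backslash_n)          # False

# From the test case: a literal backslash, "69", then a real tab character.
fragment = "@\\69\tmport"
print(len(fragment))  # 10
```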

frenzymadness added 4 commits February 25, 2026 22:30

Add missing Python 3.14 to classifiers

d134556

Implement unicode escape decoding

5d48ba6

Unicode escapes in CSS were not properly decoded before security checks. This prevents attackers from bypassing filters using escape sequences.

Prepare release 0.4.4

c9b82ba

frenzymadness requested review from befeleme, hrnciar, hroncok and stratakis February 26, 2026 08:57

frenzymadness self-assigned this Feb 26, 2026

uug4na approved these changes Feb 26, 2026

Restore the removal of all backslashes from styles after decoding of unicode escapes

67e029f

befeleme approved these changes Feb 26, 2026

befeleme reviewed Feb 26, 2026

befeleme reviewed Feb 26, 2026

Add more tests for different combinations of backslashes and unicode

8620e3c

frenzymadness force-pushed the fixes branch from 9746150 to 8620e3c Compare February 26, 2026 18:23

frenzymadness wants to merge 6 commits into main from fixes

3 participants
