Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Update) Remove XSS cleaner and remove XSS vulnerabilities #3222

Open
wants to merge 1 commit into
base: 8.x.x
Choose a base branch
from

Conversation

Roardom
Copy link
Collaborator

@Roardom Roardom commented Nov 6, 2023

We've been mostly relying on the 3rd party xss cleaner to make sure user submitted content is clean. This PR fixes up any leftover holes in the bbcode parser that allow xss vulnerabilities, and as a result, the 3rd party library isn't needed anymore. It cleans responsibly by first, running htmlspecialchars() over the content, followed by validating any untrusted input used inside html attributes. It validates urls by returning them as redirects relying on the browser not supporting executable protocols (like javascript:).

@Roardom
Copy link
Collaborator Author

Roardom commented Nov 20, 2023

It validates urls by returning them as redirects relying on the browser not supporting executable protocols (like javascript:).

This is the definition of an open redirect vulnerability which is unacceptable and will require rethinking.

Copy link

what-the-diff bot commented Jan 5, 2024

PR Summary

  • Enhanced BBCode Regular Expression
    The old code checker for the namedquote configuration has been improved to allow any character in the quote tag.

  • Removal of namedlink BBCode configuration
    The namedlink BBCode configuration has been removed from the system.

  • Significant improvements to parse method
    This method now sanitizes and effectively encodes special characters in our source string. It also converts BBCode tags like url, img, and video to equivalent HTML tags. The HTML tags produced from this conversion are more secure with attributes like href and src properly sanitized. Additionally, performance has been improved by removing some redundant block elements.

  • Introducing sanitizeURL method
    A new method to make sure URLs are safe and valid before use.

  • Optimized Livewire components
    Changes have been made to how the Livewire components handle data entry. These components now directly update or create comments and replies with sanitized text, improving security and integrity of data.

  • Removed redundant set methods from models
    Certain methods from various models like Article, Message, Playlist, etc. were unnecessary and hence have been eliminated.

  • Dependency cleanup
    A dependency named voku/anti-xss that is no longer needed has been removed from the project.

  • Database Update
    A new migration file is added to correctly decode certain fields in the database records. This fixes and normalises these fields to ensure consistency across the application.

@Roardom
Copy link
Collaborator Author

Roardom commented Jan 5, 2024

It validates urls by returning them as redirects relying on the browser not supporting executable protocols (like javascript:).

This is the definition of an open redirect vulnerability which is unacceptable and will require rethinking.

Open redirect vulnerabilities are only valid for links ([url]). [img] bbcode could technically still be implemented like this correctly, but [url] bbcode should be sent to a page that counts down from e.g. 15 seconds before redirecting to the link to avoid this vulnerability. I've opted to whitelist protocols instead to avoid this issue, but in the future this can be perhaps redesigned.

@Roardom Roardom force-pushed the bbcode-improvements branch 2 times, most recently from a852f82 to 8ab3967 Compare March 1, 2024 12:53
@HDVinnie HDVinnie closed this Mar 27, 2024
@Roardom Roardom reopened this May 26, 2024
@Roardom Roardom changed the base branch from 7.x.x to 8.x.x May 26, 2024 16:34
@Roardom Roardom force-pushed the bbcode-improvements branch 2 times, most recently from 2017c0b to ea8be74 Compare July 3, 2024 09:00
We've been mostly relying on the 3rd party xss cleaner to make sure user submitted content is clean. This PR fixes up any leftover holes in the bbcode parser that allow xss vulnerabilities, and as a result, the 3rd party library isn't needed anymore. It cleans responsibly by first, running `htmlspecialchars()` over the content, followed by sanitizing the untrusted urls and whitelisting their protocol.
@Roardom Roardom marked this pull request as ready for review July 3, 2024 09:12
@HDVinnie HDVinnie self-requested a review July 4, 2024 01:26
@Roardom
Copy link
Collaborator Author

Roardom commented Jul 6, 2024

Hey @JoshyPHP

It's been awhile but I remember you responding to UNIT3D comments before and I just saw on another repo that you reviewed their BBCode implementation. Is there any chance you would be willing to do a brief lookover of the code here to see if I missed anything in terms of XSS prevention? I would greatly appreciate it.

@JoshyPHP
Copy link

JoshyPHP commented Jul 7, 2024

@Roardom I wouldn't consider myself an XSS expert but for what it's worth, I gave this PR a half-decent look-over and didn't find anything. If you're concerned about XSS and exploits based on user input in general, I would consider URL-encoding non-ASCII characters and Punycoding host names to avoid homoglyphs, but modern browsers already present host names punycoded so it's not a requirement.

By the way, I'm subscribed to this repo and I almost missed your mention because GitHub doesn't differentiate between normal PR churn and user mentions. If you tag me in a discussion and I don't reply within 48 hours, assume that I missed it and feel free to ping me again and/or try alternative methods.

@Roardom
Copy link
Collaborator Author

Roardom commented Jul 8, 2024

@Roardom I wouldn't consider myself an XSS expert but for what it's worth, I gave this PR a half-decent look-over and didn't find anything. If you're concerned about XSS and exploits based on user input in general, I would consider URL-encoding non-ASCII characters and Punycoding host names to avoid homoglyphs, but modern browsers already present host names punycoded so it's not a requirement.

By the way, I'm subscribed to this repo and I almost missed your mention because GitHub doesn't differentiate between normal PR churn and user mentions. If you tag me in a discussion and I don't reply within 48 hours, assume that I missed it and feel free to ping me again and/or try alternative methods.

Thank you so much for checking it out. If I understand correctly, we need to decode $parts = parse_url($url), utf8_to_idn($parts['host']) and urlencode($parts['query']) and then build the query again? Looking over the built-in php function parse_url(), it seems like it does not fully implement the relevant specs and fails at random exception cases (e.g. relative urls). Do you have any recommendations on deserializing the given urls correctly and only encoding the specific parts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants