Skip to content

Conversation

danielmorell
Copy link
Collaborator

Description of the change

This PR resolves an issue where out of order nested query string values could fail to be properly scrubbed.

The reason was we were parsing then rebuilding the query string to see if they matched. If they did we would decode and scrub the string as if it was a query string.

However, this caused issues because the nested keys would be order together, when the encoded string did not strictly require that. This solves that problem by partially decoding the string and comparing that with the decoded and rebuilt query string with a sorted key order.

As part of this change we also no longer encode all query strings that we send in the payload in the RFC3986 percent encoding sytanx. Instead we send the query string with the more human readable non-encoded characters.

Type of change

  • Bug fix (non-breaking change that fixes an issue)

Related issues

Checklists

Development

  • Lint rules pass locally
  • The code changed/added as part of this pull request has been covered with tests
  • All tests related to the changed code pass in development

Code review

  • This pull request has a descriptive title and information useful to a reviewer. There may be a screenshot or screencast attached
  • "Ready for review" label attached to the PR and reviewers assigned
  • Issue from task tracker has a link to this pull request
  • Changes have been reviewed by at least one other engineer

@danielmorell danielmorell requested a review from brianr October 4, 2025 18:37
Copy link
Member

@brianr brianr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielmorell This part I'm not understanding:

As part of this change we also no longer encode all query strings that we send in the payload in the RFC3986 percent encoding sytanx. Instead we send the query string with the more human readable non-encoded characters.

Doesn't that mean that other characters that previously would have been encoded (and need to be encoded), will no longer be encoded?

Codex's review noted something similar:

- [P0] Keep percent-encoding when rebuilding query — src/Scrubber.php:233-236
  Calling urldecode() here means any percent-encoded characters (for example %26 representing &, %2B
representing +, etc.) are decoded before we hand the string back. Because we re-insert the string
without re-encoding, a query like token=abc%26def now becomes token=abc&def, which splits the value
into a brand-new parameter and permanently corrupts the payload we send. This regresses behaviour
for every query string that contains encoded separators, so we need to drop the urldecode() (and
accompanying str_replace) and keep the percent-encoding intact.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scrubbing bug causing PII leaks in Rollbar PHP library
2 participants