This repository was archived by the owner on Aug 14, 2024. It is now read-only.

Explain how to sanitize url parameters #760

Closed


antonpirker
Member

This is for getting rid of PII in URLs used in breadcrumbs and span descriptions.

@vercel

vercel bot commented Nov 15, 2022

The latest updates on your projects:

| Name | Status | Updated |
|---|---|---|
| develop | ✅ Ready | Nov 16, 2022 at 0:35AM (UTC) |

@antonpirker changed the title from "Explain how to parameterize url parameters" to "Explain how to sanitize url parameters" on Nov 15, 2022
@@ -268,6 +268,12 @@ If Performance Monitoring is both supported by the SDK and enabled in the client
- span status must match HTTP response status code ([see Span status to HTTP status code mapping](/sdk/event-payloads/span/))
- when network error occurs, span status must be set to `internal_error`

The `url` in breadcrumbs and `$url` in span descriptions must be stripped of sensitive data, such as query parameters and/or username/password.
For example, the URL `https://username:password@example.com/bla/blub?token=abc&sessionid=123&save=true#fragment` must be modified to look like this: `https://%s:%s@example.com/bla/blub?token=%s&sessionid=%s&save=%s#fragment`.
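A minimal Python sketch of the transformation in this example, assuming the SDK masks both userinfo parts and every query value with `%s` while keeping the scheme, host, path, param names, and fragment intact. `sanitize_url` and `_mask_query` are hypothetical names, not an existing SDK API.

```python
from urllib.parse import urlsplit, urlunsplit


def _mask_query(query: str) -> str:
    """Replace every query value with %s, keeping param names and their order."""
    masked = []
    for pair in query.split("&"):
        name, sep, _value = pair.partition("=")
        # Params without "=" carry no value, so they are kept as-is.
        masked.append(f"{name}=%s" if sep else pair)
    return "&".join(masked)


def sanitize_url(url: str) -> str:
    """Hypothetical helper: mask userinfo and all query values in a URL."""
    parts = urlsplit(url)
    netloc = parts.netloc
    if "@" in netloc:
        # The userinfo (username[:password]) precedes the last "@" of the authority.
        userinfo, host = netloc.rsplit("@", 1)
        netloc = ("%s:%s" if ":" in userinfo else "%s") + "@" + host
    query = _mask_query(parts.query) if parts.query else ""
    return urlunsplit((parts.scheme, netloc, parts.path, query, parts.fragment))
```

Applied to the example URL, this returns `https://%s:%s@example.com/bla/blub?token=%s&sessionid=%s&save=%s#fragment`; a URL without userinfo or query, such as `https://example.com/a`, is returned unchanged.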
Contributor


`username:password` is called userinfo IIRC, which is part of the authority.

Member

@philipphofmann left a comment


Thanks for updating the docs, @antonpirker. The recent addition leaves plenty of room for interpretation. Please specify precisely what counts as sensitive data.

@@ -268,6 +268,12 @@ If Performance Monitoring is both supported by the SDK and enabled in the client
- span status must match HTTP response status code ([see Span status to HTTP status code mapping](/sdk/event-payloads/span/))
- when network error occurs, span status must be set to `internal_error`

The `url` in breadcrumbs and `$url` in span descriptions must be stripped of sensitive data, such as query parameters and/or username/password.
Member


Right now, query parameters are the only sensitive data we can reliably detect. Which other sensitive data can you think of removing? If any come to mind, please mention that here. This sentence leaves a lot of room for interpretation.
Please try to be as specific as possible.

@@ -268,6 +268,12 @@ If Performance Monitoring is both supported by the SDK and enabled in the client
- span status must match HTTP response status code ([see Span status to HTTP status code mapping](/sdk/event-payloads/span/))
- when network error occurs, span status must be set to `internal_error`

The `url` in breadcrumbs and `$url` in span descriptions must be stripped of sensitive data, such as query parameters and/or username/password.
Contributor


I suspect that URL info could be in other places as well; see this.

Member

@philipphofmann left a comment


Thanks, @antonpirker. This is already way better.


The SDK should maintain a list of query params that can include sensitive data. By default, this list should be the same one that Relay uses to scrub sensitive data: https://github.com/getsentry/relay/blob/master/relay-general/src/pii/regexes.rs#L272

The values of all query parameters whose names appear in this list must be scrubbed.
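A minimal Python sketch of this denylist rule, assuming params are matched by name case-insensitively and only the values of listed params are replaced with `%s`; all other params, the host, and the fragment are left as they are. The two names in `DEFAULT_SENSITIVE_QUERY_PARAMS` are illustrative placeholders, not Relay's actual list, and `scrub_sensitive_query_params` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit

# HYPOTHETICAL placeholder: the real default should mirror the list Relay uses.
# Entries are stored lowercase so matching can be case-insensitive.
DEFAULT_SENSITIVE_QUERY_PARAMS = frozenset({"token", "sessionid"})


def scrub_sensitive_query_params(url: str, sensitive=DEFAULT_SENSITIVE_QUERY_PARAMS) -> str:
    """Replace the values of denylisted query params with %s; keep all others."""
    parts = urlsplit(url)
    if not parts.query:
        return url
    scrubbed = []
    for pair in parts.query.split("&"):
        name, sep, _value = pair.partition("=")
        if sep and name.lower() in sensitive:
            scrubbed.append(f"{name}=%s")
        else:
            scrubbed.append(pair)
    # SplitResult is a namedtuple, so only the query component is swapped out.
    return urlunsplit(parts._replace(query="&".join(scrubbed)))
```

For `https://example.com/bla/blub?token=abc&sessionid=123&save=true#fragment` this yields `https://example.com/bla/blub?token=%s&sessionid=%s&save=true#fragment`: unlike masking every value, `save` keeps its value because it is not on the list.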
Member


Once we have the list of params, we should add them here.


SDKs should also let users define a custom list of query param names that should be scrubbed, giving them full control over what data gets sent to Sentry.

The SDK's `init()` call should accept a config option that sets the list of query params to scrub. Setting this option overrides the default list of query params with sensitive data.
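A sketch of how the override could resolve, assuming a hypothetical option named `sensitive_query_params` (the real name is still undecided) and a stand-in `init()` that only builds the options object. The user-supplied list replaces the placeholder default rather than extending it, as the text requires.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# HYPOTHETICAL placeholder default; the real one should mirror Relay's list.
DEFAULT_SENSITIVE_QUERY_PARAMS = frozenset({"token", "sessionid"})


@dataclass
class Options:
    # HYPOTHETICAL option name; None means "not set by the user".
    sensitive_query_params: Optional[Iterable[str]] = None


def effective_sensitive_query_params(options: Options) -> frozenset:
    """Return the param names to scrub: a user-supplied list replaces the default."""
    if options.sensitive_query_params is None:
        return DEFAULT_SENSITIVE_QUERY_PARAMS
    # Override, not merge: only the user's names are scrubbed (lowercased for matching).
    return frozenset(name.lower() for name in options.sensitive_query_params)


def init(**kwargs) -> Options:
    """HYPOTHETICAL stand-in for an SDK's init(); only handles the option above."""
    return Options(**kwargs)
```

With `init(sensitive_query_params=["api_key"])`, only `api_key` is scrubbed and `token` no longer is. One consequence of override semantics in this sketch: passing an empty list means no query param values are scrubbed.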
Member


We still need to specify the name of the option once we have decided on one.

Co-authored-by: Philipp Hofmann <philipp.hofmann@sentry.io>
@@ -277,7 +277,7 @@ HTTP Client Integrations record the URLs of HTTP requests. URLs can contain two

#### Privacy Related Data

Privacy-related data from the `url` in breadcrumbs and `$url` in span descriptions must always be scrubbed.
The SDKs must scrub privacy-related data from the `url` in breadcrumbs and `$url` in span descriptions.
Member


🥇 for specifying the subject and using the active voice.

@antonpirker
Member Author

Will close this in favor of a new PR that was started after RFC-0038 was created: #773

@antonpirker closed this Dec 1, 2022
4 participants