Skip to content

perf: improve URL parsing performance #9694

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

anonrig
Copy link
Contributor

@anonrig anonrig commented Nov 28, 2023

Follow up for #9676.

Without introducing any breaking changes, this is what I came up with that is in par with current regexp parsing. I'm not quite happy with it, but I think it's better than the current approach.

cc @AbhiPrasad

@anonrig anonrig requested review from a team, stephanie-anderson and ale-cota and removed request for a team November 28, 2023 22:12
}
}

const match = url.match(urlRegex);

Check failure

Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data

This [regular expression](1) that depends on [library input](2) may run slow on strings with many repetitions of '"'. This [regular expression](1) that depends on [library input](3) may run slow on strings with many repetitions of '"'. This [regular expression](1) that depends on [library input](4) may run slow on strings with many repetitions of '"'.
@anonrig anonrig force-pushed the improve-url-parsing branch from 1dcba82 to 67676d6 Compare November 28, 2023 22:20
Copy link
Contributor

github-actions bot commented Nov 28, 2023

size-limit report 📦

Path Size
@sentry/browser (incl. Tracing, Replay) - Webpack (gzipped) 65.94 KB (+0.12% 🔺)
@sentry/browser (incl. Tracing, Replay) - Webpack with treeshaking flags (gzipped) 59.55 KB (+0.15% 🔺)
@sentry/browser (incl. Tracing) - Webpack (gzipped) 31.2 KB (+0.27% 🔺)
@sentry/browser - Webpack (gzipped) 21.46 KB (+0.4% 🔺)
@sentry/browser (incl. Tracing, Replay) - ES6 CDN Bundle (gzipped) 62.79 KB (+0.12% 🔺)
@sentry/browser (incl. Tracing) - ES6 CDN Bundle (gzipped) 29.61 KB (+0.3% 🔺)
@sentry/browser - ES6 CDN Bundle (gzipped) 21.67 KB (+0.36% 🔺)
@sentry/browser (incl. Tracing, Replay) - ES6 CDN Bundle (minified & uncompressed) 197.69 KB (+0.12% 🔺)
@sentry/browser (incl. Tracing) - ES6 CDN Bundle (minified & uncompressed) 89.55 KB (+0.26% 🔺)
@sentry/browser - ES6 CDN Bundle (minified & uncompressed) 64.52 KB (+0.36% 🔺)
@sentry/browser (incl. Tracing) - ES5 CDN Bundle (gzipped) 32.3 KB (+0.27% 🔺)
@sentry/react (incl. Tracing, Replay) - Webpack (gzipped) 66.32 KB (+0.12% 🔺)
@sentry/react - Webpack (gzipped) 21.51 KB (+0.4% 🔺)
@sentry/nextjs Client (incl. Tracing, Replay) - Webpack (gzipped) 83.05 KB (+0.11% 🔺)
@sentry/nextjs Client - Webpack (gzipped) 48.3 KB (+0.18% 🔺)
@sentry-internal/feedback - Webpack (gzipped) 16.19 KB (0%)

Copy link
Contributor

@mitsuhiko mitsuhiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal here is not so much performance as bundle size. Since we are not running this in any perf critical path I would prefer the smaller size.

@anonrig anonrig force-pushed the improve-url-parsing branch from 67676d6 to 07f0572 Compare November 28, 2023 22:55
@anonrig
Copy link
Contributor Author

anonrig commented Nov 28, 2023

The goal here is not so much performance as bundle size. Since we are not running this in any perf critical path I would prefer the smaller size.

@mitsuhiko The current implementation is prone to bugs. Regexp approach is wrong and results in bugs. For example currently we don't handle http://username:@localhost or http://:password@localhost when redacting URLs. The only reason this PR increased the bundle size, is because of not changing the API of parseUrl. But if bundle size is a blocker, we can wait for the next major release where we'll be dropping v8 and v10 support and easily use the global URL and change the API structure.

urlInstance: URL;
}>;

const urlRegex = /^(([^:/?#]+):)?(\/\/([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$/;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an external reference to the source of this regexp that would make sense to add here? I'm asking because these look daunting (at least to me), but I find that they can usually be a good way to educate, for example maybe by linking to the standard that this is supposed to support.

@anonrig anonrig marked this pull request as draft November 29, 2023 14:23
@mydea
Copy link
Member

mydea commented Apr 17, 2024

@anonrig I think we could use URL now, in current v8. Should we close this PR (and possibly create a follow up using newer APIs)?

@anonrig anonrig closed this Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants