-
-
Notifications
You must be signed in to change notification settings - Fork 37
Improve regular expression for removing trailing slashes #103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
st.js
Outdated
| // trailing slash removal to fix Node.js v23 bug | ||
| // https://github.com/nodejs/node/pull/55527 | ||
| // can be removed when this is resolved and released | ||
| return path.join(this.path, u.replace(/\/+$/, '')) | ||
| return path.join(this.path, u.replace(/(?<!\/)\/+$/, '')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the negative look-behind gets us in the wrong place here because it'll not match multiple slashes, voiding the + all together. I think the right answer here is to just avoid regexes entirely; plus the original comment should be changed since we've now essentially baked in normalised paths that help us get consistent rendering when returning a directory listing.
| // trailing slash removal to fix Node.js v23 bug | |
| // https://github.com/nodejs/node/pull/55527 | |
| // can be removed when this is resolved and released | |
| return path.join(this.path, u.replace(/\/+$/, '')) | |
| return path.join(this.path, u.replace(/(?<!\/)\/+$/, '')) | |
| // Normalise paths by removing trailing slashes | |
| // This ensures consistent paths for directory content rendering | |
| while (u.length > 0 && u[u.length - 1] === '/') { | |
| u = u.slice(0, -1) | |
| } | |
| return path.join(this.path, u) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the negative look-behind gets us in the wrong place here because it'll not match multiple slashes, voiding the
+all together.
That is incorrect, you can test it with let u = "test//"; u.replace(/(?<!\/)\/+$/, '') which will return "test". This is because it will match 1-or-more / starting from a / that is preceded by a non-/. The negative look-behind only applies to the start of \/+, not every \/ in the sequence (though I understand the confusion).
I think the right answer here is to just avoid regexes entirely;
Having said the above, happy to adopt your change if that's what you prefer.
plus the original comment should be changed since we've now essentially baked in normalised paths that help us get consistent rendering when returning a directory listing.
And I'll update this regardless of whether we keep the regex or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping @rvagg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, i didn't realise I needed a follow-up here -- tbh I'd prefer to just avoid the regex, I think it's going to be much quicker anyway and it's much more self-documenting than making the regex more complicated. I appreciate the effort though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem - updated now, see d45d312
|
Well, thanks for the education, I didn't realise we didn't have $ anchored optimisations in V8, so your critique makes sense in that light. But as per the comment, I think the solution is to just not use regexes for something this simple. |
Never heard this term before 🤔 Are you referring to V8's non-backtracking RegExp engine? |
|
No, just the fact that there's a |
Update the regular expression that is used to remove trailing slashes
from the path in `Mount#getPath` to avoid super-linear runtimes. I
believe this does not affect the `st` export because the only path to
this API is through `Mount#serve` which sanitizes the `req.url` using
`path.normalize` (in `Mount#getUriPath`). However, the `Mount` class
is also exported directly, through which `getPath` could be called on
unsanitized inputs.
The super-linear runtime could occur because the regex engine will
backtrack through a repetition of '/'s if it is not at the end. This
is resolved by anchoring a repetition '/' on a non-'/' with a negative
lookbehind.
The issue can be tested and reproduced using this script:
const st = require('st')
const path = require('path')
const mount = new st.Mount({
path: path.join(__dirname, '/static'),
url: '/static'
})
for (let i = 0; i < 90_000; i += 10_000) {
console.time('timer')
mount.getPath(`a${"/".repeat(i)}a`)
console.timeEnd('timer')
}
Co-authored-by: Rod Vagg <rod@vagg.org>
Update the regular expression that is used to remove trailing slashes from the path in
Mount#getPath- introduced in #98 - to avoid super-linear runtimes1. I believe this does not affect thestexport because the only path to this API is throughMount#servewhich sanitizes thereq.urlusingpath.normalize(inMount#getUriPath). However, theMountclass is also exported directly, through whichgetPathcould be called on unsanitized inputs.The super-linear runtime could occur because the regex engine will backtrack through a repetition of '/'s if it is not at the end. This is resolved by anchoring a repetition '/' on a non-'/' with a negative lookbehind.
The issue can be tested and reproduced using this script:
Footnotes
while some consider this an availability problem, I decided to contribute this update in the open because 1) it only affects an undocumented API, and 2) this project lacks a security policy. ↩