-
Notifications
You must be signed in to change notification settings - Fork 29.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
http: disallow two-byte characters in URL path #16237
Changes from 1 commit
844a470
eab523c
58ac26f
60b3290
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
This commit changes node's handling of two-byte characters in the path component of an http URL. Previously, node would just strip the higher byte when generating the request. So this code: ``` http.request({host: "example.com", port: "80", "/\uFF2e"}) ``` would request `http://example.com/.` (`.` is the character for the byte `0x2e`). This is not useful and can in some cases lead to filter evasion. With this change, the code generates `ERR_UNESCAPED_CHARACTERS`, just like space and control characters already did.
- Loading branch information
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -52,20 +52,20 @@ const errors = require('internal/errors'); | |
// checks can greatly outperform the equivalent regexp (tested in V8 5.4). | ||
function isInvalidPath(s) { | ||
var i = 0; | ||
if (s.charCodeAt(0) <= 32) return true; | ||
if (s.charCodeAt(0) <= 32 || s.charCodeAt(0) > 0xFF) return true; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't this be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. V8 strings are not UTF-8, but UTF-16 (UCS2). If it was UTF-8, this would not be necessary, as each byte in an UTF-8 multibyte sequence is What happens is that the string There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Unfortunately, it's not that simple. See #13296 (comment) for an explanation. The two word summary is "it depends" (on the encoding of the body, but that doesn't fit in two words.) Aside: calling There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, thanks for pointing me to that issue. I thought a bit about something like that while writing the patch, but I couldn't find out where the encoding is actually set. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. At least as long as we aren't using chunked encoding (no idea what that is :)) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry for the delay, missed your replies. I don't know if your exact proposal would work but take a look at the You may also want to look at the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This raises the question if we actually want to support non-latin1 encoded request path? Maybe There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I spent some time browsing through open issues yesterday and there's already a bug logged against this behaviour. (Would need to go back to find it again.) IMO what mscdex said above makes sense, just cap it at 127 instead. The current behaviour is completely unpredictable and broken. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here's the issue I was mentioning #13296 — there's some good discussion there too. I would prefer what I said above, at least for 9.x (maybe different solution for LTS), but I'll defer to @bnoordhuis and others with more experience with the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It's in fairly widespread use. Rejecting it outright would almost certainly result in quite a bit of fallout and lots of frustrated users. Rejecting malformed UTF-8 should be acceptable and appropriate, though. (But as discussed, what to accept and what to reject depends on the encoding.) |
||
if (++i >= s.length) return false; | ||
if (s.charCodeAt(1) <= 32) return true; | ||
if (s.charCodeAt(1) <= 32 || s.charCodeAt(1) > 0xFF) return true; | ||
if (++i >= s.length) return false; | ||
if (s.charCodeAt(2) <= 32) return true; | ||
if (s.charCodeAt(2) <= 32 || s.charCodeAt(2) > 0xFF) return true; | ||
if (++i >= s.length) return false; | ||
if (s.charCodeAt(3) <= 32) return true; | ||
if (s.charCodeAt(3) <= 32 || s.charCodeAt(3) > 0xFF) return true; | ||
if (++i >= s.length) return false; | ||
if (s.charCodeAt(4) <= 32) return true; | ||
if (s.charCodeAt(4) <= 32 || s.charCodeAt(4) > 0xFF) return true; | ||
if (++i >= s.length) return false; | ||
if (s.charCodeAt(5) <= 32) return true; | ||
if (s.charCodeAt(5) <= 32 || s.charCodeAt(5) > 0xFF) return true; | ||
++i; | ||
for (; i < s.length; ++i) | ||
if (s.charCodeAt(i) <= 32) return true; | ||
if (s.charCodeAt(i) <= 32 || s.charCodeAt(i) > 0xFF) return true; | ||
return false; | ||
} | ||
|
||
|
@@ -121,7 +121,7 @@ function ClientRequest(options, cb) { | |
if (path.length <= 39) { // Determined experimentally in V8 5.4 | ||
invalidPath = isInvalidPath(path); | ||
} else { | ||
invalidPath = /[\u0000-\u0020]/.test(path); | ||
invalidPath = /[\u0000-\u0020\u0100-\uffff]/.test(path); | ||
} | ||
if (invalidPath) | ||
throw new errors.TypeError('ERR_UNESCAPED_CHARACTERS', 'Request path'); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
// Copyright Joyent, Inc. and other Node contributors. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: copyright is not needed for new tests. |
||
// | ||
// Permission is hereby granted, free of charge, to any person obtaining a | ||
// copy of this software and associated documentation files (the | ||
// "Software"), to deal in the Software without restriction, including | ||
// without limitation the rights to use, copy, modify, merge, publish, | ||
// distribute, sublicense, and/or sell copies of the Software, and to permit | ||
// persons to whom the Software is furnished to do so, subject to the | ||
// following conditions: | ||
// | ||
// The above copyright notice and this permission notice shall be included | ||
// in all copies or substantial portions of the Software. | ||
// | ||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS | ||
// OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF | ||
// MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN | ||
// NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, | ||
// DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR | ||
// OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE | ||
// USE OR OTHER DEALINGS IN THE SOFTWARE. | ||
|
||
'use strict'; | ||
const common = require('../common'); | ||
const http = require('http'); | ||
|
||
common.expectsError(() => { | ||
http.request({ | ||
path: '/thisisinvalid\uffe2' | ||
}).end(); | ||
}, { | ||
code: 'ERR_UNESCAPED_CHARACTERS', | ||
type: TypeError | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment above this function needs an update as well