Skip to content
This repository was archived by the owner on Jan 29, 2020. It is now read-only.

Encode user info when calling withUserInfo() #248

Merged
merged 4 commits into from
Aug 22, 2017
Merged

Encode user info when calling withUserInfo() #248

merged 4 commits into from
Aug 22, 2017

Conversation

DASPRiD
Copy link
Member

@DASPRiD DASPRiD commented May 10, 2017

According to https://www.ietf.org/rfc/rfc3986.txt 3.2.1, the user info parts should actually be URL encoded.

if ($password) {
$info .= ':' . $password;
$info .= ':' . $this->filterUserInfoPart($password);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch will fix most cases, however, if the password contains an apparently url encoded string literal, for example, "%3A" that will be decoded as ":" and therefore fail authentication. In discussion with @DASPRiD, fixing this would be a BC break.

Copy link

@jeremiahsmall jeremiahsmall May 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, maybe we can change this to a BC compatible solution by not using self::CHAR_UNRESERVED and self::CHAR_SUB_DELIMS for user info. Instead, we can/should/must assume that input is already urlencoded if it comes from the uri string, and can/should/must assume it's not if it is set via withUserInfo method.

        $info = urlencode($user);
        if ($password) {
            $info .= ':' . urlencode($password);
        }

Now, if that would be considered bc breaking, we could add a third optional arg to withUserInfo. E.g.

public function withUserInfo($user, $password = null, $urlEncoded = true)
    {
   // snip

But I'd argue that if someone is already urlencoding this input, they're working around the bug, and we shouldn't worry about BC, and/or we just accept it as BC breaking, and increase the version.

Copy link

@jeremiahsmall jeremiahsmall May 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In discussion with @DASPRiD we realized that since the PSR-7 interface doesn't include the third argument, that suggests we shouldn't add it in Diactoros, even though PSR-7 does not specify if the input should be url encoded or not.

Also, @DASPRiD argues that using the core PHP urlencode is overkill because it encodes too many characters, but agrees that always encoding for the user info inputs is correct since it would be a pretty normal case for a password to contain a character combination that looks like an encoded character, but is in fact a string literal.

We agree that it's okay to always assume that a character combination that looks url encoded in the path component should not be re-encoded. That is, the way it functions already is fine for the path. The userInfo case is special.

test/UriTest.php Outdated
$new = $uri->withUserInfo('foo:bar', 'baz:bat');

$this->assertSame('foo%3Abar:baz%3Abat', $new->getUserInfo());
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably either a second assertion, or add %25 to the first input to cover that it's not getting accepted as urlencoded input.

public function testWithUserInfoDoesNotExpectUrlencodedCharactersInUsernameAndPassword()
{
    $uri = new Uri('https://localhost');
    $new = $uri->withUserInfo('%25', '%25');

   $this->assertSame('%2525:%2525', $new->getUserInfo());
}

@weierophinney
Copy link
Member

I've created a new commit that adds a data provider with both user and credential values to test against, as well as the expected value of getUserInfo() following, and pushed that; it covers the suggested test by @jeremiahsmall as well.

The one issue I see with this is that it can lead to issues if you use the value retrieved from getUserInfo() to seed a new instance (e.g., $instance = $instance->withUserInfo($previous->getUserInfo()); you'll essentially get double-encoding (as evidenced by the test @jeremiahsmall suggested, where %25 becomes %2525).

Interestingly, @akrabat ran into this issue this week as well, when using an email address as the user information (e.g., user@example.com).

My suggestion is that any encoding is done when calling getAuthority() (which is used internally by createUriString() to return the authority value, which includes user-info, host, and port); this allows passing the raw values to withUserInfo(), and for getUserInfo() to return the same data it receives.

In reviewing the code, I also see that the parseUri() implementation is buggy, because it does not urldecode() the user and pass segments, which it likely should to prevent issues when consuming these values. (Interestingly, PHP's parse_url() function will accept unencoded values here and still parse the URL correctly!)

To be honest, parseUri() likely should not filter the path, query, and fragment, and instead delegate those operations to createUriString(). However, I think that's a topic for another PR.

@akrabat @DASPRiD and @jeremiahsmall : Do my suggestions here make sense?

@weierophinney weierophinney added this to the 1.4.1 milestone Aug 16, 2017
@jeremiahsmall
Copy link

The new data provider in the test looks good. Not sure what to make of the suggestion about encoding when calling getAuthority() without more time to reflect on that.

@weierophinney
Copy link
Member

I'm going to move this to the 1.5.0 milestone, so we can also address the other issues related to parseUri(); I think consistency in how we handle these is best, and I'd rather the various getter methods return the raw values, and that casting to string filters/normalizes them for that context. This prevents double-encoding issues

The only "con" I can see is that doing so puts the onus on the user to encode if they wish to use those values as part of a URI string). I think we can message that in the docblocks, however.

@weierophinney weierophinney modified the milestones: 1.5.0, 1.4.1 Aug 17, 2017
@jeremiahsmall
Copy link

jeremiahsmall commented Aug 17, 2017

So are we still saying that the arguments in withUserInfo() would be non-encoded? If they would need to be encoded by the user, the variable names should reflect that. Eg:

public function withUserInfo($user, $password = null)
    {
   // snip

vs

public function withUserInfo($urlEncodedUser, $urlEncodedPassword = null)
    {
   // snip

@weierophinney
Copy link
Member

If they're not encoded, why would we indicate they're encoded in the variable name?

@jeremiahsmall
Copy link

Right, well I guess that answers my question... I was making sure that you're not suggesting that these be encoded.

@weierophinney
Copy link
Member

I'm going to move this to the 1.5.0 milestone, so we can also address the other issues related to parseUri(); I think consistency in how we handle these is best, and I'd rather the various getter methods return the raw values, and that casting to string filters/normalizes them for that context. This prevents double-encoding issues

Okay, re-read the specification, and this is not possible. Each of the docblocks for getPath(), getQuery(), and getFragment() in the UriInterface indicate that they should return the percent encoded values, and prevent double-encoding. As such, the current approach suggested by @DASPRiD in this pull request is correct: percent-encode the value as it comes in, and any call to getUserInfo() and/or getAuthority() should return that encoded value. This will keep the workflow consistent with the other URI segments.

What this means, though, is that we need to ensure that the filterUserInfoPart() does not double-encode. We have tests in place for the other URI segments that use encoding, so I can use those as a basis for understanding how to both test and implement that functionality.

DASPRiD and others added 2 commits August 22, 2017 14:15
Allows testing common combinations of characters that may occur in both
user and credential strings to ensure they are encoded properly.
@weierophinney weierophinney changed the base branch from master to develop August 22, 2017 19:16
Adds `%` to the initial character exclusion set in order to allow the
latter `%(?!...)` pattern to match correctly, and thus prevent
double-encoding of values.
@weierophinney
Copy link
Member

@jeremiahsmall I've isolated the double-encoding issue, and have now pushed a test that verifies that calling filterUserInfoPart() no longer double-encodes. With that change, and assuming that the changes pass continuous integration, I can merge this finally.

@weierophinney weierophinney merged commit a0112a9 into zendframework:develop Aug 22, 2017
weierophinney added a commit that referenced this pull request Aug 22, 2017
weierophinney added a commit that referenced this pull request Aug 22, 2017
@weierophinney
Copy link
Member

Thanks, @DASPRiD and @jeremiahsmall

@jeremiahsmall
Copy link

+1

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants