Skip to content
This repository has been archived by the owner on Jul 18, 2020. It is now read-only.

Review TargetExtractor class #10

Closed
acoburn opened this issue Sep 23, 2019 · 6 comments
Closed

Review TargetExtractor class #10

acoburn opened this issue Sep 23, 2019 · 6 comments

Comments

@acoburn
Copy link

acoburn commented Sep 23, 2019

Be aware that this code is pretty brittle and easy to circumvent.

One can URLencode the slash character (/ => %2F) which can be used to get around the conditional check, potentially, allowing one to do some filesystem navigation that should be strictly disallowed. Instead of decodeURI, I think you mean decodeURIComponent.

Also, here you are assuming that an ACL resource involves a particular path-based construct. That necessarily excludes any query parameter-based approach. (I don't know if you want to leave that option open?)

It seems that non-alphanumeric characters are disallowed from domain names, including underscores? Is that a necessary restriction? Also, the VALID_HOST regex is pretty simplistic for a concrete class like this. (Just note, that I'm almost always 👎 on using regular expression in server code, so take that with a grain of salt)

@RubenVerborgh
Copy link
Contributor

RubenVerborgh commented Sep 23, 2019

Be aware that this code is pretty brittle and easy to circumvent.

Yes, needs TODOs.

I think you mean decodeURIComponent.

Indeed, that's a bug. Need tests for that. (That's the good thing of having this as a separate class.)

you are assuming that an ACL resource involves a particular path-based construct.

Unfortunately, yes; see solid/specification#31

That necessarily excludes any query parameter-based approach.

Such as? (I'm not entirely following here.)

It seems that non-alphanumeric characters are disallowed from domain names, including underscores? Is that a necessary restriction?

Erring on the safe side for now. Meant to be extended and tested accordingly. Needs a note.

I'm almost always 👎 on using regular expression in server code

For which reasons? DDOS attacks? Good to have a length validation before a regex.

@acoburn
Copy link
Author

acoburn commented Sep 23, 2019

That necessarily excludes any query parameter-based approach.

Such as? (I'm not entirely following here.)

Trellis uses a query-based URL for ACL resources. I.e. given an LDP resource at https://example.com/resource, the acl header would be Link: <https://example.com/resource?ext=acl>; rel="acl".

I'm almost always 👎 on using regular expression in server code

For which reasons? DDOS attacks? Good to have a length validation before a regex.

There can be subtle bugs and unexpected assumptions hidden in regular expressions, especially if they are complex. (I have seen a lot of problems in the wild, here). I will add that I use regular expressions all the time in scripts that I run locally. It is just that, for server code, I try to be more strict.

@RubenVerborgh
Copy link
Contributor

Trellis uses a query-based URL for ACL resources.

I'm personally definitely open to that; I actually dislike the identification based on URL pattern. We need to discuss that at solid/specification#31

for server code, I try to be more strict.

The alternative being looping over the string? Not sure if less error-prone; both implementations need lots of test cases anyway. Curious for examples in the wild if you have them (low priority).

@acoburn
Copy link
Author

acoburn commented Sep 23, 2019

In this case, a regex is probably fine. I've seen regex-based host validators that aim for RFC-level compliance, and they are veritable monsters. My personal "don't use regular expression rule" is more of a general rule of thumb: one that has plenty of exceptions in practice. I just try not to reach for a regex out of habit or convenience.

@michielbdejong
Copy link
Contributor

OK so can you summarize what works need to be done here?

@acoburn
Copy link
Author

acoburn commented Sep 25, 2019

  1. Change decodeURL => decodeURLComponent (this is a bug)
  2. For the path-based construct, I was just noting that the interface enshrines a particular pattern. That isn't necessarily a problem, but that decision should at least be noted in the documentation.
  3. The VALID_HOST regex is fine but it's a bit simple. You might want to add a TODO noting that it could be improved.

michielbdejong added a commit that referenced this issue Sep 25, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants