Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow user/preferences/[KEY] to accept arbitrary keys #3787

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

tsmock
Copy link

@tsmock tsmock commented Nov 7, 2022

This was a two-part problem:

  1. Keys with a . would have the latter part of the key interpreted as
    a format. Example: gps.trace would have the key set to
    gps and a format set to trace.
  2. The :preference_key parameter would not be URL decoded.

This PR has two commits.
The first commit adds a method (fix_request_key) that takes the request and merges the :preference_key and :format parameter keys, and then URL decodes them. This method does not work with keys like gps.trace.visibility but does work with keys like https://some.ones/api.

The second commit uses routes to [do the URL decoding, and it does not have to deal with the key being split at . characters into a key and format].

My preference would therefore be to use the routes, but on just in case the project doesn't want to have explicit route entries for user/preferences/:preference_key, I have kept the commit adding fix_request_key instead of squashing it.

EDIT: The commits have been squashed.

@tomhughes
Copy link
Member

I wasn't actually expecting us to change the behaviour! I thought you meant you were going to fix the client to not use such silly names...

@tomhughes
Copy link
Member

Also what is the point of the first commit if the second one reverts it? If you want to offer two alternatives then open two PRs don't lump them together in one PR because then if we merge it we'll add loads of useless noise to the history.

@tsmock
Copy link
Author

tsmock commented Nov 7, 2022

Also what is the point of the first commit if the second one reverts it? If you want to offer two alternatives then open two PRs don't lump them together in one PR because then if we merge it we'll add loads of useless noise to the history.

I was planning on squashing it before it got merged. I'll go ahead and do that now.

I wasn't actually expecting us to change the behaviour! I thought you meant you were going to fix the client to not use such silly names...
The whole point was to make it easier to map an API to the appropriate API key.

While MapRoulette has a primary instance (https://maproulette.org/api/v2), it is possible to have other instances (for example, Company Foo could have a private instance).

It is possible for each service have a static api key (for example, the main MR server could use maproulette_apikey_v2), but I don't think that is scalable. Hence why I was looking at apikey:<API URL>. Having the API URL as part of the key makes it trivial to map API keys to their respective services, minimizing special cases (if (api == "https://maproulette.org/api/v2"); then key = "maproulette_apikey_v2"; else if (api == "https://random.foo"); then key = "foo_random_key"; else if (...)).

With all that said, if you are against having arbitrary keys in user preferences, would you accept a PR that validated the PUT /user/preferences endpoint to not have keys that would be invalid for the /user/preferences/[KEY] endpoints?

EDIT: I'll note that this fixes the case of /user/preferences/gps.trace.visibility, where gps.trace.visibility is used by the OSM website itself. Should that be changed to something else, like gps_trace_visibility?

This was a two-part problem:
1. Keys with a `.` would have the latter part of the key interpreted as
   a format. Example: `gps.trace` would have the key set to
   `gps` and a format set to `trace`.
2. The :preference_key parameter would not be URL decoded.

Signed-off-by: Taylor Smock <tsmock@meta.com>
@gravitystorm
Copy link
Collaborator

I think the underlying problem here is that we are using the keys in our urls, without any validation to ensure that they are url safe. A similar example would be our validation of user display_names, which are validated as url safe, since we use those in urls.

   validates :user, :associated => true
-  validates :k, :v, :length => 1..255, :characters => true
+  validates :k, :length => 1..255, :characters => { :url_safe => true }
+  validates :v, :length => 1..255, :characters => true
 end

It's a bit surprising that this hasn't come up until now! And it's quite funny to me that the problem is highlighted by the keys that we use internally for gpx visibility and diary default language, both of which use "." characters 😄

Unfortunately I'm not in favour of the proposed solution. We've been trying to move away from plain-text API responses, mostly for consistency reasons between API endpoints but also for developer ergonomics (you don't want to have some responses handled by an xml or json library, but have to handle some responses that aren't). But we have also settled on the convention that you can choose between response formats using extensions e.g. GET /api/0.6/node/1.json. If we merged this PR, we would not be able to do GET /api/.../user/preferences/key.json in future.

I'm open to further discussion and any suggestions!

@tsmock
Copy link
Author

tsmock commented Nov 9, 2022

Right now, GET /user/preferences/key will (according to the wiki documentation) "[return] a string with that preference's value." Which makes sense.

So a k="foo" v="{\"json\": true}" would return {"json": true}, regardless of what format is requested (GET /user/preferences/foo.xml). The only things I can think of that could possibly change by adding a format:

  • Wrapping the value, so {"key": "foo", value: "{\"json\": true}" or <preference k="foo" v="{\"json\": true}"/>, which I don't think adds any value to users. They already know the key, and regardless of the response, they still have to do the appropriate action with the value. Just with more steps, and possibly assuming that there was some kind of server-side validation.
  • Validating the value prior to return, although this is probably unwise since the value could be of any arbitrary format (xml, json, base64 encoded binary data), which means there would have to be format sniffers of some type.

The point being that the preference value can be anything, and the application developer is going to have to deal with it (unless it is dealt with inside the website code), so I don't think this specific endpoint (/user/preferences/key) can realistically return anything besides the plain text value.

@gravitystorm
Copy link
Collaborator

So a k="foo" v="{\"json\": true}" would return {"json": true}, regardless of what format is requested

I think you've misunderstood my point, since I wasn't talking about encoding json into the values. In fact I wasn't discussing the values of the user preferences at all.

The point is that for our other API calls, we have a convention of returning responses in either XML or JSON, and not plain text. Furthermore, for calls that return multiple objects we return a list of objects, and for calls for individual objects we return the individual object, not just the "contents" of the object. Compare for example with nodes:

GET /api/0.6/nodes?nodes=123,456

<osm version="0.6" generator="...">
  <node id="123" visible="true" version="12" changeset="..."/>
  <node id="435" visible="true" version="2" changeset="..."/>
</osm>
GET /api/0.6/node/123

<osm version="0.6" generator="...">
  <node id="123" visible="true" version="12" changeset="..."/>
</osm>

Note that for an individual node request we return the whole node object, even though you could argue that the id is redundant and there's no need to return the object, you could just return the attributes. Now we compare with preferences:

GET /api/0.6/user/preferences

<osm version="0.6" generator="...">
  <preferences>
    <preference k="gps.trace.visibility" v="trackable"/>
    <preference k="MerkaartorBookmark000" v="London;..."/>
  </preferences>
</osm>

It seems reasonable (and good for consistency) that asking for a single preference should return a <preference> object, not just one of the attributes of the preference. The same thought process equally applies to JSON format requests and responses (again, this is nothing to do with the contents of the value attribute).

Now at the moment we don't provide a structured responses for individual preference requests, but the reason that I don't like this proposal is that it will prevent us from doing so in future.

In particular, we need to be able to return structured error messages, which is a common complaint about parts of our API. We can't return error messages in a structured format without giving the developers the ability to make format-specific requests! I.e. if you want xml structured error messages then you need to be able to make XML formatted requests (e.g. request urls ending in .xml), and so if requests have a format then the non-error responses should be formatted to match. Again, this highlights why we are moving away from plain-text responses and the importance of allowing format-specific request urls.

@tsmock
Copy link
Author

tsmock commented Nov 10, 2022

It seems reasonable (and good for consistency) that asking for a single preference should return a object, not just one of the attributes of the preference. The same thought process equally applies to JSON format requests and responses (again, this is nothing to do with the contents of the value attribute).

In this case, it sounds like the /user/preferences/key endpoint should be deprecated, since I don't think there is a way to change the current semantics without breaking current users of that endpoint, at least if key.fmt is the desired path forward.

If you want to return a structured message for this specific endpoint, maybe the Accept request header would be the best way to structure the message, and this would allow users to opt-in to the change.

Issues I can see with having .fmt in the URL would be where someone decides to add (maliciously perhaps), something that is a duplicate of a pre-existing key, but with the format. What behavior should be exhibited?

Example:

<osm version="0.6" generator="...">
  <preferences>
    <preference k="gps.trace.visibility" v="private"/>
    <preference k="gps.trace.visibility.json" v="identifiable"/>
    <preference k="gps.trace.visibility.xml" v="identifiable"/>
    <preference k="key.json" v="{\"some\":\"json\"}"/>
    <preference k="key" v="{\"some\":\"unexpected json\"}"/>
  </preferences>
</osm>

Anyway, I'm working on adding support for response formats on that endpoint. Using Accept headers, since I don't think there are any good ways to put the format in the URL without leading to unexpected behavior.

@mmd-osm
Copy link
Collaborator

mmd-osm commented Nov 10, 2022

We could instead add an additional optional filter to GET /api/0.6/user/preferences, like: GET /api/0.6/user/preferences?key=my.key

This should also play nicely with different output formats, e.g. GET /api/0.6/user/preferences.json?key=my.key

@tsmock tsmock marked this pull request as draft November 10, 2022 18:39
…ed key

This does break compatibility if an application requested json/xml in
the Accept header and did not actually expect json/xml back.

Signed-off-by: Taylor Smock <tsmock@meta.com>
@tsmock tsmock marked this pull request as ready for review November 10, 2022 19:21
@tsmock
Copy link
Author

tsmock commented Nov 10, 2022

OK. The GET /user/preferences/key endpoint can now return XML/JSON responses, if and only if the consumer indicates that they will take the XML/JSON response in the Accept request header, otherwise it will return a plain text response.

There might be a better way to detect/change the request format, but the ways I tried didn't work out (:defaults => { :format => "txt" } completely overrode the Accept header).

So I think I've addressed @gravitystorm's problems by getting the endpoint to return XML/JSON responses, unless the .fmt support is a hard requirement.

@gravitystorm
Copy link
Collaborator

unless the .fmt support is a hard requirement.

Yeah, I pretty much think it's a hard requirement - or at least (as I said above) not doing anything now that makes it impossible to implement in future. I think it would be bad to have a convention on our API (using format extensions like .xml and .json that apply to some, but not all, of the endpoints.

Two other things spring to mind:

  • Since the handling of such keys is currently broken, then we know that nobody is using it. This gives us flexibility to fix it properly. For example, if we say you can no longer access keys with . in them in the individual preference API call, that's fine since it doesn't currently work and we won't break any consumers by changing that today.
  • Even if we decide to support . in key names, we currently allow # in key names and they absolutely can't be used in a url. Requests like GET /api/0.6/user/preferences/something#here won't work, since here is not sent to the server, even using curl.

So I iterate back to the situation where we either need to implement the url_safe character set for keys, or we find some way of encoding the keys such that they can work in urls. This could either be as part of the path (as discussed here) or as a query parameter (as suggested by @mmd-osm , which unfortunately also doesn't support #).

@gravitystorm gravitystorm added the api Related to the XML or JSON APIs label Nov 16, 2022
@mmd-osm
Copy link
Collaborator

mmd-osm commented Nov 16, 2022

or as a query parameter (as suggested by [me], which unfortunately also doesn't support #).

Encoding the hash character as %23 should do the job (e.g. encodeURIComponent('#') ). I would hope Rails handles the resulting string properly.

Signed-off-by: Taylor Smock <tsmock@meta.com>
@tsmock
Copy link
Author

tsmock commented Nov 16, 2022

Supporting # is easy -- the consumer just has to URL encode the key and then we have to URL decode. This is (effectively) what my patch currently does (it supports URL decoding). The sticking point has been multiple . in the URL (fixed by not having a (.:format) in the route, and I wasn't able to reproduce outside of our code base).

I do have a hack to allow for .fmt on this API call. It pretty much just checks the trailing characters for .json or .xml and then sets the return format. I don't like it, but I'll go ahead and push it.

I'm looking into better options, but something, somewhere, is causing issues when I do that. Which I'm trying to fix by stepping through the code.

@gravitystorm
Copy link
Collaborator

Can the dots in the key be url encoded? E.g. /user/preference/foo%2Ebar.xml for the key foo.bar?

@tsmock
Copy link
Author

tsmock commented Nov 16, 2022

Can the dots in the key be url encoded? E.g. /user/preference/foo%2Ebar.xml for the key foo.bar?

Technically, yes. %2E is .. See RFC3986 section 2.1 for details (all ASCII characters can be URL encoded, syntax is %<char hex>)

Unfortunately, most common URL encoders that people use will not encode . by default.

…encoded

Clients should URL encode '.' manually when their chosen encoder does
not. The hex code for '.' is 0x2e, so % encoding would make it %2e (or
%2E).

Signed-off-by: Taylor Smock <tsmock@meta.com>
@gravitystorm
Copy link
Collaborator

Technically, yes. %2E is .. See RFC3986 section 2.1 for details (all ASCII characters can be URL encoded, syntax is %<char hex>)

I was just reading section 2.3 "Unreserved Characters" and this concerns me:

Characters that are allowed in a URI but do not have a reserved purpose are called unreserved. These include uppercase and lowercase letters, decimal digits, hyphen, period, underscore, and tilde.

URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent: they identify the same resource.

It then goes on to say:

[they] should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers.

So that sounds like gps.trace.visibility and gps%2Etrace%2Evisibility will be treated as the same thing, and so we can't tell use the urlencode approach for these unreserved characters.

@gravitystorm
Copy link
Collaborator

I'm wondering how many preferences, other than the ones in this codebase, use problematic characters from the set INVALID_URL_CHARS = "/;.,?%#".freeze, and what it would take to migrate in order to stop using them. Then all this stuff would be straightforward.

@tsmock
Copy link
Author

tsmock commented Nov 17, 2022

Honestly, I need to figure out why our code doesn't like multiple .s.

I wrote a quick fake site to check and see if it was an upstream bug, and it does not appear to be.

So I'm going to have to figure out why our code doesn't like multiple . in the URL.

@pnorman
Copy link
Contributor

pnorman commented Nov 17, 2022

So that sounds like gps.trace.visibility and gps%2Etrace%2Evisibility will be treated as the same thing, and so we can't tell use the urlencode approach for these unreserved characters.

This is my understanding too.

@tsmock
Copy link
Author

tsmock commented Nov 17, 2022

So I think I found a fix for the multiple . problem in the preference key.

I still don't know why the resources :user_preferences [...] would cause issues with ..

@gravitystorm
Copy link
Collaborator

I don't understand what you are trying to achieve with the "multiple . problem" - can you explain it further? Dropping the resources and moving to hand-crafted routes is a bit of a code smell, so I want to make sure it's worthwhile.

@tsmock tsmock marked this pull request as draft November 18, 2022 14:04
@tsmock
Copy link
Author

tsmock commented Nov 18, 2022

The problem is that user/preferences/gps.trace.visibility will lead to a routing error (it cannot find the route) if resources is used. I'm intending to do some debugging to figure out why that is the case, and hopefully file a PR with the appropriate repo (probably https://github.com/rails/rails, if I had to guess).

This is (partially) fallout from supporting .fmt, I suspect.

It seems to me that this is a bug with the upstream resources behavior. EDIT: This is intended behavior, apparently. It still seems odd to me that a something.with.dots.format would kill routing.

Anyway, I'll keep fiddling with the code to get something that (a) works and (b) doesn't repeat itself several times. EDIT: Done.

Signed-off-by: Taylor Smock <tsmock@meta.com>
@tsmock tsmock marked this pull request as ready for review November 18, 2022 14:33
@gravitystorm
Copy link
Collaborator

The problem is that user/preferences/gps.trace.visibility will lead to a routing error (it cannot find the route) if resources is used.

But I still don't understand what you are trying to achieve! I thought we'd clarified that we can't use any unescaped . characters from the keys in the urls, since we need . for the format separator. So whether rails resource matching can parse multiple . in the url isn't going to affect us - the only . we will see in the url will be the format separator. Not any . from a key.

If you are trying to support both (optional) format extensions and also dotted-keys in the url, then I believe that's a dead-end. You can't distinguish these two different situations:

  • /user/preferences/foo.json - a request for the foo key, response should be json format
  • /user/preferences/foo.json - a request for the foo.json key

Similarly if you consider keys with multiple . characters

  • /user/preferences/foo.bar.xml - a request for the gps.visibility.xml key
  • /user/preferences/foo.bar.xml - a request for the gps.visibility key in XML format

I don't want to see a special-case for the User Preferences API along the lines of "please note that if you have a . in your key and also the characters after the final dot happen to match one of the formats that we support (currently xml and json) then your key cannot be accessed through the direct users preferences API" or something like this. Reading between the lines, I think that's what you might be trying to achieve, but it's not clear to me.

I don't want you to spend more of your time and effort working on debugging something that turns out to be a dead end or not useful, so I recommend that we clarify what you are trying to do before you jump in with more coding.

@tsmock
Copy link
Author

tsmock commented Nov 29, 2022

But I still don't understand what you are trying to achieve! I thought we'd clarified that we can't use any unescaped . characters from the keys in the urls, since we need . for the format separator.

Sorry, that might be a misunderstanding due to issuecomment-1318982892.

In any case, the current code should handle the case where %2e gets decoded to . prior to the extension being handled.

If you are trying to support both (optional) format extensions and also dotted-keys in the url, then I believe that's a dead-end. You can't distinguish these two different situations:

I agree. The problem is that it is a "hard" requirement to have the ability to support .fmt on all endpoints. Which pretty much means I should make certain that .fmt works now. It is kind of why I initially didn't want to support .fmt at all.

Using Accept request headers is fine. Which leads to the question, why didn't we just use the Accept headers in the first place?

I don't want to see a special-case for the User Preferences API along the lines of "please note that if you have a . in your key and also the characters after the final dot happen to match one of the formats that we support (currently xml and json) then your key cannot be accessed through the direct users preferences API" or something like this. Reading between the lines, I think that's what you might be trying to achieve, but it's not clear to me.

I did not want to do any special casing at all. My original intent was to support arbitrary keys for the endpoint. Unfortunately, if we do support .fmt, we will of necessity need to deal with the edge case of adding another format (.pbf) or forcing clients to URL encode . themselves.

Current options:

  • Require that clients URL encode . to %2e and hope some future update of rails doesn't bork it by url decoding prior to doing the routing
    • I don't like this since it isn't exactly documented behavior. It works right now.
  • Don't support .fmt on this endpoint and instead require clients use Accept
  • Don't support .fmt on this endpoint and instead allow clients to specify an Accept header for error codes
    • Clients can check the return type and go from there (if [ media_type == "text/plain" ]; do ok_stuff; else handle_error; fi)
  • Special case .json/.xml so that .fmt works and allow Accept headers
    • This is where we have to start worrying about key, key.json, and key.xml and future formats. Example: key.txt. Unfortunately, we cannot use a regex like (.+?)(?=(\..*|)$) This would get all extensions, but would also get stuff like key.not_an_ext. We could try limiting the length of the extension (.+?)(?=(\..{,4}|)$), but we still have potential edge cases.

I don't want you to spend more of your time and effort working on debugging something that turns out to be a dead end or not useful, so I recommend that we clarify what you are trying to do before you jump in with more coding.

Thank you for the consideration. I've been busy the past week or so fixing JOSM tickets prior to (what I had hoped to be) a release week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Related to the XML or JSON APIs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants