-
Thank you for writing this up! This is also a general issue concerning data sanitization across Openverse. My thoughts on what is best:
Providers
Providers already have URL-safe and readable provider identifiers. These are fine to use as paths.
Tags
Tags are a property of the Openverse API: identifiers used for filtering and to express relationships between records. I am of the opinion that these should be sanitized and normalized to be URL-safe, so that they do not require any encoding either.
Creators
Creator names are text data belonging to our records. As such, they need to be preserved exactly. These are not identifiers. The closest thing we store as a unique identifier for the creator is the creator URL. The creator name is not appropriate for a URL path. Options we could take are:
-
Edit: I need to sit with this a bit more before replying. My original reply was a little rash and didn't actually choose from the proposed solutions. It was more of an aspirational description of how things should be rather than choosing a path forward.
-
After a comment on why having slashes in the frontend path parameters is not possible, and whether it is a Nuxt issue, I went back to the Nuxt path parsing implementation, and now I think I'm with @sarayourfriend on this. We should probably plan to refactor the paths to use URL query parameters for Frontend
-
The additional search views IP proposed to use the path parameters for `tag`, `creator` and `source` values: `/images/source/flickr/`, `/images/source/flickr/creator/someone`, `/images/tag/tag name`. Path parameters make the URLs more readable, easier to share, and easier to cache, and they make it easier to perform the cache invalidation required by #1969.
However, this proposal did not take into account the fact that creator names and tag names can contain special characters that are significant for URL path parsing: `/` and `?`. Encoding these characters using `encodeURIComponent` converts them to `%2F` and `%3F`; however, routers and proxy servers often treat the URL-encoded and non-URL-encoded versions of these symbols interchangeably. Browsers (specifically Firefox) also automatically decode these symbols even when they are URL-encoded. So, when you enter https://staging.openverse.org/image/source/flickr/creator/me%2Fyou in the address bar, the browser requests https://staging.openverse.org/image/source/flickr/creator/me/you.
Alternative solutions
1. Encoding the values twice
To allow special characters in creator names and tags, use a twice-encoded value for the `/` and `?` special characters, both for the frontend and the API. For instance, `creator/name?me` is encoded as `creator%252Fname%253Fme`. This allows us to keep using the path parameters for all of their benefits. Besides, this would not require considerable changes to the API/frontend code, or a change of the API version.
The drawback of this solution is that it would introduce an unexpected requirement for the users to encode the path parameters twice. It would also prevent the pattern of typing the URL in clear text and expecting the browser to correctly handle this URL.
In general, this solution is "fighting the URL path/browser standards".
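To make the difference between single and double encoding concrete, here is a minimal sketch. The helper names `encodePathSegmentTwice` and `decodePathSegmentTwice` are hypothetical and do not come from the Openverse codebase or from #3793; only the built-in `encodeURIComponent` and `decodeURIComponent` are real.

```typescript
// Hypothetical helpers for illustration; not taken from the Openverse codebase.

/** Encode a raw value twice so `/` and `?` survive one round of automatic decoding. */
function encodePathSegmentTwice(value: string): string {
  return encodeURIComponent(encodeURIComponent(value));
}

/** Reverse the double encoding when the segment arrives untouched. */
function decodePathSegmentTwice(segment: string): string {
  return decodeURIComponent(decodeURIComponent(segment));
}

const raw = "creator/name?me";

// Single encoding: `/` becomes %2F and `?` becomes %3F.
const once = encodeURIComponent(raw);

// Double encoding: the `%` signs themselves are encoded as %25.
const twice = encodePathSegmentTwice(raw);

// A browser or proxy that decodes the path once turns `twice` back into `once`,
// which still contains no literal `/` or `?` for a router to split on.
const afterIntermediaryDecode = decodeURIComponent(twice);

console.log(once, twice, afterIntermediaryDecode);
```

Note that the receiving side cannot always tell whether an intermediary has already decoded the segment once, which is part of why this approach can feel fragile.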
2. Convert the path parameters to query parameters
Query parameters are expected to be URL-encoded, so we are using the standards.
One option is to have the following paths:
Frontend
https://openverse.org/image/collection?tag=tag
https://openverse.org/image/collection?source=flickr
https://openverse.org/image/collection?source=flickr&creator=creator%2Fname
API
https://api.openverse.engineering/v2/image/collection?tag=tag
https://api.openverse.engineering/v2/image/collection?source=flickr
https://api.openverse.engineering/v2/image/collection?source=flickr&creator=creator%2Fname
We would have to handle the edge case when the user sets all of the parameters: https://api.openverse.engineering/v2/image/collection?source=flickr&creator=creator%2Fname&tag=tag%20name . One option would be to drop the last one.
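As a rough sketch of how the query parameter variant could look on the client side, the snippet below builds and parses collection URLs. The function names, the `CollectionParams` type and the parameter ordering are assumptions made for illustration. The edge-case handling is one possible reading of "drop the last one" (ignore whichever of the three parameters appears last in the query string) and assumes each parameter appears at most once.

```typescript
// Hypothetical names and types, for illustration only.
const COLLECTION_KEYS = ["source", "creator", "tag"] as const;
type CollectionKey = (typeof COLLECTION_KEYS)[number];
type CollectionParams = Partial<Record<CollectionKey, string>>;

/** Build a collection URL, percent-encoding every value. */
function buildCollectionUrl(base: string, params: CollectionParams): string {
  const query = COLLECTION_KEYS
    .filter((key) => params[key] !== undefined)
    .map((key) => `${key}=${encodeURIComponent(params[key] as string)}`)
    .join("&");
  return query ? `${base}?${query}` : base;
}

/** Parse a collection URL; values come back already decoded. */
function parseCollectionUrl(href: string): CollectionParams {
  const found: Array<[CollectionKey, string]> = [];
  new URL(href).searchParams.forEach((value, key) => {
    if ((COLLECTION_KEYS as readonly string[]).indexOf(key) !== -1) {
      found.push([key as CollectionKey, value]);
    }
  });
  // Assumed edge-case rule: if all three parameters are set, drop the last one.
  const kept = found.length === COLLECTION_KEYS.length ? found.slice(0, -1) : found;
  const params: CollectionParams = {};
  for (const [key, value] of kept) {
    params[key] = value;
  }
  return params;
}

const built = buildCollectionUrl("https://openverse.org/image/collection", {
  source: "flickr",
  creator: "creator/name",
});
const parsed = parseCollectionUrl(
  "https://api.openverse.engineering/v2/image/collection?source=flickr&creator=creator%2Fname&tag=tag%20name"
);
console.log(built, parsed);
```

Because the values live in the query string, a single round of standard encoding is enough, and `creator%2Fname` decodes back to `creator/name` without affecting route matching.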
The drawbacks are:
My preference would be the first option. The main reasons are that a working solution is available in #3793, that we can document the requirement to encode the path parameters in a special way (and add this note to the error message on the endpoint), and that we won't need to update the API version.
@sarayourfriend's preference is the second option, with the reasons given in the PR comment.
@WordPress/openverse-api, @WordPress/openverse-frontend, what do you think?