The playground has a proxy to help load HTTP contexts from the HTTPS playground:
https://github.com/json-ld/json-ld.org/blob/main/playground/proxy.php
There are some issues with this:
- The proxy could be abused. It should have more checks so that it is only useful for the playground.
- It might have security issues (see #754, "Prevent potentially dangerous behaviour within proxy script").
- It doesn't work well!
  - It currently follows redirects via curl, but it only returns the content, not all of the headers. This works in some cases but not in others.
  - A current failure case is http://schema.org, which redirects to HTTPS and then returns HTML with a Link header. That header and the others are not returned. Even if they were, the link target is a URL relative to schema.org, and the current code and XHR document loader, which rewrite the target URL to a proxy URL, would resolve it relative to the playground instead. There are multiple problems here at different levels.
  - The current workaround for the schema.org issue above is to rewrite that particular HTTP URL to HTTPS, but other sites with similar issues would still fail.
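To make the relative Link target problem concrete, here is a minimal TypeScript sketch. The proxy endpoint `proxy.php?url=` and all helper names are hypothetical, and the Link header is only assumed to have the `rel="alternate"; type="application/ld+json"` form described above. Resolving against the proxy URL points at the playground host; resolving against the upstream URL (after redirects) gives the intended target. Note that the fixed version needs the final upstream URL, which the current proxy does not report.

```typescript
// Hypothetical proxy endpoint; the real playground's URL scheme may differ.
const PROXY = "https://json-ld.org/playground/proxy.php?url=";

// Wrap a target URL so the browser fetches it through the proxy.
function toProxyUrl(target: string): string {
  return PROXY + encodeURIComponent(target);
}

// Return the first Link target with rel="alternate" and
// type="application/ld+json". Simplified parser: assumes no commas
// inside URIs or parameter values.
function findJsonLdLink(linkHeader: string): string | null {
  for (const part of linkHeader.split(",")) {
    const m = /^\s*<([^>]*)>(.*)$/.exec(part);
    if (!m) continue;
    const params = m[2];
    if (/rel="?alternate"?/.test(params) && /type="?application\/ld\+json"?/.test(params)) {
      return m[1];
    }
  }
  return null;
}

// Buggy: resolves the target against the URL the browser actually
// requested, i.e. the proxy URL on the playground host.
function resolveAgainstProxy(proxyUrl: string, linkHeader: string): string | null {
  const target = findJsonLdLink(linkHeader);
  return target === null ? null : new URL(target, proxyUrl).href;
}

// Fixed: resolves against the final upstream URL, then re-wraps the
// absolute result in the proxy.
function resolveAgainstOrigin(finalUrl: string, linkHeader: string): string | null {
  const target = findJsonLdLink(linkHeader);
  return target === null ? null : toProxyUrl(new URL(target, finalUrl).href);
}
```

With a Link header of `</docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"`, the buggy version yields `https://json-ld.org/docs/jsonldcontext.jsonld`, while the fixed version yields a proxy URL wrapping `https://schema.org/docs/jsonldcontext.jsonld`.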
Ideally the proxy would not be needed, but if the playground is served over HTTPS, a workaround is needed to load HTTP resources, since browsers block such mixed-content requests.
I think the longer-term fixes needed are:
- Simplify the proxy to make only a single request and return what it gets. Do not follow redirects; let the caller handle them.
- Restrict the proxy to the content types the playground needs, at least by checking headers and possibly by inspecting content too.
- Add other proxy features to make it only useful for the playground.
- Update the XHR document loader with some of the Node.js loader's features so it can handle redirects if needed. Also consider either a native fetch document loader, or ensuring the Node.js loader works in a browser, since it is now indirectly based on the Fetch API.
- Document loaders may need to be made proxy-aware so that link targets resolve correctly.
- Generalize the special rewrite rule for schema.org in case it's needed for other sites.
- Ensure cache headers get passed through and are used properly.
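A rough sketch of how the caller side of these fixes could fit together, assuming the proxy is simplified to make one request and pass the upstream status, headers, and body through unchanged. `ProxyFetch`, `loadViaProxy`, and the content-type allowlist are hypothetical names, not the playground's or jsonld.js's API. The transport is injected so it could be backed by XHR or fetch. Cache handling is omitted.

```typescript
// What the simplified (non-redirecting) proxy is assumed to return:
// upstream status, headers, and body, passed through unchanged.
interface ProxyResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical transport: fetches `targetUrl` through the proxy.
type ProxyFetch = (targetUrl: string) => Promise<ProxyResponse>;

interface LoadedDocument {
  documentUrl: string; // final upstream URL, the base for relative IRIs
  document: unknown;
}

// Assumed allowlist of content types the playground accepts.
const ALLOWED_TYPES = new Set(["application/ld+json", "application/json"]);
const MAX_HOPS = 10;

function lowerKeys(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [k, v] of Object.entries(headers)) out[k.toLowerCase()] = v;
  return out;
}

// Target of a rel="alternate" type="application/ld+json" link, if any.
// Simplified: assumes no commas inside URIs or parameter values.
function findJsonLdAlternate(link: string): string | null {
  for (const part of link.split(",")) {
    const m = /^\s*<([^>]*)>(.*)$/.exec(part);
    if (m && /rel="?alternate"?/.test(m[2]) && /type="?application\/ld\+json"?/.test(m[2])) {
      return m[1];
    }
  }
  return null;
}

async function loadViaProxy(url: string, proxyFetch: ProxyFetch): Promise<LoadedDocument> {
  let current = url;
  for (let hop = 0; hop <= MAX_HOPS; hop++) {
    const res = await proxyFetch(current);
    const headers = lowerKeys(res.headers);

    // Redirect: resolve Location against the upstream URL, not the proxy URL.
    if (res.status >= 300 && res.status < 400 && headers["location"]) {
      current = new URL(headers["location"], current).href;
      continue;
    }
    if (res.status !== 200) {
      throw new Error(`HTTP ${res.status} loading ${current}`);
    }

    const type = (headers["content-type"] ?? "").split(";")[0].trim().toLowerCase();
    if (ALLOWED_TYPES.has(type)) {
      return { documentUrl: current, document: JSON.parse(res.body) };
    }

    // Other content (e.g. HTML): follow an alternate JSON-LD link,
    // again resolved against the upstream URL.
    const alt = headers["link"] ? findJsonLdAlternate(headers["link"]) : null;
    if (alt === null) {
      throw new Error(`Unsupported content type "${type}" at ${current}`);
    }
    current = new URL(alt, current).href;
  }
  throw new Error(`Too many redirects or alternate links from ${url}`);
}
```

Because every relative Location or Link target is resolved against `current` (the upstream URL) rather than the proxy URL, a sequence like http://schema.org → https://schema.org → alternate JSON-LD link would load without a site-specific HTTP-to-HTTPS rewrite rule.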
See also: #798