Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementations allow all values in type getter #43

Open
Manishearth opened this issue Jul 13, 2016 · 17 comments
Open

Implementations allow all values in type getter #43

Manishearth opened this issue Jul 13, 2016 · 17 comments

Comments

@Manishearth
Copy link
Member

The following code:

a = new Blob([1,2,3], {type: "text/(plain"});
console.log(a.type) // prints text/(plain

works in Chrome and Firefox. It also works if you replace ( with \x02 or something

The spec, however, asks user agents to restrict values in the range U+0020 to U+007E in the constructor, and in the getter it only allows a nonempty value to be returned if it is a parseable MIME type.

Perhaps we should relax these restrictions in the spec?

@inexorabletash
Copy link
Member

inexorabletash commented Jul 13, 2016

See also #13

FWIW, Chrome throws on non-ASCII in the ctor, e.g.

new Blob([], {type: '\x80'})

I've had a local patch to align with the spec (don't throw, just replace type with empty string if it contains anything outsize U+0020 ... U+007E, and have getter return empty on nonparsable MIME type) but I haven't landed because I agree the spec seems weird (and no-one else implements it as written)

@sicking
Copy link

sicking commented Jul 13, 2016

I think the two behaviors that would make sense to me are:

  • Don't have any restrictions at all. Just treat the string as an opaque string. Maybe add rules that say that the .type is ignored if not valid when the Blob is sent over the network as part of a FormData.
  • Ensure that the .type is a valid mimetype. Including that it is restricted to U+0020 ... U+007E, and that it parses as a valid mimetype.

I really don't care strongly though.

@Manishearth
Copy link
Member Author

FWIW, Safari follows the Chrome behavior (allow bad mime types, restrict characters), and IE follows the Firefox behavior (allow all the things)

@annevk
Copy link
Member

annevk commented Sep 25, 2017

I think it would be nicer if we made parsed and serialized. If characters are already restricted it seems there's some leeway here to fix this.

See also whatwg/mimesniff#30.

@annevk
Copy link
Member

annevk commented Sep 25, 2017

I also think we should allow parameters by the way, since some MIME types do not work without them.

@annevk
Copy link
Member

annevk commented Nov 23, 2017

@inexorabletash @mkruisselbrink @yutakahirano can we try to make some progress here? Not having a good story for Blob's type basically means MIME types cannot be tested to the extent they deserve: whatwg/mimesniff#42.

@annevk
Copy link
Member

annevk commented Nov 24, 2017

@pwnall I heard you might be able to help here as well?

@inexorabletash
Copy link
Member

FWIW, I made Chrome align (hopefully) with Firefox about 9 months ago: https://chromium.googlesource.com/chromium/src/+/0b46f3a24bf36d7cf91ec2d88132d15563b85d0e - no more character restrictions on input.

(Also FWIW, it should now be less objectionable to implement this in Blink since it won't involve adding a second MIME parser thanks to internal code reorganization.)

@annevk
Copy link
Member

annevk commented Nov 25, 2017

What I want is that a Blob holds a MIME type in an internal slot and type returns that serialized. And to be clear, that MIME type can have parameters as well, otherwise there's severe issues for multipart/* (needs boundary) and most MIME types that need charset.

And then on the input side, if a MIME type cannot be parsed out of the input, we use application/octet-stream or some such as replacement.

I'm willing to write all the tests for this if we can agree on this model.

@mkruisselbrink
Copy link
Collaborator

I don't really know enough of the history here (or where these mime types are used outside of the FileAPI spec), but what @sicking said in his earlier comment makes sense to me: either allow anything and just return it back (except maybe in some cases where it is sent over the network), or only allow valid mime types (which I think is what the spec is already trying to say?).

Not sure how @annevk's suggestion lines up with that? You're saying the constructor should only accept valid mime types, but rather than using empty string for invalid input you'd like us to use some arbitrary other mime type as replacement?

I don't like APIs that silently ignore input (which seems to be how invalid mime types are treated, either by replacing with empty string or replacing with some arbitrary replacement), so I'd lean slightly towards just allowing any string and not doing any validation, but presumably there were good reasons why the valid mime type constraints were introduced in the first place? It's unfortunate that we probably can't get away with throwing on invalid mime types...

@annevk
Copy link
Member

annevk commented Nov 28, 2017

Using the empty string (which indicates a missing MIME type) is fine. I didn't mean to suggest changing that (if someone wants to suggest that anyway, let's do that in a separate issue).

I think if we parse and then serialize we create less trouble for downstream consumers of these objects (e.g., consumers who might have parsers that could easily be broken by malicious input). And these objects would then also offer a consistent API with when the browser is responsible for generating them as the MIME type would always be valid.

@pwnall
Copy link

pwnall commented Nov 29, 2017

@annevk Sorry I didn't see your ping until now.

Here's the history that I know -- at least one implementation (WebKit) used to dump the Blob MIME type directly into a HTTP request string when sent via XHR's send(Blob) API or as FormData (I don't remember which case is covered). IIRC, when we fixed that bug in WebKit, we got the spec tightened to only allow characters in \x20-\x7F. We dropped MIME type silently, as opposed to throwing an exception, to reduce the probability of breaking existing content.

FWIW, no matter what we do, we'll need some way to spec what happens in the above cases. Some characters are dangerous (hard to reason about) if used in HTTP headers (newlines, quotes, colons). Rather than creating a footgun that downstream specs and implementations need to handle on a case-by-case basis, I'd prefer that we keep the current spec behavior or tighten it further by having the Blob constructor throw exceptions.

I'm happy to help write (non-normative?) text explaining the need for the charset restriction, or to help adjusting the Blink implementation to whatever is decided here.

@annevk
Copy link
Member

annevk commented Nov 29, 2017

whatwg/mimesniff#36 is my new specification for parsing MIME types. Any MIME type record (as defined there) serialized will be HTTP header safe.

So I think we're in agreement, other than on throwing. I strongly suspect throwing is not compatible, but I won't stop you if you want to try.

@mkruisselbrink
Copy link
Collaborator

Agreed that throwing unfortunately isn't likely to be compatible. So what you're suggesting sounds good to me.

@annevk
Copy link
Member

annevk commented Nov 29, 2017

I have created tests here: web-platform-tests/wpt#7764 (in particular the parsing.any.js resource).

@annevk
Copy link
Member

annevk commented Apr 19, 2018

Note that the tests have landed. What is still needed here is that we update the specification. And maybe file dedicated browser bugs for making use of the MIME type parser for Blob object constructors.

@andreubotella
Copy link
Member

andreubotella commented May 13, 2021

As I pointed out in #170, the current spec is broken in more ways than you might think, since parameter values in MIME types can be non-ASCII (see whatwg/mimesniff#141), and they also shouldn't be assumed to have the same meaning when lowercased (see whatwg/html#6251). If we just settle on parsing and then serializing, we wouldn't have this kind of problem with unnecessarily limiting valid MIME types.

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Jul 11, 2022
…ershaw

Since w3c/FileAPI#43 is still open, it is unclear how body.type should work.
The current wpts expect some behavior which isn't in the specifications.
So, since the situation is very messy in the specifications, the patch is doing a
spot fix for boundary handling. It is ugly, but shouldn't change other behavior.

Differential Revision: https://phabricator.services.mozilla.com/D150018
moz-wptsync-bot pushed a commit to web-platform-tests/wpt that referenced this issue Jul 11, 2022
Since w3c/FileAPI#43 is still open, it is unclear how body.type should work.
The current wpts expect some behavior which isn't in the specifications.
So, since the situation is very messy in the specifications, the patch is doing a
spot fix for boundary handling. It is ugly, but shouldn't change other behavior.

Differential Revision: https://phabricator.services.mozilla.com/D150018

bugzilla-url: https://bugzilla.mozilla.org/show_bug.cgi?id=1775501
gecko-commit: 22a157de1bfa502e10ec05c9b17611f2eba47b0e
gecko-reviewers: kershaw
moz-wptsync-bot pushed a commit to web-platform-tests/wpt that referenced this issue Jul 11, 2022
Since w3c/FileAPI#43 is still open, it is unclear how body.type should work.
The current wpts expect some behavior which isn't in the specifications.
So, since the situation is very messy in the specifications, the patch is doing a
spot fix for boundary handling. It is ugly, but shouldn't change other behavior.

Differential Revision: https://phabricator.services.mozilla.com/D150018

bugzilla-url: https://bugzilla.mozilla.org/show_bug.cgi?id=1775501
gecko-commit: 22a157de1bfa502e10ec05c9b17611f2eba47b0e
gecko-reviewers: kershaw
aosmond pushed a commit to aosmond/gecko that referenced this issue Jul 15, 2022
…ershaw

Since w3c/FileAPI#43 is still open, it is unclear how body.type should work.
The current wpts expect some behavior which isn't in the specifications.
So, since the situation is very messy in the specifications, the patch is doing a
spot fix for boundary handling. It is ugly, but shouldn't change other behavior.

Differential Revision: https://phabricator.services.mozilla.com/D150018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants