Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MIME type parameter parsing tests #7764

Merged
merged 17 commits into from
Dec 7, 2017
Merged

Conversation

annevk
Copy link
Member

@annevk annevk commented Oct 13, 2017

In the hope that the charset parameter is a good proxy for the whole thing. Would be good to run these through more endpoints though.

Relates to issues 30-41 on https://github.com/whatwg/mimesniff/issues.

@ghost
Copy link

ghost commented Oct 13, 2017

Build PASSED

Started: 2017-12-05 11:25:26
Finished: 2017-12-05 11:31:00

This report has been truncated because the number of unstable tests exceeds GitHub.com's character limit for comments (65536 characters).

Failing Jobs

  • chrome:unstable
  • MicrosoftEdge:14.14393

Unstable Browsers

Browser: "Microsoftedge 14.14393" (failures allowed)

View in: WPT PR Status | TravisCI

In the hope that the charset parameter is a good proxy for the whole thing. Would be good to run these through more endpoints though.

Relates to issues 30-41 on https://github.com/whatwg/mimesniff/issues.
@wpt-pr-bot
Copy link
Collaborator

There are no owners for this pull request. Please reach out on W3C's irc server (irc.w3.org, port 6665) on channel #testing (web client) to get help with this. Thank you!

@domenic
Copy link
Member

domenic commented Nov 29, 2017

Things I don't see tested:

  • Code points > U+00FF
  • Type longer than 127
  • Subtype longer than 127
  • Multiple slashes before ;
  • Multiple consecutive ;
  • Types and subtypes containing the full range of HTTP token code points, including especially punctuation
  • Maybe these should be generated somehow; not sure how to square that with the JSON-ness:
    • Exhaustive tests for types and subtypes that contain code points less than U+00FF that are not HTTP token code points
    • Exhaustive tests for parameters that contain code points less than U+00FF that are not not HTTP quoted-string token code points
      • Both quoted and unquoted would be ideal
  • Server-side tests which ensure the server only sees the parsed-then-serialized output. Ideally in a way that doesn't go through other canonicalizations like Blob or Response... maybe <form> can help here? XHR? Would be good to test as many code paths as possible. [Addressed through https://github.com/Adjust XMLHttpRequest Content-Type handling #8422 per IRC discussion.]

@annevk
Copy link
Member Author

annevk commented Nov 29, 2017

I still don't buy that 127 is an actual thing. No code has it and it seems rather arbitrary.

Server-side tests which ensure the server only sees the parsed-then-serialized output.

What code path would this test? Note that Response/Request only store the parsed output internally for generating Blob objects...

Do you want to block on having most of those tests or can they be follow-up? I left whatwg/mimesniff#45 open quite intentionally. It requires some additional thought into how to structure the tests as not all places will react to non-ASCII input the same way.

@domenic
Copy link
Member

domenic commented Nov 29, 2017

I still don't buy that 127 is an actual thing. No code has it and it seems rather arbitrary.

I think in general when changing a standard it's a good idea to test that the changes you made are supported by browsers, even if the old version doesn't have tests backing it up.

What code path would this test? Note that Response/Request only store the parsed output internally for generating Blob objects...

The idea is to test the networking code, which at least in Chrome I believe uses a completely different MIME type parser from other parts of the codebase.

Do you want to block on having most of those tests or can they be follow-up?

No need to block, but the more tests are in place before I start writing a JS implementation, the better.

@annevk
Copy link
Member Author

annevk commented Nov 29, 2017

@domenic you're talking about the MIME type the server sees though. There's already quite a few tests that ensure the browser networking stack doesn't touch the value of a custom Content-Type header. So I wonder what API entry point you have in mind.

@domenic
Copy link
Member

domenic commented Nov 29, 2017

Well so for example fetch or, perhaps more directly, XHR, can send Content-Type headers. Probably they should only send ones that have been parsed-then-serialized.

@annevk
Copy link
Member Author

annevk commented Nov 29, 2017

No they shouldn't (and we already test this). We don't want to embed header-specific knowledge in the networking library. That would prevent web pages from experimenting with new formats.

@annevk
Copy link
Member Author

annevk commented Nov 29, 2017

There's a specific case with XMLHttpRequest where we do modify the value and in that case it's indeed parsed and serialized; that's #8422.

@annevk
Copy link
Member Author

annevk commented Nov 30, 2017

Generating tests with a Python script is doable. The way to do those tests would be to write a Python script that generates a JSON resource. Ensure both are committed and ensure ci_built_diff.sh runs the Python script so modifications that do not keep the script and generated resource in sync are detected. Documenting this here mostly for future reference, since I won't get to it today.

const mime = val.input;
async_test(t => {
const frame = document.createElement("iframe"),
expectedEncoding = val.encoding === "" ? "UTF-8" : val.encoding;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think null, instead of the empty string, should be used to signify no parsed encoding.

"type/subtype longer than 127",
{
"input": "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789/0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789",
"output": "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789/0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only entry that parses correctly but does not have an "encoding" line. The unpredictable data format made it harder to write a test runner.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

navigable and encoding are both optional and only make sense in specific contexts. I'll document it in more clearly in the README.

"navigable": true,
"encoding": "GBK"
},
"Single quotes (invalid)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single quotes are valid per https://pr-preview.s3.amazonaws.com/whatwg/mimesniff/pull/36.html#http-token-code-point, so these next three tests seem wrong.

@domenic
Copy link
Member

domenic commented Nov 30, 2017

Coverage report at https://lcov-report-sngrqbipez.now.sh. Missing one branch in the parser, plus any mixed-case tests. Pretty good!

@domenic
Copy link
Member

domenic commented Nov 30, 2017

Oh, I guess several "else" paths were never taken, mostly where the input ends prematurely. Those are more serious coverage gaps.

@annevk
Copy link
Member Author

annevk commented Dec 1, 2017

I added generated tests. I think we have ample coverage now for a v0/v1 of this.

},
{
"input": "x/x;x=\t;bonus=x",
"output": "x/x;x=\"\t\";bonus=x"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My parser says this should be "x/x;bonus=x". Trailing whitespace gets removed from the parameter value, then it becomes the empty string, so it's omitted.

},
{
"input": "x/x;x= ;bonus=x",
"output": "x/x;x=\" \";bonus=x"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My parser says this should be "x/x;bonus=x". Trailing whitespace gets removed from the parameter value, then it becomes the empty string, so it's omitted.

"Valid",
{
"input": "!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz/!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz;!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz",
"output": "!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvxyz/!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvxyz;!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"w" is missing in a few places here; the input has 1 w (should be 2) and the output has 0.

},
{
"input": "x/x;x=\" !\\\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u0080\u0081\u0082\u0083\u0084\u0085\u0086\u0087\u0088\u0089\u008A\u008B\u008C\u008D\u008E\u008F\u0090\u0091\u0092\u0093\u0094\u0095\u0096\u0097\u0098\u0099\u009A\u009B\u009C\u009D\u009E\u009F\u00A0\u00A1\u00A2\u00A3\u00A4\u00A5\u00A6\u00A7\u00A8\u00A9\u00AA\u00AB\u00AC\u00AD\u00AE\u00AF\u00B0\u00B1\u00B2\u00B3\u00B4\u00B5\u00B6\u00B7\u00B8\u00B9\u00BA\u00BB\u00BC\u00BD\u00BE\u00BF\u00C0\u00C1\u00C2\u00C3\u00C4\u00C5\u00C6\u00C7\u00C8\u00C9\u00CA\u00CB\u00CC\u00CD\u00CE\u00CF\u00D0\u00D1\u00D2\u00D3\u00D4\u00D5\u00D6\u00D7\u00D8\u00D9\u00DA\u00DB\u00DC\u00DD\u00DE\u00DF\u00E0\u00E1\u00E2\u00E3\u00E4\u00E5\u00E6\u00E7\u00E8\u00E9\u00EA\u00EB\u00EC\u00ED\u00EE\u00EF\u00F0\u00F1\u00F2\u00F3\u00F4\u00F5\u00F6\u00F7\u00F8\u00F9\u00FA\u00FB\u00FC\u00FD\u00FE\u00FF\"",
"output": "x/x;x=\" !\\\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u0080\u0081\u0082\u0083\u0084\u0085\u0086\u0087\u0088\u0089\u008A\u008B\u008C\u008D\u008E\u008F\u0090\u0091\u0092\u0093\u0094\u0095\u0096\u0097\u0098\u0099\u009A\u009B\u009C\u009D\u009E\u009F\u00A0\u00A1\u00A2\u00A3\u00A4\u00A5\u00A6\u00A7\u00A8\u00A9\u00AA\u00AB\u00AC\u00AD\u00AE\u00AF\u00B0\u00B1\u00B2\u00B3\u00B4\u00B5\u00B6\u00B7\u00B8\u00B9\u00BA\u00BB\u00BC\u00BD\u00BE\u00BF\u00C0\u00C1\u00C2\u00C3\u00C4\u00C5\u00C6\u00C7\u00C8\u00C9\u00CA\u00CB\u00CC\u00CD\u00CE\u00CF\u00D0\u00D1\u00D2\u00D3\u00D4\u00D5\u00D6\u00D7\u00D8\u00D9\u00DA\u00DB\u00DC\u00DD\u00DE\u00DF\u00E0\u00E1\u00E2\u00E3\u00E4\u00E5\u00E6\u00E7\u00E8\u00E9\u00EA\u00EB\u00EC\u00ED\u00EE\u00EF\u00F0\u00F1\u00F2\u00F3\u00F4\u00F5\u00F6\u00F7\u00F8\u00F9\u00FA\u00FB\u00FC\u00FD\u00FE\u00FF\""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output slashes here (between the []s) is, per my parser, removed.

@domenic
Copy link
Member

domenic commented Dec 1, 2017

Coverage is 100%, but found some test bugs. Looking good though.

@@ -0,0 +1,3 @@
def main(request, response):
response.headers.set("Content-Type", request.GET.first("type"));
response.content = "<meta charset=utf-8>\n<script>document.write(document.characterSet)</script>"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW you can also write this as

def  main(request, response):
    return ([("Content-Type", request.GET.first("type"))],
            "<meta charset=utf-8>\n<script>document.write(document.characterSet)</script>")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up to you; it was just in case you didn't know.

@@ -18,6 +18,7 @@ main() {
)

./update-built-tests.sh
python ./mimesniff/mime-types/resources/generated-mime-types.py
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry this should go in update_built_tests.sh. I guess the distinction is to make it easy to bulk update all the tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed!

"input": "!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz/!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz;!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz",
"output": "!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvxyz/!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvxyz;!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz"
"input": "!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz/!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz;!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
"output": "!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvwxyz/!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz;!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still missing a "w". You can tell because the strings start in different columns but end in the same column.

},
{
"input": "x/x;\" ",
"output": "x/x;\" \""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my parser the output is x/x, with no parameters. There is no parameter value here, so this is expected.

},
{
"input": "x/x;x=\"\t",
"output": "x/x;x=\"\t\""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my parser the output is x/x, with no parameters. Remember that you remove leading and trailing whitespace from the whole MIME type in step 1 of the entire parser algorithm, so this is equivalent to parsing x/x;x="

@domenic
Copy link
Member

domenic commented Dec 5, 2017

Generated tests still not quite there: x/x;x=\t;bonus=x and x/x;x= ;bonus=x should get their x parameter removed, not quoted.

Maybe we should try to do this while we're both online? Ping me sometime perhaps?

@annevk
Copy link
Member Author

annevk commented Dec 6, 2017

Are you sure you're looking at the latest version? I don't see those tests in generated-mime-types.json.

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, my script was not properly updating generated-mime-types.json. Woohoo!!

@annevk annevk merged commit b15e885 into master Dec 7, 2017
@annevk annevk deleted the annevk/mime-type-parameters branch December 7, 2017 13:11
annevk added a commit to whatwg/mimesniff that referenced this pull request Dec 7, 2017
This addresses all open inline issues with respect to the parser and serializer, aligns both closer with implementations, except where those stood in the way of an improved model.

This also updates all of it to make extensive use of the Infra Standard.

See #42 for the testing story (included all linked issues) and web-platform-tests/wpt#7764 for the majority of tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants