Skip to content

Add files via upload #8

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Conversation

patrickdark
Copy link

@patrickdark patrickdark commented Apr 27, 2017

Adds badge text-based feedback indicating invalid |accept-language| strings, a single |accept-language| substring, multiple valid |accept-language| substrings, and all |accept-language| having |;q=0|.

Not sure if "q=0" is the best feedback. It could be "!language" or something along those lines.

Adds badge text-based feedback indicating invalid |accept-language| strings, a single |accept-language|  substring, multiple valid |accept-language| substrings, and all |accept-language| have |;q=0|.
Changed an operator from `>=` to `===`. No effect on functionality, but the latter is more correct.
Added a Unicode variation selector 16 (`\u{fe0f}`) character after the warning sign (⚠) character to force a multicolor emoji version of the character since that's more readable.
Fix use of double quotation marks to use single quotation marks.
Removed two unnecessary else statements. (No need for them since previous blocks use `return`.)
Added an `if` statement that accounts for empty strings so they aren't treated as errors.
@callahad
Copy link
Owner

Woohoo! This seems to work really well. The use of emoji to indicate errors is super clever.

I think we can determine the highest qvalue language more concisely via map/reduce. I believe this does the same thing, given a valid input string:

  let best = (input) => {
    return input.split(',')
                .map(token => token.trim().split(';q='))
                .map(([lang, qval]) => [lang, qval ? parseFloat(qval) : 1])
                .reduce(([bestLang, bestQval], [lang, qval]) => {
                  if (qval > bestQval) {
                    return [lang, qval];
                  } else if (qval === bestQval) {
                    return ['mul', bestQval]
                  } else {
                    return [bestLang, bestQval];
                  }
                }, ['', -1]);
  }

For example:

> best('en, de')
[ 'mul', 1 ]
> best('en, de;q=0.5')
[ 'en', 1 ]
> best('en;q=0.1, de;q=0.5')
[ 'de', 0.5 ]

Does that function look right to you?

@patrickdark
Copy link
Author

patrickdark commented Apr 28, 2017

@callahad I've never used the map method, reduce method, or destructured arrays, so I find this code is somewhat confusing. That said, on researching those things and in testing the shown code, your code does appear to generally return the correct best array on invocation.

The code is incomplete though since it doesn't evaluate the validity of the string. One can't accurately determine the bestQval value without first examining whether the input is valid. Perhaps the omission was intentional though. I guess it gives a useful value for the invalid case involving an empty string, at least; the bestQval of 1 can simply be discarded.

It also doesn't handle a bestQval value of 0. For example, (input = 'en;q=0') is explicitly disallowing English without otherwise expressing a preference for any other language, so this case requires special handling since en is not the best language, but a prohibited language. In those cases, it seems that the label should be an empty string, !en or !mul, or some placeholder value like q=0 or 🛇.

This code, based on yours, seems to handle the missing cases:

let valid = /^[a-z]{1,8}(?:-[a-z]{1,8})?(?:;q=(?:0(?:\.\d{0,3})?|1(?:\.0{0,3})?))?(?:, *[a-z]{1,8}(?:-[a-z]{1,8})?(?:;q=(?:0(?:\.\d{0,3})?|1(?:\.0{0,3})?))?)*$/i;
let best = input => {
  return valid.test(input) ? (() => {
    let output = input.split(',')
      .map(token => token.trim().split(';q='))
      .map(([lang, qval]) => [lang, qval ? parseFloat(qval) : 1])
      .reduce(([bestLang, bestQval], [lang, qval]) => {
        return qval > bestQval ? [lang, qval] : qval === bestQval ? ['mul', bestQval] : [bestLang, bestQval];
      }, ['', -1])
    return output[1] ? output[0] : '🛇\ufe0f';
  })() : !input ? '' : '⚠\ufe0f';
};

(There's no emoji for 🛇 on Windows. This just future-proofs the character.)

@patrickdark
Copy link
Author

patrickdark commented Apr 28, 2017

It seems that my regex is actually inaccurate. Apparently RFC2616 is superseded by RFC7231, which allows both numbers (in addition to letters) after the language-range's hyphen-minus character and optional whitespace (spaces and tabs) around the ;q= semicolon. This means that a simple split(';q=') isn't sufficient to split the tokens.

Still can't figure out what the formal rules are for whitespace around commas.

@patrickdark
Copy link
Author

patrickdark commented Apr 28, 2017

So RFC3282 specifies a ridiculously permissive model that allows CFWS (comments and "folding whitespace") around the ;, q, and =; after token-separating commas; and around the entire accept-language header value. See https://tools.ietf.org/html/rfc3282#section-3 and https://tools.ietf.org/html/rfc2822#section-3.2.3.

I have no idea how to implement comment detection since I can't understand what the spec means when it defines quoted-pair.

The next best solution seems to be (A) to prohibit entry of anything other than [\t ,\-\.0-9;=A-Za-z] since there's no "folding" without being able to enter CRLF pairs and no comments without being able to enter parentheses or (B) to treat potentially valid values as errors. Continuing to allow (B) would continue to allow people to use this extension to test intentionally invalid accept-language headers at the expense of incorrectly flagging valid, "exotic" header values as errors in the UI.

There's also (C): figure out comment detection, but that may be awhile.

Using \[\t ]+\g.replace('') before processing should continue to allow split(';q=') while also allowing one to drop calls to trim().

/^[\t ]*[a-z]{1,8}(?:-[0-9a-z]{1,8})?(?:[\t ]*;[\t ]*q[\t ]*=[\t ]*(?:0(?:\.\d{0,3})?|1(?:\.0{0,3})?))?(?:,[\t ]*[a-z]{1,8}(?:-[a-z0-9]{1,8})?(?:[\t ]*;[\t ]*[\t ]*q[\t ]*=[\t ]*(?:0(?:\.\d{0,3})?|1(?:\.0{0,3})?))?)*[\t ]*$/i handles the cases of optional whitespace (sans CLRF) and adds support for numbers in the language strings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants