Description
Line 266 in bc47bf7
This accepts an array of schemes, but the resulting regexp matches those schemes case-sensitively, whereas the RFCs specify that schemes are case-insensitive.
irb(main):055:0> URI::regexp(["http"]).match("HTTP://WWW.GOOGLE.COM")
=> nil
irb(main):056:0> URI::regexp(["HTTP"]).match("HTTP://WWW.GOOGLE.COM")
=> #<MatchData "HTTP://WWW.GOOGLE.COM" 1:"HTTP" 2:nil 3:nil 4:"WWW.GOOGLE.COM" 5:nil 6:nil 7:nil 8:nil 9:nil>
RFC2396:
6. URI Normalization and Equivalence
In many cases, different URI strings may actually identify the
identical resource. For example, the host names used in URL are
actually case insensitive, and the URL http://www.XEROX.com is
equivalent to http://www.xerox.com. In general, the rules for
equivalence and definition of a normal form, if any, are scheme
dependent. When a scheme uses elements of the common syntax, it will
also use the common syntax equivalence rules, namely that the scheme
and hostname are case insensitive and a URL with an explicit ":port",
where the port is the default for the scheme, is equivalent to one
where the port is elided.
RFC3986 says the same:
Although schemes are case-insensitive, the canonical form is lowercase and documents that specify schemes must do so with lowercase letters. An implementation should accept uppercase letters as equivalent to lowercase in scheme names (e.g., allow "HTTP" as well as "http") for the sake of robustness but should only produce lowercase scheme names for consistency.
The expected behavior would be for the scheme's casing to be ignored, similar to how URI() normalizes it:
irb(main):008:0> URI("HTTP://WWW.GOOGLE.COM").scheme
=> "http"
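For comparison, a caller can filter by scheme without regard to case by parsing first and comparing downcased schemes. This is only a minimal sketch of the behavior I'd expect; `scheme_allowed?` is a hypothetical helper name, not part of URI:

```ruby
require "uri"

# Hypothetical helper (not part of the URI library): returns true when the
# string parses as a URI whose scheme is in the allowed list, ignoring case.
def scheme_allowed?(str, schemes)
  scheme = URI.parse(str).scheme
  return false if scheme.nil?

  # Downcase both sides so the comparison does not depend on input casing.
  schemes.map(&:downcase).include?(scheme.downcase)
rescue URI::InvalidURIError
  false
end

scheme_allowed?("HTTP://WWW.GOOGLE.COM", ["http"]) # => true
scheme_allowed?("ftp://example.com", ["http"])     # => false
```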
I'm guessing that the regexp just needs to have the i flag passed to it like:
/(?=#{Regexp.union(*schemes)}:)#{@pattern[:X_ABS_URI]}/xi
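One caveat with that guess, as far as I can tell: interpolating a Regexp object into a regexp literal embeds it with its own flags (e.g. `(?-mix:http)`), so an outer `i` flag may not reach the scheme alternation. Interpolating the union's `source` string instead lets the outer flag apply. The sketch below illustrates this with a simplified stand-in for `@pattern[:X_ABS_URI]` (the `[a-z][a-z0-9+.\-]*:` part is my assumption, not the library's pattern):

```ruby
schemes = ["http", "https"]
union = Regexp.union(*schemes)

# Interpolating the Regexp object keeps its own flags: "(?-mix:http|https)".
# The outer /i therefore does not make the scheme alternation case-insensitive.
outer_flag_only = /\A(?=#{union}:)[a-z][a-z0-9+.\-]*:/i

# Interpolating the source string lets the outer /i cover the alternation too.
source_interpolated = /\A(?=(?:#{union.source}):)[a-z][a-z0-9+.\-]*:/i

outer_flag_only.match?("HTTP://WWW.GOOGLE.COM")     # => false
source_interpolated.match?("HTTP://WWW.GOOGLE.COM") # => true
```

So the fix probably needs the union built case-insensitively (or its source interpolated) rather than only adding `i` to the outer literal.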
Is this a bug or am I misunderstanding the code? Thanks!