URI::regexp schemes are case sensitive #38

Open
@nws-td

Description

/(?=#{Regexp.union(*schemes)}:)#{@pattern[:X_ABS_URI]}/x

URI::regexp accepts an array of schemes, but the regexp it returns matches them case-sensitively, whereas the RFCs specify that schemes are case-insensitive.

irb(main):055:0> URI::regexp(["http"]).match("HTTP://WWW.GOOGLE.COM")
=> nil
irb(main):056:0> URI::regexp(["HTTP"]).match("HTTP://WWW.GOOGLE.COM")
=> #<MatchData "HTTP://WWW.GOOGLE.COM" 1:"HTTP" 2:nil 3:nil 4:"WWW.GOOGLE.COM" 5:nil 6:nil 7:nil 8:nil 9:nil>

RFC2396:

  6. URI Normalization and Equivalence

In many cases, different URI strings may actually identify the
identical resource. For example, the host names used in URL are
actually case insensitive, and the URL http://www.XEROX.com is
equivalent to http://www.xerox.com. In general, the rules for
equivalence and definition of a normal form, if any, are scheme
dependent. When a scheme uses elements of the common syntax, it will
also use the common syntax equivalence rules, namely that the scheme
and hostname are case insensitive and a URL with an explicit ":port",
where the port is the default for the scheme, is equivalent to one
where the port is elided.

This is also the case in RFC3986:

Although schemes are case-insensitive, the canonical form is lowercase and documents that specify schemes must do so with lowercase letters. An implementation should accept uppercase letters as equivalent to lowercase in scheme names (e.g., allow "HTTP" as well as "http") for the sake of robustness but should only produce lowercase scheme names for consistency.

The expected behavior is that the scheme's casing is ignored, similar to how URI() handles it:

irb(main):008:0> URI("HTTP://WWW.GOOGLE.COM").scheme
=> "http"

I'm guessing that the regexp just needs the i flag passed to it, like:
/(?=#{Regexp.union(*schemes)}:)#{@pattern[:X_ABS_URI]}/xi

Is this a bug or am I misunderstanding the code? Thanks!
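
One caveat on the i flag idea, sketched below in isolation: Regexp.union returns a Regexp, and interpolating a Regexp into another literal embeds its own flags as (?-mix:...), so an outer i flag would likely not reach the scheme alternation. The snippet only models the scheme lookahead (anchored with \A for the demo, and without the @pattern[:X_ABS_URI] part); the variable names are made up for illustration and are not part of the uri library.

```ruby
schemes = ["http", "https"]

# Interpolating the union embeds "(?-mix:http|https)", which turns
# case-insensitivity back off inside the group despite the outer /i.
outer_i = /\A(?=#{Regexp.union(*schemes)}:)/i

# Option 1: build the union from case-insensitive regexps, so the flag
# travels with each scheme when it is interpolated.
insensitive = schemes.map { |s| Regexp.new(Regexp.escape(s), Regexp::IGNORECASE) }
scoped_i = /\A(?=#{Regexp.union(*insensitive)}:)/

# Option 2: interpolate only the union's source string, so the outer /i applies.
source_i = /\A(?=(?:#{Regexp.union(*schemes).source}):)/i

url = "HTTP://WWW.GOOGLE.COM"
puts outer_i.match?(url)   # false
puts scoped_i.match?(url)  # true
puts source_i.match?(url)  # true
```

Option 1 keeps the case-insensitivity limited to the scheme names, while option 2 makes the whole pattern case-insensitive; which of the two fits the library better is a judgment call for the maintainers.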
