Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redo path parsing for non-special URLs #212

Closed
annevk opened this issue Jan 20, 2017 · 2 comments
Closed

Redo path parsing for non-special URLs #212

annevk opened this issue Jan 20, 2017 · 2 comments

Comments

@annevk
Copy link
Member

annevk commented Jan 20, 2017

As part of #148 and #185 we noticed that if we want to keep parsing non-special URLs in a way that preserves the ability for them to be base/relative URLs we need to make some changes, that would also bring them closer to the original RFCs.

I tried to do that as part of #185 but failed multiple times, so I'm going to make it a dedicated issue. What I think needs to happen is that we stop eating "/" early on. I think we need to drop the "path start state" as well.

What we want is that for special URLs we ensure that path has at least one item (potentially the empty string) and that for non-special URLs it can be empty if there is no path (e.g., as in foo://x#x having no path, but http://x#x becoming http://x/#x because special).

Currently "relative slash state" goes down "special authority ignore slashes state" even for non-special URLs. That means ///x against foo://y results in foo://x rather than foo:///x. That also needs to be fixed.

annevk added a commit to web-platform-tests/wpt that referenced this issue Jan 20, 2017
@annevk
Copy link
Member Author

annevk commented Jan 21, 2017

With these changes I get passing tests again (apart from five tests that require the opaque host changes to land):

diff --git a/lib/URL-impl.js b/lib/URL-impl.js
index 9e7d67c..0a394a3 100644
--- a/lib/URL-impl.js
+++ b/lib/URL-impl.js
@@ -136,6 +136,10 @@ exports.implementation = class URLImpl {
       return this._url.path[0];
     }
 
+    if (this._url.path.length === 0) {
+      return "";
+    }
+
     return "/" + this._url.path.join("/");
   }
 
diff --git a/src/url-state-machine.js b/src/url-state-machine.js
index 4534bd6..85bb11d 100644
--- a/src/url-state-machine.js
+++ b/src/url-state-machine.js
@@ -680,11 +680,13 @@ URLStateMachine.prototype["parse relative"] = function parseRelative(c) {
 };
 
 URLStateMachine.prototype["parse relative slash"] = function parseRelativeSlash(c) {
-  if (c === p("/") || (isSpecial(this.url) && c === p("\\"))) {
+  if (isSpecial(this.url) && (c === p("/") || c === p("\\"))) {
     if (c === p("\\")) {
       this.parseError = true;
     }
     this.state = "special authority ignore slashes";
+  } else if(c === p("/")) {
+    this.state = "authority";
   } else {
     this.url.username = this.base.username;
     this.url.password = this.base.password;
@@ -930,12 +932,27 @@ URLStateMachine.prototype["parse file host"] = function parseFileHost(c, cStr) {
 };
 
 URLStateMachine.prototype["parse path start"] = function parsePathStart(c) {
-  if (isSpecial(this.url) && c === p("\\")) {
-    this.parseError = true;
-  }
-  this.state = "path";
-  if (c !== p("/") && !(isSpecial(this.url) && c === p("\\"))) {
-    --this.pointer;
+  if (!isSpecial(this.url)) {
+    if (!this.stateOverride && c === p("?")) {
+      this.url.query = "";
+      this.state = "query";
+    } else if (!this.stateOverride && c === p("#")) {
+      this.url.fragment = "";
+      this.state = "fragment";
+    } else if (!isNaN(c)) {
+      if (c !== p("/")) {
+        --this.pointer;
+      }
+      this.state = "path";
+    }
+  } else {
+    if (c !== p("/") && c !== p("\\")) {
+      --this.pointer;
+    }
+    if(c === p("\\")) {
+      this.parseError = true;
+    }
+    this.state = "path";
   }
 
   return true;
@@ -1096,7 +1113,7 @@ function serializeURL(url, excludeFragment) {
 
   if (url.cannotBeABaseURL) {
     output += url.path[0];
-  } else {
+  } else if (url.path.length !== 0) {
     output += "/" + url.path.join("/");
   }

@annevk
Copy link
Member Author

annevk commented Jan 21, 2017

However, that probably means I should write more tests, since that seems like it would fail with state override into effect. Done and updated the patch above.

annevk added a commit that referenced this issue Jan 21, 2017
This allows paths to be empty for non-special URLs and also takes that
into account when serializing.

Tests: web-platform-tests/wpt#4586.

Fixes #212.
hubot pushed a commit to WebKit/WebKit-http that referenced this issue Jan 23, 2017
…sh after the host more compatible

https://bugs.webkit.org/show_bug.cgi?id=167317
Source/WebCore:

<rdar://problem/29526875>

Reviewed by Sam Weinig.

This is currently being added to the URL spec in whatwg/url#212
Covered by new API tests.

* platform/URLParser.cpp:
(WebCore::URLParser::parse):
Only add a slash if there wasn't one if the URL has a special scheme.
This new behavior matches the old behavior of URL::parse.

Tools:


Reviewed by Sam Weinig.

* TestWebKitAPI/Tests/WebCore/URLParser.cpp:
(TestWebKitAPI::TEST_F):



git-svn-id: http://svn.webkit.org/repository/webkit/trunk@211058 268f45cc-cd09-0410-ab3c-d52691b4dbfc
annevk added a commit that referenced this issue Jan 24, 2017
This allows paths to be empty for non-special URLs and also takes that
into account when serializing.

Tests: web-platform-tests/wpt#4586.

Fixes #212.
annevk added a commit to web-platform-tests/wpt that referenced this issue Jan 24, 2017
annevk added a commit to web-platform-tests/wpt that referenced this issue Jan 27, 2017
annevk added a commit to web-platform-tests/wpt that referenced this issue Jan 31, 2017
Some of these used to be part of #4406.

See also whatwg/url#212.
annevk added a commit that referenced this issue Jan 31, 2017
This allows paths to be empty for non-special URLs and also takes that into account when serializing.

Tests: web-platform-tests/wpt#4586.

Fixes #212.
annevk pushed a commit to whatwg/html that referenced this issue Feb 7, 2017
Updates the pathname getter to return the empty string when the URL's path is empty, which mirrors the change in whatwg/url@b087fe2 which in turn fixed whatwg/url#212.

(This was tested as part of the URL Standard change. It is not tested for the Location object as it is not feasible to do so as discussed in the pull request for this change.)
alice pushed a commit to alice/html that referenced this issue Jan 8, 2019
Updates the pathname getter to return the empty string when the URL's path is empty, which mirrors the change in whatwg/url@b087fe2 which in turn fixed whatwg/url#212.

(This was tested as part of the URL Standard change. It is not tested for the Location object as it is not feasible to do so as discussed in the pull request for this change.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

1 participant