Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: Use GURL as HTTP URL parser utility #11670

Merged
merged 14 commits into from
Jul 9, 2020
Merged

http: Use GURL as HTTP URL parser utility #11670

merged 14 commits into from
Jul 9, 2020

Conversation

dio
Copy link
Member

@dio dio commented Jun 20, 2020

Commit Message: http: Use GURL as HTTP URL parser utility
Additional Description: This replaces http_parser's URL parser with GURL.
Risk Level: Medium?
Testing: Unit
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Dhi Aurrahman dio@tetrate.io

Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
dio added 5 commits June 21, 2020 05:53
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
bazel/repository_locations.bzl Outdated Show resolved Hide resolved
-#if defined(__clang__) && __has_attribute(uninitialized)
+#if defined(__clang__)
+#if defined(__has_attribute)
+#if __has_attribute(uninitialized)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we put /Za when compiling on Windows, MSVC's C4067 warning is treated as an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it interesting given that googletest itself supports msvc, but does not support clang-cl.

cc_library(
name = "strings",
srcs = [
- "string16.cc",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string16 doesn't make sense to use on 2-byte wchar_t systems (e.g. Windows).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I trust this patch is operating entirely on ISO-8859-1 with utf-8 enablement? Should be no wide char anything, this isn't java.

"string_piece.h",
"string_piece_forward.h",
"string_util.h",
- "string_util_posix.h",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch: What's the rationale for this patch to grow larger?
@dio: This is for accommodating windows with msvc compiler that we have.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I'm still not super happy with the size of this patch. @wrowe @sunjayBhatia do you have any thoughts on what we can do to avoid having to hand craft this in Envoy? If @dio merges, would you folks be able to followup with upstream to eliminate this patch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put this review on full pause until we have 11107 merged, and rebased? That will pick up //test/... and we'll be in a better position to give this our full attention, since regressions are taking up most of our free cycles for the past days.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. I leave it to @htuch to decide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can wait for #11107.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So today we should see 11107 merged. We did go ahead this week and build dio's branch locally with //test/... on Windows, and didn't observe any significant or outright failures, although there are several compilation emits that should be addressed before we merge this to master. This will become apparent to all when it is rebuilt after 11107.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make things very clear, we are patching the deliberate exclusion of one particular architecture, which served no purpose and discouraged development by any interested contributor. I don't know how a patch to revert such a silly commit would be received at the upstream community, so we can make no promises how long a patch will have to live at envoy, @htuch

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#11107 is merged and this PR has picked it up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. I'm wondering if we can get some agreement on the long term ownership of this patch. From my perspective, a patch this size is not sustainable in Envoy and that can't be the long term strategy; this will impact our velocity and add friction to upgrading to new versions of GURL in response to security issues. OTOH, I'm eager to move forward and we can accept a patch in the interim.

@dio would it make sense to just try upstream this and see what happens?
@wrowe would you folks be able to take over this patch and long term efforts to shrink it down via tactical upstreaming if this can't be easily done?

tags = ["requires-rtti"],
visibility = ["//visibility:private"],
deps = [":headers"],
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch: This is an autoconf dependency, can we just use foreign_cc to avoid having to hand-maintain a bespoke BUILD file for the project?
@dio: I'm testing it, but it seems the build time is significantly increasing.
@htuch: Interesting, I wonder if it's doing something like building tests or additional libraries. Might be possible to control with some autoconf flags...
@dio: I believe I have turned off everything (that makes sense, like examples, tests, even tools). And it is tricky to force using msvc (when using configure it tends to use mingw and it is failed at configure stage, and I don't think we want to mix mingw and msvc).

Ref: #10599 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, what is an autoconf dependency? If one is introduced by this patch, that will break the contribution of all four engineers working on the Windows port by breaking master. The one I am familiar with is the opentracing dependency which we have excluded for now, but that only breaks tracing extensions. Breaking http/1 seems unwise. Otherwise the basic idea of this bazel BUILD hack looks entirely reasonable, and there is no expectation it would be fixed upstream, because we are cherry-picking a project for subcomponents that don't have any independent project management and don't appear to be supported stand-alone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with this BUILD (as it isn't a huge patch). In terms of being an autoconf dependency, I'm basically saying it should build under rules_foreign_cc, which knows how to build autoconf.

gperftools is an autoconf dep today that we build this way, see https://github.com/envoyproxy/envoy/blob/master/bazel/foreign_cc/BUILD#L13.

Are you saying in the future, we can't take any external dependency that is autoconf based?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, autoconf is gnu tooling and while it runs in tandem the pseudo-unix cygwin toolchain there is no Windows port, and it's likely to be a thorn in the side of other odd SDKs on mobile and such.

Most every project building on autoconf offers an alternative mechanism for some windows build, however well thought out and supported, or not. external/org_unicode_icuuc/icu4c/source/allinone/ is the Windows build logic by the icu4c maintainers.

Most projects are moving in the direction of supporting CMake for a much simpler cross-platform experience, even as they retain autoconf. This makes the build agnostic of the flavor of Visual Studio and even msvc vs llvm clang-cl.

For Envoy, maintaining a bazel BUILD file for this project makes perfect sense, since the lib objects all match the compiler and link choices of this project. We need to evaluate what the autoconf results produce in terms of source file toggles and changes, even if we avoid these long-term.

For upstream, propose a CMakeFiles.txt to cover their existing autoconf determinations. Our rules_foreign_cc support in bazel is much better with CMake than less-predictable autoconf output. This probably results in the most flexibility to the maintainers and would be our simplest solution short of this short-term BUILD maintenance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wrowe can you propose some amendment to https://github.com/envoyproxy/envoy/blob/master/bazel/EXTERNAL_DEPS.md based on this?

I don't think it sounds unreasonable, but we should take this through a review process and ensure folks are aligned, as it's a major project restriction.

How come gperftools isn't impacting you? Don't you use tcmalloc?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tcmalloc is disabled on Windows if you look at .bazelrc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gperftools as of 2015 was being ported to Windows using the tcmalloc_minimal implementation. We have have documented this exclusion for Windows both in .bazelrc and in bazel/foreign_cc/BUILD.

Until bazel test //test/... is largely building clean with all currently tagged "fails_on_windows", gperftools hasn't been and won't be a particularly high priority. The same is true for opentracing and all tracing extensions on Windows.

THE major difference between all that and this proposed enhancement is that the envoyproxy is really nothing without a url parser.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but I would like to get this down in writing, so that we can ensure everyone is on-board and don't relitigate this each time we add a new dependency. Can you do the PR to https://github.com/envoyproxy/envoy/blob/master/bazel/EXTERNAL_DEPS.md arguing to remove autoconf in rules_foreign_cc for Windows reasons for all deps that are core across platforms?

Comment on lines +58 to +81
bool Url::initializeForConnect(GURL&& url) {
// CONNECT requests can only contain "hostname:port"
// https://github.com/nodejs/http-parser/blob/d9275da4650fd1133ddc96480df32a9efe4b059b/http_parser.c#L2503-L2506.
if (!url.is_valid() || url.IsStandard()) {
return false;
}

const auto& parsed = url.parsed_for_possibly_invalid_spec();
// The parsed.scheme contains the URL's hostname (stored by GURL). While host and port have -1
// as its length.
if (parsed.scheme.len <= 0 || parsed.host.len > 0 || parsed.port.len > 0) {
return false;
}

host_and_port_ = url.possibly_invalid_spec();
const auto& parts = StringUtil::splitToken(host_and_port_, ":", /*keep_empty_string=*/true,
/*trim_whitespace=*/false);
if (parts.size() != 2 || static_cast<size_t>(parsed.scheme.len) != parts.at(0).size() ||
!validPortForConnect(parts.at(1))) {
return false;
}

return true;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alyssawilk I'm not sure if this is the right thing to do for making sure CONNECT works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm a bit confused why EXPECT_FALSE(url.initialize("foo.com:65536", is_connect)); fails. Could be I'm insufficiently caffeinated though :-P

Also as long as I'm looking, do we think this should be runtime guarded, even with a shorter duration? I could imagine this having behavioral changes in weird corner cases though ideally it should be good

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

65536 is greater than std::numeric_limits<uint16_t>::max(). Not in 1-65535 valid port range.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: Runtime guard. SGTM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha. maybe comments there but yeah, otherwise this looks fine.
If folks prefer we could also split out connect validation into its own class since it's a weird one-off but I suspect it'll be useful to have a consistent API and this how the old class worked so LGTM!

@dio dio changed the title WIP: http: Use GURL as HTTP URL parser utility http: Use GURL as HTTP URL parser utility Jun 21, 2020
@dio dio marked this pull request as ready for review June 21, 2020 13:13
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks for reviving. My main concern is just around the size of the manual patch for GURL and ICUUC. If there is any way or plan to reduce it, it would be really awesome. Otherwise I'm happy to ship and iterate.

"string_piece.h",
"string_piece_forward.h",
"string_util.h",
- "string_util_posix.h",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I'm still not super happy with the size of this patch. @wrowe @sunjayBhatia do you have any thoughts on what we can do to avoid having to hand craft this in Envoy? If @dio merges, would you folks be able to followup with upstream to eliminate this patch?

source/common/http/url_utility.cc Outdated Show resolved Hide resolved
@dio
Copy link
Member Author

dio commented Jun 23, 2020

@htuch after this is merged, I'll try to play around with the autoconf flags once again. Since the time taken by the build using the resulted Makefile takes a longer time, it probably requires more patience 🙂.

dio added 3 commits June 24, 2020 04:33
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
@dio
Copy link
Member Author

dio commented Jun 24, 2020

/azp run envoy-presubmit

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@dio
Copy link
Member Author

dio commented Jun 29, 2020

Merging master to pick up #11107.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo the patch/ICUUC threads. Excited to see this ready to land..

"string_piece.h",
"string_piece_forward.h",
"string_util.h",
- "string_util_posix.h",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. I'm wondering if we can get some agreement on the long term ownership of this patch. From my perspective, a patch this size is not sustainable in Envoy and that can't be the long term strategy; this will impact our velocity and add friction to upgrading to new versions of GURL in response to security issues. OTOH, I'm eager to move forward and we can accept a patch in the interim.

@dio would it make sense to just try upstream this and see what happens?
@wrowe would you folks be able to take over this patch and long term efforts to shrink it down via tactical upstreaming if this can't be easily done?

tags = ["requires-rtti"],
visibility = ["//visibility:private"],
deps = [":headers"],
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but I would like to get this down in writing, so that we can ensure everyone is on-board and don't relitigate this each time we add a new dependency. Can you do the PR to https://github.com/envoyproxy/envoy/blob/master/bazel/EXTERNAL_DEPS.md arguing to remove autoconf in rules_foreign_cc for Windows reasons for all deps that are core across platforms?

@dio
Copy link
Member Author

dio commented Jun 29, 2020

would it make sense to just try upstream this and see what happens?

Do you mean to submit it to the Chromium project? I can try. Probably, we need @alyssawilk's perspective and guidance on how to do that properly.

@htuch
Copy link
Member

htuch commented Jun 29, 2020

@dio yep. I think @alyssawilk and @danzh2010 probably could both give some advice on how to approach. Doesn't Chromium build for Windows anyway? I'm pretty sure I remember running Chrome on Windows last time I used it :-P

@dio
Copy link
Member Author

dio commented Jun 29, 2020

@htuch haha, yeah. However only clang-cl is supported on Windows, see https://crbug.com/988071. We're doing MSVC. I'm a bit scared. 😶

@htuch
Copy link
Member

htuch commented Jun 29, 2020

@dio maybe just start by opening a Chromium issue and see if it's not too controversial?

@stale
Copy link

stale bot commented Jul 6, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Jul 6, 2020
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Jul 6, 2020
@dio
Copy link
Member Author

dio commented Jul 6, 2020

I'm trying to join the discussion here: https://bugs.chromium.org/p/chromium/issues/detail?id=1053958.

@htuch
Copy link
Member

htuch commented Jul 9, 2020

@dio I think if you can just tag the patch with a TODO and #11974 for now, we can move forward with this PR. Ideally we can undo the patch once we move to clang-cl.exe.

@dio
Copy link
Member Author

dio commented Jul 9, 2020

Thanks, @htuch! Updated the patch with TODO.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! Epic.

@htuch htuch merged commit 2d69e30 into envoyproxy:master Jul 9, 2020
org_unicode_icuuc = dict(
strip_prefix = "icu-release-64-2",
sha256 = "524960ac99d086cdb6988d2a92fc163436fd3c6ec0a84c475c6382fbf989be05",
urls = ["https://github.com/unicode-org/icu/archive/release-64-2.tar.gz"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dio is there any reason you picked this April 2019 rather than the latest https://github.com/unicode-org/icu/releases/tag/release-67-1 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BUILD file is inspired from Tensorflow project as known to be stable. Will try to update along the way. Thanks for the reminder! https://github.com/tensorflow/tensorflow/blob/master/third_party/icu/workspace.bzl. Tracked in here. #12015

rgs1 pushed a commit to rgs1/envoy that referenced this pull request Jul 16, 2020
After envoyproxy#11670, the CSRF filter started failing for us.

This change fixes 3 issues that were uncovered after moving
to gURL for parsing URLs:

1) the hostAndPort() utility method, in the CSRF filter, was
returning a string view of a stack variable.

2) the Origin header always includes the scheme, so let's ensure
this is illustrated in tests (which were missing this and passing
due to relaxed checks).

3) the Url::initialize method expects an absolute URL, something that
the CSRF filter wasn't complying with.

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
mattklein123 pushed a commit that referenced this pull request Jul 17, 2020
After #11670, the CSRF filter started failing for us.

This change fixes 3 issues that were uncovered after moving
to gURL for parsing URLs:

1) the hostAndPort() utility method, in the CSRF filter, was
returning a string view of a stack variable.

2) the Origin header always includes the scheme, so let's ensure
this is illustrated in tests (which were missing this and passing
due to relaxed checks).

3) the Url::initialize method expects an absolute URL, something that
the CSRF filter wasn't complying with.

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
KBaichoo pushed a commit to KBaichoo/envoy that referenced this pull request Jul 30, 2020
After envoyproxy#11670, the CSRF filter started failing for us.

This change fixes 3 issues that were uncovered after moving
to gURL for parsing URLs:

1) the hostAndPort() utility method, in the CSRF filter, was
returning a string view of a stack variable.

2) the Origin header always includes the scheme, so let's ensure
this is illustrated in tests (which were missing this and passing
due to relaxed checks).

3) the Url::initialize method expects an absolute URL, something that
the CSRF filter wasn't complying with.

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
scheler pushed a commit to scheler/envoy that referenced this pull request Aug 4, 2020
This replaces http_parser's URL parser with GURL.

Risk Level: Medium
Testing: Unit
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: scheler <santosh.cheler@appdynamics.com>
scheler pushed a commit to scheler/envoy that referenced this pull request Aug 4, 2020
After envoyproxy#11670, the CSRF filter started failing for us.

This change fixes 3 issues that were uncovered after moving
to gURL for parsing URLs:

1) the hostAndPort() utility method, in the CSRF filter, was
returning a string view of a stack variable.

2) the Origin header always includes the scheme, so let's ensure
this is illustrated in tests (which were missing this and passing
due to relaxed checks).

3) the Url::initialize method expects an absolute URL, something that
the CSRF filter wasn't complying with.

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
Signed-off-by: scheler <santosh.cheler@appdynamics.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants