Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: Use GURL as HTTP URL parser utility #11670

Merged
merged 14 commits into from
Jul 9, 2020
119 changes: 119 additions & 0 deletions bazel/external/googleurl.patch
Original file line number Diff line number Diff line change
@@ -0,0 +1,119 @@
# TODO(dio): Consider to remove this patch when we have the ability to compile the project using
# clang-cl. Tracked in https://github.com/envoyproxy/envoy/issues/11974.

diff --git a/base/compiler_specific.h b/base/compiler_specific.h
index 2962537..6193b56 100644
--- a/base/compiler_specific.h
+++ b/base/compiler_specific.h
@@ -7,10 +7,6 @@

#include "build/build_config.h"

-#if defined(COMPILER_MSVC) && !defined(__clang__)
-#error "Only clang-cl is supported on Windows, see https://crbug.com/988071"
-#endif
-
// Annotate a variable indicating it's ok if the variable is not used.
// (Typically used to silence a compiler warning when the assignment
// is important for some other reason.)
@@ -212,7 +208,9 @@
#endif
#endif

-#if defined(__clang__) && __has_attribute(uninitialized)
+#if defined(__clang__)
+#if defined(__has_attribute)
+#if __has_attribute(uninitialized)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we put /Za when compiling on Windows, MSVC's C4067 warning is treated as an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it interesting given that googletest itself supports msvc, but does not support clang-cl.

// Attribute "uninitialized" disables -ftrivial-auto-var-init=pattern for
// the specified variable.
// Library-wide alternative is
@@ -243,6 +241,8 @@
// E.g. platform, bot, benchmark or test name in patch description or next to
// the attribute.
#define STACK_UNINITIALIZED __attribute__((uninitialized))
+#endif
+#endif
#else
#define STACK_UNINITIALIZED
#endif
diff --git a/base/strings/BUILD b/base/strings/BUILD
index 7a06170..7c86a5f 100644
--- a/base/strings/BUILD
+++ b/base/strings/BUILD
@@ -6,23 +6,21 @@ load("//:build_config.bzl", "build_config")
cc_library(
name = "strings",
srcs = [
- "string16.cc",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string16 doesn't make sense to use on 2-byte wchar_t systems (e.g. Windows).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I trust this patch is operating entirely on ISO-8859-1 with utf-8 enablement? Should be no wide char anything, this isn't java.

"string_piece.cc",
"string_util.cc",
"string_util_constants.cc",
"utf_string_conversion_utils.cc",
"utf_string_conversions.cc",
- ],
+ ] + build_config.strings_srcs,
hdrs = [
"char_traits.h",
"string16.h",
"string_piece.h",
"string_piece_forward.h",
"string_util.h",
- "string_util_posix.h",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch: What's the rationale for this patch to grow larger?
@dio: This is for accommodating windows with msvc compiler that we have.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I'm still not super happy with the size of this patch. @wrowe @sunjayBhatia do you have any thoughts on what we can do to avoid having to hand craft this in Envoy? If @dio merges, would you folks be able to followup with upstream to eliminate this patch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put this review on full pause until we have 11107 merged, and rebased? That will pick up //test/... and we'll be in a better position to give this our full attention, since regressions are taking up most of our free cycles for the past days.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. I leave it to @htuch to decide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can wait for #11107.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So today we should see 11107 merged. We did go ahead this week and build dio's branch locally with //test/... on Windows, and didn't observe any significant or outright failures, although there are several compilation emits that should be addressed before we merge this to master. This will become apparent to all when it is rebuilt after 11107.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make things very clear, we are patching the deliberate exclusion of one particular architecture, which served no purpose and discouraged development by any interested contributor. I don't know how a patch to revert such a silly commit would be received at the upstream community, so we can make no promises how long a patch will have to live at envoy, @htuch

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#11107 is merged and this PR has picked it up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. I'm wondering if we can get some agreement on the long term ownership of this patch. From my perspective, a patch this size is not sustainable in Envoy and that can't be the long term strategy; this will impact our velocity and add friction to upgrading to new versions of GURL in response to security issues. OTOH, I'm eager to move forward and we can accept a patch in the interim.

@dio would it make sense to just try upstream this and see what happens?
@wrowe would you folks be able to take over this patch and long term efforts to shrink it down via tactical upstreaming if this can't be easily done?

"utf_string_conversion_utils.h",
"utf_string_conversions.h",
- ],
+ ] + build_config.strings_hdrs,
copts = build_config.default_copts,
visibility = ["//visibility:public"],
deps = [
diff --git a/build_config.bzl b/build_config.bzl
index d5fca65..fc0d7e5 100644
--- a/build_config/build_config.bzl
+++ b/build_config/build_config.bzl
@@ -1,8 +1,25 @@
-_default_copts = [
- "-std=c++14",
- "-fno-strict-aliasing",
-]
+_default_copts = select({
+ "@envoy//bazel:windows_x86_64": [
+ "/std:c++14",
+ ],
+ "//conditions:default": [
+ "-std=c++14",
+ "-fno-strict-aliasing",
+ ],
+})
+
+_strings_srcs = select({
+ "@envoy//bazel:windows_x86_64": [],
+ "//conditions:default": ["string16.cc"],
+})
+
+_strings_hdrs = select({
+ "@envoy//bazel:windows_x86_64": ["string_util_win.h"],
+ "//conditions:default": ["string_util_posix.h"],
+})

build_config = struct(
default_copts = _default_copts,
+ strings_srcs = _strings_srcs,
+ strings_hdrs = _strings_hdrs,
)
diff --git a/url/BUILD b/url/BUILD
index 0126bdc..5d1a171 100644
--- a/url/BUILD
+++ b/url/BUILD
@@ -43,11 +43,11 @@ cc_library(
"url_util.h",
],
copts = build_config.default_copts,
- linkopts = ["-licuuc"],
htuch marked this conversation as resolved.
Show resolved Hide resolved
visibility = ["//visibility:public"],
deps = [
"//base",
"//base/strings",
"//polyfills",
+ "@org_unicode_icuuc//:common",
],
)
59 changes: 59 additions & 0 deletions bazel/external/icuuc.BUILD
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
load("@rules_cc//cc:defs.bzl", "cc_library")

licenses(["notice"]) # Apache 2

exports_files([
"icu4c/LICENSE",
"icu4j/main/shared/licenses/LICENSE",
])

icuuc_copts = [
"-DU_STATIC_IMPLEMENTATION",
"-DU_COMMON_IMPLEMENTATION",
"-DU_HAVE_STD_ATOMICS",
] + select({
"@envoy//bazel:apple": [
"-Wno-shorten-64-to-32",
"-Wno-unused-variable",
],
"@envoy//bazel:windows_x86_64": [
"/utf-8",
"/DLOCALE_ALLOW_NEUTRAL_NAMES=0",
],
# TODO(dio): Add "@envoy//bazel:android" when we have it.
# "@envoy//bazel:android": [
# "-fdata-sections",
# "-DU_HAVE_NL_LANGINFO_CODESET=0",
# "-Wno-deprecated-declarations",
# ],
"//conditions:default": [],
})

cc_library(
name = "headers",
hdrs = glob(["icu4c/source/common/unicode/*.h"]),
includes = ["icu4c/source/common"],
visibility = ["//visibility:public"],
)

cc_library(
name = "common",
hdrs = glob(["icu4c/source/common/unicode/*.h"]),
includes = ["icu4c/source/common"],
visibility = ["//visibility:public"],
deps = [":icuuc"],
)

cc_library(
name = "icuuc",
srcs = glob([
"icu4c/source/common/*.c",
"icu4c/source/common/*.cpp",
"icu4c/source/stubdata/*.cpp",
]),
hdrs = glob(["icu4c/source/common/*.h"]),
copts = icuuc_copts,
tags = ["requires-rtti"],
visibility = ["//visibility:private"],
deps = [":headers"],
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch: This is an autoconf dependency, can we just use foreign_cc to avoid having to hand-maintain a bespoke BUILD file for the project?
@dio: I'm testing it, but it seems the build time is significantly increasing.
@htuch: Interesting, I wonder if it's doing something like building tests or additional libraries. Might be possible to control with some autoconf flags...
@dio: I believe I have turned off everything (that makes sense, like examples, tests, even tools). And it is tricky to force using msvc (when using configure it tends to use mingw and it is failed at configure stage, and I don't think we want to mix mingw and msvc).

Ref: #10599 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, what is an autoconf dependency? If one is introduced by this patch, that will break the contribution of all four engineers working on the Windows port by breaking master. The one I am familiar with is the opentracing dependency which we have excluded for now, but that only breaks tracing extensions. Breaking http/1 seems unwise. Otherwise the basic idea of this bazel BUILD hack looks entirely reasonable, and there is no expectation it would be fixed upstream, because we are cherry-picking a project for subcomponents that don't have any independent project management and don't appear to be supported stand-alone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with this BUILD (as it isn't a huge patch). In terms of being an autoconf dependency, I'm basically saying it should build under rules_foreign_cc, which knows how to build autoconf.

gperftools is an autoconf dep today that we build this way, see https://github.com/envoyproxy/envoy/blob/master/bazel/foreign_cc/BUILD#L13.

Are you saying in the future, we can't take any external dependency that is autoconf based?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, autoconf is gnu tooling and while it runs in tandem the pseudo-unix cygwin toolchain there is no Windows port, and it's likely to be a thorn in the side of other odd SDKs on mobile and such.

Most every project building on autoconf offers an alternative mechanism for some windows build, however well thought out and supported, or not. external/org_unicode_icuuc/icu4c/source/allinone/ is the Windows build logic by the icu4c maintainers.

Most projects are moving in the direction of supporting CMake for a much simpler cross-platform experience, even as they retain autoconf. This makes the build agnostic of the flavor of Visual Studio and even msvc vs llvm clang-cl.

For Envoy, maintaining a bazel BUILD file for this project makes perfect sense, since the lib objects all match the compiler and link choices of this project. We need to evaluate what the autoconf results produce in terms of source file toggles and changes, even if we avoid these long-term.

For upstream, propose a CMakeFiles.txt to cover their existing autoconf determinations. Our rules_foreign_cc support in bazel is much better with CMake than less-predictable autoconf output. This probably results in the most flexibility to the maintainers and would be our simplest solution short of this short-term BUILD maintenance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wrowe can you propose some amendment to https://github.com/envoyproxy/envoy/blob/master/bazel/EXTERNAL_DEPS.md based on this?

I don't think it sounds unreasonable, but we should take this through a review process and ensure folks are aligned, as it's a major project restriction.

How come gperftools isn't impacting you? Don't you use tcmalloc?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tcmalloc is disabled on Windows if you look at .bazelrc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gperftools as of 2015 was being ported to Windows using the tcmalloc_minimal implementation. We have have documented this exclusion for Windows both in .bazelrc and in bazel/foreign_cc/BUILD.

Until bazel test //test/... is largely building clean with all currently tagged "fails_on_windows", gperftools hasn't been and won't be a particularly high priority. The same is true for opentracing and all tracing extensions on Windows.

THE major difference between all that and this proposed enhancement is that the envoyproxy is really nothing without a url parser.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but I would like to get this down in writing, so that we can ensure everyone is on-board and don't relitigate this each time we add a new dependency. Can you do the PR to https://github.com/envoyproxy/envoy/blob/master/bazel/EXTERNAL_DEPS.md arguing to remove autoconf in rules_foreign_cc for Windows reasons for all deps that are core across platforms?

14 changes: 14 additions & 0 deletions bazel/repositories.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -197,6 +197,7 @@ def envoy_dependencies(skip_targets = []):
_repository_impl("bazel_compdb")
_repository_impl("envoy_build_tools")
_repository_impl("rules_cc")
_org_unicode_icuuc()

# Unconditional, since we use this only for compiler-agnostic fuzzing utils.
_org_llvm_releases_compiler_rt()
Expand Down Expand Up @@ -699,6 +700,8 @@ def _com_googlesource_quiche():
def _com_googlesource_googleurl():
_repository_impl(
name = "com_googlesource_googleurl",
patches = ["@envoy//bazel/external:googleurl.patch"],
patch_args = ["-p1"],
)
native.bind(
name = "googleurl",
Expand Down Expand Up @@ -858,6 +861,17 @@ filegroup(
**_get_location("kafka_python_client")
)

def _org_unicode_icuuc():
_repository_impl(
name = "org_unicode_icuuc",
build_file = "@envoy//bazel/external:icuuc.BUILD",
# TODO(dio): Consider patching udata when we need to embed some data.
)
native.bind(
name = "icuuc",
actual = "@org_unicode_icuuc//:common",
)

def _foreign_cc_dependencies():
_repository_impl("rules_foreign_cc")

Expand Down
7 changes: 7 additions & 0 deletions bazel/repository_locations.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -477,4 +477,11 @@ DEPENDENCY_REPOSITORIES = dict(
urls = ["https://github.com/dpkp/kafka-python/archive/2.0.0.tar.gz"],
use_category = ["test"],
),
org_unicode_icuuc = dict(
strip_prefix = "icu-release-64-2",
sha256 = "524960ac99d086cdb6988d2a92fc163436fd3c6ec0a84c475c6382fbf989be05",
urls = ["https://github.com/unicode-org/icu/archive/release-64-2.tar.gz"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dio is there any reason you picked this April 2019 rather than the latest https://github.com/unicode-org/icu/releases/tag/release-67-1 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BUILD file is inspired from Tensorflow project as known to be stable. Will try to update along the way. Thanks for the reminder! https://github.com/tensorflow/tensorflow/blob/master/third_party/icu/workspace.bzl. Tracked in here. #12015

use_category = ["dataplane"],
cpe = "cpe:2.3:a:icu-project:international_components_for_unicode",
),
)
15 changes: 14 additions & 1 deletion source/common/http/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -338,7 +338,6 @@ envoy_cc_library(
hdrs = ["utility.h"],
external_deps = [
"abseil_optional",
"http_parser",
"nghttp2",
],
deps = [
Expand Down Expand Up @@ -428,3 +427,17 @@ envoy_cc_library(
"//source/common/common:assert_lib",
],
)

envoy_cc_library(
name = "url_utility_lib",
srcs = ["url_utility.cc"],
hdrs = ["url_utility.h"],
external_deps = [
"googleurl",
],
deps = [
"//source/common/common:assert_lib",
"//source/common/common:empty_string",
"//source/common/common:utility_lib",
],
)
1 change: 1 addition & 0 deletions source/common/http/http1/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@ envoy_cc_library(
"//source/common/http:header_utility_lib",
"//source/common/http:headers_lib",
"//source/common/http:status_lib",
"//source/common/http:url_utility_lib",
"//source/common/http:utility_lib",
"//source/common/runtime:runtime_features_lib",
"@envoy_api//envoy/config/core/v3:pkg_cc_proto",
Expand Down
1 change: 1 addition & 0 deletions source/common/http/http1/codec_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
#include "common/http/header_utility.h"
#include "common/http/headers.h"
#include "common/http/http1/header_formatter.h"
#include "common/http/url_utility.h"
#include "common/http/utility.h"
#include "common/runtime/runtime_features.h"

Expand Down
95 changes: 95 additions & 0 deletions source/common/http/url_utility.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
#include "common/http/url_utility.h"

#include <sys/types.h>

#include <iostream>
#include <string>

#include "common/common/assert.h"
#include "common/common/empty_string.h"
#include "common/common/utility.h"

#include "absl/strings/numbers.h"
#include "absl/strings/str_cat.h"

namespace Envoy {
namespace Http {
namespace Utility {

bool Url::initialize(absl::string_view absolute_url, bool is_connect) {
htuch marked this conversation as resolved.
Show resolved Hide resolved
// TODO(dio): When we have access to base::StringPiece, probably we can convert absolute_url to
// that instead.
GURL parsed(std::string{absolute_url});
if (is_connect) {
return initializeForConnect(std::move(parsed));
}

// TODO(dio): Check if we need to accommodate to strictly validate only http(s) AND ws(s) schemes.
// Currently, we only accept http(s).
if (!parsed.is_valid() || !parsed.SchemeIsHTTPOrHTTPS()) {
return false;
}

scheme_ = parsed.scheme();

// Only non-default ports will be rendered as part of host_and_port_. For example,
// http://www.host.com:80 has port component (i.e. 80). However, since 80 is a default port for
// http scheme, host_and_port_ will be rendered as www.host.com (without port). The same case with
// https scheme (with port 443) as well.
host_and_port_ =
absl::StrCat(parsed.host(), parsed.has_port() ? ":" : EMPTY_STRING, parsed.port());

const int port = parsed.EffectiveIntPort();
if (port <= 0 || port > std::numeric_limits<uint16_t>::max()) {
return false;
}
port_ = static_cast<uint16_t>(port);

// RFC allows the absolute URI to not end in "/", but the absolute path form must start with "/".
path_and_query_params_ = parsed.PathForRequest();
if (parsed.has_ref()) {
absl::StrAppend(&path_and_query_params_, "#", parsed.ref());
}

return true;
}

bool Url::initializeForConnect(GURL&& url) {
// CONNECT requests can only contain "hostname:port"
// https://github.com/nodejs/http-parser/blob/d9275da4650fd1133ddc96480df32a9efe4b059b/http_parser.c#L2503-L2506.
if (!url.is_valid() || url.IsStandard()) {
return false;
}

const auto& parsed = url.parsed_for_possibly_invalid_spec();
// The parsed.scheme contains the URL's hostname (stored by GURL). While host and port have -1
// as its length.
if (parsed.scheme.len <= 0 || parsed.host.len > 0 || parsed.port.len > 0) {
return false;
}

host_and_port_ = url.possibly_invalid_spec();
const auto& parts = StringUtil::splitToken(host_and_port_, ":", /*keep_empty_string=*/true,
/*trim_whitespace=*/false);
if (parts.size() != 2 || static_cast<size_t>(parsed.scheme.len) != parts.at(0).size() ||
!validPortForConnect(parts.at(1))) {
return false;
}

return true;
}
Comment on lines +57 to +80
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alyssawilk I'm not sure if this is the right thing to do for making sure CONNECT works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm a bit confused why EXPECT_FALSE(url.initialize("foo.com:65536", is_connect)); fails. Could be I'm insufficiently caffeinated though :-P

Also as long as I'm looking, do we think this should be runtime guarded, even with a shorter duration? I could imagine this having behavioral changes in weird corner cases though ideally it should be good

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

65536 is greater than std::numeric_limits<uint16_t>::max(). Not in 1-65535 valid port range.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: Runtime guard. SGTM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha. maybe comments there but yeah, otherwise this looks fine.
If folks prefer we could also split out connect validation into its own class since it's a weird one-off but I suspect it'll be useful to have a consistent API and this how the old class worked so LGTM!


bool Url::validPortForConnect(absl::string_view port_string) {
int port;
const bool valid = absl::SimpleAtoi(port_string, &port);
// Only a port value in valid range (1-65535) is allowed.
if (!valid || port <= 0 || port > std::numeric_limits<uint16_t>::max()) {
return false;
}
port_ = static_cast<uint16_t>(port);
return true;
}

} // namespace Utility
} // namespace Http
} // namespace Envoy
Loading