-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
http: Use GURL as HTTP URL parser utility #11670
Changes from all commits
ab13d65
d1eface
9f24a3d
e665abf
74f39e1
9faa65e
69f8761
4d375a1
d3b56ca
10fd07c
b922a45
88fc88c
e287563
15f008c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
# TODO(dio): Consider to remove this patch when we have the ability to compile the project using | ||
# clang-cl. Tracked in https://github.com/envoyproxy/envoy/issues/11974. | ||
|
||
diff --git a/base/compiler_specific.h b/base/compiler_specific.h | ||
index 2962537..6193b56 100644 | ||
--- a/base/compiler_specific.h | ||
+++ b/base/compiler_specific.h | ||
@@ -7,10 +7,6 @@ | ||
|
||
#include "build/build_config.h" | ||
|
||
-#if defined(COMPILER_MSVC) && !defined(__clang__) | ||
-#error "Only clang-cl is supported on Windows, see https://crbug.com/988071" | ||
-#endif | ||
- | ||
// Annotate a variable indicating it's ok if the variable is not used. | ||
// (Typically used to silence a compiler warning when the assignment | ||
// is important for some other reason.) | ||
@@ -212,7 +208,9 @@ | ||
#endif | ||
#endif | ||
|
||
-#if defined(__clang__) && __has_attribute(uninitialized) | ||
+#if defined(__clang__) | ||
+#if defined(__has_attribute) | ||
+#if __has_attribute(uninitialized) | ||
// Attribute "uninitialized" disables -ftrivial-auto-var-init=pattern for | ||
// the specified variable. | ||
// Library-wide alternative is | ||
@@ -243,6 +241,8 @@ | ||
// E.g. platform, bot, benchmark or test name in patch description or next to | ||
// the attribute. | ||
#define STACK_UNINITIALIZED __attribute__((uninitialized)) | ||
+#endif | ||
+#endif | ||
#else | ||
#define STACK_UNINITIALIZED | ||
#endif | ||
diff --git a/base/strings/BUILD b/base/strings/BUILD | ||
index 7a06170..7c86a5f 100644 | ||
--- a/base/strings/BUILD | ||
+++ b/base/strings/BUILD | ||
@@ -6,23 +6,21 @@ load("//:build_config.bzl", "build_config") | ||
cc_library( | ||
name = "strings", | ||
srcs = [ | ||
- "string16.cc", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I trust this patch is operating entirely on ISO-8859-1 with utf-8 enablement? Should be no wide char anything, this isn't java. |
||
"string_piece.cc", | ||
"string_util.cc", | ||
"string_util_constants.cc", | ||
"utf_string_conversion_utils.cc", | ||
"utf_string_conversions.cc", | ||
- ], | ||
+ ] + build_config.strings_srcs, | ||
hdrs = [ | ||
"char_traits.h", | ||
"string16.h", | ||
"string_piece.h", | ||
"string_piece_forward.h", | ||
"string_util.h", | ||
- "string_util_posix.h", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So, I'm still not super happy with the size of this patch. @wrowe @sunjayBhatia do you have any thoughts on what we can do to avoid having to hand craft this in Envoy? If @dio merges, would you folks be able to followup with upstream to eliminate this patch? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we put this review on full pause until we have 11107 merged, and rebased? That will pick up //test/... and we'll be in a better position to give this our full attention, since regressions are taking up most of our free cycles for the past days. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. SGTM. I leave it to @htuch to decide. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, we can wait for #11107. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So today we should see 11107 merged. We did go ahead this week and build dio's branch locally with //test/... on Windows, and didn't observe any significant or outright failures, although there are several compilation emits that should be addressed before we merge this to master. This will become apparent to all when it is rebuilt after 11107. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just to make things very clear, we are patching the deliberate exclusion of one particular architecture, which served no purpose and discouraged development by any interested contributor. I don't know how a patch to revert such a silly commit would be received at the upstream community, so we can make no promises how long a patch will have to live at envoy, @htuch There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. #11107 is merged and this PR has picked it up. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Great. I'm wondering if we can get some agreement on the long term ownership of this patch. From my perspective, a patch this size is not sustainable in Envoy and that can't be the long term strategy; this will impact our velocity and add friction to upgrading to new versions of GURL in response to security issues. OTOH, I'm eager to move forward and we can accept a patch in the interim. @dio would it make sense to just try upstream this and see what happens? |
||
"utf_string_conversion_utils.h", | ||
"utf_string_conversions.h", | ||
- ], | ||
+ ] + build_config.strings_hdrs, | ||
copts = build_config.default_copts, | ||
visibility = ["//visibility:public"], | ||
deps = [ | ||
diff --git a/build_config.bzl b/build_config.bzl | ||
index d5fca65..fc0d7e5 100644 | ||
--- a/build_config/build_config.bzl | ||
+++ b/build_config/build_config.bzl | ||
@@ -1,8 +1,25 @@ | ||
-_default_copts = [ | ||
- "-std=c++14", | ||
- "-fno-strict-aliasing", | ||
-] | ||
+_default_copts = select({ | ||
+ "@envoy//bazel:windows_x86_64": [ | ||
+ "/std:c++14", | ||
+ ], | ||
+ "//conditions:default": [ | ||
+ "-std=c++14", | ||
+ "-fno-strict-aliasing", | ||
+ ], | ||
+}) | ||
+ | ||
+_strings_srcs = select({ | ||
+ "@envoy//bazel:windows_x86_64": [], | ||
+ "//conditions:default": ["string16.cc"], | ||
+}) | ||
+ | ||
+_strings_hdrs = select({ | ||
+ "@envoy//bazel:windows_x86_64": ["string_util_win.h"], | ||
+ "//conditions:default": ["string_util_posix.h"], | ||
+}) | ||
|
||
build_config = struct( | ||
default_copts = _default_copts, | ||
+ strings_srcs = _strings_srcs, | ||
+ strings_hdrs = _strings_hdrs, | ||
) | ||
diff --git a/url/BUILD b/url/BUILD | ||
index 0126bdc..5d1a171 100644 | ||
--- a/url/BUILD | ||
+++ b/url/BUILD | ||
@@ -43,11 +43,11 @@ cc_library( | ||
"url_util.h", | ||
], | ||
copts = build_config.default_copts, | ||
- linkopts = ["-licuuc"], | ||
htuch marked this conversation as resolved.
Show resolved
Hide resolved
|
||
visibility = ["//visibility:public"], | ||
deps = [ | ||
"//base", | ||
"//base/strings", | ||
"//polyfills", | ||
+ "@org_unicode_icuuc//:common", | ||
], | ||
) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
load("@rules_cc//cc:defs.bzl", "cc_library") | ||
|
||
licenses(["notice"]) # Apache 2 | ||
|
||
exports_files([ | ||
"icu4c/LICENSE", | ||
"icu4j/main/shared/licenses/LICENSE", | ||
]) | ||
|
||
icuuc_copts = [ | ||
"-DU_STATIC_IMPLEMENTATION", | ||
"-DU_COMMON_IMPLEMENTATION", | ||
"-DU_HAVE_STD_ATOMICS", | ||
] + select({ | ||
"@envoy//bazel:apple": [ | ||
"-Wno-shorten-64-to-32", | ||
"-Wno-unused-variable", | ||
], | ||
"@envoy//bazel:windows_x86_64": [ | ||
"/utf-8", | ||
"/DLOCALE_ALLOW_NEUTRAL_NAMES=0", | ||
], | ||
# TODO(dio): Add "@envoy//bazel:android" when we have it. | ||
# "@envoy//bazel:android": [ | ||
# "-fdata-sections", | ||
# "-DU_HAVE_NL_LANGINFO_CODESET=0", | ||
# "-Wno-deprecated-declarations", | ||
# ], | ||
"//conditions:default": [], | ||
}) | ||
|
||
cc_library( | ||
name = "headers", | ||
hdrs = glob(["icu4c/source/common/unicode/*.h"]), | ||
includes = ["icu4c/source/common"], | ||
visibility = ["//visibility:public"], | ||
) | ||
|
||
cc_library( | ||
name = "common", | ||
hdrs = glob(["icu4c/source/common/unicode/*.h"]), | ||
includes = ["icu4c/source/common"], | ||
visibility = ["//visibility:public"], | ||
deps = [":icuuc"], | ||
) | ||
|
||
cc_library( | ||
name = "icuuc", | ||
srcs = glob([ | ||
"icu4c/source/common/*.c", | ||
"icu4c/source/common/*.cpp", | ||
"icu4c/source/stubdata/*.cpp", | ||
]), | ||
hdrs = glob(["icu4c/source/common/*.h"]), | ||
copts = icuuc_copts, | ||
tags = ["requires-rtti"], | ||
visibility = ["//visibility:private"], | ||
deps = [":headers"], | ||
) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @htuch: This is an autoconf dependency, can we just use Ref: #10599 (comment) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm confused, what is an autoconf dependency? If one is introduced by this patch, that will break the contribution of all four engineers working on the Windows port by breaking master. The one I am familiar with is the opentracing dependency which we have excluded for now, but that only breaks tracing extensions. Breaking http/1 seems unwise. Otherwise the basic idea of this bazel BUILD hack looks entirely reasonable, and there is no expectation it would be fixed upstream, because we are cherry-picking a project for subcomponents that don't have any independent project management and don't appear to be supported stand-alone. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm OK with this BUILD (as it isn't a huge patch). In terms of being an autoconf dependency, I'm basically saying it should build under
Are you saying in the future, we can't take any external dependency that is autoconf based? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Correct, autoconf is gnu tooling and while it runs in tandem the pseudo-unix cygwin toolchain there is no Windows port, and it's likely to be a thorn in the side of other odd SDKs on mobile and such. Most every project building on autoconf offers an alternative mechanism for some windows build, however well thought out and supported, or not. external/org_unicode_icuuc/icu4c/source/allinone/ is the Windows build logic by the icu4c maintainers. Most projects are moving in the direction of supporting CMake for a much simpler cross-platform experience, even as they retain autoconf. This makes the build agnostic of the flavor of Visual Studio and even msvc vs llvm clang-cl. For Envoy, maintaining a bazel BUILD file for this project makes perfect sense, since the lib objects all match the compiler and link choices of this project. We need to evaluate what the autoconf results produce in terms of source file toggles and changes, even if we avoid these long-term. For upstream, propose a CMakeFiles.txt to cover their existing autoconf determinations. Our rules_foreign_cc support in bazel is much better with CMake than less-predictable autoconf output. This probably results in the most flexibility to the maintainers and would be our simplest solution short of this short-term BUILD maintenance. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @wrowe can you propose some amendment to https://github.com/envoyproxy/envoy/blob/master/bazel/EXTERNAL_DEPS.md based on this? I don't think it sounds unreasonable, but we should take this through a review process and ensure folks are aligned, as it's a major project restriction. How come gperftools isn't impacting you? Don't you use tcmalloc? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. tcmalloc is disabled on Windows if you look at .bazelrc There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. gperftools as of 2015 was being ported to Windows using the tcmalloc_minimal implementation. We have have documented this exclusion for Windows both in .bazelrc and in bazel/foreign_cc/BUILD. Until bazel test //test/... is largely building clean with all currently tagged "fails_on_windows", gperftools hasn't been and won't be a particularly high priority. The same is true for opentracing and all tracing extensions on Windows. THE major difference between all that and this proposed enhancement is that the envoyproxy is really nothing without a url parser. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is fine, but I would like to get this down in writing, so that we can ensure everyone is on-board and don't relitigate this each time we add a new dependency. Can you do the PR to https://github.com/envoyproxy/envoy/blob/master/bazel/EXTERNAL_DEPS.md arguing to remove autoconf in rules_foreign_cc for Windows reasons for all deps that are core across platforms? |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -477,4 +477,11 @@ DEPENDENCY_REPOSITORIES = dict( | |
urls = ["https://github.com/dpkp/kafka-python/archive/2.0.0.tar.gz"], | ||
use_category = ["test"], | ||
), | ||
org_unicode_icuuc = dict( | ||
strip_prefix = "icu-release-64-2", | ||
sha256 = "524960ac99d086cdb6988d2a92fc163436fd3c6ec0a84c475c6382fbf989be05", | ||
urls = ["https://github.com/unicode-org/icu/archive/release-64-2.tar.gz"], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @dio is there any reason you picked this April 2019 rather than the latest https://github.com/unicode-org/icu/releases/tag/release-67-1 ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The BUILD file is inspired from Tensorflow project as known to be stable. Will try to update along the way. Thanks for the reminder! https://github.com/tensorflow/tensorflow/blob/master/third_party/icu/workspace.bzl. Tracked in here. #12015 |
||
use_category = ["dataplane"], | ||
cpe = "cpe:2.3:a:icu-project:international_components_for_unicode", | ||
), | ||
) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
#include "common/http/url_utility.h" | ||
|
||
#include <sys/types.h> | ||
|
||
#include <iostream> | ||
#include <string> | ||
|
||
#include "common/common/assert.h" | ||
#include "common/common/empty_string.h" | ||
#include "common/common/utility.h" | ||
|
||
#include "absl/strings/numbers.h" | ||
#include "absl/strings/str_cat.h" | ||
|
||
namespace Envoy { | ||
namespace Http { | ||
namespace Utility { | ||
|
||
bool Url::initialize(absl::string_view absolute_url, bool is_connect) { | ||
htuch marked this conversation as resolved.
Show resolved
Hide resolved
|
||
// TODO(dio): When we have access to base::StringPiece, probably we can convert absolute_url to | ||
// that instead. | ||
GURL parsed(std::string{absolute_url}); | ||
if (is_connect) { | ||
return initializeForConnect(std::move(parsed)); | ||
} | ||
|
||
// TODO(dio): Check if we need to accommodate to strictly validate only http(s) AND ws(s) schemes. | ||
// Currently, we only accept http(s). | ||
if (!parsed.is_valid() || !parsed.SchemeIsHTTPOrHTTPS()) { | ||
return false; | ||
} | ||
|
||
scheme_ = parsed.scheme(); | ||
|
||
// Only non-default ports will be rendered as part of host_and_port_. For example, | ||
// http://www.host.com:80 has port component (i.e. 80). However, since 80 is a default port for | ||
// http scheme, host_and_port_ will be rendered as www.host.com (without port). The same case with | ||
// https scheme (with port 443) as well. | ||
host_and_port_ = | ||
absl::StrCat(parsed.host(), parsed.has_port() ? ":" : EMPTY_STRING, parsed.port()); | ||
|
||
const int port = parsed.EffectiveIntPort(); | ||
if (port <= 0 || port > std::numeric_limits<uint16_t>::max()) { | ||
return false; | ||
} | ||
port_ = static_cast<uint16_t>(port); | ||
|
||
// RFC allows the absolute URI to not end in "/", but the absolute path form must start with "/". | ||
path_and_query_params_ = parsed.PathForRequest(); | ||
if (parsed.has_ref()) { | ||
absl::StrAppend(&path_and_query_params_, "#", parsed.ref()); | ||
} | ||
|
||
return true; | ||
} | ||
|
||
bool Url::initializeForConnect(GURL&& url) { | ||
// CONNECT requests can only contain "hostname:port" | ||
// https://github.com/nodejs/http-parser/blob/d9275da4650fd1133ddc96480df32a9efe4b059b/http_parser.c#L2503-L2506. | ||
if (!url.is_valid() || url.IsStandard()) { | ||
return false; | ||
} | ||
|
||
const auto& parsed = url.parsed_for_possibly_invalid_spec(); | ||
// The parsed.scheme contains the URL's hostname (stored by GURL). While host and port have -1 | ||
// as its length. | ||
if (parsed.scheme.len <= 0 || parsed.host.len > 0 || parsed.port.len > 0) { | ||
return false; | ||
} | ||
|
||
host_and_port_ = url.possibly_invalid_spec(); | ||
const auto& parts = StringUtil::splitToken(host_and_port_, ":", /*keep_empty_string=*/true, | ||
/*trim_whitespace=*/false); | ||
if (parts.size() != 2 || static_cast<size_t>(parsed.scheme.len) != parts.at(0).size() || | ||
!validPortForConnect(parts.at(1))) { | ||
return false; | ||
} | ||
|
||
return true; | ||
} | ||
Comment on lines
+57
to
+80
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @alyssawilk I'm not sure if this is the right thing to do for making sure There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess I'm a bit confused why EXPECT_FALSE(url.initialize("foo.com:65536", is_connect)); fails. Could be I'm insufficiently caffeinated though :-P Also as long as I'm looking, do we think this should be runtime guarded, even with a shorter duration? I could imagine this having behavioral changes in weird corner cases though ideally it should be good There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 65536 is greater than There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. re: Runtime guard. SGTM. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, gotcha. maybe comments there but yeah, otherwise this looks fine. |
||
|
||
bool Url::validPortForConnect(absl::string_view port_string) { | ||
int port; | ||
const bool valid = absl::SimpleAtoi(port_string, &port); | ||
// Only a port value in valid range (1-65535) is allowed. | ||
if (!valid || port <= 0 || port > std::numeric_limits<uint16_t>::max()) { | ||
return false; | ||
} | ||
port_ = static_cast<uint16_t>(port); | ||
return true; | ||
} | ||
|
||
} // namespace Utility | ||
} // namespace Http | ||
} // namespace Envoy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we put
/Za
when compiling on Windows, MSVC's C4067 warning is treated as an error.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it interesting given that googletest itself supports msvc, but does not support clang-cl.