Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http, utility: Fix unicode URL parsing #12324

Closed
wants to merge 10 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 30 additions & 13 deletions bazel/external/googleurl.patch
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
# clang-cl. Tracked in https://github.com/envoyproxy/envoy/issues/11974.

diff --git a/base/compiler_specific.h b/base/compiler_specific.h
index 2962537..6193b56 100644
index 0cd36dc..8fc4db0 100644
--- a/base/compiler_specific.h
+++ b/base/compiler_specific.h
@@ -7,10 +7,6 @@
Expand All @@ -16,7 +16,7 @@ index 2962537..6193b56 100644
// Annotate a variable indicating it's ok if the variable is not used.
// (Typically used to silence a compiler warning when the assignment
// is important for some other reason.)
@@ -212,7 +208,9 @@
@@ -226,7 +222,9 @@
#endif
#endif

Expand All @@ -27,7 +27,7 @@ index 2962537..6193b56 100644
// Attribute "uninitialized" disables -ftrivial-auto-var-init=pattern for
// the specified variable.
// Library-wide alternative is
@@ -243,6 +241,8 @@
@@ -257,6 +255,8 @@
// E.g. platform, bot, benchmark or test name in patch description or next to
// the attribute.
#define STACK_UNINITIALIZED __attribute__((uninitialized))
Expand All @@ -37,10 +37,10 @@ index 2962537..6193b56 100644
#define STACK_UNINITIALIZED
#endif
diff --git a/base/strings/BUILD b/base/strings/BUILD
index 7a06170..7c86a5f 100644
index 604cc81..87fe7ce 100644
--- a/base/strings/BUILD
+++ b/base/strings/BUILD
@@ -6,23 +6,21 @@ load("//:build_config.bzl", "build_config")
@@ -6,13 +6,12 @@ load("//build_config:build_config.bzl", "build_config")
cc_library(
name = "strings",
srcs = [
Expand All @@ -55,9 +55,10 @@ index 7a06170..7c86a5f 100644
hdrs = [
"char_traits.h",
"string16.h",
"string_piece.h",
@@ -20,10 +19,9 @@ cc_library(
"string_piece_forward.h",
"string_util.h",
"string_util_internal.h",
- "string_util_posix.h",
"utf_string_conversion_utils.h",
"utf_string_conversions.h",
Expand All @@ -66,11 +67,11 @@ index 7a06170..7c86a5f 100644
copts = build_config.default_copts,
visibility = ["//visibility:public"],
deps = [
diff --git a/build_config.bzl b/build_config.bzl
index d5fca65..fc0d7e5 100644
diff --git a/build_config/build_config.bzl b/build_config/build_config.bzl
index d5fca65..dcada36 100644
--- a/build_config/build_config.bzl
+++ b/build_config/build_config.bzl
@@ -1,8 +1,25 @@
@@ -1,8 +1,32 @@
-_default_copts = [
- "-std=c++14",
- "-fno-strict-aliasing",
Expand All @@ -93,18 +94,34 @@ index d5fca65..fc0d7e5 100644
+_strings_hdrs = select({
+ "@envoy//bazel:windows_x86_64": ["string_util_win.h"],
+ "//conditions:default": ["string_util_posix.h"],
+})
+
+_icuuc_deps = select({
+ "@envoy//bazel:apple": ["@org_unicode_icuuc//:common"],
+ "@envoy//bazel:windows_x86_64": ["@org_unicode_icuuc//:common"],
+ "//conditions:default": ["@envoy//bazel/foreign_cc:icuuc"],
+})

build_config = struct(
default_copts = _default_copts,
+ strings_srcs = _strings_srcs,
+ strings_hdrs = _strings_hdrs,
+ icuuc_deps = _icuuc_deps,
)
diff --git a/url/BUILD b/url/BUILD
index 0126bdc..5d1a171 100644
index e9081d9..9d5929f 100644
--- a/url/BUILD
+++ b/url/BUILD
@@ -43,11 +43,11 @@ cc_library(
@@ -1,6 +1,8 @@
# Copyright 2019 The Chromium Authors. All rights reserved.
# Use of this source code is governed by a BSD-style license that can be
# found in the LICENSE file.
+load("@rules_cc//cc:defs.bzl", "cc_library")
+
load("//build_config:build_config.bzl", "build_config")

cc_library(
@@ -43,11 +45,10 @@ cc_library(
"url_util.h",
],
copts = build_config.default_copts,
Expand All @@ -114,6 +131,6 @@ index 0126bdc..5d1a171 100644
"//base",
"//base/strings",
"//polyfills",
+ "@org_unicode_icuuc//:common",
],
- ],
+ ] + build_config.icuuc_deps
)
20 changes: 9 additions & 11 deletions bazel/external/icuuc.BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,7 @@ load("@rules_cc//cc:defs.bzl", "cc_library")
licenses(["notice"]) # Apache 2

exports_files([
"icu4c/LICENSE",
"icu4j/main/shared/licenses/LICENSE",
"LICENSE",
])

icuuc_copts = [
Expand All @@ -31,29 +30,28 @@ icuuc_copts = [

cc_library(
name = "headers",
hdrs = glob(["icu4c/source/common/unicode/*.h"]),
includes = ["icu4c/source/common"],
hdrs = glob(["source/common/unicode/*.h"]),
includes = ["source/common"],
visibility = ["//visibility:public"],
)

cc_library(
name = "common",
hdrs = glob(["icu4c/source/common/unicode/*.h"]),
includes = ["icu4c/source/common"],
hdrs = glob(["source/common/unicode/*.h"]),
includes = ["source/common"],
visibility = ["//visibility:public"],
deps = [":icuuc"],
)

cc_library(
name = "icuuc",
srcs = glob([
"icu4c/source/common/*.c",
"icu4c/source/common/*.cpp",
"icu4c/source/stubdata/*.cpp",
"source/common/**/*.c",
"source/common/**/*.cpp",
"source/stubdata/*.cpp",
]),
hdrs = glob(["icu4c/source/common/*.h"]),
hdrs = glob(["source/common/*.h"]),
copts = icuuc_copts,
tags = ["requires-rtti"],
visibility = ["//visibility:private"],
deps = [":headers"],
)
40 changes: 40 additions & 0 deletions bazel/foreign_cc/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -236,3 +236,43 @@ envoy_cmake_external(
"//conditions:default": ["libz.a"],
}),
)

configure_make(
name = "icuuc",
configure_command = "source/configure",
configure_env_vars = select({
"//bazel:windows_x86_64": {
"CXXFLAGS": "/utf-8 /DLOCALE_ALLOW_NEUTRAL_NAMES=0 /std:c++11",
},
"//bazel:apple": {
"CXXFLAGS": "-Wno-shorten-64-to-32 -Wno-unused-variable -std=c++11 -DU_STATIC_IMPLEMENTATION -DU_HAVE_STD_ATOMICS",
"AR": "/usr/bin/ar",
},
"//conditions:default": {
"CXXFLAGS": "-std=c++11 -DU_STATIC_IMPLEMENTATION -DU_HAVE_STD_ATOMICS",
},
}),
configure_options = [
"--enable-option-checking",
"--enable-static",
"--enable-tools",
"--disable-shared",
"--disable-dyload",
"--disable-extras",
"--disable-plugins",
"--disable-tests",
"--disable-samples",
"--with-data-packaging=static",
],
lib_source = "@org_unicode_icuuc_default//:all",
static_libraries = select({
"//bazel:windows_x86_64": [
"libicuuc.lib",
"libicudata.lib",
],
"//conditions:default": [
"libicuuc.a",
"libicudata.a",
],
}),
)
17 changes: 17 additions & 0 deletions bazel/foreign_cc/icuuc.patch
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
--- source/icudefs.mk.in
+++ source/icudefs.mk.in
@@ -117,7 +117,7 @@
CC = @CC@
CXX = @CXX@
AR = @AR@
-ARFLAGS = @ARFLAGS@ r
+ARFLAGS = @ARFLAGS@
RANLIB = @RANLIB@
COMPILE_LINK_ENVVAR = @COMPILE_LINK_ENVVAR@
UCLN_NO_AUTO_CLEANUP = @UCLN_NO_AUTO_CLEANUP@

--- source/config/mh-darwin
+++ source/config/mh-darwin
@@ -22,1 +22,1 @@
-ARFLAGS += -c
+ARFLAGS += -crs
12 changes: 8 additions & 4 deletions bazel/repositories.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -862,15 +862,19 @@ filegroup(
)

def _org_unicode_icuuc():
location = _get_location("org_unicode_icuuc")
http_archive(
name = "org_unicode_icuuc_default",
build_file_content = BUILD_ALL_CONTENT,
patches = ["@envoy//bazel/foreign_cc:icuuc.patch"],
**location
)

_repository_impl(
name = "org_unicode_icuuc",
build_file = "@envoy//bazel/external:icuuc.BUILD",
# TODO(dio): Consider patching udata when we need to embed some data.
)
native.bind(
name = "icuuc",
actual = "@org_unicode_icuuc//:common",
)

def _foreign_cc_dependencies():
_repository_impl("rules_foreign_cc")
Expand Down
12 changes: 6 additions & 6 deletions bazel/repository_locations.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -415,9 +415,9 @@ DEPENDENCY_REPOSITORIES = dict(
cpe = "N/A",
),
com_googlesource_googleurl = dict(
# Static snapshot of https://quiche.googlesource.com/quiche/+archive/googleurl_6dafefa72cba2ab2ba4922d17a30618e9617c7cf.tar.gz
sha256 = "f1ab73ddd1a7db4e08a9e4db6c2e98e5a0a7bbaca08f5fee0d73adb02c24e44a",
urls = ["https://storage.googleapis.com/quiche-envoy-integration/googleurl_6dafefa72cba2ab2ba4922d17a30618e9617c7cf.tar.gz"],
sha256 = "b8ca343feb9aedf29259f25169deb70f93a151a2e4713be97d0da2c5e3737e18",
strip_prefix = "quiche-googleurl-1bd6372fe3d267b2790ba76a49290b8a0b2df46c",
urls = ["https://github.com/dio/quiche-googleurl/archive/1bd6372fe3d267b2790ba76a49290b8a0b2df46c.tar.gz"],
Copy link
Member Author

@dio dio Jul 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update this when we have an agreement on how we do snapshot for googleurl release.

use_category = ["dataplane"],
cpe = "N/A",
),
Expand Down Expand Up @@ -475,9 +475,9 @@ DEPENDENCY_REPOSITORIES = dict(
use_category = ["test"],
),
org_unicode_icuuc = dict(
strip_prefix = "icu-release-64-2",
sha256 = "524960ac99d086cdb6988d2a92fc163436fd3c6ec0a84c475c6382fbf989be05",
urls = ["https://github.com/unicode-org/icu/archive/release-64-2.tar.gz"],
strip_prefix = "icu",
sha256 = "94a80cd6f251a53bd2a997f6f1b5ac6653fe791dfab66e1eb0227740fb86d5dc",
urls = ["https://github.com/unicode-org/icu/releases/download/release-67-1/icu4c-67_1-src.tgz"],
use_category = ["dataplane"],
cpe = "cpe:2.3:a:icu-project:international_components_for_unicode",
),
Expand Down
6 changes: 6 additions & 0 deletions test/common/http/utility_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1289,6 +1289,12 @@ TEST(Url, ParsingTest) {
// Test url with non-default ports.
validateUrl("https://www.host.com:8443", "https", "www.host.com:8443", "/", 8443);
validateUrl("http://www.host.com:8080", "http", "www.host.com:8080", "/", 8080);

#if !defined(__APPLE__) && !defined(WIN32)
// TODO(dio): Fix the unicode data (static) linking for macOS and Windows.
// Make sure we can parse unicode URL.
validateUrl("https://\xe5\x85\x89.example/", "https", "xn--54q.example", "/", 443);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what was the prior behavior? Would all unicode URLs be deemed invalid? If so I think this merits guarding in case upstreams can't handle unicode and the change in Envoy validation causes problems. If some unicode URLS passed through previously maybe it doesn't make enough difference to merit the guard. WDYT?

Either way probably worth a release note.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would all unicode URLs be deemed invalid?

Unicode host name segments of URI's are invalid, at the transport level.

Unicode -> PunyCode. The resulting URI is a valid hostname by the DNS spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dio In what circumstances we would get unicode URL in non-punycode format? I'm confused why we need handling that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion. This is motivated by porting the test case from https://quiche.googlesource.com/googleurl/+/refs/heads/master/test/basic_test.cc#31.

I think It is possible if we want to process input from config or logging. However, if this is not a valid use case. I can skip checking for this one.

#endif
}

void validateConnectUrl(absl::string_view raw_url, absl::string_view expected_host_port,
Expand Down