Skip to content

Conversation

theshubhamp
Copy link
Contributor

@theshubhamp theshubhamp commented Aug 23, 2021

Added a method to HTTP Utilities to parse out cookies into a map.

Changed oauth2 extension to lookups from a cookie map to reduce
redundant scanning of cookies.

Requested as part of PR raised for

Risk Level: Low
Testing: Existing tests pass & added a new test for coverage.
Docs Changes: N/A
Release Notes: N/A

Requested as part of the PR #17721
Related to #17424

Signed-off-by: Shubham Patil theshubhamp@gmail.com

Added a method to HTTP Utilities to parse out cookies into a map.

Changed oauth2 extension to lookups from a cookie map to reduce
redundant scanning of cookies.

Signed-off-by: Shubham Patil <theshubhamp@gmail.com>
@repokitteh-read-only
Copy link

Hi @theshubhamp, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #17811 was opened by theshubhamp.

see: more, trace.

expires_ = Http::Utility::parseCookieValue(headers, "OauthExpires");
token_ = Http::Utility::parseCookieValue(headers, "BearerToken");
hmac_ = Http::Utility::parseCookieValue(headers, "OauthHMAC");
const auto& cookies = Http::Utility::parseCookies(headers);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation parses all cookies into a map even when we just need a subset.

Should parseCookies(...) be enhanced to accept a filter to filter down on keys at the time of parsing ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if possible performance gain can balance increased code complexity. I'd rather keep it as it is for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that instantiates many small strings, I think this should pass a lambda like the following to forEachCookies:

if (k == "OauthExpires") {
  expires_ = v;
}
...

Copy link
Contributor Author

@theshubhamp theshubhamp Aug 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (k == "OauthExpires") {
  expires_ = v;
}
...

This may require consumers of forEachCookies to worry about duplicate cookie semantics discussed on this thread #17811 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this variant of parseCookies that accepts a key filter lambda ? It can support both static checks like these ^ OR something more dynamic like checking set / map membership.

absl::flat_hash_map<std::string, std::string>
Utility::parseCookies(const RequestHeaderMap& headers, const std::function<bool(absl::string_view)>& key_filter) {
  absl::flat_hash_map<std::string, std::string> cookies;

  forEachCookie(headers, Http::Headers::get().Cookie,
                [&cookies, &key_filter](absl::string_view k, absl::string_view v) -> bool {
                  if (key_filter(k)) {
                    cookies.emplace(k, v);
                  }

                  // continue iterating until all cookies are processed.
                  return true;
                });

  return cookies;
}

An overload can be added that that defaults key_filter to a lambda that always returns true to provide a parse all cookies fallback.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the latter sounds good to me if we care about uniform handling of duplicate cookies. Checking for set membership may be even faster than the static string comparisons.

Copy link
Contributor Author

@theshubhamp theshubhamp Aug 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. Its being used like this in the oauth2 filter now:

...
  const auto& cookies = Http::Utility::parseCookies(headers, [](absl::string_view key) -> bool {
    return key == "OauthExpires" || key == "BearerToken" || key == "OauthHMAC";
  });
...

Kept static comparisons inside the predicate thinking that serial = comparisons should be (a little ?) faster than using a set which requires hash computation before every check (if I am not mistaken)

@theshubhamp
Copy link
Contributor Author

cc: @rojkov & @lizan for the review comments you had on my linked PR

@phlax phlax assigned snowp and rojkov and unassigned snowp Aug 24, 2021
Copy link
Member

@rojkov rojkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Added some comments.

/wait

// If the key matches, parse the value from the rest of the cookie string.
if (k == key) {
void forEachCookie(const HeaderMap& headers, const LowerCaseString& cookie_header,
const std::function<bool (const absl::string_view&, const absl::string_view&)> cookie_consumer) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absl::string_view should be passed by value as the spec suggests. No need to add const and &.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out! I have dropped const and & as recommended in the spec

return value;
}

std::map<std::string, std::string> Utility::parseCookies(const RequestHeaderMap& headers) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better use absl::flat_hash_map here.

Copy link
Contributor Author

@theshubhamp theshubhamp Aug 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to using absl::flat_hash_map

For my curiosity: Why should it be preferred over std::map ? I tried looking at the abseil docs and other sources but could not get a simple answer!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Algorithmic complexity of inserts and lookups is logarithmic for std::map. For hashes it's basically O(1).

expires_ = Http::Utility::parseCookieValue(headers, "OauthExpires");
token_ = Http::Utility::parseCookieValue(headers, "BearerToken");
hmac_ = Http::Utility::parseCookieValue(headers, "OauthHMAC");
const auto& cookies = Http::Utility::parseCookies(headers);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if possible performance gain can balance increased code complexity. I'd rather keep it as it is for now.

Comment on lines 143 to 144
const auto expires_it = cookies.find("OauthExpires");
expires_ = expires_it != cookies.end() ? expires_it->second : EMPTY_STRING;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace it with a function defined in the anonymous namespace and call the function also for token_ and hmac_.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, added std::string findValue(...) to unpack the hash_map as we need it here.

Signed-off-by: Shubham Patil <theshubhamp@gmail.com>
Following best practices mentioned on https://abseil.io/docs/cpp/guides/strings#string_view:

string_view objects are very lightweight, so you should always pass them
by value within your methods and functions; don’t pass a const absl::string_view &.
(Passing absl::string_view instead of const absl::string_view & has the
same algorithmic complexity, but because of register allocation and parameter
passing rules, it is generally faster to pass by value in this case.)

Signed-off-by: Shubham Patil <theshubhamp@gmail.com>
Signed-off-by: Shubham Patil <theshubhamp@gmail.com>
Signed-off-by: Shubham Patil <theshubhamp@gmail.com>
Signed-off-by: Shubham Patil <theshubhamp@gmail.com>
return std::string{result};
// Iterate over each cookie & return if its value is not empty.
forEachCookie(headers, cookie, [&key, &value](absl::string_view k, absl::string_view v) -> bool {
if (key == k && !v.empty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this new code implements different behavior and theoretically may break certain setups. Consider the case of

Cookie: a=; b=1; a=2
Cookie: a=3

The old parseCookie(headers, "a", "cookie") would return 3. The new code returns 2 if I'm not mistaken.

I think it makes sense to add a test for this case and make sure it passes for both versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, this new version is not equivalent to the old one. I'll try to make both the variants (single-cookie and map) compatible with the old behaviour.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a related note it appears that even the old behaviours haven't been consistent over time.

For example if we use this new test as a reference:

TEST(HttpUtility, TestParseCookieDuplicates) {
  TestRequestHeaderMapImpl headers{
      {"someheader", "10.0.0.1"},
      {"cookie", "a=; b=1; a=2"},
      {"cookie", "a=3; b=2"}};

  EXPECT_EQ(Utility::parseCookieValue(headers, "a"), "3");
  EXPECT_EQ(Utility::parseCookieValue(headers, "b"), "1");
}

It fails before #17560 landed (changed header iter. from reverse to natural order)

# git status
HEAD detached at b57827c20

# bazel test //test/common/http:utility_test
...
NFO: 12 processes: 1 internal, 11 darwin-sandbox.
INFO: Build completed, 1 test FAILED, 12 total actions
//test/common/http:utility_test                                          FAILED in 2.5s

# less /private/var/tmp/_bazel_shubhamp/306aac4fa57fcfade8ca38e576399a40/execroot/envoy/bazel-out/darwin-fastbuild/testlogs/test/common/http/utility_test/test.log
...
[ RUN      ] HttpUtility.TestParseCookieDuplicates
test/common/http/utility_test.cc:559: Failure
Expected equality of these values:
  Utility::parseCookieValue(headers, "b")
    Which is: "2"
  "1"
Stack trace:
  0x10b50f3aa: Envoy::Http::HttpUtility_TestParseCookieDuplicates_Test::TestBody()
  0x1148c1fc4: testing::internal::HandleSehExceptionsInMethodIfSupported<>()
  0x114897a2b: testing::internal::HandleExceptionsInMethodIfSupported<>()
  0x114897963: testing::Test::Run()
  0x114898977: testing::TestInfo::Run()
... Google Test internal frames ...

But passes on main:

# less less /private/var/tmp/_bazel_shubhamp/306aac4fa57fcfade8ca38e576399a40/execroot/envoy/bazel-out/darwin-fastbuild/testlogs/test/common/http/utility_test/test.log
...
[ RUN      ] HttpUtility.TestParseCookieDuplicates
[       OK ] HttpUtility.TestParseCookieDuplicates (0 ms)

Copy link
Contributor Author

@theshubhamp theshubhamp Aug 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behaviour on main (and the older version before it got changed) is a little inconsistent because it:

  • picks the first matching cookie from a single cookie header (i.e. resolves a's value to an empty string if the header value looks like: "a=; b=1; a=2")
  • but exhaustively searches remaining cookie headers until a non empty value is found.

And this ^ makes parsing out duplicate cookies spilt between multiple cookie headers fun.

Copy link
Contributor Author

@theshubhamp theshubhamp Aug 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestions on how to proceed here ?

  1. Keep the behaviour from main as-is
  2. Pick the first cookie value OR first non-empty cookie value
  3. Pick the last cookie value OR last non-empty cookie value

I'm inclined towards approaches 2 or 3 because it keeps the parsing simple in the long run - but this can break some setups.

RFC 6265 - HTTP State Management Mechanism itself is pretty ambiguous about duplicate cookies and ordering:

   Although cookies are serialized linearly in the Cookie header,
   servers SHOULD NOT rely upon the serialization order.  In particular,
   if the Cookie header contains two cookies with the same name (e.g.,
   that were set with different Path or Domain attributes), servers
   SHOULD NOT rely upon the order in which these cookies appear in the
   header.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... That's a good catch! If the spec explicitly states that the order is unimportant then just pick the first value irrespective of its emptiness for the sake of perf.

A second reviewer may give a second opinion on it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to take the first value for perf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, switched to picking up the first occurrence of a cookie!

Added tests so that this behavioural quirk has coverage.

@rojkov
Copy link
Member

rojkov commented Aug 25, 2021

/wait

return std::string{result};
// Iterate over each cookie & return if its value is not empty.
forEachCookie(headers, cookie, [&key, &value](absl::string_view k, absl::string_view v) -> bool {
if (key == k && !v.empty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to take the first value for perf.

return v;

bool continue_iteration = cookie_consumer(k, v);
if (!continue_iteration) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!continue_iteration) {
if (!cookie_consumer(k, v)) {

then you don't need continue_iteration.

Copy link
Contributor Author

@theshubhamp theshubhamp Aug 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, moved the invocation inside if condition itself

expires_ = Http::Utility::parseCookieValue(headers, "OauthExpires");
token_ = Http::Utility::parseCookieValue(headers, "BearerToken");
hmac_ = Http::Utility::parseCookieValue(headers, "OauthHMAC");
const auto& cookies = Http::Utility::parseCookies(headers);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that instantiates many small strings, I think this should pass a lambda like the following to forEachCookies:

if (k == "OauthExpires") {
  expires_ = v;
}
...

Signed-off-by: Shubham Patil <theshubhamp@gmail.com>
…in headers + Tests for coverage

Signed-off-by: Shubham Patil <theshubhamp@gmail.com>
Signed-off-by: Shubham Patil <theshubhamp@gmail.com>
Signed-off-by: Shubham Patil <theshubhamp@gmail.com>
…2 ext

Signed-off-by: Shubham Patil <theshubhamp@gmail.com>
Copy link
Member

@rojkov rojkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Looks good to me.

@mattklein123 mattklein123 merged commit d658d02 into envoyproxy:main Aug 27, 2021
@theshubhamp theshubhamp deleted the theshubhamp/cookie-map-refactor branch September 3, 2021 08:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants