Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: log local TCP address #454

Merged

Conversation

maxmoehl
Copy link
Member

@maxmoehl maxmoehl commented Nov 8, 2024

Summary

When troubleshooting issues related to keepalive it is often useful to be able to correlate multiple requests made over the same connection. A first step was to introduce the information on whether the request was made on a re-used connection or not which was done with a7dbf84. This commit adds the local address of the TCP connection which adds the capability to distinguish between multiple, parallel keep-alive connections by looking at the client port of gorouter for the backend connection.

Backward Compatibility

Breaking Change? Yes

@maxmoehl
Copy link
Member Author

maxmoehl commented Nov 8, 2024

The big question is: how do we proceed with such log changes which could be considered breaking? I don't think it's a healthy practice to keep adding flags for each and every individual log field we add 🤔

@ameowlia
Copy link
Member

ameowlia commented Nov 8, 2024

The big question is: how do we proceed with such log changes which could be considered breaking? I don't think it's a healthy practice to keep adding flags for each and every individual log field we add 🤔

I agree, I hate having to add so many config properties. But sadly, I do think that this is a breaking change. I think of all the logs that operators look at, I think they DO look at access logs. And they are not a nice json format, they are an unwieldy string mess that you would have to write a custom parser for.

Here are the options I see...

  1. Add the new field with no config property - risk breaking operators and facing their wrath (we should not do this)
  2. Add the new field to the end of the log line - still risk breaking operators and facing their wrath, but maybe slightly less risk (we should not do this either)
  3. Put it behind yet another config property - this fills up our code with config properties
  4. Develop a more advanced config property that allows users to "opt in" to certain fields by name 1 by 1. So that way we can add new fields, but they can always opt in / out of certain ones. - Fixes this problem for the future. But this is more work. And this is just a config property for everything! Will operators even opt in to any of them!?
  5. (From @geofffranks) add an additional access-log file that's json-based for people to migrate to, and we deprecate the old one eventually. When it is json based we can add new properties without it being a breaking change. - Personally I love this idea, but it is definitely more work.

So, in summary, I would say:

  • sadly this is a breaking change (whomp)
  • if you could chose one of options 3-5 that would be great

@maxmoehl
Copy link
Member Author

maxmoehl commented Nov 8, 2024

sadly this is a breaking change (whomp)

That's what I was expecting 😄

Develop a more advanced config property that allows users to "opt in" to certain fields by name 1 by 1. So that way we can add new fields, but they can always opt in / out of certain ones. - Fixes this problem for the future. But this is more work. And this is just a config property for everything! Will operators even opt in to any of them!?

That's probably what I will be going for in this PR to prevent accumulating further config flags. Just a flat list, and if anyone needs special properties they can be added and exposed via this mechanism without breaking anyone.

add an additional access-log file that's json-based for people to migrate to, and we deprecate the old one eventually. When it is json based we can add new properties without it being a breaking change. - Personally I love this idea, but it is definitely more work.

This is my preferred long-term solution, but it will require quite some effort and it's not clear when we will be able to invest.

@maxmoehl maxmoehl force-pushed the maxmoehl/add-local-addr branch 2 times, most recently from eac4b22 to 20a7867 Compare November 26, 2024 10:34
@maxmoehl
Copy link
Member Author

I've added a new property which allows logging of additional fields in a generic way. The release PR will follow once we agree on this PR.

@maxmoehl maxmoehl marked this pull request as ready for review November 26, 2024 10:40
@maxmoehl maxmoehl requested a review from a team as a code owner November 26, 2024 10:40
Copy link
Member

@ameowlia ameowlia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Can you add a PR to routing-release so I can deploy and test everything at once?

@maxmoehl
Copy link
Member Author

I've opened cloudfoundry/routing-release#448 as a draft as I'm still adding tests.

@maxmoehl maxmoehl force-pushed the maxmoehl/add-local-addr branch 3 times, most recently from 06efc11 to 085cb3b Compare January 17, 2025 13:24
@maxmoehl
Copy link
Member Author

maxmoehl commented Jan 17, 2025

The past weeks have been a bit busy, I finally found the time to clean this up. The rr PR is also ready now. Unfortunately test execution on my end is still hindered by cloudfoundry/routing-release#450. I ran the package tests using go test and they seem fine but I can't run the whole shebang, could you please do that? Thank you!

I'm currently also deploying this to one of our environments.

Edit: works!

@@ -255,6 +256,7 @@ func (rt *roundTripper) RoundTrip(originalRequest *http.Request) (*http.Response
slog.Float64("dns-lookup-time", trace.DnsTime()),
slog.Float64("dial-time", trace.DialTime()),
slog.Float64("tls-handshake-time", trace.TlsTime()),
slog.String("local-addr", trace.LocalAddr()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From reading your spec description on the other PR, it is not clear if the "extra_headers" property is only for access logs or not.

❓ Based on my reading of it, I would've thought it would affect these logs too? Thoughts?

Copy link
Member Author

@maxmoehl maxmoehl Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, in the past we didn't give much thought to the compatibility of non-access-logs, primarily because these are JSON and we expect people to parse it as such. Additional fields shouldn't cause any breakage in those cases. For example the timing info directly above this line is controlled by a feature flag in the access logs but unconditionally included in this log.

I could adjust the wording in the release spec to reflect that clearly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could adjust the wording in the release spec to reflect that clearly.

Sounds good to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted the property description.

@ameowlia
Copy link
Member

✅ tests pass

@maxmoehl maxmoehl force-pushed the maxmoehl/add-local-addr branch 2 times, most recently from 9b22a57 to 04a92c6 Compare January 22, 2025 12:07
@maxmoehl maxmoehl force-pushed the maxmoehl/add-local-addr branch from 04a92c6 to 0856d01 Compare January 29, 2025 13:08
When troubleshooting issues related to keepalive it is often useful to
be able to correlate multiple requests made over the same connection. A
first step was to introduce the information on whether the request was
made on a re-used connection or not which was done with a7dbf84. This
commit adds the local address of the TCP connection which adds the
capability to distinguish between multiple, parallel keep-alive
connections by looking at the client port of gorouter for the backend
connection.
@maxmoehl maxmoehl force-pushed the maxmoehl/add-local-addr branch from 0856d01 to e0d4b36 Compare January 31, 2025 06:21
@ameowlia ameowlia merged commit a34e690 into cloudfoundry:main Jan 31, 2025
1 check passed
@maxmoehl maxmoehl deleted the maxmoehl/add-local-addr branch February 1, 2025 06:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

2 participants