Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add namespace to server and peer name across namespaces #1247

Merged
merged 3 commits into from
Oct 31, 2024

Conversation

esara
Copy link
Contributor

@esara esara commented Oct 11, 2024

This change will append the namespace to the server or peer address if the destination does not match the resource namespace, running in k8s

addressing #1246

@codecov-commenter
Copy link

codecov-commenter commented Oct 11, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 3 lines in your changes missing coverage. Please review.

Project coverage is 80.60%. Comparing base (d3b2240) to head (8ee7346).
Report is 12 commits behind head on main.

Files with missing lines Patch % Lines
pkg/export/otel/traces.go 66.66% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1247      +/-   ##
==========================================
+ Coverage   80.58%   80.60%   +0.01%     
==========================================
  Files         141      141              
  Lines       14217    14226       +9     
==========================================
+ Hits        11457    11467      +10     
+ Misses       2221     2218       -3     
- Partials      539      541       +2     
Flag Coverage Δ
integration-test 58.73% <33.33%> (-0.04%) ⬇️
k8s-integration-test 58.36% <66.66%> (-0.18%) ⬇️
oats-test 35.01% <33.33%> (-0.01%) ⬇️
unittests 52.57% <33.33%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@esara esara force-pushed the k8snamespace branch 7 times, most recently from 39707a1 to 769c3e3 Compare October 12, 2024 15:56
Copy link
Contributor

@mariomac mariomac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution @esara!

We need to discuss whether this addition could break service graph metrics, as I'm afraid that adding the namespace only to the name of the remote part, could lead to some metrics like:

service_graph_client{client="client",server="server.namespace"}
service_graph_server{client="client.namespace",server="server"}

I'm afraid that different naming conventions for client and server could lead to breaking service graphs.

@grcevski
Copy link
Contributor

I was thinking more about this issue, would it be acceptable if we only added the .namespace in the traces output? I'm worried that if we add it everywhere it will break how we expect the metrics to behave, especially since the span metrics expect the service name to be separated from service namespace.

I'm thinking we should only change:

			request.ClientAddr(request.SpanPeer(span)),
			request.ServerAddr(request.SpanHost(span)),

in func traceAttributes(span *request.Span, optionalAttrs map[attr.Name]struct{}) []attribute.KeyValue { in traces.go.

Would this work for you @esara ?

I this change will be more correct based on the spec, but everything that's based on the service name will remain intact.

@grcevski
Copy link
Contributor

We can also make it consistent with the HTTP/gRPC/DB metrics by changing span_getters.go and adding similar logic in here:

	case attr.ClientAddr:
		getter = func(s *Span) attribute.KeyValue { return ClientAddr(SpanPeer(s)) }
	case attr.ServerAddr:
		getter = func(s *Span) attribute.KeyValue { return ServerAddr(SpanHost(s)) }

If you'd like, I can make a PR on top of yours as previously. Name resolver collects the information and we can just use it in those places to extend what the server address is (or peer).

@esara
Copy link
Contributor Author

esara commented Oct 29, 2024

I first tried to change traceAttributes.go - if this is not what you meant, I can also try with span_getters.go

@@ -562,6 +562,16 @@ func (tr *tracesOTELReceiver) acceptSpan(span *request.Span) bool {
func traceAttributes(span *request.Span, optionalAttrs map[attr.Name]struct{}) []attribute.KeyValue {
var attrs []attribute.KeyValue

// add namespace to server and peer name across namespaces
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @esara,

So I think we need to be more specific with how we setup the Peer and the Host, because the meaning of the server address changes depending if we are doing a client or server span.

Essentially the logic is like this:

	if span.IsClientSpan() {
		clientAddr = request.SpanPeer(span) + "." + span.ServiceID.Namespace
		serverAddr = request.SpanHost(span) + "." + span.OtherNamespace
	} else {
		clientAddr = request.SpanPeer(span) + "." + span.OtherNamespace
		serverAddr = request.SpanHost(span) + "." + span.ServiceID.Namespace
	}

I would make helpers functions for these somewhere in traces.go and change to use the values derived by those Server and Peer helpers everywhere where we do this now:

			request.ClientAddr(request.SpanPeer(span)),
			request.ServerAddr(request.SpanHost(span)),

There are few of those case statements below in the code. I know it's more complicated, but in Kafka events we never get the client (since it's forever lost on the queue) and we hijack the Othernamespace for the clientId. We probably should fix that :).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is a good point only to change the server and the client address respectively I added a second commit to do that - but I kept it in the method (except for kafka spans), while this can be factored out, that would actually be more code than this... also the current request.SpanPeer/SpanHost are shared with metrics, so we would need a local method which in turns calls them - maybe better organized, but a bit more complicated to read

functionally is this correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this works now, thanks! Based on what I can tell, you prefer that we don't add the additional "." + namespace when the services are in the same namespace, correct? I don't have a strong opinion on that, perhaps adding the .namespace always might be more consistent, but it's more data.

Copy link
Contributor Author

@esara esara Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I prefer it this way, because it is less intrusive to existing experience/service names - if microservices mostly communicate with each other within the namespace, it was already correct/no change - and the reference to services elsewhere would still work (metrics, logs, etc...)

this will only change the behavior, where the remote server (or client) is not obviously in the same space as the resource

Copy link
Contributor

@grcevski grcevski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I think this is correct now. We probably should follow up with fixing the kafka othernamespace issue.

@@ -562,6 +562,16 @@ func (tr *tracesOTELReceiver) acceptSpan(span *request.Span) bool {
func traceAttributes(span *request.Span, optionalAttrs map[attr.Name]struct{}) []attribute.KeyValue {
var attrs []attribute.KeyValue

// add namespace to server and peer name across namespaces
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this works now, thanks! Based on what I can tell, you prefer that we don't add the additional "." + namespace when the services are in the same namespace, correct? I don't have a strong opinion on that, perhaps adding the .namespace always might be more consistent, but it's more data.

@grcevski
Copy link
Contributor

Great, let's see what @mariomac thinks now. I think we can address the kafka issue in a follow-up PR.

Copy link
Contributor

@mariomac mariomac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@grcevski grcevski merged commit 882ccc2 into grafana:main Oct 31, 2024
13 checks passed
@grcevski
Copy link
Contributor

Thanks again @esara for your contribution!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants