-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ISSUE #10377] Nacos client observability enhancement - trace part #11138
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to discuss.
client/pom.xml
Outdated
<dependency> | ||
<groupId>io.opentelemetry</groupId> | ||
<artifactId>opentelemetry-semconv</artifactId> | ||
<version>1.29.0-alpha</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any stable version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The opentelemetry-semconv library is actually designed to make the span field naming comply with otlp regulations. It itself can be regarded as a collection of global variables. However, all versions of this library have the alpha suffix, which is the case in the official examples and also in the maven repository https://mvnrepository.com/artifact/io.opentelemetry/opentelemetry-semconv
Span addListenerSpan = ConfigTrace.getClientConfigServiceSpan("addTenantListeners"); | ||
try (Scope ignored = addListenerSpan.makeCurrent()) { | ||
|
||
worker.addTenantListenersWithContent(dataId, group, content, encryptedDataKey, Arrays.asList(listener)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracing process may disturb adding listeners. It is expected that nothing can interrupt the main behaviors.
Make sure that span.makeCurrent()
and span.isRecording()
may not throw any exceptions in any situations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All my trace-related operations are implemented through the OpenTelemetry API
, including the span life cycle. OpenTelemetry clearly states it in the documentation: Unless your application imports the OpenTelemetry SDK, your instrumentation does nothing and does not impact application performance. https://opentelemetry.io/docs/concepts/instrumentation/libraries/#opentelemetry-api
.queryConfig(dataId, group, worker.getAgent().getTenant(), timeoutMs, false); | ||
|
||
if (queryConfigSpan.isRecording()) { | ||
queryConfigSpan.setAttribute("function.current.name", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use constants instead of magic value here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will fix it soon
@@ -101,11 +105,60 @@ public String getConfig(String dataId, String group, long timeoutMs) throws Naco | |||
public String getConfigAndSignListener(String dataId, String group, long timeoutMs, Listener listener) | |||
throws NacosException { | |||
group = StringUtils.isBlank(group) ? Constants.DEFAULT_GROUP : group.trim(); | |||
ConfigResponse configResponse = worker.getAgent() | |||
.queryConfig(dataId, group, worker.getAgent().getTenant(), timeoutMs, false); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using an enhanced subclass of ClientWorker
or AOP processing is a better way to wrap or decorate these methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I will think about it carefully and discuss it with you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An enhanced subclass may be better, since a span could include more than one method
if (content != null) { | ||
LOGGER.warn("[{}] [get-config] get failover ok, dataId={}, group={}, tenant={}, config={}", | ||
worker.getAgentName(), dataId, group, tenant, ContentUtils.truncateContent(content)); | ||
cr.setContent(content); | ||
String encryptedDataKey = LocalEncryptedDataKeyProcessor | ||
.getEncryptDataKeyFailover(agent.getName(), dataId, group, tenant); | ||
String encryptedDataKey = LocalEncryptedDataKeyProcessor.getEncryptDataKeyFailover(agent.getName(), dataId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracing the update of local failover cache is also important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, I will do it here
requestToServer(request, BatchInstanceResponse.class); | ||
redoService.instanceRegistered(serviceName, groupName); | ||
|
||
if (span.isRecording()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some attributes can be set before doing actions in order to record more information for exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some attributes, especially those that are expensive to calculate ( eg. some instance states), should be added if span is recording. https://opentelemetry.io/docs/concepts/instrumentation/libraries/#performance However, some attributes in the Nacos client may not need to be operated in this way (in fact, I have made the version number of the nacos client a must-have attribute for each span). I will discuss it with you in detail.
… when exceptions occur
… when exceptions occur
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Request some further changes in another PR.
@@ -1098,10 +1158,19 @@ public ConfigResponse queryConfig(String dataId, String group, String tenant, lo | |||
} | |||
|
|||
private Response requestProxy(RpcClient rpcClientInner, Request request) throws NacosException { | |||
return requestProxy(rpcClientInner, request, 3000L); | |||
return agent.requestProxy(rpcClientInner, request, 3000L); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a method with some default values. Please use this.requestProxy
to keep modification of the origin method like metrics or tracing enhancement.
@@ -477,8 +501,11 @@ private void refreshContentAndCheck(String groupKey, boolean notify) { | |||
|
|||
private void refreshContentAndCheck(CacheData cacheData, boolean notify) { | |||
try { | |||
ConfigResponse response = getServerConfig(cacheData.dataId, cacheData.group, cacheData.tenant, 3000L, | |||
notify); | |||
String group = cacheData.group; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose getServerConfig
is same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no reason to use dynamic proxy. It would be enough if just delegating all ClientWorker
methods.
Please do not create a Pull Request without creating an issue first.
What is the purpose of the change
The purpose of this PR is to enhance the observability of Nacos client. This pr ensures that users can obtain metric through
OpenTelemetry
andPrometheus
; obtain trace throughOpenTelemetry
.This PR is only related to the trace part.
Related issue: #10377
Brief changelog
Trace
Verifying this change
Follow this checklist to help us incorporate your contribution quickly and easily:
[ISSUE #123] Fix UnknownException when host config not exist
. Each commit in the pull request should have a meaningful subject line and body.