This repository has been archived by the owner on Dec 23, 2023. It is now read-only.

Implementation of OC-Agent Trace Service clients. #1476

Closed

Conversation

Contributor

@songy23 songy23 commented Sep 29, 2018

Fixes #1408.

Equivalent to https://github.com/census-ecosystem/opencensus-go-exporter-ocagent/blob/master/ocagent.go.

This works similarly to the Go Agent exporter (a rough sketch of the export path follows the list):

  • Export will be handled on the main thread: the OC library (SpanExporter) pushes SpanData to the ExportClient, which translates it into Span protos and pushes them to the Agent.
  • Config will be handled on a daemon thread, since waiting for and applying incoming Config updates would block the thread.
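
Below is a rough sketch of that export path in Java, assuming the bidirectional-streaming Export RPC and generated gRPC stubs from opencensus-proto; the class and package names, and the ExportSketch wrapper itself, are assumptions for illustration and not this PR's actual code.

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.stub.StreamObserver;
import io.opencensus.proto.agent.trace.v1.ExportTraceServiceRequest;
import io.opencensus.proto.agent.trace.v1.ExportTraceServiceResponse;
import io.opencensus.proto.agent.trace.v1.TraceServiceGrpc;
import io.opencensus.proto.trace.v1.Span;
import java.util.List;

final class ExportSketch {

  // Opens the Export stream to the Agent. The response side is mostly used to detect errors.
  static StreamObserver<ExportTraceServiceRequest> openExportStream(String endPoint) {
    ManagedChannel channel = ManagedChannelBuilder.forTarget(endPoint).usePlaintext().build();
    return TraceServiceGrpc.newStub(channel)
        .export(
            new StreamObserver<ExportTraceServiceResponse>() {
              @Override
              public void onNext(ExportTraceServiceResponse response) {}

              @Override
              public void onError(Throwable t) {} // connection lost: trigger a reconnect

              @Override
              public void onCompleted() {}
            });
  }

  // Called on the exporting thread once SpanData has been translated into Span protos.
  static void push(StreamObserver<ExportTraceServiceRequest> stream, List<Span> spanProtos) {
    stream.onNext(ExportTraceServiceRequest.newBuilder().addAllSpans(spanProtos).build());
  }
}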


codecov-io commented Sep 29, 2018

Codecov Report

Merging #1476 into master will decrease coverage by 0.88%.
The diff coverage is 26.61%.


@@             Coverage Diff              @@
##             master    #1476      +/-   ##
============================================
- Coverage     82.46%   81.57%   -0.89%     
- Complexity     1590     1595       +5     
============================================
  Files           243      244       +1     
  Lines          7493     7610     +117     
  Branches        702      718      +16     
============================================
+ Hits           6179     6208      +29     
- Misses         1098     1185      +87     
- Partials        216      217       +1
Impacted Files                                         | Coverage Δ           | Complexity Δ
...ter/trace/ocagent/OcAgentTraceExporterHandler.java  | 0% <0%> (ø)          | 0 <0> (ø) ⬇️
...s/exporter/trace/ocagent/OcAgentTraceExporter.java  | 22.58% <0%> (-1.56%) | 3 <0> (ø)
...census/exporter/trace/ocagent/TraceProtoUtils.java  | 82.6% <0%> (-4.9%)   | 32 <0> (ø)
...ace/ocagent/OcAgentTraceExporterConfiguration.java  | 100% <100%> (ø)      | 2 <1> (ø) ⬇️
...rter/trace/ocagent/OcAgentTraceServiceClients.java  | 32.98% <32.98%> (ø)  | 3 <3> (?)
...census/implcore/trace/export/SpanExporterImpl.java  | 90.41% <0%> (-1.37%) | 8% <0%> (ø)
...encensus/contrib/dropwizard/DropWizardMetrics.java  | 96.52% <0%> (-0.2%)  | 16% <0%> (ø)
.../java/io/opencensus/metrics/export/TimeSeries.java  | 100% <0%> (ø)        | 4% <0%> (+2%) ⬆️
...ain/java/io/opencensus/implcore/metrics/Gauge.java  | 100% <0%> (ø)        | 3% <0%> (ø) ⬇️
... and 2 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@songy23 songy23 added this to the Release 0.17.0 milestone Sep 29, 2018
Contributor

@bogdandrutu bogdandrutu left a comment


Some high-level comments about the overall design.

}

@Override
public void onError(Throwable t) {}
Contributor


Should we at least log this? Should we try to reconnect?

Contributor Author


Yes, this error is logged and we should retry on errors.

"Failed to connect to OC-Agent because of %s. Retried %d time(s).", e, i));
// Exponential backoff.
try {
Thread.sleep(0, getBackoffTimeInNanos(i));
Contributor


I think we should back off by more than 1 ms. Based on the definition of Thread.sleep(long millis, int nanos), the number of nanos has to be less than 1,000,000.

Contributor Author


I removed exponential backoff since we'll retry every 5 minutes (or a custom time interval).

if (useInsecure == null) {
useInsecure = false;
}
client = OcAgentTraceServiceClients.openStreams(endPoint, useInsecure, serviceName);
Contributor


Probably I misunderstand the current design, but I want to make sure we are on the same page. I think the whole idea was to watch for the oc-agent to come up: if we fail to connect, we should retry every 1 minute (or a configurable time). The main workflow is that somebody deploys this, and a week later, when they decide they need traces, they can just deploy the oc-agent without restarting their app.

High-level design (a rough sketch follows the list):

  • In the application (or another library) we will have something like "registerOcAgentExporterWhenAvailable".
  • Start a new thread (for the purpose of this explanation call it ThreadOcAgent) that tries to connect to the oc-agent every 1 minute.
  • If not able to connect, try again in 1 minute (or a configurable time).
  • If able to connect:
    • Install the OcAgentSpanExportHandler into the SpanExporter, so spans will be exported.
    • When disconnected, uninstall the OcAgentSpanExportHandler from the SpanExporter; ThreadOcAgent should then retry connecting in 1 minute.
    • The config RPC handling can be done in ThreadOcAgent as well.
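
A minimal sketch of the proposed watch-and-retry loop, assuming a hypothetical connectAndExportUntilDisconnected() helper; the class name and wiring are illustrative, not the code in this PR.

import java.util.concurrent.TimeUnit;

final class AgentConnectionWatcher implements Runnable {

  @Override
  public void run() {
    while (true) {
      try {
        // Hypothetical helper: connects to the oc-agent, installs the span export handler,
        // and blocks until the connection is lost (then the handler is uninstalled).
        connectAndExportUntilDisconnected();
      } catch (Exception e) {
        // Connection attempt failed or the stream broke; fall through and retry.
      }
      try {
        Thread.sleep(TimeUnit.MINUTES.toMillis(1)); // retry interval (could be configurable)
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  private void connectAndExportUntilDisconnected() {
    // Placeholder for the actual connect / install / block-until-disconnect logic.
  }

  static void start() {
    Thread thread = new Thread(new AgentConnectionWatcher(), "ThreadOcAgent");
    thread.setDaemon(true); // daemon thread so it never keeps the JVM alive
    thread.start();
  }
}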

Contributor Author


So if we fail to connect we should retry every 1min (or configurable time).

Do you mean the Agent exporter should keep retrying forever if the initial connection did not succeed? I saw that in Go we only attempt to connect 10 times, with exponential backoff; that's also what I did here.

/cc @odeke-em

In the application (or other library) we will have something like "registerOcAgentExporterWhenAvailable"

There's already an API, OcAgentTraceExporter.createAndRegister(). It registers the OcAgentTraceExporterHandler with the SpanExporter. Before we manage to connect to the Agent, the OcAgentTraceExporterHandler will just be a no-op.
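
For reference, a minimal usage sketch; the builder options shown (setEndPoint, setServiceName, setUseInsecure) and the default agent port are assumptions based on this PR's discussion, not a final API.

import io.opencensus.exporter.trace.ocagent.OcAgentTraceExporter;
import io.opencensus.exporter.trace.ocagent.OcAgentTraceExporterConfiguration;

public final class OcAgentExporterSetup {
  public static void main(String[] args) {
    // Registers the OcAgentTraceExporterHandler with the SpanExporter. Until the
    // connection to the agent is established, exported spans are simply dropped (no-op).
    OcAgentTraceExporter.createAndRegister(
        OcAgentTraceExporterConfiguration.builder()
            .setEndPoint("localhost:55678") // assumed oc-agent address
            .setServiceName("my-service")
            .setUseInsecure(true) // plaintext, e.g. for local testing
            .build());
  }
}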

Start a new thread (for the purpose of this explanation call it ThreadOcAgent) that tries to connect to the oc-agent every 1 minute.

If not able to connect, try again in 1 minute (or a configurable time).

If we need to keep retrying the connection, I agree this should be done on another thread.

Install the OcAgentSpanExportHandler into the SpanExporter, so spans will be exported.

When disconnected, uninstall the OcAgentSpanExportHandler from the SpanExporter; ThreadOcAgent should then retry connecting in 1 minute.

My current design is to register the OcAgentSpanExportHandler first. If the connection failed, it will just be a no-op. I feel this is more straightforward and efficient than registering/unregistering with the SpanExporter every now and then, and the two approaches should behave the same to end users.

The config RPC handling can be done in ThreadOcAgent as well.

The Config RPC has to be on its own thread, because Config is a blocking RPC: once the client (the OC library) sends the CurrentLibraryConfig back to the Agent, that thread is blocked waiting for the next UpdatedLibraryConfig.
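
An illustrative sketch of that Config stream handling, using the Config RPC names from opencensus-proto; the surrounding wiring (the ConfigStreamSketch class, how the updated config is applied) is an assumption, not this PR's code.

import io.grpc.stub.StreamObserver;
import io.opencensus.proto.agent.trace.v1.CurrentLibraryConfig;
import io.opencensus.proto.agent.trace.v1.TraceServiceGrpc;
import io.opencensus.proto.agent.trace.v1.UpdatedLibraryConfig;

final class ConfigStreamSketch {

  static void run(TraceServiceGrpc.TraceServiceStub stub) {
    // The response observer fires whenever the Agent pushes a new UpdatedLibraryConfig,
    // which may be arbitrarily far in the future.
    StreamObserver<CurrentLibraryConfig> requestObserver =
        stub.config(
            new StreamObserver<UpdatedLibraryConfig>() {
              @Override
              public void onNext(UpdatedLibraryConfig updatedConfig) {
                // Apply the updated trace config (e.g. sampling), then echo the
                // now-current config back to the Agent.
              }

              @Override
              public void onError(Throwable t) {} // stream broken: reconnect later

              @Override
              public void onCompleted() {}
            });
    // Send the initial CurrentLibraryConfig. After this, the worker has nothing to do but
    // wait for updates, which is why this belongs on a dedicated daemon thread.
    requestObserver.onNext(CurrentLibraryConfig.getDefaultInstance());
  }
}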

Contributor Author


We discussed this offline; we think retrying indefinitely is the right thing to do here. I will update the code accordingly.

Member


Sorry, I hadn't seen this ping. I'll also create an issue for the Go agent exporter.

@songy23 songy23 assigned songy23 and unassigned bogdandrutu Sep 30, 2018
@songy23 songy23 changed the title Implementation of OC-Agent Trace Service clients. [WIP] Implementation of OC-Agent Trace Service clients. Oct 2, 2018
@songy23 songy23 added the "action required" label Oct 2, 2018
@bogdandrutu
Contributor

Please ping me when this is ready for review.

@songy23 songy23 changed the title [WIP] Implementation of OC-Agent Trace Service clients. Implementation of OC-Agent Trace Service clients. Oct 5, 2018
@songy23 songy23 removed the "action required" label Oct 5, 2018
Contributor Author

songy23 commented Oct 5, 2018

Updated, PTAL. This works as follows (a rough sketch follows the list):

  1. In OcAgentTraceExporter.createAndRegister(), an OcAgentTraceExporterHandler will be created and registered with the SpanExporter.
  2. When the OcAgentTraceExporterHandler is created, a daemon thread AgentConnectionWorker will be created and started. This thread keeps running forever; every 5 minutes (or a custom time interval), it tries to connect to the remote Agent.
  3. If the connection succeeds, SpanData passed to the OcAgentTraceExporterHandler will be exported to the Agent via RPC, and AgentConnectionWorker will handle UpdatedLibraryConfig messages.
  4. If the connection fails or is lost, SpanData passed to the OcAgentTraceExporterHandler will be ignored.
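
A sketch of the handler side of this flow, assuming the standard SpanExporter.Handler API; the connection flag and its wiring to the worker thread are illustrative, not this PR's exact fields.

import io.opencensus.trace.export.SpanData;
import io.opencensus.trace.export.SpanExporter;
import java.util.Collection;
import java.util.concurrent.atomic.AtomicBoolean;

final class NoOpUntilConnectedHandler extends SpanExporter.Handler {

  private final AtomicBoolean connected = new AtomicBoolean(false);

  // Flipped by the connection worker when the streams to the Agent open or break.
  void setConnected(boolean value) {
    connected.set(value);
  }

  @Override
  public void export(Collection<SpanData> spanDataList) {
    if (!connected.get()) {
      return; // step 4: not connected, so the batch is ignored
    }
    // Step 3: translate SpanData into Span protos and push them on the Export stream.
  }
}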

Contributor Author

songy23 commented Oct 9, 2018

Rebased, PTAL.

return exportRequestObserverRef.get() != null;
}

private static boolean isConfigConnected() {
Contributor


We should also have an option in the configuration of the ocagent exporter, "enable_config" or something similar, that allows users to configure whether this exporter should also handle the config.

Also, based on some comments on Gitter, we may want a configuration option that controls whether to keep the connection open (for Android devices we may not want to keep the RPC connection/request alive at all times). We can address this later.

Contributor Author


Added an option enableConfig to OcAgentTraceExporterConfiguration. Also left a TODO for adding an option to keep the connection open.
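
For example, something along these lines; the setEnableConfig builder method name and the other options shown are assumptions based on this discussion, not a final API.

import io.opencensus.exporter.trace.ocagent.OcAgentTraceExporter;
import io.opencensus.exporter.trace.ocagent.OcAgentTraceExporterConfiguration;

public final class ExportOnlySetup {
  public static void main(String[] args) {
    // Disable the Config stream so this exporter only handles span export.
    OcAgentTraceExporter.createAndRegister(
        OcAgentTraceExporterConfiguration.builder()
            .setEndPoint("localhost:55678") // assumed oc-agent address
            .setServiceName("my-service")
            .setEnableConfig(false)
            .build());
  }
}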

Contributor

@bogdandrutu bogdandrutu left a comment


Some comments that I wrote early this morning. I will do a full review soon.

build.gradle Outdated
@@ -26,6 +26,10 @@ def useCheckerFramework = rootProject.hasProperty('checkerFramework')
def useErrorProne = !useCheckerFramework

subprojects {
// OC-Agent exporter depends on grpc-core which depends on errorprone annotations, and it
// doesn't wort with checker framework.
Contributor


s/wort/work

Contributor Author


Done.

/**
* Returns the retry time interval when trying to connect to Agent.
*
* @return the retry time interval
Contributor


"." at the end?

Contributor Author


Done.

* @since 0.17
*/
@Nullable
public abstract Boolean getEnableConfig();
Contributor


Use boolean and make the default value "true".

Contributor Author


Done.

Contributor Author

songy23 commented Oct 11, 2018

As we discussed, I'll try to decouple the logic and send separate PRs. Closing this PR for now.

@songy23 songy23 closed this Oct 11, 2018