Add observer support to receiver_creator #173

jrcamp · 2020-04-13T18:53:46Z

Add observer notification interface (k8s observer will be in separate PR)
Refactor receiver_creator to be more easily testable and organized
- receiver.go mostly implements OT interface and delegates to the new files
- observerhandler.go responds to observer events and manages the starting/stopping of receivers
- rules.go implements rules evaluation (not currently implemented)
- runner.go contains a runner interface that handles the details of how to start and stop a receiver instance
  that the observer handler wants to start/stop
Implement basic add/remove/change response in receiver_creator to observer events

* Add observer notification interface (k8s observer will be in separate PR) * Refactor receiver_creator to be more easily testable and organized * receiver.go mostly implements OT interface and delegates to the new files * observerhandler.go responds to observer events and manages the starting/stopping of receivers * rules.go implements rules evaluation (not currently implemented) * runner.go contains a runner interface that handles the details of how to start and stop a receiver instance that the observer handler wants to start/stop * Implement basic add/remove/change response in receiver_creator to observer events

tigrannajaryan · 2020-04-14T17:40:05Z

extension/observer/observer.go

+type Endpoint interface {
+	// ID uniquely identifies this endpoint.
+	ID() string
+	// Target is an address or hostname of the endpoint.


Is this specifically an IP address or can be some other form of an address?

Generally it's an IP address for pods but there's no reason it couldn't be a DNS name which is why I named it generically.

OK, so perhaps the comment can say "IP address or hostname". It was not clear to me that this could not be some other form of an address (other than IP address or host name).

extension/observer/observer.go

tigrannajaryan · 2020-04-14T17:43:07Z

extension/observer/observer.go

+// PortEndpoint is an endpoint that has a target as well as a port.
+type PortEndpoint struct {
+	endpointBase
+	Port int32


Nit: IP port numbers are unsigned 16-bit integers. Do we need the larger int32 range?

extension/observer/observer.go

tigrannajaryan · 2020-04-14T18:06:59Z

receiver/receivercreator/observerhandler.go

+	// receiverTemplates maps receiver template full name to a receiverTemplate value.
+	receiverTemplates map[string]receiverTemplate
+	// receiversByEndpointID is a map of endpoint IDs to a receiver instance.
+	receiversByEndpointID multimap.MultiMap


Why is this a multimap? Can we have more than one receiver per endpoint ID? Is it because we can have different receiver types associated with the same ID? Should this be a map of maps: map[endpointID]map[receiverFullName]receiver in that case or we don't care about having the nested map?

Why is this a multimap? Can we have more than one receiver per endpoint ID?

Right, multiple receivers can be started for a single endpoint. The multimap is basically just a wrapper around a map of maps. I could do my own light wrapper instead but it's a lightweight library with no other dependencies.

OK, up to you. I thought it might be nicer to have stronger compoile-time typing from map of maps. Your choice.

Yeah the lack of type checking is unfortunate. Maybe one day Go will bless us with generics. :) I'll try doing custom wrapper, doesn't need all possible multimap functions so should be limited.

tigrannajaryan · 2020-04-14T18:11:25Z

receiver/receivercreator/observerhandler.go

+				continue
+			}
+			rcvr, err := re.runner.start(template.receiverConfig, userConfigMap{
+				"endpoint": e.Target(),


Can we define "endpoint" as a const and give meaningful name? Otherwise it looks like a magic value. Are receiver configs expected to support a configuration option with this precise name? Is this enforced or checked at compile or runtime? Is this expected to match ReceiverSettings.Endpoint field's tag?

Is this expected to match ReceiverSettings.Endpoint field's tag?

Yeah this is required to match ReceiverSettings.Endpoint field. Is a constant in receivercreator ok or does it need to be a constant in ot-collector core?

const here is OK, just add a comment to explain that it has to match the ReceiverSettings.Endpoint field. We are not going to change that field name (hopefully ever) since it is a config file setting.

tigrannajaryan · 2020-04-14T18:22:21Z

receiver/receivercreator/receiver.go

@@ -38,11 +37,12 @@ type receiverCreator struct {
 	nextConsumer consumer.MetricsConsumerOld
 	logger       *zap.Logger
 	cfg          *Config
-	receivers    []component.Receiver
+	responder    *observerHandler


Does the name responder reflect the purpose? It seems to be a bit of a mismatch with its type name observerHandler.

Yep I renamed observerHandler at one point. Will fix.

tigrannajaryan · 2020-04-14T18:27:11Z

receiver/receivercreator/runner.go

+func (run *receiverRunner) loadRuntimeReceiverConfig(
+	factory component.ReceiverFactoryOld,
+	receiver receiverConfig,
+	discoveredConfig userConfigMap) (configmodels.Receiver, error) {


We usually use this style in the codebase:

Suggested change

discoveredConfig userConfigMap) (configmodels.Receiver, error) {

discoveredConfig userConfigMap,

) (configmodels.Receiver, error) {

tigrannajaryan · 2020-04-14T18:32:12Z

receiver/receivercreator/runner.go

+		return nil, err
+	}
+
+	if err := recvr.Start(run.ctx, run.host); err != nil {


Start() may call Host.ReportFatalError(). ReportFatalError() exits the process. This was considered acceptable since it would normally happen at startup when receivers are created.

Now this allows receivers to be created at any time long after the startup. It means if something goes wrong the receiver's Start function can exit the process during normal operation. That is not considered acceptable: https://github.com/open-telemetry/opentelemetry-collector/blob/master/CONTRIBUTING.md#do-not-crash-after-startup

We will need to wrap the host here and provide a different ReportFatalError implementation which does not exit the process but only logs the problem and perhaps emits a metric/trace.

What's the purpose of ReportFatalError if Start() can return an error? Why not let the caller (the core in the usual case, receivercreator) in this case decide how to handle it?

Start() is expected to return quickly. If the component needs more time to initialize it is expected to return and continue initialization asynchronously. ReportFatalError allows the component to report initialization failure in that case.

I changed it to just log for now. However that's probably not sufficient if receivers are assuming when they call that they are going to be restarted (by virtue of the process dying and something like k8s restarting the exited container). In the receiver creator case should we shut down the receiver that reported a fatal error? How often, how long, etc. It starts opening up a can of worms. Actually this applies to even the case where Start() returns an error. However for the receivers we write Start() will rarely fail as what it usually does is just kick off some kind of periodic asynchronous process[1].

I think we may need to introduce some kind of lifecycle for receivers if we need to handle these kinds of scenarios. It's something I considered adding to the smart agent but hadn't gotten around to it. I think we need a way to have receivers fail to start and be able to restart them with exponential retry, etc. (Although I understand for some receivers this is unlikely to help as detailed in https://github.com/open-telemetry/opentelemetry-collector/blob/master/CONTRIBUTING.md#bad-input-handling). But for many receivers it's possible they encounter some error that can be resolved by resetting to a good known state. In addition having these kinds of explicit status of receivers and whether they're running, crashing, healthy, etc. would enhance the telemetry as you could tell whether a receiver was in a good running state or not.

[1] This periodic process somewhat functions as a poor version of the lifecycle management. If an error is encountered it merely retries the same collection loop after N seconds. However it typically doesn't do anything fancy like reset to known good state. We will get some insight into this async collection loop through telemetry but I think more could be done by standardizing this collection loop instead of each receiver/monitor doing it on its own.

Can you open a github issue so that we can think this through in the future? I agree this needs more work.

Created #178

tigrannajaryan · 2020-04-15T21:32:08Z

@jrcamp ping me when this is ready for another round of review.

jrcamp · 2020-04-16T02:02:46Z

@tigrannajaryan updated, thanks for the review.

tigrannajaryan

LGTM, just one minor comment regarding naming.

tigrannajaryan · 2020-04-16T18:13:09Z

receiver/receivercreator/receiver.go

-	// TODO: Need to make sure this is unique (just endpoint is probably not totally sufficient).
-	receiverConfig.SetName(fmt.Sprintf("%s/%s{endpoint=%q}", rc.cfg.Name(), staticSubConfig.fullName, mergedConfig.GetString("endpoint")))
-	return receiverConfig, nil
+// safeHost provides a safer version of host for receivers started at runtime.


Let's calling this something like loggingHost. "safety" can mean different things depending on the requirement. In this case the requirement is that fatal errors are logged instead of crashing the process. But crashing the process is the "safe" option in some other case (during startup).

tigrannajaryan · 2020-04-16T18:14:30Z

receiver/receivercreator/runner.go

+		return nil, err
+	}
+
+	if err := recvr.Start(run.ctx, run.host); err != nil {


Can you open a github issue so that we can think this through in the future? I agree this needs more work.

tigrannajaryan

LGTM

* Add observer notification interface (k8s observer will be in separate PR) * Refactor receiver_creator to be more easily testable and organized * receiver.go mostly implements OT interface and delegates to the new files * observerhandler.go responds to observer events and manages the starting/stopping of receivers * rules.go implements rules evaluation (not currently implemented) * runner.go contains a runner interface that handles the details of how to start and stop a receiver instance that the observer handler wants to start/stop * Implement basic add/remove/change response in receiver_creator to observer events

Missed the linux path on the copy step

jrcamp requested review from bogdandrutu, flands, owais, pjanotti, rghetia, songy23 and tigrannajaryan as code owners April 13, 2020 18:53

tigrannajaryan reviewed Apr 14, 2020

View reviewed changes

jrcamp added 2 commits April 15, 2020 21:53

review feedback

5fa3239

Merge remote-tracking branch 'upstream/master' into observer

75bf695

jrcamp requested a review from tigrannajaryan April 16, 2020 01:59

fixes

6c32df2

cleanup

271611f

tigrannajaryan reviewed Apr 16, 2020

View reviewed changes

jrcamp mentioned this pull request Apr 17, 2020

How to handle receiver start errors in receiver_creator at runtime #178

Closed

review feedback

1b50f89

jrcamp requested a review from tigrannajaryan April 17, 2020 15:56

tigrannajaryan approved these changes Apr 17, 2020

View reviewed changes

tigrannajaryan merged commit 2f20d0c into open-telemetry:master Apr 17, 2020

jrcamp deleted the observer branch April 17, 2020 17:54

jrcamp mentioned this pull request May 18, 2020

REQUEST: New membership for jrcamp open-telemetry/community#349

Closed

6 tasks

mxiamxia referenced this pull request in mxiamxia/opentelemetry-collector-contrib Jul 22, 2020

Fix docker-otelsvc build (#173)

127e632

Missed the linux path on the copy step

ljmsc referenced this pull request in ljmsc/opentelemetry-collector-contrib Feb 21, 2022

export Links in Jaeger exporter. (#173)

ab58cae

bogdandrutu pushed a commit that referenced this pull request May 12, 2022

Fix status badge (#173)

3897019

XinRanZhAWS pushed a commit to XinRanZhAWS/opentelemetry-collector-contrib that referenced this pull request Feb 29, 2024

http semantic convention 1.23.0 support (open-telemetry#173)

e89ec77

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add observer support to receiver_creator #173

Add observer support to receiver_creator #173

jrcamp commented Apr 13, 2020

tigrannajaryan Apr 14, 2020

jrcamp Apr 14, 2020

tigrannajaryan Apr 15, 2020

tigrannajaryan Apr 14, 2020

tigrannajaryan Apr 14, 2020

jrcamp Apr 14, 2020

tigrannajaryan Apr 15, 2020

jrcamp Apr 15, 2020

tigrannajaryan Apr 14, 2020

jrcamp Apr 14, 2020

tigrannajaryan Apr 15, 2020

tigrannajaryan Apr 14, 2020

jrcamp Apr 14, 2020

tigrannajaryan Apr 14, 2020

tigrannajaryan Apr 14, 2020

jrcamp Apr 14, 2020

tigrannajaryan Apr 15, 2020

jrcamp Apr 16, 2020 •

edited

Loading

tigrannajaryan Apr 16, 2020

jrcamp Apr 17, 2020

tigrannajaryan commented Apr 15, 2020

jrcamp commented Apr 16, 2020

tigrannajaryan left a comment

tigrannajaryan Apr 16, 2020

tigrannajaryan Apr 16, 2020

tigrannajaryan left a comment

	discoveredConfig userConfigMap) (configmodels.Receiver, error) {
	discoveredConfig userConfigMap,
	) (configmodels.Receiver, error) {

Add observer support to receiver_creator #173

Add observer support to receiver_creator #173

Conversation

jrcamp commented Apr 13, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jrcamp Apr 16, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tigrannajaryan commented Apr 15, 2020

jrcamp commented Apr 16, 2020

tigrannajaryan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tigrannajaryan left a comment

Choose a reason for hiding this comment

jrcamp Apr 16, 2020 •

edited

Loading