Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add positive confirmation that Agent made a connection to Collector (… #2423

Conversation

BernardTolosajr
Copy link
Contributor

@BernardTolosajr BernardTolosajr commented Aug 27, 2020

Signed-off-by: Bernardo Tolosa bernardotolosajr@gmail.com

Which problem is this PR solving?

Short description of the changes

  • Add connection state checking base on getState

@BernardTolosajr BernardTolosajr requested a review from a team as a code owner August 27, 2020 01:39
}
}

logger.Info("Agent collector connection state change", zap.String("dialTarget", dialTarget), zap.String("status", conn.GetState().String()))
Copy link
Member

@yurishkuro yurishkuro Aug 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger.Info("Agent collector connection state change", zap.String("dialTarget", dialTarget), zap.String("status", conn.GetState().String()))
logger.Info("Agent collector connection state change", zap.String("dialTarget", dialTarget), zap.Stringer("status", conn.GetState()))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

if !conn.WaitForStateChange(ctx, s) {
logger.Fatal("Could not get connection state", zap.String("dialTarget", dialTarget))
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be testable, including success and failure cases.

break
}
if !conn.WaitForStateChange(ctx, s) {
logger.Fatal("Could not get connection state", zap.String("dialTarget", dialTarget))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can return the error, no need to crash


ctx, cancel := context.WithTimeout(context.Background(), time.Second)
defer cancel()
for {
Copy link
Member

@joe-elliott joe-elliott Aug 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will change agent startup to block while it attempts to connect to a collector. We could run this logic in a goroutine instead, keep the current behavior and just clearly log when a collector connection is successful. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also log "Checking connection to collector" before starting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree on this.

@codecov
Copy link

codecov bot commented Aug 28, 2020

Codecov Report

Merging #2423 into master will increase coverage by 0.00%.
The diff coverage is 95.55%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2423   +/-   ##
=======================================
  Coverage   95.58%   95.58%           
=======================================
  Files         208      208           
  Lines       10679    10699   +20     
=======================================
+ Hits        10207    10227   +20     
  Misses        399      399           
  Partials       73       73           
Impacted Files Coverage Δ
cmd/agent/app/reporter/grpc/builder.go 95.45% <80.00%> (-4.55%) ⬇️
plugin/storage/es/dependencystore/storage.go 90.74% <100.00%> (+0.17%) ⬆️
plugin/storage/es/factory.go 100.00% <100.00%> (ø)
plugin/storage/es/options.go 100.00% <100.00%> (ø)
plugin/storage/es/spanstore/reader.go 100.00% <100.00%> (ø)
plugin/storage/es/spanstore/service_operation.go 100.00% <100.00%> (ø)
plugin/storage/grpc/shared/grpc_client.go 85.96% <100.00%> (ø)
cmd/query/app/server.go 93.33% <0.00%> (+2.22%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 393e04d...8f6570e. Read the comment docs.

s := cc.GetState()
logger.Info("Agent collector connection state change", zap.String("dialTarget", dialTarget), zap.Stringer("status", s))
if s == connectivity.Ready {
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to return here? If we run indefinitely, we'll be able to log state changes after the process started as well. The logs below are from the agent: the collector was initially available, then I killed it for a few seconds, and started again:

$ go run ./cmd/agent  --reporter.grpc.host-port localhost:14250
2020/08/28 10:14:44 maxprocs: Leaving GOMAXPROCS=12: CPU quota undefined
{"level":"info","ts":1598602484.5554316,"caller":"flags/service.go:116","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1598602484.5556228,"caller":"flags/admin.go:120","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1598602484.5556521,"caller":"flags/admin.go:126","msg":"Starting admin HTTP server","http-addr":":14271"}
{"level":"info","ts":1598602484.5556605,"caller":"flags/admin.go:112","msg":"Admin server started","http.host-port":"[::]:14271","health-status":"unavailable"}
{"level":"info","ts":1598602484.5561311,"caller":"grpc/builder.go:69","msg":"Agent requested insecure grpc connection to collector(s)"}
{"level":"info","ts":1598602484.5561495,"caller":"grpc@v1.28.0-pre/clientconn.go:106","msg":"parsed scheme: \"\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1598602484.5561638,"caller":"grpc@v1.28.0-pre/clientconn.go:106","msg":"scheme \"\" not registered, fallback to default scheme","system":"grpc","grpc_log":true}
{"level":"info","ts":1598602484.5561745,"caller":"passthrough/passthrough.go:48","msg":"ccResolverWrapper: sending update to cc: {[{localhost:14250  <nil> 0 <nil>}] <nil> <nil>}","system":"grpc","grpc_log":true}
{"level":"info","ts":1598602484.5561845,"caller":"grpc@v1.28.0-pre/clientconn.go:948","msg":"ClientConn switching balancer to \"round_robin\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1598602484.5562766,"caller":"grpc/builder.go:103","msg":"Checking connection to collector"}
{"level":"info","ts":1598602484.5562928,"caller":"grpc/builder.go:108","msg":"Agent collector connection state change","dialTarget":"localhost:14250","status":"CONNECTING"}
{"level":"info","ts":1598602484.5570667,"caller":"base/balancer.go:170","msg":"roundrobinPicker: newPicker called with info: {map[0xc0002fca80:{{localhost:14250  <nil> 0 <nil>}}]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1598602484.5570996,"caller":"grpc/builder.go:108","msg":"Agent collector connection state change","dialTarget":"localhost:14250","status":"READY"}
{"level":"info","ts":1598602484.5574603,"caller":"agent/main.go:78","msg":"Starting agent"}
{"level":"info","ts":1598602484.5575192,"caller":"healthcheck/handler.go:128","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1598602484.5575948,"caller":"app/agent.go:69","msg":"Starting jaeger-agent HTTP server","http-port":5778}
{"level":"info","ts":1598602492.940894,"caller":"base/balancer.go:170","msg":"roundrobinPicker: newPicker called with info: {map[]}","system":"grpc","grpc_log":true}
{"level":"warn","ts":1598602492.9432561,"caller":"grpc@v1.28.0-pre/clientconn.go:1223","msg":"grpc: addrConn.createTransport failed to connect to {localhost:14250  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp [::1]:14250: connect: connection refused\". Reconnecting...","system":"grpc","grpc_log":true}
{"level":"info","ts":1598602493.9436958,"caller":"base/balancer.go:170","msg":"roundrobinPicker: newPicker called with info: {map[]}","system":"grpc","grpc_log":true}
{"level":"warn","ts":1598602493.9458196,"caller":"grpc@v1.28.0-pre/clientconn.go:1223","msg":"grpc: addrConn.createTransport failed to connect to {localhost:14250  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp [::1]:14250: connect: connection refused\". Reconnecting...","system":"grpc","grpc_log":true}
{"level":"info","ts":1598602495.4147575,"caller":"base/balancer.go:170","msg":"roundrobinPicker: newPicker called with info: {map[]}","system":"grpc","grpc_log":true}
{"level":"warn","ts":1598602495.4163811,"caller":"grpc@v1.28.0-pre/clientconn.go:1223","msg":"grpc: addrConn.createTransport failed to connect to {localhost:14250  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp [::1]:14250: connect: connection refused\". Reconnecting...","system":"grpc","grpc_log":true}
{"level":"info","ts":1598602498.001834,"caller":"base/balancer.go:170","msg":"roundrobinPicker: newPicker called with info: {map[]}","system":"grpc","grpc_log":true}
{"level":"warn","ts":1598602498.0036414,"caller":"grpc@v1.28.0-pre/clientconn.go:1223","msg":"grpc: addrConn.createTransport failed to connect to {localhost:14250  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp [::1]:14250: connect: connection refused\". Reconnecting...","system":"grpc","grpc_log":true}
{"level":"info","ts":1598602502.4922438,"caller":"base/balancer.go:170","msg":"roundrobinPicker: newPicker called with info: {map[]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1598602502.4940085,"caller":"base/balancer.go:170","msg":"roundrobinPicker: newPicker called with info: {map[0xc0002fca80:{{localhost:14250  <nil> 0 <nil>}}]}","system":"grpc","grpc_log":true}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you decide to not return here, you probably won't want to return on the WaitForStateChange line either. And you'll probably want a different context for the subsequent runs (without a timeout?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return statement here is to stop logging the state once it's saw ready. Let me check it on how to solve it on the subsequent runs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing return statement log the state over and over, let me check it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the WaitForStateChange blocking until a change has been seen?

Copy link
Contributor Author

@BernardTolosajr BernardTolosajr Aug 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remove the return statement to get the state in the subsequent runs. The logs below are from the agent: the collector was initially available, then killed it for a few seconds, and started again:

{"level":"info","ts":1598660159.389798,"caller":"grpc@v1.28.0-pre/clientconn.go:948","msg":"ClientConn switching balancer to \"round_robin\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1598660159.390025,"caller":"grpc/builder.go:103","msg":"Checking connection to collector"}
{"level":"info","ts":1598660159.390068,"caller":"grpc/builder.go:110","msg":"Agent collector connection state change","dialTarget":"localhost:14250","status":"CONNECTING"}
{"level":"info","ts":1598660159.393039,"caller":"agent/main.go:78","msg":"Starting agent"}
{"level":"info","ts":1598660159.393326,"caller":"healthcheck/handler.go:128","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1598660159.39345,"caller":"app/agent.go:69","msg":"Starting jaeger-agent HTTP server","http-port":5778}
{"level":"info","ts":1598660159.393458,"caller":"base/balancer.go:170","msg":"roundrobinPicker: newPicker called with info: {map[0xc0002746a0:{{localhost:14250  <nil> 0 <nil>}}]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1598660159.393586,"caller":"grpc/builder.go:110","msg":"Agent collector connection state change","dialTarget":"localhost:14250","status":"READY"}
{"level":"info","ts":1598660164.37164,"caller":"base/balancer.go:170","msg":"roundrobinPicker: newPicker called with info: {map[]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1598660164.371763,"caller":"grpc/builder.go:110","msg":"Agent collector connection state change","dialTarget":"localhost:14250","status":"CONNECTING"}
{"level":"warn","ts":1598660164.37359,"caller":"grpc@v1.28.0-pre/clientconn.go:1223","msg":"grpc: addrConn.createTransport failed to connect to {localhost:14250  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp [::1]:14250: connect: connection refused\". Reconnecting...","system":"grpc","grpc_log":true}
{"level":"info","ts":1598660164.3737128,"caller":"grpc/builder.go:110","msg":"Agent collector connection state change","dialTarget":"localhost:14250","status":"TRANSIENT_FAILURE"}
{"level":"info","ts":1598660165.375075,"caller":"base/balancer.go:170","msg":"roundrobinPicker: newPicker called with info: {map[]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1598660165.37512,"caller":"grpc/builder.go:110","msg":"Agent collector connection state change","dialTarget":"localhost:14250","status":"CONNECTING"}
{"level":"warn","ts":1598660165.376992,"caller":"grpc@v1.28.0-pre/clientconn.go:1223","msg":"grpc: addrConn.createTransport failed to connect to {localhost:14250  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp [::1]:14250: connect: connection refused\". Reconnecting...","system":"grpc","grpc_log":true}
{"level":"info","ts":1598660165.377099,"caller":"grpc/builder.go:110","msg":"Agent collector connection state change","dialTarget":"localhost:14250","status":"TRANSIENT_FAILURE"}
{"level":"info","ts":1598660167.2614028,"caller":"base/balancer.go:170","msg":"roundrobinPicker: newPicker called with info: {map[]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1598660167.261448,"caller":"grpc/builder.go:110","msg":"Agent collector connection state change","dialTarget":"localhost:14250","status":"CONNECTING"}
{"level":"warn","ts":1598660167.263094,"caller":"grpc@v1.28.0-pre/clientconn.go:1223","msg":"grpc: addrConn.createTransport failed to connect to {localhost:14250  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp [::1]:14250: connect: connection refused\". Reconnecting...","system":"grpc","grpc_log":true}
{"level":"info","ts":1598660167.2631412,"caller":"grpc/builder.go:110","msg":"Agent collector connection state change","dialTarget":"localhost:14250","status":"TRANSIENT_FAILURE"}
{"level":"info","ts":1598660170.339704,"caller":"base/balancer.go:170","msg":"roundrobinPicker: newPicker called with info: {map[]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1598660170.339753,"caller":"grpc/builder.go:110","msg":"Agent collector connection state change","dialTarget":"localhost:14250","status":"CONNECTING"}
{"level":"warn","ts":1598660170.344249,"caller":"grpc@v1.28.0-pre/clientconn.go:1223","msg":"grpc: addrConn.createTransport failed to connect to {localhost:14250  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp [::1]:14250: connect: connection refused\". Reconnecting...","system":"grpc","grpc_log":true}
{"level":"info","ts":1598660170.3443048,"caller":"grpc/builder.go:110","msg":"Agent collector connection state change","dialTarget":"localhost:14250","status":"TRANSIENT_FAILURE"}
{"level":"info","ts":1598660174.1797829,"caller":"base/balancer.go:170","msg":"roundrobinPicker: newPicker called with info: {map[]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1598660174.179817,"caller":"grpc/builder.go:110","msg":"Agent collector connection state change","dialTarget":"localhost:14250","status":"CONNECTING"}
{"level":"warn","ts":1598660174.181422,"caller":"grpc@v1.28.0-pre/clientconn.go:1223","msg":"grpc: addrConn.createTransport failed to connect to {localhost:14250  <nil> 0 <nil>}. Err :connection error: desc = \"transport: Error while dialing dial tcp [::1]:14250: connect: connection refused\". Reconnecting...","system":"grpc","grpc_log":true}
{"level":"info","ts":1598660174.181532,"caller":"grpc/builder.go:110","msg":"Agent collector connection state change","dialTarget":"localhost:14250","status":"TRANSIENT_FAILURE"}
{"level":"info","ts":1598660181.663411,"caller":"base/balancer.go:170","msg":"roundrobinPicker: newPicker called with info: {map[]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1598660181.663457,"caller":"grpc/builder.go:110","msg":"Agent collector connection state change","dialTarget":"localhost:14250","status":"CONNECTING"}
{"level":"info","ts":1598660181.666461,"caller":"base/balancer.go:170","msg":"roundrobinPicker: newPicker called with info: {map[0xc0002746a0:{{localhost:14250  <nil> 0 <nil>}}]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1598660181.6665568,"caller":"grpc/builder.go:110","msg":"Agent collector connection state change","dialTarget":"localhost:14250","status":"READY"}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the modified code. Thoughts?

go func(cc *grpc.ClientConn) {
	logger.Info("Checking connection to collector")
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	var currentState connectivity.State
	for {
		s := cc.GetState()
		if currentState != s {
			logger.Info("Agent collector connection state change", zap.String("dialTarget", dialTarget), zap.Stringer("status", s))
		}
		currentState = s
		cc.WaitForStateChange(ctx, s)
	}
}(conn)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the PR with the modified code? We can then leave comments on the specific lines :-)

My comment would be: you have only one context for all iterations, and it will be timed out after a few seconds. What happens with cc.WaitForStateChange when it receives a context that has expired already? I would expect it to return right away, instead of waiting for a state change.

My suggestion: create a new context right after the cc.WaitForStateChange without a timeout, and use it for all subsequent calls. This way, your go routine will block until there's a state change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion, i will update the PR with the modified code.

Copy link
Contributor

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to do another manual test, but looks good to me.

@jpkrohling
Copy link
Contributor

@BernardTolosajr would you please rebase this? The test failure might be intermittent, but I'll take a look at it if it happens again.

@BernardTolosajr
Copy link
Contributor Author

Got it.

@BernardTolosajr BernardTolosajr force-pushed the agent-collector-connection-status branch from da3bb01 to f08962a Compare September 11, 2020 02:16
Copy link
Contributor

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but will do a manual test just in case.

@jpkrohling
Copy link
Contributor

Seems to be working fine, except for the case where the collector goes away and then back. I'm not sure we actually have any options here, as I don't think gRPC will propagate this state until a real failure happens.

Test:

  1. Start the collector:
$ SPAN_STORAGE_TYPE=memory go run ./cmd/collector

And then the agent:

$ go run ./cmd/agent --reporter.grpc.host-port=localhost:14250
2020/09/11 10:22:05 maxprocs: Leaving GOMAXPROCS=12: CPU quota undefined
{"level":"info","ts":1599812525.3154635,"caller":"flags/service.go:116","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1599812525.3155801,"caller":"flags/admin.go:120","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1599812525.3156064,"caller":"flags/admin.go:126","msg":"Starting admin HTTP server","http-addr":":14271"}
{"level":"info","ts":1599812525.315617,"caller":"flags/admin.go:112","msg":"Admin server started","http.host-port":"[::]:14271","health-status":"unavailable"}
{"level":"info","ts":1599812525.316112,"caller":"grpc/builder.go:69","msg":"Agent requested insecure grpc connection to collector(s)"}
{"level":"info","ts":1599812525.31614,"caller":"grpc@v1.29.1/clientconn.go:243","msg":"parsed scheme: \"\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812525.3161561,"caller":"grpc@v1.29.1/clientconn.go:249","msg":"scheme \"\" not registered, fallback to default scheme","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812525.3161683,"caller":"grpc@v1.29.1/resolver_conn_wrapper.go:143","msg":"ccResolverWrapper: sending update to cc: {[{localhost:14250  <nil> 0 <nil>}] <nil> <nil>}","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812525.3161745,"caller":"grpc@v1.29.1/clientconn.go:667","msg":"ClientConn switching balancer to \"round_robin\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812525.3161795,"caller":"grpc@v1.29.1/clientconn.go:682","msg":"Channel switches to new LB policy \"round_robin\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812525.3161926,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812525.3162704,"caller":"grpc@v1.29.1/clientconn.go:1193","msg":"Subchannel picks a new address \"localhost:14250\" to connect","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812525.3163183,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812525.3163552,"caller":"grpc/builder.go:103","msg":"Checking connection to collector"}
{"level":"info","ts":1599812525.3163776,"caller":"grpc/builder.go:108","msg":"Agent collector connection state change","dialTarget":"localhost:14250","status":"CONNECTING"}
{"level":"info","ts":1599812525.316865,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812525.3169198,"caller":"base/balancer.go:200","msg":"roundrobinPicker: newPicker called with info: {map[0xc0001f1c00:{{localhost:14250  <nil> 0 <nil>}}]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812525.3169289,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812525.3169355,"caller":"grpc/builder.go:108","msg":"Agent collector connection state change","dialTarget":"localhost:14250","status":"READY"}
{"level":"info","ts":1599812525.317298,"caller":"agent/main.go:78","msg":"Starting agent"}
{"level":"info","ts":1599812525.3173316,"caller":"healthcheck/handler.go:128","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1599812525.3173654,"caller":"app/agent.go:69","msg":"Starting jaeger-agent HTTP server","http-port":5778}

Then, kill the collector. The agent shows:

{"level":"info","ts":1599812530.4758391,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812530.4760103,"caller":"grpc@v1.29.1/clientconn.go:1193","msg":"Subchannel picks a new address \"localhost:14250\" to connect","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812530.4760938,"caller":"base/balancer.go:200","msg":"roundrobinPicker: newPicker called with info: {map[]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812530.4761722,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"warn","ts":1599812530.4768913,"caller":"grpc@v1.29.1/clientconn.go:1275","msg":"grpc: addrConn.createTransport failed to connect to {localhost:14250  <nil> 0 <nil>}. Err: connection error: desc = \"transport: Error while dialing dial tcp [::1]:14250: connect: connection refused\". Reconnecting...","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812530.4769225,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812530.476963,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812531.4771075,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812531.4771428,"caller":"grpc@v1.29.1/clientconn.go:1193","msg":"Subchannel picks a new address \"localhost:14250\" to connect","system":"grpc","grpc_log":true}
{"level":"warn","ts":1599812531.4775724,"caller":"grpc@v1.29.1/clientconn.go:1275","msg":"grpc: addrConn.createTransport failed to connect to {localhost:14250  <nil> 0 <nil>}. Err: connection error: desc = \"transport: Error while dialing dial tcp [::1]:14250: connect: connection refused\". Reconnecting...","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812531.4775908,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812533.2571855,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812533.257261,"caller":"grpc@v1.29.1/clientconn.go:1193","msg":"Subchannel picks a new address \"localhost:14250\" to connect","system":"grpc","grpc_log":true}
{"level":"warn","ts":1599812533.257662,"caller":"grpc@v1.29.1/clientconn.go:1275","msg":"grpc: addrConn.createTransport failed to connect to {localhost:14250  <nil> 0 <nil>}. Err: connection error: desc = \"transport: Error while dialing dial tcp [::1]:14250: connect: connection refused\". Reconnecting...","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812533.2576776,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812535.7288105,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812535.7288384,"caller":"grpc@v1.29.1/clientconn.go:1193","msg":"Subchannel picks a new address \"localhost:14250\" to connect","system":"grpc","grpc_log":true}
{"level":"warn","ts":1599812535.729178,"caller":"grpc@v1.29.1/clientconn.go:1275","msg":"grpc: addrConn.createTransport failed to connect to {localhost:14250  <nil> 0 <nil>}. Err: connection error: desc = \"transport: Error while dialing dial tcp [::1]:14250: connect: connection refused\". Reconnecting...","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812535.7291934,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812539.0429885,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812539.043027,"caller":"grpc@v1.29.1/clientconn.go:1193","msg":"Subchannel picks a new address \"localhost:14250\" to connect","system":"grpc","grpc_log":true}

And when the collector is back:

{"level":"info","ts":1599812539.0434477,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812539.043485,"caller":"base/balancer.go:200","msg":"roundrobinPicker: newPicker called with info: {map[0xc0001f1c00:{{localhost:14250  <nil> 0 <nil>}}]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1599812539.0434995,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to READY","system":"grpc","grpc_log":true}

(note that I rebased your branch, so that it picks the newer gRPC client that we have in master right now)

Please, rebase this PR and I'll merge it.

…aegertracing#2410)

Signed-off-by: Bernardo Tolosa <bernardotolosajr@gmail.com>
Signed-off-by: Bernardo Tolosa <bernardotolosajr@gmail.com>
Signed-off-by: Bernardo Tolosa <bernardotolosajr@gmail.com>
Signed-off-by: Bernardo Tolosa <bernardotolosajr@gmail.com>
@BernardTolosajr BernardTolosajr force-pushed the agent-collector-connection-status branch from f08962a to 8f6570e Compare September 11, 2020 11:49
@BernardTolosajr
Copy link
Contributor Author

noted on this one, i will rebase and push it.

@jpkrohling
Copy link
Contributor

Codecov seems stuck, but I'm merging it anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Positive confirmation that Agent made a connection to Collector
4 participants