-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[agent] Make timeout for reporting configurable #1034
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1034 +/- ##
======================================
Coverage 100% 100%
======================================
Files 159 159
Lines 7126 7136 +10
======================================
+ Hits 7126 7136 +10
Continue to review full report at Codecov.
|
Thanks for looking into this one.
We have some deployments at Uber where we're using two reporters of the same type (well, only tchannel reporter is supported anyway today) to send the data to two locations. However, it is done using the jaeger/cmd/agent/app/builder.go Lines 103 to 106 in 0c86682
I believe we have a custom config file format that we parse in a custom main. We debated that we could achieve that functionality by forwarding spans to a dedicated splitter, but that seems like more complicated deployment vs. a small code change in the agent. Ideally, we would like to preserve that forking functionality in the agent. Have you looked into going with the list approach as proposed? Another alternative is to use nested union-like config, instead of
|
I think this |
The tricky part is to decide how cli flags will support a list. If that proves to be too ugly, we could draw a line and say we only support one reporter by default. At Uber we can continue using the custom config and main for multiple reporters. |
Yeah, spent an hour+ trying to make the flags and config match up and not fail tests, couldn't get it to work. Prefer if there is only reporter being supported. |
let's go with just one reporter, as long as programmatically we can add more (as we can today). Could you post how the CLI / config will look ? |
So, now this just adds a simple This is how the config will look like: jaeger/cmd/agent/app/builder_test.go Lines 55 to 61 in 66ed190
|
The next week I will have time to look at #927 |
I'm running this in prod for the past 10 days and everything works great, we used to drop a ton of batches due to timeouts, but right now those occurrences are super rare when running with |
@gouthamve sorry for a longer delay. If i understand it correctly tchannel flags are prefixed with If we cannot make it work like it's in #1034 (comment). I think this could go in as there are no other options. cc @yurishkuro |
They are not prefixed right now. Changing flags is a breaking change. I could do it if that is preferred though. |
@gouthamve #1092 addresses agent refactoring. Once that is in this will have to be rebased. |
@gouthamve could you please rebase this on the current master? We have refactored agent flags now it should be easy to add new ones. |
0940c1e
to
f0d11c9
Compare
@pavolloffay Rebased :) I'll run this internally and report on how it performs. |
f0d11c9
to
bbf34aa
Compare
https://github.com/jaegertracing/jaeger/blob/master/cmd/agent/app/reporter/tchannel/flags.go#L82-L83 would always overwrite this: Fix is in the last commit. It wasn't caught before as flag values from the previous iteration are carried forward to the next iteration. |
31260a2
to
15476e0
Compare
Hmm, the test failure looks like a flake. |
@gouthamve are you also fixing #1192 in this PR? |
Yes, I need to re-init the flags for my tests to pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks also for fixing the broken flags. A bonus would be a test verifying the timeout.
@yurishkuro could you please also have a look? I might lack some tchannel exprience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm overall, a couple minor nits
@@ -30,6 +30,8 @@ const ( | |||
hostPort = "host-port" | |||
discoveryMinPeers = "discovery.min-peers" | |||
discoveryConnCheckTimeout = "discovery.conn-check-timeout" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove blank line
} | ||
if v.GetDuration(tchannelPrefix+discoveryConnCheckTimeout) != defaultConnCheckTimeout { | ||
b.ConnCheckTimeout = v.GetDuration(tchannelPrefix + discoveryConnCheckTimeout) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking this does not solve the overriding problem if the manually provided value was the same as the default value, but it's still an improvement. The key thing is that we just need to remove the legacy flags soon.
15476e0
to
7229f86
Compare
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
7229f86
to
34d298d
Compare
Rebased and feedback incorporated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
* agent: Make timeout for reporting configurable Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Fix tests and make sure deprecated flags work Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
So, this is less a fix for #1026 but more to kick off the discussion around: #927
While the original proposal has:
I kind of like:
I took a stab at implementing my version and turns out it's super straight forward and is very little changed code. But if the preference is to have the originally proposed format, I'll make the changes. But now I'm curious, can we 2 different
tchannel
reporters?Note: This will be a breaking change.