-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix nsq_consumer regression by avoiding connection to an empty list of servers #9503
Conversation
Looks like new artifacts were built from this PR. Get them here!Artifact URLs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a great change, just have a few minor comments
@@ -134,15 +135,28 @@ func (n *NSQConsumer) Start(ac telegraf.Accumulator) error { | |||
return nil | |||
})) | |||
|
|||
// For backward compatibility | |||
if n.Server != "" { | |||
n.Nsqd = append(n.Nsqd, n.Server) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
documentation says we prepend it to the list, but we're appending it here. Could be a concern if the server connection list order matters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all thanks a lot for the review!
Regarding the order, I just used what was done before in this plugin (previous line 143). I can change this, should I?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably change the docs to say append. If the plugin appended before, changing it to prepending might cause incompatibility.
if len(n.Nsqlookupd) == 0 && len(n.Nsqd) == 0 { | ||
return fmt.Errorf("either 'nsqd' or 'nsqlookupd' needs to be specified") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you think about moving the server len checks to an func (n *NSQConsumer) Init() error
function? This way it can error on startup that the configuration is invalid, rather than at connect time.
That might also help with another potential issue; if the Start function is called more than once (say connection failure), the above code will keep appending to the n.Nsqd slice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your point is perfectly valid and makes sense to me. I tried to touch as little as possible as this is a regression fix. Do you want me to fix this in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge the regression fix as-is so it's in 1.19.1 tomorrow and make another pr to implement Steven's recommendation.
@@ -134,15 +135,28 @@ func (n *NSQConsumer) Start(ac telegraf.Accumulator) error { | |||
return nil | |||
})) | |||
|
|||
// For backward compatibility | |||
if n.Server != "" { | |||
n.Nsqd = append(n.Nsqd, n.Server) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably change the docs to say append. If the plugin appended before, changing it to prepending might cause incompatibility.
if len(n.Nsqlookupd) == 0 && len(n.Nsqd) == 0 { | ||
return fmt.Errorf("either 'nsqd' or 'nsqlookupd' needs to be specified") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge the regression fix as-is so it's in 1.19.1 tomorrow and make another pr to implement Steven's recommendation.
(cherry picked from commit 80829b3)
* origin/master: (183 commits) fix: CrateDB replace dots in tag keys with underscores (influxdata#9566) feat: Pull metrics from multiple AWS CloudWatch namespaces (influxdata#9386) fix: improve Clickhouse corner cases for empty recordset in aggregation queries, fix dictionaries behavior (influxdata#9401) fix(opcua): clean client on disconnect so that connect works cleanly (influxdata#9583) fix: Refactor ec2 init for config-api (influxdata#9576) fix: sort logs by timestamp before writing to Loki (influxdata#9571) fix: muting tests for udp_listener (influxdata#9578) fix: Do not return on disconnect to avoid breaking reconnect (influxdata#9524) fix: Fixing k8s nodes and pods parsing error (influxdata#9581) feat: OpenTelemetry output plugin (influxdata#9228) feat: Support AWS Web Identity Provider (influxdata#9411) fix: upgraded sensu/go to v2.9.0 (influxdata#9577) fix: Normalize unix socket path (influxdata#9554) docs: fix aws ec2 readme inconsistency (influxdata#9567) feat: Modbus Rtu over tcp enhancement (influxdata#9570) docs: information on new conventional commit format (influxdata#9573) docs: Add logo (influxdata#9574) docs: Adding links to net_irtt and dht_sensor external plugins (influxdata#9569) Upgrade hashicorp/consul/api to 1.9.1 (influxdata#9565) Update vmware/govmomi to v0.26.0 (influxdata#9552) Do not skip good quality nodes after a bad quality node is encountered (influxdata#9550) fix test so it hits a fake service (influxdata#9564) Update changelog Fix procstat plugin README to match sample config (influxdata#9553) Fix metrics reported as written but not actually written (influxdata#9526) Prevent segfault in persistent volume claims (influxdata#9549) Update procstat to support cgroup globs & include systemd unit children (Copy of influxdata#7890) (influxdata#9488) Fix attempt to connect to an empty list of servers. (influxdata#9503) Fix handling bool in sql input plugin (influxdata#9540) Suricata alerts (influxdata#9322) Linter fixes for plugins/inputs/[fg]* (influxdata#9387) For Prometheus Input add ability to query Consul Service catalog (influxdata#5464) Support Landing page on Prometheus landing page (influxdata#8641) [Docs] Clarify tagging behavior (influxdata#9461) Change the timeout from all queries to per query (influxdata#9471) Attach the pod labels to the `kubernetes_pod_volume` & `kubernetes_pod_network` metrics. (influxdata#9438) feat(http_listener_v2): allows multiple paths and add path_tag (influxdata#9529) Bug Fix Snmp empty metric name (influxdata#9519) Worktable workfile stats (influxdata#8587) Update Go to v1.16.6 (influxdata#9542) ...
Required for all PRs:
resolves #9502
As a fallout of the linter-fixes we now check return code when trying to connect to servers specified in
nsqd
. However, if this option is omitted, which is valid when usingnsqlookupd
, the server list stays empty. In turn the attempt to connect to this empty list of servers fails as can be expected. Previously (<v1.19), this error went unnoticed as the returned error was not checked.This PR now checks if the server-list specified in
nsqd
(and for backward-compatibilityserver
) is empty and omits the attempt to connect to this empty list. Furthermore, we check if at least eithernsqd
ornsqlookupd
has entires.