Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix nsq_consumer regression by avoiding connection to an empty list of servers #9503

Merged
merged 1 commit into from
Jul 27, 2021

Conversation

srebhan
Copy link
Member

@srebhan srebhan commented Jul 14, 2021

Required for all PRs:

  • Updated associated README.md.
  • Wrote appropriate unit tests.

resolves #9502

As a fallout of the linter-fixes we now check return code when trying to connect to servers specified in nsqd. However, if this option is omitted, which is valid when using nsqlookupd, the server list stays empty. In turn the attempt to connect to this empty list of servers fails as can be expected. Previously (<v1.19), this error went unnoticed as the returned error was not checked.

This PR now checks if the server-list specified in nsqd (and for backward-compatibility server) is empty and omits the attempt to connect to this empty list. Furthermore, we check if at least either nsqd or nsqlookupd has entires.

@telegraf-tiger telegraf-tiger bot added the fix pr to fix corresponding bug label Jul 14, 2021
@srebhan srebhan added plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins regression something that used to work, but is now broken labels Jul 14, 2021
Copy link
Contributor

@sspaink sspaink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this!

@sspaink sspaink added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Jul 14, 2021
Copy link
Contributor

@ssoroka ssoroka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a great change, just have a few minor comments

@@ -134,15 +135,28 @@ func (n *NSQConsumer) Start(ac telegraf.Accumulator) error {
return nil
}))

// For backward compatibility
if n.Server != "" {
n.Nsqd = append(n.Nsqd, n.Server)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

documentation says we prepend it to the list, but we're appending it here. Could be a concern if the server connection list order matters?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all thanks a lot for the review!
Regarding the order, I just used what was done before in this plugin (previous line 143). I can change this, should I?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably change the docs to say append. If the plugin appended before, changing it to prepending might cause incompatibility.

Comment on lines +144 to +146
if len(n.Nsqlookupd) == 0 && len(n.Nsqd) == 0 {
return fmt.Errorf("either 'nsqd' or 'nsqlookupd' needs to be specified")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about moving the server len checks to an func (n *NSQConsumer) Init() error function? This way it can error on startup that the configuration is invalid, rather than at connect time.
That might also help with another potential issue; if the Start function is called more than once (say connection failure), the above code will keep appending to the n.Nsqd slice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your point is perfectly valid and makes sense to me. I tried to touch as little as possible as this is a regression fix. Do you want me to fix this in this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge the regression fix as-is so it's in 1.19.1 tomorrow and make another pr to implement Steven's recommendation.

@MyaLongmire MyaLongmire added the waiting for response waiting for response from contributor label Jul 22, 2021
@@ -134,15 +135,28 @@ func (n *NSQConsumer) Start(ac telegraf.Accumulator) error {
return nil
}))

// For backward compatibility
if n.Server != "" {
n.Nsqd = append(n.Nsqd, n.Server)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably change the docs to say append. If the plugin appended before, changing it to prepending might cause incompatibility.

Comment on lines +144 to +146
if len(n.Nsqlookupd) == 0 && len(n.Nsqd) == 0 {
return fmt.Errorf("either 'nsqd' or 'nsqlookupd' needs to be specified")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge the regression fix as-is so it's in 1.19.1 tomorrow and make another pr to implement Steven's recommendation.

@reimda reimda merged commit 80829b3 into influxdata:master Jul 27, 2021
@srebhan srebhan deleted the fix_9502 branch July 28, 2021 08:23
reimda pushed a commit that referenced this pull request Jul 28, 2021
phemmer added a commit to phemmer/telegraf that referenced this pull request Aug 13, 2021
* origin/master: (183 commits)
  fix: CrateDB replace dots in tag keys with underscores (influxdata#9566)
  feat: Pull metrics from multiple AWS CloudWatch namespaces (influxdata#9386)
  fix: improve Clickhouse corner cases for empty recordset in aggregation queries, fix dictionaries behavior (influxdata#9401)
  fix(opcua): clean client on disconnect so that connect works cleanly (influxdata#9583)
  fix: Refactor ec2 init for config-api (influxdata#9576)
  fix: sort logs by timestamp before writing to Loki (influxdata#9571)
  fix: muting tests for udp_listener (influxdata#9578)
  fix: Do not return on disconnect to avoid breaking reconnect (influxdata#9524)
  fix: Fixing k8s nodes and pods parsing error (influxdata#9581)
  feat: OpenTelemetry output plugin (influxdata#9228)
  feat: Support AWS Web Identity Provider (influxdata#9411)
  fix: upgraded sensu/go to v2.9.0 (influxdata#9577)
  fix: Normalize unix socket path (influxdata#9554)
  docs: fix aws ec2 readme inconsistency (influxdata#9567)
  feat: Modbus Rtu over tcp enhancement (influxdata#9570)
  docs: information on new conventional commit format (influxdata#9573)
  docs: Add logo (influxdata#9574)
  docs: Adding links to net_irtt and dht_sensor external plugins (influxdata#9569)
  Upgrade hashicorp/consul/api to 1.9.1 (influxdata#9565)
  Update vmware/govmomi to v0.26.0 (influxdata#9552)
  Do not skip good quality nodes after a bad quality node is encountered (influxdata#9550)
  fix test so it hits a fake service (influxdata#9564)
  Update changelog
  Fix procstat plugin README to match sample config (influxdata#9553)
  Fix metrics reported as written but not actually written  (influxdata#9526)
  Prevent segfault in persistent volume claims (influxdata#9549)
  Update procstat to support cgroup globs & include systemd unit children (Copy of influxdata#7890) (influxdata#9488)
  Fix attempt to connect to an empty list of servers. (influxdata#9503)
  Fix handling bool in sql input plugin (influxdata#9540)
  Suricata alerts (influxdata#9322)
  Linter fixes for plugins/inputs/[fg]* (influxdata#9387)
  For Prometheus Input add ability to query Consul Service catalog (influxdata#5464)
  Support Landing page on Prometheus landing page (influxdata#8641)
  [Docs] Clarify tagging behavior (influxdata#9461)
  Change the timeout from all queries to per query (influxdata#9471)
  Attach the pod labels to the `kubernetes_pod_volume` & `kubernetes_pod_network` metrics. (influxdata#9438)
  feat(http_listener_v2): allows multiple paths and add path_tag (influxdata#9529)
  Bug Fix Snmp empty metric name (influxdata#9519)
  Worktable workfile stats (influxdata#8587)
  Update Go to v1.16.6 (influxdata#9542)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix pr to fix corresponding bug plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. regression something that used to work, but is now broken waiting for response waiting for response from contributor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

inputs.nsq_consumer fails after upgrading to 1.19.x
5 participants