Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move HTTP calls to Logstash from New() to Fetch() #15306

Merged
merged 5 commits into from
Jan 3, 2020

Conversation

ycombinator
Copy link
Contributor

@ycombinator ycombinator commented Jan 2, 2020

Resolves #15276.

This PR makes the logstash/node and logstash/node_stats metricsets resilient to Logstash's unavailability.

Before this PR, if Logstash was not already running when Metricbeat was started up with the logstash-xpack module enabled, Metricbeat would immediately exit with an error since Logstash was unreachable.

With this PR, Metricbeat will keep running and retrying to reach Logstash periodically. This also allows Logstash to go away temporarily (say, for an upgrade) and keeps Metricbeat running.

Testing this PR

  1. Build Metricbeat with this PR.

    cd $GOPATH/src/github.com/elastic/beats/metricbeat
    mage build
    
  2. Enable the logstash-xpack module.

    ./metricbeat modules enable logstash-xpack
    
  3. Start up Metricbeat (without starting up Logstash).

    ./metricbeat -e
    
  4. Note that there are errors in the Metricbeat log about it not being able to connect to Logstash's APIs. But also ensure that Metricbeat keeps running.

  5. Start up Logstash.

  6. Check the Metricbeat log again. Ensure that eventually (< 1 minute), the connection errors go away.

  7. Ensure that a .monitoring-logstash-*-mb-* index for today's date exists. Ensure that it contains recent (timestamp within last 20 seconds) documents of type:logstash_stats. Ensure that it contains exactly one document per Logstash pipeline of type:logstash_state (a new document is created per pipeline, per Logstash node (re)start).

    POST .monitoring-logstash-*/_search
    {
      "size": 0,
      "aggs": {
        "by_type": {
          "terms": {
            "field": "type",
            "size": 10
          },
          "aggs": {
            "latest_timestamp": {
              "max": {
                "field": "timestamp"
              }
            }
          }
        }
      }
    }
    
  8. Stop Logstash.

  9. Repeat steps 4-7.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/stack-monitoring (Stack monitoring)

@ycombinator ycombinator added v7.5.2 needs_backport PR is waiting to be backported to other branches. labels Jan 2, 2020

m.HTTP.SetURI(m.HTTP.GetURI() + "?vertices=true")
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ycombinator The two init function looks really close to me could it be possible to move them to the logstash package and use the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ph I extracted much of the common code into the logstash package. Please take a look at let me know what you think. Thanks!

@ph
Copy link
Contributor

ph commented Jan 3, 2020

LGTM

@ycombinator
Copy link
Contributor Author

CI failures are unrelated. Merging.

@ycombinator ycombinator merged commit b05dd84 into elastic:master Jan 3, 2020
@ycombinator ycombinator deleted the mb-ls-resiliency branch January 3, 2020 15:23
ycombinator added a commit that referenced this pull request Jan 4, 2020
* node_stats: move HTTP calls from New() to Fetch()

* node: move HTTP calls from New() to Fetch()

* Adding CHANGELOG entry

* Refactoring: extract common code into logstash package

* Unexporting getVersion
ycombinator added a commit that referenced this pull request Jan 6, 2020
* node_stats: move HTTP calls from New() to Fetch()

* node: move HTTP calls from New() to Fetch()

* Adding CHANGELOG entry

* Refactoring: extract common code into logstash package

* Unexporting getVersion
@ycombinator ycombinator added test-plan Add this PR to be manual test plan and removed needs_backport PR is waiting to be backported to other branches. labels Jan 9, 2020
@mtojek
Copy link
Contributor

mtojek commented Jan 21, 2020

Manual testing passed (BC1).

Metricbeat survived taking down logstash. Noticed expected errors while restarting:

2020-01-21T11:47:15.603+0100	ERROR	[logstash.node]	node/node.go:79	could not fetch node pipelines: error making http request: Get http://localhost:9600/_node/pipelines?graph=true: dial tcp [::1]:9600: connect: connection refused
2020-01-21T11:47:15.608+0100	ERROR	[logstash.node_stats]	node_stats/node_stats.go:72	error making http request: Get http://localhost:9600/: dial tcp [::1]:9600: connect: connection refused
2020-01-21T11:47:35.606+0100	ERROR	[logstash.node]	node/node.go:79	could not fetch node pipelines: error making http request: Get http://localhost:9600/_node/pipelines?graph=true: EOF
2020-01-21T11:47:35.608+0100	ERROR	[logstash.node_stats]	node_stats/node_stats.go:72	error making http request: Get http://localhost:9600/: EOF
2020-01-21T11:47:45.607+0100	ERROR	[logstash.node]	node/node.go:79	could not fetch node pipelines: error making http request: Get http://localhost:9600/_node/pipelines?graph=true: EOF
2020-01-21T11:47:45.610+0100	ERROR	[logstash.node_stats]	node_stats/node_stats.go:72	error making http request: Get http://localhost:9600/: EOF

@mtojek mtojek added the test-plan-ok This PR passed manual testing label Jan 21, 2020
@andresrc andresrc added the Team:Integrations Label for the Integrations team label Mar 6, 2020
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…astic#15323)

* node_stats: move HTTP calls from New() to Fetch()

* node: move HTTP calls from New() to Fetch()

* Adding CHANGELOG entry

* Refactoring: extract common code into logstash package

* Unexporting getVersion
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Feature:Stack Monitoring Metricbeat Metricbeat module Team:Integrations Label for the Integrations team test-plan Add this PR to be manual test plan test-plan-ok This PR passed manual testing v7.5.2 v7.6.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Metricbeat] When Logstash module is enable and we fail to reach it, metricbeat refuses to starts
6 participants