Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix nagios parser to parse all perfdata and report info message. #5601

Merged
merged 6 commits into from
Mar 25, 2019

Conversation

sergeykhegay
Copy link
Contributor

@sergeykhegay sergeykhegay commented Mar 18, 2019

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated. Not needed.
  • Has appropriate unit tests.

Example Nagios check output:

DISK OK - free space: / 3326 MB (56%); | /=2643MB;5948;5958;0;5968
/ 15272 MB (77%);
/boot 68 MB (69%);
/home 69357 MB (27%);
/var/log 819 MB (84%); | /boot=68MB;88;93;0;98
/home=69357MB;253404;253409;0;253414
/var/log=818MB;970;975;0;980

Old behavior:

nagios_state,host=localhost state=0i 1552677172000000000
nagios,host=localhost,perfdata=/,unit=MB critical_gt=5958,critical_lt=0,max=5968,min=0,value=2643,warning_gt=5948,warning_lt=0 1552677172000000000
nagios,host=localhost,perfdata=/boot,unit=MB critical_gt=93,critical_lt=0,max=98,min=0,value=68,warning_gt=88,warning_lt=0 1552677172000000000

New behavior:

nagios,host=localhost,perfdata=/,unit=MB critical_gt=5958,critical_lt=0,max=5968,min=0,value=2643,warning_gt=5948,warning_lt=0 1552939211000000000
nagios,host=localhost,perfdata=/boot,unit=MB critical_gt=93,critical_lt=0,max=98,min=0,value=68,warning_gt=88,warning_lt=0 1552939211000000000
nagios,host=localhost,perfdata=/home,unit=MB critical_gt=253409,critical_lt=0,max=253414,min=0,value=69357,warning_gt=253404,warning_lt=0 1552939211000000000
nagios,host=localhost,perfdata=/var/log,unit=MB critical_gt=975,critical_lt=0,max=980,min=0,value=818,warning_gt=970,warning_lt=0 1552939211000000000
nagios_state,host=localhost longmsg="/ 15272 MB (77%); /boot 68 MB (69%); /home 69357 MB (27%); /var/log 819 MB (84%);",msg="DISK OK - free space: / 3326 MB (56%);",state=0i 1552939211000000000

Changes:

  • Lines for perfdata in {/home, /var/log} added.
  • Info message and long info message are reported for the nagios_state.

Fixes #5591.

@sergeykhegay
Copy link
Contributor Author

@glinton, could you review please?

@sergeykhegay
Copy link
Contributor Author

Hi @danielnelson , @glinton is there any update on this? Is it being reviewed or any ETA? Regards, Sergey.

@danielnelson
Copy link
Contributor

I'll try to do a review later today

if err != nil {
return nil, fmt.Errorf("exec: get exit code: %s", err)
}
nagios.AppendExitCode(&out, exitCode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I like injecting the return code into the parser data, is there a reason this needs to be changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the state, service_output, long_service_output semantically belong together and should be a part of the same measurement.

The current parser interface doesn't allow to pass any extra information to the parser, whilst the service_output and the long_service_output are known during the parsing time only.

The parser will properly parse bytes for which the nagios.AppendExitCode(...) wasn't called, it will assume the default state to be 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not my primary concern but wouldn't the state be set incorrectly depending on how the output ends? Perhaps we could add the state to the telegraf.Metric after it has been returned from the parser?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Because we encode the state in the exec plugin, no matter how the output ends, the parser will always properly decode the state and strip those additional bytes.

    There might be an issue, when the parser is used somewhere else. And it is my bad, because I did not properly document the behavior of the Parse method. I will add the docs, but first, could you please comment on the next point:

  • Appending the error code after we return from the parser is definitely possible. But will require me to do something like this:

// To return the exit code through the error handling mechanism.
//
type exitCode int

func (c exitCode) Error() string {}

// The callee changes
//
func (c CommandRunner) Run(
	e *Exec,
	command string,
	acc telegraf.Accumulator,
) ([]byte, error) {
	<...irrelevant...>

	out = <output from the check>

	if isNagiosParser {
		exitCode, err := nagios.GetExitCode(nagiosErr)
		if err != nil {
			return nil, fmt.Errorf("exec: get exit code: %s", err)
		}
		return out.Bytes(), exitCode(exitCode)
	}

	return out.Bytes(), nil
}

// The caller changes
//
func (e *Exec) ProcessCommand(command string, acc telegraf.Accumulator, wg *sync.WaitGroup) {
	defer wg.Done()

	out, err := e.runner.Run(e, command, acc)
        var exitCode int
	if err != nil {
		if v, ok := err.(exitCode); ok {
			exitCode = int(v)
		} else {
			acc.AddError(err)
			return
		}
	}
      

	metrics, err := e.parser.Parse(out)
	if err != nil {
		acc.AddError(err)
		return
	}
	
        if _, ok := e.parser.(*nagios.NagiosParser); ok {
                // Find the metric with name == "nagios_state" and append the "state" field to it.
                // Can be a convention that it is the last metric, or iterate through all and find it.
               m := <somehow find the metric>
               m.AddField("state", exitCode)
        }
        
        for _, metric := range metrics {
		acc.AddFields(metric.Name(), metric.Fields(), metric.Tags(), metric.Time())
	}
}
  • I am relying on your expertise here. Which approach seem to be better for you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielnelson, what is your take on this ^^?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be sure, neither solution is great, and if I could do it over then I would probably make nagios it's own plugin separate from exec and scrap the standalone parser.

Putting that aside though, since I don't think it is worth a breaking change... I would parse, then if this is the NagiosParser, get the error code and iterate over the metrics looking for the nagios_state metric using metric.Name(). You will also want to watch for the case where the state metric is not found, and add the field as a new metric.

One more thing, could you do me a favor and use this form for adding the metrics?

        for _, metric := range metrics {
		acc.AddMetric(metric)
	}

plugins/parsers/nagios/parser_test.go Outdated Show resolved Hide resolved
@danielnelson danielnelson added this to the 1.11.0 milestone Mar 21, 2019
@danielnelson danielnelson added the feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin label Mar 21, 2019
@sergeykhegay
Copy link
Contributor Author

Hi @danielnelson, commented and made some changes. Do you think it is good to go now?

@sergeykhegay
Copy link
Contributor Author

Hi @danielnelson, addressed your comments.

One of the test builds fails though, it seems to be something finicky with the require library. Have you seen something similar before?

There is a new version of it, I'll check if they have related changes there.

@danielnelson
Copy link
Contributor

I think the tests fails because the ordering in a map in Go is unspecified, so they can't be safely compared, try using the helper functions in testutil/metric.go.

@sergeykhegay
Copy link
Contributor Author

Fixed the tests. But this is very weird. Golang's reflect.DeepEqual should have handled this correctly. And the require library seems to use it (just skimmed the code).

Please let me know what you think of the changes, @danielnelson.

@danielnelson danielnelson merged commit 60027cf into influxdata:master Mar 25, 2019
@danielnelson
Copy link
Contributor

Looks great, thanks!

dupondje pushed a commit to dupondje/telegraf that referenced this pull request Apr 22, 2019
hwaastad pushed a commit to hwaastad/telegraf that referenced this pull request Jun 13, 2019
@sergeykhegay sergeykhegay deleted the fix-nagios branch June 19, 2019 18:06
bitcharmer pushed a commit to bitcharmer/telegraf that referenced this pull request Oct 18, 2019
athoune pushed a commit to bearstech/telegraf that referenced this pull request Apr 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants