Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DO NOT MERGE [metricbeat][system/process] - Handle non-fatal errors #40298

Draft
wants to merge 27 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
1bfcecb
fix: multierror support
VihasMakwana Jul 17, 2024
21b102b
chore: add degradable error
VihasMakwana Jul 18, 2024
51a7854
chore: update process summary
VihasMakwana Jul 18, 2024
908553d
chore: rename function
VihasMakwana Jul 18, 2024
fd6d312
fix: update go.mod, update uuid and metrics version
VihasMakwana Jul 23, 2024
091fff8
fix: test
VihasMakwana Jul 23, 2024
42101c8
Merge branch 'main' into metricbeat-process-multierr
VihasMakwana Jul 23, 2024
ac01831
update: go.mod
VihasMakwana Jul 23, 2024
246d730
fix: add license, remove uuid5
VihasMakwana Jul 23, 2024
e842010
fix: update notice
VihasMakwana Jul 23, 2024
0293645
fix: remove ioutil
VihasMakwana Jul 23, 2024
add7a45
fix: unit test
VihasMakwana Jul 23, 2024
c1d4aba
fix: add specifc version metric-system
VihasMakwana Jul 23, 2024
9433065
chore: update tests
VihasMakwana Jul 23, 2024
58bc2ff
Merge branch 'main' into metricbeat-process-multierr
VihasMakwana Jul 23, 2024
8940f7d
fix: update notice
VihasMakwana Jul 23, 2024
82dc103
fix: typo
VihasMakwana Jul 23, 2024
2e0bd28
fix: typo
VihasMakwana Jul 23, 2024
806cda4
Merge branch 'main' into metricbeat-process-multierr
VihasMakwana Jul 23, 2024
18d38af
fix: add comments
VihasMakwana Jul 23, 2024
dd35df4
Merge branch 'main' into metricbeat-process-multierr
VihasMakwana Jul 24, 2024
d843237
fix: update comments
VihasMakwana Jul 24, 2024
833bae6
fix: update test case
VihasMakwana Jul 24, 2024
895ce90
fix: update test case comments
VihasMakwana Jul 24, 2024
7a4ef73
fix: mage check
VihasMakwana Jul 24, 2024
6c9338e
fix: add comments
VihasMakwana Jul 25, 2024
f7b5fc0
fix: mage check
VihasMakwana Jul 25, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions NOTICE.txt
Original file line number Diff line number Diff line change
Expand Up @@ -13180,11 +13180,11 @@ Contents of probable licence file $GOMODCACHE/github.com/elastic/elastic-agent-l

--------------------------------------------------------------------------------
Dependency : github.com/elastic/elastic-agent-system-metrics
Version: v0.10.3
Version: v0.10.6-0.20240723152150-0f81a1e67528
Licence type (autodetected): Apache-2.0
--------------------------------------------------------------------------------

Contents of probable licence file $GOMODCACHE/github.com/elastic/elastic-agent-system-metrics@v0.10.3/LICENSE.txt:
Contents of probable licence file $GOMODCACHE/github.com/elastic/elastic-agent-system-metrics@v0.10.6-0.20240723152150-0f81a1e67528/LICENSE.txt:

Apache License
Version 2.0, January 2004
Expand Down
2 changes: 1 addition & 1 deletion go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -194,7 +194,7 @@ require (
github.com/elastic/ebpfevents v0.6.0
github.com/elastic/elastic-agent-autodiscover v0.7.0
github.com/elastic/elastic-agent-libs v0.9.15
github.com/elastic/elastic-agent-system-metrics v0.10.3
github.com/elastic/elastic-agent-system-metrics v0.10.6-0.20240723152150-0f81a1e67528
github.com/elastic/go-elasticsearch/v8 v8.14.0
github.com/elastic/go-sfdc v0.0.0-20240621062639-bcc8456508ff
github.com/elastic/mito v1.15.0
Expand Down
4 changes: 2 additions & 2 deletions go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -551,8 +551,8 @@ github.com/elastic/elastic-agent-client/v7 v7.15.0 h1:nDB7v8TBoNuD6IIzC3z7Q0y+7b
github.com/elastic/elastic-agent-client/v7 v7.15.0/go.mod h1:6h+f9QdIr3GO2ODC0Y8+aEXRwzbA5W4eV4dd/67z7nI=
github.com/elastic/elastic-agent-libs v0.9.15 h1:WCLtuErafUxczT/rXJa4Vr6mxwC8dgtqMbEq+qWGD4M=
github.com/elastic/elastic-agent-libs v0.9.15/go.mod h1:2VgYxHaeM+cCDBjiS2wbmTvzPGbnlXAtYrlcLefheS8=
github.com/elastic/elastic-agent-system-metrics v0.10.3 h1:8pWdj8DeY8PBG/BA0DJalRpJWruDoP5QrIP/YKug5dE=
github.com/elastic/elastic-agent-system-metrics v0.10.3/go.mod h1:3JwPa3zZJjmBYN87xwdLcFpHrUkWpR863jiYdg39sSc=
github.com/elastic/elastic-agent-system-metrics v0.10.6-0.20240723152150-0f81a1e67528 h1:AuBoMilTnJRszbtwZQSTbzuA81TgWz+unsqRR0Brs14=
github.com/elastic/elastic-agent-system-metrics v0.10.6-0.20240723152150-0f81a1e67528/go.mod h1:cd7YgcTEjBNeLGnH/C9cEvP/YexohwS6CpmN9Ju58Mw=
github.com/elastic/elastic-transport-go/v8 v8.6.0 h1:Y2S/FBjx1LlCv5m6pWAF2kDJAHoSjSRSJCApolgfthA=
github.com/elastic/elastic-transport-go/v8 v8.6.0/go.mod h1:YLHer5cj0csTzNFXoNQ8qhtGY1GTvSqPnKWKaqQE3Hk=
github.com/elastic/fsevents v0.0.0-20181029231046-e1d381a4d270 h1:cWPqxlPtir4RoQVCpGSRXmLqjEHpJKbR60rxh1nQZY4=
Expand Down
2 changes: 1 addition & 1 deletion libbeat/cmd/instance/beat.go
Original file line number Diff line number Diff line change
Expand Up @@ -275,7 +275,7 @@ func NewBeat(name, indexPrefix, v string, elasticLicensed bool, initFuncs []func
ID: id,
FirstStart: time.Now(),
StartTime: time.Now(),
EphemeralID: metricreport.EphemeralID(),
EphemeralID: uuid.UUID(metricreport.EphemeralID()),
},
Fields: fields,
}
Expand Down
7 changes: 5 additions & 2 deletions metricbeat/module/system/process/process.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
package process

import (
"errors"
"fmt"
"os"
"runtime"
Expand Down Expand Up @@ -111,7 +112,8 @@ func (m *MetricSet) Fetch(r mb.ReporterV2) error {
// monitor either a single PID, or the configured set of processes.
if m.setpid == 0 {
procs, roots, err := m.stats.Get()
if err != nil {
if err != nil && !errors.Is(err, process.NonFatalErr{}) {
cmacknz marked this conversation as resolved.
Show resolved Hide resolved
// return only if the error is fatal in nature
return fmt.Errorf("process stats: %w", err)
}

Expand All @@ -121,9 +123,10 @@ func (m *MetricSet) Fetch(r mb.ReporterV2) error {
RootFields: roots[evtI],
})
if !isOpen {
return nil
return err
}
}
return err
} else {
proc, root, err := m.stats.GetOneRootEvent(m.setpid)
if err != nil {
Expand Down
9 changes: 7 additions & 2 deletions metricbeat/module/system/process/process_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
package process

import (
"errors"
"os"
"testing"
"time"
Expand All @@ -37,13 +38,17 @@ func TestFetch(t *testing.T) {

f := mbtest.NewReportingMetricSetV2Error(t, getConfig())
events, errs := mbtest.ReportingFetchV2Error(f)
assert.Empty(t, errs)
for _, err := range errs {
assert.True(t, errors.Is(err, process.NonFatalErr{}))
}
assert.NotEmpty(t, events)

time.Sleep(2 * time.Second)

events, errs = mbtest.ReportingFetchV2Error(f)
assert.Empty(t, errs)
for _, err := range errs {
assert.True(t, errors.Is(err, process.NonFatalErr{}))
}
assert.NotEmpty(t, events)

t.Logf("fetched %d events, showing events[0]:", len(events))
Expand Down
16 changes: 9 additions & 7 deletions metricbeat/module/system/process_summary/process_summary.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,9 @@
package process_summary

import (
"errors"
"fmt"
"io/ioutil"
"os"
"runtime"
"strconv"
"strings"
Expand Down Expand Up @@ -68,9 +69,10 @@ func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
// descriptive error must be returned.
func (m *MetricSet) Fetch(r mb.ReporterV2) error {

procList, err := process.ListStates(m.sys)
if err != nil {
return fmt.Errorf("error fetching process list: %w", err)
procList, degradeErr := process.ListStates(m.sys)
VihasMakwana marked this conversation as resolved.
Show resolved Hide resolved
if degradeErr != nil && !errors.Is(degradeErr, process.NonFatalErr{}) {
// return only if the error is fatal in nature
return fmt.Errorf("error fetching process list: %w", degradeErr)
}

procStates := map[string]int{}
Expand All @@ -83,7 +85,7 @@ func (m *MetricSet) Fetch(r mb.ReporterV2) error {
}

outMap := mapstr.M{}
err = typeconv.Convert(&outMap, procStates)
err := typeconv.Convert(&outMap, procStates)
if err != nil {
return fmt.Errorf("error formatting process stats: %w", err)
}
Expand All @@ -101,13 +103,13 @@ func (m *MetricSet) Fetch(r mb.ReporterV2) error {
MetricSetFields: outMap,
})

return nil
return degradeErr
cmacknz marked this conversation as resolved.
Show resolved Hide resolved
}

// threadStats returns a map of state counts for running threads on a system
func threadStats(sys resolve.Resolver) (mapstr.M, error) {
statPath := sys.ResolveHostFS("/proc/stat")
procData, err := ioutil.ReadFile(statPath)
procData, err := os.ReadFile(statPath)
if err != nil {
return nil, fmt.Errorf("error reading procfs file %s: %w", statPath, err)
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
package process_summary

import (
"errors"
"testing"

"github.com/stretchr/testify/assert"
Expand All @@ -46,7 +47,9 @@ func TestFetch(t *testing.T) {
f := mbtest.NewReportingMetricSetV2Error(t, getConfig())
events, errs := mbtest.ReportingFetchV2Error(f)

require.Empty(t, errs)
for _, err := range errs {
assert.True(t, errors.Is(err, process.NonFatalErr{}))
}
require.NotEmpty(t, events)
event := events[0].BeatEvent("system", "process_summary").Fields
t.Logf("%s/%s event: %+v", f.Module().Name(), f.Name(),
Expand All @@ -62,7 +65,9 @@ func TestStateNames(t *testing.T) {
f := mbtest.NewReportingMetricSetV2Error(t, getConfig())
events, errs := mbtest.ReportingFetchV2Error(f)

require.Empty(t, errs)
for _, err := range errs {
assert.True(t, errors.Is(err, process.NonFatalErr{}))
}
require.NotEmpty(t, events)
event := events[0].BeatEvent("system", "process_summary").Fields

Expand Down
39 changes: 39 additions & 0 deletions metricbeat/module/system/test_system.py
Original file line number Diff line number Diff line change
Expand Up @@ -385,9 +385,18 @@ def test_process_summary(self):

output = self.read_output_json()
self.assertGreater(len(output), 0)
only_errors_encountered = True

for evt in output:
self.assert_fields_are_documented(evt)
if evt.get("error", None) is not None:
# Here, we assume that the error is non-fatal and we move forward the test execution.
# If the error is non-fatal, the test should pass with assertions.
# If the error is fatal, the test should fail.
continue

# we've encoutered an event. Turn off the flag
only_errors_encountered = False

summary = evt["system"]["process"]["summary"]
assert isinstance(summary["total"], int)
Expand All @@ -396,6 +405,10 @@ def test_process_summary(self):
assert isinstance(summary["running"], int)
assert isinstance(summary["total"], int)

# If the flag is true, we've only encountered error (fatal errors)
# If the flag is false, we've encoutered events and probably some non-fatal errors.
assert not only_errors_encountered

@unittest.skipUnless(re.match("(?i)win|linux|darwin|freebsd", sys.platform), "os")
def test_process(self):
"""
Expand All @@ -419,7 +432,17 @@ def test_process(self):
self.assertGreater(len(output), 0)

found_cmdline = False
only_errors_encountered = True
for evt in output:
if evt.get("error", None) is not None:
# Here, we assume that the error is non-fatal and we move forward the test execution.
# If the error is non-fatal, the test should pass with assertions.
# If the error is fatal, the test should fail.
continue

# we've encoutered an event. Turn off the flag
only_errors_encountered = False

process = evt["system"]["process"]
# Not all process will have 'cmdline' due to permission issues,
# especially on Windows. Therefore we ensure at least some of
Expand All @@ -442,6 +465,10 @@ def test_process(self):
self.assertTrue(
found_cmdline, "cmdline not found in any process events")

# If the flag is true, we've only encountered error (fatal errors)
# If the flag is false, we've encoutered events and probably some non-fatal errors.
assert not only_errors_encountered

@unittest.skipUnless(re.match("(?i)linux|darwin|freebsd", sys.platform), "os")
def test_process_unix(self):
"""
Expand Down Expand Up @@ -477,7 +504,16 @@ def test_process_unix(self):
found_fd = False
found_env = False
found_cwd = not sys.platform.startswith("linux")
only_errors_encountered = True
for evt in output:
if evt.get("error", None) is not None:
# Here, we assume that the error is non-fatal and we move forward the test execution.
# If the error is non-fatal, the test should pass with assertions.
# If the error is fatal, the test should fail.
continue
# we've encoutered an event. Turn off the flag
only_errors_encountered = False

found_cwd |= "working_directory" in evt["process"]

process = evt["system"]["process"]
Expand All @@ -499,6 +535,9 @@ def test_process_unix(self):
if not sys.platform.startswith("darwin"):
self.assertTrue(found_fd, "fd not found in any process events")

# If the flag is true, we've only encountered error (fatal errors)
# If the flag is false, we've encoutered events and probably some non-fatal errors.
assert not only_errors_encountered
self.assertTrue(found_env, "env not found in any process events")
self.assertTrue(
found_cwd, "working_directory not found in any process events")
Expand Down
Loading