Refactor state_* metricsets to share response from endpoint #25640

Merged
merged 17 commits into from
May 18, 2021
1 change: 1 addition & 0 deletions CHANGELOG.next.asciidoc
@@ -987,6 +987,7 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d
- Apache: convert status.total_kbytes to status.total_bytes in fleet mode. {pull}23022[23022]
- Release MSSQL as GA {pull}23146[23146]
- Add support for SASL/SCRAM authentication to the Kafka module. {pull}24810[24810]
- Refactor state_* metricsets to share response from endpoint. {pull}25640[25640]
- Add server id to zookeeper events. {pull}25550[25550]
- Add additional network metrics to docker/network {pull}25354[25354]

16 changes: 11 additions & 5 deletions metricbeat/helper/prometheus/prometheus.go
@@ -44,6 +44,8 @@ type Prometheus interface {

GetProcessedMetrics(mapping *MetricsMapping) ([]common.MapStr, error)

ProcessMetrics(families []*dto.MetricFamily, mapping *MetricsMapping) ([]common.MapStr, error)

ReportProcessedMetrics(mapping *MetricsMapping, r mb.ReporterV2) error
}

@@ -139,11 +141,7 @@ type MetricsMapping struct {
ExtraFields map[string]string
}

func (p *prometheus) GetProcessedMetrics(mapping *MetricsMapping) ([]common.MapStr, error) {
families, err := p.GetFamilies()
if err != nil {
return nil, err
}
func (p *prometheus) ProcessMetrics(families []*dto.MetricFamily, mapping *MetricsMapping) ([]common.MapStr, error) {

eventsMap := map[string]common.MapStr{}
infoMetrics := []*infoMetricData{}
@@ -260,6 +258,14 @@ func (p *prometheus) GetProcessedMetrics(mapping *MetricsMapping) ([]common.MapS
return events, nil
}

func (p *prometheus) GetProcessedMetrics(mapping *MetricsMapping) ([]common.MapStr, error) {
families, err := p.GetFamilies()
if err != nil {
return nil, err
}
return p.ProcessMetrics(families, mapping)
}

// infoMetricData keeps data about an infoMetric
type infoMetricData struct {
Labels common.MapStr
93 changes: 93 additions & 0 deletions metricbeat/module/kubernetes/kubernetes.go
@@ -0,0 +1,93 @@
// Licensed to Elasticsearch B.V. under one or more contributor
// license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright
// ownership. Elasticsearch B.V. licenses this file to you under
// the Apache License, Version 2.0 (the "License"); you may
// not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

package kubernetes

import (
"fmt"
"sync"
"time"

dto "github.com/prometheus/client_model/go"

p "github.com/elastic/beats/v7/metricbeat/helper/prometheus"
"github.com/elastic/beats/v7/metricbeat/mb"
)

func init() {
// Register the ModuleFactory function for the "kubernetes" module.
if err := mb.Registry.AddModule("kubernetes", ModuleBuilder()); err != nil {
panic(err)
}
}

type Module interface {
mb.Module
GetSharedFamilies(prometheus p.Prometheus) ([]*dto.MetricFamily, error)
}

type familiesCache struct {
sharedFamilies []*dto.MetricFamily
lastFetchErr error
lastFetchTimestamp time.Time
lock sync.Mutex
}

type cacheMap map[string]*familiesCache

type module struct {
mb.BaseModule

fCache cacheMap
}

func ModuleBuilder() func(base mb.BaseModule) (mb.Module, error) {
sharedFamiliesCache := make(cacheMap)
return func(base mb.BaseModule) (mb.Module, error) {
hash := generateCacheHash(base.Config().Hosts)
Review comment (Member): In principle, the hash will always be the same during the life of this module. What do you think about storing it in module{} so that it doesn't need to be recalculated every time? For the same reason, the module could keep a reference to the cache entry directly.

// NOTE: These entries will be never removed, this can be a leak if
// metricbeat is used to monitor clusters dynamically created.
// (https://github.com/elastic/beats/pull/25640#discussion_r633395213)
sharedFamiliesCache[hash] = &familiesCache{}
Review comment (Member): These entries will never be removed, which can be a leak if Metricbeat is used to monitor dynamically created clusters. I guess this is only a corner case, so we can leave it for now.

Review comment (Member Author, @ChrsMark, May 17, 2021): I will leave a comment about this in the code so that we have a good pointer if an issue arises in the future. One thing we could do to tackle this (an off-the-top-of-my-head suggestion) is to add a module-level method that figures out which entries to remove and is called from the metricset's Close().
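
The cleanup idea in the comment above could be sketched with reference counting. This is only one possible approach and is not proposed anywhere in this thread: `refCache`, `refEntry`, `acquire`, `release` and `size` are hypothetical names, and the sketch assumes each metricset calls `acquire` when it is created and `release` from its Close().

```go
package main

import (
	"fmt"
	"sync"
)

// refEntry is a hypothetical stand-in for familiesCache that also counts its users.
type refEntry struct {
	refs int
}

// refCache is a hypothetical mutex-guarded map of reference-counted entries.
type refCache struct {
	mu      sync.Mutex
	entries map[string]*refEntry
}

func newRefCache() *refCache {
	return &refCache{entries: make(map[string]*refEntry)}
}

// acquire registers one more user of key, creating the entry if it is missing.
func (c *refCache) acquire(key string) *refEntry {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if !ok {
		e = &refEntry{}
		c.entries[key] = e
	}
	e.refs++
	return e
}

// release drops one user of key and deletes the entry once nobody uses it.
// A metricset's Close() could call this (an assumption of this sketch).
func (c *refCache) release(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if !ok {
		return
	}
	e.refs--
	if e.refs <= 0 {
		delete(c.entries, key)
	}
}

// size reports how many entries are currently cached.
func (c *refCache) size() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.entries)
}

func main() {
	cache := newRefCache()
	cache.acquire("cluster-a") // first metricset starts
	cache.acquire("cluster-a") // second metricset, same hosts
	cache.release("cluster-a") // first metricset closes
	fmt.Println("entries after one close:", cache.size())
	cache.release("cluster-a") // second metricset closes
	fmt.Println("entries after both close:", cache.size())
}
```

With this shape, an entry lives exactly as long as at least one metricset for the same hosts is running, which would avoid the leak for dynamically created clusters at the cost of extra bookkeeping.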

Review comment (Member): This map is written every time a module is created. As it is now, I see two possible problems:

  • There can be race conditions (and panics) if several metricsets are created at the same time (not sure if this is possible), or if a metricset calls GetSharedFamilies while another metricset with the same hosts is being created (I guess this can happen with bad luck and/or a low metricbeat.max_start_delay).
  • If a metricset is created after another one has already filled the cache, the cache will be reset. This is not a big problem, but it could easily be solved by checking whether the cache entry exists.

I think reads and writes on this map should also be thread safe. Ideally, we should check whether there is already an entry in the cache for a given key before overwriting it here.
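
A minimal sketch of the two fixes suggested above (guarding the map and checking for an existing entry before writing). The names `safeCache`, `entry` and `getOrCreate` are hypothetical and not part of this PR; the entry's fields are omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a hypothetical stand-in for the PR's familiesCache; fields are omitted.
type entry struct {
	mu sync.Mutex
}

// safeCache is a hypothetical wrapper that guards the map with a mutex.
type safeCache struct {
	mu      sync.Mutex
	entries map[string]*entry
}

func newSafeCache() *safeCache {
	return &safeCache{entries: make(map[string]*entry)}
}

// getOrCreate returns the existing entry for key and creates one only if it
// is absent, so a later module creation does not reset a filled cache.
func (c *safeCache) getOrCreate(key string) *entry {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[key]; ok {
		return e
	}
	e := &entry{}
	c.entries[key] = e
	return e
}

func main() {
	cache := newSafeCache()
	results := make([]*entry, 8)

	// Simulate several modules with the same hosts being created concurrently.
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = cache.getOrCreate("[kube-state-metrics:8080]")
		}(i)
	}
	wg.Wait()

	same := true
	for _, e := range results {
		if e != results[0] {
			same = false
		}
	}
	fmt.Println("all goroutines share one entry:", same)
}
```

Because the module builder would call `getOrCreate` once per module, the module could also keep the returned pointer and skip both the hash and the map lookup on every fetch.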

m := module{
BaseModule: base,
fCache: sharedFamiliesCache,
}
return &m, nil
}
}

func (m *module) GetSharedFamilies(prometheus p.Prometheus) ([]*dto.MetricFamily, error) {
now := time.Now()
Review comment (Member): Should this be done after getting the lock? If not, and a call to GetFamilies takes longer than the period, all waiting metricsets will request the families again instead of reusing the ones just received, and the waiting metricsets that end up requesting the families again will store an "old" timestamp.
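
One way to read the suggestion above, as a self-contained sketch: read the clock only once the lock is held, and stamp the cache when the response arrives. `sharedResponse` and `get` are hypothetical names, a string stands in for the metric families, and stamping after the fetch is an extra assumption of this sketch rather than something the comment asks for.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// sharedResponse is a hypothetical stand-in for familiesCache, holding a
// string instead of []*dto.MetricFamily to keep the sketch self-contained.
type sharedResponse struct {
	mu        sync.Mutex
	value     string
	err       error
	fetchedAt time.Time
}

// get returns the cached value and calls fetch only when the cache is empty
// or older than period.
func (s *sharedResponse) get(period time.Duration, fetch func() (string, error)) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	// The clock is read after the lock is acquired, so a caller that waited
	// on the lock compares against the current time, not the time it queued.
	now := time.Now()
	if s.fetchedAt.IsZero() || now.Sub(s.fetchedAt) > period {
		s.value, s.err = fetch()
		// Assumption of this sketch: stamp when the response arrives, so a
		// slow fetch does not make the fresh data look stale.
		s.fetchedAt = time.Now()
	}
	return s.value, s.err
}

func main() {
	var shared sharedResponse
	calls := 0
	fetch := func() (string, error) {
		calls++
		return "families", nil
	}

	// Three metricsets asking within the same period should share one fetch.
	for i := 0; i < 3; i++ {
		if _, err := shared.get(time.Minute, fetch); err != nil {
			fmt.Println("fetch failed:", err)
		}
	}
	fmt.Println("fetch calls:", calls)
}
```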


hash := generateCacheHash(m.Config().Hosts)
fCache := m.fCache[hash]

fCache.lock.Lock()
defer fCache.lock.Unlock()

if fCache.lastFetchTimestamp.IsZero() || now.Sub(fCache.lastFetchTimestamp) > m.Config().Period {
fCache.sharedFamilies, fCache.lastFetchErr = prometheus.GetFamilies()
fCache.lastFetchTimestamp = now
}

return fCache.sharedFamilies, fCache.lastFetchErr
}

func generateCacheHash(host []string) string {
return fmt.Sprintf("%s", host)
Review comment (Member): Consider using something like https://github.com/mitchellh/hashstructure for hashing.

}
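
For comparison with the hashing suggestion above, here is a standard-library sketch rather than the hashstructure library the reviewer mentions. `cacheKey` is a hypothetical name. Two properties are assumptions of this sketch, not requirements stated in the PR: the key ignores host order, and each host is length-prefixed so that lists such as `["a b"]` and `["a", "b"]`, which both print as `[a b]` with `fmt.Sprintf("%s", host)`, get different keys.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// cacheKey derives a cache key from a host list (hypothetical helper).
func cacheKey(hosts []string) string {
	// Sort a copy so the caller's slice is not modified and host order
	// does not change the key (an assumption of this sketch).
	sorted := append([]string(nil), hosts...)
	sort.Strings(sorted)

	h := sha256.New()
	for _, host := range sorted {
		// Length-prefix each host so element boundaries are unambiguous.
		fmt.Fprintf(h, "%d:%s;", len(host), host)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := cacheKey([]string{"kube-state-metrics:8080", "other:8080"})
	b := cacheKey([]string{"other:8080", "kube-state-metrics:8080"})
	fmt.Println("same key regardless of order:", a == b)
}
```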
14 changes: 13 additions & 1 deletion metricbeat/module/kubernetes/state_container/state_container.go
@@ -18,6 +18,7 @@
package state_container

import (
"fmt"
"strings"

"github.com/pkg/errors"
@@ -26,6 +27,7 @@ import (
p "github.com/elastic/beats/v7/metricbeat/helper/prometheus"
"github.com/elastic/beats/v7/metricbeat/mb"
"github.com/elastic/beats/v7/metricbeat/mb/parse"
k8smod "github.com/elastic/beats/v7/metricbeat/module/kubernetes"
"github.com/elastic/beats/v7/metricbeat/module/kubernetes/util"
)

@@ -89,6 +91,7 @@ type MetricSet struct {
mb.BaseMetricSet
prometheus p.Prometheus
enricher util.Enricher
mod k8smod.Module
}

// New create a new instance of the MetricSet
@@ -99,10 +102,15 @@ func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
if err != nil {
return nil, err
}
mod, ok := base.Module().(k8smod.Module)
if !ok {
return nil, fmt.Errorf("must be child of kubernetes module")
}
return &MetricSet{
BaseMetricSet: base,
prometheus: prometheus,
enricher: util.NewContainerMetadataEnricher(base, false),
mod: mod,
}, nil
}

@@ -112,7 +120,11 @@ func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
func (m *MetricSet) Fetch(reporter mb.ReporterV2) error {
m.enricher.Start()

events, err := m.prometheus.GetProcessedMetrics(mapping)
families, err := m.mod.GetSharedFamilies(m.prometheus)
if err != nil {
return errors.Wrap(err, "error getting families")
}
events, err := m.prometheus.ProcessMetrics(families, mapping)
if err != nil {
return errors.Wrap(err, "error getting event")
}
16 changes: 15 additions & 1 deletion metricbeat/module/kubernetes/state_cronjob/state_cronjob.go
@@ -18,11 +18,14 @@
package state_cronjob

import (
"fmt"

"github.com/pkg/errors"

"github.com/elastic/beats/v7/libbeat/common"
p "github.com/elastic/beats/v7/metricbeat/helper/prometheus"
"github.com/elastic/beats/v7/metricbeat/mb"
k8smod "github.com/elastic/beats/v7/metricbeat/module/kubernetes"
)

func init() {
@@ -40,6 +43,7 @@ type CronJobMetricSet struct {
mb.BaseMetricSet
prometheus p.Prometheus
mapping *p.MetricsMapping
mod k8smod.Module
}

// NewCronJobMetricSet returns a prometheus based metricset for CronJobs
@@ -49,9 +53,15 @@ func NewCronJobMetricSet(base mb.BaseMetricSet) (mb.MetricSet, error) {
return nil, err
}

mod, ok := base.Module().(k8smod.Module)
if !ok {
return nil, fmt.Errorf("must be child of kubernetes module")
}

return &CronJobMetricSet{
BaseMetricSet: base,
prometheus: prometheus,
mod: mod,
mapping: &p.MetricsMapping{
Metrics: map[string]p.MetricMap{
"kube_cronjob_info": p.InfoMetric(),
@@ -77,7 +87,11 @@ func NewCronJobMetricSet(base mb.BaseMetricSet) (mb.MetricSet, error) {
//
// Copied from other kube state metrics.
func (m *CronJobMetricSet) Fetch(reporter mb.ReporterV2) error {
events, err := m.prometheus.GetProcessedMetrics(m.mapping)
families, err := m.mod.GetSharedFamilies(m.prometheus)
if err != nil {
return errors.Wrap(err, "error getting family metrics")
}
events, err := m.prometheus.ProcessMetrics(families, m.mapping)
if err != nil {
return errors.Wrap(err, "error getting metrics")
}
17 changes: 16 additions & 1 deletion metricbeat/module/kubernetes/state_daemonset/state_daemonset.go
@@ -18,11 +18,14 @@
package state_daemonset

import (
"fmt"

"github.com/elastic/beats/v7/libbeat/common"
"github.com/elastic/beats/v7/libbeat/common/kubernetes"
p "github.com/elastic/beats/v7/metricbeat/helper/prometheus"
"github.com/elastic/beats/v7/metricbeat/mb"
"github.com/elastic/beats/v7/metricbeat/mb/parse"
k8smod "github.com/elastic/beats/v7/metricbeat/module/kubernetes"
"github.com/elastic/beats/v7/metricbeat/module/kubernetes/util"
)

@@ -69,6 +72,7 @@ type MetricSet struct {
mb.BaseMetricSet
prometheus p.Prometheus
enricher util.Enricher
mod k8smod.Module
}

// New create a new instance of the MetricSet
@@ -79,10 +83,15 @@ func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
if err != nil {
return nil, err
}
mod, ok := base.Module().(k8smod.Module)
if !ok {
return nil, fmt.Errorf("must be child of kubernetes module")
}
return &MetricSet{
BaseMetricSet: base,
prometheus: prometheus,
enricher: util.NewResourceMetadataEnricher(base, &kubernetes.ReplicaSet{}, false),
mod: mod,
}, nil
}

@@ -92,7 +101,13 @@ func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
func (m *MetricSet) Fetch(reporter mb.ReporterV2) {
m.enricher.Start()

events, err := m.prometheus.GetProcessedMetrics(mapping)
families, err := m.mod.GetSharedFamilies(m.prometheus)
if err != nil {
m.Logger().Error(err)
reporter.Error(err)
return
}
events, err := m.prometheus.ProcessMetrics(families, mapping)
if err != nil {
m.Logger().Error(err)
reporter.Error(err)
@@ -18,11 +18,14 @@
package state_deployment

import (
"fmt"

"github.com/elastic/beats/v7/libbeat/common"
"github.com/elastic/beats/v7/libbeat/common/kubernetes"
p "github.com/elastic/beats/v7/metricbeat/helper/prometheus"
"github.com/elastic/beats/v7/metricbeat/mb"
"github.com/elastic/beats/v7/metricbeat/mb/parse"
k8smod "github.com/elastic/beats/v7/metricbeat/module/kubernetes"
"github.com/elastic/beats/v7/metricbeat/module/kubernetes/util"
)

@@ -70,6 +73,7 @@ type MetricSet struct {
mb.BaseMetricSet
prometheus p.Prometheus
enricher util.Enricher
mod k8smod.Module
}

// New create a new instance of the MetricSet
@@ -80,10 +84,15 @@ func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
if err != nil {
return nil, err
}
mod, ok := base.Module().(k8smod.Module)
if !ok {
return nil, fmt.Errorf("must be child of kubernetes module")
}
return &MetricSet{
BaseMetricSet: base,
prometheus: prometheus,
enricher: util.NewResourceMetadataEnricher(base, &kubernetes.Deployment{}, false),
mod: mod,
}, nil
}

@@ -93,7 +102,13 @@ func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
func (m *MetricSet) Fetch(reporter mb.ReporterV2) {
m.enricher.Start()

events, err := m.prometheus.GetProcessedMetrics(mapping)
families, err := m.mod.GetSharedFamilies(m.prometheus)
if err != nil {
m.Logger().Error(err)
reporter.Error(err)
return
}
events, err := m.prometheus.ProcessMetrics(families, mapping)
if err != nil {
m.Logger().Error(err)
reporter.Error(err)