Adding xpack code for ES cluster stats metricset #7810

ycombinator · 2018-07-30T22:27:24Z

This PR teaches the elasticsearch/cluster_stats metricset to query the appropriate Elasticsearch HTTP APIs and index cluster_stats documents into .monitoring-es-6-mb-* indices. These documents should be structurally compatible with the cluster_stats documents that the internal monitoring method currently indexes into .monitoring-es-6-* indices.

houndci-bot · 2018-07-30T22:27:36Z

metricbeat/module/elasticsearch/cluster_stats/data_xpack.go

+// specific language governing permissions and limitations
+// under the License.
+
+package cluster_stats


don't use an underscore in package name

ycombinator · 2018-08-15T00:20:15Z

@ruflin There are still some TODOs in this PR for me to work on but, if you have some time, I'd like your early opinion on how I'm going about this metricset (for x-pack monitoring). Of course, I'd love your opinion on any part of the PR but the areas I'm specifically concerned about right now are:

- the number of Elasticsearch API calls we have to make per Fetch() cycle - I think these are okay for now to make forward progress, but longer-term we'll want to reduce the number of calls if possible, maybe by creating "super" APIs on the ES side or something, and
- the passthru fields (search for passthru in the PR code) - do you see a better way to handle these?

ruflin

Left a few minor comments / questions. Definitely heading in the right direction. I think it's a good example of how the reporting should potentially look different in the cluster_stats metricset. There are things like usage that we could have a separate metricset for and only report every minute, for example.

ruflin · 2018-08-15T11:29:21Z

metricbeat/module/elasticsearch/cluster_stats/data_xpack.go

+		return err
+	}
+
+	dataMS := common.MapStr(data)


Perhaps add a type assertion even though this should never fail I think.

How should I do this?

I tried changing this line to:

dataMS, ok := data.(common.MapStr)

but that gives me the compile-time error:

invalid type assertion: data.(common.MapStr) (non-interface type map[string]interface {} on left)

Ignore my comment, above is good.
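For readers following this thread, the compile error comes from the difference between a conversion and a type assertion in Go. The sketch below is only illustrative: `MapStr` is a local stand-in that assumes `common.MapStr` is a named type over `map[string]interface{}`, and `toMapStr` and `assertMap` are hypothetical helper names, not functions from the PR.

```go
package main

import "fmt"

// MapStr is a local stand-in for common.MapStr (assumption: a named map
// type over map[string]interface{}).
type MapStr map[string]interface{}

// toMapStr converts a value whose static type is already the concrete map.
// A conversion involves no runtime check, so it cannot fail.
func toMapStr(data map[string]interface{}) MapStr {
	return MapStr(data)
}

// assertMap covers the case where the static type is interface{}. Only an
// interface value can be type-asserted; ok reports whether it held a map.
func assertMap(v interface{}) (MapStr, bool) {
	m, ok := v.(map[string]interface{})
	if !ok {
		return nil, false
	}
	return MapStr(m), true
}

func main() {
	data := map[string]interface{}{"status": "green"}
	fmt.Println(toMapStr(data)["status"])

	// After decoding JSON into interface{}, an assertion is the right tool.
	var decoded interface{} = data
	if m, ok := assertMap(decoded); ok {
		fmt.Println(m["status"])
	}

	_, ok := assertMap("not a map")
	fmt.Println(ok)
}
```

Since `data` in the diff already has the concrete type `map[string]interface{}`, the plain conversion is the appropriate form there.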

ruflin · 2018-08-15T11:30:20Z

metricbeat/module/elasticsearch/cluster_stats/data_xpack.go

+	passthruFields := []string{
+		"indices.segments.file_sizes", // object with dynamic keys
+		"nodes.versions", // array of strings
+		"nodes.os.names", // array of objects


Is it a list of names : ["a", "b"] or nested objects [{"a":"b"}, {"c":"d"}]. Just curious.

It is a list of objects, for example:

[ { "name": "Mac OS X", "count": 1 } ]
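To make the passthru idea concrete, here is a minimal sketch of copying selected dotted-key fields from an API response into the outgoing document without converting them. It assumes plain nested maps; `getPath`, `putPath`, and `passthru` are hypothetical helpers written for this illustration and are not the code in the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// getPath walks a nested map along a dotted key such as "nodes.os.names".
func getPath(m map[string]interface{}, key string) (interface{}, bool) {
	var cur interface{} = m
	for _, p := range strings.Split(key, ".") {
		node, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false
		}
		cur, ok = node[p]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// putPath stores v under a dotted key, creating intermediate maps as needed.
// An existing non-map value on the path is replaced by a new map.
func putPath(m map[string]interface{}, key string, v interface{}) {
	parts := strings.Split(key, ".")
	for _, p := range parts[:len(parts)-1] {
		next, ok := m[p].(map[string]interface{})
		if !ok {
			next = map[string]interface{}{}
			m[p] = next
		}
		m = next
	}
	m[parts[len(parts)-1]] = v
}

// passthru copies each listed field from src to dst as-is, skipping fields
// that are absent in src.
func passthru(src, dst map[string]interface{}, fields []string) {
	for _, f := range fields {
		if v, ok := getPath(src, f); ok {
			putPath(dst, f, v)
		}
	}
}

func main() {
	src := map[string]interface{}{
		"nodes": map[string]interface{}{
			"versions": []interface{}{"6.4.0"},
			"os": map[string]interface{}{
				"names": []interface{}{
					map[string]interface{}{"name": "Mac OS X", "count": 1},
				},
			},
		},
	}
	dst := map[string]interface{}{}
	passthru(src, dst, []string{"nodes.versions", "nodes.os.names"})
	fmt.Println(dst)
}
```

Because the values are copied untouched, arrays of objects and objects with dynamic keys survive without needing a schema entry for each possible key.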

ruflin · 2018-08-15T11:31:24Z

metricbeat/module/elasticsearch/cluster_stats/data_xpack.go

+
+	dataMS := common.MapStr(data)
+
+	passthruFields := []string{


In general, it seems that outside of x-pack reporting these values should be reported by the specific metricsets. To get all names, for example, would it then be an aggregation based on the cluster id?

ruflin · 2018-08-15T11:33:17Z

metricbeat/module/elasticsearch/elasticsearch.go

@@ -167,3 +170,39 @@ func GetNodeInfo(http *helper.HTTP, uri string, nodeID string) (*NodeInfo, error
 	}
 	return nil, fmt.Errorf("no node matched id %s", nodeID)
 }
+
+// GetLicense returns license information
+func GetLicense(http *helper.HTTP, resetURI string) (map[string]interface{}, error) {


Should we cache this value for a few minutes, as the license is hopefully very rarely going to change?

Good idea. Will do.

Implemented in 5abe3b0. Curious to hear your thoughts on the implementation — too complex, too simple, just right for now?
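For context, a time-bounded cache of the kind discussed here can be sketched as follows. This is not the implementation from 5abe3b0; the type `cachedLicense`, its fields, the one-minute TTL, and the `countFetches` demo are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cachedLicense is a hypothetical time-bounded cache, not the PR's type.
type cachedLicense struct {
	mu       sync.Mutex
	value    map[string]interface{}
	cachedAt time.Time
	ttl      time.Duration
}

// get returns the cached value and calls fetch only when the cache is empty
// or older than ttl. The current time is passed in to keep the logic easy
// to exercise without sleeping.
func (c *cachedLicense) get(now time.Time, fetch func() (map[string]interface{}, error)) (map[string]interface{}, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if c.value != nil && now.Sub(c.cachedAt) < c.ttl {
		return c.value, nil
	}
	v, err := fetch()
	if err != nil {
		return nil, err
	}
	c.value, c.cachedAt = v, now
	return v, nil
}

// countFetches runs three lookups against a one-minute cache and reports
// how many of them reached the (fake) license API.
func countFetches() int {
	calls := 0
	fetch := func() (map[string]interface{}, error) {
		calls++
		return map[string]interface{}{"type": "basic"}, nil
	}
	cache := &cachedLicense{ttl: time.Minute}
	start := time.Now()
	cache.get(start, fetch)                     // cache empty: fetches
	cache.get(start.Add(30*time.Second), fetch) // fresh: served from cache
	cache.get(start.Add(2*time.Minute), fetch)  // expired: fetches again
	return calls
}

func main() {
	fmt.Println("fetch calls:", countFetches())
}
```

Holding the mutex while fetching keeps the sketch simple, at the cost of blocking concurrent callers for the duration of the HTTP request.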

ruflin · 2018-08-15T11:33:49Z

metricbeat/module/elasticsearch/cluster_stats/data_xpack.go

+		"timestamp":     common.Time(time.Now()),
+		"interval_ms":   m.Module().Config().Period / time.Millisecond,
+		"type":          "cluster_stats",
+		"license":       license,


Is the full license response reported here?

Yes, it is the response from the GET _xpack/license Elasticsearch API, which looks like this:

{ "license": { "status": "active", "uid": "f80c1fb5-75e9-4536-be2d-a768b02abb46", "type": "basic", "issue_date": "2018-08-06T23:47:09.619Z", "issue_date_in_millis": 1533599229619, "max_nodes": 1000, "issued_to": "elasticsearch", "issuer": "elasticsearch", "start_date_in_millis": -1 } }

ycombinator · 2018-08-15T13:52:25Z

metricbeat/module/elasticsearch/cluster_stats/data_xpack.go

+		"type":          "cluster_stats",
+		"license":       license,
+		"version":       info.Version.Number,
+		"cluster_stats": clusterStats,


@ruflin As you can see here, the document that is indexed into monitoring contains not only clusterStats but also clusterState (an abridged version of it) and stackStats.

For clusterState and stackStats we are simply calling the corresponding Elasticsearch APIs, doing a tiny bit of massaging of the response data, and then passing the resulting structure through over here. We are definitely not doing a complete schema conversion like we are doing for clusterStats.

It makes me wonder: should we treat clusterStats special over here? Or conversely, should we do complete schema conversions for clusterState and stackStats too?

How is the internal reporting in ES happening? Is it just taking all the existing fields? If yes then we should do the same here.

Yes, it just takes all the existing fields and passes them through. Here is the relevant code:

https://github.com/elastic/elasticsearch/blob/b87f3062b77cab7888e9037b4996f2c26db12816/x-pack/plugin/monitoring/src/main/java/org/elasticsearch/xpack/monitoring/collector/cluster/ClusterStatsMonitoringDoc.java#L141-L147

Passing through cluster_stats as-is in 3b11529.

ycombinator · 2018-08-15T23:50:51Z

metricbeat/module/elasticsearch/cluster_stats/data_xpack.go

+}
+
+// computeNodesHash computes a simple hash value that can be used to determine if the nodes listing has changed since the last report.
+func computeNodesHash(clusterState common.MapStr) (int32, error) {


@pickypg If possible I'd like you to review this function. It attempts to port over the logic implemented in https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/monitoring/src/main/java/org/elasticsearch/xpack/monitoring/collector/cluster/ClusterStatsMonitoringDoc.java#L180-L195
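As background for reviewers, the sketch below shows one way to get a Java-compatible string hash in Go. It assumes the Java side ends up calling `String.hashCode()` on a concatenated string, which should be confirmed against the linked source; `javaStringHash` and `nodesHash` are hypothetical names, and the sorting step is an addition of this sketch (Go map iteration order is random), so the result is not guaranteed to equal the value Elasticsearch computes.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode/utf16"
)

// javaStringHash reproduces java.lang.String.hashCode: h = 31*h + c over
// the UTF-16 code units, relying on int32 wraparound for overflow.
func javaStringHash(s string) int32 {
	var h int32
	for _, unit := range utf16.Encode([]rune(s)) {
		h = 31*h + int32(unit)
	}
	return h
}

// nodesHash sorts a copy of the node IDs so the result does not depend on
// input order, then hashes their concatenation.
func nodesHash(nodeIDs []string) int32 {
	sorted := append([]string(nil), nodeIDs...)
	sort.Strings(sorted)
	return javaStringHash(strings.Join(sorted, ""))
}

func main() {
	fmt.Println(javaStringHash("abc"))
	fmt.Println(nodesHash([]string{"node-b", "node-a"}) == nodesHash([]string{"node-a", "node-b"}))
}
```

The hash only needs to change when the set of nodes changes, so a stable ordering matters more than matching any particular numeric value.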

houndci-bot · 2018-08-16T00:53:03Z

metricbeat/module/elasticsearch/cluster_stats/data_xpack.go

+		return false, fmt.Errorf("Routing table indices is not a map")
+	}
+
+	for name, _ := range indices {


should omit 2nd value from range; this loop is equivalent to for name := range ...

ycombinator · 2018-08-16T01:09:46Z

@ruflin This PR is (finally!) ready for review now. Thanks.

ruflin

Overall code looks good to me. Some minor changes needed.

The code has become more complex than I expected, but I think it is also a very good learning ground for what we could do differently with metricsets in the Elasticsearch module so as not to rely heavily on the cluster state.

ruflin · 2018-08-16T08:47:15Z

metricbeat/module/elasticsearch/cluster_stats/data_xpack.go

+	for _, value := range nodes {
+		nodeData, ok := value.(map[string]interface{})
+		if !ok {
+			return 0, fmt.Errorf("Node data is not a map")


Nit: In general, errors which are returned and not printed should start with a lowercase letter, as they could be embedded in other errors. I wonder why hound did not complain. The same applies to most other errors in this PR.
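A short illustration of why the lowercase convention matters once errors are wrapped. The function names below are hypothetical, and the `%w` verb assumes Go 1.13 or later, which is newer than the toolchain this PR was written against.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotMap starts lowercase so it reads naturally inside a longer message.
var errNotMap = errors.New("node data is not a map")

// parseNode is a hypothetical helper that returns the low-level error.
func parseNode(value interface{}) (map[string]interface{}, error) {
	node, ok := value.(map[string]interface{})
	if !ok {
		return nil, errNotMap
	}
	return node, nil
}

// computeHash wraps the low-level error with context from the caller.
func computeHash(value interface{}) error {
	if _, err := parseNode(value); err != nil {
		return fmt.Errorf("computing nodes hash: %w", err)
	}
	return nil
}

func main() {
	err := computeHash("oops")
	fmt.Println(err)
	fmt.Println(errors.Is(err, errNotMap))
}
```

With a capitalized inner message, the combined text would read "computing nodes hash: Node data is not a map", which is the inconsistency the nit is about.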

ruflin · 2018-08-16T08:50:30Z

metricbeat/module/elasticsearch/elasticsearch.go

 	"github.com/elastic/beats/metricbeat/helper"
 )

 // Global clusterIdCache. Assumption is that the same node id never can belong to a different cluster id
 var clusterIDCache = map[string]string{}

+// Global cache for license information. Assumption is that license information changes infrequently
+type _licenseCache struct {


Why do you start with a _ here? I would prefer not to have _ in var names.

I don't like the _ prefix here either so I'm open to other suggestions. The reason I did this was because, just a few lines below, I want to create the actual singleton cache instance variable. I wanted that variable to have the "real" name, licenseCache, so I had to come up with some other name for the type.

ruflin · 2018-08-16T08:51:19Z

metricbeat/module/elasticsearch/elasticsearch.go

+
+var licenseCache = &_licenseCache{}
+
+func (c *_licenseCache) get() common.MapStr {


nit: I would put the private and cache-related code at the bottom of the file, as I would assume most people come to the file to look at the public methods.

ruflin · 2018-08-16T08:51:55Z

metricbeat/module/elasticsearch/elasticsearch.go

@@ -167,3 +204,51 @@ func GetNodeInfo(http *helper.HTTP, uri string, nodeID string) (*NodeInfo, error
 	}
 	return nil, fmt.Errorf("no node matched id %s", nodeID)
 }
+
+// GetLicense returns license information


Could you add a note here that the license is cached?

ycombinator · 2018-08-16T15:52:03Z

@ruflin This is ready for another round of review. I've addressed most of your feedback; only this thread remains unresolved: #7810 (comment)

ruflin

LGTM: Will this need to be backported to 6.x?

ycombinator · 2018-08-17T14:16:25Z

> Will this need to be backported to 6.x?

Yes, just added the necessary labels and put up the backport PR here: #8000

This PR teaches the `elasticsearch/cluster_stats` metricset to query the appropriate Elasticsearch HTTP APIs and index `cluster_stats` documents into `.monitoring-es-6-mb-*` indices. These documents should be structurally compatible with the `cluster_stats` documents that the internal monitoring method currently indexes into `.monitoring-es-6-*` indices. (cherry picked from commit 264e7b4)

ycombinator added the in progress, Metricbeat, v7.0.0-alpha1, and monitoring labels Jul 30, 2018

houndci-bot reviewed Jul 30, 2018

ycombinator added 9 commits August 13, 2018 10:14

WIP: Adding xpack code for ES cluster stats metricset

5fa381c

Adding some more TODO fields

d3e952e

Elaborating on TODOs

ef2e566

Fleshing out more TODO fields

f1147a0

Fixing error message

0ab561f

Use X-Pack event mapping if flag is set

7598a50

Implementing some TODOs

ad44df3

Fleshing out some more

2ca1903

Adding comments to explain passthrus

e72f9c9

ruflin reviewed Aug 15, 2018

ycombinator commented Aug 15, 2018

ycombinator added 4 commits August 15, 2018 07:51

Implement simple license cache

5abe3b0

Fixing formatting

35f1bd5

Inject cluster_needs_tls field

94392d6

Compute nodes_hash

be31fcf

ycombinator commented Aug 15, 2018

Add apm.found field to stack_stats

38beb00

houndci-bot reviewed Aug 16, 2018

ycombinator added 2 commits August 15, 2018 17:55

Please the hound

14116d9

Fix stack_stats structure

dd37ab6

ycombinator added the review label and removed the in progress label Aug 16, 2018

Make error messages consistent

22cf327

ruflin requested changes Aug 16, 2018

ycombinator added 4 commits August 16, 2018 05:58

Making error messages start with lowercase character

167156e

Adding note about license information being cached

bc4f1ff

Move license cache implementation to bottom of file

12fc997

Passthru cluster stats as well

3b11529

ruflin approved these changes Aug 17, 2018

ruflin merged commit 264e7b4 into elastic:master Aug 17, 2018

ycombinator added the needs_backport and v6.5.0 labels Aug 17, 2018

ruflin mentioned this pull request Aug 17, 2018

Monitor the Elastic Stack with Metricbeat #7035

Closed

39 tasks

ycombinator removed the needs_backport label Sep 25, 2018

ycombinator deleted the metricbeat/elasticsearch/cluster-stats/x-pack branch December 25, 2019 11:12
