-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Adding xpack code for ES cluster stats metricset #7810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding xpack code for ES cluster stats metricset #7810
Conversation
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
package cluster_stats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use an underscore in package name
@ruflin There are still some TODOs in this PR for me to work on but, if you have some time, I'd like your early opinion on how I'm going about this metricset (for x-pack monitoring). Of course, I'd love your opinion on any part of the PR but the areas I'm specifically concerned about right now are:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few minor comments / question. Definitively heading in the right direction. I think it's a good example of how the reporting should look potentially different in the cluster_stats fileset. There are things like usage we could have a separate metricset for and only report every minute for example.
return err | ||
} | ||
|
||
dataMS := common.MapStr(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps add a type assertion even though this should never fail I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should I do this?
I tried changing this line to:
dataMS, ok := data.(common.MapStr)
but that gives me the compile-time error:
invalid type assertion: data.(common.MapStr) (non-interface type map[string]interface {} on left)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore my comment, above is good.
passthruFields := []string{ | ||
"indices.segments.file_sizes", // object with dynamic keys | ||
"nodes.versions", // array of strings | ||
"nodes.os.names", // array of objects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a list of names : ["a", "b"]
or nested objects [{"a":"b"}, {"c":"d"}]
. Just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a list of objects, for example:
[
{
"name": "Mac OS X",
"count": 1
}
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:-(
|
||
dataMS := common.MapStr(data) | ||
|
||
passthruFields := []string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general it seems these are values which outside x-pack reporting should be reported by the specific metricsets and to get all names for example it would be an aggregation based on the cluster id?
@@ -167,3 +170,39 @@ func GetNodeInfo(http *helper.HTTP, uri string, nodeID string) (*NodeInfo, error | |||
} | |||
return nil, fmt.Errorf("no node matched id %s", nodeID) | |||
} | |||
|
|||
// GetLicense returns license information | |||
func GetLicense(http *helper.HTTP, resetURI string) (map[string]interface{}, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we cache this value for a few minute as the license is hopefully very rarely going to change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented in 5abe3b0. Curious to hear your thoughts on the implementation — too complex, too simple, just right for now?
"timestamp": common.Time(time.Now()), | ||
"interval_ms": m.Module().Config().Period / time.Millisecond, | ||
"type": "cluster_stats", | ||
"license": license, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the full license response reported here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is the response from the GET _xpack/license
Elasticsearch API, which looks like this:
{
"license": {
"status": "active",
"uid": "f80c1fb5-75e9-4536-be2d-a768b02abb46",
"type": "basic",
"issue_date": "2018-08-06T23:47:09.619Z",
"issue_date_in_millis": 1533599229619,
"max_nodes": 1000,
"issued_to": "elasticsearch",
"issuer": "elasticsearch",
"start_date_in_millis": -1
}
}
"type": "cluster_stats", | ||
"license": license, | ||
"version": info.Version.Number, | ||
"cluster_stats": clusterStats, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ruflin As you can see here, the document that is indexed into monitoring contains not only clusterStats
but also clusterState
(an abridged version of it) and stackStats
.
For clusterState
and stackStats
we are simply calling the corresponding Elasticsearch APIs, doing a tiny bit of massaging of the response data, and then passing the resulting structure through over here. We are definitely not doing a complete schema conversion like we are doing for clusterStats
.
It makes me wonder: should we treat clusterStats
special over here? Or conversely, should we do complete schema conversions for clusterState
and stackStats
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the internal reporting in ES happening? Is it just taking all the existing fields? If yes then we should do the same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it just takes all the existing fields and passes them through. Here is the relevant code:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing through cluster_stats
as-is in 3b11529.
} | ||
|
||
// computeNodesHash computes a simple hash value that can be used to determine if the nodes listing has changed since the last report. | ||
func computeNodesHash(clusterState common.MapStr) (int32, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pickypg If possible I'd like you to review this function. It attempts to port over the logic implemented in https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/monitoring/src/main/java/org/elasticsearch/xpack/monitoring/collector/cluster/ClusterStatsMonitoringDoc.java#L180-L195
return false, fmt.Errorf("Routing table indices is not a map") | ||
} | ||
|
||
for name, _ := range indices { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should omit 2nd value from range; this loop is equivalent to for name := range ...
@ruflin This PR is (finally!) ready for review now. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall code looks good to me. Some minor changes needed.
The code has become more complex then I expected but I think it is also a very good learning ground for what we could do different with metricsets in the Elasticsearch module to not heavily rely on the cluster state.
for _, value := range nodes { | ||
nodeData, ok := value.(map[string]interface{}) | ||
if !ok { | ||
return 0, fmt.Errorf("Node data is not a map") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: In general errors which are returned and not printed should start lower case as they could be embeded in other errors. I wonder why hound did not complain. Same applies to most other errors in this PR.
"github.com/elastic/beats/metricbeat/helper" | ||
) | ||
|
||
// Global clusterIdCache. Assumption is that the same node id never can belong to a different cluster id | ||
var clusterIDCache = map[string]string{} | ||
|
||
// Global cache for license information. Assumption is that license information changes infrequently | ||
type _licenseCache struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you start with a _ here? I would prefer not to have _ in var names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the _
prefix here either so I'm open to other suggestions. The reason I did this was because, just a few lines below, I want to create the actual singleton cache instance variable. I wanted that variable to have the "real" name, licenseCache
, so I had to come up with some other name for the type.
|
||
var licenseCache = &_licenseCache{} | ||
|
||
func (c *_licenseCache) get() common.MapStr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would put the private and cache related code to the bottom of the file as I would assume most people come to the file to look at the public methods.
@@ -167,3 +204,51 @@ func GetNodeInfo(http *helper.HTTP, uri string, nodeID string) (*NodeInfo, error | |||
} | |||
return nil, fmt.Errorf("no node matched id %s", nodeID) | |||
} | |||
|
|||
// GetLicense returns license information |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a note here that the license is cached?
@ruflin This is ready for another round of review. I've addressed most of your feedback; only this thread remains unresolved: #7810 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM: Will this need to be backported to 6.x?
Yes, just added the necessary labels and put up the backport PR here: #8000 |
This PR teaches the `elasticsearch/cluster_stats` metricset to query the appropriate Elasticsearch HTTP APIs and index `cluster_stats` documents into `.monitoring-es-6-mb-*` indices. These documents should be compatible in structure to `cluster_stats` documents in the current `.monitoring-es-6-*` indices indexed via the internal monitoring method. (cherry picked from commit 264e7b4)
This PR teaches the
elasticsearch/cluster_stats
metricset to query the appropriate Elasticsearch HTTP APIs and indexcluster_stats
documents into.monitoring-es-6-mb-*
indices. These documents should be compatible in structure tocluster_stats
documents in the current.monitoring-es-6-*
indices indexed via the internal monitoring method.