Skip to content

Adding xpack code for ES cluster stats metricset #7810

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Aug 17, 2018
Merged

Adding xpack code for ES cluster stats metricset #7810

merged 21 commits into from
Aug 17, 2018

Conversation

ycombinator
Copy link
Contributor

This PR teaches the elasticsearch/cluster_stats metricset to query the appropriate Elasticsearch HTTP APIs and index cluster_stats documents into .monitoring-es-6-mb-* indices. These documents should be compatible in structure to cluster_stats documents in the current .monitoring-es-6-* indices indexed via the internal monitoring method.

@ycombinator ycombinator added in progress Pull request is currently in progress. Metricbeat Metricbeat v7.0.0-alpha1 monitoring labels Jul 30, 2018
// specific language governing permissions and limitations
// under the License.

package cluster_stats

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use an underscore in package name

@ycombinator
Copy link
Contributor Author

@ruflin There are still some TODOs in this PR for me to work on but, if you have some time, I'd like your early opinion on how I'm going about this metricset (for x-pack monitoring). Of course, I'd love your opinion on any part of the PR but the areas I'm specifically concerned about right now are:

  • the number of Elasticsearch API calls we have to make per Fetch() cycle - I think these are okay for now to make forward progress but longer-term we'll want to try and reduce the number of calls if possible by maybe creating "super" APIs on the ES side or something, and
  • the passthru fields (search for passthru in the PR code) - do you see a better way to handle these?

Copy link
Collaborator

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few minor comments / question. Definitively heading in the right direction. I think it's a good example of how the reporting should look potentially different in the cluster_stats fileset. There are things like usage we could have a separate metricset for and only report every minute for example.

return err
}

dataMS := common.MapStr(data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add a type assertion even though this should never fail I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should I do this?

I tried changing this line to:

dataMS, ok := data.(common.MapStr)

but that gives me the compile-time error:

invalid type assertion: data.(common.MapStr) (non-interface type map[string]interface {} on left)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore my comment, above is good.

passthruFields := []string{
"indices.segments.file_sizes", // object with dynamic keys
"nodes.versions", // array of strings
"nodes.os.names", // array of objects
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a list of names : ["a", "b"] or nested objects [{"a":"b"}, {"c":"d"}]. Just curious.

Copy link
Contributor Author

@ycombinator ycombinator Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a list of objects, for example:

[
  {
    "name": "Mac OS X",
    "count": 1
  }
]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:-(


dataMS := common.MapStr(data)

passthruFields := []string{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general it seems these are values which outside x-pack reporting should be reported by the specific metricsets and to get all names for example it would be an aggregation based on the cluster id?

@@ -167,3 +170,39 @@ func GetNodeInfo(http *helper.HTTP, uri string, nodeID string) (*NodeInfo, error
}
return nil, fmt.Errorf("no node matched id %s", nodeID)
}

// GetLicense returns license information
func GetLicense(http *helper.HTTP, resetURI string) (map[string]interface{}, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we cache this value for a few minute as the license is hopefully very rarely going to change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Will do.

Copy link
Contributor Author

@ycombinator ycombinator Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented in 5abe3b0. Curious to hear your thoughts on the implementation — too complex, too simple, just right for now?

"timestamp": common.Time(time.Now()),
"interval_ms": m.Module().Config().Period / time.Millisecond,
"type": "cluster_stats",
"license": license,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the full license response reported here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is the response from the GET _xpack/license Elasticsearch API, which looks like this:

{
  "license": {
    "status": "active",
    "uid": "f80c1fb5-75e9-4536-be2d-a768b02abb46",
    "type": "basic",
    "issue_date": "2018-08-06T23:47:09.619Z",
    "issue_date_in_millis": 1533599229619,
    "max_nodes": 1000,
    "issued_to": "elasticsearch",
    "issuer": "elasticsearch",
    "start_date_in_millis": -1
  }
}

"type": "cluster_stats",
"license": license,
"version": info.Version.Number,
"cluster_stats": clusterStats,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ruflin As you can see here, the document that is indexed into monitoring contains not only clusterStats but also clusterState (an abridged version of it) and stackStats.

For clusterState and stackStats we are simply calling the corresponding Elasticsearch APIs, doing a tiny bit of massaging of the response data, and then passing the resulting structure through over here. We are definitely not doing a complete schema conversion like we are doing for clusterStats.

It makes me wonder: should we treat clusterStats special over here? Or conversely, should we do complete schema conversions for clusterState and stackStats too?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the internal reporting in ES happening? Is it just taking all the existing fields? If yes then we should do the same here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing through cluster_stats as-is in 3b11529.

}

// computeNodesHash computes a simple hash value that can be used to determine if the nodes listing has changed since the last report.
func computeNodesHash(clusterState common.MapStr) (int32, error) {
Copy link
Contributor Author

@ycombinator ycombinator Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return false, fmt.Errorf("Routing table indices is not a map")
}

for name, _ := range indices {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should omit 2nd value from range; this loop is equivalent to for name := range ...

@ycombinator ycombinator added review and removed in progress Pull request is currently in progress. labels Aug 16, 2018
@ycombinator
Copy link
Contributor Author

@ruflin This PR is (finally!) ready for review now. Thanks.

Copy link
Collaborator

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall code looks good to me. Some minor changes needed.

The code has become more complex then I expected but I think it is also a very good learning ground for what we could do different with metricsets in the Elasticsearch module to not heavily rely on the cluster state.

for _, value := range nodes {
nodeData, ok := value.(map[string]interface{})
if !ok {
return 0, fmt.Errorf("Node data is not a map")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: In general errors which are returned and not printed should start lower case as they could be embeded in other errors. I wonder why hound did not complain. Same applies to most other errors in this PR.

"github.com/elastic/beats/metricbeat/helper"
)

// Global clusterIdCache. Assumption is that the same node id never can belong to a different cluster id
var clusterIDCache = map[string]string{}

// Global cache for license information. Assumption is that license information changes infrequently
type _licenseCache struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you start with a _ here? I would prefer not to have _ in var names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the _ prefix here either so I'm open to other suggestions. The reason I did this was because, just a few lines below, I want to create the actual singleton cache instance variable. I wanted that variable to have the "real" name, licenseCache, so I had to come up with some other name for the type.


var licenseCache = &_licenseCache{}

func (c *_licenseCache) get() common.MapStr {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would put the private and cache related code to the bottom of the file as I would assume most people come to the file to look at the public methods.

@@ -167,3 +204,51 @@ func GetNodeInfo(http *helper.HTTP, uri string, nodeID string) (*NodeInfo, error
}
return nil, fmt.Errorf("no node matched id %s", nodeID)
}

// GetLicense returns license information
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a note here that the license is cached?

@ycombinator
Copy link
Contributor Author

@ruflin This is ready for another round of review. I've addressed most of your feedback; only this thread remains unresolved: #7810 (comment)

Copy link
Collaborator

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM: Will this need to be backported to 6.x?

@ruflin ruflin merged commit 264e7b4 into elastic:master Aug 17, 2018
@ycombinator ycombinator added needs_backport PR is waiting to be backported to other branches. v6.5.0 labels Aug 17, 2018
@ycombinator
Copy link
Contributor Author

ycombinator commented Aug 17, 2018

Will this need to be backported to 6.x?

Yes, just added the necessary labels and put up the backport PR here: #8000

jsoriano pushed a commit that referenced this pull request Aug 17, 2018
This PR teaches the `elasticsearch/cluster_stats` metricset to query the appropriate Elasticsearch HTTP APIs and index `cluster_stats` documents into `.monitoring-es-6-mb-*` indices. These documents should be compatible in structure to `cluster_stats` documents in the current `.monitoring-es-6-*` indices indexed via the internal monitoring method.

(cherry picked from commit 264e7b4)
@ycombinator ycombinator removed the needs_backport PR is waiting to be backported to other branches. label Sep 25, 2018
@ycombinator ycombinator deleted the metricbeat/elasticsearch/cluster-stats/x-pack branch December 25, 2019 11:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants