[Metricbeat] Update Ceph module to support new API #16254

Merged: mtojek merged 41 commits into elastic:master on Feb 19, 2020

Conversation

mtojek
Contributor

@mtojek mtojek commented Feb 11, 2020

Opening as a draft to fill in all fields and start building using CI.

This PR updates the Ceph module to use the RESTful API introduced in the latest releases.

Issue: #7723

How to test this PR locally

  1. Use docker-compose up ceph to boot up the Ceph Nautilus container. Ports 5000 and 8003 should be exposed.
  2. Enable the ceph Metricbeat module.
  3. Configure the module similarly to mgr_cluster_disk_integration_test.go:
const user = "demo" // package-level constant; see the notes below

func getConfig(host, password string) map[string]interface{} {
	return map[string]interface{}{
		"module":                "ceph",
		"metricsets":            []string{"mgr_cluster_disk"},
		"hosts":                 []string{host},
		"username":              user,
		"password":              password,
		"ssl.verification_mode": "none",
	}
}

  • user: "demo"
  • password: go to https://localhost:5000/restful-list-keys.json
  • metricsets: include all of mgr_cluster_disk, mgr_cluster_health, mgr_osd_perf, mgr_osd_pool_stats, mgr_osd_tree, mgr_pool_disk

  4. Start Metricbeat with -e -d processors and observe the events flow.
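For reference, the equivalent standalone configuration could look roughly like the sketch below. This is an assumption-laden illustration, not taken from the PR: the restful API port (8003) and the period are guesses, and the password placeholder stands for whatever key the restful-list-keys.json endpoint returns.

- module: ceph
  metricsets:
    - mgr_cluster_disk
    - mgr_cluster_health
    - mgr_osd_perf
    - mgr_osd_pool_stats
    - mgr_osd_tree
    - mgr_pool_disk
  period: 10s
  hosts: ["https://localhost:8003"]              # assumed restful API port
  username: "demo"
  password: "<key from restful-list-keys.json>"  # placeholder
  ssl.verification_mode: "none"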

@mtojek mtojek added the Metricbeat label Feb 11, 2020
@mtojek mtojek requested a review from a team February 11, 2020 18:56
@mtojek mtojek self-assigned this Feb 11, 2020
@mtojek mtojek added the Team:Services (Deprecated) label (for the former Integrations-Services team) Feb 11, 2020
@mtojek
Contributor Author

mtojek commented Feb 11, 2020

jenkins, test this please

@mtojek mtojek marked this pull request as ready for review February 12, 2020 19:03
@mtojek mtojek requested a review from jsoriano February 12, 2020 19:03
@mtojek
Contributor Author

mtojek commented Feb 12, 2020

Adding @jsoriano as there is an issue with nested objects.

I'm afraid this might be a blocker for the light module, so I'm considering either rewriting this or using some magic with processors.

@@ -1,11 +1,23 @@
-version: '2.3'
+version: '3.7'
Member

Is there any reason you need 3.7? We try to use the same version in all docker-compose files because sometimes we use multiple of them in the same scenario, and we will do so more in the future for E2E testing.

3.X versions can be problematic for us because we make use of depends_on and extends, which are not supported in those versions, as they focus on Docker Swarm. See these comments, and the sketch below.
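For context, a minimal sketch of the 2.x-only features mentioned; the service names, image, and healthcheck are illustrative and not taken from this PR:

version: '2.3'
services:
  ceph:
    image: ceph/daemon    # illustrative image
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:5000/"]
  metricbeat:
    # extends is not supported by the 3.x compose file format
    extends:
      file: ../../docker-compose.yml
      service: beat
    # the condition-based long form of depends_on is 2.x-only
    depends_on:
      ceph:
        condition: service_healthy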

Contributor Author

Fixed (I forgot to clean this up after experimenting).

"num_osds": 1,
"num_per_pool_osds": 1
},
"pools": [
Member

Adding @jsoriano as there is an issue with nested objects.

I'm afraid this might be a blocker for the light module, so I'm considering either rewriting this or using some magic with processors.

Yes, these kinds of arrays of objects are unsupported. This question comes up with some frequency; see this thread for an example that also shows the problem with the http module.

Something similar was also discussed for the Azure input: there we receive messages formatted as JSON objects containing lists of records, we wanted an event per record, and there was no way to do it using processors. It was finally supported by handling this in the underlying Kafka input (see #13965). We could consider doing something similar for the http module.

The reason processors cannot be used for this is that they are not able to generate multiple events from a single event, which is what we would need here.

For this specific data I think we should ideally generate (see the sketch after this list):

  • An event for the disk, containing the global stats and maybe a summary of the pools (such as the number of pools).
  • An event per pool, which also contains a field with the name of the disk for correlation.

A note about nested objects: we could make the effort of supporting them, and then we could store events like this one without unexpected results, but it would be a significant effort with (in my opinion) little benefit, as information stored this way would be more difficult to query, and we could hardly take advantage of existing Kibana applications.
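A minimal sketch of that two-event split, assuming a hypothetical, trimmed DfResponse type; the field names, the clusterName parameter, and the import paths are illustrative, not the PR's actual code:

package ceph

import (
	"github.com/elastic/beats/libbeat/common"
	"github.com/elastic/beats/metricbeat/mb"
)

// DfResponse is a hypothetical, trimmed view of the "df" command output.
type DfResponse struct {
	Stats struct {
		TotalBytes      uint64 `json:"total_bytes"`
		TotalUsedBytes  uint64 `json:"total_used_bytes"`
		TotalAvailBytes uint64 `json:"total_avail_bytes"`
	} `json:"stats"`
	Pools []struct {
		Name  string `json:"name"`
		Stats struct {
			BytesUsed uint64 `json:"bytes_used"`
		} `json:"stats"`
	} `json:"pools"`
}

// reportDf emits one event with the global disk stats plus a pool count,
// and one event per pool carrying the cluster name for correlation.
func reportDf(reporter mb.ReporterV2, clusterName string, df DfResponse) {
	reporter.Event(mb.Event{
		MetricSetFields: common.MapStr{
			"available.bytes": df.Stats.TotalAvailBytes,
			"total.bytes":     df.Stats.TotalBytes,
			"used.bytes":      df.Stats.TotalUsedBytes,
			"pool.count":      len(df.Pools),
		},
	})
	for _, pool := range df.Pools {
		reporter.Event(mb.Event{
			MetricSetFields: common.MapStr{
				"cluster.name": clusterName,
				"name":         pool.Name,
				"used.bytes":   pool.Stats.BytesUsed,
			},
		})
	}
}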

Contributor Author

After adjusting the PR, there is no need for nested arrays.

Content-Type: application/json
query:
wait: "1"
body: '{"prefix": "df", "format": "json"}'
Member

We are currently collecting this information from two metricsets, one for disks (cluster_disk) and another for pools (pool_disk); they both make the same query (to /api/v0.1/df) but collect different data from the response.

We could keep this model and have two metricsets for the same data; this would allow a more intuitive transition from the old metricsets to the new ones. It would also allow us to create, for now, the replacement for cluster_disk, which gets the stats for the disks and drops the pools array. And we can continue thinking about how to improve the http json metricset to support this kind of array.
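A sketch of what that cluster_disk replacement could report, reusing the hypothetical DfResponse type from the sketch above and the available.bytes/total.bytes/used.bytes names from the module's fields.yml:

// clusterDiskFields keeps only the cluster-wide stats from the df
// response and drops the pools array (illustrative, not the PR's code).
func clusterDiskFields(df DfResponse) common.MapStr {
	return common.MapStr{
		"available.bytes": df.Stats.TotalAvailBytes,
		"total.bytes":     df.Stats.TotalBytes,
		"used.bytes":      df.Stats.TotalUsedBytes,
	}
}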

Contributor Author

Looking at the response objects once again, I came to the conclusion that there is too much wiring, removing, and transforming of data required to get a nice-looking metric structure. I'm ok with rewriting this as standard metricsets.

@mtojek
Contributor Author

mtojek commented Feb 13, 2020

Alright, I've rewritten the metricsets using the standard approach. The PR is ready for review.

Member

@jsoriano jsoriano left a comment

I have added some comments; the only thing that could be important to address before merging is deciding whether we could and should reuse the old fields when possible. I think we should, to ease migration and allow reusing dashboards. It may also be important to make some modifications to the reference configuration files we are providing.

The rest are suggestions or things that could be addressed in the future.

metricbeat/module/ceph/mgr_osd_disk/mgr_osd_disk.go (outdated; resolved)
metricbeat/module/ceph/_meta/config.yml (outdated; resolved)
metricbeat/module/ceph/docker-compose.yml (outdated; resolved)
type: long
description: >
Used bytes of the cluster
format: bytes
Member

Could we reuse the same field names as we are using with the old modules? This way the transition will be easier, and we will be able to reuse dashboards and so on.

Contributor Author

Right now all field names are as identical as possible. The only difference is in the metricset names:

- name: cluster_disk
  type: group
  description: >
    cluster_disk
  release: ga
  fields:
    - name: available.bytes
      type: long
      description: >
        Available bytes of the cluster
      format: bytes
    - name: total.bytes
      type: long
      description: >
        Total bytes of the cluster
      format: bytes
    - name: used.bytes
      type: long
      description: >
        Used bytes of the cluster
      format: bytes

and

- name: mgr_cluster_disk
  type: group
  description: >
    mgr_cluster_disk
  release: beta
  fields:
    - name: available.bytes
      type: long
      description: >
        Available bytes of the cluster
      format: bytes
    - name: total.bytes
      type: long
      description: >
        Total bytes of the cluster
      format: bytes
    - name: used.bytes
      type: long
      description: >
        Used bytes of the cluster
      format: bytes

"name": "host.example.com"
},
"ceph": {
"mgr_cluster_health": {
Member

As mentioned in another comment, it'd be nice if we could use the same fields as the old modules for common data. Here, for example, it could be cluster_health.

Suggested change:
-"mgr_cluster_health": {
+"cluster_health": {

Contributor Author

@mtojek mtojek Feb 18, 2020

Unfortunately this is also the metricset name, hence it can't be replaced (it causes a "metricset name already registered" error). The other fields are equal in this case.

Member

The name of the metricset doesn't necessarily need to be in the event, and in this case I think it would be beneficial to have equivalent fields in the same places.

To do that we can report events with the metrics at the module level:

reporter.Event(mb.Event{
    ModuleFields: common.MapStr{
        "cluster_health": event,
    },
})

Or even at the root level:

reporter.Event(mb.Event{
    RootFields: common.MapStr{
        "ceph": common.MapStr{
            "cluster_health": event,
        },
    },
})

If, for some reason, you need to differentiate per metricset in a query or dashboard, the event.dataset field can be used.
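For illustration, a document reported this way would look roughly like the following sketch (the health value is a placeholder), with event.dataset still identifying the originating metricset:

{
  "event": {
    "dataset": "ceph.mgr_cluster_health"
  },
  "ceph": {
    "cluster_health": {
      "overall_status": "HEALTH_OK"
    }
  }
}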

@mtojek
Contributor Author

mtojek commented Feb 18, 2020

Also, rebased against master (the update to Python 3).

@mtojek mtojek requested review from jsoriano and a team February 18, 2020 10:51
@mtojek
Contributor Author

mtojek commented Feb 18, 2020

jenkins, test this again please

// UpWithEnvironmentVariable adds an environment variable.
func UpWithEnvironmentVariable(keyValue string) UpOption {
	return func(options *UpOptions) { options.Environment = append(options.Environment, keyValue) }
}
Member

This shouldn't be needed in the end, right?

Contributor Author

Right. I removed the entire feature.

return compose.EnsureUp(t, "ceph",
	compose.UpWithTimeout(upTimeout),
	compose.UpWithEnvironmentVariable(fmt.Sprintf("CEPH_CODENAME=%s", defaultCodename)),
	compose.UpWithEnvironmentVariable(fmt.Sprintf("CEPH_VERSION=%s", defaultVersion)),
)
Member

Overriding the environment shouldn't be needed, as we now have the defaults in the docker-compose.yml file.

Contributor Author

Same here. Removed.

"name": "host.example.com"
},
"ceph": {
"mgr_cluster_health": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the metricset doesn't need to be neccesarily in the event, and in this case I think it would be beneficial to have equivalent fields in the same places.

To do that we can report events with the metrics at the module level:

reporter.Event(mb.Event{
    ModuleFields: common.MapStr{
      "cluster_health": event,
    },
})

Or even at the root level:

reporter.Event(mb.Event{
    RootFields: common.MapStr{
      "ceph": common.MapStr{
        "cluster_health": event,
       },
    },
})

If, for some reason, it is needed to differentiate per metricset in a query or dashboard, the event.dataset field can be used.

@mtojek
Contributor Author

mtojek commented Feb 18, 2020

To do that we can report events with the metrics at the module level:

I didn't know it was possible to set them there. Cool!

I just want to highlight one thing: to properly validate all fields, some fields.yml files have also been updated (e.g. mgr_pool_disk -> pool_disk), which means that the aggregated fields.all.yml contains two identical records. The same should apply to the docs, I believe; I'm not sure about the (rendering) outcome...

@mtojek mtojek requested review from jsoriano and a team February 18, 2020 19:18
@jsoriano
Member

jsoriano commented Feb 18, 2020

I just want to highlight one thing: to properly validate all fields, some fields.yml files have also been updated (e.g. mgr_pool_disk -> pool_disk), which means that the aggregated fields.all.yml contains two identical records. The same should apply to the docs, I believe; I'm not sure about the (rendering) outcome...

I'm not sure this is going to work. When we have had several metricsets using the same fields, we have documented those fields in a single place: in only one of the metricsets, or in the module. This avoids duplication, and fields validation should still work.
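A sketch of that single-place approach, assuming the shared group is declared once (for example in the module-level fields.yml), reusing the cluster_disk definition shown earlier:

- name: cluster_disk
  type: group
  description: >
    cluster_disk
  release: ga
  fields:
    - name: available.bytes
      type: long
      description: >
        Available bytes of the cluster
      format: bytes
    - name: total.bytes
      type: long
      description: >
        Total bytes of the cluster
      format: bytes
    - name: used.bytes
      type: long
      description: >
        Used bytes of the cluster
      format: bytes

Both metricsets would then document their fields against this single definition, and the aggregated fields.all.yml would contain only one record.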

@mtojek
Contributor Author

mtojek commented Feb 18, 2020

I just want to highlight one thing: to properly validate all fields, some fields.yml files have also been updated (e.g. mgr_pool_disk -> pool_disk), which means that the aggregated fields.all.yml contains two identical records. The same should apply to the docs, I believe; I'm not sure about the (rendering) outcome...

I'm not sure this is going to work. When we have had several metricsets using the same fields, we have documented those fields in a single place: in only one of the metricsets, or in the module. This avoids duplication, and fields validation should still work.

Fixed. I removed the definitions from the new metricsets.

@mtojek
Contributor Author

mtojek commented Feb 19, 2020

jenkins, test this please

Member

@jsoriano jsoriano left a comment

A couple of questions, but I am ok with negative responses 🙂 In any case, it LGTM.

@mtojek mtojek merged commit 273c6a8 into elastic:master Feb 19, 2020
mtojek added a commit to mtojek/beats that referenced this pull request Feb 19, 2020
* add new ceph metricsets

* adjust system tests

(cherry picked from commit 273c6a8)
@mtojek mtojek added the v7.7.0 label Feb 19, 2020
mtojek added a commit that referenced this pull request Feb 19, 2020
… new API (#16404)

* [Metricbeat] Update Ceph module to support new API (#16254)

* add new ceph metricsets

* adjust system tests

(cherry picked from commit 273c6a8)

* Fix: version
kvch pushed a commit to kvch/beats that referenced this pull request Feb 20, 2020
* add new ceph metricsets

* adjust system tests
@mtojek mtojek added the test-plan label (add this PR to the manual test plan) Mar 16, 2020
@andresrc andresrc added the test-plan-added label (this PR has been added to the test plan) Mar 21, 2020