[Metricbeat] Update Ceph module to support new API #16254
Conversation
jenkins, test this please
Adding @jsoriano as there is an issue with nested objects. I'm afraid that this might be a blocker for the light module, hence considering either rewriting this or using some magic with processors.
@@ -1,11 +1,23 @@
-version: '2.3'
+version: '3.7'
Is there any reason you need 3.7? We try to have the same version in all docker compose files because sometimes we use multiple of them in the same scenario, and we will do it more in the future for E2E testing.
3.x versions can be problematic for us because we make use of `depends_on` and `extends`, which are not supported in these versions focused on Docker Swarm. See these comments.
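To illustrate the concern, here is a hypothetical compose fragment (not from this PR; service names and paths are made up) using the two features mentioned, which are valid under version '2.x' but rejected by the swarm-focused '3.x' schema:

```yaml
# Hypothetical docker-compose.yml sketch: valid under version '2.3',
# but not under '3.x' (extends was removed from the 3.x schema, and
# depends_on conditions are not supported there either).
version: '2.3'
services:
  ceph:
    extends:
      file: ../common.yml   # assumed shared service definition
      service: base
    ports:
      - "5000:5000"
  metricbeat:
    depends_on:
      ceph:
        condition: service_healthy
```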
Fixed (forgot to clean this after experimenting).
"num_osds": 1,
"num_per_pool_osds": 1
},
"pools": [
Adding @jsoriano as there is an issue with nested objects.
I'm afraid that this might be a blocker for the light module, hence considering either rewriting this or using some magic with processors.
Yes, this kind of array of objects is unsupported; this question comes up with some frequency, see this thread for example, which also shows this problem with the http module.
Something similar was also discussed for the Azure input: there we receive messages formatted as JSON objects containing lists of records, we wanted an event per record, and there was no way to do it using processors. It was finally supported by handling this in the underlying Kafka input (see #13965). We could consider doing something similar for the http module.
The reason processors cannot be used for that is that they are not able to generate multiple events from a single event, which is what we would need here.
For this specific data I think that ideally we should generate:
- An event for the `disk`, which contains the global stats and maybe a summarization of the pools (such as the number of pools).
- An event per `pool`, which also contains a field with the name of the `disk` for correlation.
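The split described above can be sketched as follows. This is a minimal standalone illustration, not the actual Metricbeat code: the JSON field names follow Ceph's `df` output, but the event keys and the `splitEvents` helper are assumptions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dfResponse models a subset of a Ceph `df`-style response: global stats
// plus a nested array of per-pool stats. Field names are illustrative.
type dfResponse struct {
	Stats struct {
		TotalBytes int64 `json:"total_bytes"`
		UsedBytes  int64 `json:"total_used_bytes"`
	} `json:"stats"`
	Pools []struct {
		Name  string `json:"name"`
		Stats struct {
			BytesUsed int64 `json:"bytes_used"`
		} `json:"stats"`
	} `json:"pools"`
}

// splitEvents turns one response into one disk-level event plus one event
// per pool, instead of storing the pools as a nested array of objects.
func splitEvents(raw []byte) ([]map[string]interface{}, error) {
	var df dfResponse
	if err := json.Unmarshal(raw, &df); err != nil {
		return nil, err
	}
	// One event for the disk, with a pool count instead of the nested array.
	events := []map[string]interface{}{{
		"total.bytes": df.Stats.TotalBytes,
		"used.bytes":  df.Stats.UsedBytes,
		"pool.count":  len(df.Pools),
	}}
	// One event per pool; a real implementation would also add a field
	// naming the disk/cluster for correlation.
	for _, p := range df.Pools {
		events = append(events, map[string]interface{}{
			"pool.name":       p.Name,
			"pool.used.bytes": p.Stats.BytesUsed,
		})
	}
	return events, nil
}

func main() {
	raw := []byte(`{"stats":{"total_bytes":100,"total_used_bytes":40},
		"pools":[{"name":"rbd","stats":{"bytes_used":10}},
		         {"name":"cephfs","stats":{"bytes_used":30}}]}`)
	events, err := splitEvents(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("generated %d events (1 disk + %d pools)\n",
		len(events), len(events)-1)
}
```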
A note about nested objects. We could make the effort of supporting them, and then we could store events like this one without unexpected results, but it would be an important effort with (in my opinion) little benefit, as the information stored this way would be more difficult to query, and we could hardly take advantage of existing Kibana applications.
After adjusting the PR there is no need for nested arrays.
Content-Type: application/json
query:
  wait: "1"
body: '{"prefix": "df", "format": "json"}'
We are collecting this information now from two metricsets, one for disks (`cluster_disk`) and another one for pools (`pool_disk`); they both make the same query (to `/api/v0.1/df`) but collect different data from the response.
We could keep this model and have two metricsets for the same data; this would allow a more intuitive transition from the old metricsets to the new ones. It would also allow us to create for now the replacement for `cluster_disk`, which gets the stats for the disks and drops the `pools` array. And we can continue thinking about how to improve the http json metricset to support this kind of array.
Looking at the response objects once again, I came to the conclusion that there is too much wiring, removing, and transforming of data to get a nice-looking metric structure. I'm ok with rewriting this to standard metricsets.
Alright, I've rewritten the metricsets to a standard approach. The PR is ready to review.
I have added some comments; the only thing that could be important to address before merging is deciding whether we could and should reuse old fields when possible. I think we should do that to ease migration and allow reusing dashboards. It may also be important to make some modifications to the reference configuration files we are providing.
The rest are suggestions or things that could be addressed in the future.
type: long
description: >
  Used bytes of the cluster
format: bytes
Could we reuse the same field names as we are using with old modules? This way the transition will be easier and we will be able to reuse dashboards and so on.
Right now all field names are as identical as possible. The difference is in the metricset names:
- name: cluster_disk
  type: group
  description: >
    cluster_disk
  release: ga
  fields:
    - name: available.bytes
      type: long
      description: >
        Available bytes of the cluster
      format: bytes
    - name: total.bytes
      type: long
      description: >
        Total bytes of the cluster
      format: bytes
    - name: used.bytes
      type: long
      description: >
        Used bytes of the cluster
      format: bytes
and
- name: mgr_cluster_disk
  type: group
  description: >
    mgr_cluster_disk
  release: beta
  fields:
    - name: available.bytes
      type: long
      description: >
        Available bytes of the cluster
      format: bytes
    - name: total.bytes
      type: long
      description: >
        Total bytes of the cluster
      format: bytes
    - name: used.bytes
      type: long
      description: >
        Used bytes of the cluster
      format: bytes
"name": "host.example.com"
},
"ceph": {
"mgr_cluster_health": {
As mentioned in another comment, it'd be nice if we could use the same fields as the old modules for common data. Here, for example, it could be `cluster_health`:
-"mgr_cluster_health": {
+"cluster_health": {
Unfortunately this is also the metricset name, hence it can't be replaced (it causes a "metricset name already registered" error). Other fields are equal in this case.
The name of the metricset doesn't necessarily need to be in the event, and in this case I think it would be beneficial to have equivalent fields in the same places.
To do that we can report events with the metrics at the module level:
reporter.Event(mb.Event{
	ModuleFields: common.MapStr{
		"cluster_health": event,
	},
})
Or even at the root level:
reporter.Event(mb.Event{
	RootFields: common.MapStr{
		"ceph": common.MapStr{
			"cluster_health": event,
		},
	},
})
If, for some reason, it is needed to differentiate per metricset in a query or dashboard, the `event.dataset` field can be used.
Also, rebased against master (update to Python 3).
jenkins, test this again please
libbeat/tests/compose/project.go
// UpWithEnvironmentVariable adds an environment variable.
func UpWithEnvironmentVariable(keyValue string) UpOption {
	return func(options *UpOptions) { options.Environment = append(options.Environment, keyValue) }
}
This shouldn't be needed in the end, right?
Right. Removed the entire feature.
return compose.EnsureUp(t, "ceph",
	compose.UpWithTimeout(upTimeout),
	compose.UpWithEnvironmentVariable(fmt.Sprintf("CEPH_CODENAME=%s", defaultCodename)),
	compose.UpWithEnvironmentVariable(fmt.Sprintf("CEPH_VERSION=%s", defaultVersion)),
Overriding the environment shouldn't be needed now that we have the defaults in the docker-compose.yml file.
Same here. Removed.
Didn't know it's allowed to access them there. Cool! Just one thing to highlight: to properly validate all fields, some fields.yml files have also been updated (e.g. mgr_pool_disk -> pool_disk), which means that the aggregated fields.all.yml contains two identical records. The same should apply to the docs, I believe; not sure about the rendering outcome...
Not sure if this is going to work. When we have had several metricsets using the same fields, we have documented these fields in a single place, in only one of the metricsets or in the module. This avoids duplication, and fields validation should still work.
Fixed. Removed definitions from new metricsets.
jenkins, test this please
A couple of questions; I am ok with negative responses 🙂 In any case, it LGTM.
* add new ceph metricsets * adjust system tests (cherry picked from commit 273c6a8)
Opening as draft to fill in all fields and start building using CI.
This PR updates the Ceph module to use the RESTful API introduced in the latest releases.
Issue: #7723
How to test this PR locally
1. Run `docker-compose up ceph` to boot up the Ceph Nautilus module. Ports 5000 and 8003 should be exposed.
2. Configure the `ceph` Metricbeat module:
   - user: "demo"
   - password: go to https://localhost:5000/restful-list-keys.json
   - metricsets: include all of mgr_cluster_disk, mgr_cluster_health, mgr_osd_perf, mgr_osd_pool_stats, mgr_osd_tree, mgr_pool_disk
3. Run `metricbeat` with `-e -d processors` and observe the events flow.
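A sketch of what the `ceph` module configuration for this local test could look like. This is an assumed fragment, not taken from the PR: the period, the hosts port, and the ssl setting are guesses for a local setup, and the password is a placeholder to be replaced with the real key from the URL above.

```yaml
# Hypothetical metricbeat.yml module fragment for local testing.
- module: ceph
  metricsets:
    - mgr_cluster_disk
    - mgr_cluster_health
    - mgr_osd_perf
    - mgr_osd_pool_stats
    - mgr_osd_tree
    - mgr_pool_disk
  period: 10s
  hosts: ["https://localhost:8003"]   # assumed restful API port
  username: "demo"
  password: "<key from restful-list-keys.json>"
  ssl.verification_mode: "none"       # self-signed cert in local testing
```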