Calamari #321
base: master
Conversation
Previously the prefix on admin socket names was a config setting and had to be set manually for one, and only one, cluster name. In the output stats the cluster name was not used; 'ceph' was always passed through. This change makes the collector gather stats for all clusters it sees and include the name of the cluster in the metric names.
These stats are only present in more recent versions of Ceph (currently >=0.72, but expected to be backported). Because these stats are for cluster-wide pools rather than an individual host/service, they are reported with a global name, ceph.<cluster name>.pool.<pool id>.*, for pool statistics. Optionally (off by default), service statistics are reported under ceph.cluster instead of servers.<hostname>.
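For context, a minimal sketch of how a Diamond collector could emit such cluster-global pool metrics; the path-building helper and sample data below are hypothetical, not the PR's actual code:

```python
from diamond.collector import Collector


class CephPoolStatsSketch(Collector):
    """Illustrative only: emits pool stats under ceph.<cluster>.pool.<id>.*"""

    def _global_path(self, cluster, pool_id, stat):
        # Hypothetical helper: build the cluster-wide ("global") metric path
        # described in the commit message above.
        return ".".join(["ceph", cluster, "pool", str(pool_id), stat])

    def collect(self):
        # Assume pool stats were already fetched from the cluster; values here
        # are placeholders. Note that stock Diamond normally prefixes published
        # names with servers.<hostname>, which is why the PR needs an explicit
        # "global name" concept for cluster-wide stats.
        pools = {"mycluster": {3: {"num_bytes": 1024, "num_objects": 7}}}
        for cluster, by_pool in pools.items():
            for pool_id, stats in by_pool.items():
                for stat, value in stats.items():
                    self.publish(self._global_path(cluster, pool_id, stat), value)
```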
This fixes some of the Ceph tests, and disables the tests that use mocking.
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
- Removing use of 2.7-only subprocess.check_call
- Using 'perf schema' for perf counter output
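A minimal sketch of a version-portable replacement (not the PR's actual code): drive Popen directly and check the return code ourselves instead of relying on newer subprocess conveniences.

```python
import subprocess

def run_checked(cmd):
    # Run cmd, capture its output, and raise if it exits non-zero; this
    # avoids depending on helpers that only exist in newer Pythons.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("%s failed (rc=%d): %s" % (cmd, proc.returncode, err))
    return out
```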
I don't think the filter(None,...) idiom is necessary here?...
Signed-off-by: Dan Mick <dan.mick@inktank.com>
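For context, the idiom in question drops falsy items (such as empty strings) from a sequence; a quick illustration:

```python
# filter(None, seq) keeps only truthy items; combined with str.split it
# removes the empty strings produced by leading, trailing, or doubled
# separators.
parts = filter(None, "ceph..pool.3.".split("."))
print(list(parts))  # ['ceph', 'pool', '3']
```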
The goal here is for a failure to talk to one service not to prevent gathering stats from the other services. The symptom was that one stale Ceph socket file stopped all Ceph stats from being collected.
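A minimal sketch of that failure-isolation pattern; the helper names are illustrative, not the PR's actual code:

```python
def collect_all(self, socket_paths):
    # Query each admin socket independently so one stale or unresponsive
    # socket cannot abort stats collection for the remaining services.
    for path in socket_paths:
        try:
            stats = self._query_admin_socket(path)  # hypothetical helper
        except Exception:
            self.log.exception("failed to query %s; skipping", path)
            continue
        self._publish_stats(path, stats)            # hypothetical helper
```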
For larger systems, stat collection can be overly verbose and put excessive load on low-powered Graphite servers. We introduce 'osd_stats_enabled' and 'long_running_detail' to provide a flexible way to tone down the verbosity.
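A sketch of how such flags might be consumed; Diamond config values typically arrive as strings, so the exact coercion and defaults in the PR may differ:

```python
def _flag(self, key, default):
    # Normalize 'True'/'False' strings from the config file before comparing.
    # The defaults used below are assumptions, not the PR's actual values.
    return str(self.config.get(key, default)).lower() == 'true'

def collect(self):
    if self._flag('osd_stats_enabled', True):
        self._collect_osd_stats()            # hypothetical method
    if self._flag('long_running_detail', False):
        self._collect_long_running_detail()  # hypothetical method
```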
Signed-off-by: John Spray <john.spray@inktank.com>
Long-running avg and _bytes statistics were losing the GlobalName type and getting emitted as stats within a server's path. Introduce LocalName and require an explicit local/global type for all stats names to make issues like this more obvious.
Signed-off-by: Gregory Meno <gmeno@redhat.com>
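An illustrative sketch of what the explicit name types from the LocalName/GlobalName commit could look like; the PR's actual classes may differ:

```python
class GlobalName(object):
    """A complete metric path, emitted as-is: ceph.<cluster>.pool.<id>.foo"""
    def __init__(self, path):
        self.path = path

class LocalName(object):
    """A path emitted under the reporting server's prefix: servers.<host>..."""
    def __init__(self, path):
        self.path = path

def full_path(name, host_prefix):
    # Requiring one of the two wrapper types makes an accidentally-unwrapped
    # string fail loudly instead of silently landing under the server path.
    if isinstance(name, GlobalName):
        return name.path
    if isinstance(name, LocalName):
        return host_prefix + "." + name.path
    raise TypeError("metric names must be LocalName or GlobalName")
```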
# Metric name/path separator
_PATH_SEP = "."
Good to make this a global, but there are a number of collectors that simply use '.'
We should either convert everything to use a common variable, or use '.' throughout
@jaingaurav I'm happy to change this, but I'm not quite sure I understand your concern here. _PATH_SEP is "private" to this module and currently set to the same token other collectors use; it's only really a problem if someone goes and changes it. Are you suggesting that I pull this into a separate PR so we can discuss it later? I don't have strong feelings one way or the other.
So my main concern is that this is private at all. Technically, the way a metric is separated might not be the role of the collector, but in fact the handler. It's just odd that a specific collector is choosing to define this, as opposed to it being a global in the base class.
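As a sketch of that suggestion (hypothetical, not existing Diamond code), the separator could live on the base class every collector inherits:

```python
class Collector(object):
    # Hypothetical base-class attribute: one shared separator instead of
    # each collector privately redefining '.'
    PATH_SEP = "."

    def metric_path(self, *parts):
        # Join path components with the shared separator.
        return self.PATH_SEP.join(parts)
```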
return message % (self.command, self.socket_name)

class MonError(Exception):
Please add brief documentation for the error
I will. Happy to
done
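For illustration, the brief documentation requested might look something like this (the wording is hypothetical, not the PR's actual docstring):

```python
class MonError(Exception):
    """Raised when a Ceph mon command fails or returns unparseable output."""
```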
I thought we were going to try and separate out the collectors into different packages?
This technically works. :) See: https://github.com/python-diamond/diamond-redis/blob/master/setup.py#L23-L27 When I last did this, it fully worked. This was just never documented or "released" as a feature.
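The linked setup.py installs a collector module into a path Diamond scans; a minimal sketch of that packaging approach, where the package name and install path are assumptions:

```python
# setup.py -- hypothetical standalone packaging for a single Diamond collector.
from setuptools import setup

setup(
    name='diamond-ceph',
    version='0.1.0',
    description='Ceph collector for Diamond, shipped as its own package',
    # Install the collector module where Diamond's configured collectors
    # path will find it; the target directory here is an assumption.
    data_files=[('share/diamond/collectors/ceph', ['ceph/ceph.py'])],
)
```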
Signed-off-by: Gregory Meno <gmeno@redhat.com>
@josegonzalez @mattrobenolt I would be happy to separate the CephCollector into its own package. Is this something I should consider doing in this pull request?
Signed-off-by: Gregory Meno <gmeno@redhat.com>
There is still quite a bit of work to do before we can break the collectors up into separate packages, so I consider this pull request still valid. However, this PR demonstrates the motivation for the separation: for many collectors, the Diamond maintainers are not in a great position to comment. I'll try to take a closer look next week as I plough through the outstanding PRs.
@jaingaurav Thank you for setting my expectations. I appreciate your effort to review this.
@jaingaurav @josegonzalez @mattrobenolt I'd like to pick this discussion back up. What can I do to get some or all of this upstream? Should I be working to package the collector? Is that something that gets into distros that ship Diamond? Thank you for your time.
Sorry, I've been busy with quite a few other OSS projects. I'll take a look at this in a bit.
@GregMeno can you squash your commits a bit?
Hey guys, hope this doesn't come across as pushy (not the intention!) 😄 Any chance this PR can get some lovin'? It seems to be blocking several things downstream, which in turn is causing headaches for Calamari users trying to upgrade Ceph to Infernalis or Jewel+. Edit: disclaimer, I'm a user of Calamari, not a dev.
I will take a look at carrying these commits into a new PR that has them squashed down.
@ChristinaMeno Can you rebase this on master? Then I will merge it.
This is a fork of the CephCollector that we've used in Calamari since Diamond 3.4.
I've rebased, cleaned up, fixed tests, and tested the package with real data
I look forward to discussing what needs to change in order to get this back upstream.
The main change here is to optionally roll up stats about a Ceph cluster under one reporting structure, instead of spreading them out under all the server nodes that comprise the cluster.
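As an illustration of the two layouts (hostnames, cluster name, and stat names below are examples, not the PR's exact output):

```python
# Default Diamond layout: each daemon's stats live under the host running it.
#   servers.node01.ceph.osd-0.op_latency
#   servers.node02.ceph.osd-1.op_latency
#
# Rolled-up layout (with the option enabled): cluster-wide stats live under a
# single cluster-scoped prefix, independent of which host reported them.
#   ceph.mycluster.pool.3.num_bytes
```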