Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calamari #321

Open
wants to merge 26 commits into
base: master
Choose a base branch
from
Open

Calamari #321

wants to merge 26 commits into from

Conversation

ChristinaMeno
Copy link

This is a fork of the CephCollector that we've used in calamari since diamond3.4
I've rebased, cleaned up, fixed tests, and tested the package with real data
I look forward to discussing what needs to change in order to get this back upstream.

The main change here is to optionally roll up stats about a ceph cluster under one reporting structure instead of spreading it out under all the server nodes that comprise the cluster.

John Spray and others added 23 commits October 19, 2015 18:59
Previously the prefix on admin socket names
was a config setting, had to be manually set
for one and only one cluster name.  In the
output stats, the cluster name was not used,
'ceph' was always passed through.

This change makes the collector gather stats
for all clusters it sees, and include the
name of the cluster in the metric names.
These stats are only present in more recent versions
of Ceph (currently >=0.72, but expected to be backported).

Because these stats are for cluster-wide pools rather than
an individual host/service, they are reported with a global
name, ceph.<cluster name>.pool.<pool id>.*
pool statistics.

Optionally (off by default) report service
statistics under ceph.cluster isntead
of servers.<hostname>
This fixes some of the Ceph tests, and disables the tests that use
mocking.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
- Removing use of 2.7-only subprocess.check_call
- Using 'perf schema' for perf counter output
I don't think the filter(None,...) idiom is necessary here?...

Signed-off-by: Dan Mick <dan.mick@inktank.com>
The goal here is to have failures to talk to one service
not prevent us gathering stats from other services.  The
symptom was that when there was one stale ceph socket file
all ceph stats stopped being collected.
For larger systems, the stat collection can be
overly verbose and cause excessive load on low
powered graphite servers.  We introduce 'osd_stats_enabled'
and 'long_running_detail' to provide a flexible way
to tone down the verbosity.
Signed-off-by: John Spray <john.spray@inktank.com>
Longrunning avg and _bytes statistics were losing
the GlobalName type and getting emitted as stats within
a server's path.  Introduce LocalName and require explicit
local/global type for all stats names to make issues like
this more obvious.
Signed-off-by: Gregory Meno <gmeno@redhat.com>
Signed-off-by: Gregory Meno <gmeno@redhat.com>
Signed-off-by: Gregory Meno <gmeno@redhat.com>
Signed-off-by: Gregory Meno <gmeno@redhat.com>
Signed-off-by: Gregory Meno <gmeno@redhat.com>
Signed-off-by: Gregory Meno <gmeno@redhat.com>


# Metric name/path separator
_PATH_SEP = "."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to make this a global, but there are a number of collectors that just simply use '.'

We should either convert everything to use a common variable, or use '.' throughout

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaingaurav I am happy to change this and I'm not quite sure I understand your concern here. _PATH_SEP is "private" to this module and currently set to the same token other collectors use. It's only really a problem if someone goes and changes it. Are you suggesting that I pull this into a separate PR and we can discuss it later? I don't really have strong feelings one way or the other.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my main concern is that this is private at all. Technically the way a metric is separated might mot be the role of the collector, but in fact the handler. It's just odd that a specific collector is choosing to define this as opposed to it being a global in the base class.

return message % (self.command, self.socket_name)


class MonError(Exception):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add brief documentation for the error

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will. Happy to

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@josegonzalez
Copy link
Member

I thought we were going to try and separate out the collectors into different packages?

@mattrobenolt
Copy link
Member

I thought we were going to try and separate out the collectors into different packages?

This technically works. :)

See: https://github.com/python-diamond/diamond-redis/blob/master/setup.py#L23-L27

When I last did this, it fully worked. This was just never documented or "released" as a feature.

Signed-off-by: Gregory Meno <gmeno@redhat.com>
@ChristinaMeno
Copy link
Author

@josegonzalez @mattrobenolt I would be happy to separate the CephCollector into it's own package. Is this something I should consider doing in this pull request?

Signed-off-by: Gregory Meno <gmeno@redhat.com>
Signed-off-by: Gregory Meno <gmeno@redhat.com>
@jaingaurav
Copy link
Member

There is still quite a bit of work to do before we can break up the collectors into separate packages, so I consider this pull request still valid. However this PR demonstrates the motivation for the separation, i.e. for many collectors, the diamond maintainers are not in a great position to comment.

I'll try to take a look more next week as I plough through the outstanding PRs.

@ChristinaMeno
Copy link
Author

@jaingaurav Thank you for setting my expectations. I appreciate your effort to review this.

@ChristinaMeno
Copy link
Author

@jaingaurav @josegonzalez @mattrobenolt I'd like to pick this discussion back up. What can I do to get some or all of this upstream? Should I be working to package the collector? Is that something gets into distros that ship diamond? Thank you for your time.

@josegonzalez
Copy link
Member

Sorry, I've been busy with quite a few other OSS projects. I'll take a look at this in a bit.

@shortdudey123
Copy link
Member

shortdudey123 commented Feb 18, 2017

@GregMeno can you squash your commits a bit?
@josegonzalez any thoughts? this looks fine to me (other than squashing)

@mvdkleijn
Copy link

mvdkleijn commented Apr 13, 2017

Hey guys, hope this doesn't come over as pushy (not the intention!) 😄

Any chance this PR can get some lovin'? It seems to be blocking several things downstream which in turn is causing users (of Calamari) headaches who're trying to update it to upgrade Ceph to Infernalis or Jewel+

edit: disclaimer, I'm a user of Calamari... not a dev.

@shortdudey123 shortdudey123 self-assigned this Apr 13, 2017
@shortdudey123
Copy link
Member

I will take a look at carrying these commits into a new PR that has them squashed down

@shortdudey123
Copy link
Member

@ChristinaMeno Can you rebase this on master? Then I will merge it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants