-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RPC stats for the client #90
Conversation
sources/procfs.go
Outdated
@@ -320,6 +325,14 @@ func (s *lustreProcfsSource) generateClientMetricTemplates() error { | |||
{"stats", "inode_permission_total", inodePermissionHelp, s.counterMetric}, | |||
{"xattr_cache", "xattr_cache_enabled", "Returns '1' if extended attribute cache is enabled", s.gaugeMetric}, | |||
}, | |||
"mdc/*": { | |||
{"rpc_stats", "rpc_in_flight", rpcsInFlightHelp, s.gaugeMetric}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So...I see that this metric is named differently than the 'rpcs_in_flight' value in osc/*. Maybe we just need to name these mdc_rpcs_in_flight and osc_rpcs_in_flight?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was actually just a typo as I meant to put rpcs_in_flight
, but I can certainly do osc*
and mdc*
if you desire!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, rpcs_in_flight would cause metric conflicts, wouldn't it? Or can mdc and osc not be populated at the same time? What do the labels end up looking like for those metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. You are right, we would need to prepend the values with the location. The labels would be identical since they are both on the same node and the nodeType
will always be client
for these two. Also, the metrics can and likely always will be populated at the same time, so we definitely need to change them.
9c8dd0a
to
fae3d95
Compare
@knweiss @mjtrangoni Do either of you have strong opinions about the metric names for this PR? We're trying to decided if the rpcs_in_flight value should have the osc/mdc text prepended to it, or if we should try to add a new label to the rpcs_in_flight metric for what type of client rpc the value represents. |
@roclark I'll merge this sometime after 1 PM if I haven't heard anyone say that I shouldn't! |
Hi @joehandzik, |
@mjtrangoni Thanks, I figured that would be the answer but wanted to confirm before sending Robert down that path. @roclark Rename the metrics back to what you originally had for rpcs_in_flight, and add some sort of label (node/type/???). You should be able to reuse the extra value fields I added for my converged stats PR. |
On it! Thanks @mjtrangoni and @joehandzik! |
fae3d95
to
82f8c6e
Compare
Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>
82f8c6e
to
8328f98
Compare
Done! Though I did notice the |
LGTM! We can wait until Monday if necessary to get full buy-in from @mjtrangoni |
@joehandzik Sorry, for not answering earlier. I'm busy and didn't have time to test or review these PR yet. It doesn't help that it's hard to review the patches without access to a real Lustre system (i.e. at home). Did you ever think about a test suite with copies of proc files (one of each) from a real Lustre OSS, MDS, and client respectively? I.e. a test suite that allows simulating the lustre_exporter output for each of these machine types? This would make before/after comparisons of PRs much easier and faster. Also, it would be great help for end-to-end testing and benchmarking/performance tuning. I think @mjtrangoni's issue #93 goes into the same direction. (E.g. see node_exporter's end-to-end-test.sh with its reference output e22-output.txt. However, please also notice prometheus/node_exporter#478). There's also the remaining problem of negative tests that must fail...) |
LGTM, merging. |
This PR closes #69.
Signed-Off-By: Robert Clark robert.d.clark@hpe.com