-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[receiver/vcenter] Add vCenter Host metrics (dropped packet rate + capacity) #33646
[receiver/vcenter] Add vCenter Host metrics (dropped packet rate + capacity) #33646
Conversation
f137150
to
4b6d86a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few small things. Also needs the scraper test updated with new results.
// disk metrics | ||
"disk.totalReadLatency.average", | ||
"disk.totalWriteLatency.average", | ||
"disk.maxTotalLatency.latest", | ||
"disk.read.average", | ||
"disk.write.average", | ||
// cpu metrics | ||
"cpu.reservedCapacity.average", | ||
"cpu.totalCapacity.average", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know how different this number ends up (if at all) from numCpuCores
* cpuMhz
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, so I've been trying to figure this out. From what I understand totalCapacity
would be the more accurate metric to measure here. In most cases the numbers should be very close together (usually a bit lower). This is because numCpuCores might get artificially inflated by any logical cores caused by hyper threading. Beyond that the numbers should be very similar if not the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After talking with Stefan a bit more on this topic, we uncovered that what we think of as totalCapacity as shown on these performance metrics is a bit different than what someone would normally think (numCpuCores * cpuMhz) as shown on the vSphere client.
The difference being the calculated total capacity using the cpuCores and cpuMhz would be talking about how much capacity the host has for it
The performance metric totalCapacity is referring to the Total reservation capacity
and the performance metric reservedCapacity is referring "used reservation" of the total reservation capacity.
In the end, we think both the performance metric and the quickstat metric are both very useful depending on the user use case and setup, so we will be including them both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be good to move from DRAFT. I still would like to know the difference between the total CPU performance metric and one you could calculate though.
receiver/vcenterreceiver/client.go
Outdated
if err == nil { | ||
return vApps, nil | ||
// ResourcePoolInventoryListObjects returns the ResourcePools (with populated InventoryLists) of the vSphere SDK | ||
func (vc *vcenterClient) ResourcePoolInventoryListObjects( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I'd say this PR is blocked until this PR is merged in and you can rebase (because this function for example isn't actually quite right ATM)
if_enabled_not_set: "this metric will be enabled by default starting in release v0.105.0" | ||
vcenter.host.cpu.reserve.capacity: | ||
enabled: false | ||
description: Total CPU capacity that is available for reserve or reserved by virtual machines. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the description and name of this one has some room for improvement (still sounds a bit confusing).
What do you think about something like vcenter.host.cpu.reserved
for the name? I know the performance metric names have "capacity" in them, but the UI equivalents do not. "Capacity" seems to only make sense for "total" and not "used" to me.
Then the description could then be something like The CPU of the host reserved for use by virtual machines.
Attribute could be changed to cpu_reservation_type
with values of total
and used
. Description could be The type of CPU reservation for the host
.
These are all just suggestions, but what do you think @BominRahmani ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually teetering back and forth between vcenter.host.cpu.reserved
and vcenter.host.cpu.reserve.capacity
I only decided the latter since it felt a bit more verbose. I was also thinking about switching the cpu_reservation_type
to total
and used
to match the UI a bit more originally, So I am totally ok with these changes.
a06edcb
to
fdd24e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just need checks fixed
fdd24e7
to
e44f552
Compare
800febe
to
b51853d
Compare
Description:
The following PR adds the following metrics
These metrics can be found in the following links respectively:
errorTx and errorRx
reservedCapacity and totalCapacity
Link to tracking Issue: #33607
Testing:
Tested against a live environment to scrape added metrics, and updated golden test files.
Documentation:
Updated documentation through mdatagen.