-
Notifications
You must be signed in to change notification settings - Fork 1
feat: utilization collector #79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: v0.47.0-d2iq
Are you sure you want to change the base?
Conversation
4b598ae to
c864ced
Compare
|
sample metrics for the kommander namespace: |
| Memory float64 `json:"memory,omitempty" yaml:"memory,omitempty"` | ||
| } | ||
|
|
||
| type Utilisation struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: since this is an exported struct, I would add a short description. i.e Readings map[string] this is and map of ... which index represents ...
| if err != nil { | ||
| return nil, errors.Wrap(err, "Could not look up thanos-query pods: ") | ||
| } | ||
| if len(pods.Items) < 1 { | ||
| // Check if no thanos-query pods are found. | ||
| return nil, errors.New("Could not find any thanos-query pods.") | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like these two if-statement can be combined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one checks for a failed request, the other asserts empty response
|
|
||
| // Query for CPU utilization | ||
| cpuQuery := "sum(rate(container_cpu_usage_seconds_total{container!='POD', container!=''}[1h])) by (namespace)" | ||
| start := time.Now().Add(-1 * time.Duration(c.Collector.Duration) * 24 * time.Hour) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is time.Duration(c.Collector.Duration) * 24 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // How far back to query metrics in days. |
pkg/collect/utilisation.go
Outdated
|
|
||
| // PortForward creates an asynchronous port-forward to a pod on the cluster in order to | ||
| // access it over the local network. | ||
| func PortForward(cfg *rest.Config, ns, pod string, port int, out output.Output) (localPort int, stop chan struct{}, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these port forwarding things shouldn't be part of this package. what do you think?
pkg/collect/utilisation.go
Outdated
| } | ||
| } | ||
|
|
||
| // Include remaining items from cpu readings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Include remaining items from cpu readings | |
| // including the remaining items from cpu readings |
| func StartPortForwarding( | ||
| ctx context.Context, | ||
| restConfig *rest.Config, | ||
| out output.Output, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This argument is hardcoded to non interactive output, so I would use nomral gaoling log package instead of dkp-cli-runtime as the runtime is built for printing CLI messages.
pkg/collect/utilisation.go
Outdated
|
|
||
| // PortForward creates an asynchronous port-forward to a pod on the cluster in order to | ||
| // access it over the local network. | ||
| func PortForward(cfg *rest.Config, ns, pod string, port int, out output.Output) (localPort int, stop chan struct{}, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we control port forwarding lifetime with context cancellation instead of returning back the stop channel?
3652635 to
67a3ea1
Compare
| // UsageReading the CPU and memory utilisation at a given timestamp | ||
| type UsageReading struct { | ||
| Timestamp time.Time `json:"timestamp,omitempty" yaml:"timestamp,omitempty"` | ||
| CPU float64 `json:"cpu,omitempty" yaml:"cpu,omitempty"` | ||
| Memory float64 `json:"memory,omitempty" yaml:"memory,omitempty"` | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you checked if there is a some known data format for storing metrics? Maybe from prometheus itself. It would be great for interoperability and further data processing.
67a3ea1 to
b9991f7
Compare
Here we introduce a simple prometheus metrics collector that collects, for now, cpu and memory utilisation per namespace;
the metrics are normalized over a period of 1h, cpu usage is measured as
sum(rate(container_cpu_usage_seconds_total{container!='POD', container!=''}[1h])) by (namespace) and memory usage is measured assum(container_memory_working_set_bytes{container!='POD', container!=''}) by (namespace)`. The collector can be easily expanded to include other metrics if needed.the output is stored in the bundle as follows
and the format is as follows