Skip to content

Conversation

@ahmedElqutb
Copy link

Here we introduce a simple prometheus metrics collector that collects, for now, cpu and memory utilisation per namespace;
the metrics are normalized over a period of 1h, cpu usage is measured as sum(rate(container_cpu_usage_seconds_total{container!='POD', container!=''}[1h])) by (namespace) and memory usage is measured as sum(container_memory_working_set_bytes{container!='POD', container!=''}) by (namespace)`. The collector can be easily expanded to include other metrics if needed.
the output is stored in the bundle as follows

utilisation
|_<NAMESPACE>.json

and the format is as follows

{
    "timestamp": "2023-05-12T08:28:43.401Z",
    "cpu": 0.0042575040813548,
    "memory": 56418304
  }

@ahmedElqutb ahmedElqutb self-assigned this May 12, 2023
@ahmedElqutb ahmedElqutb force-pushed the ashraf/utilization-collector branch from 4b598ae to c864ced Compare May 12, 2023 13:07
@ahmedElqutb
Copy link
Author

sample metrics for the kommander namespace:

[
  {
    "timestamp": "2023-05-12T05:28:43.401Z",
    "cpu": 1.3415371079170424,
    "memory": 11176812544
  },
  {
    "timestamp": "2023-05-12T06:28:43.401Z",
    "cpu": 1.6222203072304835,
    "memory": 12486553600
  },
  {
    "timestamp": "2023-05-12T07:28:43.401Z",
    "cpu": 1.6127546498546972,
    "memory": 12850171904
  },
  {
    "timestamp": "2023-05-12T08:28:43.401Z",
    "cpu": 1.6386345158054396,
    "memory": 13139566592
  },
  {
    "timestamp": "2023-05-12T09:28:43.401Z",
    "cpu": 1.6826758809251552,
    "memory": 13567676416
  },
  {
    "timestamp": "2023-05-12T10:28:43.401Z",
    "cpu": 1.72041521148195,
    "memory": 13700907008
  },
  {
    "timestamp": "2023-05-12T11:28:43.401Z",
    "cpu": 1.718887859585921,
    "memory": 13959213056
  },
  {
    "timestamp": "2023-05-12T12:28:43.401Z",
    "cpu": 1.7229104660950567,
    "memory": 14513143808
  }
]

Memory float64 `json:"memory,omitempty" yaml:"memory,omitempty"`
}

type Utilisation struct {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since this is an exported struct, I would add a short description. i.e Readings map[string] this is and map of ... which index represents ...

Comment on lines 71 to 77
if err != nil {
return nil, errors.Wrap(err, "Could not look up thanos-query pods: ")
}
if len(pods.Items) < 1 {
// Check if no thanos-query pods are found.
return nil, errors.New("Could not find any thanos-query pods.")
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like these two if-statement can be combined?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one checks for a failed request, the other asserts empty response


// Query for CPU utilization
cpuQuery := "sum(rate(container_cpu_usage_seconds_total{container!='POD', container!=''}[1h])) by (namespace)"
start := time.Now().Add(-1 * time.Duration(c.Collector.Duration) * 24 * time.Hour)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is time.Duration(c.Collector.Duration) * 24 ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// How far back to query metrics in days.


// PortForward creates an asynchronous port-forward to a pod on the cluster in order to
// access it over the local network.
func PortForward(cfg *rest.Config, ns, pod string, port int, out output.Output) (localPort int, stop chan struct{}, err error) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these port forwarding things shouldn't be part of this package. what do you think?

}
}

// Include remaining items from cpu readings
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Include remaining items from cpu readings
// including the remaining items from cpu readings

func StartPortForwarding(
ctx context.Context,
restConfig *rest.Config,
out output.Output,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This argument is hardcoded to non interactive output, so I would use nomral gaoling log package instead of dkp-cli-runtime as the runtime is built for printing CLI messages.


// PortForward creates an asynchronous port-forward to a pod on the cluster in order to
// access it over the local network.
func PortForward(cfg *rest.Config, ns, pod string, port int, out output.Output) (localPort int, stop chan struct{}, err error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we control port forwarding lifetime with context cancellation instead of returning back the stop channel?

@ahmedElqutb ahmedElqutb force-pushed the ashraf/utilization-collector branch from 3652635 to 67a3ea1 Compare May 15, 2023 10:01
Comment on lines +34 to +39
// UsageReading the CPU and memory utilisation at a given timestamp
type UsageReading struct {
Timestamp time.Time `json:"timestamp,omitempty" yaml:"timestamp,omitempty"`
CPU float64 `json:"cpu,omitempty" yaml:"cpu,omitempty"`
Memory float64 `json:"memory,omitempty" yaml:"memory,omitempty"`
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked if there is a some known data format for storing metrics? Maybe from prometheus itself. It would be great for interoperability and further data processing.

@ahmedElqutb ahmedElqutb force-pushed the ashraf/utilization-collector branch from 67a3ea1 to b9991f7 Compare May 15, 2023 10:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants