Skip to content
This repository has been archived by the owner on Apr 20, 2021. It is now read-only.

Install the bpf program for all pods and collect/expose the data #13

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

robertgzr
Copy link
Contributor

@robertgzr robertgzr commented Jul 26, 2017

This introduces a bpf controller that runs the lookup look for every cgroup we attach a program to and executes handlers that write the tracked data to Prometheus metrics that get exposed on port 9101. The top subcommand uses the same code but implements different handlers. It also improves overall logging and adds a flag to adjust the verbosity.

Closes #4, #5, #6, #9

@robertgzr robertgzr added this to the 0.1.0 milestone Jul 26, 2017
@robertgzr robertgzr force-pushed the robertgzr/bpf-per-pod branch 2 times, most recently from 98d10c0 to 3be4279 Compare July 26, 2017 16:50
@@ -104,6 +104,12 @@
revision = "f3f9494671f93fcff853e3c6e9e948b3eb71e590"

[[projects]]
name = "github.com/go-stack/stack"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this pkg mentioned here? it does not seem to be used anywhere (neither in cgnet or in the vendored pkgs), or vendored?

Copy link

@asymmetric asymmetric Jul 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU this file is generated by the package manager.

So to answer your question, one would have to look into the dep's behaviour.

Dockerfile Outdated
@@ -5,5 +5,5 @@ RUN apk update && apk add --no-cache libc6-compat
COPY cgnet /cgnet

EXPOSE 9101
ENTRYPOINT ["/cgnet", "serve"]
ENTRYPOINT ["/cgnet", "--verbose", "2", "export"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the parameters could be on the CMD line

revision = "b105bd37f74e5d9dc7b6ad7806715c7a2b83fd3f"
version = "v2.11"
packages = [".","term"]
revision = "74a0988b5f804e8ce9ff74fca4f16980776dff29"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason for using master instead of a released version?

return
case <-c.quit:
return
case <-time.After(1000 * time.Millisecond):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could use time.Second

return
case <-time.After(1000 * time.Millisecond):
lookup(c.module, packetsKey, c.packetsHandler)
lookup(c.module, bytesKey, c.bytesHandler)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why using a callback in parameter 3 instead of just having the lookup function returning the value?

c.packetsHandle(lookup(c.module, packetsKey))

bpf/src/cgnet.c Outdated
@@ -19,7 +19,7 @@
#include <uapi/linux/ip.h>
#include "bpf_helpers.h"

struct bpf_map_def SEC("maps/count") count_map = {
struct bpf_map_def SEC("maps/cgnet") count_map = {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use the same name for the section and the variable name?
something like:

struct bpf_map_def SEC("maps/count_map") count_map = {

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment to document:

  • what is the key: define the keys nearby the map definition, so it's clearer.
  • what is the value: a counter
  • how it is used to communicate: values initialized to zero by userspace and incremented by the ebpf program

Just before the map definition:

#define PACKETS_KEY 0
#define BYTES_KEY 1

In the count_packets function:

     int packets_key = PACKETS_KEY, bytes_key = BYTES_KEY;

cmd/serve.go Outdated
metrics.GlobalPodMetrics.TotalNumberPods.Add(1)
podlog := log.New("pod", e.PodSelfLink)

cgPath := buildCgroupPath(cgroupRoot, e.PodUID, e.PodQOSClass)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to get this from the Kubernetes API rather than assuming the path is built in a specific way?

/cc @asymmetric @nhlfr

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could follow this and do something like client.Get().Namespace("foo").Resource("bar").URL().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asymmetric But that only gives me the resource URL I need the path to the pods cgroup here

@alban I looked though http://godoc.org/k8s.io/api/core/v1#Pod to find a way to get the cgroup for pods but building it ourselfs seems to be the only way. Maybe @nhlfr knows...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If k8s does not provide it, you might have to ask Docker the leader pid of the container and then check in /proc/$pid/cgroup...

You could file an issue and add a // FIXME: ... if you want to do that later.

cmd/serve.go Outdated
case kube.DeletePodEvent:
metrics.GlobalPodMetrics.TotalNumberPods.Sub(1)
metrics.TotalNum().Sub(1)
log.Debug("pod gone", "pod", e.PodSelfLink, "cgroup", buildCgroupPath(cgroupRoot, e.PodUID, e.PodQOSClass))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanup the bpf module? Or at least. add a // FIXME: ... to remember

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was actually why I filed #7 because I didn't know if you had to do any cleanup work.

Will fix this now...

This introduces a bpf controller that runs the lookup loop for
every cgroup we attach a program to, reads the map and executes
handlers that write the tracked data to Prometheus metrics.

The top subcommand uses the same code but implements different handlers.

Closes #4, closes #5, closes #6, closes #9
@robertgzr
Copy link
Contributor Author

Any more feedback here?

Copy link

@dongsupark dongsupark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than that, LGTM.
"top" command works fine.
I didn't try out the prometheus part though.

See the License for the specific language governing permissions and
limitations under the License.
*/

package bpf

import (
"bytes"
"fmt"
"log"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"log" doesn't seem to be used anywhere, so we get build error.

func (c *Controller) Stop() {
c.quit <- struct{}{}
if err := c.module.Close(); err != nil {
Log.Error("error closing bpf.Module", "err", err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be log instead of Log?
And log needs to be imported, otherwise we get build errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's left over from the logging I introduced but split out of this PR. thx will use the log pkg here

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants