Commit e297800

Add ebpf collector (#156)
* feat: Add ebpf collector. Monitor TCP and UDP sockets for network events. Our first approach, monitoring link-level functions, went nowhere because those functions lack process context. So we resorted to monitoring TCP and UDP sockets, which are higher level but are sure to have the correct process context. For VFS events, we monitor read, write, create, mkdir and unlink events. Only read and write events are aggregated by mount point, whereas the rest are aggregated globally. Use higher max entries for maps to get behaviour closer to what is expected when evicting entries. Keep a CPU-specific LRU cache for better performance.

* refactor: Reorganize code of individual collectors

* Make a generic cgroup collector that can be used for different resource managers. The generic cgroup collector won't register itself with the Collector interface; it is meant to be used from other collectors. Resource manager collectors must pass a list of valid cgroup paths to the cgroup collector for metrics fetching.

* Similarly, the perf collector has been modified to become an internal generic collector that must be called from other collectors specific to resource managers. The same goes for the ebpf collector, which becomes an internal collector that is meant to be called from other collectors. Use goroutines in the perf collector to update.

* cgroup, perf and ebpf collectors take a cgroupManager argument during instantiation that determines which processes/cgroups must be monitored.

* The side effect is that we replicate CLI args for each resource manager, but this should not be an issue, as the exporter on a given host will not (and should not) target two different resource managers. So operators will never have to deal with the duplication.

* ci: Install clang==18 in CI when not found. Add a dummy ELF object file for the linter to pass. Make bpf assets before running unit tests.

---------

Signed-off-by: Mahendra Paipuri <mahendra.paipuri@gmail.com>
1 parent ed0bff3 commit e297800

55 files changed: +98113 -2880 lines changed

.circleci/config.yml

Lines changed: 2 additions & 0 deletions
@@ -19,13 +19,15 @@ jobs:
 steps:
 - prometheus/setup_environment
 - run: go mod download
+- run: GOARCH=1 make clang
 - run: make
 - run: CGO_BUILD=1 make
 test-arm:
 executor: arm
 steps:
 - checkout
 - run: uname -a
+- run: GOARCH=1 make clang
 - run: make
 - run: CGO_BUILD=1 make
 build:

.clang-format

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+AccessModifierOffset: -4
+AlignAfterOpenBracket: Align
+AlignConsecutiveAssignments: false
+AlignConsecutiveBitFields: false
+AlignConsecutiveDeclarations: false
+AlignConsecutiveMacros: true
+AlignEscapedNewlines: Left
+AlignOperands: true
+AlignTrailingComments: false
+AllowAllParametersOfDeclarationOnNextLine: false
+AllowShortBlocksOnASingleLine: false
+AllowShortEnumsOnASingleLine: false
+AllowShortFunctionsOnASingleLine: Inline
+AllowShortIfStatementsOnASingleLine: false
+AllowShortLoopsOnASingleLine: false
+BasedOnStyle: LLVM
+BraceWrapping:
+AfterControlStatement: false
+AfterEnum: false
+AfterFunction: true
+AfterStruct: false
+AfterUnion: false
+BeforeElse: false
+IndentBraces: false
+BreakBeforeBraces: Custom
+ColumnLimit: 0
+ConstructorInitializerIndentWidth: 4
+ContinuationIndentWidth: 8
+Cpp11BracedListStyle: false
+DerivePointerAlignment: false
+IndentCaseLabels: false
+IndentPPDirectives: None
+IndentWidth: 8
+IndentWrappedFunctionNames: false
+PointerAlignment: Right
+ReflowComments: false
+SortIncludes: false
+SpaceAfterCStyleCast: false
+SpaceAfterTemplateKeyword: false
+SpaceBeforeAssignmentOperators: true
+SpaceBeforeParens: ControlStatements
+SpaceBeforeRangeBasedForLoopColon: true
+SpaceInEmptyParentheses: false
+SpacesBeforeTrailingComments: 1
+SpacesInAngles: false
+SpacesInContainerLiterals: false
+SpacesInCStyleCastParentheses: false
+SpacesInParentheses: false
+SpacesInSquareBrackets: false
+TabWidth: 8
+UseTab: Always

.github/workflows/codeql.yml

Lines changed: 2 additions & 0 deletions
@@ -54,6 +54,8 @@ jobs:
 shell: bash
 if: ${{ matrix.language == 'go' }}
 run: |
+echo 'Installing clang 18'
+GOARCH=1 make clang
 echo 'Building pure go binaries'
 make build
 echo 'Building cgo binaries'

.github/workflows/step_tests-e2e.yml

Lines changed: 3 additions & 0 deletions
@@ -17,6 +17,9 @@ jobs:
 with:
 go-version: 1.22.x

+- name: Setup clang 18
+run: ./scripts/install_clang.sh
+
 - name: Run e2e tests for Go packages
 run: make test-e2e

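The commit message says clang 18 is installed in CI only when it is not already present. The script `scripts/install_clang.sh` itself is not shown in this diff, so the following is only a minimal sketch of such a guard. The helper name `clang_missing` and the candidate binary names `clang-18` and `clang` are assumptions for illustration, not taken from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical guard; NOT the contents of scripts/install_clang.sh.
# Succeeds (exit status 0) when none of the given binaries is on PATH.
clang_missing() {
    local bin
    for bin in "$@"; do
        if command -v "$bin" >/dev/null 2>&1; then
            # At least one candidate exists, so nothing is missing.
            return 1
        fi
    done
    return 0
}

if clang_missing clang-18 clang; then
    echo "clang not found; an install step would run here"
else
    echo "clang already present; skipping install"
fi
```

A guard of this kind keeps the workflow step cheap on runners that already ship a suitable compiler. It does not check the clang version, which a real script pinned to clang 18 would presumably also need to do.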
.github/workflows/step_tests-lint.yml

Lines changed: 5 additions & 0 deletions
@@ -18,6 +18,11 @@ jobs:
 with:
 go-version: 1.22.x

+- name: Create a sample object file
+run: |
+mkdir -p pkg/collector/bpf/objs
+touch pkg/collector/bpf/objs/sample.o
+
 - name: Lint
 uses: golangci/golangci-lint-action@v6
 with:
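The lint job creates an empty `sample.o` so that the linter can pass, as the commit message notes. One plausible reason, which is an assumption here because the Go sources are not part of this excerpt, is that a file-matching pattern over `pkg/collector/bpf/objs` needs at least one match at compile time. The sketch below shows an idempotent variant that creates the placeholder only when no object file exists yet. The function name `ensure_placeholder_object` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an empty sample.o only if the directory
# does not already contain a *.o file.
ensure_placeholder_object() {
    local objs_dir="$1"
    local f
    mkdir -p "$objs_dir"
    for f in "$objs_dir"/*.o; do
        # An unmatched glob stays literal, so -e is false in that case.
        if [ -e "$f" ]; then
            return 0
        fi
    done
    touch "$objs_dir/sample.o"
}

# Demo in a throwaway directory rather than the real source tree.
demo_root="$(mktemp -d)"
ensure_placeholder_object "$demo_root/pkg/collector/bpf/objs"
ls "$demo_root/pkg/collector/bpf/objs"
rm -rf "$demo_root"
```

Compared with an unconditional `touch`, this avoids leaving a stray empty file next to real build outputs when the same step runs after `make bpf`.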

.github/workflows/step_tests-unit.yml

Lines changed: 3 additions & 0 deletions
@@ -17,6 +17,9 @@ jobs:
 with:
 go-version: 1.22.x

+- name: Setup clang 18
+run: ./scripts/install_clang.sh
+
 - name: Run checkmetrics and checkrules
 run: make checkmetrics checkrules

Makefile

Lines changed: 11 additions & 3 deletions
@@ -40,6 +40,9 @@ else
 test-docker := test-docker
 endif

+# Base test flags
+test-flags := -covermode=atomic -race
+
 # Use CGO for api and GO for ceems_exporter.
 PROMU_TEST_CONF ?= .promu-go-test.yml
 ifeq ($(CGO_BUILD), 1)
@@ -67,8 +70,13 @@ else

 # go test flags
 coverage-file := coverage-go.out
+
+# If running in CI add -exec sudo flags to run tests that require privileges
+ifeq ($(CI), true)
+test-flags := $(test-flags) -exec sudo
+endif
 endif
-test-flags := -covermode=atomic -coverprofile=$(coverage-file).tmp -race
+test-flags := $(test-flags) -coverprofile=$(coverage-file).tmp

 ifeq ($(GOHOSTOS), linux)
 test-e2e := test-e2e
@@ -109,13 +117,13 @@ coverage:
 $(GO) tool cover -func=coverage.out -o=coverage.out

 .PHONY: test
-test: pkg/collector/testdata/sys/.unpacked pkg/collector/testdata/proc/.unpacked
+test: pkg/collector/testdata/sys/.unpacked pkg/collector/testdata/proc/.unpacked bpf
 @echo ">> running tests"
 $(GO) test -short $(test-flags) $(pkgs)
 cat $(coverage-file).tmp | grep -v "main.go" > $(coverage-file)

 .PHONY: test-32bit
-test-32bit: pkg/collector/testdata/sys/.unpacked
+test-32bit: pkg/collector/testdata/sys/.unpacked pkg/collector/testdata/proc/.unpacked bpf
 @echo ">> running tests in 32-bit mode"
 @env GOARCH=$(GOARCH_CROSS) $(GO) test $(pkgs)

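The Makefile change above builds `test-flags` in three steps: base flags, an optional `-exec sudo` in CI, and finally the coverage profile. `go test -exec` runs the compiled test binary through the given program, which is how privileged tests get root here. The sketch below mirrors that assembly in shell so the resulting flag order is easy to see. The function name `compose_test_flags` is hypothetical, and the sketch assumes the CI check sits in the non-CGO branch, as the hunk context suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring how the Makefile assembles test-flags.
# Arguments: value of CI, value of CGO_BUILD, coverage file name.
compose_test_flags() {
    local ci="${1:-}"
    local cgo_build="${2:-0}"
    local coverage_file="${3:-coverage-go.out}"

    # Base test flags, as defined near the top of the Makefile.
    local flags="-covermode=atomic -race"

    # Assumption: the CI conditional lives in the non-CGO branch only.
    if [ "$cgo_build" != "1" ] && [ "$ci" = "true" ]; then
        flags="$flags -exec sudo"
    fi

    # The coverage profile is appended last in every branch.
    printf '%s\n' "$flags -coverprofile=${coverage_file}.tmp"
}

compose_test_flags true 0
compose_test_flags "" 0
```

Defining the base flags once and appending to them keeps the CI-only `-exec sudo` out of local runs, where running tests as root is usually undesirable.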
Makefile.common

Lines changed: 36 additions & 2 deletions
@@ -26,6 +26,9 @@ PROMU := $(FIRST_GOPATH)/bin/promu
 SWAG := $(FIRST_GOPATH)/bin/swag
 pkgs = ./...

+# clang format
+FORMAT_FIND_FLAGS ?= -name '*.c' -o -name '*.h' -not -path 'pkg/collector/bpf/include/vmlinux.h' -not -path 'pkg/collector/bpf/libbpf/*'
+
 ifeq (arm, $(GOHOSTARCH))
 GOHOSTARM ?= $(shell GOARM= $(GO) env GOARM)
 GO_BUILD_PLATFORM ?= $(GOHOSTOS)-$(GOHOSTARCH)v$(GOHOSTARM)
@@ -48,7 +51,7 @@ PROMU_URL := https://github.com/prometheus/promu/releases/download/v$(PROMU_
 SKIP_GOLANGCI_LINT :=
 GOLANGCI_LINT :=
 GOLANGCI_LINT_OPTS ?=
-GOLANGCI_LINT_VERSION ?= v1.54.2
+GOLANGCI_LINT_VERSION ?= v1.60.3
 # golangci-lint only supports linux, darwin and windows platforms on i386/amd64.
 # windows isn't included here because of the path separator being different.
 ifeq ($(GOHOSTOS),$(filter $(GOHOSTOS),linux darwin))
@@ -172,6 +175,10 @@ else
 yamllint .
 endif

+.PHONY: common-clang-format
+clang-format: ## Run code formatter on BPF code.
+find pkg/collector/bpf $(FORMAT_FIND_FLAGS) | xargs -n 1000 clang-format -i -style=file
+
 # For backward-compatibility.
 .PHONY: common-staticcheck
 common-staticcheck: lint
@@ -184,7 +191,7 @@ common-unused:

 # Dont bother updating swagger docs for release builds
 .PHONY: common-build
-common-build: promu swag
+common-build: promu swag bpf
 ifeq ($(RELEASE_BUILD), 0)
 ifeq ($(CGO_BUILD), 1)
 @echo ">> updating swagger docs"
@@ -246,6 +253,33 @@ $(PROMU):
 cp $(PROMU_TMP)/promu-$(PROMU_VERSION).$(GO_BUILD_PLATFORM)/promu $(FIRST_GOPATH)/bin/promu
 rm -r $(PROMU_TMP)

+# Build bpf assets
+.PHONY: bpf
+# Build bpf assets only when CGO_BUILD=0
+ifeq ($(CGO_BUILD), 0)
+bpf: clang bpfclean
+@echo ">> building bpf assets using clang"
+$(MAKE) -C ./pkg/collector/bpf
+
+# Clean existing bpf assets. When GOARCH is set we ALWAYS clean the
+# assets as we need to build them for each architecture
+.PHONY: bpfclean
+ifdef GOARCH
+bpfclean:
+@echo ">> cleaning existing bpf assets"
+$(MAKE) -C ./pkg/collector/bpf clean
+endif
+
+# Install clang using script. Do it only when GOARCH is set as we need
+# clang to build go binaries inside golang-builder container.
+.PHONY: clang
+ifdef GOARCH
+clang:
+@echo ">> installing clang"
+@./scripts/install_clang.sh
+endif
+endif
+
 # Dont run swagger for release builds. This is due to cross compiling with GOARCH set
 # to different archs and swag will be built in arch specific bin folder.
 .PHONY: swag
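A note on `FORMAT_FIND_FLAGS`: in `find`, the implicit AND between tests binds more tightly than `-o`, so as written the two `-not -path` exclusions guard only the `'*.h'` branch and not the `'*.c'` branch. Whether this changes the result in the real tree depends on whether any `.c` files live under the excluded paths, which this diff does not show. The sketch below demonstrates the difference on a throwaway tree. The file names `network.c`, `vendored.c` and `vendored.h` are hypothetical.

```shell
#!/usr/bin/env bash
# Build a throwaway tree that mimics the layout named in FORMAT_FIND_FLAGS.
tmp="$(mktemp -d)"
mkdir -p "$tmp/pkg/collector/bpf/include" "$tmp/pkg/collector/bpf/libbpf"
touch "$tmp/pkg/collector/bpf/network.c" \
      "$tmp/pkg/collector/bpf/include/vmlinux.h" \
      "$tmp/pkg/collector/bpf/libbpf/vendored.c" \
      "$tmp/pkg/collector/bpf/libbpf/vendored.h"

# As written in the diff: the exclusions apply to '*.h' matches only.
ungrouped="$(cd "$tmp" && find pkg/collector/bpf -name '*.c' -o -name '*.h' \
    -not -path 'pkg/collector/bpf/include/vmlinux.h' \
    -not -path 'pkg/collector/bpf/libbpf/*' | LC_ALL=C sort)"

# With explicit grouping: the exclusions apply to both name patterns.
grouped="$(cd "$tmp" && find pkg/collector/bpf \( -name '*.c' -o -name '*.h' \) \
    -not -path 'pkg/collector/bpf/include/vmlinux.h' \
    -not -path 'pkg/collector/bpf/libbpf/*' | LC_ALL=C sort)"

echo "ungrouped:"
printf '%s\n' "$ungrouped"
echo "grouped:"
printf '%s\n' "$grouped"

rm -rf "$tmp"
```

In this toy tree the ungrouped form still lists `libbpf/vendored.c`, while the grouped form lists only `network.c`. If the intent is to keep `clang-format` away from vendored sources entirely, wrapping the name tests in `\( ... \)` is one way to express that.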

README.md

Lines changed: 17 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33

44
| | |
55
| ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
6-
| CI/CD | [![ci](https://github.com/mahendrapaipuri/ceems/workflows/CI/badge.svg)](https://github.com/mahendrapaipuri/ceems) [![CircleCI](https://dl.circleci.com/status-badge/img/circleci/8jSYT1wyKY8mKQRTqNLThX/TzM1Mr3AEAqmehnoCde19R/tree/main.svg?style=svg&circle-token=28db7268f3492790127da28e62e76b0991d59c8b)](https://dl.circleci.com/status-badge/redirect/circleci/8jSYT1wyKY8mKQRTqNLThX/TzM1Mr3AEAqmehnoCde19R/tree/main) [![Coverage](https://img.shields.io/badge/Coverage-75.9%25-brightgreen)](https://github.com/mahendrapaipuri/ceems/actions/workflows/ci.yml?query=branch%3Amain) |
6+
| CI/CD | [![ci](https://github.com/mahendrapaipuri/ceems/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/mahendrapaipuri/ceems/actions/workflows/ci.yml?query=branch%3Amain) [![CircleCI](https://dl.circleci.com/status-badge/img/circleci/8jSYT1wyKY8mKQRTqNLThX/TzM1Mr3AEAqmehnoCde19R/tree/main.svg?style=svg&circle-token=28db7268f3492790127da28e62e76b0991d59c8b)](https://dl.circleci.com/status-badge/redirect/circleci/8jSYT1wyKY8mKQRTqNLThX/TzM1Mr3AEAqmehnoCde19R/tree/main) [![Coverage](https://img.shields.io/badge/Coverage-75.9%25-brightgreen)](https://github.com/mahendrapaipuri/ceems/actions/workflows/ci.yml?query=branch%3Amain) |
77
| Docs | [![docs](https://img.shields.io/badge/docs-passing-green?style=flat&link=https://mahendrapaipuri.github.io/ceems/docs/)](https://mahendrapaipuri.github.io/ceems/) |
88
| Package | [![Release](https://img.shields.io/github/v/release/mahendrapaipuri/ceems.svg?include_prereleases)](https://github.com/mahendrapaipuri/ceems/releases/latest) |
99
| Meta | [![GitHub License](https://img.shields.io/github/license/mahendrapaipuri/ceems)](https://github.com/mahendrapaipuri/ceems) [![Go Report Card](https://goreportcard.com/badge/github.com/mahendrapaipuri/ceems)](https://goreportcard.com/report/github.com/mahendrapaipuri/ceems) [![code style](https://img.shields.io/badge/code%20style-gofmt-blue.svg)](https://pkg.go.dev/cmd/gofmt) |
@@ -14,30 +14,35 @@
1414
<img src="https://raw.githubusercontent.com/mahendrapaipuri/ceems/main/website/static/img/logo.png" width="200">
1515
</p>
1616

17-
Compute Energy & Emissions Monitoring Stack (CEEMS) (pronounced as *kiːms*) contains
18-
a Prometheus exporter to export metrics of compute instance units and a REST API
17+
Compute Energy & Emissions Monitoring Stack (CEEMS) (pronounced as *kiːms*) contains
18+
a Prometheus exporter to export metrics of compute instance units and a REST API
1919
server that serves the metadata and aggregated metrics of each
2020
compute unit. Optionally, it includes a TSDB load balancer that supports basic access
2121
control on TSDB so that one user cannot access metrics of another user.
2222

2323
"Compute Unit" in the current context has a wider scope. It can be a batch job in HPC,
24-
a VM in cloud, a pod in k8s, _etc_. The main objective of the repository is to quantify
24+
a VM in cloud, a pod in k8s, *etc*. The main objective of the repository is to quantify
2525
the energy consumed and estimate emissions by each "compute unit". The repository itself
2626
does not provide any frontend apps to show dashboards and it is meant to use along
2727
with Grafana and Prometheus to show statistics to users.
2828

29+
Although CEEMS was born out of a need to monitor energy and carbon footprint of compute
30+
workloads, it supports monitoring performance metrics as well. In addition, it leverages
31+
[eBPF](https://ebpf.io/what-is-ebpf/) framework to monitor IO and network metrics
32+
in a resource manager agnostic way.
33+
2934
## Install CEEMS
3035

31-
> [!WARNING]
32-
> DO NOT USE pre-release versions as the API has changed quite a lot between the
36+
> [!WARNING]
37+
> DO NOT USE pre-release versions as the API has changed quite a lot between the
3338
pre-release and stable versions.
3439

35-
Installation instructions of CEEMS components can be found in
40+
Installation instructions of CEEMS components can be found in
3641
[docs](https://mahendrapaipuri.github.io/ceems/docs/category/installation).
3742

3843
## Visualizing metrics with Grafana
3944

40-
CEEMS is meant to be used with Grafana for visualization and below are some of the
45+
CEEMS is meant to be used with Grafana for visualization and below are some of the
4146
screenshots few possible metrics.
4247

4348
### Time series compute unit CPU metrics
@@ -46,7 +51,7 @@ screenshots few possible metrics.
4651
<img src="https://raw.githubusercontent.com/mahendrapaipuri/ceems/main/website/static/img/dashboards/cpu_ts_stats.png" width="1200">
4752
</p>
4853

49-
### Time series compute unit GPU metrics
54+
### Time series compute unit GPU metrics
5055

5156
<p align="center">
5257
<img src="https://raw.githubusercontent.com/mahendrapaipuri/ceems/main/website/static/img/dashboards/gpu_ts_stats.png" width="1200">
@@ -71,9 +76,9 @@ screenshots few possible metrics.
7176

7277
## Contributing
7378

74-
We welcome contributions to this project, we hope to see this project grow and become
75-
a useful tool for people who are interested in the energy and carbon footprint of their
79+
We welcome contributions to this project, we hope to see this project grow and become
80+
a useful tool for people who are interested in the energy and carbon footprint of their
7681
workloads.
7782

78-
Please feel free to open issues and/or discussions for any potential ideas of
83+
Please feel free to open issues and/or discussions for any potential ideas of
7984
improvement.

go.mod

Lines changed: 1 addition & 1 deletion
@@ -4,6 +4,7 @@ go 1.22.5

 require (
 github.com/alecthomas/kingpin/v2 v2.4.0
+github.com/cilium/ebpf v0.11.0
 github.com/containerd/cgroups/v3 v3.0.4-0.20240117155926-c00d22e55fef
 github.com/go-chi/httprate v0.14.1
 github.com/go-kit/log v0.2.1
@@ -30,7 +31,6 @@ require (
 github.com/alecthomas/units v0.0.0-20231202071711-9a357b53e9c9 // indirect
 github.com/beorn7/perks v1.0.1 // indirect
 github.com/cespare/xxhash/v2 v2.3.0 // indirect
-github.com/cilium/ebpf v0.11.0 // indirect
 github.com/coreos/go-systemd/v22 v22.5.0 // indirect
 github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
 github.com/docker/go-units v0.5.0 // indirect
