A lightweight, pluggable pre-OOM watcher for Kubernetes. Monitors memory pressure in containers and triggers configurable actions before catastrophic OOM kills occur.
- Predictive monitoring - Configurable warn/critical thresholds with hysteresis
- Dual-mode operation - Sidecar (cgroup) or cluster-wide (kubelet) monitoring
- Plugin architecture - Extensible via dynamic shared libraries
- Lua scripting - Custom event handlers without recompilation
- Cloud storage - Stream heap dumps to S3 or local filesystem
- pprof analysis - Analyze Go heap profiles to identify memory hogs
- Minimal footprint - 3.4MB container image, low CPU/memory overhead
Add pressured as a sidecar container:
apiVersion: v1
kind: Pod
metadata:
name: my-app
spec:
containers:
- name: app
image: my-app:latest
resources:
limits:
memory: 512Mi
- name: pressured
image: ghcr.io/ruiyangke/pressured:latest
args: ["-l", "info"]
securityContext:
readOnlyRootFilesystem: true
allowPrivilegeEscalation: false
capabilities:
drop: [ALL]
resources:
limits:
cpu: 50m
memory: 32Mi
requests:
cpu: 10m
memory: 16Mi
volumeMounts:
- name: cgroup
mountPath: /sys/fs/cgroup
readOnly: true
volumes:
- name: cgroup
hostPath:
path: /sys/fs/cgroup
      type: Directory

Deploy with Helm for cluster-wide monitoring:
helm install pressured charts/pressured \
--set mode=cluster \
  --set rbac.create=true

Prerequisites: CMake 3.16+, C11 compiler, libcurl, json-c, lua5.4, OpenSSL or mbedTLS
# Build
make build
# Run
./build/pressured -c config.json -l debug

docker pull ghcr.io/ruiyangke/pressured:latest
# Image size: 3.4MB
docker images ghcr.io/ruiyangke/pressured:latest

# From local chart
helm install pressured charts/pressured

Configuration is JSON-based. Create a config.json:
{
"source": {
"mode": "cgroup",
"poll_interval_ms": 1000,
"cgroup": {
"path": "/sys/fs/cgroup"
}
},
"thresholds": {
"warn_percent": 70,
"critical_percent": 85,
"hysteresis_percent": 5,
"cooldown_seconds": 60
},
"dry_run": false,
"log_level": "info"
}

| Parameter | Default | Description |
|---|---|---|
| `source.mode` | `cgroup` | `cgroup` (sidecar) or `kubelet` (cluster) |
| `poll_interval_ms` | `1000` | Sampling interval in milliseconds |
| `warn_percent` | `70` | Warning threshold (% of memory limit) |
| `critical_percent` | `85` | Critical threshold (% of memory limit) |
| `hysteresis_percent` | `5` | Band to prevent event flapping |
| `cooldown_seconds` | `60` | Minimum seconds between events per container |
| `dry_run` | `false` | Log-only mode, no action dispatch |
pressured [options]
-c, --config <path> Config file path (JSON)
-l, --log-level <level> Log level (trace, debug, info, warn, error)
-d, --dry-run Dry run mode
-h, --help Show help
-v, --version Show version
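For example, to exercise a config in log-only mode with verbose output:

```
pressured -c config.json --dry-run -l debug
```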
Plugins are loaded from the directory named by `PRESSURED_PLUGIN_DIR`. Each plugin registers its services with the service registry.
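As a rough sketch of the shape this implies (the entry-point name, signature, and registry type below are illustrative assumptions, not the project's actual ABI; see the public headers under `include/`):

```c
/* my_plugin.c -- illustrative skeleton only; the entry-point name and
 * registry API here are assumptions, not the actual plugin ABI. */
#include <stdio.h>

struct service_registry;  /* opaque handle owned by the core */

/* Hypothetical init hook that the core would resolve with dlsym() after
 * dlopen()-ing each shared library found in PRESSURED_PLUGIN_DIR. */
int pressured_plugin_init(struct service_registry *registry) {
    (void)registry;  /* a real plugin would register its services here */
    fprintf(stderr, "my_plugin: loaded\n");
    return 0;        /* non-zero would signal the core to skip this plugin */
}
```

A plugin of this shape would be compiled as a position-independent shared object, e.g. `gcc -shared -fPIC my_plugin.c -o my_plugin.so`, and placed in the plugin directory.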
Execute Lua scripts on memory pressure events:
function on_event(event, ctx)
if event.severity == "critical" then
log.warn("CRITICAL: " .. event.namespace .. "/" .. event.pod_name)
-- Trigger heap dump, send alert, etc.
end
end

Lua API:
- `log.trace/debug/info/warn/error(msg)` - Logging
- `http.fetch(url, opts)` - HTTP requests
- `storage.write/read/remove/exists(key)` - Storage operations
- `event` - Current memory event data
- `config` - Full configuration
Stream data to cloud storage with minimal memory footprint:
| Backend | Plugin | Priority | Auth |
|---|---|---|---|
| AWS S3 | `s3-storage.so` | 100 | IAM, IRSA, static credentials |
| Local | `local-storage.so` | 50 | Filesystem |
S3 configuration example (Helm values syntax):
storage:
enabled: true
backend: s3
s3:
bucket: "my-bucket"
region: "us-west-2"
    prefix: "oom-dumps/"

The pprof plugin provides a C service for parsing Go heap profiles. It reads gzip-compressed pprof data from storage and identifies the top memory-consuming functions. This is used internally for heap profile analysis after profiles are captured via HTTP.
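The service's exact interface is internal and not documented here; purely as a sketch of the kind of call it implies, with every name below hypothetical:

```c
/* Hypothetical sketch, not the plugin's real header. */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    char     function[256];  /* Go function the samples are attributed to */
    uint64_t inuse_bytes;    /* live heap charged to that function */
} heap_top_entry;

/* Read a gzip-compressed pprof heap profile from the storage backend
 * and fill `out` with up to `n` of the largest in-use allocators. */
int pprof_top_consumers(const char *storage_key, heap_top_entry *out, size_t n);
```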
Pressured supports per-pod threshold overrides via annotations. These are evaluated by the core event generator:
| Annotation | Description | Example |
|---|---|---|
| `pressured.io/warn-percent` | Override warn threshold (% of memory limit) | `"70"` |
| `pressured.io/critical-percent` | Override critical threshold (% of memory limit) | `"85"` |
| `pressured.io/cooldown-seconds` | Override cooldown between events | `"120"` |
Example pod with custom thresholds:
apiVersion: v1
kind: Pod
metadata:
name: memory-hungry-app
annotations:
# This pod needs higher thresholds due to its memory pattern
pressured.io/warn-percent: "80"
pressured.io/critical-percent: "95"
pressured.io/cooldown-seconds: "30"
spec:
containers:
- name: app
image: my-app:latest
resources:
limits:
        memory: 2Gi

The bundled pprof example script also uses these annotations:
| Annotation | Description | Example |
|---|---|---|
| `pressured.io/pprof-enabled` | Enable/disable pprof collection for this pod | `"true"`, `"false"` |
| `pressured.io/pprof-port` | Override pprof port (default: 6060) | `"8080"` |
| `pressured.io/pprof-path` | Override pprof heap path | `"/debug/pprof/heap"` |
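For example, a pod that serves pprof on a non-default port would set (metadata snippet only):

```yaml
metadata:
  annotations:
    pressured.io/pprof-enabled: "true"
    pressured.io/pprof-port: "8080"
```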
Custom annotations can be read in Lua scripts:
function on_event(event, ctx)
-- Read custom annotation (annotations are a table keyed by name)
local custom_action = event.annotations and event.annotations["myapp.io/oom-action"]
if custom_action == "restart" then
-- Custom logic
end
end

For S3 storage with IAM Roles for Service Accounts (IRSA):
# values.yaml
mode: cluster
replicaCount: 1
serviceAccount:
create: true
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/pressured-s3-role
rbac:
create: true
storage:
enabled: true
backend: s3
s3:
bucket: "heap-dumps"
region: "us-west-2"
    prefix: "oom-dumps/"

A fuller values.yaml reference:

mode: cluster
replicaCount: 1
image:
repository: ghcr.io/ruiyangke/pressured
# tag defaults to chart appVersion if not set
pullPolicy: IfNotPresent
serviceAccount:
create: true
annotations:
# For AWS IRSA (S3 storage)
eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/pressured-role
rbac:
create: true
podSecurityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: [ALL]
resources:
limits:
cpu: 200m
memory: 256Mi
requests:
cpu: 50m
memory: 64Mi
config:
source:
kubelet:
poll_interval_ms: 5000
namespace_filter: "default,production"
label_selector: "app.kubernetes.io/managed-by=pressured"
thresholds:
warn_percent: 70
critical_percent: 85
hysteresis_percent: 5
cooldown_seconds: 60
log_level: info
lua:
enabled: true
scripts:
alert.lua: |
function on_event(event, ctx)
if event.severity == "critical" then
log.error(string.format("CRITICAL: %s/%s at %.0f%%",
event.namespace, event.pod_name, event.usage_percent))
-- Trigger heap dump for Go applications (requires pod_ip and pprof port)
if event.pod_ip then
local pprof_url = "http://" .. event.pod_ip .. ":6060/debug/pprof/heap"
local resp = http.fetch(pprof_url, {
method = "GET",
timeout_ms = 5000
})
if resp.status == 200 then
storage.write(event.pod_name .. "/heap-" .. os.time() .. ".pb.gz", resp.body)
end
end
end
return "ok"
end
storage:
enabled: true
backend: s3
s3:
bucket: "heap-dumps"
region: "us-west-2"
prefix: "oom-dumps/"
podDisruptionBudget:
enabled: true
minAvailable: 1
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
        app.kubernetes.io/name: pressured

In cluster mode, the chart's RBAC grants read access to nodes, pods, and events, plus the kubelet API via `nodes/proxy`:

rules:
- apiGroups: [""]
resources: ["nodes", "pods", "events"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["nodes/proxy"]
  verbs: ["get"]

Building from source requires:

- CMake 3.16+
- C11 compiler
- libcurl, json-c, lua5.4, zlib
- OpenSSL or mbedTLS (for S3 signing)
make build # Debug build
make build-release # Release build
make test # Run tests
make test-valgrind # Memory leak testing
make lint # Static analysis
make format # Format code
make docker          # Build Docker image

pressured/
├── src/ # Core source
│ ├── main.c # Entry point, event loop
│ ├── config.c # Configuration parsing
│ ├── cgroup.c # Cgroup v1/v2 monitoring
│ ├── kubelet.c # Kubernetes API integration
│ ├── event_generator.c # Threshold logic
│ └── service_registry.c # Plugin service management
├── include/ # Public headers
├── plugins/
│ ├── lua/ # Lua scripting plugin
│ ├── storage/ # Storage backends (local, S3)
│ └── pprof/ # Heap profile analyzer
├── charts/pressured/ # Helm chart
└── scripts/ # Build and test scripts
The production image uses a multi-stage build with:
- mbedTLS instead of OpenSSL for smaller crypto footprint
- Minimal curl built from source with only HTTP/HTTPS
- Scratch base for minimal attack surface
- Final size: 3.4MB
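Condensed, the recipe looks roughly like the sketch below; the base image, package names, and paths are assumptions rather than the repository's actual Dockerfile:

```dockerfile
# Illustrative multi-stage sketch only.
FROM alpine:3.20 AS build
RUN apk add --no-cache build-base cmake json-c-dev lua5.4-dev zlib-dev
# Build mbedTLS and a minimal HTTP/HTTPS-only curl from source here,
# then compile pressured against them with CMake.

FROM scratch
COPY --from=build /build/pressured /pressured
ENTRYPOINT ["/pressured"]
```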
# Build locally
make docker
# Verify size
docker images ghcr.io/ruiyangke/pressured:latest

MIT