# OncaLLM

An LLM-powered agent that helps with Kubernetes debugging based on alerts received from monitoring systems.
This application provides an API that can:
- Receive Kubernetes alert webhooks (e.g., from Prometheus Alertmanager).
- Utilize a LangGraph-based agent (`OncallmAgent`) to analyze these alerts.
- Gather context with various tools, including direct Kubernetes API access via `KubernetesService`.
- Analyze the gathered information to determine potential root causes and generate recommendations.
- Store analysis reports and provide API endpoints to retrieve them.
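
For orientation, a minimal sketch of how a ReAct agent of this kind can be assembled with LangGraph's prebuilt helper is shown below; the `get_pod_status` tool and the alert prompt are illustrative stand-ins, not the actual `OncallmAgent` code:

```python
# Minimal sketch of a LangGraph ReAct agent with one Kubernetes-flavoured tool.
# `get_pod_status` is a hypothetical stand-in for the real KubernetesService tools.
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent


@tool
def get_pod_status(namespace: str, pod_name: str) -> str:
    """Return the status of a pod (stubbed here; the real tool queries the cluster)."""
    return f"pod {pod_name} in {namespace}: CrashLoopBackOff, 12 restarts"


agent = create_react_agent(
    model=ChatOpenAI(model="gpt-4-turbo"),
    tools=[get_pod_status],
)

# Feed an alert summary in as a user message; the agent decides which tools to call.
result = agent.invoke(
    {"messages": [("user", "Alert: KubePodCrashLooping in namespace 'shop'. Investigate.")]}
)
print(result["messages"][-1].content)
```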
 
## Features

- Alert Webhook Integration: Receives alerts from Prometheus Alertmanager.
- LangGraph-Powered Analysis: Uses a ReAct agent built with LangGraph for intelligent analysis.
- Direct Kubernetes API Integration: `KubernetesService` interacts directly with your Kubernetes cluster to fetch information about pods, services, deployments, and logs (see the sketch after this list).
- Automated Analysis: The LLM agent analyzes alert data and cluster information to identify potential root causes.
- Recommendation Generation: Provides actionable recommendations to resolve issues.
- Report Storage & API: Stores analysis reports and offers endpoints to list all reports or fetch specific ones by ID.
 
## Prerequisites

- Python 3.8+
- Access to a Kubernetes cluster (the agent uses the standard kubeconfig resolution, e.g., `~/.kube/config` or an in-cluster service account; the fallback order is sketched after this list).
- OpenAI API key (or another compatible LLM provider configured in `oncallm/llm_service.py`).
- Docker installed and running.
- Kind installed.
- kubectl installed and configured.
- Helm (if the OncaLLM deployment uses Helm - TBD).
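
The kubeconfig resolution mentioned above usually follows the standard in-cluster-first pattern; a minimal sketch with the official client (the agent's actual loading logic may differ):

```python
# Standard kubeconfig resolution: prefer in-cluster credentials when running
# inside Kubernetes, otherwise fall back to ~/.kube/config (or $KUBECONFIG).
from kubernetes import config

try:
    config.load_incluster_config()
except config.ConfigException:
    config.load_kube_config()
```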
 
## Installation

- Clone the repository:

  ```bash
  git clone <repository-url>
  cd <repository-directory-name>  # e.g., oncallm-agent
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Create a `.env` file by copying from `.env.example` (if provided, otherwise create a new one) with your configuration (a loading sketch follows this list):

  ```env
  # FastAPI settings
  APP_HOST=0.0.0.0
  APP_PORT=8001  # Default port for oncallm.main

  # Kubernetes settings (optional, if not using default kubeconfig resolution or in-cluster auth)
  # KUBECONFIG_PATH=/path/to/your/kubeconfig

  # LLM settings
  OPENAI_API_KEY=your_openai_api_key
  LLM_MODEL=gpt-4-turbo  # Or your preferred model
  # LLM_API_BASE=your_llm_api_base_if_not_openai_default  # Optional, for self-hosted or proxy

  # Langfuse Observability (Optional)
  # LANGFUSE_PUBLIC_KEY=pk-lf-...
  # LANGFUSE_SECRET_KEY=sk-lf-...
  # LANGFUSE_HOST=https://cloud.langfuse.com  # or your self-hosted instance
  ```
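
These settings are typically loaded at startup; the sketch below uses python-dotenv, which is an assumption about this project's tooling:

```python
# Load .env into the process environment, then read the settings with defaults
# matching the example file above.
import os

from dotenv import load_dotenv

load_dotenv()

APP_HOST = os.getenv("APP_HOST", "0.0.0.0")
APP_PORT = int(os.getenv("APP_PORT", "8001"))
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # required, so fail loudly if missing
LLM_MODEL = os.getenv("LLM_MODEL", "gpt-4-turbo")
```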
 
## Running the Application

Start the server with:

```bash
python -m oncallm.main
```

The server will start on the configured host and port (default: `0.0.0.0:8001`).
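
For reference, an entry point invoked this way usually reduces to a uvicorn call; the exact contents of `oncallm/main.py` are an assumption here:

```python
# Hypothetical shape of the oncallm.main entry point: serve the FastAPI app
# with uvicorn on the configured host/port.
import os

import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "oncallm.main:app",
        host=os.getenv("APP_HOST", "0.0.0.0"),
        port=int(os.getenv("APP_PORT", "8001")),
    )
```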
## API Endpoints

- `GET /`: Root endpoint with API information.
- `GET /health`: Health check endpoint.
- `POST /webhook`: Endpoint to receive alerts from Alertmanager.
- `GET /reports`: Lists all analysis reports.
- `GET /reports/{report_id}`: Retrieves a specific analysis report by its ID.
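
A quick way to smoke-test these endpoints is to post a minimal Alertmanager-style payload and read back the reports; the payload below follows Alertmanager's webhook schema, while the exact report shape is whatever the service returns:

```python
# Send one Alertmanager-style alert to /webhook, then list stored reports.
import requests

BASE = "http://localhost:8001"

payload = {
    "version": "4",
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "KubePodCrashLooping", "namespace": "shop", "pod": "api-0"},
            "annotations": {"summary": "Pod shop/api-0 is crash looping"},
        }
    ],
}

print(requests.get(f"{BASE}/health").json())
print(requests.post(f"{BASE}/webhook", json=payload).status_code)
print(requests.get(f"{BASE}/reports").json())
```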
 
## Alertmanager Configuration

To configure Prometheus Alertmanager to send alerts to this service, add the following to your `alertmanager.yml`:

```yaml
receivers:
  - name: 'oncallm-webhook'
    webhook_configs:
      - url: 'http://<your-oncallm-service-url>:8001/webhook'  # Replace with the actual URL
        send_resolved: true
```

## Testing

The project uses pytest for unit and API testing.
- To run all unit tests:

  ```bash
  pytest tests/unit
  ```

- To run all API tests:

  ```bash
  pytest tests/api
  ```

- To run all tests (unit and API):

  ```bash
  pytest tests/
  ```
 
Ensure you have installed the necessary dependencies, including `pytest`, from `requirements.txt`.
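
As a starting point for new tests, an API test in this layout might look like the following; the `oncallm.main.app` import path is an assumption:

```python
# Hypothetical API test using FastAPI's TestClient against the health endpoint.
from fastapi.testclient import TestClient

from oncallm.main import app

client = TestClient(app)


def test_health_returns_ok() -> None:
    response = client.get("/health")
    assert response.status_code == 200
```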
## Contributing

We welcome contributions!

- Please read the project roadmap to understand priorities and milestones: ROADMAP.md.
- Coding style: follow the Google Python Style Guide.
- Simplicity first: prefer clear, minimal solutions and small, focused PRs.
 
Please feel free to submit a Pull Request.