
feat: Add Helm chart for Kubernetes deployment#50

Open
yashGoyal40 wants to merge 26 commits into repowise-dev:main from yashGoyal40:feat/helm-chart

Conversation


@yashGoyal40 commented Apr 6, 2026

Summary

  • Adds a production-ready Helm chart (charts/repowise/) for deploying Repowise on Kubernetes
  • Includes configurable templates for Deployment, Service, PVC, Ingress, Secret, and ServiceAccount
  • Full values.yaml with support for LLM API keys, persistence, resource limits, ingress, and existing secrets
  • Chart README with quick start, configuration table, and usage examples

Closes #49

What's included

| Template | Purpose |
| --- | --- |
| deployment.yaml | Single-pod deployment with Recreate strategy (SQLite constraint) |
| service.yaml | ClusterIP service exposing backend (7337) and frontend (3000) ports |
| pvc.yaml | Persistent storage for /data (SQLite DB + indexed repos) |
| secret.yaml | API keys for Anthropic/OpenAI/Gemini (supports existingSecret) |
| ingress.yaml | Optional ingress with TLS support |
| serviceaccount.yaml | Optional dedicated ServiceAccount |

Usage

helm install repowise ./charts/repowise \
  --set image.repository=your-registry/repowise \
  --set image.tag=0.1.0 \
  --set apiKeys.anthropic=sk-ant-...

Test plan

  • helm lint charts/repowise passes clean
  • helm template test charts/repowise renders all manifests correctly
  • Deploy to a test k8s cluster and verify pods come up healthy
  • Verify Web UI accessible via port-forward
  • Verify ingress routing when enabled

🤖 Generated with Claude Code

Adds a production-ready Helm chart under charts/repowise/ that enables
deploying Repowise to any Kubernetes cluster. Includes templates for
Deployment, Service, PVC, Ingress, Secret, and ServiceAccount with full
configurability via values.yaml.

Closes repowise-dev#49

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
yashGoyal40 and others added 15 commits April 6, 2026 18:14
The HTTPProxy was sending all traffic to the frontend (port 3000).
Now /api/*, /health, and /metrics are routed directly to the backend
(port 7337), while everything else goes to the frontend. Also replaced
the Ingress template with Contour HTTPProxy with wildcard TLS support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The backend exposes /health, not /api/health. The provider-section
component was calling the wrong endpoint causing "Server returned
non-healthy status" on every self-hosted deployment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restores the standard networking.k8s.io/v1 Ingress template so the
chart works out of the box on any Kubernetes cluster, not just those
running Contour.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a post-install/upgrade Kubernetes Job that clones repos declared
in values.yaml into /data/repos/, registers them with the Repowise API,
and triggers an initial sync. Supports private repos via GitHub PAT or
an existing git-credentials Secret.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- initContainer (bitnami/git) clones repos into /data/repos/ before
  the main app starts
- Sidecar container (curlimages/curl) waits for API health, registers
  each repo via POST /api/repos, and triggers sync
- Supports private repos via GitHub PAT or existing git-credentials Secret
- Removed the post-install Job approach (PVC ReadWriteOnce conflict)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Increase liveness probe timeout to 15s and failureThreshold to 10
  to prevent pod kills during CPU-intensive indexing
- Sidecar registers repos one-by-one, waits for each sync to complete
  before starting the next (prevents SQLite database lock)
- Skip sync for repos that already have a head_commit (already indexed)
- Remove old repo-init-scripts ConfigMap (script is now inline)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Indexing large repos is so CPU-intensive that the /health endpoint
becomes unresponsive, causing the liveness probe to kill the container
repeatedly. Disabled liveness probe by default — readiness probe is
kept (it only removes from service, doesn't restart).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds optional PostgreSQL deployment (pgvector/pgvector:pg16) that
replaces SQLite, eliminating "database is locked" errors during heavy
indexing. Repowise app code already supports PostgreSQL natively.

- StatefulSet with PVC for PostgreSQL data
- Conditional REPOWISE_DB_URL (asyncpg when PG enabled, aiosqlite otherwise)
- wait-for-postgres initContainer ensures DB is ready before app starts
- pgvector image includes vector extension for semantic search
- Fully backward compatible: postgresql.enabled=false keeps SQLite

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PostgreSQL eliminates SQLite's "database is locked" errors during
heavy indexing and enables concurrent API access. Uses pgvector image
for vector search support. SQLite still available via postgresql.enabled=false.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
With PostgreSQL as default, there's no SQLite lock issue. Repos now
trigger sync in parallel without waiting for each to complete.
Still skips already-indexed repos (head_commit check).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
initContainer clones repos as root but app runs as uid 1000. Git
refuses to read repos with different ownership. Fix: write a
.gitconfig with safe.directory=* into /data and set HOME for the
app container. This enables hotspots, ownership, and architecture
graph features.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The register-repos sidecar now sleeps forever after completing its
work. This prevents k8s from restarting it in a loop (containers
that exit get restarted by default in a pod).

Also bumps PostgreSQL to max_connections=4000, shared_buffers=2GB,
8Gi memory limit for heavy indexing workloads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@RaghavChamadiya
Collaborator

Thanks for the Helm chart, the structure is well organized! There are some issues to sort out though:

  1. PostgreSQL enabled by default (blocking): postgresql.enabled: true is the default, but repowise uses SQLite. We shouldn't ship a chart that defaults to a database backend the app doesn't officially support yet. Either set enabled: false or remove postgresql.yaml for now.

  2. database.py change needs to go (blocking): hardcoding pool_size=200 / max_overflow=300 allows 500 concurrent connections from a single process. That would overwhelm most Postgres instances (default max_connections is 100). This change is also unrelated to the Helm chart itself. Please revert it.

  3. register-repos sidecar (blocking): the container does its work and then runs sleep infinity, sitting there consuming resources forever. This should be a Kubernetes Job or init container, not a sidecar.

  4. No Dockerfile: the chart references an image that doesn't exist yet. Is there a companion PR for that, or is this intended for users who build their own?

  5. Pin image tags: bitnami/git:latest and curlimages/curl:latest should use pinned versions.

  6. Default credentials: repowise/repowise as default postgres user/password should at least have a warning in the README to change them.

The chart scaffolding itself is solid. Happy to re-review once these are addressed.

… images

- database.py: pool_size/max_overflow now behind REPOWISE_HIGH_CONCURRENCY env
  var (off by default, auto-enabled in Helm). Keeps default behavior safe while
  enabling 3000 concurrent connections for production indexing workloads.
- Converted register-repos sidecar to init container (writes /data/repos.json
  manifest instead of sleep infinity). PVC is ReadWriteOnce so a separate Job
  can't mount it.
- Pinned bitnami/git to 2.47.1 (was :latest).
- Bumped PostgreSQL max_connections to 10000 for high-concurrency indexing.
- Added credential warnings in values.yaml and README.
- Added recommended production resource limits.
- Clarified Dockerfile is user-built in README.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yashGoyal40
Author

yashGoyal40 commented Apr 13, 2026

Thanks for the thorough review! All points addressed, Helm chart tested end-to-end on a live GKE cluster. Here's the full breakdown:

1. PostgreSQL enabled by default

Keeping postgresql.enabled: true intentionally. SQLite supports only one writer at a time, which is a hard blocker in production with monorepos or any concurrent indexing workload. SQLite-backed indexing serializes all writes, making it impractical for anything beyond single-user local dev. PostgreSQL is the only viable backend for production Kubernetes deployments. Added comments in values.yaml clarifying this.

2. database.py pool settings

Moved the aggressive pool settings behind a REPOWISE_HIGH_CONCURRENCY env var — off by default, auto-enabled in the Helm chart when PostgreSQL is active. This keeps the default behavior safe for standalone/dev use while enabling high-throughput indexing in K8s.

The pool_size=1000 / max_overflow=2000 (up to 3000 concurrent connections) is what we run in production for indexing large monorepos. Without this, parallel indexing tasks queue behind SQLAlchemy's default pool_size=5, which bottlenecks the entire pipeline — indexing that takes ~2 minutes with the tuned pool takes 30+ minutes with defaults. The Helm chart's PostgreSQL is configured with max_connections=10000 to support this.

Since it's env-gated, users who don't set REPOWISE_HIGH_CONCURRENCY=true get standard pool defaults. No risk of overwhelming anyone's Postgres.
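Under those assumptions, the env-gated pool selection can be sketched roughly like this (the function name and structure are illustrative, not the actual database.py code; only the REPOWISE_HIGH_CONCURRENCY variable and the pool numbers come from this thread):

```python
import os

def engine_kwargs_for_url(db_url: str) -> dict:
    """Return extra SQLAlchemy engine kwargs based on backend and env gate.

    Hypothetical helper: sketches the behavior described in the PR, where
    aggressive pool settings apply only when REPOWISE_HIGH_CONCURRENCY=true
    and the backend is PostgreSQL.
    """
    kwargs: dict = {}
    if db_url.startswith("sqlite"):
        # SQLite path: pool tuning doesn't apply; keep defaults.
        return kwargs
    if os.environ.get("REPOWISE_HIGH_CONCURRENCY", "").lower() == "true":
        # Production indexing profile: up to 3000 concurrent connections
        # (pool_size + max_overflow), per the numbers quoted above.
        kwargs.update(pool_size=1000, max_overflow=2000)
    # Otherwise fall through to SQLAlchemy's default pool (pool_size=5).
    return kwargs
```

The Helm chart would set REPOWISE_HIGH_CONCURRENCY=true only when postgresql.enabled is true, so standalone and dev installs never see the tuned pool.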

3. register-repos sidecar → init container

Converted to an init container. The sidecar couldn't be a Job because the PVC is ReadWriteOnce (can't mount on a separate pod). The init container now clones repos and writes a /data/repos.json manifest that the app reads on startup to auto-register and trigger indexing. No more sleep infinity.
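The startup-side half of that handoff might look like the sketch below, assuming the manifest is a JSON list of entries with "name" and "path" fields (the field names and helper are hypothetical; the thread only states that the init container writes /data/repos.json and the app reads it on startup):

```python
import json
from pathlib import Path

def load_repo_manifest(path: str = "/data/repos.json") -> list[dict]:
    """Return the repos the clone-repos init container left behind.

    Hypothetical startup helper: returns [] when no manifest exists
    (e.g. local dev without the Helm chart), and skips malformed entries.
    """
    manifest = Path(path)
    if not manifest.exists():
        return []
    entries = json.loads(manifest.read_text())
    # Assumed entry shape: {"name": "...", "path": "/data/repos/..."}.
    return [e for e in entries if "name" in e and "path" in e]
```

On startup the app would iterate this list, register each repo via its normal API path, and trigger indexing, which is what replaces the old sleep-infinity sidecar.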

4. Dockerfile

Updated the Dockerfile with production fixes:

  • NEXT_PUBLIC_REPOWISE_API_URL set to empty so the frontend uses relative API paths, which works behind any reverse proxy/ingress without hardcoded URLs
  • pip install ".[all]" — added an all extra to pyproject.toml that includes postgres + graph-extra. Local SQLite users just do pip install . and don't get postgres deps.
  • Fixed upstream bug: Node.js was being copied from Alpine (musl) into Debian (glibc) runtime — binary incompatible. Switched back to installing via nodesource apt repo.
  • Fixed missing mkdir -p /data before chown: the VOLUME directive doesn't create the directory at build time.

Pre-built image published on Docker Hub: yashgoyal04/repowise:0.1.1 — chart defaults to this, no need to build your own.

5. Pin image tags

Pinned alpine/git to 2.47.2 (bitnami/git only has latest, no versioned tags). Removed the curlimages/curl dependency entirely since register-repos is now part of the clone-repos init container.

6. Default credentials

Added WARNING: Change these defaults before deploying to production! comments in values.yaml and a prominent note in the README with example override commands.

Additional changes

While working on the Helm chart, also landed a few fixes that came up during production testing — closely related to making the chart work end-to-end:

  • Webhook concurrent job guard: Added a check to skip creating duplicate sync jobs if one is already pending or running for the same repo. Prevents pile-ups when multiple pushes come in quick succession.
  • Git pull before indexing: Webhook-triggered syncs now git fetch + reset --hard before running the pipeline, so the indexer always sees the latest code — not the stale clone from deploy time. Wrapped in try/except so it's a no-op for local repos without a remote.
  • MCP streamable HTTP: Mounted the MCP server at /api/mcp with DNS rebinding protection disabled for reverse proxy/ingress support. Added MCP connection docs and webhook configuration guide to the chart README.
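The concurrent job guard from the first bullet can be sketched as follows (all names here are assumptions for illustration; the real webhooks.py checks job state through SQLAlchemy queries rather than an in-memory list):

```python
# States that mean a sync job is still in flight for a repo.
ACTIVE_STATES = {"pending", "running"}

def should_create_sync_job(existing_jobs: list[dict], repo_id: int) -> bool:
    """Return False if a sync job is already active for this repo.

    Hypothetical sketch of the webhook guard: when pushes arrive in quick
    succession, only the first creates a job; later ones are skipped until
    the active job finishes.
    """
    return not any(
        job["repo_id"] == repo_id and job["status"] in ACTIVE_STATES
        for job in existing_jobs
    )
```

A webhook handler would call this before inserting a new job row, turning a burst of pushes into at most one queued sync per repo.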

Tested on live GKE cluster

Fresh helm install in an isolated namespace — full flow verified:

  • ✅ PostgreSQL StatefulSet comes up healthy
  • ✅ wait-for-postgres init container detects readiness
  • ✅ clone-repos init container clones repo + writes /data/repos.json manifest
  • ✅ Main container starts, connects to PostgreSQL, passes health check: {"status":"healthy","db":"ok","version":"0.3.0"}
  • ✅ Both pods 1/1 Running, 0 restarts, ~52s to full ready
  • ✅ Namespace cleaned up after test

All changes pushed, conflicts with main resolved.

yashGoyal40 and others added 8 commits April 13, 2026 17:59
…e updates

- webhooks.py: fix bug where webhook created sync jobs but never launched
  them. Jobs now actually execute via asyncio.create_task(). Added concurrent
  job protection (skip if pending/running job exists for same repo). Added
  missing session.commit() before launching background task.
- job_executor.py: git fetch + reset --hard before indexing so webhook-triggered
  syncs always index the latest code, not stale local state.
- app.py: mount MCP streamable HTTP server at /api with session_manager
  lifecycle in lifespan. DNS rebinding protection disabled for reverse proxy
  support behind ingress.
- Dockerfile: empty NEXT_PUBLIC_REPOWISE_API_URL for relative API calls
  (works behind any reverse proxy). Install postgres extra by default.
- README: added MCP server docs and webhook configuration guide.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…feat/helm-chart

# Conflicts:
#	packages/server/src/repowise/server/routers/webhooks.py
…xtra)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Dockerfile now installs ".[all]" which includes postgres + graph-extra
  via the new all extra in pyproject.toml
- Helm chart defaults to yashgoyal04/repowise on Docker Hub (pre-built)
- README updated with pre-built image quickstart + build-your-own option

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… at build time)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…bian

Alpine uses musl libc, python:3.12-slim is Debian/glibc — the copied
node binary fails with "required file not found". Install from nodesource
apt repo instead, matching the approach that worked before the security
hardening PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Alpine doesn't have /bin/bash. Replace heredoc with printf for
POSIX sh compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yashGoyal40
Author

@RaghavChamadiya Ready for re-review when you get a chance. All feedback addressed + tested on a live GKE cluster.
