bawbel · chaksaray · Apr 20, 2026 · Apr 15, 2026 · Apr 17, 2026 · Apr 17, 2026
diff --git a/.gitleaks.toml b/.gitleaks.toml
@@ -0,0 +1,9 @@
+title = "bawbel-scanner gitleaks config"
+
+[allowlist]
+  description = "False positives in scanner source"
+  regexes = [
+    # _KEY_TO_DEFAULT_MODEL maps env var names (e.g. ANTHROPIC_API_KEY) to
+    # LiteLLM model strings — these are not secrets.
+    '"[A-Z_]+_API_KEY":\\s*"[a-z]',
+  ]
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,6 +8,14 @@ Versioning follows [Semantic Versioning](https://semver.org/).
 
 ## [Unreleased]
 
+### Changed
+- LLM Stage 2 engine rewritten to use LiteLLM — supports any provider (Anthropic, OpenAI, Gemini, Mistral, Groq, Ollama, and 100+ more)
+- `BAWBEL_LLM_MODEL` env var controls which model to use (any LiteLLM model string)
+- Provider auto-detection from known API keys — set `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GEMINI_API_KEY`, etc.
+- 8/15 pattern rules now linked to AVE records (AVE-2026-00004 through 00008 wired)
+- `bawbel version` now shows the active LLM model name when Stage 2 is enabled
+- `pyproject.toml` `[llm]` extra now installs `litellm` instead of provider-specific packages
+
 ---
 
 ## [0.1.0] — 2026-04-19
@@ -27,7 +35,7 @@ First public release.
 - Stage 1a: pattern matching engine — stdlib only, zero dependencies, always runs
 - Stage 1b: YARA engine — optional, requires `yara-python`, 3 rules
 - Stage 1c: Semgrep engine — optional, requires `semgrep`, 5 rules
-- Stage 2: LLM semantic analysis — enabled by setting `ANTHROPIC_API_KEY` or `OPENAI_API_KEY`
+- Stage 2: LLM semantic analysis via LiteLLM — works with any provider (Anthropic, OpenAI, Gemini, Mistral, Ollama, and more). Enable with `pip install "bawbel-scanner[llm]"` and set `BAWBEL_LLM_MODEL` or a provider API key
 
 **Output formats**
 - `text` — human-readable terminal output with severity icons

diff --git a/CLAUDE.md b/CLAUDE.md
diff --git a/README.md b/README.md
@@ -24,6 +24,7 @@ With optional engines:
 ```bash
 pip install "bawbel-scanner[yara]"      # YARA rules
 pip install "bawbel-scanner[semgrep]"   # Semgrep rules
+pip install "bawbel-scanner[llm]"       # LLM Stage 2 (any provider via LiteLLM)
 pip install "bawbel-scanner[all]"       # everything
 ```
 
@@ -139,7 +140,7 @@ pre-commit install
 | 1a | Pattern matching | Nothing (stdlib) | 15 rules, always runs |
 | 1b | YARA | `yara-python` | Binary + text pattern matching |
 | 1c | Semgrep | `semgrep` | Structural pattern matching |
-| 2 | LLM semantic | API key | Nuanced prompt injection |
+| 2 | LLM semantic | `pip install "bawbel-scanner[llm]"` + API key | Nuanced prompt injection, obfuscated payloads |
 | 3 | Behavioral | Docker + eBPF | Runtime behaviour (v1.0) |
 
 **15 built-in pattern rules** cover: goal override, jailbreak, hidden instructions,

diff --git a/docker-compose.yml b/docker-compose.yml
@@ -35,7 +35,13 @@
 #   docker compose run --rm audit
 #
 # Environment variables (optional — set in .env file):
-#   ANTHROPIC_API_KEY   — enables Stage 2 LLM semantic analysis
+#   BAWBEL_LLM_MODEL    — LiteLLM model string (e.g. claude-haiku-4-5, gpt-4o-mini, ollama/mistral)
+#   ANTHROPIC_API_KEY   — enables Stage 2 via Claude (auto-selects claude-haiku-4-5)
+#   OPENAI_API_KEY      — enables Stage 2 via OpenAI (auto-selects gpt-4o-mini)
+#   GEMINI_API_KEY      — enables Stage 2 via Gemini
+#   MISTRAL_API_KEY     — enables Stage 2 via Mistral
+#   GROQ_API_KEY        — enables Stage 2 via Groq
+#   BAWBEL_LLM_ENABLED  — set false to disable Stage 2 even if key is present
 #   BAWBEL_LOG_LEVEL    — DEBUG | INFO | WARNING (default)
 #   SCAN_DIR            — override the scan directory (default: ./scan)
 
@@ -44,11 +50,16 @@ x-base: &base
     context: .
     dockerfile: Dockerfile
   environment:
-    BAWBEL_LOG_LEVEL:    ${BAWBEL_LOG_LEVEL:-WARNING}
-    ANTHROPIC_API_KEY:   ${ANTHROPIC_API_KEY:-}
-    OPENAI_API_KEY:      ${OPENAI_API_KEY:-}
+    BAWBEL_LOG_LEVEL:        ${BAWBEL_LOG_LEVEL:-WARNING}
+    BAWBEL_LLM_MODEL:        ${BAWBEL_LLM_MODEL:-}
+    BAWBEL_LLM_ENABLED:      ${BAWBEL_LLM_ENABLED:-true}
+    ANTHROPIC_API_KEY:       ${ANTHROPIC_API_KEY:-}
+    OPENAI_API_KEY:          ${OPENAI_API_KEY:-}
+    GEMINI_API_KEY:          ${GEMINI_API_KEY:-}
+    MISTRAL_API_KEY:         ${MISTRAL_API_KEY:-}
+    GROQ_API_KEY:            ${GROQ_API_KEY:-}
     PYTHONDONTWRITEBYTECODE: "1"
-    PYTHONUNBUFFERED:    "1"
+    PYTHONUNBUFFERED:        "1"
 
 x-scan-base: &scan-base
   <<: *base

diff --git a/docs/api/engines.md b/docs/api/engines.md
@@ -71,11 +71,32 @@ findings = run_semgrep_scan(resolved_file_path_string)
 
 ---
 
+### Stage 2 — LLM Engine (`engines/llm_engine.py`)
+
+- **Dependency:** `litellm` — `pip install "bawbel-scanner[llm]"`
+- **Always runs:** No — skips silently if litellm not installed or no model configured
+- **Providers:** Any LiteLLM-supported provider (Anthropic, OpenAI, Gemini, Mistral, Ollama, 100+ more)
+- **Activation:** Set `BAWBEL_LLM_MODEL` or a known provider API key
+
+```python
+from scanner.engines.llm_engine import run_llm_scan
+findings = run_llm_scan(file_content_string)
+```
+
+```bash
+# Provider examples
+export ANTHROPIC_API_KEY=sk-ant-...           # uses claude-haiku-4-5
+export OPENAI_API_KEY=sk-...                  # uses gpt-4o-mini
+export BAWBEL_LLM_MODEL=ollama/mistral        # local, no key needed
+export BAWBEL_LLM_MODEL=gemini/gemini-1.5-flash && export GEMINI_API_KEY=...
+```
+
+---
+
 ## Planned Engines
 
 | Engine | Stage | File | Status |
 |---|---|---|---|
-| LLM semantic analysis | 2 | `engines/llm_engine.py` | Planned v0.2.0 |
 | Behavioral sandbox | 3 | `engines/sandbox_engine.py` | Planned v1.0.0 |
 
 ---
@@ -97,7 +118,7 @@ Summary:
 `scanner/engines/__init__.py` exports all active engines:
 
 ```python
-from scanner.engines import run_pattern_scan, run_yara_scan, run_semgrep_scan
+from scanner.engines import run_pattern_scan, run_yara_scan, run_semgrep_scan, run_llm_scan
 ```
 
 To disable an engine temporarily: comment out its import in `__init__.py`.
diff --git a/docs/guides/configuration.md b/docs/guides/configuration.md
@@ -41,19 +41,46 @@ BAWBEL_SCAN_TIMEOUT_SEC=10 bawbel scan ./skills/
 
 ### Stage 2: LLM Semantic Analysis (optional)
 
+Stage 2 uses [LiteLLM](https://docs.litellm.ai) — works with any LLM provider.
+Install first: `pip install "bawbel-scanner[llm]"`
+
 | Variable | Default | Description |
 |---|---|---|
-| `ANTHROPIC_API_KEY` | — | Enables LLM analysis via Claude |
-| `OPENAI_API_KEY` | — | Alternative LLM provider |
-| `BAWBEL_LLM_MODEL` | `claude-sonnet-4-20250514` | LLM model to use |
-| `BAWBEL_LLM_MAX_TOKENS` | `1000` | Max tokens per LLM call |
-| `BAWBEL_LLM_TIMEOUT_SEC` | `60` | LLM call timeout |
+| `BAWBEL_LLM_MODEL` | auto-detected | LiteLLM model string — any provider |
+| `BAWBEL_LLM_MAX_CHARS` | `8000` | Max content chars sent to LLM |
+| `BAWBEL_LLM_TIMEOUT` | `30` | LLM call timeout in seconds |
+| `BAWBEL_LLM_ENABLED` | `true` | Set `false` to disable Stage 2 |
+
+Provider API keys — set whichever you use:
 
-Stage 2 is disabled by default. Set an API key to enable it:
+| Key | Default model |
+|---|---|
+| `ANTHROPIC_API_KEY` | `claude-haiku-4-5` |
+| `OPENAI_API_KEY` | `gpt-4o-mini` |
+| `GEMINI_API_KEY` | `gemini/gemini-1.5-flash` |
+| `MISTRAL_API_KEY` | `mistral/mistral-small` |
+| `GROQ_API_KEY` | `groq/llama3-8b-8192` |
+
+Stage 2 activates as soon as `litellm` is installed and a key (or model) is set:
 
 ```bash
+# Anthropic
+pip install "bawbel-scanner[llm]"
 export ANTHROPIC_API_KEY=sk-ant-...
-bawbel scan ./skill.md   # now runs semantic analysis
+bawbel scan ./skill.md
+
+# OpenAI
+export OPENAI_API_KEY=sk-...
+bawbel scan ./skill.md
+
+# Local Ollama (no API key needed)
+export BAWBEL_LLM_MODEL=ollama/mistral
+bawbel scan ./skill.md
+
+# Explicit model override (any LiteLLM model string)
+export BAWBEL_LLM_MODEL=gemini/gemini-1.5-flash
+export GEMINI_API_KEY=...
+bawbel scan ./skill.md
 ```
 
 ### Stage 3: Behavioral Sandbox (future)
@@ -86,9 +113,8 @@ output:
   file: bawbel-results.sarif
 
 llm:
-  enabled: false          # set true to enable Stage 2
-  provider: anthropic
-  model: claude-sonnet-4-20250514
+  enabled: false          # set true to enable Stage 2 (requires bawbel-scanner[llm])
+  model: claude-haiku-4-5 # any LiteLLM model string
 ```
 
 ---
@@ -110,10 +136,15 @@ if r.returncode == 0:
 else:
     print('✗ semgrep — install: pip install semgrep')
 
-import os
-if os.environ.get('ANTHROPIC_API_KEY') or os.environ.get('OPENAI_API_KEY'):
-    print('✓ LLM key set — Stage 2 enabled')
-else:
-    print('✗ No LLM key — Stage 2 disabled')
+try:
+    import litellm
+    from scanner.engines.llm_engine import _resolve_model
+    model = _resolve_model()
+    if model:
+        print(f'✓ LLM Stage 2 enabled — model={model}')
+    else:
+        print('✗ LLM installed but no model set — set BAWBEL_LLM_MODEL or a provider API key')
+except ImportError:
+    print('✗ litellm not installed — pip install "bawbel-scanner[llm]"')
 "
 ```
diff --git a/docs/guides/docker.md b/docs/guides/docker.md
@@ -162,26 +162,35 @@ docker compose run --rm test
 Pass environment variables to enable optional features:
 
 ```bash
-# Enable Stage 2 LLM semantic analysis
+# Stage 2 LLM — Anthropic
 docker run --rm \
   -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
   -v /path/to/skills:/scan:ro \
-  bawbel/scanner:0.1.0 \
-  scan /scan --recursive
+  bawbel/scanner:0.1.0 scan /scan --recursive
 
-# Set log level
+# Stage 2 LLM — OpenAI
 docker run --rm \
-  -e BAWBEL_LOG_LEVEL=DEBUG \
+  -e OPENAI_API_KEY=$OPENAI_API_KEY \
   -v /path/to/skills:/scan:ro \
-  bawbel/scanner:0.1.0 \
-  scan /scan
+  bawbel/scanner:0.1.0 scan /scan
+
+# Stage 2 LLM — explicit model (any LiteLLM provider)
+docker run --rm \
+  -e BAWBEL_LLM_MODEL=gemini/gemini-1.5-flash \
+  -e GEMINI_API_KEY=$GEMINI_API_KEY \
+  -v /path/to/skills:/scan:ro \
+  bawbel/scanner:0.1.0 scan /scan
 
-# Use a .env file
-echo "ANTHROPIC_API_KEY=sk-ant-..." > .env
+# Use a .env file (recommended — works with any provider)
 docker run --rm --env-file .env \
   -v /path/to/skills:/scan:ro \
-  bawbel/scanner:0.1.0 \
-  scan /scan
+  bawbel/scanner:0.1.0 scan /scan
+
+# Set log level
+docker run --rm \
+  -e BAWBEL_LOG_LEVEL=DEBUG \
+  -v /path/to/skills:/scan:ro \
+  bawbel/scanner:0.1.0 scan /scan
 ```
 
 ---

diff --git a/docs/guides/getting-started.md b/docs/guides/getting-started.md
@@ -46,7 +46,7 @@ Detection Engines:
   ✓  Pattern     15 rules  ·  stdlib only  ·  always active
   ✗  YARA        not installed  ·  pip install "bawbel-scanner[yara]"
   ✗  Semgrep     not installed  ·  pip install "bawbel-scanner[semgrep]"
-  ✗  LLM         no API key  ·  set ANTHROPIC_API_KEY to enable Stage 2
+  ✗  LLM         not installed  ·  pip install "bawbel-scanner[llm]"
 ```
 
 ---

diff --git a/pyproject.toml b/pyproject.toml
@@ -46,8 +46,8 @@ dependencies = [
 # Detection engines (optional — scanner works without them)
 yara    = ["yara-python>=4.5.0"]
 semgrep = ["semgrep>=1.60.0"]
-llm     = ["litellm>=1.30.0"]
-all     = ["yara-python>=4.5.0", "semgrep>=1.60.0", "litellm>=1.30.0"]
+llm     = ["litellm>=1.30.0", "jsonschema~=4.25.1"]
+all     = ["yara-python>=4.5.0", "semgrep>=1.60.0", "litellm>=1.30.0", "jsonschema~=4.25.1"]
 
 # Development tooling
 dev = [

diff --git a/requirements.txt b/requirements.txt
@@ -1,9 +1,11 @@
 # Core dependencies — required for all installs
-# Optional engines: pip install "bawbel-scanner[yara]" or "[semgrep]" or "[all]"
+# Optional engines: pip install "bawbel-scanner[yara]"    — YARA rules
+#                   pip install "bawbel-scanner[semgrep]" — Semgrep rules
+#                   pip install "bawbel-scanner[llm]"     — LiteLLM Stage 2 (any provider)
+#                   pip install "bawbel-scanner[all]"     — everything
 # Dev tools:        pip install "bawbel-scanner[dev]"
 # See pyproject.toml for full dependency groups
 
 click>=8.1.0
 rich>=13.7.0
-requests>=2.31.0
 pydantic>=2.5.0
diff --git a/scanner/cli.py b/scanner/cli.py
@@ -352,10 +352,8 @@ def report_cmd(path: str, fmt: str) -> None:
     name = Path(result.file_path).name
     console.print(f"[dim]Report for:[/]  [bold white]{name}[/]")
     console.print(f"[dim]Type:[/]        [bold white]{result.component_type}[/]")
-    console.print(
-        "[dim]AVE Standard:[/] "
-        "[link=https://github.com/bawbel/bawbel-ave]github.com/bawbel/bawbel-ave[/link]"
-    )
+    ave_url = "https://github.com/bawbel/bawbel-ave"
+    console.print(f"[dim]AVE Standard:[/] [link={ave_url}]github.com/bawbel/bawbel-ave[/link]")
     console.print()
 
     if result.has_error:
@@ -396,9 +394,10 @@ def report_cmd(path: str, fmt: str) -> None:
         table.add_column("value", style="white")
 
         if f.ave_id:
+            ave_base = "https://github.com/bawbel/bawbel-ave/blob/main/records"
             table.add_row(
                 "AVE ID",
-                f"[link=https://github.com/bawbel/bawbel-ave/blob/main/records/{f.ave_id}.md]{f.ave_id}[/link]",  # noqa: E501
+                f"[link={ave_base}/{f.ave_id}.md]{f.ave_id}[/link]",
             )
         table.add_row("Rule ID", f.rule_id)
         table.add_row("CVSS-AI", f"{f.cvss_ai:.1f} / 10.0")
@@ -475,13 +474,13 @@ def version_cmd() -> None:
         )
     except ImportError:
         console.print(
-            "  [dim]✗  YARA        not installed  ·  " 'pip install "bawbel-scanner[yara]"[/]'
+            "  [dim]✗  YARA        not installed  ·  " 'pip install "bawbel-scanner\\[yara]"[/]'
         )
 
     try:
-        import subprocess  # nosec B404  # noqa: S404
+        import subprocess  # nosec B404 # noqa: S404
 
-        r = subprocess.run(  # nosec B603 B607  # noqa: S603,S607
+        r = subprocess.run(  # nosec B603 B607 # noqa: S603 S607
             ["semgrep", "--version"],
             capture_output=True,
             text=True,
@@ -492,19 +491,33 @@ def version_cmd() -> None:
             console.print(f"  [bold #1DB894]✓[/]  Semgrep     " f"[dim]v{ver}  ·  active[/]")
         else:
             raise FileNotFoundError
-    except Exception:
+    except Exception:  # noqa: B014
         console.print(
-            "  [dim]✗  Semgrep     not installed  ·  " 'pip install "bawbel-scanner[semgrep]"[/]'
+            "  [dim]✗  Semgrep     not installed  ·  " 'pip install "bawbel-scanner\\[semgrep]"[/]'
         )
 
-    import os
+    try:
+        import litellm  # noqa: F401
+
+        llm_installed = True
+    except ImportError:
+        llm_installed = False
+
+    from scanner.engines.llm_engine import _resolve_model
 
-    llm_key = os.environ.get("ANTHROPIC_API_KEY") or os.environ.get("OPENAI_API_KEY")
-    if llm_key:
-        console.print("  [bold #1DB894]✓[/]  LLM         " "[dim]API key set  ·  Stage 2 active[/]")
+    active_model = _resolve_model() if llm_installed else None
+
+    if llm_installed and active_model:
+        console.print(
+            f"  [bold #1DB894]✓[/]  LLM         " f"[dim]{active_model}  ·  Stage 2 active[/]"
+        )
+    elif llm_installed and not active_model:
+        console.print(
+            "  [dim]✗  LLM         installed  ·  " "set BAWBEL_LLM_MODEL or a provider API key[/]"
+        )
     else:
         console.print(
-            "  [dim]✗  LLM         no API key  ·  " "set ANTHROPIC_API_KEY to enable Stage 2[/]"
+            "  [dim]✗  LLM         not installed  ·  " r'pip install "bawbel-scanner\[llm]"[/]'
         )
 
     console.print()
@@ -600,7 +613,10 @@ def _print_sarif(results: list[ScanResult]) -> None:
             )
 
     sarif = {
-        "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",  # noqa: E501
+        "$schema": (
+            "https://raw.githubusercontent.com/oasis-tcs/sarif-spec"
+            "/master/Schemata/sarif-schema-2.1.0.json"
+        ),
         "version": "2.1.0",
         "runs": [
             {