feat(ccproxy): v2.0.0 — inspector architecture, lightllm, DAG pipeline, compliance#16

Open
starbaser wants to merge 369 commits into main from dev

Conversation

@starbaser
Owner

@starbaser starbaser commented Apr 16, 2026

AI Summary

Complete rewrite of ccproxy from a LiteLLM proxy subprocess model to an in-process mitmproxy-based transparent LLM API interceptor. This is the v2.0.0 release (tagged v2.0.0-rc1).

  • Inspector architecture: mitmweb runs in-process via WebMaster API with dual listeners — reverse proxy + WireGuard namespace jail. No subprocess, no gateway server.
  • lightllm: Surgical nerve connector into LiteLLM's BaseConfig transformation pipeline, bypassing cost tracking and callback machinery entirely.
  • DAG-based hook pipeline: @hook(reads=..., writes=...) decorator-declared data dependencies, topologically sorted via Kahn's algorithm. Per-request overrides via x-ccproxy-hooks header.
  • SSE streaming: SseTransformer stateful stream callable — parses, transforms per-chunk via LiteLLM's provider iterators, re-serializes as OpenAI-format SSE.
  • Compliance profile learning: Provider-agnostic system that observes legitimate request shapes from WireGuard traffic and stamps compliance profiles onto proxied requests.
  • Gemini/Vertex AI support: Full routing, OAuth handling, context caching via cachedContents API, path rewriting for cloudcode-pa.googleapis.com.
  • Flows CLI: ccproxy flows list/dump/diff/compare/clear with multi-page HAR 1.2 output, jq filtering, and sliding-window diff across flow sets.
  • MCP notification endpoint: POST /mcp/notify for terminal event ingestion, buffered and injected as synthetic tool_use/tool_result pairs.
  • XDG config directory: Default config moved to ~/.config/ccproxy/ (breaking change).
  • init replaces install: CLI rename (breaking change).
  • Rich pipeline visualization: render_pipeline() builds a full DAG display with parallel groups via rich.columns.Columns.
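
The DAG-based hook ordering described in the summary can be sketched roughly as follows. This is a minimal illustration, assuming a module-level registry and hypothetical hook names (`load_token`, `sign_request`, `log_request`); the actual `@hook` decorator and `HookDAG` in ccproxy are not shown in this PR and may differ.

```python
from collections import deque
from typing import Callable

# Hypothetical registry; ccproxy's real storage for hook metadata is not shown here.
_HOOKS: dict[str, dict] = {}


def hook(reads: list[str], writes: list[str]) -> Callable:
    """Record the keys a hook reads and writes (illustrative only)."""
    def decorate(fn: Callable) -> Callable:
        _HOOKS[fn.__name__] = {"fn": fn, "reads": set(reads), "writes": set(writes)}
        return fn
    return decorate


def topo_order(hooks: dict[str, dict]) -> list[str]:
    """Kahn's algorithm: a hook runs after every hook that writes a key it reads."""
    names = list(hooks)
    edges: dict[str, set[str]] = {n: set() for n in names}  # writer -> readers
    indegree = {n: 0 for n in names}
    for writer in names:
        for reader in names:
            if writer != reader and hooks[writer]["writes"] & hooks[reader]["reads"]:
                edges[writer].add(reader)
                indegree[reader] += 1
    ready = deque(n for n in names if indegree[n] == 0)
    order: list[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in sorted(edges[node]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(names):
        raise ValueError("cycle detected in hook dependencies")
    return order


# Registered deliberately out of order to show that declaration order does not matter.
@hook(reads=["signature"], writes=[])
def log_request(ctx): ...

@hook(reads=["token"], writes=["signature"])
def sign_request(ctx): ...

@hook(reads=[], writes=["token"])
def load_token(ctx): ...


HOOK_ORDER = topo_order(_HOOKS)  # ['load_token', 'sign_request', 'log_request']
```

Declaring data dependencies instead of explicit priorities means a new hook only has to state what it touches; the sort places it, and a cycle surfaces as an error rather than a silent misordering.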

Breaking Changes

  • Config directory: ~/.ccproxy/ → ~/.config/ccproxy/
  • CLI: ccproxy install → ccproxy init
  • --debug flag replaced by --log-level / -v
  • forward_port / reverse_port replaced by unified port config
  • mitm config section renamed to inspect
  • Prisma/database infrastructure removed entirely
  • LiteLLM proxy subprocess removed
  • to_mermaid / to_ascii removed from HookDAG

Test plan

  • just test passes with ≥90% coverage
  • just lint / just typecheck clean
  • Smoke test: ccproxy run --inspect -- claude --model haiku -p "what's 2+2"
  • Verify ccproxy init creates config at ~/.config/ccproxy/
  • Verify flows CLI: ccproxy flows list, ccproxy flows dump
  • Verify Gemini routing through inspector

starbaser added 30 commits April 9, 2026 12:50
- Defer web/streaming options via update_defer() since mitmproxy 12.x
  registers them through addons inside WebMaster.__init__, not on Options
- Replace nonexistent --port-map flag with add_hostfwd API socket call
  (slirp4netns never had --port-map; this was a latent bug)
- Bind LiteLLM to 0.0.0.0 in inspect mode so slirp4netns hostfwd
  traffic arriving at tap0 IP (10.0.2.100) reaches it without iptables
- Pass litellm_port (not main_port) to gateway namespace — mitmproxy
  reverse proxy needs to reach LiteLLM, not the other way around

…aul logging

Remove the vendored mitmpcap PCAP synthesizer (fake TCP/IP frame reconstruction)
and replace with mitmproxy's native MITMPROXY_SSLKEYLOGFILE for real TLS key
logging. Combined with the existing WireGuard keylog, packet captures can now be
fully decrypted in Wireshark without synthetic frames.

Overhaul logging to use unified tagged namespaces across all components:
- Rewrite setup_logging() with stderr + truncate-on-restart file handler
- Initialize config singleton early in main() for correct debug level
- Route LiteLLM subprocess output through ccproxy.subprocess.litellm logger
- Route slirp4netns output through ccproxy.subprocess.slirp4netns logger
- Add nsenter command logging via ccproxy.subprocess.nsenter logger
- Disable mitmproxy TermLog to prevent root logger hijack
- Remove competing debug handler from CCProxyHandler.__init__
- Fix view_logs() missing -n flag for process-compose, add file fallback
- Fix show_status() to report actual log file path
- Gate web_open_browser on config, pass MitmproxyOptions through directly

Deleted: inspector/pcap.py, tests/test_pcap.py, inspector/script.py references

Gemini CLI targets cloudcode-pa.googleapis.com (Google's proprietary
Cloud Code API), which LiteLLM doesn't understand natively. Route this
traffic through LiteLLM's /gemini/ pass-through endpoint with outbound
host/path restoration so the correct upstream is reached.

- Change forward_domains from list[str] to dict[str, str | None]
  where the value is the LiteLLM endpoint prefix (e.g. /gemini/) or
  None for direct forwarding
- Add OriginalRequest dataclass to FlowRecord for storing the
  pre-rewrite host/port/scheme/path
- Propagate flow ID through LiteLLM pass-through via x-pass- prefix
  (LiteLLM strips custom headers by default but always forwards
  x-pass-* headers with the prefix stripped)
- Outbound handler looks up FlowRecord via flow ID header and
  restores original host/path before the request hits the provider

- Split pyright (editor, standard mode) and mypy (CI, explicit strict
  flags) to eliminate cast+redundant-cast friction per Stainless SDK
  pattern: disable warn_unused_ignores and warn_redundant_casts
- Add litellm stub modules for litellm_core_utils and proxy internals
- Remove dead else-branch in hook registration loop (hooks list is
  typed list[str | dict], so the else was unreachable)
- Annotate double-check lock pattern in ModelRouter with
  type: ignore[unreachable] since mypy can't model concurrent mutation

Introduces ccproxy.lightllm, a thin orchestration layer that imports
LiteLLM's BaseConfig transformation pipeline directly and exposes it at
the mitmproxy inspector layer. Zero vendored code; pure import glue.

- dispatch.py: sequences validate_environment → get_complete_url →
  transform_request → sign_request for standard providers, with a
  dedicated Gemini path using _get_gemini_url + _transform_request_body
- registry.py: wraps ProviderConfigManager (~90 providers for free)
- noop_logging.py: duck-type stub for logging_obj parameter
- inspector/routes/transform.py: mitmproxy route handler that matches
  InspectorConfig.transforms rules and rewrites flows to dest provider
- TransformRoute config model on InspectorConfig.transforms
- Transform router added to addon chain (after inbound, before outbound)
- docs/light_llm_transform.md: full architecture reference

…l redaction

- Strip ?key= from Gemini URL when using OAuth tokens (ya29.*), use
  Authorization: Bearer header only
- Add match_model to TransformRoute for reverse proxy flows where all
  traffic arrives at the same host
- Make match_host optional (None matches any host)
- Parse request body before matching so match_model can inspect it
- Collect hosts from pretty_host, Host header, and X-Forwarded-Host
- Redact query params from transform log output (prevents credential leak)
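
Redacting query parameters before logging, as the last bullet describes, could look like the sketch below. The function name `redact_query` and the placeholder string are assumptions for illustration; only the standard library's `urllib.parse` is used.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redact_query(url: str, placeholder: str = "REDACTED") -> str:
    """Return the URL with every query value replaced, keeping parameter names."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    redacted = urlencode([(name, placeholder) for name, _ in pairs])
    return urlunsplit((parts.scheme, parts.netloc, parts.path, redacted, parts.fragment))
```

Keeping the parameter names while dropping the values preserves enough of the URL shape for debugging without writing a `?key=` credential to the log.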
The transform route now supports mode=passthrough which restores the
original destination from FlowRecord.original_request, bypassing LiteLLM
entirely. This fixes Gemini CLI routing — _maybe_forward rewrites
cloudcode-pa.googleapis.com traffic to LiteLLM's /gemini/ pass-through,
which incorrectly routes to generativelanguage.googleapis.com. The
passthrough mode intercepts at the inbound layer and sends traffic
directly to cloudcode-pa.googleapis.com with the CLI's own OAuth token.

Verified: `ccproxy run --inspect -- gemini -p "..."` returns correct
responses through the passthrough route.

…request path

The lightllm nerve connector now handles all provider transformations
directly at the mitmproxy layer. Traffic flows client → mitmweb →
[inbound → transform → outbound] → provider with no LiteLLM subprocess
or second WireGuard tunnel.

- Remove _maybe_forward(), gateway direction detection, litellm_port
- Collapse three mitmproxy listeners to two (reverse + WG-CLI)
- Delete create_gateway_namespace() and run_in_namespace_async()
- Remove forward_domains from InspectorConfig
- Rewrite outbound routes for post-transform fixups (beta headers,
  Claude Code identity injection, auth failure observation)
- Add fallback policy: WG flows passthrough, reverse proxy gets 501

…spector

Context is now flow-native — wraps HTTPFlow as first-class member with
body fields parsed once and flushed via commit(). Header mutations are
live. Removes from_litellm_data/to_litellm_data.

PipelineExecutor.execute() takes HTTPFlow directly. Two-DAG addon chain:
inbound pipeline (OAuth, session extraction) → transform (lightllm) →
outbound pipeline (beta headers, identity injection).

Hooks adapted for flow-native Context:
- forward_oauth: sentinel substitution + cached token via set_header()
- add_beta_headers: single-write merge, anthropic-version guard
- inject_claude_code_identity: string + list system types
- extract_session_id: reads ctx.metadata, drops Langfuse plumbing
- verbose_mode: strips redact-thinking-* via get/set_header()

Config hooks field now supports inbound/outbound dict structure.
Remove handler.py, router.py, metadata_store.py, classifier.py, rules.py,
patches/, and LiteLLM-only hooks (rule_evaluator, model_router,
forward_apikey, capture_headers). Delete inbound.py and outbound.py
route handlers (replaced by DAG pipeline).

ccproxy start no longer has --inspect flag — inspect mode is the
default. The non-inspect LiteLLM subprocess path is removed along with
generate_handler_file(). ccproxy run --inspect remains for WG namespace
jail.

Update Nix defaults and YAML template to two-stage hook dict format.
Strip RuleConfig, patches, default_model_passthrough from config.

-9,470 lines deleted across 42 files.

Old CLAUDE.md documented the deleted LiteLLM handler/classifier/router
pipeline. Rewritten from scratch to reflect the current architecture:
mitmweb in-process, lightllm nerve connector, DAG-driven hook pipeline,
single WireGuard tunnel. Marketplace plugin sync section preserved.

…oxy.yaml

LiteLLM proxy was removed but config.yaml (its config file) persisted as
dead weight. Delete it and promote host/port to first-class CCProxyConfig
fields with CCPROXY_ env prefix override via pydantic-settings.

…rmation

Universal SSE streaming: responseheaders hook on InspectorAddon detects
text/event-stream responses and enables flow.response.stream before the
body arrives — fixes client hanging for all providers.

Cross-provider response transformation: SseTransformer wraps LiteLLM's
per-provider ModelResponseIterator.chunk_parser() to rewrite SSE chunks
on the fly. Non-streaming responses use transform_to_openai() via a
MitmResponseShim that duck-types httpx.Response.

TransformMeta on FlowRecord propagates provider/model/request_data from
request phase to response phase.
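
A stateful stream callable of the kind described above might be shaped like this. It is a simplified sketch, not ccproxy's `SseTransformer`: the class name `SseRewriter`, the injected `transform` function, and the assumption that events are delimited by `\n\n` (no `\r\n` handling) are all illustrative.

```python
import json
from typing import Callable


class SseRewriter:
    """Buffers raw bytes, rewrites each complete SSE event, and re-serializes it."""

    def __init__(self, transform: Callable[[dict], dict]) -> None:
        self._buf = b""
        self._transform = transform

    def __call__(self, chunk: bytes) -> bytes:
        self._buf += chunk
        out = []
        # An SSE event ends at a blank line; a trailing partial event stays buffered.
        while b"\n\n" in self._buf:
            raw, self._buf = self._buf.split(b"\n\n", 1)
            out.append(self._rewrite(raw))
        return b"".join(out)

    def _rewrite(self, raw: bytes) -> bytes:
        lines = []
        for line in raw.split(b"\n"):
            # Leave the terminal [DONE] marker and non-data fields untouched.
            if line.startswith(b"data: ") and line[6:].strip() != b"[DONE]":
                payload = self._transform(json.loads(line[6:]))
                line = b"data: " + json.dumps(payload, separators=(",", ":")).encode()
            lines.append(line)
        return b"\n".join(lines) + b"\n\n"
```

Because network chunks can split an event anywhere, the callable has to carry state between invocations; returning an empty byte string for an incomplete event is what keeps partial JSON from reaching the client.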
extract_session_id wrote session_id into the body's metadata dict, which
upstream APIs reject (Anthropic: "Extra inputs are not permitted",
Google: "Unknown name metadata"). Store on flow.metadata instead.

Context.metadata getter uses setdefault which creates an empty metadata
key even for read-only guard checks. Strip empty metadata dicts in
commit() so they don't leak into the request body.
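
The `setdefault` side effect and the `commit()` cleanup can be reproduced in a few lines. `BodyView` is a hypothetical stand-in for `Context`, reduced to the one behavior under discussion.

```python
class BodyView:
    """Minimal stand-in for a request-body wrapper (not ccproxy's Context)."""

    def __init__(self, body: dict) -> None:
        self._body = body

    @property
    def metadata(self) -> dict:
        # setdefault inserts an empty dict even when the caller only reads.
        return self._body.setdefault("metadata", {})

    def commit(self) -> dict:
        # Drop the key again if nothing was ever stored in it.
        if self._body.get("metadata") == {}:
            del self._body["metadata"]
        return self._body
```

A read-only check such as `"session_id" in view.metadata` is enough to create the empty key, which is why the cleanup belongs in `commit()` rather than at each call site.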
extract_session_id declared writes=["session_id"] but now writes to
flow.metadata — update to writes=[]. inject_mcp_notifications read
session_id from ctx.metadata (body) which was always empty after the
previous fix; read from flow.metadata instead.

Hardcoded 40-char width caused right border misalignment when parallel
group labels overflowed. Width now computed from longest content line.

…urce

The LiteLLM proxy server was removed several commits ago but many files
still described the old architecture. This commit systematically removes
every stale reference: rewrites README, configuration, and inspect docs
from scratch; deletes the superseded skills/using-litellm-ccproxy skill;
drops 8 unused dependencies from pyproject.toml; removes 9 dead type
stubs; fixes source docstrings/comments/types across 6 source files;
and cleans infrastructure files (process-compose, docker-compose, nix
module, .gitignore).

# Conflicts:
#	README.md
#	pyproject.toml
#	src/ccproxy/templates/config.yaml

# Conflicts:
#	pyproject.toml

Remove stale litellm-db postgres reference from Docker services,
correct type stubs listing (litellm/opentelemetry/xepor, not mitmproxy).

…d retry

Remove dead oauth_ttl/oauth_refresh_buffer machinery — tokens were loaded
once at startup and never proactively refreshed. Now on 401, the credential
source is re-read (file) or re-run (command); if the token changed, the
request is retried with the fresh value via httpx. Unchanged tokens fail
through as truly stale credentials.
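
The re-read-and-compare step on a 401 might look like the following sketch. The `CredentialSource` dataclass here is a simplified assumption modeled on the `file` and `command` options named in this PR, and `token_for_retry` is a hypothetical helper; the actual retry through httpx is left out.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class CredentialSource:
    """Illustrative shape only: exactly one of `file` or `command` is expected."""
    file: Optional[str] = None
    command: Optional[str] = None

    def resolve(self) -> str:
        if self.file is not None:
            return Path(self.file).expanduser().read_text().strip()
        if self.command is not None:
            done = subprocess.run(
                self.command, shell=True, check=True, capture_output=True, text=True
            )
            return done.stdout.strip()
        raise ValueError("credential source needs `file` or `command`")


def token_for_retry(source: CredentialSource, failed_token: str) -> Optional[str]:
    """Return a fresh token worth retrying with, or None if it has not changed."""
    fresh = source.resolve()
    return fresh if fresh and fresh != failed_token else None
```

Returning `None` for an unchanged token is what prevents a retry loop: the 401 is passed through to the client when re-resolving the source yields the same value.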

Also moves skills/ to plugin root per Claude Code plugin spec and updates
plugin.json to reflect the current mitmproxy-based architecture.

…tion

Extract shared CredentialSource base model from OAuthSource — supports
`file` (read path) and `command` (run shell) credential resolution.
MitmproxyOptions.web_password now accepts CredentialSource for stable,
deterministic mitmweb auth via 1Password or opnix secrets.

Fix mitmproxy web_password: update_defer doesn't trigger WebAuth.configure,
so web_password is now set via opts.update() after WebMaster creation.
Status command resolves the credential source to show the full tokenized URL.

Capture the full pre-pipeline client request (method, URL, headers, body)
in InspectorAddon.request() before any hooks mutate the flow. Expose it
via a custom mitmproxy content view at /flows/{id}/request/content/client-request
and a ccproxy.clientrequest command for structured JSON access.

Renames OriginalRequest → ClientRequest using canonical MITM terminology:
client request (what the caller sent) vs forwarded request (post-pipeline).

…ystem

Replace hardcoded add_beta_headers and inject_claude_code_identity hooks
with a dynamic observation-based system that learns compliance contracts
from legitimate CLI traffic and applies them to SDK requests.

Observation is built into InspectorAddon.request() pre-pipeline, reading
raw ClientRequest snapshots from WireGuard flows. Application runs as
the last outbound pipeline hook on reverse proxy flows after transform.
Profiles are persisted to {config_dir}/compliance_profiles.json and
keyed by (provider, user_agent). An Anthropic v0 seed profile bootstraps
from existing constants to prevent regression.
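
One way to picture a profile store keyed by (provider, user_agent) and persisted as JSON is sketched below. Everything beyond the key and the file format is an assumption: the on-disk schema, the joined string key, the contents of `HEADER_EXCLUSIONS` (the PR only confirms `x-goog-api-key`), and the rule that headers already on the request win over the profile.

```python
import json
from pathlib import Path

# Assumed exclusion set; the PR only confirms that x-goog-api-key is excluded.
HEADER_EXCLUSIONS = {"authorization", "x-api-key", "x-goog-api-key", "host", "content-length"}


class ProfileStore:
    """Illustrative store; the real ProfileStore's schema is not shown in this PR."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._profiles = json.loads(path.read_text()) if path.exists() else {}

    @staticmethod
    def _key(provider: str, user_agent: str) -> str:
        # JSON object keys must be strings, so the pair is joined into one.
        return f"{provider}|{user_agent}"

    def observe(self, provider: str, user_agent: str, headers: dict) -> None:
        """Record the non-secret headers seen on a legitimate request."""
        kept = {k.lower(): v for k, v in headers.items() if k.lower() not in HEADER_EXCLUSIONS}
        self._profiles[self._key(provider, user_agent)] = kept
        self._path.write_text(json.dumps(self._profiles, indent=2))

    def stamp(self, provider: str, user_agent: str, headers: dict) -> dict:
        """Overlay a learned profile; headers already on the request win (an assumption)."""
        merged = dict(self._profiles.get(self._key(provider, user_agent), {}))
        merged.update({k.lower(): v for k, v in headers.items()})
        return merged
```

Filtering credentials at observation time, rather than at application time, keeps secrets out of the persisted file altogether.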
… defaults

Add x-goog-api-key to HEADER_EXCLUSIONS (Google's API key header should
not be stamped onto other requests). Update nix/defaults.nix to use the
new compliance hooks instead of the deprecated add_beta_headers and
inject_claude_code_identity.

Replace deprecated add_beta_headers/inject_claude_code_identity hook
references with apply_compliance. Document the compliance/ subsystem
and add ProfileStore to singleton patterns.

Three transform modes: redirect (default) preserves request body and
rewrites destination host for same-format flows (Anthropic→Anthropic,
Gemini→Gemini). Transform mode runs lightllm for cross-format
conversion. Passthrough leaves everything unchanged.

Also adds dest_host field to TransformRoute and excludes x-goog-api-key
from compliance profiling.
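
The three modes can be read as a small dispatch over an enum. The sketch below is illustrative only: `Mode`, `Request`, `route`, and the injected `convert` callable are hypothetical names, and the real `TransformRoute` handling (matching, header fixups, lightllm calls) is not reproduced.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Mode(str, Enum):
    REDIRECT = "redirect"        # same wire format, new destination host
    TRANSFORM = "transform"      # cross-format conversion, new destination host
    PASSTHROUGH = "passthrough"  # leave the request untouched


@dataclass
class Request:
    host: str
    body: dict


def route(
    req: Request,
    mode: Mode = Mode.REDIRECT,
    dest_host: Optional[str] = None,
    convert: Optional[Callable[[dict], dict]] = None,
) -> Request:
    if mode is Mode.PASSTHROUGH:
        return req
    if dest_host is None:
        raise ValueError(f"{mode.value} mode needs dest_host")
    if mode is Mode.REDIRECT:
        return Request(host=dest_host, body=req.body)
    if convert is None:
        raise ValueError("transform mode needs a body converter")
    return Request(host=dest_host, body=convert(req.body))
```

Separating redirect from transform avoids running a format conversion on traffic that is already in the destination's wire format.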
starbaser added 27 commits May 20, 2026 20:34
parse_sync and render_outbound_sync previously created a private event
loop and called run_until_complete unconditionally. When invoked from a
sync hook running inside mitmproxy's async runtime (e.g.
inject_mcp_notifications reading ctx.messages), asyncio raised
"Cannot run the event loop while another loop is running" because
nested run_until_complete in the same thread isn't allowed. Add a
worker-thread fallback: if a running loop is detected on the current
thread, dispatch the awaitable to a ThreadPoolExecutor that owns its
own private loop. The CaptureSentinel pattern keeps this bounded.
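
The worker-thread fallback can be sketched with the standard library alone. `run_coro_sync` is a hypothetical name for illustration, not necessarily the helper used in ccproxy.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_coro_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running on this thread, so a private loop is safe.
        return asyncio.run(coro)
    # A loop is already running here. Nested run_until_complete is not allowed,
    # so hand the coroutine to a worker thread that owns its own loop and wait.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

Note that the calling thread blocks until the worker finishes, so the outer event loop is stalled for that duration; this is acceptable only when the awaited work is short and does not depend on the outer loop making progress.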
Replace the CaptureSentinel + AnthropicModel/OpenAIChatModel instantiation
hack with pydantic-graph FSM dumps and per-listener parsers with FSM loads.
The new lightllm/graph/ package owns dispatch_load / dispatch_dump_sync;
Context.ensure_parsed and inspector/routes/transform.py call through it.

Anthropic and OpenAI dumps build their wire bodies directly from typed
SDK TypedDicts (anthropic.types.beta.*, openai.types.chat.*) via per-IR-part
nodes routed by structural pattern matching, with an ApplyCacheNode middleware
that attaches cache_control to the last-emitted block. Google and Perplexity
dumps move into the graph package under their original mechanisms (Google still
wraps pydantic-ai's GoogleModel; Perplexity remains a clean IR-to-helper
bridge).

KEEPS Context._run_coro_sync and the worker-thread bridge. pydantic_graph's
Graph.run_sync is deprecated and uses loop.run_until_complete (graph.py:189),
which crashes inside mitmproxy's running asyncio loop -- the bug commit 14b8904
already fixed. The FSM nodes are async def run(...); they are driven via
await graph.run(...) inside the bridge.

1689 tests pass, matching baseline d95834d. Lossiness regressions for
tool_name two-pass, image media_type, non-standard cache TTLs, and unknown
content blocks are preserved verbatim. Test files renamed to
tests/test_lightllm_graph_*.py with the implementation parametrize collapsed
to fsm-only.

AGENTS.md becomes the tracked canonical (Codex native).
CLAUDE.md is a small file containing @AGENTS.md (Claude Code import).
Both files tracked; consistent across all user repos.

Migrates anthropic_dump, openai_dump, and openai_load from
pydantic_graph's BaseNode class-based FSM to pydantic_graph.beta's
GraphBuilder step-based FSM. Replaces class-per-operation with
function-per-operation for cleaner dispatch.

Migrates from pydantic_graph's BaseNode class hierarchy to
pydantic_graph.beta's GraphBuilder pattern with typed dispatch
envelopes, eliminating boilerplate run() methods while preserving the
same FSM logic.

mypy 1.19 does not recognize pydantic_graph.beta's infer_variance
TypeVars as generic at runtime, causing cascading type errors in FSM
wire-translation modules that pyright handles correctly.

…e litellm

Completes the bi-modal → symmetric-FSM migration planned in nextplan.md
(phases J–S). New graph/*_intake.py + graph/*_render.py modules plus
graph/sse_pipeline.py (persistent asyncio loop per stream) and
graph/buffered.py replace the hand-rolled lightllm/response/ subpackage
and the LiteLLM-mediated dispatch.py + context_cache.py + noop_logging.py.

litellm is removed from src/ and pyproject.toml; the request and response
sides now share one FSM idiom, one dispatcher pattern, and one IR boundary
in both directions.

Replaces the four FSM modules (anthropic_load, anthropic_dump,
openai_load, openai_dump) with procedural AnthropicAdapter and
OpenAIChatAdapter classes that extend pydantic-ai's UIAdapter. Removes
dispatch_load and simplifies the request-side translation to synchronous
code using MessagesBuilder and SDK TypedDicts directly.

- adapters/google.py: direct generateContent wire construction; kills
  CaptureSentinel exception-capture hack in graph/google_dump.py (deleted)
- adapters/perplexity.py: thin wrapper around pplx.py:_build_pplx_payload;
  graph/perplexity_dump.py deleted (now 1-line indirection)
- graph/__init__.py:dispatch_dump_sync routes all providers (Anthropic,
  OpenAI, Google, Perplexity) through adapters/; async dispatch_dump kept
  only as test-compat shim
- lightllm/graph_ext.py: monkey-patches GraphBuilder.add_subgraph and
  wraps Graph.render so future SSE FSM refactors can compose subgraphs.
  Applied at lightllm import time via idempotent apply_patches()
- pipeline/results.py: Temporal-style HookResult discriminated union
  (_HookSuccess | _HookSkipped | _HookError | _HookDeferred) with
  wrap/unwrap helpers; executor.py captures every invocation, flow
  records carry structured failure metadata
- adapters/{anthropic,openai_chat,_envelope}.py: thread raw_extras
  through load_messages so refusal text, INVALID_JSON wrapping,
  image_detail, file blocks, unknown blocks, and non-standard cache TTLs
  all survive round-trip
- _envelope.py:_render_anthropic re-attaches anthropic_cache_instructions
  to system blocks at dump time
- hooks/pplx_thread_inject.py: fix pre-existing mypy arg-type +
  no-any-return on the thread-fetch helper
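
A discriminated-union result type with wrap/unwrap helpers, in the spirit of the `HookResult` bullet above, might be sketched as follows. The class names drop the leading underscore, the deferred variant is omitted, and `SkipHook` is a hypothetical signal exception; none of this is the actual `pipeline/results.py`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass(frozen=True)
class HookSuccess:
    value: Any


@dataclass(frozen=True)
class HookSkipped:
    reason: str


@dataclass(frozen=True)
class HookError:
    error: BaseException


HookResult = Union[HookSuccess, HookSkipped, HookError]


class SkipHook(Exception):
    """Hypothetical signal a hook raises to opt out of a request."""


def wrap(fn: Callable[..., Any], *args: Any) -> HookResult:
    """Run a hook and capture its outcome as data instead of letting it raise."""
    try:
        return HookSuccess(fn(*args))
    except SkipHook as exc:
        return HookSkipped(str(exc))
    except Exception as exc:
        return HookError(exc)


def unwrap(result: HookResult) -> Any:
    """Return the value of a success, re-raise an error, and map a skip to None."""
    if isinstance(result, HookSuccess):
        return result.value
    if isinstance(result, HookError):
        raise result.error
    return None
```

Capturing every invocation as a value lets the executor keep running the remaining hooks and attach structured failure metadata to the flow, instead of aborting on the first exception.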
Bumps pydantic-ai-slim / pydantic-graph to >=1.99.0 (resolved 1.101.0)
to escape the deprecated pydantic_graph.beta namespace and pick up the
typed-promotion ModelResponsePartsManager API. All six lightllm/graph/
intake/render modules now import from canonical pydantic_graph paths.

Adapters: Google and Perplexity are full UIAdapter subclasses for parity
with Anthropic/OpenAI; load_messages raises NotImplementedError since
both are outbound-only. Each adapter gains a render(req) classmethod
that takes an LLMRenderInput Protocol and returns wire bytes;
dispatch_dump_sync now routes through these.

Context owns typed IR state directly via five lazy-parsed slots
(_cached_messages, _cached_system, _cached_request_parameters,
_cached_settings, _cached_raw_extras); parse_sync returns None and
populates in-place. The previous ParsedRequest bridge is gone from the
production hot path. ParsedRequest survives in parsed.py as a frozen
LLMRenderInput stub used by tests and the inspector flow-enrichment
shim parse_request(); ParsedResponse was unused and removed.

graph_ext.py and its add_subgraph monkey-patch are deleted along with
the 5 covering tests — subgraph composition is the wrong granularity
for request-side dump methods (9-73 line ranges, no dispatch ladders)
and the canonical pydantic_graph.GraphBuilder has no add_subgraph
either. If response-side intake decomposition (Phase F Stages 2-5)
materializes later, it lands on canonical primitives.

Other 1.99 deprecation rebasing: BuiltinToolCallPart →
NativeToolCallPart in anthropic_intake/render;
ModelResponsePartsManager(model_request_parameters=...) threaded
through all four intake constructors; pydantic-ai-slim acquires the
[anthropic] optional group (no longer bundled). Ruff cleanup picks up
ListenerFormat → StrEnum and the SIM108/SIM102/RUF002 leftovers in
lightllm/.

docs/lightllm.md rewritten to reflect the post-refactor architecture,
HookResult discriminated union, LLMRenderInput Protocol, and adapter
walkthrough. 1659 tests pass (baseline 1664 minus the 5 graph_ext
tests); mypy + ruff clean tree-wide; inspector smoke
(claude --model haiku) succeeds end-to-end.

- The FSM pattern section used invented dump-side symbol names
  (AnthropicDumpState, parse_text, _DumpDone, apply_cache, _dump_graph,
  render_anthropic_dump) that don't exist in the codebase. Replaced with
  the real anthropic_intake.py shape (_AnthropicIntakeState,
  frame_next_event, handle_content_block_*, _FeedDone, _IgnoredEvent,
  _intake_graph, AnthropicResponseIntakeFSM.feed). Reframed to make clear
  the FSM idiom is response-side only; request side is procedural adapter
  classmethods.
- GoogleAdapter description claimed it wraps pydantic-ai's GoogleModel.
  It doesn't — it does direct generateContent wire construction
  (camelCase keys, base64 inline data, generationConfig hoist).
- Roundtrip test snippet showed AnthropicAdapter.load_messages returning
  a (messages, settings, raw_extras) tuple. Actual signature returns
  list[ModelMessage]; settings and raw_extras come from envelope helpers
  and are passed through via raw_extras kwarg.
- Visualization example imported _dump_graph from anthropic_dump (deleted
  module). Replaced with _intake_graph from anthropic_intake and listed
  the other graph names.
- Lossiness invariants section dropped the obsolete "pre-FSM wire.py
  predecessor" reference; rewrote to describe the current adapter
  contract instead.
- File map deduplicated the SSE pipeline row.

The aa20968 refactor moved Context's cached IR state from a single
``_parsed: ParsedRequest | None`` slot into five lazy-parsed fields
(``_cached_messages``, ``_cached_request_parameters``,
``_cached_settings``, ``_cached_raw_extras``, ``_cached_system``).
``Context.commit()`` re-renders the IR back to ``_body`` whenever ANY
of these are populated.

When an earlier outbound hook (``commitbee_compat``, which always reads
``ctx.system``) triggers ``parse_sync()``, all five slots get populated
from the pre-shape body. The shape hook then replaces ``ctx._body`` with
the captured Claude CLI envelope via ``apply_shape`` — but the cached
IR is now stale. ``commit()`` re-renders the IR back to bytes, clobbering
the shape's envelope: forwarded body ships only ``{model, messages,
max_tokens}`` with no ``system``, no ``metadata``, no billing header.

For Claude-CLI clients this still worked accidentally because their
own request body carries the right shape. For plain Anthropic-SDK
clients sending sentinel keys, Anthropic's anti-abuse path returns
429 ``rate_limit_error`` with empty ``message: "Error"`` when it sees
Claude-CLI headers attached to a bare SDK body.

Fix: ``apply_shape`` calls ``ctx.invalidate_parsed()`` after writing
``_body``, dropping the stale cache so ``commit()`` sees no cached state
and leaves ``_body`` (the shape) alone. Verified with
``docs/sdk/anthropic_sdk.py`` against the dev daemon — both simple and
streaming requests now return 200.
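
The stale-cache failure and its fix can be reproduced with a toy context. `Ctx` below keeps a single cached slot instead of five and re-renders only `model` and `messages`; both simplifications are assumptions made to keep the sketch short.

```python
from typing import Optional


class Ctx:
    """Toy context with one lazily parsed slot (the real Context has five)."""

    def __init__(self, body: dict) -> None:
        self._body = body
        self._cached_messages: Optional[list] = None

    @property
    def messages(self) -> list:
        if self._cached_messages is None:
            self._cached_messages = list(self._body.get("messages", []))
        return self._cached_messages

    def invalidate_parsed(self) -> None:
        self._cached_messages = None

    def commit(self) -> dict:
        # If anything was parsed, re-render the body from the cached IR.
        # The IR only knows a subset of fields, so the rest is lost.
        if self._cached_messages is not None:
            self._body = {"model": self._body.get("model"), "messages": self._cached_messages}
        return self._body


def apply_shape(ctx: Ctx, shape: dict, invalidate: bool = True) -> None:
    """Replace the body with a captured envelope, keeping the caller's messages."""
    ctx._body = {**shape, "messages": ctx._body.get("messages", [])}
    if invalidate:
        ctx.invalidate_parsed()  # without this, commit() clobbers the envelope
```

Calling `apply_shape(..., invalidate=False)` after any hook has read `ctx.messages` reproduces the bug: `commit()` returns a body without the shape's `system` field.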

Tests still pass (1659).

Closes the deferred Phase F (per-step decomposition) and Phase H (typed
part promotion) items from next.md, plus fixes two pre-existing bugs the
work surfaced.

Phase F — subgraph composition via temporary GraphBuilder.add_subgraph
patch (lightllm/graph/_subgraph_patch.py) tracking upstream TODO at
pydantic_graph/graph_builder.py:1469. Perplexity's 142-line
_dispatch_one_event is gone — replaced by a per-event inner graph
(absorb_event → text_mirror → pop_next_block → {plan_arm →
bare_markdown_arm → diff_block_arm | flush}) that preserves the
cross-block has_plan_block invariant and the single end-of-event flush
via per-event scratch fields on _PerplexityIntakeState. Google's
handle_generate_chunk is gone — replaced by a per-chunk inner graph that
classifies parts via a typed-marker decision across five arms. Shared
StateT flows through unchanged so the inner graphs mutate the same
state instance the outer FSM owns.

Phase H — thread tool_kind through the listener parse boundary so
ModelResponsePartsManager auto-promotes ToolCallPart to its typed
subclass (e.g. ToolSearchCallPart for web_search_20250305). New
adapters/_tool_kinds.py maps wire `type` discriminators to ToolPartKind;
_parse_tools in both envelopes reads it. Regression test at
tests/test_lightllm_graph_intake_anthropic.py asserts the promotion.

pplx_stamp_headers — restores the Perplexity Pro browser-shape header
bundle (Cookie: __Secure-next-auth.session-token=…, Chrome UA, Origin,
Referer, x-perplexity-*, x-app-api*, sec-fetch-*) that the litellm
removal in 488c876 silently dropped along with
PerplexityProConfig.validate_environment. Without this, every
/rest/sse/perplexity_ask call returned 403. Also swaps perplexity_pro
auth.file to ~/.opnix/secrets/perplexity-pro-api-key to match the
production opnix convention.

commitbee_compat — guard against non-dict bodies (Anthropic /api/v2/logs
posts a list-shaped event batch) so the hook short-circuits cleanly
instead of crashing on ctx._body.get(). Regression test at
tests/issues/regression/test_commitbee_list_body.py.

Docs — align AGENTS.md project overview, lightllm subsection, hook
table, provider description, prompt-caching note, and stubs list to the
post-litellm-removal reality. docs/lightllm.md gains a Subgraph
composition section + Typed-part promotion section, refreshed module
layout, FSM-file table, mermaid section, and file map. docs/mcp.md,
docs/inspect.md, docs/configuration.md, docs/sdk/README.md get their
stale litellm references replaced.

Verified end-to-end: 1668 pytest passing (+9 new), mypy/ruff clean,
deprecation-warnings-as-errors gate clean, mermaid sanity clean, and
the live smoke matrix passes rows 1 (Claude CLI), 2 (SDK shape replay /
former 429 reproducer), 11 (Gemini CLI), 12 (Perplexity Pro).

Sonnet LSP audit confirmed three orphan symbols left over from the
litellm-removal refactor:

- ``PerplexityProConfig`` class in ``lightllm/pplx.py`` (zero external
  references — ``PerplexityAdapter.render`` goes directly to
  ``_build_pplx_payload``).
- ``lightllm/registry.py`` module entirely (``_LOCAL_CONFIGS`` and
  ``get_config`` referenced only by themselves and dead tests).
- Their exports from ``lightllm/__init__.py``.

Deleted plus ``tests/test_lightllm_registry.py`` and the three matching
test functions in ``tests/test_lightllm_pplx.py`` (registry resolver +
two ``transform_request`` tests). 1663 pytest still passing (was 1668;
5 deleted dead tests).

Also added ``web_search_20260209`` to ``_tool_kinds.ANTHROPIC_TYPED_TOOLS``
(per the Anthropic SDK's currently shipped dated variants) and documented
the scope constraint inline: pydantic-ai's ``ToolPartKind`` is
``Literal['tool-search']`` today, so only ``web_search_*`` variants map
until upstream registers more kinds (the bash / code_execution / computer
/ text_editor / web_fetch families have no ``ToolPartKind`` equivalents
yet). OpenAI Chat Completions ``tools[].type`` is ``Literal['function']``
only (verified against ``openai/types/chat/``), so ``OPENAI_TYPED_TOOLS``
stays empty until ccproxy adds a Responses API listener.

Doc cleanup: ``ParsedRequest`` is now correctly described as
**test-only**. The previous docstring + ``docs/lightllm.md`` claim that
the inspector used ``parse_request`` for "flow enrichment" was stale —
the inspector goes through ``Context.from_flow`` →
``Context.parse_sync`` → ``parse_request_into_fields`` (in-place
population), like all production code.

…nder

Three independent ergonomic improvements landed together; zero behavior
change.

- Naming pass. ListenerFormat -> InboundFormat (StrEnum) so the type name
  matches the canonical inbound/outbound axis used everywhere else.
  Provider.provider -> Provider.type so the field matches the
  AuthSource.type discriminator pattern. TransformMeta.provider ->
  .provider_type, TransformMeta.listener_format -> .inbound_format.
  Dispatch kwarg renames: upstream_provider/provider -> provider_type,
  listener_format -> inbound_format. Metadata key ccproxy.listener_format
  -> ccproxy.inbound_format. _select_listener_format ->
  _select_inbound_format. Nix-side YAML: providers.X.provider ->
  providers.X.type in nix/defaults.nix + bundled template.

- Context.extras. ~60 LOC typed accessor (.get/.set/.delete/.has) over
  ctx._body via glom, exposed as layer 3 of the three-layer access model
  alongside the header and typed-IR layers. Existing glom(ctx._body, ...)
  callers stay valid; migration is opportunistic.

- HookDAG.render(). Emits stateDiagram-v2 mermaid markup walking the
  topo-sorted execution order with [*] brackets for sources/sinks.
  ccproxy status --mermaid prints inbound + outbound DAGs as paste-ready
  output.
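
Emitting `stateDiagram-v2` markup from a topologically sorted hook list could be done along these lines. The function `render_state_diagram` and its `order`/`edges` inputs are assumptions for illustration; the actual `HookDAG.render()` output may be laid out differently.

```python
def render_state_diagram(order: list[str], edges: dict[str, set[str]]) -> str:
    """Render a DAG as mermaid stateDiagram-v2, marking sources and sinks with [*]."""
    targets = {dst for dsts in edges.values() for dst in dsts}
    lines = ["stateDiagram-v2"]
    for name in order:
        if name not in targets:        # no incoming edge: a source
            lines.append(f"    [*] --> {name}")
    for name in order:
        for dst in sorted(edges.get(name, ())):
            lines.append(f"    {name} --> {dst}")
    for name in order:
        if not edges.get(name):        # no outgoing edge: a sink
            lines.append(f"    {name} --> [*]")
    return "\n".join(lines)
```

Walking the nodes in execution order keeps the emitted text stable between runs, which makes the diagram diff-friendly when pasted into documentation.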

AGENTS.md + docs/lightllm.md updated to reflect the renames, the new
Context.extras layer, and the --mermaid CLI flag. phase4.md added as the
next-session plan for OpenAI Responses (Codex parity).

Verified: 1671 tests pass, mypy clean across 103 source files, grep for
ListenerFormat / listener_format / upstream_provider / _listener_format
returns zero matches in src/ tests/ docs/ AGENTS.md nix/.

Apply Tier 1+2+3 cuts from the removal-candidates plan:

- Delete pure duplicates: Marketplace Plugin Sync, Defaults Flow
  diagram, MCP tool enumeration, transport constants, FlowRecord
  field listing, historical commit references.
- Compress subsystem deep-dives with canonical homes elsewhere:
  lightllm (docs/lightllm.md), Perplexity Pro narrative
  (docs/pplx.md), oauth/sources prose, Anthropic billing two-phase
  signing (regenerate.py docstring), inspector + pipeline per-file
  enumerations, dev-vs-prod section.
- Selective trim: hook table Purpose column to single-sentence form,
  Configuration narrative dedupe, Smoke Test prose, SSL/Logging
  Implementation Notes entries.

Preserve all load-bearing content: both IMPERATIVE blocks (shape
replay; Perplexity docs gate), Triage Principle, three-layer access
model, hook table rows, sentinel-key concept, routing precedence,
Key Constants, Body metadata footgun, SSE streaming + namespace
localhost routing notes.

Enables bidirectional transform for OpenAI's Responses API (used by
Codex CLI). Handles 27-item discriminated union in input[], preserving
reasoning blocks and server-side tool calls via raw_extras for lossless
round-trip.
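
The raw_extras idea can be sketched as follows: model only the fields the transform needs and carry every other key through untouched. The `InputItem` class, its field set, and the sample item are illustrative assumptions; the actual IR for the 27 item types is not shown in this PR text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

# Assumed set of modelled keys; everything else is preserved verbatim.
KNOWN_KEYS = {"type", "role", "content"}


@dataclass
class InputItem:
    type: str
    role: str | None = None
    content: Any = None
    raw_extras: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_wire(cls, item: dict[str, Any]) -> InputItem:
        return cls(
            type=item["type"],
            role=item.get("role"),
            content=item.get("content"),
            # Unmodelled keys ride along so nothing is lost on the way back.
            raw_extras={k: v for k, v in item.items() if k not in KNOWN_KEYS},
        )

    def to_wire(self) -> dict[str, Any]:
        out: dict[str, Any] = {"type": self.type}
        # Limitation of this sketch: an explicit null role/content is dropped.
        if self.role is not None:
            out["role"] = self.role
        if self.content is not None:
            out["content"] = self.content
        out.update(self.raw_extras)
        return out
```

With this shape, an item the transform does not understand (a reasoning block, for example) survives `from_wire` followed by `to_wire` unchanged, since all of its payload lives in `raw_extras`.
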
Implements listener-side rendering for InboundFormat.OPENAI_RESPONSES,
enabling ccproxy to serve OpenAI Codex CLI traffic via the /v1/responses
streaming protocol with per-item and per-content-part lifecycle events.

Introduces an interactive REPL for flow inspection and ships sanitized
default shapes for Anthropic and Gemini providers. User-captured shapes
override bundled defaults. Patch-series support allows incremental shape
modifications via quilt-style unified diffs.

Consolidates shape storage under a single shapes_dir, with provider
patch queues living as {provider}/series subdirectories instead of a
separate patches_dir. Simplifies configuration and aligns the on-disk
layout with the quilt-style patch workflow.

BREAKING CHANGE: removed ShapingConfig.patches_dir; patch queues now
  live under shapes_dir/{provider}/
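
The override rule (user-captured shapes win over bundled defaults) can be sketched as a simple ordered lookup. The `resolve_shape` function and the `{provider}.mflow` file name inside shapes_dir are assumptions for illustration; only the bundled `anthropic.mflow` name appears in this PR, and patch-series handling is left out.

```python
from __future__ import annotations

from pathlib import Path


def resolve_shape(provider: str, shapes_dir: Path, bundled_dir: Path) -> Path | None:
    """Return the first existing shape file, preferring the user's capture."""
    for base in (shapes_dir, bundled_dir):  # order encodes precedence
        candidate = base / f"{provider}.mflow"
        if candidate.is_file():
            return candidate
    return None  # no shape known for this provider
```

Keeping precedence in the iteration order means adding another layer later (for example a per-project directory) is a one-line change.
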
Shape-backed fingerprint profiles (e.g., 'anthropic') now resolve
through provider .mflow metadata instead of requiring hardcoded
curl-cffi browser names. FingerprintCaptureAddon parses native TLS
ClientHello into JA3/JA4 material, ShapeCaptureAddon embeds it in shape
metadata, and transport dispatch replays it via curl-cffi custom
options.
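
As background for the JA3 material mentioned above, the sketch below builds a JA3 string and hash from already-parsed ClientHello fields, following the public JA3 definition (five comma-separated fields, values joined by dashes, GREASE values skipped, MD5 over the result). How FingerprintCaptureAddon parses the ClientHello is not shown in this PR, so the function names and inputs here are assumptions; JA4 is not covered.

```python
import hashlib

# GREASE placeholder values (0x0a0a, 0x1a1a, ..., 0xfafa) are excluded from JA3.
GREASE = {(b << 8) | b for b in range(0x0A, 0x100, 0x10)}


def ja3_string(
    version: int,
    ciphers: list[int],
    extensions: list[int],
    curves: list[int],
    point_formats: list[int],
) -> str:
    """Join the five ClientHello fields in JA3 order, as decimal values."""

    def join(values: list[int]) -> str:
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join(
        [str(version), join(ciphers), join(extensions), join(curves), join(point_formats)]
    )


def ja3_hash(ja3: str) -> str:
    """JA3 fingerprint: MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()
```

For example, `ja3_string(771, [0x0A0A, 4865, 4866], [0x1A1A, 0, 10, 11], [0x2A2A, 29, 23], [0])` gives `"771,4865-4866,0-10-11,29-23,0"`, with the three GREASE entries dropped.
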
Updates all documentation to reflect the supported metadata access
pattern (ctx.metadata / metadata_from_flow) instead of the internal
mitmproxy backing store (flow.metadata). The API itself was already in
place; this aligns docs with actual usage.
…anitizer

- Strip x-ccproxy-flow-id at capture time (shape_capturer)
- New EgressSanitizerAddon: drop ccproxy-internal correlation headers
  before mitmproxy forwards (x-ccproxy-flow-id, -hooks, -oauth-injected)
- Add diagnostics to anthropic content_fields so capturer's
  previous_message_id never replays onto another user's request
- New scripts/package-mflows.py: one-way distillation of personal
  captures into bundled templates (strips identifier headers,
  zeroes metadata.user_id, drops body messages/tools, trims system
  to first 2 entries, keeps only ccproxy.fingerprint.profile in
  flow metadata)
- Pre-commit hook: --verify mode rejects bundled shapes that carry
  capturer identity
- Re-derive src/ccproxy/templates/shapes/anthropic.mflow from a
  fresh capture using the new minimal scrubber
- Delete tests/test_shaping_defaults.py (contained author PII in
  literal hand-curated marker list — replaced by structural verify
  step in package-mflows.py)
- Apply HTTP_CONTENT_DECODING=0 in transport/dispatch +
  CapturedFingerprint.transport_kwargs so curl-cffi stops
  auto-decompressing and mitmproxy decodes Content-Encoding itself
- Extract _default_hooks() factory in config.py (resolves ty
  diagnostic on Field default_factory invariant mismatch)
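
The egress sanitizing step in the list above amounts to a case-insensitive header filter. This is a minimal sketch on a plain dict, not the actual EgressSanitizerAddon; the function name is hypothetical, and the three header names are the ones listed in the commit message.

```python
# ccproxy-internal correlation headers named in the commit message.
INTERNAL_HEADERS = frozenset(
    {"x-ccproxy-flow-id", "x-ccproxy-hooks", "x-ccproxy-oauth-injected"}
)


def sanitize_egress(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of ``headers`` without ccproxy-internal entries.

    HTTP header names are case-insensitive, so compare in lowercase.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in INTERNAL_HEADERS
    }
```

In a mitmproxy addon the same filter would presumably run in the request hook just before forwarding, so upstream providers never see the correlation headers.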
starbaser added 2 commits May 25, 2026 13:32
…an conn state

- package-mflows.py: delete body.metadata.user_id and
  body.diagnostics.previous_message_id keys outright (no zero-UUID
  placeholder). Replace client_conn and server_conn with sanitized
  Connection stubs so the wireguard config path and capture-time IPs
  don't ship in the bundled artifact.
- Re-derive src/ccproxy/templates/shapes/anthropic.mflow from a fresh
  capture using the corrected scrubber (4201 bytes).
- Delete src/ccproxy/templates/shapes/gemini.mflow — its tnetstring
  encoding was corrupted by the history rewrite step; re-capture is
  required (see CODEX_HANDOFF.md).
- Add CODEX_HANDOFF.md documenting session state, what's been done,
  and the remaining tasks: re-capture gemini, add provider-SDK e2e
  tests against the dev daemon, plus open follow-ups.

The package-mflows.py scrubber duplicated the existing apply-time
shaping system. The right approach: bundled .mflow is a faithful
capture; selective application happens at runtime via content_fields,
shape_hooks, merge_strategies, and strip_headers/preserve_headers.

Reverted:
- scripts/package-mflows.py — deleted
- .pre-commit-config.yaml — package-mflows-verify hook removed
- docs/fingerprint.md — 'Bundled vs personal shapes' section removed
- src/ccproxy/templates/shapes/anthropic.mflow — deleted; needs
  re-capture (filter-repo corrupted the original)

Kept (real apply-time fixes from this session):
- shape_capturer.py strip of x-ccproxy-flow-id at capture time
- EgressSanitizerAddon for x-ccproxy-* on outbound
- diagnostics added to anthropic content_fields
- HTTP_CONTENT_DECODING=0 in transport_kwargs
- _default_hooks() factory (ty diagnostic fix)

CODEX_HANDOFF.md rewritten with the corrected plan: re-capture both
bundled .mflow files, extend shaping config (content_fields,
shape_hooks) to cover per-user fields, add provider-SDK e2e tests
against the dev daemon. Bundled scrubbing as a separate packaging
step is explicitly rejected.