Commit 2ab09cd

Modularize oauth21, add gitignore-aware listing with pagination, enforce mypy strict mode
- Split monolithic oauth21.py into a package with _jwt, _pkce, _dpop, _resource, _server, and _types modules
- Add offset pagination to workspace_list_files and workspace_search_text; return pagination cursor in response
- Respect .gitignore and .agignore rules in list_files and search_text
- Enable mypy disallow_untyped_defs globally with targeted overrides
- Add Architecture Decision Records for P1 primitives, Code Mode sandbox, OAuth/DPoP, and MCP Streamable HTTP
- Raise CI coverage threshold to 85%
1 parent ba6d83c commit 2ab09cd
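
The commit message mentions offset pagination with a cursor in the response and ignore-rule filtering. A minimal sketch of that idea follows. The names `filter_ignored` and `paginate` and the response keys are hypothetical, not taken from `workspace_tools`, and the glob matching here is a simplification that does not implement full `.gitignore` semantics.

```python
from fnmatch import fnmatch
from typing import Optional


def filter_ignored(paths: list[str], patterns: list[str]) -> list[str]:
    """Drop paths matching any glob pattern (simplified, not real .gitignore rules)."""
    return [p for p in paths if not any(fnmatch(p, pat) for pat in patterns)]


def paginate(items: list[str], offset: int = 0, limit: int = 100) -> dict:
    """Return one page plus the offset of the next page (None when exhausted)."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    page = items[offset:offset + limit]
    end = offset + limit
    next_offset: Optional[int] = end if end < len(items) else None
    return {"items": page, "offset": offset, "next_offset": next_offset, "total": len(items)}
```

A caller would pass `next_offset` back as `offset` until it comes back as `None`.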

18 files changed

Lines changed: 1157 additions & 878 deletions

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ jobs:
           pip install -e ".[dev]"

       - name: Run tests
-        run: pytest --cov=teaagent --cov-report=term-missing
+        run: pytest --cov=teaagent --cov-report=term-missing --cov-fail-under=85

   lint:
     runs-on: ubuntu-latest

docs/adr/0002-p1-primitives.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+# ADR 0002: P1 Primitives
+
+## Status
+
+Accepted for P1 implementation.
+
+## Decision
+
+Include trace recording, execution context compaction, eval framework, in-memory RAG,
+skill review, and AI-BOM generation as P1 primitives on top of the P0 agent harness.
+
+Each primitive follows the same no-external-dependency policy as P0 (stdlib only).
+
+## Rationale
+
+These primitives compose naturally with the agent harness without inventing new
+interfaces:
+
+- **TraceRecorder** records the agent's observation stream for replay and debugging.
+- **ContextCompactor** compresses long observation lists into summaries, keeping
+  prompts within model context windows.
+- **Eval framework** lets teams measure agent performance on representative tasks
+  before shipping model/prompt changes.
+- **InMemoryRetriever** and **KnowledgeGraph** provide lightweight RAG so the
+  agent can query project knowledge without a vector database.
+- **SkillReview** audits skill content for security and correctness.
+- **AIBOM** generates a bill-of-materials for the agent's dependencies.
+
+## Consequences
+
+- RAG components (`InMemoryRetriever`, `KnowledgeGraph`) are deliberately in-memory
+  so that P2 can swap in GraphQLite-backed persistence without changing the agent
+  contract.
+- Eval framework is deterministic and does not call live LLMs; it uses pre-recorded
+  decision sequences.
+- These primitives add no mandatory dependencies beyond the Python standard library.
+
+## Alternatives Considered
+
+- **LangChain/LlamaIndex for RAG**: Rejected: adds 50+ transitive dependencies
+  for what is essentially TF-IDF similarity search.
+- **pytest for evals**: Rejected: eval is harness-level, not test-level. The eval
+  framework is embedded so the agent can self-assess.
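
ADR 0002 describes the RAG primitive as "essentially TF-IDF similarity search" built on the standard library. The sketch below illustrates that technique only. `TinyTfidfRetriever` and `tokenize` are hypothetical names, and nothing here is taken from the project's `InMemoryRetriever`, whose source is not part of this diff.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyTfidfRetriever:
    """Hypothetical stand-in: ranks documents by TF-IDF cosine similarity."""

    def __init__(self, docs: list[str]) -> None:
        self.docs = docs
        counts = [Counter(tokenize(d)) for d in docs]
        doc_freq = Counter(term for c in counts for term in c)
        n = len(docs)
        # Smoothed inverse document frequency, always positive.
        self.idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
        self.vectors = [self._weigh(c) for c in counts]

    def _weigh(self, counts: Counter) -> dict[str, float]:
        # Terms never seen in the corpus are dropped from the vector.
        return {t: tf * self.idf[t] for t, tf in counts.items() if t in self.idf}

    @staticmethod
    def _cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        """Return up to k (score, document) pairs with a non-zero score."""
        q = self._weigh(Counter(tokenize(query)))
        scored = [(self._cosine(q, v), d) for v, d in zip(self.vectors, self.docs)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(s, d) for s, d in scored[:k] if s > 0]
```

The whole index lives in plain dicts, which is what makes an in-memory design like the one in the ADR easy to replace with a persistent backend later.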
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+# ADR 0003: Code Mode Child-Process Sandbox
+
+## Status
+
+Accepted for P2 implementation.
+
+## Decision
+
+Execute LLM-generated Python code in a detached child process with AST allow-list
+validation, CPU-time limits, wall-clock timeouts, and best-effort memory limits.
+Container-level isolation, seccomp, and V8 isolates are rejected as out of scope for P2.
+
+## Rationale
+
+- AST allow-list validation (`ALLOWED_NODES`) prevents imports, attribute access,
+  function definitions, and other dangerous constructs at parse time.
+- A `multiprocessing.Process` boundary isolates the child's address space from
+  the parent. Even if `exec()` corrupts the child, the parent survives.
+- `RLIMIT_CPU` provides a hard CPU-time ceiling.
+- Wall-clock timeout via `process.join(timeout)` prevents hung code.
+- `RLIMIT_AS` provides advisory memory limits; it is silently a no-op on macOS
+  (macOS rejects lowering `RLIMIT_AS`), but the wall-clock and CPU timeouts still
+  bound resource use.
+
+## Consequences
+
+- Code Mode is safe for *advisory* local data manipulation but is **not**
+  a production-grade sandbox. The child process shares the parent's filesystem
+  view, network namespace, and process namespace.
+- Production deployment should layer containers, seccomp profiles, or a managed
+  execution service on top of this implementation.
+- The `SAFE_BUILTINS` list is deliberately small (math, collection constructors).
+  Expanding it requires an ADR and threat model review.
+
+## Alternatives Considered
+
+- **Subinterpreters (PEP 554)**: Not available in Python 3.9; too bleeding-edge.
+- **Docker/container per execution**: Latency of ~1s per Code Mode call makes
+  it impractical for the dozens of calls an agent may make in one run.
+- **V8/QuickJS isolate via PyMiniRacer or similar**: Adds a C extension dependency
+  and a second language runtime; violates the "stdlib-only P0" policy.
+- **RestrictedPython**: Transforms source code but still executes in-process;
+  offers no resource limits. Rejected in favor of the process boundary.
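
The first layer ADR 0003 describes is parse-time AST allow-list validation. A minimal sketch of that step is below. The `ALLOWED` tuple and the `validate` function are hypothetical and far smaller than whatever the project's `ALLOWED_NODES` contains, which this diff does not show. The process boundary and resource limits are not covered here.

```python
import ast

# Hypothetical allow-list for illustration only; the project's ALLOWED_NODES
# is not visible in this commit.
ALLOWED = (
    ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.List, ast.Tuple, ast.Call,
)


def validate(source: str) -> None:
    """Raise ValueError if the source uses any node type outside the allow-list."""
    tree = ast.parse(source, mode="exec")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
```

Because `ast.Import`, `ast.Attribute`, and `ast.FunctionDef` are absent from the tuple, `import os` and `().__class__` are rejected before anything executes. Validation alone does not bound CPU or memory, which is why the ADR pairs it with a child process and rlimits.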

docs/adr/0004-oauth-dpop.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+# ADR 0004: OAuth 2.1 + DPoP with Optional Dependencies
+
+## Status
+
+Accepted for P2 implementation.
+
+## Decision
+
+Implement OAuth 2.1 Authorization Server and Resource Server with DPoP
+proof-of-possession directly in TeaAgent, using a zero-dependency HMAC-SHA256
+core with the optional `cryptography` library for asymmetric DPoP signature
+verification (ES256/RS256).
+
+## Rationale
+
+- MCP Streamable HTTP requires authentication for non-loopback binds.
+  The minimal viable option is a bearer token, but bearer tokens are
+  replayable by any network observer.
+- DPoP (RFC 9449) binds access tokens to a client-generated asymmetric key pair,
+  preventing token replay. This is the only OAuth extension that materially
+  improves MCP transport security without requiring TLS.
+- HS256 is the only JWT algorithm available without `cryptography`. When DPoP
+  is disabled, HS256 access tokens are sufficient for the bearer token use case.
+- The optional-dependency design (`pip install teaagent[oauth]`) keeps the
+  P0 zero-dependency posture intact while making DPoP available on demand.
+
+## Consequences
+
+- At 851 lines, the module was split into `oauth21/` submodules (`_jwt.py`,
+  `_dpop.py`, `_types.py`, `_server.py`, `_resource.py`, `_pkce.py`) for
+  maintainability.
+- Internal state (clients, codes, nonces) is in-memory dicts. Multi-process
+  or persistent deployments require an external store.
+- Key rotation, refresh tokens, and external client storage are deferred to
+  a future production-hardening ADR.
+- The `cryptography` import is conditional (`HAS_CRYPTOGRAPHY` flag).
+
+## Alternatives Considered
+
+- **Authlib**: Adds 10+ dependencies; overkill for a single JWT sign/verify
+  plus DPoP validation.
+- **PyJWT**: HMAC-only JWT is 20 lines; adding a dependency for it is wasteful.
+- **TLS-only (no DPoP)**: Rejected because the MCP server is designed to run
+  behind a reverse proxy and DPoP protects against replay even after TLS
+  termination.
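
ADR 0004 argues that an HMAC-only JWT needs about 20 lines of standard-library code. A sketch of what HS256 signing and verification could look like with `hmac`, `hashlib`, and `base64` follows. The function names `sign_hs256` and `verify_hs256` are hypothetical and this is not the contents of `_jwt.py`, which the diff does not show.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims, or raise ValueError on any validation failure."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

Pinning the accepted algorithm to `HS256` during verification is the important design choice: it keeps a token that declares `alg: none` or another algorithm from being accepted.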
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+# ADR 0005: MCP Streamable HTTP Transport
+
+## Status
+
+Accepted for P2 implementation.
+
+## Decision
+
+Expose the TeaAgent workspace tool pack to MCP clients over stdio JSON-RPC
+and Streamable HTTP (POST/GET/DELETE on `/mcp`) with `Mcp-Session-Id` session
+management, bearer-token/OAuth 2.1 guardrails, and Origin allowlisting.
+
+## Rationale
+
+- stdio JSON-RPC (`serve_mcp_stdio`) provides zero-config integration with
+  MCP clients that launch the server as a subprocess (Claude Desktop, Zed, etc.).
+- Streamable HTTP (`serve_mcp_http`) enables remote IDE agents, web-based
+  clients, and multi-session scenarios. It follows the MCP Streamable HTTP
+  draft with SSE-based streaming on GET and JSON-RPC on POST.
+- Session management via the `Mcp-Session-Id` header prevents cross-session
+  confusion and supports session termination via DELETE.
+- Authentication is enforced at two layers:
+  1. CLI layer: refuses non-loopback binds without `--auth-token` or OAuth.
+  2. Library layer: `build_mcp_http_server()` raises `ValueError` on
+     non-loopback binds without `auth_token` or `oauth_server`.
+
+## Consequences
+
+- The server uses `ThreadingHTTPServer` from stdlib with a simple in-memory
+  `MCPSessionStore`. This is sufficient for single-machine use but not for
+  high-concurrency production deployments.
+- TLS is not implemented natively; a reverse proxy (nginx, Caddy) must
+  terminate TLS for external access.
+- DPoP nonce negotiation and OAuth metadata endpoints are served under the
+  same HTTP handler, enabling fully featured MCP authorization.
+- Batch JSON-RPC requests are supported; notifications (no `id`) return HTTP 202.
+
+## Alternatives Considered
+
+- **WebSocket transport**: The MCP specification favors Streamable HTTP over
+  WebSocket for simpler HTTP semantics. SSE provides server-to-client push
+  without the WebSocket upgrade handshake.
+- **gRPC**: Rejected: MCP is JSON-RPC-based; gRPC would require a separate
+  protocol definition and client SDK.
+- **aiohttp/FastAPI**: Rejected: adds framework dependencies for what is
+  a single-path HTTP handler. `ThreadingHTTPServer` is sufficient for the
+  target scale (single-digit concurrent MCP clients).
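
ADR 0005 says the library layer raises `ValueError` when a non-loopback bind has neither `auth_token` nor `oauth_server`. One way such a guard might be written is sketched below. `is_loopback` and `check_bind` are hypothetical helpers, not the body of `build_mcp_http_server()`, and treating unresolvable hostnames as non-loopback is an assumption made here to fail closed.

```python
import ipaddress
from typing import Optional


def is_loopback(host: str) -> bool:
    """True for 'localhost' and loopback IP literals such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal; assume it is reachable from outside (fail closed).
        return False


def check_bind(
    host: str,
    auth_token: Optional[str] = None,
    oauth_server: Optional[object] = None,
) -> None:
    """Refuse to expose the server beyond loopback without some form of auth."""
    if not is_loopback(host) and auth_token is None and oauth_server is None:
        raise ValueError(
            f"refusing to bind {host!r} without auth_token or oauth_server"
        )
```

Repeating the check in the library as well as the CLI means code that embeds the server directly cannot bypass it by skipping the command-line entry point.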

pyproject.toml

Lines changed: 21 additions & 2 deletions
@@ -43,12 +43,31 @@ pythonpath = ["."]
 python_version = "3.9"
 ignore_missing_imports = true
 exclude = ["tests/"]
-disallow_untyped_defs = false
-disallow_incomplete_defs = false
+disallow_untyped_defs = true
+disallow_incomplete_defs = true
 check_untyped_defs = true
 warn_unused_ignores = false
 warn_redundant_casts = true
 no_implicit_optional = true

+[[tool.mypy.overrides]]
+module = [
+    "teaagent.cli._agent_parsers",
+    "teaagent.cli._memory_parsers",
+    "teaagent.cli._mcp_parsers",
+    "teaagent.cli._misc_parsers",
+    "teaagent.cli._model_parsers",
+    "teaagent.code_mode",
+    "teaagent.llm",
+    "teaagent.llm_conformance",
+    "teaagent.mcp_http",
+    "teaagent.oauth21.*",
+    "teaagent.telemetry",
+    "teaagent.tui",
+    "teaagent.workspace_tools",
+]
+disallow_untyped_defs = false
+disallow_incomplete_defs = false
+
 [tool.ruff]
 extend-exclude = [".git", ".teaagent", "__pycache__", ".pytest_cache", ".venv", "build", "dist"]

teaagent/chat_agent.py

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ class ChatAgentConfig:
     approval_handler: Optional[ApprovalHandler] = None

     @classmethod
-    def from_root(cls, root: str | Path, **kwargs) -> 'ChatAgentConfig':
+    def from_root(cls, root: str | Path, **kwargs: Any) -> 'ChatAgentConfig':
         return cls(root=Path(root).resolve(), **kwargs)

teaagent/heartbeat.py

Lines changed: 3 additions & 3 deletions
@@ -2,21 +2,21 @@

 import threading
 import time
+from collections.abc import Callable
 from typing import Optional

 from teaagent.audit import AuditLogger


 class Heartbeat:
-    """Periodic 'heartbeat' audit events for long-running runs."""

     def __init__(
         self,
         audit: AuditLogger,
         run_id: str,
         *,
         interval_seconds: float,
-        sleep=time.sleep,
+        sleep: Callable[[float], None] = time.sleep,
     ) -> None:
         if interval_seconds <= 0:
             raise ValueError('interval_seconds must be positive')
@@ -32,7 +32,7 @@ def __enter__(self) -> 'Heartbeat':
         self.start()
         return self

-    def __exit__(self, exc_type, exc, tb) -> None:
+    def __exit__(self, exc_type: object, exc_val: object, exc_tb: object) -> None:
         self.stop()

     def tick(self) -> None:
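
The `sleep: Callable[[float], None] = time.sleep` parameter in `heartbeat.py` is a dependency-injection seam: a test can pass a fake that records the requested delays instead of waiting. The sketch below shows the pattern with a hypothetical `Ticker` class. It is not the `Heartbeat` implementation, most of which lies outside this hunk.

```python
import time
from collections.abc import Callable


class Ticker:
    """Hypothetical stand-in for Heartbeat: waits via an injected sleep function."""

    def __init__(
        self,
        *,
        interval_seconds: float,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if interval_seconds <= 0:
            raise ValueError("interval_seconds must be positive")
        self.interval_seconds = interval_seconds
        self._sleep = sleep
        self.ticks = 0

    def run(self, count: int) -> None:
        """Sleep for one interval before each tick, count times in total."""
        for _ in range(count):
            self._sleep(self.interval_seconds)
            self.ticks += 1
```

Passing `recorded.append` as `sleep` makes a run finish instantly while capturing every delay, and the annotation lets mypy reject a replacement with the wrong signature.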
