66 changes: 66 additions & 0 deletions packages/adapters/CONVENTIONS.md
@@ -0,0 +1,66 @@
# Conventions — `@agentskit/adapters`

The provider layer. Every file in this package maps one LLM or one embedding provider to AgentsKit's stable contracts.

## Scope

- **Chat adapters** — implement `AdapterFactory` per [ADR 0001](../../docs/architecture/adrs/0001-adapter-contract.md)
- **Embedders** — implement `EmbedFn` per [ADR 0003](../../docs/architecture/adrs/0003-memory-contract.md)
- **No UI.** No React, no Ink, no CLI here.
- **No runtime logic.** No loops, no tool execution. Just transport.

## Adding a new chat adapter

1. Create `src/<provider>.ts`. Export a factory function that returns `AdapterFactory`.
2. Accept configuration at construction time only: `apiKey`, `model`, `baseUrl` as needed.
3. In `createSource`, build the request but **do not fetch yet**. Defer all I/O to `stream()` — invariant A1.
4. In `stream()`, use the SSE utility from `src/utils.ts` if the provider speaks server-sent events. Otherwise write a parser that respects the chunk shape in `@agentskit/core`.
5. Always end with `{ type: 'done' }`, an error chunk, or iterator return on abort — invariant A3.
6. Yield `{ type: 'text', content }` for text deltas. Yield `{ type: 'tool_call', toolCall: { id, name, args } }` with **complete args** per invariant A5.
7. Put provider-specific data in `chunk.metadata` (usage counts, raw response, reasoning). Consumers must not depend on its shape — A8.
8. Re-export from `src/index.ts`.
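
The steps above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the `StreamChunk`, `StreamSource`, and `AdapterFactory` types are simplified local stand-ins for the real contract in `@agentskit/core`, and `exampleAdapter` with its injected `transport` option is hypothetical (a real adapter would call the provider's HTTP API inside `stream()`).

```ts
// Simplified stand-ins for the real contract types (assumptions, see ADR 0001).
type StreamChunk =
  | { type: 'text'; content: string }
  | { type: 'tool_call'; toolCall: { id: string; name: string; args: string } }
  | { type: 'error'; metadata: { error: unknown } }
  | { type: 'done' }

interface ChatMessage { role: string; content: string }

interface StreamSource {
  stream(): AsyncIterable<StreamChunk>
  abort(): void
}

interface AdapterFactory {
  createSource(request: { messages: readonly ChatMessage[] }): StreamSource
}

// Hypothetical transport: stands in for the provider's streaming HTTP call.
type Transport = (req: { apiKey: string; model: string; messages: ChatMessage[] }) => AsyncIterable<string>

interface ExampleAdapterOptions { apiKey: string; model: string; transport: Transport }

function exampleAdapter(opts: ExampleAdapterOptions): AdapterFactory {
  return {
    createSource(request) {
      // Build the wire request from a copy of the input; no I/O happens here (A1).
      const wire = { apiKey: opts.apiKey, model: opts.model, messages: request.messages.map((m) => ({ ...m })) }
      let aborted = false
      return {
        abort() { aborted = true },
        async *stream(): AsyncGenerator<StreamChunk> {
          try {
            for await (const delta of opts.transport(wire)) {
              if (aborted) return // iterator return on abort (A3)
              yield { type: 'text', content: delta }
            }
            yield { type: 'done' } // normal termination (A3)
          } catch (error) {
            // Provider failures become an error chunk instead of a throw.
            yield { type: 'error', metadata: { error } }
          }
        },
      }
    },
  }
}
```

Injecting the transport keeps the sketch testable without a network; whether a real adapter exposes such an option is a design decision this example does not settle.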

## Adding a new embedder

1. Create `src/embedders/<provider>.ts`. Export a factory returning `EmbedFn`.
2. Accept `apiKey`, `model` at construction.
3. Return a function of `(text: string) => Promise<number[]>`.
4. Must be stable: same input + same model = same vector. No randomness — invariant E1.
5. Re-export from `src/embedders/index.ts` and from `src/index.ts`.
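
A rough sketch of that shape, assuming a local `EmbedFn` alias that matches step 3. The `exampleEmbedder` factory and its injected `request` option are hypothetical, and `toyVector` is only a deterministic stand-in for a provider response so the stability requirement (E1) can be checked offline.

```ts
// Local alias mirroring step 3; the real type is defined by ADR 0003.
type EmbedFn = (text: string) => Promise<number[]>

interface ExampleEmbedderOptions {
  apiKey: string
  model: string
  // Hypothetical: stands in for the provider's embeddings endpoint.
  request: (input: { apiKey: string; model: string; text: string }) => Promise<number[]>
}

function exampleEmbedder(opts: ExampleEmbedderOptions): EmbedFn {
  return async (text) => {
    const vector = await opts.request({ apiKey: opts.apiKey, model: opts.model, text })
    // Return a copy so callers cannot mutate a cached provider response.
    return [...vector]
  }
}

// Deterministic toy backend for tests: sums character codes into buckets.
function toyVector(text: string, dimensions: number): number[] {
  const vector = new Array<number>(dimensions).fill(0)
  for (let i = 0; i < text.length; i++) {
    vector[i % dimensions] += text.charCodeAt(i)
  }
  return vector
}
```

A test can then call the embedder twice with the same input and compare the vectors element by element.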

## Naming

- File name matches the provider: `openai.ts`, `anthropic.ts`, `gemini.ts`, etc.
- Factory function matches the provider lowercase: `openai(opts)`, `anthropic(opts)`.
- Options interface: `OpenAIAdapterOptions`, `AnthropicAdapterOptions`.
- Types internal to one adapter live in the same file; shared types go in `src/types.ts`.

## Testing

For every new adapter:

- **Contract test** using the shared `AdapterContractSuite` (when it lands — for now, at minimum run the ten invariants mentally against your implementation)
- **Stream parsing test** with a recorded fixture (JSON file of SSE chunks) so tests are fast and deterministic
- **Error path test** — what happens on 401, 429, 500, malformed response
- **Abort test** — `stream()` iteration terminates when `abort()` is called mid-flight

Tests live in `tests/<provider>.test.ts`.

## Common pitfalls

| Pitfall | What to do instead |
|---|---|
| Calling `fetch` from `createSource` | Defer to `stream()` |
| Mutating the input `messages` array | Copy if you need to transform for the wire format |
| Throwing from `stream()` on a provider error | Emit `{ type: 'error', metadata: { error } }` |
| Streaming partial tool-call args across multiple chunks | v1 requires complete args in one chunk. Buffer internally. |
| Exposing provider SDK types in your public API | Keep the public surface limited to `AdapterFactory` |
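
One way to handle the partial tool-call pitfall is to accumulate fragments and emit a call only once the provider signals completion. The sketch below is illustrative: the `ToolCallFragment` shape and `createToolCallBuffer` helper are hypothetical, and treating `args` as a JSON string is an assumption about the wire format.

```ts
// Hypothetical fragment shape; providers differ in how they split tool calls.
interface ToolCallFragment { index: number; id?: string; name?: string; argsDelta?: string }
interface CompleteToolCall { id: string; name: string; args: string }

function createToolCallBuffer() {
  const pending = new Map<number, CompleteToolCall>()
  return {
    // Merge one streamed fragment into the call it belongs to.
    push(fragment: ToolCallFragment): void {
      const current = pending.get(fragment.index) ?? { id: '', name: '', args: '' }
      pending.set(fragment.index, {
        id: fragment.id ?? current.id,
        name: fragment.name ?? current.name,
        args: current.args + (fragment.argsDelta ?? ''),
      })
    },
    // Return all buffered calls in provider order and reset the buffer.
    flush(): CompleteToolCall[] {
      const calls = Array.from(pending.entries())
        .sort((a, b) => a[0] - b[0])
        .map((entry) => entry[1])
      pending.clear()
      return calls
    },
  }
}
```

The adapter would call `flush()` when the provider marks the tool calls as finished and yield one `tool_call` chunk per returned entry.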

## Review checklist for this package

- [ ] Implements all ten invariants A1–A10
- [ ] Bundle size under 20KB gzipped (tightens over time)
- [ ] Coverage threshold holds (60% lines; aiming for 80%)
- [ ] Contract-tested against the ten invariants
- [ ] SSE parsing uses `src/utils.ts` helpers where possible
- [ ] README updated if the public export surface changed
54 changes: 54 additions & 0 deletions packages/cli/CONVENTIONS.md
@@ -0,0 +1,54 @@
# Conventions — `@agentskit/cli`

The `agentskit` command-line interface. The entry point for people who want to try AgentsKit without writing code first.

## Scope

- `agentskit chat` — interactive Ink chat with any provider
- `agentskit init` — scaffold a new project
- `agentskit run` — execute runtime agents from the terminal
- Future: `agentskit doctor`, `agentskit dev`, `agentskit tunnel` (tracked in Phase 1)

## Adding a new command

1. Create `src/commands/<name>.ts`.
2. Export a function that takes parsed arguments and runs the command — no classes.
3. Wire the command in `src/bin.ts` using the existing argv parser.
4. Print help output that fits on one screen (`--help` reads as documentation).
5. Exit cleanly with `process.exit(code)` only at the top level. Never in a library function.
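
As a sketch of steps 2 and 5, a command can be a plain function that receives parsed arguments and returns an exit code, leaving `process.exit` to `src/bin.ts`. The `helloCommand` name, its arguments, and the injected `CommandIO` are hypothetical; the exit codes follow the review checklist for this package, and letting `--json` take precedence over `--quiet` is an assumption.

```ts
// Hypothetical I/O seam so the command can be unit-tested without a terminal.
interface CommandIO {
  stdout: (line: string) => void
  stderr: (line: string) => void
}

interface HelloArgs { name?: string; json: boolean; quiet: boolean }

// Returns an exit code: 0 success, 2 usage error.
function helloCommand(args: HelloArgs, io: CommandIO): number {
  if (!args.name) {
    io.stderr('error: --name is required')
    return 2
  }
  if (args.json) {
    // Structured output goes to stdout.
    io.stdout(JSON.stringify({ greeting: `Hello, ${args.name}` }))
  } else if (!args.quiet) {
    io.stdout(`Hello, ${args.name}`)
  }
  return 0
}

// Only the top-level entry point would translate the code into an exit:
//   process.exit(helloCommand(parsedArgs, io))
```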

## Output conventions

- Keep terminal output terse. One line per meaningful event.
- Use `chalk` or Ink for color. Do not hardcode ANSI codes.
- Respect `--quiet` and `--json` flags where applicable.
- Errors go to stderr; structured output goes to stdout.

## Flag conventions

- Short form (`-p`) for frequent flags, long form (`--provider`) always present.
- Defaults shown in `--help`.
- Mutually-exclusive flags fail fast with a clear error.
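
A small sketch of the fail-fast check for mutually exclusive flags. The helper name and the `file`/`stdin` flag names are purely illustrative and are not flags this CLI is known to define.

```ts
// Hypothetical helper: returns an error message for the first conflicting
// group, or undefined when the flags are compatible.
function findFlagConflict(
  flags: Readonly<Record<string, unknown>>,
  exclusiveGroups: readonly (readonly string[])[],
): string | undefined {
  for (const group of exclusiveGroups) {
    // A flag counts as "set" when it is present and not explicitly false.
    const set = group.filter((name) => flags[name] !== undefined && flags[name] !== false)
    if (set.length > 1) {
      return `error: ${set.map((name) => `--${name}`).join(' and ')} cannot be used together`
    }
  }
  return undefined
}
```

Running this before any command work starts means the user sees the conflict immediately; the caller would print the message to stderr and return the usage-error exit code.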

## Testing

- Use `vitest` with child-process spawns for e2e coverage of the `bin.ts` entry.
- Unit-test individual commands with mocked adapters.
- Test fixtures live in `tests/fixtures/`.

## Common pitfalls

| Pitfall | What to do instead |
|---|---|
| Using `process.exit` in a library function | Return an exit code from the command function; only `bin.ts` calls `process.exit` |
| Reading `process.argv` outside `bin.ts` | Pass parsed args down |
| Hardcoding provider names | Accept `--provider <name>` and route to the right adapter |
| Emitting unstructured text with `--json` set | Emit JSON; add `--format=json` if both are needed |

## Review checklist for this package

- [ ] Bundle size under 20KB gzipped
- [ ] Coverage threshold holds (30%, climbing)
- [ ] `--help` output is one screen and accurate
- [ ] Spawn-based e2e test for the new command
- [ ] Exit codes: 0 success, 1 expected failure, 2 usage error
62 changes: 62 additions & 0 deletions packages/core/CONVENTIONS.md
@@ -0,0 +1,62 @@
# Conventions — `@agentskit/core`

The sacred package. Every rule here is stricter than the rest of the monorepo.

## Non-negotiables

- **Zero runtime dependencies.** `dependencies` in `package.json` is empty and stays empty. Never add one, not even a "small" one.
- **Under 10KB gzipped.** CI enforces this with `size-limit`. If you're pushing the limit, the change is too big.
- **Contracts first.** Public types and interfaces for every contract live here — Adapter, Tool, Memory, Retriever, Skill, Runtime. Implementations live in other packages.
- **Named exports only.** No default exports, anywhere, ever.
- **No `any`.** Use `unknown` and narrow with type guards.
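
To illustrate the `unknown`-plus-type-guard rule, here is a minimal sketch. `isRecord` and `parseJsonObject` are hypothetical names chosen for this example; they are not the package's `safeParseArgs` and make no claim about how it behaves.

```ts
// Type guard: narrows `unknown` to a plain object with string keys.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Parses JSON without leaking `any`: the result is `unknown` until narrowed.
function parseJsonObject(raw: string): Record<string, unknown> | undefined {
  let parsed: unknown
  try {
    parsed = JSON.parse(raw)
  } catch {
    return undefined // malformed JSON
  }
  return isRecord(parsed) ? parsed : undefined
}
```

Callers get either a safely typed object or `undefined`, so no downstream code needs a cast.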

## What belongs here

- **Types and interfaces** for the six core contracts (ADRs 0001–0006)
- **Shared primitives** reused by multiple packages: `createEventEmitter`, `safeParseArgs`, `consumeStream`, message-building helpers
- **The chat controller** (`controller.ts`) — headless state machine for a chat session
- **The agent loop core** (`agent-loop.ts`) — the substrate the runtime builds on

## What does NOT belong here

- Any provider SDK or API client → `@agentskit/adapters`
- Any React hook or component → `@agentskit/react`
- Any Ink component → `@agentskit/ink`
- Any file I/O → `@agentskit/memory` or a package that's not zero-dep
- Any `node:*` import that's not available on every runtime we target (edge, Deno, browser)

## Adding a new primitive

1. Is the thing a **contract type**? Put it in `src/types/*.ts` and re-export from `src/types/index.ts`. Write an ADR if it's cross-package.
2. Is it a **reusable helper** used by 2+ packages? Put it in `src/primitives.ts` or a dedicated file, export from `src/index.ts`.
3. Write unit tests that exercise only the public export. Do **not** reach into internals.

Every addition raises the bundle size. Run `pnpm size` in the repo root and verify the core budget still holds.

## Testing

- Pure unit tests with `vitest`. Environment is `node`.
- Avoid mocks — test real functions with real inputs.
- Mocked adapters for stream-related tests are acceptable since the Adapter contract is the seam.

## Files you can edit without an ADR

- Bug fixes that don't change exported types
- New internal helpers (not exported)
- JSDoc improvements
- Test additions

## Files that require an ADR first

- Any `src/types/*.ts` change that alters an exported type
- Any new exported function or class
- Anything that grows the bundle by more than ~500 bytes gzipped

## Review checklist for this package

- [ ] No new runtime dependency (check `package.json`)
- [ ] Bundle size under 10KB gzipped (`pnpm size`)
- [ ] Coverage threshold holds (75% lines)
- [ ] No `any` introduced
- [ ] Named exports only
- [ ] ADR linked if a contract changed
56 changes: 56 additions & 0 deletions packages/eval/CONVENTIONS.md
@@ -0,0 +1,56 @@
# Conventions — `@agentskit/eval`

Agent evaluation and benchmarking. Treats agents like production systems: scored, checked for regressions, and tracked over time.

## Stability tier: `beta`

The core `runEval` call is stable. Reporters, metrics, and the dataset shape may gain fields in minor bumps.

## Scope

- `runEval({ runtime, dataset, concurrency })` — runs a dataset, returns a report
- Scoring helpers (exact-match, regex, LLM-as-judge)
- Reporters (console, JSON file; more coming)
- Types: `EvalCase`, `EvalReport`, `ScoreFn`

## Design principles

- **Evaluation is testing for non-determinism.** Consumers should use `vitest` or similar as the runner; this package provides the primitives.
- **Scores are numbers in `[0, 1]`.** Boolean outcomes coerce (`true` → 1, `false` → 0).
- **Every metric is optional.** Latency, cost, tokens: report if available, skip otherwise.
- **Replay-first** (future): when deterministic replay lands, eval runs should be reproducible from a recorded trace.
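
The score convention can be sketched as below. The `ScoreResult` alias, `normalizeScore`, and `exactMatch` are hypothetical names, and rejecting out-of-range numbers (rather than clamping them) is an assumption the conventions do not spell out.

```ts
// A scorer may return a boolean or a number; both normalize to [0, 1].
type ScoreResult = number | boolean

function normalizeScore(result: ScoreResult): number {
  if (typeof result === 'boolean') return result ? 1 : 0
  // The negated range check also rejects NaN.
  if (!(result >= 0 && result <= 1)) {
    throw new RangeError(`score must be in [0, 1], got ${result}`)
  }
  return result
}

// Illustrative exact-match scorer that ignores surrounding whitespace.
function exactMatch(expected: string): (actual: string) => boolean {
  return (actual) => actual.trim() === expected.trim()
}
```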

## Adding a metric

1. Add the field to `EvalReport` in `src/types.ts`.
2. Compute it in `runEval`'s aggregation loop.
3. Make it optional — some runtimes/adapters won't have it.
4. Document in the package README.
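
A sketch of step 3, showing aggregation that tolerates missing values. `tokensUsed` is the field named in this document; `latencyMs`, the report field names, and the helper functions are illustrative assumptions rather than the actual `EvalReport` shape.

```ts
// Per-case metrics; every field may be absent.
interface CaseMetrics { latencyMs?: number; tokensUsed?: number }

// Hypothetical report slice containing only the aggregated metrics.
interface MetricSummary { meanLatencyMs?: number; meanTokensUsed?: number }

// Mean over the values that exist; undefined when none do.
function meanOfDefined(values: readonly (number | undefined)[]): number | undefined {
  const present = values.filter((v): v is number => v !== undefined)
  if (present.length === 0) return undefined
  return present.reduce((sum, v) => sum + v, 0) / present.length
}

function summarizeMetrics(results: readonly CaseMetrics[]): MetricSummary {
  const summary: MetricSummary = {}
  const latency = meanOfDefined(results.map((r) => r.latencyMs))
  const tokens = meanOfDefined(results.map((r) => r.tokensUsed))
  // Omit a metric entirely instead of reporting 0 or NaN for it.
  if (latency !== undefined) summary.meanLatencyMs = latency
  if (tokens !== undefined) summary.meanTokensUsed = tokens
  return summary
}
```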

## Adding a reporter

1. Create `src/reporters/<name>.ts`.
2. Export a factory: `export function jsonReporter(opts): Reporter`.
3. `Reporter` has `onCase(case, result)` and `onComplete(report)` events.
4. Keep it synchronous where possible; non-blocking where not.
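
A reporter following these steps might look like the sketch below. The `EvalCase`, `CaseResult`, `EvalReport`, and `Reporter` interfaces are simplified local stand-ins, `linesReporter` and its `write` option are hypothetical, and the first parameter is named `evalCase` because `case` is a reserved word in TypeScript.

```ts
// Simplified stand-ins for the package's types (assumptions).
interface EvalCase { id: string; input: string }
interface CaseResult { score: number }
interface EvalReport { total: number; meanScore: number }

interface Reporter {
  onCase(evalCase: EvalCase, result: CaseResult): void
  onComplete(report: EvalReport): void
}

// Hypothetical reporter that writes one line per event through an injected sink.
function linesReporter(opts: { write: (line: string) => void }): Reporter {
  return {
    onCase(evalCase, result) {
      opts.write(`${evalCase.id}: ${result.score.toFixed(2)}`)
    },
    onComplete(report) {
      opts.write(`mean score ${report.meanScore.toFixed(2)} over ${report.total} cases`)
    },
  }
}
```

Injecting `write` keeps the reporter synchronous and lets tests capture output in an array.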

## Testing

- Unit tests for scorers and aggregation with deterministic fixtures.
- Integration test that runs a tiny dataset against a mock runtime end-to-end.

## Common pitfalls

| Pitfall | What to do instead |
|---|---|
| Blocking tests on real model calls | Use deterministic mock adapters |
| Assuming every result has `tokensUsed` | Make metrics optional |
| Scoring via string equality on LLM outputs | Use LLM-as-judge for fuzzy outputs |
| Mutating input dataset | Treat `EvalCase[]` as read-only |

## Review checklist for this package

- [ ] Bundle size under 10KB gzipped
- [ ] Coverage threshold holds (95% lines — mostly pure logic)
- [ ] New metric documented in README
- [ ] No hard dependency on any one adapter or reporter
64 changes: 64 additions & 0 deletions packages/ink/CONVENTIONS.md
@@ -0,0 +1,64 @@
# Conventions — `@agentskit/ink`

Terminal UI components for AgentsKit. Mirrors `@agentskit/react`'s surface but for Ink.

## Scope

- **Ink components** — `ChatContainer`, `Message`, `InputBar`, `ThinkingIndicator`, `ToolCallView`
- **Ink hooks** — thin wrappers around `@agentskit/core` primitives for Ink-friendly consumption
- Input handling that respects terminal raw-mode semantics

## What does NOT belong here

- React DOM components → `@agentskit/react`
- Autonomous runtime → `@agentskit/runtime`
- Anything requiring a DOM

## Adding a new component

1. Create `src/components/<Name>.tsx`. PascalCase.
2. Use only `ink` primitives — `Box`, `Text`, `useInput`, `useFocus`, etc.
3. No ANSI escape codes in component logic; let `ink` handle rendering.
4. Re-export from `src/components/index.ts` and from `src/index.ts`.

## Input handling

- Use `ink`'s `useInput` hook. Do not read stdin directly.
- Gate input on `chat.status` — block input while `streaming`.
- Respect the `disabled` prop everywhere a component accepts user input.
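
The gating rules can be collected into one pure predicate that every `useInput` handler consults first. This is only a sketch: `canAcceptInput` is a hypothetical helper, and the `ChatStatus` values other than `'streaming'` are assumptions.

```ts
// Only 'streaming' is named in this document; the other states are assumed.
type ChatStatus = 'idle' | 'streaming' | 'error'

// True when a component may act on keyboard input.
function canAcceptInput(state: { status: ChatStatus; disabled?: boolean }): boolean {
  return state.disabled !== true && state.status !== 'streaming'
}
```

A handler would then start with `if (!canAcceptInput({ status: chat.status, disabled })) return`.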

## Testing

- `ink-testing-library@4` does **not** route stdin through `ink@7`'s input pipeline. Keyboard-input tests must mock `useInput` directly:

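```tsx
import { vi } from 'vitest'
import type { Key } from 'ink'

// Holds the handler the component registers through `useInput`.
let captured: ((input: string, key: Key) => void) | undefined

vi.mock('ink', async () => {
  const actual = await vi.importActual<typeof import('ink')>('ink')
  return {
    ...actual,
    // Capture the handler instead of wiring it to stdin.
    useInput: (handler: (input: string, key: Key) => void) => {
      captured = handler
    },
  }
})

// In tests, call captured!(input, key) directly.
```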

- Rendering-only tests work fine with `ink-testing-library`.

## Common pitfalls

| Pitfall | What to do instead |
|---|---|
| Writing ANSI codes manually | Use `Text color={...}` |
| Reading stdin directly | Use `useInput` |
| Forgetting to gate input on `streaming` | Check `chat.status !== 'streaming'` before every action |
| Assuming 80 columns | Use `useStdout` and read `stdout.columns` / `stdout.rows` |

## Review checklist for this package

- [ ] Bundle size under 15KB gzipped
- [ ] Coverage threshold holds (60% lines)
- [ ] Uses `ink` primitives only (no raw ANSI)
- [ ] Keyboard tests mock `useInput` per the pattern above
- [ ] Works in narrow terminals (test at 40 columns)
63 changes: 63 additions & 0 deletions packages/memory/CONVENTIONS.md
@@ -0,0 +1,63 @@
# Conventions — `@agentskit/memory`

Memory backends implementing the two contracts from [ADR 0003](../../docs/architecture/adrs/0003-memory-contract.md): `ChatMemory` and `VectorMemory`.

## Scope

- **ChatMemory implementations**: `fileChatMemory`, `sqliteChatMemory`, `redisChatMemory`
- **VectorMemory implementations**: `fileVectorMemory`, `redisVectorMemory`
- Shared client helpers where reuse is genuine (`redis-client.ts`, `vector-store.ts`)

## Adding a new ChatMemory backend

1. Create `src/<name>-chat.ts`.
2. Export a factory: `export function sqliteChatMemory(opts): ChatMemory`.
3. Implement the six invariants CM1–CM6:
   - `load()` returns a snapshot
   - `save()` is **replace-all**, not append
   - Ordering preserved, atomic from consumer view
   - Empty state returns `[]`
   - `clear` optional
4. Re-export from `src/index.ts`.
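
As a rough illustration of those invariants, here is an in-memory backend. The `Message` and `ChatMemory` shapes are simplified local stand-ins for the contract in ADR 0003, and `inMemoryChatMemory` is a hypothetical name.

```ts
// Simplified stand-ins for the real contract types (assumptions).
interface Message { role: string; content: string }

interface ChatMemory {
  load(): Promise<Message[]>
  save(messages: Message[]): Promise<void>
  clear?(): Promise<void>
}

function inMemoryChatMemory(): ChatMemory {
  let state: Message[] = []
  // Copy on the way in and out so callers never share references with the store.
  const copy = (messages: readonly Message[]): Message[] => messages.map((m) => ({ ...m }))
  return {
    async load() {
      return copy(state) // snapshot; empty state yields []
    },
    async save(messages) {
      state = copy(messages) // replace-all, never append
    },
    async clear() {
      state = []
    },
  }
}
```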

## Adding a new VectorMemory backend

1. Create `src/<name>-vector.ts`.
2. Export a factory: `export function fileVectorMemory(opts): VectorMemory`.
3. Implement the eight invariants VM1–VM8:
   - `store` is **upsert by id**
   - Dimensionality is a constructor concern; reject mismatches
   - `search` returns results in descending score order
   - `threshold` is exclusive from below
   - `topK` is an upper bound, not a floor
4. Re-export from `src/index.ts`.
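
The invariants listed above can be sketched with an in-memory store. Everything here is illustrative: the `VectorDocument`, `SearchHit`, and `VectorMemory` shapes are local assumptions rather than the ADR 0003 types, cosine similarity is an arbitrary scoring choice, and keeping a hit whose score equals the threshold is one possible reading of the threshold rule.

```ts
// Simplified stand-ins for the real contract types (assumptions).
interface VectorDocument { id: string; content: string; embedding: number[] }
interface SearchHit { id: string; content: string; score: number }

interface VectorMemory {
  store(docs: VectorDocument[]): Promise<void>
  search(embedding: number[], options?: { topK?: number; threshold?: number }): Promise<SearchHit[]>
}

function cosine(a: readonly number[], b: readonly number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB)
  return denominator === 0 ? 0 : dot / denominator
}

function inMemoryVectorMemory(opts: { dimensions: number }): VectorMemory {
  const docs = new Map<string, VectorDocument>()
  const assertDimensions = (vector: readonly number[]): void => {
    if (vector.length !== opts.dimensions) {
      throw new Error(`expected ${opts.dimensions} dimensions, got ${vector.length}`)
    }
  }
  return {
    async store(batch) {
      batch.forEach((doc) => assertDimensions(doc.embedding)) // validate before writing
      for (const doc of batch) {
        docs.set(doc.id, { ...doc, embedding: [...doc.embedding] }) // upsert by id
      }
    },
    async search(embedding, options = {}) {
      assertDimensions(embedding)
      const { topK = 5, threshold } = options
      return Array.from(docs.values())
        .map((doc) => ({ id: doc.id, content: doc.content, score: cosine(embedding, doc.embedding) }))
        .filter((hit) => threshold === undefined || hit.score >= threshold)
        .sort((a, b) => b.score - a.score) // descending by score
        .slice(0, topK) // upper bound; never padded
    },
  }
}
```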

## Configuration

- Connection details (file path, URL, credentials) taken at construction.
- Do not open connections until first use — defer to `load()` / `save()` / `store()` / `search()`.
- Provide a `close()` escape hatch for long-lived processes; the contracts don't require it but consumers appreciate it.
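
Deferring the connection can be sketched with a small memoizing helper. `lazyConnection`, `FakeClient`, and `exampleStore` are hypothetical names used only for illustration; a real backend would wrap its own client library.

```ts
// Opens the connection on first call and reuses the same promise afterwards.
function lazyConnection<T>(open: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined
  return () => {
    if (pending === undefined) pending = open()
    return pending
  }
}

// Hypothetical client interface standing in for a real storage driver.
interface FakeClient { get(key: string): Promise<string | undefined> }

function exampleStore(opts: { connect: () => Promise<FakeClient> }) {
  const client = lazyConnection(opts.connect) // nothing is opened yet
  return {
    async read(key: string): Promise<string | undefined> {
      return (await client()).get(key) // first call triggers the connection
    },
  }
}
```

Note that this simple version also caches a failed connection attempt; a production backend would likely reset `pending` on rejection so later calls can retry.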

## Testing

- **In-memory fake** per contract for fast tests of consumers (`memory/fakes.ts` — not yet present, welcome to add).
- **Integration tests** for each backend using real storage (SQLite file, Redis testcontainer).
- **Invariant tests**: a shared test suite that every backend must pass (`MemoryContractSuite` — tracked for future).

## Common pitfalls

| Pitfall | What to do instead |
|---|---|
| Implementing `save` as append | Replace-all (CM2). Consumers send full state. |
| Returning `null` from `load` on empty | Return `[]` |
| Mixing embedding dimensions in one vector store | Reject mismatches at `store()` time |
| Padding `search` results to reach `topK` | Return fewer documents; `topK` is an upper bound |
| Opening connections at import time | Defer to first method call |

## Review checklist for this package

- [ ] Bundle size under 15KB gzipped
- [ ] Coverage threshold holds (80% lines)
- [ ] New backend tested against all relevant invariants
- [ ] Config accepted at construction; no env reads in the factory
- [ ] Documentation for the backend's quirks in package README