## Summary

TinyClaw is moving quickly and adoption is growing, but the repository currently has:

- no automated tests
- a release workflow that builds only on Ubuntu
- no automated verification that the CLI, queue processor, or API continue to work across operating systems

Because TinyClaw runs locally on user machines, the biggest risk is not logic bugs alone but environment regressions, such as:

- install script behavior
- Node native modules (better-sqlite3)
- filesystem paths and HOME directory assumptions
- WSL2 compatibility
- queue and API wiring

This proposal introduces a small, high-value test suite designed to:

- prevent releases that break the core message → queue → response flow
- ensure TinyClaw installs and runs on macOS, Linux, and Windows (WSL2)
- keep tests deterministic and fast by avoiding real LLM calls

The goal is confidence without slowing development velocity.
## Testing Strategy

The proposed strategy uses two complementary layers.

### Layer 1: Local Integration Tests

These tests validate TinyClaw's core runtime behavior using the actual queue, SQLite database, and HTTP API. They do not depend on external providers.

#### Key Principles

- Run TinyClaw in a temporary HOME directory
- Use the real queue processor
- Use the real HTTP API
- Replace the LLM provider with a deterministic fake

This ensures the entire pipeline is tested:
```mermaid
flowchart TD
    A[HTTP Message] --> B[Queue Write]
    B --> C[Queue Processor]
    C --> D[Agent Routing]
    D --> E["Provider Call (mocked)"]
    E --> F[Response Persisted]
    F --> G[API Response Retrieval]
```
#### Deterministic Provider

Integration tests should not call Claude/Codex/OpenAI/etc. Instead, add a simple test provider.

Example concept:

```typescript
export async function fakeProvider(prompt: string): Promise<string> {
  return `FAKE_RESPONSE:${prompt}`
}
```
Configuration example:
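One possible shape, assuming the provider is selected via the `TINYCLAW_PROVIDER` environment variable (the same variable that appears in the test environment setup later in this proposal). The registry below is a sketch, not TinyClaw's actual wiring:

```typescript
// Hypothetical provider selection: TINYCLAW_PROVIDER=fake swaps in the
// deterministic test provider instead of a real LLM backend.
type Provider = (prompt: string) => Promise<string>

async function fakeProvider(prompt: string): Promise<string> {
  return `FAKE_RESPONSE:${prompt}`
}

export function resolveProvider(
  env: Record<string, string | undefined> = process.env
): Provider {
  if (env.TINYCLAW_PROVIDER === "fake") {
    return fakeProvider
  }
  throw new Error("real provider wiring omitted from this sketch")
}
```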
Benefits:

- No API keys
- No network
- Fully deterministic
- Fast CI runs
### Integration Test Cases

A small number of high-signal tests will provide strong protection.

#### 1. Core Message Flow

**Goal:** Ensure the message → response pipeline works.

**Steps:**

1. Start the TinyClaw server
2. POST `/api/message`
3. Wait for the queue processor
4. GET `/api/responses`

**Expected:**

- A response exists
- The response contains the fake provider output
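The steps above can be sketched deterministically against a minimal in-memory stand-in for the queue. The stand-in is illustrative only; a real integration test would drive the HTTP endpoints instead of calling functions directly:

```typescript
// In-memory stand-in for the message -> queue -> response pipeline.
type QueuedMessage = {
  id: number
  text: string
  state: "pending" | "processing" | "completed"
}

const queue: QueuedMessage[] = []
const responses: string[] = []

async function fakeProvider(prompt: string): Promise<string> {
  return `FAKE_RESPONSE:${prompt}`
}

// Stand-in for POST /api/message
function postMessage(text: string): number {
  const id = queue.length + 1
  queue.push({ id, text, state: "pending" })
  return id
}

// Stand-in for one pass of the queue processor
async function processQueue(): Promise<void> {
  for (const msg of queue) {
    if (msg.state !== "pending") continue
    msg.state = "processing"
    responses.push(await fakeProvider(msg.text))
    msg.state = "completed"
  }
}
```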
#### 2. Agent Routing

**Goal:** Verify agent mention routing works.

**Steps:**

POST message:

**Expected:**

- Response attributed to the coder agent
- Routing logic triggered
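A sketch of what the routing assertion could exercise, assuming an `@agent` prefix syntax and a `general` fallback — both assumptions, since the message format is not shown above:

```typescript
// Hypothetical mention routing: "@coder fix the bug" -> agent "coder".
function routeAgent(
  text: string,
  defaultAgent = "general"
): { agent: string; body: string } {
  const match = text.match(/^@(\w+)\s+(.*)$/s)
  if (!match) return { agent: defaultAgent, body: text }
  return { agent: match[1], body: match[2] }
}
```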
#### 3. Queue State Transitions

Verify queue states transition correctly.

Expected flow: `pending → processing → completed`

Test asserts:

- message state transitions
- response record created
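The transitions can be pinned down in a small table the test asserts against. The `dead` state and retry edges come from the dead-letter case below; everything else from the expected flow above:

```typescript
// Allowed queue state transitions; anything else should fail the test.
type QueueState = "pending" | "processing" | "completed" | "dead"

const allowed: Record<QueueState, QueueState[]> = {
  pending: ["processing"],
  processing: ["completed", "pending", "dead"], // a retry puts a message back to pending
  completed: [],
  dead: ["pending"], // the dead-letter retry endpoint re-queues
}

function transition(from: QueueState, to: QueueState): QueueState {
  if (!allowed[from].includes(to)) {
    throw new Error(`illegal transition: ${from} -> ${to}`)
  }
  return to
}
```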
#### 4. Retry / Dead Letter Handling

Force a provider failure.

Example fake provider:

```typescript
export async function fakeProvider(): Promise<string> {
  throw new Error("simulated failure")
}
```

Expected:

- retry attempts increment
- message eventually marked "dead"

Then test the dead-letter retry endpoint.
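The retry loop such a test exercises might look like the following sketch (the retry limit of 3 is an assumption, not a documented TinyClaw default):

```typescript
// Retry handling sketch: a message whose provider call keeps failing is
// retried up to maxRetries times and then marked "dead".
type TrackedMessage = {
  retries: number
  state: "pending" | "completed" | "dead"
}

async function processWithRetries(
  msg: TrackedMessage,
  provider: () => Promise<string>,
  maxRetries = 3
): Promise<void> {
  while (msg.state === "pending") {
    try {
      await provider()
      msg.state = "completed"
    } catch {
      msg.retries += 1 // retry attempts increment
      if (msg.retries >= maxRetries) msg.state = "dead"
    }
  }
}
```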
#### 5. SSE Event Stream

Connect to the SSE endpoint, then send a message.

Verify the events received:

- `message_received`
- `response_ready`

This ensures the UI and integrations remain stable.
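Asserting on the stream requires parsing the wire format. A minimal SSE frame parser (event names taken from the list above; the wire format follows the SSE specification):

```typescript
// Minimal parser for Server-Sent Events frames: blocks separated by a
// blank line, with "event:" and "data:" fields.
function parseSse(chunk: string): Array<{ event: string; data: string }> {
  const events: Array<{ event: string; data: string }> = []
  for (const block of chunk.split("\n\n")) {
    let event = "message" // SSE default event name
    let data = ""
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim()
      else if (line.startsWith("data:")) data += line.slice(5).trim()
    }
    if (data) events.push({ event, data })
  }
  return events
}
```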
#### 6. Persistence Across Restart

**Steps:**

1. Send a message
2. Restart the server
3. Verify the queue state persists

This confirms SQLite persistence behavior.
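The core assertion of the restart test, sketched with a JSON file standing in for the SQLite database so the example stays dependency-free (TinyClaw itself uses better-sqlite3):

```typescript
// Persistence sketch: state written before a "restart" must be readable
// afterwards. A JSON file stands in for the real SQLite database.
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs"
import { tmpdir } from "node:os"
import { join } from "node:path"

type Row = { id: number; text: string; state: string }

function saveQueue(path: string, rows: Row[]): void {
  writeFileSync(path, JSON.stringify(rows))
}

function loadQueue(path: string): Row[] {
  return JSON.parse(readFileSync(path, "utf8"))
}

// Run in a temporary directory, as the key principles suggest.
const dbPath = join(mkdtempSync(join(tmpdir(), "tinyclaw-test-")), "queue.json")
saveQueue(dbPath, [{ id: 1, text: "hello", state: "pending" }])
// ...simulated restart: nothing kept in memory...
const restored = loadQueue(dbPath)
```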
### Integration Test Implementation

Example structure:

```
tests/
  integration/
    core-message.test.ts
    routing.test.ts
    queue-state.test.ts
    dead-letter.test.ts
    sse-events.test.ts
```

Test environment setup:

```shell
HOME=$(mktemp -d)
TINYCLAW_PROVIDER=fake
PORT=3777
```

Startup command example: the same entry point the CI workflow uses, e.g. `node dist/server.js` with the environment above.

Tests then interact only through the HTTP API.
### Layer 2: Cross-OS Release Smoke Tests

Integration tests verify functionality, but releases must also verify that installation works on real user systems.

A CI matrix ensures TinyClaw runs on:

- `ubuntu-latest`
- `macos-latest`
- `windows-latest` (WSL)

These tests should mimic how users actually install TinyClaw.
#### Smoke Test Workflow

Each OS runner should:

**Step 1: Install TinyClaw**

For example:

```shell
npm install
npm run build
```

or test the installer script if present.

**Step 2: Start TinyClaw**

Verify the server starts on port 3777.
**Step 3: Basic API Flow**

Run a minimal message test:

1. POST `/api/message`
2. GET `/api/responses`
3. Verify the response exists

**Step 4: CLI Sanity**

Check commands:

```shell
tinyclaw --help
tinyclaw agents
tinyclaw queue status
```

This ensures CLI packaging works.
Example GitHub Actions Workflow
name: smoke-tests
on:
pull_request:
push:
branches: [main]
jobs:
smoke:
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: 20
- run: npm install
- run: npm run build
- run: node dist/server.js &
- run: sleep 5
- run: |
curl -X POST http://localhost:3777/api/message \
-H "Content-Type: application/json" \
-d '{"text":"hello test"}'
- run: |
curl http://localhost:3777/api/responses
## Benefits

This approach provides:

### Release Safety

Prevents regressions in:

- message routing
- queue processing
- persistence
- API contracts

### Cross-Platform Confidence

Verifies TinyClaw works on:

- macOS
- Linux
- Windows (WSL)

### Fast CI

Tests are:

- deterministic
- offline
- quick

### Minimal Maintenance

The suite focuses only on core runtime guarantees, not exhaustive coverage.
## Suggested Repository Changes

```
tests/
  integration/
src/
  providers/
    fake-provider.ts
.github/workflows/
  smoke-tests.yml
```

Add an npm script (for example `test:integration`) that runs the integration suite.
## Expected Outcome

After implementing this suite:

- Releases cannot break the message pipeline
- Installation failures across OSes are caught before release
- Contributors can run integration tests locally
- Maintainers gain confidence to ship quickly

This aligns with TinyClaw's rapid development style while ensuring the core system remains stable.