Benchmark framework for comparing bcq vs raw-API approaches:

Infrastructure:
- harness/ with matrix.sh, run.sh, triage.sh (VERSION=1)
- 12 task definitions (canonical: Task 12 overdue sweep)
- inject-proxy.sh for deterministic 429/401 testing
- Neutral validation via validate.sh (no bcq in validation path)

Skills for benchmark conditions:
- .claude-plugin/skills/bcq-basecamp/ (uses bcq CLI)
- .claude-plugin/skills/raw-api-basecamp/ (curl + jq only)

Canonical results (baseline_soft_anchor_env_today, N=5):
- bcq: 100% pass rate across all models
- raw: 100% for Claude models, 0% for GPT models
- bcq is 16× cheaper for Sonnet, 70× cheaper for Haiku

Policy:
- BENCHMARKING.md defines quality gates (Smoke/Regression/Refresh)
- reports/baseline.json is machine-readable with audit metadata
- results/ is gitignored (ephemeral)

bcq library changes:
- Add pagination support via api_get_all
- Add `bcq todos sweep` for bulk overdue processing
Give WebFetch access to consult official docs instead of pre-documenting all endpoints. Fairer benchmark comparison.
- Default to https://raw.githubusercontent.com/basecamp/bc3-api/...
- Cache to ~/.cache/bcq/api-docs (BCQ_API_DOCS_CACHE_DIR override)
- Prefer local clone if present for dev
- Soften efficiency contract to anchor (correctness > speed)
Outputs README path: local clone if present, else cached from remote. Supports BCQ_API_DOCS_URL and BCQ_API_DOCS_CACHE_DIR overrides.
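The lookup order described above (local clone first, then the cache, fetching from the remote only when the cache is empty) could look roughly like the sketch below. It is not the actual scripts/api-docs.sh: the function name and its argument are hypothetical, and it assumes BCQ_API_DOCS_URL holds the full URL of the README, because the default URL is elided in the commit message.

```shell
#!/usr/bin/env bash
# Sketch only. BCQ_API_DOCS_URL and BCQ_API_DOCS_CACHE_DIR come from the commit
# message; every other name here is hypothetical.

# resolve_api_docs_readme [LOCAL_CLONE_DIR]
# Prints the path of the API docs README to stdout.
resolve_api_docs_readme() {
  local local_clone="${1:-}"
  local cache_dir="${BCQ_API_DOCS_CACHE_DIR:-$HOME/.cache/bcq/api-docs}"
  local cached="$cache_dir/README.md"

  # 1. Prefer a local clone when one is present (useful during development).
  if [ -n "$local_clone" ] && [ -f "$local_clone/README.md" ]; then
    printf '%s\n' "$local_clone/README.md"
    return 0
  fi

  # 2. Otherwise use the cached copy, fetching it once if it is missing.
  if [ ! -f "$cached" ]; then
    # Assumption: the URL points directly at the README. No default is set
    # here because the source truncates it.
    if [ -z "${BCQ_API_DOCS_URL:-}" ]; then
      echo "no cached README and BCQ_API_DOCS_URL is not set" >&2
      return 1
    fi
    mkdir -p "$cache_dir"
    # Download to a temp file first so a failed fetch never leaves a partial cache.
    if ! curl -fsSL "$BCQ_API_DOCS_URL" -o "$cached.tmp"; then
      rm -f "$cached.tmp"
      return 1
    fi
    mv "$cached.tmp" "$cached"
  fi
  printf '%s\n' "$cached"
}
```

Printing a path rather than the file contents lets the caller decide how to read or search the docs.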
Standalone skill for 'what endpoint?' questions. Uses scripts/api-docs.sh, doesn't require execution. raw-api-basecamp remains self-sufficient.
- Remove hardcoded Documentation Structure table
- Use ripgrep to find sections dynamically
- Remove Common Questions cheat-sheet
- Keep skill focused on how to fetch/navigate docs
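As an illustration of finding sections dynamically instead of relying on a hardcoded table, the sketch below lists the Markdown headings that mention a term. The helper name and the grep fallback are assumptions for this example, not part of the skill.

```shell
#!/usr/bin/env bash
# find_doc_sections DOCS_DIR TERM
# Lists Markdown headings under DOCS_DIR whose text mentions TERM
# (case-insensitive), one per line as path:line:heading.
# Hypothetical helper; the skill's real commands may differ.
find_doc_sections() {
  local docs_dir="$1" term="$2"
  local pattern="^#+ .*${term}"
  if command -v rg >/dev/null 2>&1; then
    rg --no-heading --line-number --ignore-case --glob '*.md' -e "$pattern" "$docs_dir"
  else
    # Fallback for machines without ripgrep installed.
    grep -rniE --include='*.md' -e "$pattern" "$docs_dir"
  fi
}
```

For example, `find_doc_sections "$(dirname "$readme")" "to-do"` would list every heading that mentions to-dos, which an agent can then open at the reported line.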
jeremy added a commit that referenced this pull request on Feb 19, 2026
Add benchmark harness and baseline results
jeremy added a commit that referenced this pull request on Mar 5, 2026
Pin to v0.0.0-20260305004813-bc5ad283b855 (bc5ad28, main HEAD after PR #1 merge). Provides output, credstore, pkce, and oauthcallback packages extracted from this repo.
jeremy added a commit that referenced this pull request on Mar 5, 2026
…cli module (#192)

* Add github.com/basecamp/cli shared module dependency

  Pin to v0.0.0-20260305004813-bc5ad283b855 (bc5ad28, main HEAD after PR #1 merge). Provides output, credstore, pkce, and oauthcallback packages extracted from this repo.

* Migrate internal/output to consume shared cli/output package

  Re-export exit codes, error codes, ExitCodeFor, NormalizeData, TruncationNotice, and TruncationNoticeWithTotal from the shared module. Type-alias Error for zero-cost compatibility with errors.As. ErrAuth and ErrForbiddenScope stay local (app-specific hint strings).

  Deletes ~330 lines of duplicated code including NormalizeData helpers, unmarshalPreservingNumbers, normalizeUnmarshaled, and the corresponding BenchmarkNormalizeUnmarshaled (now covered by shared module tests).

* Migrate internal/auth to consume shared credstore, pkce, oauthcallback

  Replace keyring.go implementation (~230 lines of keyring probing, file I/O, atomic writes, Windows workarounds) with a typed wrapper around credstore.Store (~70 lines). Credentials struct and Store API unchanged.

  Replace PKCE helpers (generateCodeVerifier/Challenge/State) with pkce.GenerateVerifier/Challenge/State. Replace waitForCallback with inline listener creation + oauthcallback.WaitForCallback.

  Delete tests for removed unexported functions (TestGenerateCodeVerifier, TestGenerateCodeChallenge, TestGenerateState, TestKeyFunction). Update remaining tests to construct Store via newTestStore helper.

* Pin github.com/basecamp/cli to v0.1.0 release tag

  Replaces pseudo-version v0.0.0-20260305004813-bc5ad283b855. Same code (bc5ad28), proper semver tag.

* Use wrapper functions instead of var for re-exported symbols

  Mutable vars allow accidental reassignment from other packages in the module. Thin wrapper functions preserve immutability while delegating to the shared module.
jeremy added a commit that referenced this pull request on Mar 9, 2026
* Document output modes and CLI introspection in SKILL.md

  Rewrite Agent Invariants #1 and #5 to guide agents toward --md for human-facing output and --json for parsing. Replace the flat output modes code block with a goal-oriented table and add a CLI Introspection section documenting --agent --help for command discovery.

* Add --md flag to root help output

  Surface the Markdown output flag in the curated FLAGS section of basecamp --help, alongside --json and --quiet.

* Address PR review feedback on SKILL.md

  - Narrow invariant #5: only messages/comments convert Markdown to HTML; todos, documents, and cards send --content as-is
  - Fix --agent/--quiet description: errors still emit {ok:false,...} object
  - Remove misleading "default when piped" claim; advise explicit --json/--md
  - Add long, default, and usage fields to --agent --help JSON example
Summary
Benchmark framework for comparing `bcq` CLI vs raw curl+jq API approaches.

Infrastructure
- `harness/` with `matrix.sh`, `run.sh`, `triage.sh` (VERSION=1)
- 12 task definitions (canonical: Task 12 overdue sweep)
- `inject-proxy.sh` for deterministic 429/401 testing
- Neutral validation via `validate.sh` (no bcq in validation path)

Skills for benchmark conditions
- `.claude-plugin/skills/bcq-basecamp/` (uses bcq CLI)
- `.claude-plugin/skills/raw-api-basecamp/` (curl + jq only)

Canonical results (baseline_soft_anchor_env_today, N=5)

Reliability
- bcq: 100% pass rate across all models
- raw: 100% for Claude models, 0% for GPT models

Efficiency
bcq is 16× cheaper for Sonnet, 70× cheaper for Haiku.

Policy
- `BENCHMARKING.md` defines quality gates (Smoke/Regression/Refresh)
- `reports/baseline.json` is machine-readable with audit metadata
- `results/` is gitignored (ephemeral)

bcq library changes
- Add pagination support via `api_get_all`
- Add `bcq todos sweep` for bulk overdue processing

Test plan
- `./test/run.sh` passes (315 tests)
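To make the pagination change more concrete, here is a minimal sketch of following `Link: <...>; rel="next"` response headers with curl. It is not the actual `api_get_all` implementation: `next_page_url`, `fetch_all_pages`, and the `BASECAMP_TOKEN` variable are hypothetical names, and the sketch assumes every page body is a JSON array.

```shell
#!/usr/bin/env bash
# Sketch only: illustrates Link-header pagination, not the real api_get_all.

# next_page_url: reads HTTP response headers on stdin and prints the URL
# marked rel="next", or nothing when there is no further page.
next_page_url() {
  tr -d '\r' \
    | sed -n -E 's/^[Ll]ink:.*<([^>]+)>; *rel="next".*/\1/p' \
    | head -n 1
}

# fetch_all_pages URL: prints each page body in order (one JSON array per page).
# BASECAMP_TOKEN is a hypothetical variable holding an OAuth access token.
fetch_all_pages() {
  local url="$1" headers
  headers="$(mktemp)"
  while [ -n "$url" ]; do
    # -D writes the response headers to a file so the body stays on stdout.
    if ! curl -fsS -D "$headers" \
        -H "Authorization: Bearer $BASECAMP_TOKEN" \
        "$url"; then
      rm -f "$headers"
      return 1
    fi
    url="$(next_page_url < "$headers")"
  done
  rm -f "$headers"
}
```

Under these assumptions, `fetch_all_pages "$first_page_url" | jq -s 'add'` would concatenate the per-page arrays into one, so a caller sees the full result set rather than only the first page.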