
Feat/remote embeddings #34

Open
VioletCranberry wants to merge 24 commits into main from feat/remote-embeddings

Conversation

@VioletCranberry
Owner

Summary

  • Add support for remote embedding providers (OpenAI, OpenRouter) alongside the default Ollama backend, controlled via COCOSEARCH_EMBEDDING_PROVIDER env
    var or embedding.provider config field
  • Make the Ollama Docker Compose service opt-in via profiles: ["ollama"] — when using remote providers, Ollama is no longer required, saving startup
    time and resources
  • Track embedding provider/model in index metadata with mismatch detection on reindex

Details

Multi-provider embeddings:

  • EmbeddingSection schema gains a provider field with validation (ollama, openai, openrouter)
  • Each provider has sensible model defaults: ollama → nomic-embed-text, openai → text-embedding-3-small, openrouter → openai/text-embedding-3-small
  • PROVIDER_MAP in embedder.py maps provider names to CocoIndex API types
  • Preflight checks validate API key for remote providers instead of checking Ollama connectivity
  • config check CLI command shows provider-specific diagnostics

Docker Compose profiles:

  • ollama service moved behind profiles: ["ollama"]
  • app service's ollama dependency set to required: false (Compose v2.20+)
  • docker compose up -d now starts only PostgreSQL
  • docker compose --profile ollama up -d starts PostgreSQL + Ollama
  • docker compose --profile app --profile ollama up starts full stack
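For reference, the profile wiring described above looks roughly like the fragment below. The `profiles` and `required: false` fields are what the PR describes; the service layout and image name are assumptions about the repository's compose file.

```yaml
# Illustrative compose fragment (service/image names are assumptions).
services:
  ollama:
    image: ollama/ollama
    profiles: ["ollama"]        # opt-in: started only with --profile ollama
  app:
    depends_on:
      ollama:
        condition: service_started
        required: false         # Compose v2.20+: app runs without ollama
```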

Metadata tracking:

  • Index metadata records embedding_provider and embedding_model
  • Warning logged when reindexing with a different provider/model than the original
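The mismatch check can be sketched like this. The metadata keys `embedding_provider` and `embedding_model` come from the PR; the function name and dict-based interface are illustrative only.

```python
# Sketch of reindex mismatch detection (function shape is hypothetical;
# the metadata keys are the ones the PR records).
import logging

logger = logging.getLogger("cocosearch")

def check_embedding_mismatch(stored: dict, current: dict) -> bool:
    """Return True, and log a warning, if provider/model differ from indexing time."""
    mismatched = False
    for key in ("embedding_provider", "embedding_model"):
        if stored.get(key) and stored[key] != current.get(key):
            logger.warning(
                "reindexing with %s=%s but index was built with %s",
                key, current.get(key), stored[key],
            )
            mismatched = True
    return mismatched
```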

VioletCranberry and others added 24 commits February 24, 2026 12:08
Move the ollama service behind `profiles: ["ollama"]` so it only starts
when explicitly requested. This avoids unnecessary startup time and
resource usage when using remote embedding providers (OpenAI/OpenRouter).

The app service's ollama dependency is now `required: false` (Compose
v2.20+), so `--profile app` works without `--profile ollama`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Support remote embedding providers alongside the default Ollama backend.
Provider selection via COCOSEARCH_EMBEDDING_PROVIDER env var or the
embedding.provider config field. Each provider has sensible model defaults
(ollama→nomic-embed-text, openai→text-embedding-3-small,
openrouter→openai/text-embedding-3-small).

Key changes:
- EmbeddingSection schema gains provider field with validation
- PROVIDER_MAP in embedder.py maps provider names to CocoIndex API types
- Preflight checks API key for remote providers instead of Ollama
- Index metadata tracks embedding_provider and embedding_model
- Mismatch detection warns when reindexing with different provider/model

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Propagate embedding_provider and embedding_model from index metadata
through IndexStats to the /api/stats response and web dashboard.
Displayed in the terminal status line next to STATUS and UPTIME.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CocoIndex doesn't recognize OpenRouter-prefixed model names (e.g.
openai/text-embedding-3-small), causing "unknown model" errors. Add a
known-dimensions map and resolve output_dimension from env var or map
before passing to EmbedText. Also add outputDimension config field for
user override with custom models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cocosearch.yaml embedding.provider and embedding.model values were
silently ignored during indexing because runtime code read os.environ
directly. Add bridge_embedding_config() to ConfigResolver that resolves
through the full precedence chain (env > config > default) and writes
into env vars for downstream code. Also update config check command to
use resolver for accurate source display.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
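The env > config > default bridge from the commit above might look roughly like this. `bridge_embedding_config` and `COCOSEARCH_EMBEDDING_PROVIDER` are named in this PR; the model env-var name and the dict interface are assumptions.

```python
# Sketch of the precedence bridge: resolve each field (env > config >
# default) and write the result back into env for downstream code that
# reads os.environ directly. Interface shape is assumed.
import os

PROVIDER_DEFAULT_MODELS = {
    "ollama": "nomic-embed-text",
    "openai": "text-embedding-3-small",
    "openrouter": "openai/text-embedding-3-small",
}

def bridge_embedding_config(config: dict) -> dict:
    provider = (os.environ.get("COCOSEARCH_EMBEDDING_PROVIDER")
                or config.get("provider") or "ollama")
    model = (os.environ.get("COCOSEARCH_EMBEDDING_MODEL")
             or config.get("model") or PROVIDER_DEFAULT_MODELS[provider])
    # Downstream indexing code reads os.environ, so publish the result there.
    os.environ["COCOSEARCH_EMBEDDING_PROVIDER"] = provider
    os.environ["COCOSEARCH_EMBEDDING_MODEL"] = model
    return {"provider": provider, "model": model}
```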
Use status-ok class to match the ONLINE highlight style. Provider is
uppercased for consistency; model retains original case for readability.
Values are HTML-escaped for defense-in-depth.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace "no external APIs — everything runs locally" with "local by
default, remote optional" framing across README, CLAUDE.md, and docs.
Ollama remains the default; OpenAI/OpenRouter are opt-in. Clarifies
that even with remote providers, only chunk text leaves the machine.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When switching embedding providers (e.g. Ollama→OpenAI), the dimension
change could cause the data table to be dropped during migration while
CocoIndex metadata still referenced it. flow.setup() would then skip
table creation and flow.update() would silently fail per-row in the
Rust engine without raising a Python exception, preventing stale-state
recovery from triggering.

- Add table existence check after flow.setup() to detect metadata/table
  mismatch and trigger automatic recovery
- Make --fresh use _clean_stale_flow_state() for robust cleanup even
  when flow.drop() fails
- Always close in-memory flow registration before opening (proactive,
  not reactive to KeyError)
- Include tracking table in _clean_stale_flow_state() cleanup
- Make clear_index() fully clean CocoIndex metadata, deps, and tracking
  tables so re-index starts fresh

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
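The table-existence guard from the commit above can be sketched with the helpers injected as callables (in the real code they are `_table_exists`-style functions and `_clean_stale_flow_state()`; the wrapper below is purely illustrative).

```python
# Sketch of the metadata/table mismatch guard: verify the data table
# actually exists after flow.setup(), and trigger stale-state recovery
# if CocoIndex metadata referenced a dropped table. Helper callables are
# stand-ins for the project's own functions.
def ensure_table_after_setup(flow, table_exists, clean_stale_state) -> None:
    flow.setup()  # may silently skip creation when stale metadata says the table exists
    if not table_exists():
        clean_stale_state()  # drop stale CocoIndex metadata/tracking state
        flow.setup()         # retry: setup now recreates the table
        if not table_exists():
            raise RuntimeError("data table still missing after stale-state recovery")
```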
…panel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix TestCsLogFallback test isolation: add autouse fixture to reset global
  _log_buffer so fallback-to-Python-logging path is exercised even when
  earlier tests leave the singleton set
- Fix TestHybridSearchGracefulDegradation: update 3 tests to mock
  _get_cs_log() instead of logger.warning, matching the structured logging
  migration (cs_log.infra replaces logger.warning)
- Fix RichLogHandler deadlock: accept optional file parameter so
  setup_log_capture() can pass the original stderr (before StderrCapture
  wrapping), preventing re-entrant LogBuffer._lock acquisition
- Add test_custom_file_bypasses_stderr test for RichLogHandler file param
- Apply ruff formatting to 11 files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
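The `_log_buffer` isolation problem the first fix addresses can be demonstrated with a self-contained toy (the names echo the commit; the actual module is not reproduced here): a global singleton left set by one test silently changes which code path later tests exercise, which is why the autouse reset fixture is needed.

```python
# Toy version of the singleton problem: when a buffer is installed, emit()
# never takes the fallback-to-Python-logging path, so tests of that path
# fail unless the global is reset between tests.
_log_buffer = None

def set_log_buffer(buf):
    global _log_buffer
    _log_buffer = buf

def emit(record, fallback):
    """Append to the installed buffer, else call the fallback
    (standing in for plain Python logging)."""
    if _log_buffer is not None:
        _log_buffer.append(record)
    else:
        fallback(record)
```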
…eaders

Browser-cached JS modules caused the log panel to show entries without
category badges and structured fields. Fix with three layers:

- Cache-Control: no-cache on /dashboard HTML and /static/* responses
- Version query params (?v=x.y.z) on CSS/JS entry point URLs
- Import map injection to cache-bust all ES module sub-imports

Also includes: thread-safe LogBuffer subscriber fan-out, RichLogHandler
Rich markup escaping fix, and removal of premature filter UI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
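The first of the three layers above is simple header logic, sketched here framework-agnostically (the paths come from the commit; the function is illustrative, not the project's API). `no-cache` forces revalidation on every load so updated JS/CSS is picked up immediately, while conditional 304 responses keep unchanged files cheap.

```python
# Illustrative no-cache header rule for the dashboard and static assets.
NO_CACHE_PATHS = ("/dashboard", "/static/")

def cache_headers_for(path: str) -> dict:
    """Headers to attach to a response for the given request path."""
    if path.startswith(NO_CACHE_PATHS):
        return {"Cache-Control": "no-cache"}
    return {}
```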
Plugin users need to know that the MCP server inherits env vars from
their shell, so COCOSEARCH_EMBEDDING_PROVIDER and API key must be
exported in their shell profile for remote embeddings to work.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
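For example, a shell-profile entry might look like this. `COCOSEARCH_EMBEDDING_PROVIDER` is named in this PR; `OPENAI_API_KEY` follows the usual OpenAI convention but check the project docs for the exact key variable, and the key value is a placeholder.

```shell
# Exported in ~/.zshrc / ~/.bashrc so the MCP server inherits them.
export COCOSEARCH_EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-placeholder   # or the OpenRouter key for openrouter
```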
get_flow_full_name() triggers a cocoindex DB connection via
setting.get_app_namespace(). When COCOINDEX_DATABASE_URL is set (side
effect of get_database_url()) but the database is unreachable (CI),
this hangs for ~34s causing 5 test timeouts. Use the flow name directly
since the cocoindex namespace is always empty for cocosearch. Also
clean up the leaked env var between tests in conftest.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
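The env-var cleanup from the commit above is a conftest autouse fixture in the real suite; sketched here as a plain context manager so the pattern stands alone. `COCOINDEX_DATABASE_URL` is the variable named in the commit.

```python
# Sketch of the between-tests cleanup: always drop the env var that
# get_database_url() leaks, so later tests never attempt a DB connection.
import os
from contextlib import contextmanager

@contextmanager
def clean_cocoindex_env():
    try:
        yield
    finally:
        os.environ.pop("COCOINDEX_DATABASE_URL", None)
```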