Conversation

Collaborator

@m1rl0k m1rl0k commented Jan 26, 2026

No description provided.

Invoke garbage collection after processing each file in index_repo to prevent memory accumulation. Also update FakeClient in tests to support on_disk_payload parameter in create_collection.
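The per-file GC pass described above can be sketched as follows (a minimal illustration; the real `index_repo` signature and processing step in this repo are assumed, not shown in the PR):

```python
import gc


def index_repo(paths, embed_and_store):
    """Index files one at a time, forcing a GC pass after each.

    Hypothetical sketch of the pattern described in the commit message;
    `embed_and_store` stands in for the real per-file processing.
    """
    for path in paths:
        embed_and_store(path)
        # Large intermediates (token arrays, ONNX outputs) can keep memory
        # pinned between files; an explicit collect releases them promptly.
        gc.collect()
```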
Introduces a new embedding_service with Dockerfile, FastAPI server, and requirements for serving ONNX-based embeddings as a shared service. Updates docker-compose.yml to include the embedding service and its cache volume. Modifies reset.py to always include the embedding service in container management and updates help text. Enhances qdrant.py to support both local and remote embedding providers, with concurrency control and HTTP fallback. Adds an empty __init__.py for embedding_service package initialization.

augmentcode bot commented Jan 26, 2026

🤖 Augment PR Summary

Summary: Introduces a shared embedding microservice and an optional remote-embedding mode to reduce per-indexer ONNX memory usage.

Changes:

  • Adds embedding_service/ (FastAPI + FastEmbed) exposing /embed and /health, with semaphore-based concurrency and internal batching.
  • Updates the ingest pipeline to route embed_batch() through either local ONNX or an HTTP embedding service via EMBEDDING_PROVIDER.
  • Extends Docker Compose to run/scale the embedding service and introduces an embedding_cache volume.
  • Adds Kubernetes and Helm manifests for the embedding service (Deployment/Service/HPA + probes) and wires new env vars (EMBEDDING_SERVICE_URL, INDEX_WORKERS).
  • Updates ctx reset to start the embedding service early, wait for readiness, and scale replicas.
  • Adds remote warm-start behavior and documents new embedding/ONNX tuning env vars in .env.example.

Technical Notes: Remote mode uses requests and supports client-side round-robin via EMBEDDING_SERVICE_URLS; local mode throttles ONNX inference with EMBED_MAX_CONCURRENT.



@augmentcode augmentcode bot left a comment

Review completed. 4 suggestions posted.


Updated docker-compose.yml to support 2 replicas of the embedding service with increased memory limits and improved healthcheck. Added remote embedding provider detection and stub in embedder.py to avoid loading ONNX models when using a remote service, saving memory. Requirements updated to include requests. Concurrency environment variable parsing hardened in main.py and qdrant.py.
Adds detailed ONNX and embedding model configuration options to .env.example and docker-compose.yml, including threading and batching controls. Embedding service now supports ONNX CPU optimizations and improved batching for efficiency. The reset CLI command starts multiple embedding service replicas and waits for readiness, improving reliability for indexing workflows.
Introduces the _wait_for_embedding function to check if the embedding service is available before proceeding. This improves reliability by ensuring dependent services are ready before use.
Updated Dockerfile to allow embedding model selection via build argument, defaulting to nomic-ai/nomic-embed-text-v1.5 for improved performance. Modified reset.py to support overriding the default tokenizer URL using the TOKENIZER_URL environment variable.
Collaborator Author

m1rl0k commented Jan 26, 2026

auggie review


@augmentcode augmentcode bot left a comment


Review completed. 5 suggestions posted.


_ARCTIC_V2_REGISTER_LOCK = threading.Lock()

# Remote embedding provider detection
_EMBEDDING_PROVIDER = os.environ.get("EMBEDDING_PROVIDER", "local").strip().lower()


_EMBEDDING_PROVIDER is read once at import time, so changing EMBEDDING_PROVIDER later (e.g., via tests/CLI env loading) won’t switch between local/remote and may unexpectedly load ONNX in-process. Consider ensuring get_embedding_model() re-evaluates the env/config so remote mode can’t become stale.


_EMBED_SEMAPHORE = threading.Semaphore(_EMBED_MAX_CONCURRENT)

# Remote embedding service configuration
_EMBEDDING_PROVIDER = os.environ.get("EMBEDDING_PROVIDER", "local").strip().lower()


_EMBEDDING_PROVIDER/_EMBEDDING_SERVICE_URL(_S) are captured at module import, so runtime env changes (common in tests) won’t affect embed_batch() routing or target URL. This can lead to embedding calls going to the wrong provider/endpoint despite updated env vars.


# Default: nomic (faster + better quality than BGE)
# Override at build time: --build-arg EMBEDDING_MODEL=BAAI/bge-base-en-v1.5
ARG EMBEDDING_MODEL=nomic-ai/nomic-embed-text-v1.5
RUN python -c "from fastembed import TextEmbedding; TextEmbedding('${EMBEDDING_MODEL}')"


The build-time TextEmbedding('${EMBEDDING_MODEL}') pre-download will use the default cache location, but runtime config points HF_HOME/FASTEMBED_CACHE_PATH at /tmp/..., so this layer may not actually speed startup and can just increase image size. Worth verifying cache paths are consistent between build and runtime.
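One way to make the pre-download layer effective, per the comment above, is to set the same cache env vars at build time that the runtime uses (a sketch; the `/tmp/huggingface` and `/tmp/fastembed` paths are taken from the compose env in this PR and should be verified against the actual runtime config):

```dockerfile
# Align build-time cache paths with the runtime HF_HOME/FASTEMBED_CACHE_PATH
# settings so the pre-downloaded model is actually found at startup.
ARG EMBEDDING_MODEL=nomic-ai/nomic-embed-text-v1.5
ENV HF_HOME=/tmp/huggingface \
    FASTEMBED_CACHE_PATH=/tmp/fastembed
RUN python -c "from fastembed import TextEmbedding; TextEmbedding('${EMBEDDING_MODEL}')"
```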


- MKL_NUM_THREADS=${ONNX_THREADS:-4}
# Cache paths
- HF_HOME=/tmp/huggingface
- FASTEMBED_CACHE_PATH=/tmp/fastembed


For the embedding service, FASTEMBED_CACHE_PATH is set to /tmp/fastembed but only /tmp/huggingface is volume-backed, so FastEmbed/ONNX downloads may not persist and could be re-fetched after container recreation. If persistence is intended, the cache path and mounted volume likely need to align.



# ONNX runtime optimizations
ONNX_THREADS = int(os.environ.get("ONNX_THREADS", "0") or 0) # 0 = auto (1 per physical core)
ONNX_DISABLE_SPINNING = os.environ.get("ONNX_DISABLE_SPINNING", "0").strip().lower() in {"1", "true", "yes"}

ONNX_DISABLE_SPINNING is parsed and reported in /health, but it doesn’t appear to be applied to FastEmbed/onnxruntime anywhere (only threads is passed into TextEmbedding). If operators rely on this env var, the service may not actually change spinning behavior.


Align embedding cache paths in Docker and compose files, default to quantized model for lower memory, and add aggressive GC and memory reporting in embedding_service. Refactor embedding provider and service URL logic for runtime flexibility and improved testability. Add remote embedding support and load balancing in hybrid/embed.py and ingest/qdrant.py. Update Neo4j status check to use HTTP API. Ensure remote embedding dimension probing in mcp_memory_server. These changes improve memory efficiency, remote deployment, and testability.
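The remote dimension probing mentioned above reduces to embedding one string and measuring the result (a sketch; the actual helper in mcp_memory_server and its probe text are assumptions):

```python
def probe_embedding_dim(embed_batch):
    """Determine the vector size served by the (possibly remote) embedding
    provider by embedding a single probe string and measuring its length."""
    vectors = embed_batch(["dimension probe"])
    return len(vectors[0])
```

This avoids hard-coding the model's dimension on the client, which matters once the model is selectable at build time.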
@m1rl0k m1rl0k merged commit 3853760 into test Jan 26, 2026
1 check passed