# Vector Store Adapters

VectorPin ships thin adapters for the major vector databases. Adapters do two things:

1. **Walk records** — Iterate the collection, yielding records that carry `id`, `vector`, `metadata`, and `pin` for verification.
2. **Attach pins** — Write a pin into the record's metadata in whichever shape the backend prefers.

The adapter protocol lives at [`src/vectorpin/adapters/base.py`](https://github.com/ThirdKeyAI/VectorPin/blob/main/src/vectorpin/adapters/base.py) and is intentionally thin. Community contributions for new backends are welcome.

---

## Status

| Backend | Status | Install | Notes |
|---|---|---|---|
| LanceDB *(default)* | Alpha | `pip install 'vectorpin[default]'` | Embedded, file-based, no daemon. Recommended. |
| Chroma | Alpha | `pip install 'vectorpin[chroma]'` | Supports both persistent and HTTP modes. |
| Qdrant | Alpha | `pip install 'vectorpin[qdrant]'` | Server-side payload filtering. |
| Pinecone | Alpha | `pip install 'vectorpin[pinecone]'` | Hosted only. |
| pgvector | Planned | — | |
| FAISS | Planned | — | In the meantime, use `LanceDBAdapter` (also embedded, with a native metadata column). |

All adapters present the same `iter_records()` / `attach_pin()` interface. Backends differ only in where the pin physically lives in the underlying record.

---

## Storage Convention

By convention, pins are stored under the metadata key `vectorpin`. Specifically:

| Backend | Pin lives at |
|---|---|
| LanceDB | A typed schema column literally named `vectorpin` (string-valued, holding the pin JSON). |
| Chroma | The `metadata` dict, under key `vectorpin`. |
| Qdrant | The `payload` dict, under key `vectorpin`. |
| Pinecone | The `metadata` dict, under key `vectorpin`. |

Backends without free-form metadata fields are out of scope: provenance must travel with the data, not in a sidecar.
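To make the storage table concrete, here is a minimal sketch of reading a pin back out of a raw record for each backend. It assumes records have already been fetched as plain Python dicts shaped like each backend's rows (a LanceDB row as a flat dict, Chroma and Pinecone records with a `metadata` dict, Qdrant points with a `payload` dict). The `read_pin_json` helper and these record shapes are hypothetical and not part of VectorPin's API; the real adapters handle this internally.

```python
import json
from typing import Any, Optional

PIN_KEY = "vectorpin"  # the storage-convention key shared by all backends

# Where the pin lives per backend (mirrors the storage table).
# None means "a top-level column on the row"; otherwise, the nested dict's name.
PIN_CONTAINER = {
    "lancedb": None,  # typed column on the row itself
    "chroma": "metadata",
    "qdrant": "payload",
    "pinecone": "metadata",
}


def read_pin_json(backend: str, record: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the decoded pin for a raw record, or None if it is unpinned.

    Hypothetical helper for illustration only.
    """
    if backend not in PIN_CONTAINER:
        raise ValueError(f"unsupported backend: {backend!r}")
    container = PIN_CONTAINER[backend]
    # Treat a missing or null container as "no metadata", hence no pin.
    fields = record if container is None else (record.get(container) or {})
    raw = fields.get(PIN_KEY)
    return None if raw is None else json.loads(raw)
```

Because every backend stores the pin as JSON text under the same key, the only per-backend difference is which dict to look in.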

---

## LanceDB (default)

LanceDB is the recommended default: embedded, file-based, and daemonless, with a dedicated typed schema column for the pin. It also matches the [Symbiont runtime's](https://github.com/thirdkeyai/symbiont) default vector backend.

### Pin a corpus

```python
from vectorpin import Signer
from vectorpin.adapters import LanceDBAdapter

adapter = LanceDBAdapter.connect("./data/vector_db", "rag-corpus")
# Creates a new signing key; keep it (and its public half) so later audits can verify.
signer = Signer.generate(key_id="prod-2026-05")

for record in adapter.iter_records():
    pin = signer.pin(
        source=record.metadata["text"],  # the exact text that was embedded
        model="text-embedding-3-large",  # the model that produced record.vector
        vector=record.vector,
    )
    adapter.attach_pin(record.id, pin)
```
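The loop above re-signs every record each time it runs. If a backfill is interrupted, you may prefer to resume by pinning only records that do not have a pin yet. Below is a minimal sketch that relies only on the `iter_records()` / `attach_pin()` interface and the `record.pin` attribute used in this guide. `backfill_unpinned` and its `make_pin` callback are hypothetical names, not part of VectorPin.

```python
from typing import Any, Callable


def backfill_unpinned(adapter: Any, make_pin: Callable[[Any], Any]) -> tuple[int, int]:
    """Pin every record that has no pin yet.

    `adapter` is anything with iter_records()/attach_pin(); `make_pin` turns a
    record into a pin (for example, a wrapper around signer.pin).
    Returns (newly_pinned, already_pinned).
    """
    newly_pinned = already_pinned = 0
    for record in adapter.iter_records():
        if record.pin is not None:
            # Leave existing pins alone; a verification pass will flag stale ones.
            already_pinned += 1
            continue
        adapter.attach_pin(record.id, make_pin(record))
        newly_pinned += 1
    return newly_pinned, already_pinned
```

With the objects from the example above, you would call it as `backfill_unpinned(adapter, lambda r: signer.pin(source=r.metadata["text"], model="text-embedding-3-large", vector=r.vector))`. Existing pins are never overwritten, so running verification afterwards is what catches records whose vector or text changed after pinning.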

### Verify a corpus

```python
from vectorpin import Verifier
from vectorpin.adapters import LanceDBAdapter

adapter = LanceDBAdapter.connect("./data/vector_db", "rag-corpus")
# public_key_bytes: the public key matching key_id "prod-2026-05", from your key distribution.
verifier = Verifier({"prod-2026-05": public_key_bytes})

failed = 0
for record in adapter.iter_records():
    if record.pin is None:
        continue  # unpinned records are skipped, not counted as failures
    result = verifier.verify(
        record.pin,
        source=record.metadata["text"],
        vector=record.vector,
    )
    if not result.ok:
        print(f"FAIL {record.id} [{result.error.value}] {result.detail}")
        failed += 1

assert failed == 0, f"{failed} records failed verification"
```
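The verification loop above silently skips records with no pin, so a record whose pin was stripped (or never written) still passes the audit. If that matters for your threat model, you can count unpinned records as a separate outcome and decide whether to fail on them. This is a minimal sketch that relies only on the record attributes used above (`id`, `pin`) and a caller-supplied check. `AuditReport`, `audit_records`, and the `check` callback are hypothetical names, not part of VectorPin.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable


@dataclass
class AuditReport:
    verified: int = 0
    failed: list[str] = field(default_factory=list)    # ids whose pin did not verify
    unpinned: list[str] = field(default_factory=list)  # ids with no pin at all

    def passed(self, require_pins: bool = True) -> bool:
        """True if nothing failed and, when require_pins is set, nothing is unpinned."""
        return not self.failed and not (require_pins and self.unpinned)


def audit_records(records: Iterable[Any], check: Callable[[Any], bool]) -> AuditReport:
    """Classify each record; `check` returns True when the record's pin verifies."""
    report = AuditReport()
    for record in records:
        if record.pin is None:
            report.unpinned.append(record.id)
        elif check(record):
            report.verified += 1
        else:
            report.failed.append(record.id)
    return report
```

With the verifier from the example above, the check could be `lambda r: verifier.verify(r.pin, source=r.metadata["text"], vector=r.vector).ok`, followed by `assert report.passed(), report`. Pass `require_pins=False` during a partial backfill, when some records are not expected to be pinned yet.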

### Connection options

`LanceDBAdapter.connect` accepts a URI (a directory path, an `s3://` or `gs://` URI, or a LanceDB Cloud connection string), a table name, and optional column overrides:

```python
adapter = LanceDBAdapter.connect(
    uri="s3://my-bucket/vector_db",
    table_name="rag-corpus",
    id_column="id",          # default: "id"
    vector_column="vector",  # default: "vector"
)
```

### Symbiont schema

For Symbiont deployments, the embedded text lives in the `content` column. Symbiont's column literally named `source` holds upstream provenance (a URL) and is not what VectorPin's `source` argument expects, so pass `source=record.metadata["content"]` when calling `signer.pin`. See [`tests/test_adapter_lancedb_symbiont.py`](https://github.com/ThirdKeyAI/VectorPin/blob/main/tests/test_adapter_lancedb_symbiont.py) for an end-to-end example.
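If one pinning script serves both layouts, a small guard helps you avoid signing the URL in `source` by mistake. This sketch assumes the default layout keeps the embedded text under `text` (as in the examples above) and Symbiont keeps it under `content`. The `SOURCE_TEXT_KEY` mapping and the `source_text` helper are hypothetical.

```python
from typing import Any

# Metadata key holding the embedded text, per schema layout (hypothetical mapping).
# Symbiont's `source` column is an upstream URL, so it is deliberately not used.
SOURCE_TEXT_KEY = {"default": "text", "symbiont": "content"}


def source_text(metadata: dict[str, Any], schema: str = "default") -> str:
    """Return the text to pass as signer.pin(source=...) for this schema layout."""
    key = SOURCE_TEXT_KEY[schema]
    value = metadata.get(key)
    if not isinstance(value, str):
        # Fail loudly rather than silently signing the wrong field.
        raise KeyError(f"schema {schema!r} expects source text under {key!r}")
    return value
```

In the pinning loop, this becomes `signer.pin(source=source_text(record.metadata, "symbiont"), ...)`.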

---

## Chroma

Chroma offers both an embedded persistent client and a remote HTTP client. The adapter supports both.

### Persistent (embedded)

```python
from vectorpin.adapters import ChromaAdapter

adapter = ChromaAdapter.connect_persistent("./chroma_db", "my-rag")
```

### HTTP

```python
adapter = ChromaAdapter.connect_http(
    host="chroma.internal",
    port=8000,
    collection_name="my-rag",
    ssl=False,  # True if the server uses TLS
)
```

### Pinning

```python
# `signer` is created as in the LanceDB example above.
for record in adapter.iter_records():
    pin = signer.pin(
        source=record.metadata["text"],
        model="text-embedding-3-large",
        vector=record.vector,
    )
    adapter.attach_pin(record.id, pin)
```

The pin is stored as a JSON string under `metadata["vectorpin"]`. Chroma metadata values must be scalars (`str`, `int`, `float`, or `bool`), so the structured pin is serialized to a string; parsing that string back restores the pin without loss.
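This stdlib-only sketch shows why the pin is stored as a JSON string. A nested dict cannot be a Chroma metadata value, but its JSON text can, and `json.loads` gives back an equal structure as long as the pin contains only JSON types. The `pin_like` dict is a stand-in with made-up field names, not the real pin schema (see the Pin Protocol page). `is_chroma_metadata` is a hypothetical helper that only approximates Chroma's own validation.

```python
import json
from typing import Any

SCALAR_TYPES = (str, int, float, bool)


def is_chroma_metadata(metadata: dict[str, Any]) -> bool:
    """Approximate check: string keys and scalar values only."""
    return all(isinstance(k, str) and isinstance(v, SCALAR_TYPES) for k, v in metadata.items())


# Stand-in for a structured pin; the field names are illustrative only.
pin_like = {"example_int": 1, "example_nested": {"k": "v"}, "example_list": ["a", "b"]}

nested = {"text": "chunk text", "vectorpin": pin_like}
flattened = {"text": "chunk text", "vectorpin": json.dumps(pin_like)}

print(is_chroma_metadata(nested))     # False: a dict value is not a scalar
print(is_chroma_metadata(flattened))  # True: the pin is now a plain string
print(json.loads(flattened["vectorpin"]) == pin_like)  # True: the round trip is lossless
```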

---

## Qdrant

The adapter works with both local Qdrant and Qdrant Cloud deployments. Pins are written into each point's `payload` dict.

```python
from vectorpin.adapters import QdrantAdapter

adapter = QdrantAdapter.connect(
    url="http://localhost:6333",
    collection_name="my-rag",
    api_key=None,  # set for Qdrant Cloud
)

# `signer` is created as in the LanceDB example above.
for record in adapter.iter_records(batch_size=256):
    pin = signer.pin(
        source=record.metadata["text"],
        model="text-embedding-3-large",
        vector=record.vector,
    )
    adapter.attach_pin(record.id, pin)
```

Because Qdrant filters on payload fields server-side, you can fetch only the unpinned records, for example to resume a partial backfill. This snippet calls `qdrant-client` directly, and the exact API may vary across client versions:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# IsEmptyCondition matches points whose "vectorpin" key is missing, null, or [].
unpinned, next_offset = client.scroll(
    collection_name="my-rag",
    scroll_filter=models.Filter(
        must=[models.IsEmptyCondition(is_empty=models.PayloadField(key="vectorpin"))]
    ),
    limit=256,  # pass offset=next_offset to fetch the next page
)
```

---

## Pinecone

Pinecone is hosted-only. Pins are stored under `metadata["vectorpin"]` as a JSON string.

```python
from vectorpin.adapters import PineconeAdapter

adapter = PineconeAdapter.connect(
    api_key="...",  # your Pinecone API key
    index_name="my-rag",
)

# `signer` is created as in the LanceDB example above.
for record in adapter.iter_records():
    pin = signer.pin(
        source=record.metadata["text"],
        model="text-embedding-3-large",
        vector=record.vector,
    )
    adapter.attach_pin(record.id, pin)
```

Pinecone caps metadata at 40 KiB per record. VectorPin pins are well under 1 KiB at typical sizes, so a pin alone won't come close, but the limit covers the record's entire metadata dict. If you also keep large source text in metadata, or stuff large `extra` payloads into pins, check the total size.
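Before upserting, you can estimate whether a record's metadata will fit. This is a rough sketch that assumes the limit applies to the UTF-8 size of the JSON-serialized metadata dict. Pinecone's own accounting may differ slightly, so the check leaves some headroom. `metadata_size_bytes` and `fits_pinecone_limit` are hypothetical helpers.

```python
import json
from typing import Any

PINECONE_METADATA_LIMIT = 40 * 1024  # 40 KiB per record, per the note above


def metadata_size_bytes(metadata: dict[str, Any]) -> int:
    """Approximate size: UTF-8 bytes of the compact JSON encoding."""
    # ensure_ascii=False keeps non-ASCII text as real UTF-8 instead of \uXXXX escapes.
    encoded = json.dumps(metadata, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def fits_pinecone_limit(metadata: dict[str, Any], headroom: int = 1024) -> bool:
    """True if the metadata, pin included, stays at least `headroom` bytes under the limit."""
    return metadata_size_bytes(metadata) + headroom <= PINECONE_METADATA_LIMIT
```

Run the check on the metadata dict *after* adding the pin string, since the pin counts toward the same budget.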

---

## Choosing a Backend

| If you... | Use |
|---|---|
| Just want pinning without standing up a server | **LanceDB** (default) |
| Already run Chroma | Chroma |
| Need server-side payload filtering | Qdrant |
| Are on Pinecone today | Pinecone |
| Run Symbiont | LanceDB (matches Symbiont's default backend) |

LanceDB also gives you a typed `vectorpin` column that you can filter on directly, instead of digging into a JSON blob inside a metadata dict. This is useful when tracking a partial backfill.

---

## Writing a New Adapter

The adapter protocol consists of two methods, `iter_records()` and `attach_pin()`, plus a `connect()` constructor and a record dataclass. Here is a sketch:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterator

import numpy as np
from vectorpin import Pin


@dataclass
class PinnedRecord:
    id: str
    vector: np.ndarray
    metadata: dict
    pin: Pin | None  # None if the record has not been pinned yet


class MyBackendAdapter:
    @classmethod
    def connect(cls, *args: Any, **kwargs: Any) -> MyBackendAdapter:
        """Open a connection to the backend (arguments are backend-specific)."""
        ...

    def iter_records(self, batch_size: int = 256) -> Iterator[PinnedRecord]:
        """Yield every record, reading the pin from the `vectorpin` field if present."""
        ...

    def attach_pin(self, record_id: str, pin: Pin) -> None:
        """Write the pin under the `vectorpin` key (see Storage Convention)."""
        ...
```

See [`src/vectorpin/adapters/base.py`](https://github.com/ThirdKeyAI/VectorPin/blob/main/src/vectorpin/adapters/base.py) for the canonical protocol, and see the existing adapters for working examples.
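To exercise a verification pipeline without a database, a toy backend can help. This is a minimal in-memory sketch of the adapter shape, assuming pins are serialized to a JSON string under the `vectorpin` metadata key, following the storage convention. To stay self-contained, it uses plain dicts in place of VectorPin's `Pin` and lists in place of NumPy arrays. `InMemoryAdapter` and `Record` are hypothetical and not part of VectorPin.

```python
import json
from dataclasses import dataclass
from typing import Any, Iterator, Optional

PIN_KEY = "vectorpin"


@dataclass
class Record:
    id: str
    vector: list[float]
    metadata: dict[str, Any]
    pin: Optional[dict[str, Any]]  # stand-in for vectorpin.Pin


class InMemoryAdapter:
    def __init__(self) -> None:
        self._rows: dict[str, tuple[list[float], dict[str, Any]]] = {}

    @classmethod
    def connect(cls, rows: dict[str, tuple[list[float], dict[str, Any]]]) -> "InMemoryAdapter":
        adapter = cls()
        # Copy inputs so attach_pin never mutates the caller's dicts.
        adapter._rows = {rid: (list(vec), dict(meta)) for rid, (vec, meta) in rows.items()}
        return adapter

    def iter_records(self, batch_size: int = 256) -> Iterator[Record]:
        ids = list(self._rows)  # snapshot of ids taken before yielding
        # batch_size only mimics the paging a real backend would do.
        for start in range(0, len(ids), batch_size):
            for rid in ids[start:start + batch_size]:
                vector, metadata = self._rows[rid]
                raw = metadata.get(PIN_KEY)
                pin = None if raw is None else json.loads(raw)
                yield Record(rid, list(vector), dict(metadata), pin)

    def attach_pin(self, record_id: str, pin: dict[str, Any]) -> None:
        if record_id not in self._rows:
            raise KeyError(record_id)
        _, metadata = self._rows[record_id]
        metadata[PIN_KEY] = json.dumps(pin)  # stored as a JSON string, like Chroma/Pinecone
```

Because `iter_records()` and `attach_pin()` match the interface used throughout this guide, loops like the ones in the LanceDB examples can run against this adapter unchanged once the real `Pin` type is swapped in.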

---

## See Also

- [CLI Guide](cli-guide.md#audit-commands) — Command-line equivalents to programmatic auditing
- [Getting Started](getting-started.md) — End-to-end pinning + verification walkthrough
- [Pin Protocol](pin-protocol.md) — Wire format and verification order