
Commit 8afa8ac

Add Zensical documentation site
Mirrors the structure of the AgentPin docs: index, getting-started, pin-protocol, CLI guide, adapters, detectors, deployment, security, troubleshooting. Wired up via zensical.toml at the repo root.

The existing docs/spec.md remains the normative protocol reference (the new pin-protocol.md is the readable user-facing walkthrough).

Fixes vs. the working draft:

- Repository URL casing aligned to github.com/ThirdKeyAI/VectorPin.
- pin-protocol.md cross-language anchor link points at testvectors/v2.json rather than v1.json (matches current main).

Site rendering is handled by the Zensical static site generator; this patch only adds the source content and config.
1 parent a9fc248 commit 8afa8ac

10 files changed

Lines changed: 2090 additions & 0 deletions

docs/adapters.md

Lines changed: 257 additions & 0 deletions

# Vector Store Adapters

VectorPin ships thin adapters for the major vector databases. Each adapter does two things:

1. **Walk records**: iterate over the collection, yielding records that carry `(id, vector, metadata, pin)` for verification.
2. **Attach pins**: write a pin into the record's metadata in whichever shape the backend prefers.

The adapter protocol lives at [`src/vectorpin/adapters/base.py`](https://github.com/ThirdKeyAI/VectorPin/blob/main/src/vectorpin/adapters/base.py) and is intentionally thin. Community contributions for new backends are welcome.

---

## Status

| Backend | Status | Install | Notes |
|---|---|---|---|
| LanceDB *(default)* | Alpha | `pip install 'vectorpin[default]'` | Embedded, file-based, no daemon. Recommended. |
| Chroma | Alpha | `pip install 'vectorpin[chroma]'` | Supports both persistent and HTTP modes. |
| Qdrant | Alpha | `pip install 'vectorpin[qdrant]'` | Server-side payload filtering. |
| Pinecone | Alpha | `pip install 'vectorpin[pinecone]'` | Hosted only. |
| pgvector | Planned | | |
| FAISS | Planned | | Use `LanceDBAdapter` instead (embedded, with a native metadata column). |

All adapters expose the same `iter_records()` / `attach_pin()` interface. Backend differences are limited to where the pin physically lives in the underlying record.
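
Because every adapter exposes the same two methods, pinning logic can be written once and reused across backends. The following is a minimal sketch, not part of VectorPin: `pin_unpinned`, the `Protocol` classes, and the `Fake*` stand-ins are hypothetical names. The only assumptions about the real API are the `iter_records()`, `attach_pin()`, and `signer.pin(source=..., model=..., vector=...)` calls shown on this page.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List, Optional, Protocol


class RecordLike(Protocol):
    id: str
    vector: Any
    metadata: dict
    pin: Any


class AdapterLike(Protocol):
    def iter_records(self) -> Iterator[RecordLike]: ...
    def attach_pin(self, record_id: str, pin: Any) -> None: ...


class SignerLike(Protocol):
    def pin(self, *, source: str, model: str, vector: Any) -> Any: ...


def pin_unpinned(adapter: AdapterLike, signer: SignerLike, model: str, text_key: str = "text") -> int:
    """Pin every record that has no pin yet, on any backend; return the number pinned."""
    pinned = 0
    for record in adapter.iter_records():
        if record.pin is not None:
            continue  # already pinned: leave the existing pin untouched
        pin = signer.pin(source=record.metadata[text_key], model=model, vector=record.vector)
        adapter.attach_pin(record.id, pin)
        pinned += 1
    return pinned


# --- Hypothetical in-memory stand-ins, for demonstration only ---
@dataclass
class FakeRecord:
    id: str
    vector: List[float]
    metadata: dict
    pin: Optional[dict] = None


class FakeAdapter:
    def __init__(self, records: List[FakeRecord]):
        self.records: Dict[str, FakeRecord] = {r.id: r for r in records}

    def iter_records(self) -> Iterator[FakeRecord]:
        return iter(list(self.records.values()))

    def attach_pin(self, record_id: str, pin: Any) -> None:
        self.records[record_id].pin = pin


class FakeSigner:
    def pin(self, *, source: str, model: str, vector: Any) -> dict:
        return {"model": model, "source_len": len(source)}  # not a real signature


adapter = FakeAdapter([
    FakeRecord("doc-1", [0.1, 0.2], {"text": "hello"}),
    FakeRecord("doc-2", [0.3, 0.4], {"text": "world"}, pin={"existing": True}),
])
print(pin_unpinned(adapter, FakeSigner(), model="text-embedding-3-large"))  # prints 1
```

Skipping records that already carry a pin makes the loop safe to re-run after an interrupted backfill; drop that check if you intend to re-pin everything.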

---

## Storage Convention

By convention, pins are stored under the metadata key `vectorpin`:

| Backend | Pin lives at |
|---|---|
| LanceDB | A typed schema column named `vectorpin` (string-valued, holding the pin JSON). |
| Chroma | The `metadata` dict, under key `vectorpin`. |
| Qdrant | The `payload` dict, under key `vectorpin`. |
| Pinecone | The `metadata` dict, under key `vectorpin`. |

Backends without free-form metadata fields are out of scope: provenance must travel with the data, not in a sidecar.
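
The table above amounts to a lookup from backend to the container that holds the pin. A minimal sketch, assuming rows have already been fetched as plain Python dicts; the dict shapes and the `extract_pin_json` helper are illustrative, not VectorPin's internal representation:

```python
import json
from typing import Any, Mapping, Optional

PIN_KEY = "vectorpin"  # conventional key (or LanceDB column name) from the table above


def extract_pin_json(backend: str, row: Mapping[str, Any]) -> Optional[str]:
    """Return the raw pin JSON string stored on `row`, or None if the row is unpinned."""
    if backend == "lancedb":
        container = row  # typed top-level column
    elif backend in ("chroma", "pinecone"):
        container = row.get("metadata") or {}
    elif backend == "qdrant":
        container = row.get("payload") or {}
    else:
        raise ValueError(f"unsupported backend: {backend!r}")
    return container.get(PIN_KEY)


pin_json = json.dumps({"placeholder": "not a real pin"})
lance_row = {"id": "doc-1", "vector": [0.1, 0.2], "text": "hello", "vectorpin": pin_json}
qdrant_point = {"id": "doc-1", "payload": {"text": "hello", "vectorpin": pin_json}}
print(extract_pin_json("lancedb", lance_row) == extract_pin_json("qdrant", qdrant_point))  # prints True
```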

---

## LanceDB (default)

LanceDB is the recommended default: embedded, file-based, no daemon, with a typed schema column that holds the pin natively. It also matches the default vector backend of the [Symbiont runtime](https://github.com/thirdkeyai/symbiont).

### Pin a corpus

```python
from vectorpin import Signer
from vectorpin.adapters import LanceDBAdapter

adapter = LanceDBAdapter.connect("./data/vector_db", "rag-corpus")
signer = Signer.generate(key_id="prod-2026-05")

for record in adapter.iter_records():
    # Bind the source text, embedding model name, and vector into one signed pin.
    pin = signer.pin(
        source=record.metadata["text"],
        model="text-embedding-3-large",
        vector=record.vector,
    )
    adapter.attach_pin(record.id, pin)
```

### Verify a corpus

```python
from vectorpin import Verifier
from vectorpin.adapters import LanceDBAdapter

adapter = LanceDBAdapter.connect("./data/vector_db", "rag-corpus")
# public_key_bytes: the public key for "prod-2026-05", loaded from your key store.
verifier = Verifier({"prod-2026-05": public_key_bytes})

failed = 0
for record in adapter.iter_records():
    if record.pin is None:
        continue  # unpinned records are skipped here, not counted as failures
    result = verifier.verify(
        record.pin,
        source=record.metadata["text"],
        vector=record.vector,
    )
    if not result.ok:
        print(f"FAIL {record.id} [{result.error.value}] {result.detail}")
        failed += 1

assert failed == 0, f"{failed} records failed verification"
```

### Connection options

`LanceDBAdapter.connect` accepts a URI (a directory path, an `s3://` or `gs://` URI, or a LanceDB Cloud connection string), a table name, and optional column overrides:

```python
adapter = LanceDBAdapter.connect(
    uri="s3://my-bucket/vector_db",
    table_name="rag-corpus",
    id_column="id",          # default: "id"
    vector_column="vector",  # default: "vector"
)
```

### Symbiont schema

For Symbiont deployments, the source text lives in the `content` column. Symbiont's column named `source` holds upstream provenance (a URL), not the text that VectorPin's `source` argument expects, so pass `source=record.metadata["content"]` when calling `signer.pin`. See [`tests/test_adapter_lancedb_symbiont.py`](https://github.com/ThirdKeyAI/VectorPin/blob/main/tests/test_adapter_lancedb_symbiont.py) for an end-to-end example.
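
Signing the provenance URL instead of the embedded text would produce pins that attest to the wrong thing, so it can help to make the choice of text field explicit. A minimal sketch; `SOURCE_TEXT_FIELD` and `source_text` are hypothetical helpers, and `"text"` is simply the key used by the other examples on this page:

```python
from typing import Any, Mapping

# Which metadata field holds the text that was actually embedded, per schema.
# Symbiont's own "source" field is a provenance URL and is deliberately not listed.
SOURCE_TEXT_FIELD = {"default": "text", "symbiont": "content"}


def source_text(metadata: Mapping[str, Any], schema: str = "default") -> str:
    """Return the text to pass as `source=` when pinning or verifying."""
    field = SOURCE_TEXT_FIELD[schema]
    value = metadata[field]
    if not isinstance(value, str):
        raise TypeError(f"expected {field!r} to hold text, got {type(value).__name__}")
    return value


row = {
    "content": "Refunds are processed within five business days.",
    "source": "https://example.com/policies/refunds",  # provenance URL, not the embedded text
}
print(source_text(row, schema="symbiont"))  # prints the "content" value
```

Pass the same `source_text(record.metadata, schema="symbiont")` value to both `signer.pin` and `verifier.verify`, so pinning and verification read the same field.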

---

## Chroma

Chroma offers both an embedded persistent client and a remote HTTP client; the adapter supports both.

### Persistent (embedded)

```python
from vectorpin.adapters import ChromaAdapter

adapter = ChromaAdapter.connect_persistent("./chroma_db", "my-rag")
```

### HTTP

```python
from vectorpin.adapters import ChromaAdapter

adapter = ChromaAdapter.connect_http(
    host="chroma.internal",
    port=8000,
    collection_name="my-rag",
    ssl=False,
)
```

### Pinning

```python
# `signer` is a vectorpin.Signer, created as in the LanceDB example.
for record in adapter.iter_records():
    pin = signer.pin(
        source=record.metadata["text"],
        model="text-embedding-3-large",
        vector=record.vector,
    )
    adapter.attach_pin(record.id, pin)
```

The pin is stored as a JSON string under `metadata["vectorpin"]`. Chroma metadata values must be scalars (`dict[str, str | int | float | bool]`), so the pin cannot be stored as a nested object; serializing it to a JSON string keeps it intact through the round trip.
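
A minimal sketch of why the JSON-string encoding is needed. `check_scalar_metadata` is a hypothetical helper that mirrors the value-type constraint as stated above; it is not Chroma's own validator, and the pin fields are placeholders rather than the real pin schema:

```python
import json
from typing import Any, Mapping

SCALAR_TYPES = (str, int, float, bool)  # allowed value types, per the constraint above


def check_scalar_metadata(metadata: Mapping[str, Any]) -> None:
    """Raise TypeError if any metadata value is not a flat scalar."""
    for key, value in metadata.items():
        if not isinstance(value, SCALAR_TYPES):
            raise TypeError(f"metadata[{key!r}] is {type(value).__name__}, not a scalar")


pin_obj = {"key_id": "prod-2026-05", "model": "text-embedding-3-large"}  # placeholder fields only

# A nested dict is rejected...
try:
    check_scalar_metadata({"text": "hello", "vectorpin": pin_obj})
except TypeError as exc:
    print(exc)  # prints: metadata['vectorpin'] is dict, not a scalar

# ...but its JSON encoding is a plain string, and it decodes back to an equal object.
metadata = {"text": "hello", "vectorpin": json.dumps(pin_obj)}
check_scalar_metadata(metadata)
assert json.loads(metadata["vectorpin"]) == pin_obj
```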

---

## Qdrant

Qdrant supports both local and Qdrant Cloud deployments. Pins are written into the `payload` dict.

```python
from vectorpin.adapters import QdrantAdapter

adapter = QdrantAdapter.connect(
    url="http://localhost:6333",
    collection_name="my-rag",
    api_key=None,  # set for Qdrant Cloud
)

# `signer` is a vectorpin.Signer, created as in the LanceDB example.
for record in adapter.iter_records(batch_size=256):
    pin = signer.pin(
        source=record.metadata["text"],
        model="text-embedding-3-large",
        vector=record.vector,
    )
    adapter.attach_pin(record.id, pin)
```

Because Qdrant filters on payload fields server-side, you can find unpinned records without scanning the collection client-side. With `qdrant-client`, an `IsEmptyCondition` on the `vectorpin` key matches points where that key is missing, `null`, or an empty array:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# scroll() returns one page of matching points plus the offset of the next page.
unpinned, next_offset = client.scroll(
    collection_name="my-rag",
    scroll_filter=models.Filter(
        must=[models.IsEmptyCondition(is_empty=models.PayloadField(key="vectorpin"))]
    ),
)
```

---

## Pinecone

Pinecone is hosted-only. Pins are stored under `metadata["vectorpin"]` as a JSON string.

```python
from vectorpin.adapters import PineconeAdapter

adapter = PineconeAdapter.connect(
    api_key="...",
    index_name="my-rag",
)

# `signer` is a vectorpin.Signer, created as in the LanceDB example.
for record in adapter.iter_records():
    pin = signer.pin(
        source=record.metadata["text"],
        model="text-embedding-3-large",
        vector=record.vector,
    )
    adapter.attach_pin(record.id, pin)
```

Pinecone limits metadata to 40 KiB per record. VectorPin pins are typically well under 1 KiB, so you are unlikely to hit the limit unless you put large payloads in a pin's `extra` field; in that case, check the serialized size before writing.
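
A pre-flight size check before `attach_pin` is cheap. A minimal sketch: the helpers are hypothetical, and measuring the compact JSON encoding only approximates how Pinecone accounts for metadata size, so leave some headroom:

```python
import json
from typing import Any, Mapping

PINECONE_METADATA_LIMIT = 40 * 1024  # bytes per record, as quoted above


def approx_metadata_bytes(metadata: Mapping[str, Any]) -> int:
    """Approximate size: UTF-8 length of the compact JSON encoding."""
    return len(json.dumps(metadata, separators=(",", ":")).encode("utf-8"))


def check_metadata_size(metadata: Mapping[str, Any], limit: int = PINECONE_METADATA_LIMIT) -> int:
    """Return the approximate size, or raise ValueError if it exceeds `limit`."""
    size = approx_metadata_bytes(metadata)
    if size > limit:
        raise ValueError(f"metadata is ~{size} bytes, over the {limit}-byte limit")
    return size


small = {"text": "hello", "vectorpin": json.dumps({"extra": {"note": "ok"}})}
print(check_metadata_size(small) < 1024)  # prints True

bloated = {"text": "hello", "vectorpin": json.dumps({"extra": {"blob": "x" * 50_000}})}
try:
    check_metadata_size(bloated)
except ValueError as exc:
    print(exc)  # over the limit: trim `extra` before writing
```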

---

## Choosing a Backend

| If you... | Use |
|---|---|
| Want pinning without standing up a server | **LanceDB** (default) |
| Already run Chroma | Chroma |
| Need server-side payload filtering | Qdrant |
| Are on Pinecone today | Pinecone |
| Run Symbiont | LanceDB (matches Symbiont's default backend) |

LanceDB also gives you a typed `vectorpin` column, which is easier to query and inspect than a JSON blob inside a metadata dict. That helps when tracking a partial backfill, where some records are pinned and others are not yet.
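
Whatever the backend, backfill progress can be measured by walking the records and counting pins. A minimal sketch; `backfill_report` and `BackfillReport` are hypothetical helpers that rely only on the `.id` and `.pin` attributes used elsewhere on this page:

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any, Iterable, List


@dataclass
class BackfillReport:
    total: int = 0
    pinned: int = 0
    unpinned_ids: List[str] = field(default_factory=list)

    @property
    def coverage(self) -> float:
        """Fraction of records carrying a pin (1.0 for an empty collection)."""
        return self.pinned / self.total if self.total else 1.0


def backfill_report(records: Iterable[Any]) -> BackfillReport:
    """Tally pinned vs. unpinned records; works on anything with `.id` and `.pin`."""
    report = BackfillReport()
    for record in records:
        report.total += 1
        if record.pin is None:
            report.unpinned_ids.append(record.id)
        else:
            report.pinned += 1
    return report


# Stand-in records; in practice, pass adapter.iter_records().
rows = [
    SimpleNamespace(id="a", pin="<pin>"),
    SimpleNamespace(id="b", pin=None),
    SimpleNamespace(id="c", pin="<pin>"),
    SimpleNamespace(id="d", pin="<pin>"),
]
report = backfill_report(rows)
print(f"{report.pinned}/{report.total} pinned ({report.coverage:.0%}); missing: {report.unpinned_ids}")
# prints: 3/4 pinned (75%); missing: ['b']
```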

---

## Writing a New Adapter

The adapter protocol is two methods plus a record dataclass. A sketch:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator

import numpy as np

from vectorpin import Pin


@dataclass
class PinnedRecord:
    id: str
    vector: np.ndarray
    metadata: dict
    pin: Pin | None  # None when the record has not been pinned yet


class MyBackendAdapter:
    @classmethod
    def connect(cls, *args, **kwargs) -> "MyBackendAdapter":
        """Open a connection to the backend (parameters are backend-specific)."""
        ...

    def iter_records(self, batch_size: int = 256) -> Iterator[PinnedRecord]:
        """Yield every record, fetching from the backend in batches."""
        ...

    def attach_pin(self, record_id: str, pin: Pin) -> None:
        """Write `pin` into the record's metadata under the `vectorpin` key."""
        ...
```

See [`src/vectorpin/adapters/base.py`](https://github.com/ThirdKeyAI/VectorPin/blob/main/src/vectorpin/adapters/base.py) for the canonical protocol and the existing adapters for working examples.
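
To make the protocol concrete, here is a toy in-memory adapter that follows the storage convention (pin as a JSON string under `vectorpin`). It is a hypothetical sketch: `InMemoryAdapter` is not part of VectorPin, and pins are plain dicts standing in for `vectorpin.Pin`, whose serialization methods are not covered on this page. Check `base.py` and an existing adapter for how real pins are encoded.

```python
import json
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple

PIN_KEY = "vectorpin"


@dataclass
class PinnedRecord:
    id: str
    vector: List[float]
    metadata: dict
    pin: Optional[dict]  # stand-in for vectorpin.Pin; None if unpinned


class InMemoryAdapter:
    """Toy backend: record id -> (vector, metadata dict)."""

    def __init__(self, rows: Dict[str, Tuple[List[float], dict]]):
        self._rows = rows

    @classmethod
    def connect(cls, rows: Dict[str, Tuple[List[float], dict]]) -> "InMemoryAdapter":
        return cls(rows)

    def iter_records(self, batch_size: int = 256) -> Iterator[PinnedRecord]:
        ids = sorted(self._rows)  # snapshot ids so attach_pin during iteration is safe
        # Batching is simulated here; a real backend would page through results.
        for start in range(0, len(ids), batch_size):
            for record_id in ids[start:start + batch_size]:
                vector, metadata = self._rows[record_id]
                raw = metadata.get(PIN_KEY)
                pin = json.loads(raw) if raw is not None else None
                yield PinnedRecord(record_id, list(vector), dict(metadata), pin)

    def attach_pin(self, record_id: str, pin: dict) -> None:
        # Store the pin as a JSON string under the conventional key.
        _, metadata = self._rows[record_id]
        metadata[PIN_KEY] = json.dumps(pin, sort_keys=True)


adapter = InMemoryAdapter.connect({
    "a": ([0.1, 0.2], {"text": "alpha"}),
    "b": ([0.3, 0.4], {"text": "beta"}),
})
adapter.attach_pin("a", {"placeholder": "pin for a"})
for record in adapter.iter_records(batch_size=1):
    print(record.id, record.pin)
# prints:
# a {'placeholder': 'pin for a'}
# b None
```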

---

## See Also

- [CLI Guide](cli-guide.md#audit-commands): command-line equivalents of programmatic auditing
- [Getting Started](getting-started.md): end-to-end pinning and verification walkthrough
- [Pin Protocol](pin-protocol.md): wire format and verification order
