This repo demonstrates how to:
- Generate embeddings (float32, pre-quantized int8, and 1-bit packed) using an embedding API (Voyage AI).
- Store them in MongoDB using the Java Sync Driver.
- Build Atlas Vector Search indexes for:
  - Baseline float32 (no quantization)
  - Automatic Scalar Quantization
  - Automatic Binary Quantization
  - Pre-quantized int8 ingestion
  - Pre-quantized int1 (packed bit) ingestion
- Run vector search queries across all paths to compare recall, latency, and memory trade-offs.
Prerequisites:

- MongoDB Atlas cluster (an M0 Free Tier is fine)
- Java 21
- Maven 3.9.x+
- Voyage AI API key (or another provider that supports pre-quantized outputs)
- Network access to Atlas (IP allowlist configured)
Key components:

- `Main` class: orchestrates embed → insert → index → query.
- `ResponseDouble`: model for float embeddings (`List<Double>`).
- `ResponseBytes`: model for pre-quantized byte embeddings (int8 and 1-bit).
- HTTP via OkHttp; JSON via Jackson/org.json.
- Vector index creation via `SearchIndexModel` (Atlas Vector Search).
- Queries via the `$vectorSearch` aggregation stage, with either float vectors or `BinaryVector` wrappers.
- Automatic quantization: store float vectors (as doubles) and let Atlas quantize at index time (`scalar` or `binary`).
- Pre-quantized ingestion: store model-returned `int8` and `int1` vectors directly as BSON `binData`, using:
  - `BinaryVector.int8Vector(byte[])` → `binData` (int8)
  - `BinaryVector.packedBitVector(byte[], padding)` → `binData` (int1)
- Side-by-side queries: run the same query across all five paths to see score/recall differences and understand the trade-offs between fidelity and resource usage.
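For reference, the packed-bit layout expected by `BinaryVector.packedBitVector` is one bit per dimension, most significant bit first within each byte. A minimal sketch, assuming a simple sign-based binarization (the demo itself receives these bytes ready-made from the embedding API; the class and method names here are illustrative):

```java
public class PackBits {

    // Pack a float vector into 1 bit per dimension (bit = 1 when the value is
    // non-negative), most significant bit first within each byte. The resulting
    // byte[] is what you would hand to BinaryVector.packedBitVector(bytes, padding),
    // where padding is the number of unused trailing bits in the last byte.
    static byte[] packBits(float[] v) {
        byte[] out = new byte[(v.length + 7) / 8];
        for (int i = 0; i < v.length; i++) {
            if (v[i] >= 0) {
                out[i / 8] |= (byte) (1 << (7 - (i % 8)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] v = {0.5f, -0.1f, 0.2f, -0.3f, 0.9f, -0.4f, 0.1f, 0.7f};
        byte[] packed = packBits(v);
        // 8 dimensions fit in one byte: bits 10101011, padding 0
        System.out.println(packed.length + " " + (packed[0] & 0xFF));
    }
}
```

For the 1024-dimension vectors in this sample, the padding byte is 0, since 1024 is divisible by 8.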
Set the following environment variables:
```shell
export VOYAGE_API_KEY=YOUR_VOYAGE_AI_API_KEY
export MONGODB_URI="mongodb+srv://<user>:<pass>@<cluster>/<db>?retryWrites=true&w=majority"
```

The sample uses:
- Database: `test`
- Collection: `demo`
- Index name: `vector_index`
- Embedding dimensions: `1024`
- Similarity:
  - float / auto-scalar / auto-binary / int8: `dotProduct`
  - int1: `euclidean` (required for 1-bit)
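For orientation, the five index fields correspond to a vector-search index definition along these lines (field paths from this demo; the `quantization` keys follow the Atlas Vector Search index format):

```json
{
  "fields": [
    { "type": "vector", "path": "embeddings_float32", "numDimensions": 1024, "similarity": "dotProduct" },
    { "type": "vector", "path": "embeddings_auto_scalar", "numDimensions": 1024, "similarity": "dotProduct", "quantization": "scalar" },
    { "type": "vector", "path": "embeddings_auto_binary", "numDimensions": 1024, "similarity": "dotProduct", "quantization": "binary" },
    { "type": "vector", "path": "embeddings_int8", "numDimensions": 1024, "similarity": "dotProduct" },
    { "type": "vector", "path": "embeddings_int1", "numDimensions": 1024, "similarity": "euclidean" }
  ]
}
```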
Defined in pom.xml:
- `mongodb-driver-sync`
- `okhttp`
- `jackson-databind`
- `org.json`
- `slf4j-api` (and `slf4j-simple` for tests)
- `junit` (tests)
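Note that the `BinaryVector` type used for pre-quantized ingestion requires a recent driver; assuming driver 5.3+ (check the project's `pom.xml` for the exact version pinned there), the dependency looks like:

```xml
<dependency>
  <groupId>org.mongodb</groupId>
  <artifactId>mongodb-driver-sync</artifactId>
  <version>5.3.0</version>
</dependency>
```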
- Clone and build:

  ```shell
  mvn clean compile
  ```

- Run the demo:

  ```shell
  mvn exec:java -Dexec.mainClass="com.timkelly.Main"
  ```

What it does:
- Calls the embedding API three ways:
  - float32 (stored as doubles)
  - pre-quantized int8
  - pre-quantized int1 (packed bits)
- Inserts documents with all three representations.
- Creates a Vector Search index with five fields:
  - `embeddings_float32` (baseline)
  - `embeddings_auto_scalar` (auto scalar)
  - `embeddings_auto_binary` (auto binary)
  - `embeddings_int8` (pre-quantized int8)
  - `embeddings_int1` (pre-quantized 1-bit)
- Runs a vector search over each field and prints results.
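Each query is a standard `$vectorSearch` aggregation; one path's pipeline looks roughly like this (the `queryVector` shown is truncated to three values, and the `numCandidates`/`limit` values are illustrative):

```json
[
  {
    "$vectorSearch": {
      "index": "vector_index",
      "path": "embeddings_float32",
      "queryVector": [0.012, -0.034, 0.051],
      "numCandidates": 100,
      "limit": 5
    }
  },
  { "$project": { "_id": 0, "text": 1, "score": { "$meta": "vectorSearchScore" } } }
]
```

For the `embeddings_int8` and `embeddings_int1` paths, the query vector is passed as a `BinaryVector` rather than an array of floats.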
Expected output:

- Console prints "Outputting results:" blocks for each index path with `text` and `vectorSearchScore`.
- Scores differ slightly by method, illustrating the recall/latency/memory trade-offs.
- Dimensions: keep `output_dimension` in the embedding request aligned with the index `numDimensions` (1024 in the sample).
- Similarity: `int1` (packed bit) vectors use `euclidean`; `int8` supports `cosine`, `euclidean`, or `dotProduct`.
- Normalization: if you choose `dotProduct`, use L2-normalized embeddings (many providers already return normalized vectors for dot-product/cosine equivalence).
- Index rebuilds: changing the index definition (e.g., switching quantization) triggers a rebuild. If you previously created the index, drop or update it before recreating it to avoid duplicate-name errors.
- Network/auth: Ensure your Atlas IP allowlist and credentials are correct if connections fail.
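If your provider does not return unit-length vectors, L2 normalization before ingestion is only a few lines. A minimal helper (illustrative, not part of the repo):

```java
public class Normalize {

    // L2-normalize a vector so that dotProduct scoring ranks results
    // identically to cosine similarity.
    static double[] l2Normalize(double[] v) {
        double sumSq = 0;
        for (double x : v) sumSq += x * x;
        double norm = Math.sqrt(sumSq);
        if (norm == 0) return v.clone(); // avoid dividing by zero for the zero vector
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) out[i] = v[i] / norm;
        return out;
    }

    public static void main(String[] args) {
        double[] u = l2Normalize(new double[]{3, 4});
        System.out.println(u[0] + " " + u[1]); // prints 0.6 0.8
    }
}
```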
If you want to reset the dataset:
- Drop the `demo` collection, or
- Drop the `vector_index` search index and recreate it.