feat(ColBERT): Create, update, batch insert, and query objects with multi-vector indices #356

Merged
bevzzz merged 10 commits into main from feat/colbert on Mar 3, 2025

@bevzzz (Collaborator) commented Feb 28, 2025

What's changed

This PR adds support for ColBERT embeddings (multi-vectors).
Notable changes:

  • Collection has a new parameter, MultiVectorConfig
  • gRPC utility methods are extended to handle Float[][] vectors (with the byteops package as a reference implementation)
  • Custom JsonSerializer + JsonDeserializer for WeaviateObject, which merges named vectors and named "multi-vectors" into the "vectors" field
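
For readers unfamiliar with the Float[][] handling mentioned above, here is a minimal sketch of flattening a 2-dimensional float array into bytes and back. It is not the code from this PR: the class name `MultiVectorBytes` is hypothetical, plain `byte[]` stands in for protobuf's `ByteString`, and the layout (a 2-byte little-endian row length followed by little-endian float32 values) is an assumption, since the PR text does not spell out the wire format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for illustration; not the client's actual gRPC utility. */
final class MultiVectorBytes {

    /** Assumed layout: uint16 row length, then every value row by row (little-endian). */
    static byte[] encode(float[][] vectors) {
        if (vectors.length == 0) {
            return new byte[0];
        }
        int dim = vectors[0].length;
        if (dim > 0xFFFF) {
            throw new IllegalArgumentException("row length does not fit in 2 bytes");
        }
        ByteBuffer buf = ByteBuffer
            .allocate(Short.BYTES + vectors.length * dim * Float.BYTES)
            .order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) dim);
        for (float[] row : vectors) {
            if (row.length != dim) {
                throw new IllegalArgumentException("all rows must have the same length");
            }
            for (float v : row) {
                buf.putFloat(v);
            }
        }
        return buf.array();
    }

    /** Inverse of encode: reads the row length, then slices the remaining floats into rows. */
    static float[][] decode(byte[] bytes) {
        if (bytes.length == 0) {
            return new float[0][0];
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        int dim = Short.toUnsignedInt(buf.getShort());
        if (dim == 0) {
            return new float[0][0];
        }
        int rows = buf.remaining() / (dim * Float.BYTES);
        float[][] out = new float[rows][dim];
        for (float[] row : out) {
            for (int i = 0; i < dim; i++) {
                row[i] = buf.getFloat();
            }
        }
        return out;
    }
}
```

Storing the row length once in a header keeps the payload compact, but it only works for rectangular input, which is why the sketch rejects rows of differing length.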

Note that WeaviateObject does not have a Float[][] vector field, as Weaviate only expects 1-d vectors under "vector".
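
To make the merged "vectors" field more concrete, the sketch below shows one way named 1-d vectors and named multi-vectors could share a single map and be separated again by value shape. It uses plain Java maps instead of Gson so that it runs without dependencies. The class `VectorsField`, its method names, and the decision to reject duplicate names are all assumptions for illustration, not the behaviour of WeaviateObject.Adapter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of a merged "vectors" field; not the PR's adapter. */
final class VectorsField {

    /** Combines both kinds of named vectors under one map, as a serializer might. */
    static Map<String, Object> merge(Map<String, Float[]> vectors,
                                     Map<String, Float[][]> multiVectors) {
        Map<String, Object> merged = new LinkedHashMap<>(vectors);
        for (Map.Entry<String, Float[][]> e : multiVectors.entrySet()) {
            // Assumption: a name may hold either a 1-d or a 2-d vector, not both.
            if (merged.containsKey(e.getKey())) {
                throw new IllegalArgumentException("duplicate vector name: " + e.getKey());
            }
            merged.put(e.getKey(), e.getValue());
        }
        return merged;
    }

    /** Routes each entry back by shape, as a deserializer might: 2-d first, then 1-d. */
    static void split(Map<String, Object> merged,
                      Map<String, Float[]> vectors,
                      Map<String, Float[][]> multiVectors) {
        for (Map.Entry<String, Object> e : merged.entrySet()) {
            Object value = e.getValue();
            if (value instanceof Float[][]) {
                multiVectors.put(e.getKey(), (Float[][]) value);
            } else if (value instanceof Float[]) {
                vectors.put(e.getKey(), (Float[]) value);
            } else {
                throw new IllegalArgumentException("unsupported vector shape for " + e.getKey());
            }
        }
    }
}
```

In a JSON deserializer the same decision would typically be made by checking whether the first element of the array is itself an array.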

How is this tested:

  • Unit tests for float arrays (de-)serialization
  • Integration tests for:
    • collection configuration
    • creating/updating multi-vectors on a collection
    • batch insertion

This commit:
- Provides MultiVectorConfig to enable multi-vector indices in
  collections
- Registers custom deserializer for WeaviateObject with Gson
- Extends GRPC utilities to (de-)serialize 2-dimensional float arrays
  to/from ByteString
Extended WeaviateObject.Adapter to also implement JsonSerializer.
Fixed some bugs in the custom serializer introduced previously.
@orca-security-eu orca-security-eu bot left a comment

Orca Security Scan Summary

Status  Check                   Issues by priority
Passed  Infrastructure as Code  high 0, medium 0, low 0, info 0
Passed  SAST                    high 0, medium 0, low 0, info 0
Passed  Secrets                 high 0, medium 0, low 0, info 0
Passed  Vulnerabilities         high 0, medium 0, low 0, info 0

@bevzzz bevzzz requested a review from a team as a code owner March 3, 2025 11:46
@orca-security-eu orca-security-eu bot left a comment

Orca Security Scan Summary

Status  Check                   Issues by priority
Passed  Infrastructure as Code  high 0, medium 0, low 0, info 0
Passed  SAST                    high 0, medium 0, low 0, info 0
Passed  Secrets                 high 0, medium 0, low 0, info 0
Passed  Vulnerabilities         high 0, medium 1, low 0, info 0

☢️ The following Vulnerabilities (CVEs) have been detected

SEVERITY  PACKAGE                 FILE       CVE ID          INSTALLED VERSION  FIXED VERSION
high      net.minidev:json-smart  ./pom.xml  CVE-2024-57699  2.5.1              2.5.2

@bevzzz bevzzz merged commit 9de71fe into main Mar 3, 2025
5 checks passed
@bevzzz bevzzz deleted the feat/colbert branch March 3, 2025 16:28