Add similarity search backend to Retrieval tasks #3406

@Samoed

Description of the feature

Since #3192 added support for different backends in the cache wrapper, we can add similar pluggable similarity-search backends to retrieval tasks.

The backend could have an interface like:

class SearchBackend:
    def add_document(
        self,
        embeddings: Array,
        idxs: list[str],
    ) -> None:
        """Add documents to the search backend.
        
        Args:
            embeddings: Embeddings of the documents to add.
            idxs: IDs of the documents to add.
        """
        
    def search(
        self,
        embeddings: Array,
    ) -> list[tuple[str, float]]:
        """Search the backend for the given embeddings.

        Args:
            embeddings: Embeddings to search for.

        Returns:
            List of tuples containing document IDs and their relevance scores.
        """

    def save(self, **kwargs):
        ...

    def load(self, **kwargs):
        ...
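
To make the interface concrete, here is a minimal sketch of a brute-force, in-memory backend using NumPy and cosine similarity. The class name `InMemorySearchBackend` and the `top_k` parameter are hypothetical and not part of mteb. It also assumes that `search` receives a single query vector, which is one possible reading of the proposed signature, and it leaves out `save`/`load`.

```python
from __future__ import annotations

import numpy as np


def _normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (guarding against zero vectors)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


class InMemorySearchBackend:
    """Hypothetical brute-force backend; an illustration, not mteb code."""

    def __init__(self) -> None:
        self._ids: list[str] = []
        self._matrix: np.ndarray | None = None

    def add_document(self, embeddings: np.ndarray, idxs: list[str]) -> None:
        embeddings = np.atleast_2d(np.asarray(embeddings, dtype=np.float32))
        if embeddings.shape[0] != len(idxs):
            raise ValueError("embeddings and idxs must have the same length")
        # Store unit-length rows so a dot product equals cosine similarity.
        embeddings = _normalize(embeddings)
        self._ids.extend(idxs)
        if self._matrix is None:
            self._matrix = embeddings
        else:
            self._matrix = np.vstack([self._matrix, embeddings])

    def search(
        self,
        embeddings: np.ndarray,
        top_k: int = 10,  # assumption: not in the proposed interface
    ) -> list[tuple[str, float]]:
        if self._matrix is None:
            return []
        # Assumption: `embeddings` holds one query vector.
        query = _normalize(np.asarray(embeddings, dtype=np.float32).reshape(1, -1))
        scores = (self._matrix @ query.T).ravel()
        order = np.argsort(-scores)[:top_k]
        return [(self._ids[i], float(scores[i])) for i in order]
```

An approximate-nearest-neighbour library could sit behind the same two methods; the brute-force version is only meant to show the shape of the contract.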

It could then be plugged into SearchEncoderWrapper like this:

model = SearchEncoderWrapper(
    mteb.get_model(...),
    backend=SearchBackend(),
)
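
How the wrapper would delegate to the backend is not specified above, so the following is only a rough sketch under stated assumptions. `ToySearchWrapper`, `DotProductBackend`, `bag_of_letters`, and the `index`/`search` method names are all hypothetical stand-ins, not mteb's actual `SearchEncoderWrapper` API. The idea is that the wrapper keeps encoding to itself and hands storage and scoring to whatever backend it was given.

```python
from __future__ import annotations

from typing import Callable, Protocol

import numpy as np


class SearchBackend(Protocol):
    """The two methods of the proposed interface that the wrapper needs."""

    def add_document(self, embeddings: np.ndarray, idxs: list[str]) -> None: ...
    def search(self, embeddings: np.ndarray) -> list[tuple[str, float]]: ...


class DotProductBackend:
    """Tiny stand-in backend: ranks all documents by dot product."""

    def __init__(self) -> None:
        self._ids: list[str] = []
        self._rows: list[np.ndarray] = []

    def add_document(self, embeddings: np.ndarray, idxs: list[str]) -> None:
        self._ids.extend(idxs)
        self._rows.extend(np.asarray(embeddings, dtype=np.float32))

    def search(self, embeddings: np.ndarray) -> list[tuple[str, float]]:
        # Assumption: `embeddings` is a single query vector.
        query = np.asarray(embeddings, dtype=np.float32).reshape(-1)
        scores = np.stack(self._rows) @ query
        order = np.argsort(-scores)
        return [(self._ids[i], float(scores[i])) for i in order]


class ToySearchWrapper:
    """Hypothetical wrapper: encodes text, delegates retrieval to the backend."""

    def __init__(
        self,
        encode: Callable[[list[str]], np.ndarray],
        backend: SearchBackend,
    ) -> None:
        self.encode = encode
        self.backend = backend

    def index(self, corpus: dict[str, str]) -> None:
        ids = list(corpus)
        self.backend.add_document(self.encode([corpus[i] for i in ids]), ids)

    def search(
        self, queries: dict[str, str], top_k: int = 10
    ) -> dict[str, dict[str, float]]:
        results: dict[str, dict[str, float]] = {}
        for query_id, text in queries.items():
            hits = self.backend.search(self.encode([text])[0])
            results[query_id] = dict(hits[:top_k])
        return results


def bag_of_letters(texts: list[str]) -> np.ndarray:
    """Stand-in encoder for the sketch: per-text counts of the letters a-z."""
    out = np.zeros((len(texts), 26), dtype=np.float32)
    for row, text in enumerate(texts):
        for ch in text.lower():
            if "a" <= ch <= "z":
                out[row, ord(ch) - ord("a")] += 1
    return out


wrapper = ToySearchWrapper(bag_of_letters, backend=DotProductBackend())
wrapper.index({"d1": "aaa", "d2": "bbb"})
results = wrapper.search({"q1": "ab a"})
```

With this split, swapping the backend changes only the constructor argument; the encoding path stays the same.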

This approach wouldn't require changes to other search models (e.g. pylate); it only proposes a change for the encoders.

CC @orionw
