Skip to content

llama : support reranking API endpoint and models #8555

Closed
@ciekawy

Description

@ciekawy

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Support reranking API and models.

Motivation

Reranking is currently very common techniques used along with embeddings in RAG systems. Also there are models where same model instance can be used for both embeddings and reranking - that is great resource optimisation.

Possible Implementation

Reranking is relatively close to embeddings and there are models for both embed/rerank like bge-m3 - supported by llama.cpp with --embed. I'm guessing that one possible challenge/dilemma is that for inference and embed the OpenAI API schema is being used and OpenAI does not offer rerank API. I think currently there is Jina rerank API commonly used in other projects.
I think that in terms of actual reranking there should not be very complex as it is quite related to embedding calls.

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions