[MODULE] - GPT-based cross-encoder #357

@jhoetter

Description

Please describe the module you would like to add to bricks
Information retrieval is a 2-step process:

  • basic similarity search with bi-encoders retrieves the top-100 candidates out of millions of documents extremely fast, but 100 candidates are still too many
  • to narrow these down to the top-5 candidates, you can apply binary classification ("Is this fact relevant for the query?") and then rank by confidence.

I would like to enable users to do this via GPT-3.5-Turbo. We found that the following prompt works well (a minimal usage sketch follows the prompt):

Take a breath. You are assessing the relevance of question-fact pairs.
If a fact is directly related to the topic of the question (e.g. directly or even by implying consequences), it is "Relevant".
If there is no connection, it is "Irrelevant". In case of doubt, the fact is "Irrelevant".

        Fact: {fact}
        Question: {question}

Determine the relevance. Give a score from 0 to 100 for this (100 would be a straight answer to the question). 
Answer ONLY with the score itself (i.e. a number between 0 and 100).
If you answer with more than one number between 0 and 100, I will not process your output!
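
For reference, a minimal sketch of how this prompt could be wired up for re-ranking. It assumes the openai>=1.0 Python client and an OPENAI_API_KEY environment variable; the helper names score_relevance and rerank are illustrative only and not part of bricks.

```python
# Hypothetical sketch: score fact-question pairs with GPT-3.5-Turbo and
# re-rank the bi-encoder candidates. Assumes openai>=1.0 and OPENAI_API_KEY.
import re
from openai import OpenAI

client = OpenAI()

PROMPT_TEMPLATE = """Take a breath. You are assessing the relevance of question-fact pairs.
If a fact is directly related to the topic of the question (e.g. directly or even by implying consequences), it is "Relevant".
If there is no connection, it is "Irrelevant". In case of doubt, the fact is "Irrelevant".

        Fact: {fact}
        Question: {question}

Determine the relevance. Give a score from 0 to 100 for this (100 would be a straight answer to the question).
Answer ONLY with the score itself (i.e. a number between 0 and 100).
If you answer with more than one number between 0 and 100, I will not process your output!"""


def score_relevance(fact: str, question: str) -> int:
    """Return a 0-100 relevance score for a single fact-question pair."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": PROMPT_TEMPLATE.format(fact=fact, question=question)}
        ],
        temperature=0,
    )
    answer = response.choices[0].message.content
    # Be defensive: extract the first number even if the model adds extra text.
    match = re.search(r"\d{1,3}", answer or "")
    return min(int(match.group()), 100) if match else 0


def rerank(question: str, candidates: list[str], top_k: int = 5) -> list[str]:
    """Re-rank the top-100 bi-encoder candidates down to the top-k facts."""
    scored = [(score_relevance(fact, question), fact) for fact in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [fact for _, fact in scored[:top_k]]
```

The same call works against Azure OpenAI by pointing the client at an Azure endpoint instead of api.openai.com.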

Do you already have an implementation?
See the prompt and sketch above; it applies to both OpenAI and Azure OpenAI endpoints.

Additional context
This requires an API key; for cognition this is highly relevant, but it should not be part of the wizard setup if the user doesn't provide an API key.
