Use ColBERT to re-chunk content on the fly #14

Open
@bdb-dd

Description

Currently, we truncate long articles. This was a temporary measure to reduce the required context length, but it has significant drawbacks.

Now that ColBERT has proven fast enough for online tasks, we should consider chunking all provided articles and re-ranking the chunks instead of the full articles. This could further reduce our input token usage while also resolving the problem of articles that are too long.
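
A minimal sketch of the proposed flow, assuming word-window chunking and a pluggable scoring function. All names here (`chunk_words`, `rerank_chunks`, `overlap_score`) are hypothetical, and `overlap_score` is only a lexical placeholder so the example runs without a model. In practice a ColBERT-based scorer would be passed in as `score_fn`.

```python
from typing import Callable, List, Tuple


def chunk_words(text: str, max_words: int = 120, overlap: int = 20) -> List[str]:
    """Split text into overlapping windows of at most `max_words` words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def overlap_score(query: str, chunk: str) -> float:
    """Placeholder scorer (NOT ColBERT): fraction of query terms found in the chunk."""
    q_terms = set(query.lower().split())
    c_terms = set(chunk.lower().split())
    return len(q_terms & c_terms) / len(q_terms) if q_terms else 0.0


def rerank_chunks(
    query: str,
    articles: List[str],
    score_fn: Callable[[str, str], float] = overlap_score,
    top_k: int = 5,
    max_words: int = 120,
    overlap: int = 20,
) -> List[Tuple[float, int, str]]:
    """Chunk every article, score each chunk against the query, keep the best `top_k`.

    Returns (score, article_index, chunk) tuples, highest score first.
    """
    scored: List[Tuple[float, int, str]] = []
    for idx, article in enumerate(articles):
        for chunk in chunk_words(article, max_words, overlap):
            scored.append((score_fn(query, chunk), idx, chunk))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]
```

Because only the top-scoring chunks are kept, the prompt size is bounded by roughly `top_k * max_words` words regardless of article length, which is what would make truncation unnecessary.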

There are some concerns regarding the available chunking methods. In particular, chunks lose the breadcrumb-like title context of the article they came from. Will the LLM be able to compensate for that?
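
One possible mitigation, sketched under the assumption that each article carries a title path: prepend that path to every chunk before it is scored and passed to the LLM. The helper names (`with_breadcrumb`, `contextualize`) and the `" > "` separator are illustrative only, not an existing API.

```python
from typing import List, Sequence


def with_breadcrumb(title_path: Sequence[str], chunk: str, sep: str = " > ") -> str:
    """Prefix a chunk with its article's title path, e.g. 'Docs > Billing'."""
    crumb = sep.join(part.strip() for part in title_path if part.strip())
    # If no usable title parts exist, return the chunk unchanged.
    return f"{crumb}\n{chunk}" if crumb else chunk


def contextualize(title_path: Sequence[str], chunks: Sequence[str]) -> List[str]:
    """Apply the same breadcrumb prefix to every chunk of one article."""
    return [with_breadcrumb(title_path, chunk) for chunk in chunks]
```

The trade-off is that the breadcrumb is repeated in every chunk, so it spends a few tokens per chunk in exchange for restoring the context that chunking removes.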
