Description
Today, if we ingest a large piece of text into a Knowledge base entry, only the first 512 word pieces are used to create the embeddings that ELSER matches on during semantic search.
This means that if the parts relevant to the query are not at the "start" of this large text, it won't match, even though there may be critical information at the end of the text.
We should attempt to apply chunking to all documents ingested into the Knowledge base so that the recall search has a better chance of finding relevant hits, regardless of their size.
As a stretch, it would also be valuable if it were possible to extract only the relevant chunk (512 word pieces?) from the matched document, in order to send less (and only relevant) text to the LLM.
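As a rough illustration of the chunking idea (not the actual implementation — the real tokenization into word pieces happens inside the model), the text could be split into overlapping windows of up to 512 whitespace-delimited tokens; the window size and overlap below are assumptions:

```python
def chunk_text(text: str, max_tokens: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping chunks of at most max_tokens
    whitespace-delimited tokens (a rough stand-in for word pieces).

    Overlap between consecutive chunks reduces the chance that a
    relevant passage is cut in half at a chunk boundary.
    """
    tokens = text.split()
    if not tokens:
        return []
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break
    return chunks
```

Each chunk would then get its own embedding, so the whole document is searchable rather than just its first 512 word pieces.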
AC
- Large texts imported into the Knowledge base get embeddings that cover the full text
- The ingest pipeline used to apply the chunking is shared in the docs so users can apply it to their `search-*` indices as well
- Recall is able to search across small Knowledge base documents ("single" embedding) and large documents ("multiple" embeddings) in a seamless manner
- (Stretch) Only the relevant part of a "multiple embeddings" document is passed to the LLM
More resources on chunking: https://github.com/elastic/elasticsearch-labs/tree/main/notebooks/document-chunking
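For reference, a sketch of what the shared ingest pipeline could look like, loosely following the pattern in the elasticsearch-labs notebooks above (a script processor to split the text, then `foreach` + `inference` to embed each passage). The field names (`content`, `chunks`), the 400-word window, and the processor parameters are assumptions and would need to be validated against the target Elasticsearch version:

```json
{
  "processors": [
    {
      "script": {
        "description": "Split content into ~400-word passages (sketch; real word-piece counting happens in the model)",
        "lang": "painless",
        "source": "String[] words = ctx.content.splitOnToken(' '); List chunks = new ArrayList(); int size = 400; for (int i = 0; i < words.length; i += size) { int end = Math.min(i + size, words.length); StringBuilder sb = new StringBuilder(); for (int j = i; j < end; j++) { if (j > i) sb.append(' '); sb.append(words[j]); } chunks.add(['text': sb.toString()]); } ctx.chunks = chunks;"
      }
    },
    {
      "foreach": {
        "field": "chunks",
        "processor": {
          "inference": {
            "model_id": ".elser_model_2",
            "input_output": [
              {
                "input_field": "_ingest._value.text",
                "output_field": "_ingest._value.embedding"
              }
            ]
          }
        }
      }
    }
  ]
}
```

Documenting a pipeline in this shape would let users attach it to their own `search-*` indices, per the AC above.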