Closed
Description
Opened on Jul 10, 2024
Currently, our chunking is a very simple character/word-based split. We should extend our chunking strategies, possibly to include:
- Recursive Character Chunking
- Token-Based Chunking
- Document-Specific Chunking (HTML, Markdown, Python, C++, etc.)
- Semantic Chunking
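To make the first option concrete, here is a minimal sketch of recursive character chunking. It is not this project's implementation: the function name `recursive_chunk`, the `max_chars` parameter, and the default separator order are all assumptions chosen for illustration. The idea is to split on the coarsest separator first (paragraphs), pack pieces greedily up to the size limit, and recurse with finer separators only for pieces that are still too long.

```python
from typing import List, Sequence


def recursive_chunk(
    text: str,
    max_chars: int,
    separators: Sequence[str] = ("\n\n", "\n", " "),  # hypothetical default order
) -> List[str]:
    """Split text into chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut at fixed width.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        # Greedily pack pieces into the current chunk while it still fits.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= max_chars:
            current = piece
        else:
            # The piece alone is too long: retry it with finer separators.
            chunks.extend(recursive_chunk(piece, max_chars, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "alpha beta gamma\n\ndelta epsilon"
    for chunk in recursive_chunk(sample, max_chars=12):
        print(repr(chunk))
```

Token-based chunking could follow the same shape by measuring length with a tokenizer instead of `len`, and document-specific chunking by swapping in format-aware separators (for example Markdown headings), though both would need their own handling of edge cases.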
Here is some possible literature: