Ever wished you could grill your documents like a seasoned detective? Well, grab your magnifying glass and your favorite beverage, because Insight Engine is here to turn you into the Sherlock Holmes of your digital library!
Insight Engine is your personal document confidant, powered by the mystical LLAMA3 (running on Ollama, because we like our LLAMAs local and caffeinated). It's here to help you uncover the secrets hidden in your PDFs, DOCXs, and TXTs faster than you can say "elementary, my dear Watson!"
- 📁 Document Whispering: Upload PDFs, DOCXs, and TXTs. It doesn't discriminate against file types (yet).
- 💬 Question Time: Ask your documents anything. They've been dying to spill their secrets!
- 📚 Instant Citations: It doesn't just give answers, it shows its work. Your high school teachers would be proud.
- 🔍 Semantic Sorcery: The search is so smart, it probably aced its SATs.
- 🦙 LLAMA3 on Ollama: All the power of AI, none of the cloud-based trust issues. Your data stays at home, like a good little secret.
- Chunk-a-licious: It slices and dices your docs into bite-sized pieces. Yum!
- Vector Voodoo: These chunks get turned into magic numbers (vectors) and stored in a super-secret clubhouse (index).
- Question Quests: Your burning questions get the same vector treatment.
- Matchmaking: It plays matchmaker between your question and the most relevant document chunks.
- LLAMA Time: Your local LLAMA3 (on Ollama) works its magic, turning those chunks into coherent answers.
- Show and Tell: It doesn't just give you answers, it shows you where it found them. Trust issues? It's got you covered!
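
For the curious, the steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not Insight Engine's actual code: the function names (`chunk_text`, `embed`, `retrieve`, `build_prompt`) and the chunk sizes are made up for the example, and the word-count "embedding" is a stand-in for a real embedding model.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative).

    Assumes overlap < chunk_size so the window always moves forward.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy embedding: word counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Store each chunk next to its vector, keyed by position for citations."""
    return [(i, chunk, embed(chunk)) for i, chunk in enumerate(chunks)]


def retrieve(index, question, k=2):
    """Give the question the same vector treatment and return the k best chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return [(i, chunk) for i, chunk, _ in ranked[:k]]


def build_prompt(question, hits):
    """Assemble a prompt that asks the model to cite the numbered sources."""
    sources = "\n".join(f"[{i}] {chunk}" for i, chunk in hits)
    return (
        "Answer using only these sources and cite them by number.\n"
        f"{sources}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    index = build_index([
        "llamas live in the andes",
        "poetry manages python dependencies",
        "streamlit builds data apps",
    ])
    hits = retrieve(index, "What manages dependencies?", k=1)
    print(build_prompt("What manages dependencies?", hits))
```

In a real run, the prompt would then be handed to your local LLAMA3 on Ollama, and the numbered sources are what make the "Show and Tell" citations possible.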
- Python 3.8+ (because we're not savages)
- Poetry (for corralling our dependencies like a boss)
- Ollama, with a llama(x) model pulled (because we like our LLAMAs local and free-range)
- Clone the repository 📂
git clone https://github.com/goldenglorys/insight_engine
cd insight_engine
- Let Poetry work its magic 🔨
poetry install
poetry shell
- Run the Streamlit server 🚀
cd engine
streamlit run main.py
- Point your favorite browser to http://localhost:8501 and watch the fireworks!
- Upload a document, ask it your deepest, darkest questions, and prepare to be amazed!
- 🕸️ Web page support (because PDFs shouldn't have all the fun)
- 📊 PPTX parsing (death by PowerPoint, no more!)
- 🔦 Citation highlighting (like a textual disco ball)
- 📝 OCR powers (because sometimes PDFs are just glorified images)
- 🔗 Mix and match your own AI cocktail (chain types for the adventurous)
- 📏 Adjustable chunk sizes (for those who like their information bite-sized or super-sized)
- 🦙 Model Menagerie: Choose your AI companion! Whether you prefer your Llama extra fluffy (70B), bite-sized (7B), or with a side of Alpaca, we've got you covered. Soon you'll be able to swap models faster than a chameleon changes colors. Fancy a chat with Mistral? Or perhaps you're in the mood for some Orca-stration? Your wish is our command! Remember, in the world of Ollama, variety is the spice of AI life. Just don't expect them to do your laundry... yet.
This project is licensed under the MIT License - see the LICENSE file for the legal mumbo-jumbo.
- Built on the shoulders of Streamlit giants
- Powered by LLAMA3 (the AI, not the animal)
- Inspired by every student who's ever said, "I don't want to read the whole thing!"
- A nod to our knowledge-hungry cousin, KnowledgeGPT