A Streamlit app for extracting, analyzing, and interacting with data from forms and documents. Upload images of forms, extract text with OCR, analyze it with an LLM, chat with your data, and view analytics, all in one interface.
- Document Upload: Upload images (JPG, PNG) of forms or documents.
- OCR Extraction: Extracts text from images using the Mistral OCR API.
- AI Analysis: Extracts key-value pairs from the text, checks completeness, and generates summaries using Llama 3 via the Groq API.
- Chatbot: Chat with your processed data for instant Q&A and insights.
- Analytics Dashboard: Visualize document stats, type distribution, and processing reports.
- Export: Download analytics and summaries as JSON or TXT.
- Session Management: All data is managed in the current session for privacy and easy clearing.
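As an illustration of the completeness check mentioned under AI Analysis, here is a minimal sketch. It assumes the extracted key-value pairs arrive as a plain dictionary and treats a field as filled when its value is non-empty after stripping whitespace. The function name `completeness` and the sample fields are hypothetical and not taken from the app's code.

```python
from typing import Mapping, Optional


def completeness(fields: Mapping[str, Optional[str]]) -> float:
    """Return the fraction of fields with a non-empty value (0.0 if there are no fields)."""
    if not fields:
        return 0.0
    filled = sum(1 for value in fields.values() if value and value.strip())
    return filled / len(fields)


# Hypothetical key-value pairs, shaped the way an LLM might return them for a form.
sample = {"name": "Jane Doe", "date": "2024-01-15", "signature": "", "phone": None}
score = completeness(sample)  # two of the four fields are filled
```

A score like this could be shown next to the extracted fields so that incomplete forms stand out.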
- Frontend: Streamlit
- Backend: Python
- OCR: Mistral API
- LLM: Llama 3 via Groq API
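One way these components could fit together is sketched below. The OCR and LLM steps are passed in as plain callables so the data flow is visible without network access. `process_document`, `fake_ocr`, and `fake_analyze` are hypothetical names, and the stand-ins do not call the Mistral or Groq APIs.

```python
from typing import Callable, Dict


def process_document(
    image_bytes: bytes,
    ocr: Callable[[bytes], str],
    analyze: Callable[[str], Dict[str, str]],
) -> Dict[str, object]:
    """Run OCR first, then analysis, and bundle both results."""
    text = ocr(image_bytes)
    # Skip the analysis step when OCR returned nothing useful.
    fields = analyze(text) if text.strip() else {}
    return {"text": text, "fields": fields}


# Stand-ins for the real services, used only to show the data flow.
def fake_ocr(image_bytes: bytes) -> str:
    return "Name: Jane Doe\nDate: 2024-01-15"


def fake_analyze(text: str) -> Dict[str, str]:
    pairs = (line.split(":", 1) for line in text.splitlines() if ":" in line)
    return {key.strip(): value.strip() for key, value in pairs}


result = process_document(b"raw image bytes", fake_ocr, fake_analyze)
```

Keeping the two services behind callables also makes it possible to test the pipeline without API keys.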
- Clone the repository:

      git clone https://github.com/yourusername/your-repo-name.git
      cd your-repo-name

- Install dependencies:

      pip install -r requirements.txt

- Configure API keys:
  - Add your API keys to .streamlit/secrets.toml:

        MISTRAL_API_KEY = "your-mistral-key"
        GROQ_API_KEY = "your-groq-key"

  - Or set them in the Streamlit Cloud secrets UI.
- Run the app:

      streamlit run streamlit_app.py
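A startup check for the two API keys configured above might look like the following sketch. It takes any mapping as the secrets source, so in the app `st.secrets` (which behaves like a read-only mapping) could be passed in. The fallback to environment variables and the name `load_api_keys` are assumptions for illustration, not something the app is known to do.

```python
import os
from typing import Dict, Mapping

REQUIRED_KEYS = ("MISTRAL_API_KEY", "GROQ_API_KEY")


def load_api_keys(
    secrets: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Look up each required key in secrets first, then in env; raise if any is missing."""
    found: Dict[str, str] = {}
    missing = []
    for name in REQUIRED_KEYS:
        value = secrets.get(name) or env.get(name)  # env fallback is an assumption
        if value:
            found[name] = value
        else:
            missing.append(name)
    if missing:
        raise KeyError("Missing API keys: " + ", ".join(missing))
    return found


# Example with a plain dict standing in for the secrets store.
keys = load_api_keys(
    {"MISTRAL_API_KEY": "your-mistral-key", "GROQ_API_KEY": "your-groq-key"},
    env={},
)
```

Failing early with the names of the missing keys is usually easier to diagnose than an authentication error from the first API call.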
- Upload a document image (JPG, PNG) in the "Document Processing" tab.
- Process the document to extract its text and analyze it with AI.
- View results: Extracted text, key-value pairs, and AI-generated summary.
- Chat with your data in the "Data Chatbot" tab for instant Q&A.
- Explore analytics in the "Analytics Dashboard" tab.
- Export analytics or summaries as JSON/TXT for further use.
- Clear all data at any time from the sidebar.
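The export step above can be sketched with the standard library alone. This example assumes each processed document is a dictionary with a `type` entry. The record layout and the helper names `build_analytics`, `to_json`, and `to_txt` are hypothetical.

```python
import json
from collections import Counter
from typing import Dict, List


def build_analytics(documents: List[Dict[str, str]]) -> Dict[str, object]:
    """Summarize processed documents: total count and count per document type."""
    types = Counter(doc.get("type", "unknown") for doc in documents)
    return {"total_documents": len(documents), "type_distribution": dict(types)}


def to_json(analytics: Dict[str, object]) -> str:
    """Serialize the analytics as indented JSON with stable key order."""
    return json.dumps(analytics, indent=2, sort_keys=True)


def to_txt(analytics: Dict[str, object]) -> str:
    """Render the analytics as a short plain-text report."""
    lines = [f"Total documents: {analytics['total_documents']}"]
    for doc_type, count in sorted(analytics["type_distribution"].items()):
        lines.append(f"{doc_type}: {count}")
    return "\n".join(lines)


docs = [{"type": "invoice"}, {"type": "invoice"}, {"type": "application"}]
analytics = build_analytics(docs)
json_report = to_json(analytics)
txt_report = to_txt(analytics)
```

In a Streamlit app, either string could be passed as the `data` argument of `st.download_button` to offer it as a file.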
- API Keys Required: The app will not run without valid Mistral and Groq API keys.
- Session-based: All data is stored in the current Streamlit session and is cleared when you use the "Clear All Data" button.
- Image Only: This version supports image uploads (JPG, PNG). For PDF/DOCX/text support, see future plans.
- No Database: This version does not use a persistent database; all processing is in-memory for privacy and simplicity.
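The session-based, image-only behavior described in these notes could be approximated as follows. A plain dictionary stands in for Streamlit's dictionary-like `st.session_state`. The helper names, the `documents` key, and the acceptance of `.jpeg` alongside `.jpg` are assumptions made for this sketch.

```python
import os
from typing import Any, Dict, MutableMapping

# .jpeg is included as an assumption; the notes above only mention JPG and PNG.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def is_supported_image(filename: str) -> bool:
    """Check the file extension, ignoring case."""
    return os.path.splitext(filename)[1].lower() in ALLOWED_EXTENSIONS


def add_document(state: MutableMapping[str, Any], name: str, result: Dict[str, Any]) -> None:
    """Store one processed document in the in-memory session state."""
    state.setdefault("documents", {})[name] = result


def clear_all_data(state: MutableMapping[str, Any]) -> None:
    """Drop everything stored for this session."""
    state.pop("documents", None)


session: Dict[str, Any] = {}  # stand-in for st.session_state
if is_supported_image("form.PNG"):
    add_document(session, "form.PNG", {"text": "Name: Jane Doe"})
stored_count = len(session["documents"])
clear_all_data(session)
```

Because nothing is written to disk or a database, clearing the state removes all processed data for the session.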
- Support for PDF, DOCX, and text file uploads.
- Persistent user/session storage and authentication.
- More robust error handling and logging.
- Multi-user support and production-ready data management.
MIT License
Built with ❤️ using Streamlit, Mistral OCR, and Groq Llama 3.