Commit e597969

Merge pull request #182 from namanvirk18/groundx-doc-pipeline
Add GroundX doc processing pipeline files
2 parents 1db6bc0 + 421f797 commit e597969

12 files changed

+7296
-0
lines changed

groundX-doc-pipeline/.env.example

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
GROUNDX_API_KEY=your-groundx-api-key-here
2+
OPENAI_API_KEY=your-openai-api-key-here
3+
COMET_API_KEY=your-comet-api-key-here

groundX-doc-pipeline/README.md

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
1+
# World-class Document Processing Pipeline with Ground X
2+
3+
This application demonstrates how to build a document processing pipeline that handles complex documents with tables, figures, and dense text using Ground X's state-of-the-art parsing technology. Users upload a document and receive the extracted text, summaries, keywords, and a chat interface for asking an LLM questions about the document.
4+
5+
We use:
6+
7+
- Ground X for SOTA document processing and X-Ray analysis
8+
- Streamlit for the UI
9+
- Ollama for serving the LLM locally
10+
11+
---
12+
13+
## Setup and Installation
14+
15+
Ensure you have Python 3.8.1 or later installed on your system.
16+
17+
Install dependencies:
18+
19+
```bash
20+
uv sync
21+
```
22+
23+
Copy `.env.example` to `.env` and set your Ground X API key (`.env.example` also lists `OPENAI_API_KEY` and `COMET_API_KEY`):
24+
25+
```
26+
GROUNDX_API_KEY=your-groundx-api-key-here
27+
```
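
Before launching the app, it can help to confirm that `.env` no longer contains the placeholder value. The following is a minimal standard-library sketch, not code from this repository: `parse_env` and `missing_keys` are hypothetical helpers, the assumption that only `GROUNDX_API_KEY` is required comes from this README, and treating any value that starts with `your` and ends with `here` as a placeholder is a heuristic based on `.env.example`.

```python
from pathlib import Path

# Assumption: the README only asks for this key; extend the tuple if needed.
REQUIRED_KEYS = ("GROUNDX_API_KEY",)


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict, required=REQUIRED_KEYS) -> list:
    """Return required keys that are absent, empty, or still a placeholder."""
    missing = []
    for key in required:
        value = values.get(key, "")
        # Heuristic: placeholders in .env.example look like "your-...-here".
        if not value or (value.startswith("your") and value.endswith("here")):
            missing.append(key)
    return missing


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    problems = missing_keys(parse_env(text))
    print("OK" if not problems else f"Missing or placeholder: {problems}")
```

Running the script from the project directory prints `OK` once a real key has been set.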
28+
29+
```bash
30+
# Install Ollama from https://ollama.ai/
31+
# Pull the required model
32+
ollama pull phi3:mini
33+
# Start Ollama service
34+
ollama serve
35+
```
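
To verify that the Ollama service is reachable and that `phi3:mini` has been pulled, one option is to query Ollama's `/api/tags` endpoint, which lists locally available models. This is a sketch under assumptions: it presumes Ollama is listening on its default address `http://localhost:11434`, and `fetch_tags`, `model_names`, and `has_model` are hypothetical helper names that do not come from `groundx_utils.py`.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default address
MODEL = "phi3:mini"


def model_names(tags_payload: dict) -> list:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def has_model(tags_payload: dict, model: str = MODEL) -> bool:
    """Return True if the given model appears in the /api/tags payload."""
    return model in model_names(tags_payload)


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 3.0):
    """Return the parsed /api/tags payload, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None


if __name__ == "__main__":
    payload = fetch_tags()
    if payload is None:
        print("Ollama is not reachable; is `ollama serve` running?")
    elif not has_model(payload):
        print(f"Model {MODEL} not found; run `ollama pull {MODEL}`.")
    else:
        print(f"Ollama is running and {MODEL} is available.")
```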
36+
37+
Run the Streamlit app:
38+
39+
```bash
40+
streamlit run app.py
41+
```
42+
43+
## Project Structure
44+
45+
```
46+
groundX-doc-pipeline/
47+
├── app.py # Main Streamlit application (uses groundx_utils.py)
48+
├── groundx_utils.py # Utility functions for Ground X operations
49+
├── .env # Environment variables (create from .env.example)
50+
├── file/ # Folder containing files for running evaluation
51+
└── README.md # This file
52+
53+
📁 Evaluation Tools:
54+
├── evaluation_geval.py # GEval framework evaluation
55+
└── run_evaluation_cli.py # CLI evaluation runner
56+
```
57+
58+
## Usage
59+
60+
1. Upload a document using the sidebar (supports PDF, PNG, JPG, JPEG, DOCX)
61+
2. Wait for the document to be processed by Ground X
62+
3. Explore the X-Ray analysis results in different tabs:
63+
   - JSON Output: Raw analysis data
64+
   - Narrative Summary: Extracted narratives
65+
   - File Summary: Document overview
66+
   - Suggested Text: AI-suggested content
67+
   - Extracted Text: Raw text extraction
68+
   - Keywords: Document keywords
69+
4. Use the chat interface to ask questions about your document
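
The supported upload types listed in step 1 can be checked with a small extension filter. The sketch below is illustrative only: `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical names, and the set of extensions is taken from the list above, not from `app.py`.

```python
from pathlib import Path

# Extensions taken from step 1 of the usage list above.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".docx"}


def is_supported(filename: str) -> bool:
    """Return True if the file's final extension is supported (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

In a Streamlit UI the same restriction can be applied at upload time through the `type` argument of `st.file_uploader`.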
70+
71+
## Features
72+
73+
The app implements the following document processing workflow:
74+
75+
- **Ground X Bucket Management**: Automatic bucket creation and document organization
76+
- **Document Ingestion**: Support for PDF, Word docs, images, and more
77+
- **X-Ray Analysis**: Rich structured data with summaries, page chunks, keywords, and metadata
78+
- **Context Engineering**: Intelligent context preparation for LLM queries
79+
- **AI Chat Interface**: Interactive Q&A powered by a local LLM
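
As an illustration of the context engineering step, the sketch below assembles a bounded context string from an X-Ray result and wraps it in a prompt for the local LLM. It is a minimal example under stated assumptions, not the logic in `groundx_utils.py`: `build_context` and `build_prompt` are hypothetical helpers, and the field names `fileSummary`, `fileKeywords`, `documentPages`, `chunks`, `suggestedText`, and `text` are assumed from the X-Ray tabs listed under Usage, so they should be checked against the actual JSON output.

```python
def build_context(xray: dict, max_chars: int = 4000) -> str:
    """Concatenate summary, keywords, and chunk text, truncated to max_chars."""
    parts = []
    summary = xray.get("fileSummary")  # assumed field name
    if summary:
        parts.append(f"Summary: {summary}")
    keywords = xray.get("fileKeywords")  # assumed field name
    if keywords:
        parts.append(f"Keywords: {keywords}")
    for page in xray.get("documentPages", []):  # assumed structure
        for chunk in page.get("chunks", []):
            # Prefer the AI-suggested text and fall back to the raw extraction.
            text = chunk.get("suggestedText") or chunk.get("text")
            if text:
                parts.append(text.strip())
    # Simple character budget so the prompt stays small for a local model.
    return "\n\n".join(parts)[:max_chars]


def build_prompt(question: str, context: str) -> str:
    """Wrap the context and the user's question in a grounded-answer prompt."""
    return (
        "Answer the question using only the document context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Truncating by character count is the simplest budget; a token-based limit or relevance ranking of chunks would be a natural refinement.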
80+
81+
---
82+
83+
## 📬 Stay Updated with Our Newsletter!
84+
85+
**Get a FREE Data Science eBook** 📖 with 150+ essential lessons in Data Science when you subscribe to our newsletter! Stay in the loop with the latest tutorials, insights, and exclusive resources. [Subscribe now!](https://join.dailydoseofds.com)
86+
87+
[![Daily Dose of Data Science Newsletter](https://github.com/patchy631/ai-engineering/blob/main/resources/join_ddods.png)](https://join.dailydoseofds.com)
88+
89+
---
90+
91+
## Contribution
92+
93+
Contributions are welcome! Please fork the repository and submit a pull request with your improvements.
