Add Local RAG Document Q&A Agent Notebook #749
Conversation
Walkthrough

Three new Jupyter notebooks are introduced, each demonstrating an advanced AI agent for a specific task: AI Enrollment Counselor for university admissions support, AI Data Analysis Agent for interactive data exploration and visualization, and Local RAG Document QA Agent for local document-based question answering using vector databases and local LLMs. Each notebook defines custom tools, helper functions, and integrates with relevant libraries for its respective domain.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Notebook
    participant Agent/Tools
    User->>Notebook: Uploads data / document or asks a question
    Notebook->>Agent/Tools: Preprocesses input (data/documents/questions)
    Agent/Tools->>Notebook: Returns processed data, analysis, or answer
    Notebook->>User: Displays results, insights, or answers
```
Summary of Changes
Hello @Dhivya-Bharathy, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly expands the examples available for the PraisonAI framework by adding three new, distinct AI agent notebooks. These additions demonstrate the versatility of the framework in automating tasks ranging from university admissions support and comprehensive data analysis to local, privacy-focused document question-answering using Retrieval-Augmented Generation.
Highlights
- New AI Agent Notebooks: This pull request introduces three new example AI agent notebooks: an AI Enrollment Counselor, an AI Data Analysis Agent, and a Local RAG Document Q&A Agent, showcasing diverse applications of the PraisonAI framework.
- AI Enrollment Counselor: A new notebook (AI_Enrollment_Counselor.ipynb) demonstrates an AI agent designed to automate university admissions, capable of answering applicant questions and validating application completeness using PraisonAI agents.
- AI Data Analysis Agent: A new notebook (ai_data_analysis_agent.ipynb) adds an intelligent agent for data analysis, supporting CSV/Excel file processing, statistical analysis, and automated visualization generation based on natural language queries. It includes custom tools for data visualization, preprocessing, and statistical analysis.
- Local RAG Document Q&A Agent: A significant addition is the local_rag_document_qa_agent.ipynb notebook, which implements a Retrieval-Augmented Generation (RAG) agent. This agent processes multi-format documents (PDF, TXT, MD, CSV), performs intelligent text chunking, stores data in a local ChromaDB vector database, and answers questions using local Ollama LLM models, eliminating the need for external API calls for core RAG operations.
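The "intelligent text chunking" step mentioned above can be illustrated with a minimal sketch. This is a generic illustration of sentence-aware chunking with overlap, not the notebook's actual implementation; the chunk size and overlap values are assumptions.

```python
import re

def chunk_text(text, max_chars=200, overlap_sentences=1):
    """Split text into overlapping chunks along sentence boundaries.

    The lookbehind regex keeps sentence punctuation attached; a real
    pipeline might use a proper sentence tokenizer instead.
    """
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    chunks, current, fresh = [], [], 0
    for sentence in sentences:
        current.append(sentence)
        fresh += 1
        if sum(len(s) for s in current) >= max_chars:
            chunks.append(' '.join(current))
            current = current[-overlap_sentences:]  # carry overlap for context
            fresh = 0
    if fresh:  # flush any sentences not yet emitted
        chunks.append(' '.join(current))
    return chunks

text = ("PraisonAI agents can read documents. Each document is split into chunks. "
        "Chunks are embedded and stored in a vector database. "
        "At query time the most similar chunks are retrieved. "
        "The local LLM answers using the retrieved context.")
chunks = chunk_text(text, max_chars=80)
```

Each chunk repeats the last sentence of the previous one, so retrieved context is less likely to cut off mid-thought.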
Actionable comments posted: 3
🧹 Nitpick comments (2)
examples/cookbooks/local_rag_document_qa_agent.ipynb (1)

49-49: Remove redundant PDF library dependency

Both pypdf and PyPDF2 are installed, but they are essentially the same library: PyPDF2 is the older name, and pypdf is the newer version. Since the code imports PyPDF2 (line 114), remove pypdf from the installation list to avoid confusion.

```diff
-!pip install praisonai streamlit qdrant-client ollama pypdf PyPDF2 chromadb sentence-transformers
+!pip install praisonai streamlit qdrant-client ollama PyPDF2 chromadb sentence-transformers
```

examples/cookbooks/ai_data_analysis_agent.ipynb (1)
177-178: Improve date column detection logic

The current date detection only checks whether 'date' appears in the lowercase column name, which misses columns like "Created_At", "Timestamp", or "DOB".

```diff
-if 'date' in col.lower():
+if any(keyword in col.lower() for keyword in ['date', 'time', 'created', 'updated', 'dob', 'timestamp']):
     df[col] = pd.to_datetime(df[col], errors='coerce')
```
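The suggested keyword-based detection can be exercised in isolation. The column names below are made up for illustration:

```python
DATE_KEYWORDS = ('date', 'time', 'created', 'updated', 'dob', 'timestamp')

def looks_like_date_column(name: str) -> bool:
    """Heuristic: does the column name suggest it holds dates or times?"""
    lowered = name.lower()
    return any(keyword in lowered for keyword in DATE_KEYWORDS)

columns = ['Created_At', 'order_date', 'price', 'DOB', 'customer_name']
date_columns = [c for c in columns if looks_like_date_column(c)]
```

Since this is only a name heuristic, pairing it with `pd.to_datetime(..., errors='coerce')` (as the notebook already does) keeps false positives harmless.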
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (3)

- examples/cookbooks/AI_Enrollment_Counselor.ipynb (1 hunks)
- examples/cookbooks/ai_data_analysis_agent.ipynb (1 hunks)
- examples/cookbooks/local_rag_document_qa_agent.ipynb (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
examples/cookbooks/AI_Enrollment_Counselor.ipynb (1)
Learnt from: CR
PR: MervinPraison/PraisonAI#0
File: src/praisonai-ts/.cursorrules:0-0
Timestamp: 2025-06-30T10:05:51.843Z
Learning: Applies to src/praisonai-ts/src/agents/autoagents.ts : The 'AutoAgents' class in 'src/agents/autoagents.ts' should provide high-level convenience for automatically generating agent/task configuration from user instructions, using 'aisdk' to parse config.
🔇 Additional comments (4)
examples/cookbooks/local_rag_document_qa_agent.ipynb (2)
82-93: Clarify API key configuration for local model usage

The notebook states it uses local Ollama models (line 93) but configures an OpenAI API key (lines 84-86). This is confusing since local models shouldn't require an OpenAI API key. Either:
- Remove the OpenAI API key setup if only using local models
- Clarify why the OpenAI key is needed despite using local models
- Make the key optional with proper documentation
120-202: LGTM! Well-structured document processing tool

The DocumentProcessingTool class is well-implemented, with:
- Proper error handling for each file type
- Support for multiple formats
- Clean separation of processing logic per format
- Appropriate use of context managers for file handling
examples/cookbooks/ai_data_analysis_agent.ipynb (1)

180-183: Avoid bare except clause

Using a bare except: can hide unexpected errors and make debugging difficult.

```diff
 try:
     df[col] = pd.to_numeric(df[col])
-except:
+except (ValueError, TypeError):
     pass
```

Likely an incorrect or invalid review comment.
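A pure-Python analogue shows why catching only the expected exception types matters. This is a sketch, not the notebook's code; `float()` stands in for `pd.to_numeric`:

```python
def coerce_numeric(values):
    """Convert entries to floats, skipping those that aren't numeric.

    Catching only ValueError/TypeError (rather than a bare `except:`)
    lets genuinely unexpected errors, such as KeyboardInterrupt or a
    typo'd NameError, surface instead of being silently swallowed.
    """
    result = []
    for v in values:
        try:
            result.append(float(v))
        except (ValueError, TypeError):
            result.append(None)  # non-numeric entry: keep a placeholder
    return result

cleaned = coerce_numeric(["3.14", "n/a", None, "42"])
```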
examples/cookbooks/AI_Enrollment_Counselor.ipynb (1)
407-417: LGTM! Clean helper function implementation

The ask_enrollment_agent function is well-structured, with:
- Clear parameter handling
- Appropriate prompt construction for document checking
- Clean fallback for general queries
| " sentences = re.split(r'[.!?]+', text)\n", | ||
| " sentences = [s.strip() for s in sentences if s.strip()]\n", |
🛠️ Refactor suggestion
Improve sentence splitting to handle edge cases
The current regex pattern r'[.!?]+' for sentence splitting is too simplistic and will incorrectly split on:
- Abbreviations (Dr., Mr., Ms., etc.)
- Decimal numbers (3.14, 99.9)
- Ellipses (...)
Consider using a more robust sentence tokenizer like NLTK's sent_tokenize or spaCy.
```diff
-# Split into sentences first
-sentences = re.split(r'[.!?]+', text)
+# Use a more robust sentence splitter
+import nltk
+nltk.download('punkt', quiet=True)
+sentences = nltk.sent_tokenize(text)
```

Or add the NLTK dependency to the installation:

```diff
-!pip install praisonai streamlit qdrant-client ollama PyPDF2 chromadb sentence-transformers
+!pip install praisonai streamlit qdrant-client ollama PyPDF2 chromadb sentence-transformers nltk
```

Committable suggestion skipped: line range outside the PR's diff.
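The failure modes listed in this suggestion are easy to reproduce with the naive pattern:

```python
import re

text = "Dr. Smith measured 3.14 cm. The result was fine."

# The naive split breaks on the abbreviation "Dr." and the decimal "3.14",
# producing four fragments instead of two sentences.
naive = [s.strip() for s in re.split(r'[.!?]+', text) if s.strip()]

# A lookbehind that requires whitespace after the punctuation avoids the
# decimal case (abbreviations still need a real tokenizer like NLTK's).
better = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
```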
🤖 Prompt for AI Agents
In examples/cookbooks/local_rag_document_qa_agent.ipynb around lines 282 to 283,
the current sentence splitting using re.split(r'[.!?]+', text) is too simplistic
and fails on abbreviations, decimals, and ellipses. Replace this with a more
robust sentence tokenizer such as NLTK's sent_tokenize by importing it and using
sent_tokenize(text) instead. Also, update the installation instructions to
include NLTK as a dependency.
| " elif chart_type == 'heatmap':\n", | ||
| " corr_matrix = df.corr()\n", | ||
| " fig = px.imshow(corr_matrix, title=title, color_continuous_scale='RdBu')\n", |
Add validation for heatmap to prevent errors on non-numeric data
The heatmap visualization calls df.corr() without checking if the dataframe contains numeric columns. This will raise an error if all columns are non-numeric.
```diff
 elif chart_type == 'heatmap':
+    numeric_cols = df.select_dtypes(include=[np.number]).columns
+    if len(numeric_cols) < 2:
+        return "Heatmap requires at least 2 numeric columns"
     corr_matrix = df.corr()
     fig = px.imshow(corr_matrix, title=title, color_continuous_scale='RdBu')
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| " elif chart_type == 'heatmap':\n", | |
| " corr_matrix = df.corr()\n", | |
| " fig = px.imshow(corr_matrix, title=title, color_continuous_scale='RdBu')\n", | |
| elif chart_type == 'heatmap': | |
| numeric_cols = df.select_dtypes(include=[np.number]).columns | |
| if len(numeric_cols) < 2: | |
| return "Heatmap requires at least 2 numeric columns" | |
| corr_matrix = df.corr() | |
| fig = px.imshow(corr_matrix, title=title, color_continuous_scale='RdBu') |
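The `select_dtypes` guard behaves as follows on a toy frame. This is a standalone sketch (the real notebook operates on user-uploaded data), with the plotting call omitted:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'region': ['north', 'south', 'east'],  # non-numeric column
    'sales': [100.0, 250.0, 170.0],        # numeric
    'visits': [10, 25, 17],                # numeric (exactly sales / 10)
})

numeric_cols = df.select_dtypes(include=[np.number]).columns
if len(numeric_cols) < 2:
    result = "Heatmap requires at least 2 numeric columns"
else:
    # Correlate only the numeric columns; calling df.corr() on a mixed
    # frame raises in recent pandas versions unless numeric_only is set.
    result = df[numeric_cols].corr()
```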
🤖 Prompt for AI Agents
In examples/cookbooks/ai_data_analysis_agent.ipynb around lines 136 to 138, the
code calls df.corr() for the heatmap without verifying if the dataframe has
numeric columns, which can cause errors if none exist. Add a validation step
before computing the correlation matrix to check if the dataframe contains any
numeric columns, and only proceed with the heatmap generation if numeric data is
present; otherwise, handle the case gracefully, such as by skipping the heatmap
or showing a warning.
| "import os\n", | ||
| "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\" # <-- Replace with your actual OpenAI API key" | ||
| ] |
🛠️ Refactor suggestion
Add API key validation
The placeholder API key should be validated to ensure users replace it with their actual key.
```diff
 import os
-os.environ["OPENAI_API_KEY"] = "sk-..."  # <-- Replace with your actual OpenAI API key
+
+api_key = "sk-..."  # <-- Replace with your actual OpenAI API key
+if api_key == "sk-..." or not api_key.startswith("sk-"):
+    raise ValueError("Please replace the placeholder with your actual OpenAI API key")
+os.environ["OPENAI_API_KEY"] = api_key
```

📝 Committable suggestion
| "import os\n", | |
| "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\" # <-- Replace with your actual OpenAI API key" | |
| ] | |
| import os | |
| api_key = "sk-..." # <-- Replace with your actual OpenAI API key | |
| if api_key == "sk-..." or not api_key.startswith("sk-"): | |
| raise ValueError("Please replace the placeholder with your actual OpenAI API key") | |
| os.environ["OPENAI_API_KEY"] = api_key |
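Factored into a helper, the same check can be exercised without touching the environment. The key strings below are obviously fake, and the length threshold is an illustrative assumption:

```python
def is_valid_openai_key(key: str) -> bool:
    """Reject the untouched placeholder and anything not shaped like a key."""
    return key != "sk-..." and key.startswith("sk-") and len(key) > 10

checks = [
    is_valid_openai_key("sk-..."),                    # untouched placeholder
    is_valid_openai_key("not-a-key"),                 # wrong prefix
    is_valid_openai_key("sk-fake1234567890example"),  # plausible shape
]
```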
🤖 Prompt for AI Agents
In examples/cookbooks/AI_Enrollment_Counselor.ipynb around lines 67 to 69, the
code sets a placeholder OpenAI API key without validation. Add a check after
setting the environment variable to verify that the API key is not the
placeholder value. If it is, raise an error or print a clear message instructing
the user to replace the placeholder with their actual OpenAI API key before
proceeding.
Code Review
The pull request adds three new Jupyter notebooks implementing AI agents for enrollment counseling, data analysis, and local RAG document Q&A. The code includes custom tools for data processing, statistical analysis, vector database operations, and text chunking. The notebooks provide complete workflows from data ingestion to interactive Q&A with source attribution. I have provided feedback to improve error handling and input validation.
| "openai_key = \"sk-..\"\n", | ||
| "\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| " else:\n", | ||
| " return \"Unsupported chart type\"\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| " else:\n", | ||
| " return None, None, None, \"Unsupported file format\"\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
```python
results['high_correlations'] = self._find_high_correlations(results['correlation_matrix'])
```
The results dictionary is returned even if analysis_type does not match any of the handled types. Consider returning an error message or raising an exception to indicate that the analysis type is invalid:

```python
    return results
except Exception as e:
    return {'error': f"Analysis error: {str(e)}"}
```

with an explicit fallback on the `analysis_type` dispatch:

```python
else:
    return {'error': f"Invalid analysis type: {analysis_type}"}
```
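A dictionary dispatch makes the unknown-type case explicit by construction. This is a standalone sketch; the analysis functions here are trivial stand-ins for the notebook's real ones:

```python
def summary_stats(data):
    return {'mean': sum(data) / len(data), 'n': len(data)}

def value_range(data):
    return {'min': min(data), 'max': max(data)}

# Map analysis names to handlers; unknown names fail fast and loudly.
ANALYSES = {'summary': summary_stats, 'range': value_range}

def run_analysis(analysis_type, data):
    handler = ANALYSES.get(analysis_type)
    if handler is None:
        return {'error': f"Invalid analysis type: {analysis_type}"}
    try:
        return handler(data)
    except Exception as e:
        return {'error': f"Analysis error: {e}"}

ok = run_analysis('summary', [1, 2, 3])
bad = run_analysis('histogram', [1, 2, 3])
```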
| "openai_key = \"sk-..\"\n", | ||
| "\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| " else:\n", | ||
| " return {\"error\": f\"Unsupported file format: {file_ext}\"}\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #749   +/-   ##
=======================================
  Coverage   14.23%   14.23%
=======================================
  Files          25       25
  Lines        2571     2571
  Branches      367      367
=======================================
  Hits          366      366
  Misses       2189     2189
  Partials       16       16
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
User description
A Retrieval-Augmented Generation agent that processes documents and answers questions using local LLM models without external API calls.
Features include multi-format document processing (PDF/TXT/MD/CSV), intelligent text chunking, vector database storage with ChromaDB, and context-aware Q&A with source attribution.
Built with PraisonAI, supports document upload, similarity search, and provides accurate answers based on document content with local Ollama models.
PR Type
Enhancement
Description
• Added Local RAG Document Q&A Agent with comprehensive document processing capabilities (PDF/TXT/MD/CSV)
• Implemented AI Data Analysis Agent with interactive data visualization and statistical analysis tools
• Added AI Enrollment Counselor Agent for university admissions automation and document validation
• Introduced Intelligent Travel Planning Agent notebook (additional file)
• All agents built with PraisonAI framework supporting local LLM models via Ollama
• Features include vector database storage with ChromaDB, text chunking, similarity search, and context-aware responses
• Provides complete workflows from data ingestion to interactive Q&A with source attribution
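The similarity-search step in the pipeline described above can be sketched with a toy in-memory index. Bag-of-words cosine similarity stands in for the real sentence-transformers embeddings and ChromaDB storage:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; real pipelines use learned vectors."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# In the notebook these chunks would live in ChromaDB; here, a plain list.
chunks = [
    "ChromaDB stores document chunks as vectors for similarity search.",
    "Ollama runs large language models locally without external API calls.",
    "The agent attributes every answer to its source document.",
]
query = "how are models run locally"
best = max(chunks, key=lambda c: cosine(embed(query), embed(c)))
```

The retrieved chunk (plus its source metadata) is what gets handed to the local LLM as context, which is how source attribution falls out of the design.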
Changes walkthrough 📝
ai_data_analysis_agent.ipynb

AI Data Analysis Agent Jupyter Notebook Implementation (examples/cookbooks/ai_data_analysis_agent.ipynb)

• Added a complete Jupyter notebook implementing an AI Data Analysis Agent
• Includes data preprocessing, visualization, and statistical analysis tools
• Features file upload functionality, automatic chart generation, and comprehensive data insights
• Provides interactive data analysis capabilities with support for CSV/Excel files
local_rag_document_qa_agent.ipynb

Local RAG Document Q&A Agent Implementation (examples/cookbooks/local_rag_document_qa_agent.ipynb)

• Added comprehensive Jupyter notebook implementing a Local RAG Document Q&A Agent
• Includes custom tools for document processing (PDF/TXT/MD/CSV), vector database operations with ChromaDB, and text chunking
• Features interactive Q&A session with document upload, vector search, and AI-powered responses using local Ollama models
• Provides complete workflow from document ingestion to question answering with source attribution
AI_Enrollment_Counselor.ipynb

AI Enrollment Counselor Agent for University Admissions (examples/cookbooks/AI_Enrollment_Counselor.ipynb)

• Added Jupyter notebook for AI Enrollment Counselor agent for university admissions automation
• Implements document validation, application completeness checking, and personalized guidance
• Uses PraisonAI Agents framework with role-based prompting for admissions counseling
• Includes examples for missing document detection and general admissions questions