Add Local RAG Document Q&A Agent Notebook #749
Conversation
Walkthrough

Three new Jupyter notebooks are introduced, each demonstrating an advanced AI agent for a specific task: AI Enrollment Counselor for university admissions support, AI Data Analysis Agent for interactive data exploration and visualization, and Local RAG Document QA Agent for local document-based question answering using vector databases and local LLMs. Each notebook defines custom tools, helper functions, and integrates with relevant libraries for its respective domain.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Notebook
    participant Agent/Tools
    User->>Notebook: Uploads data / document or asks a question
    Notebook->>Agent/Tools: Preprocesses input (data/documents/questions)
    Agent/Tools->>Notebook: Returns processed data, analysis, or answer
    Notebook->>User: Displays results, insights, or answers
```
Summary of Changes
Hello @Dhivya-Bharathy, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly expands the examples available for the PraisonAI framework by adding three new, distinct AI agent notebooks. These additions demonstrate the versatility of the framework in automating tasks ranging from university admissions support and comprehensive data analysis to local, privacy-focused document question-answering using Retrieval-Augmented Generation.
Highlights
- New AI Agent Notebooks: This pull request introduces three new example AI agent notebooks: an AI Enrollment Counselor, an AI Data Analysis Agent, and a Local RAG Document Q&A Agent, showcasing diverse applications of the PraisonAI framework.
- AI Enrollment Counselor: A new notebook (AI_Enrollment_Counselor.ipynb) demonstrates an AI agent designed to automate university admissions, capable of answering applicant questions and validating application completeness using PraisonAI agents.
- AI Data Analysis Agent: A new notebook (ai_data_analysis_agent.ipynb) adds an intelligent agent for data analysis, supporting CSV/Excel file processing, statistical analysis, and automated visualization generation based on natural language queries. It includes custom tools for data visualization, preprocessing, and statistical analysis.
- Local RAG Document Q&A Agent: A significant addition is the local_rag_document_qa_agent.ipynb notebook, which implements a Retrieval-Augmented Generation (RAG) agent. This agent processes multi-format documents (PDF, TXT, MD, CSV), performs intelligent text chunking, stores data in a local ChromaDB vector database, and answers questions using local Ollama LLM models, eliminating the need for external API calls for core RAG operations.
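The "intelligent text chunking" step mentioned above can be illustrated with a minimal sketch. This is a generic illustration of sentence-aware chunking with overlap, not the notebook's actual implementation; the chunk size and overlap values are assumptions.

```python
import re

def chunk_text(text, max_chars=200, overlap_sentences=1):
    """Split text into overlapping chunks along sentence boundaries.

    The lookbehind regex keeps sentence punctuation attached; a real
    pipeline might use a proper sentence tokenizer instead.
    """
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    chunks, current, fresh = [], [], 0
    for sentence in sentences:
        current.append(sentence)
        fresh += 1
        if sum(len(s) for s in current) >= max_chars:
            chunks.append(' '.join(current))
            current = current[-overlap_sentences:]  # carry overlap for context
            fresh = 0
    if fresh:  # flush any sentences not yet emitted
        chunks.append(' '.join(current))
    return chunks

text = ("PraisonAI agents can read documents. Each document is split into chunks. "
        "Chunks are embedded and stored in a vector database. "
        "At query time the most similar chunks are retrieved. "
        "The local LLM answers using the retrieved context.")
chunks = chunk_text(text, max_chars=80)
```

Each chunk repeats the last sentence of the previous one, so retrieved context is less likely to cut off mid-thought.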
Actionable comments posted: 3
🧹 Nitpick comments (2)
examples/cookbooks/local_rag_document_qa_agent.ipynb (1)

49-49: Remove redundant PDF library dependency

Both pypdf and PyPDF2 are installed, but they are essentially the same library: PyPDF2 is the older name, and pypdf is the newer version. Since the code imports PyPDF2 (line 114), remove pypdf from the installation list to avoid confusion.

```diff
-!pip install praisonai streamlit qdrant-client ollama pypdf PyPDF2 chromadb sentence-transformers
+!pip install praisonai streamlit qdrant-client ollama PyPDF2 chromadb sentence-transformers
```

examples/cookbooks/ai_data_analysis_agent.ipynb (1)
177-178: Improve date column detection logic

The current date detection only checks whether 'date' appears in the lowercase column name, which misses columns like "Created_At", "Timestamp", or "DOB".

```diff
-if 'date' in col.lower():
+if any(keyword in col.lower() for keyword in ['date', 'time', 'created', 'updated', 'dob', 'timestamp']):
     df[col] = pd.to_datetime(df[col], errors='coerce')
```
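The suggested keyword-based detection can be exercised in isolation. The column names below are made up for illustration:

```python
DATE_KEYWORDS = ('date', 'time', 'created', 'updated', 'dob', 'timestamp')

def looks_like_date_column(name: str) -> bool:
    """Heuristic: does the column name suggest it holds dates or times?"""
    lowered = name.lower()
    return any(keyword in lowered for keyword in DATE_KEYWORDS)

columns = ['Created_At', 'order_date', 'price', 'DOB', 'customer_name']
date_columns = [c for c in columns if looks_like_date_column(c)]
```

Since this is only a name heuristic, pairing it with `pd.to_datetime(..., errors='coerce')` (as the notebook already does) keeps false positives harmless.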
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (3)

- examples/cookbooks/AI_Enrollment_Counselor.ipynb (1 hunks)
- examples/cookbooks/ai_data_analysis_agent.ipynb (1 hunks)
- examples/cookbooks/local_rag_document_qa_agent.ipynb (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
examples/cookbooks/AI_Enrollment_Counselor.ipynb (1)
Learnt from: CR
PR: MervinPraison/PraisonAI#0
File: src/praisonai-ts/.cursorrules:0-0
Timestamp: 2025-06-30T10:05:51.843Z
Learning: Applies to src/praisonai-ts/src/agents/autoagents.ts : The 'AutoAgents' class in 'src/agents/autoagents.ts' should provide high-level convenience for automatically generating agent/task configuration from user instructions, using 'aisdk' to parse config.
🔇 Additional comments (4)
examples/cookbooks/local_rag_document_qa_agent.ipynb (2)
82-93: Clarify API key configuration for local model usage

The notebook states it uses local Ollama models (line 93) but configures an OpenAI API key (lines 84-86). This is confusing since local models shouldn't require an OpenAI API key. Either:
- Remove the OpenAI API key setup if only using local models
- Clarify why the OpenAI key is needed despite using local models
- Make the key optional with proper documentation
120-202: LGTM! Well-structured document processing tool

The DocumentProcessingTool class is well-implemented, with:
- Proper error handling for each file type
- Support for multiple formats
- Clean separation of processing logic per format
- Appropriate use of context managers for file handling
examples/cookbooks/ai_data_analysis_agent.ipynb (1)

180-183: Avoid bare except clause

Using a bare except: can hide unexpected errors and make debugging difficult.

```diff
 try:
     df[col] = pd.to_numeric(df[col])
-except:
+except (ValueError, TypeError):
     pass
```

Likely an incorrect or invalid review comment.
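A pure-Python analogue shows why catching only the expected exception types matters. This is a sketch, not the notebook's code; `float()` stands in for `pd.to_numeric`:

```python
def coerce_numeric(values):
    """Convert entries to floats, skipping those that aren't numeric.

    Catching only ValueError/TypeError (rather than a bare `except:`)
    lets genuinely unexpected errors, such as KeyboardInterrupt or a
    typo'd NameError, surface instead of being silently swallowed.
    """
    result = []
    for v in values:
        try:
            result.append(float(v))
        except (ValueError, TypeError):
            result.append(None)  # non-numeric entry: keep a placeholder
    return result

cleaned = coerce_numeric(["3.14", "n/a", None, "42"])
```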
examples/cookbooks/AI_Enrollment_Counselor.ipynb (1)
407-417: LGTM! Clean helper function implementation

The ask_enrollment_agent function is well-structured, with:
- Clear parameter handling
- Appropriate prompt construction for document checking
- Clean fallback for general queries
| " sentences = re.split(r'[.!?]+', text)\n", | ||
| " sentences = [s.strip() for s in sentences if s.strip()]\n", |
🛠️ Refactor suggestion
Improve sentence splitting to handle edge cases
The current regex pattern r'[.!?]+' for sentence splitting is too simplistic and will incorrectly split on:
- Abbreviations (Dr., Mr., Ms., etc.)
- Decimal numbers (3.14, 99.9)
- Ellipses (...)
Consider using a more robust sentence tokenizer like NLTK's sent_tokenize or spaCy.
```diff
-# Split into sentences first
-sentences = re.split(r'[.!?]+', text)
+# Use a more robust sentence splitter
+import nltk
+nltk.download('punkt', quiet=True)
+sentences = nltk.sent_tokenize(text)
```

Or add the NLTK dependency to the installation:

```diff
-!pip install praisonai streamlit qdrant-client ollama PyPDF2 chromadb sentence-transformers
+!pip install praisonai streamlit qdrant-client ollama PyPDF2 chromadb sentence-transformers nltk
```

Committable suggestion skipped: line range outside the PR's diff.
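The failure modes listed in this suggestion are easy to reproduce with the naive pattern:

```python
import re

text = "Dr. Smith measured 3.14 cm. The result was fine."

# The naive split breaks on the abbreviation "Dr." and the decimal "3.14",
# producing four fragments instead of two sentences.
naive = [s.strip() for s in re.split(r'[.!?]+', text) if s.strip()]

# A lookbehind that requires whitespace after the punctuation avoids the
# decimal case (abbreviations still need a real tokenizer like NLTK's).
better = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
```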
🤖 Prompt for AI Agents
In examples/cookbooks/local_rag_document_qa_agent.ipynb around lines 282 to 283,
the current sentence splitting using re.split(r'[.!?]+', text) is too simplistic
and fails on abbreviations, decimals, and ellipses. Replace this with a more
robust sentence tokenizer such as NLTK's sent_tokenize by importing it and using
sent_tokenize(text) instead. Also, update the installation instructions to
include NLTK as a dependency.
| " elif chart_type == 'heatmap':\n", | ||
| " corr_matrix = df.corr()\n", | ||
| " fig = px.imshow(corr_matrix, title=title, color_continuous_scale='RdBu')\n", |
Add validation for heatmap to prevent errors on non-numeric data
The heatmap visualization calls df.corr() without checking if the dataframe contains numeric columns. This will raise an error if all columns are non-numeric.
```diff
 elif chart_type == 'heatmap':
+    numeric_cols = df.select_dtypes(include=[np.number]).columns
+    if len(numeric_cols) < 2:
+        return "Heatmap requires at least 2 numeric columns"
     corr_matrix = df.corr()
     fig = px.imshow(corr_matrix, title=title, color_continuous_scale='RdBu')
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| " elif chart_type == 'heatmap':\n", | |
| " corr_matrix = df.corr()\n", | |
| " fig = px.imshow(corr_matrix, title=title, color_continuous_scale='RdBu')\n", | |
| elif chart_type == 'heatmap': | |
| numeric_cols = df.select_dtypes(include=[np.number]).columns | |
| if len(numeric_cols) < 2: | |
| return "Heatmap requires at least 2 numeric columns" | |
| corr_matrix = df.corr() | |
| fig = px.imshow(corr_matrix, title=title, color_continuous_scale='RdBu') |
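The `select_dtypes` guard behaves as follows on a toy frame. This is a standalone sketch (the real notebook operates on user-uploaded data), with the plotting call omitted:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'region': ['north', 'south', 'east'],  # non-numeric column
    'sales': [100.0, 250.0, 170.0],        # numeric
    'visits': [10, 25, 17],                # numeric (exactly sales / 10)
})

numeric_cols = df.select_dtypes(include=[np.number]).columns
if len(numeric_cols) < 2:
    result = "Heatmap requires at least 2 numeric columns"
else:
    # Correlate only the numeric columns; calling df.corr() on a mixed
    # frame raises in recent pandas versions unless numeric_only is set.
    result = df[numeric_cols].corr()
```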
🤖 Prompt for AI Agents
In examples/cookbooks/ai_data_analysis_agent.ipynb around lines 136 to 138, the
code calls df.corr() for the heatmap without verifying if the dataframe has
numeric columns, which can cause errors if none exist. Add a validation step
before computing the correlation matrix to check if the dataframe contains any
numeric columns, and only proceed with the heatmap generation if numeric data is
present; otherwise, handle the case gracefully, such as by skipping the heatmap
or showing a warning.
| "import os\n", | ||
| "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\" # <-- Replace with your actual OpenAI API key" | ||
| ] |
🛠️ Refactor suggestion
Add API key validation
The placeholder API key should be validated to ensure users replace it with their actual key.
```diff
 import os
-os.environ["OPENAI_API_KEY"] = "sk-..."  # <-- Replace with your actual OpenAI API key
+
+api_key = "sk-..."  # <-- Replace with your actual OpenAI API key
+if api_key == "sk-..." or not api_key.startswith("sk-"):
+    raise ValueError("Please replace the placeholder with your actual OpenAI API key")
+os.environ["OPENAI_API_KEY"] = api_key
```

📝 Committable suggestion
| "import os\n", | |
| "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\" # <-- Replace with your actual OpenAI API key" | |
| ] | |
| import os | |
| api_key = "sk-..." # <-- Replace with your actual OpenAI API key | |
| if api_key == "sk-..." or not api_key.startswith("sk-"): | |
| raise ValueError("Please replace the placeholder with your actual OpenAI API key") | |
| os.environ["OPENAI_API_KEY"] = api_key |
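Factored into a helper, the same check can be exercised without touching the environment. The key strings below are obviously fake, and the length threshold is an illustrative assumption:

```python
def is_valid_openai_key(key: str) -> bool:
    """Reject the untouched placeholder and anything not shaped like a key."""
    return key != "sk-..." and key.startswith("sk-") and len(key) > 10

checks = [
    is_valid_openai_key("sk-..."),                    # untouched placeholder
    is_valid_openai_key("not-a-key"),                 # wrong prefix
    is_valid_openai_key("sk-fake1234567890example"),  # plausible shape
]
```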
🤖 Prompt for AI Agents
In examples/cookbooks/AI_Enrollment_Counselor.ipynb around lines 67 to 69, the
code sets a placeholder OpenAI API key without validation. Add a check after
setting the environment variable to verify that the API key is not the
placeholder value. If it is, raise an error or print a clear message instructing
the user to replace the placeholder with their actual OpenAI API key before
proceeding.
Code Review
The pull request adds three new Jupyter notebooks implementing AI agents for enrollment counseling, data analysis, and local RAG document Q&A. The code includes custom tools for data processing, statistical analysis, vector database operations, and text chunking. The notebooks provide complete workflows from data ingestion to interactive Q&A with source attribution. I have provided feedback to improve error handling and input validation.
| "openai_key = \"sk-..\"\n", | ||
| "\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| " else:\n", | ||
| " return \"Unsupported chart type\"\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| " else:\n", | ||
| " return None, None, None, \"Unsupported file format\"\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
```python
results['high_correlations'] = self._find_high_correlations(results['correlation_matrix'])
```
The results dictionary is returned even if analysis_type does not match any of the handled types. Consider returning an error message or raising an exception to indicate that the analysis type is invalid:

```python
    return results
except Exception as e:
    return {'error': f"Analysis error: {str(e)}"}
```

with an explicit fallback on the `analysis_type` dispatch:

```python
else:
    return {'error': f"Invalid analysis type: {analysis_type}"}
```
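A dictionary dispatch makes the unknown-type case explicit by construction. This is a standalone sketch; the analysis functions here are trivial stand-ins for the notebook's real ones:

```python
def summary_stats(data):
    return {'mean': sum(data) / len(data), 'n': len(data)}

def value_range(data):
    return {'min': min(data), 'max': max(data)}

# Map analysis names to handlers; unknown names fail fast and loudly.
ANALYSES = {'summary': summary_stats, 'range': value_range}

def run_analysis(analysis_type, data):
    handler = ANALYSES.get(analysis_type)
    if handler is None:
        return {'error': f"Invalid analysis type: {analysis_type}"}
    try:
        return handler(data)
    except Exception as e:
        return {'error': f"Analysis error: {e}"}

ok = run_analysis('summary', [1, 2, 3])
bad = run_analysis('histogram', [1, 2, 3])
```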
| "openai_key = \"sk-..\"\n", | ||
| "\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| " else:\n", | ||
| " return {\"error\": f\"Unsupported file format: {file_ext}\"}\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #749   +/-   ##
=======================================
  Coverage   14.23%   14.23%
=======================================
  Files          25       25
  Lines        2571     2571
  Branches      367      367
=======================================
  Hits          366      366
  Misses       2189     2189
  Partials       16       16
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
User description
A Retrieval-Augmented Generation agent that processes documents and answers questions using local LLM models without external API calls.
Features include multi-format document processing (PDF/TXT/MD/CSV), intelligent text chunking, vector database storage with ChromaDB, and context-aware Q&A with source attribution.
Built with PraisonAI, supports document upload, similarity search, and provides accurate answers based on document content with local Ollama models.
PR Type
Enhancement
Description
• Added Local RAG Document Q&A Agent with comprehensive document processing capabilities (PDF/TXT/MD/CSV)
• Implemented AI Data Analysis Agent with interactive data visualization and statistical analysis tools
• Added AI Enrollment Counselor Agent for university admissions automation and document validation
• Introduced Intelligent Travel Planning Agent notebook (additional file)
• All agents built with PraisonAI framework supporting local LLM models via Ollama
• Features include vector database storage with ChromaDB, text chunking, similarity search, and context-aware responses
• Provides complete workflows from data ingestion to interactive Q&A with source attribution
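The similarity-search step in the pipeline described above can be sketched with a toy in-memory index. Bag-of-words cosine similarity stands in for the real sentence-transformers embeddings and ChromaDB storage:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; real pipelines use learned vectors."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# In the notebook these chunks would live in ChromaDB; here, a plain list.
chunks = [
    "ChromaDB stores document chunks as vectors for similarity search.",
    "Ollama runs large language models locally without external API calls.",
    "The agent attributes every answer to its source document.",
]
query = "how are models run locally"
best = max(chunks, key=lambda c: cosine(embed(query), embed(c)))
```

The retrieved chunk (plus its source metadata) is what gets handed to the local LLM as context, which is how source attribution falls out of the design.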
Changes walkthrough 📝
ai_data_analysis_agent.ipynb

AI Data Analysis Agent Jupyter Notebook Implementation (examples/cookbooks/ai_data_analysis_agent.ipynb)

• Added a complete Jupyter notebook implementing an AI Data Analysis Agent
• Includes data preprocessing, visualization, and statistical analysis tools
• Features file upload functionality, automatic chart generation, and comprehensive data insights
• Provides interactive data analysis capabilities with support for CSV/Excel files
local_rag_document_qa_agent.ipynb

Local RAG Document Q&A Agent Implementation (examples/cookbooks/local_rag_document_qa_agent.ipynb)

• Added comprehensive Jupyter notebook implementing a Local RAG Document Q&A Agent
• Includes custom tools for document processing (PDF/TXT/MD/CSV), vector database operations with ChromaDB, and text chunking
• Features interactive Q&A session with document upload, vector search, and AI-powered responses using local Ollama models
• Provides complete workflow from document ingestion to question answering with source attribution
AI_Enrollment_Counselor.ipynb

AI Enrollment Counselor Agent for University Admissions (examples/cookbooks/AI_Enrollment_Counselor.ipynb)

• Added Jupyter notebook for AI Enrollment Counselor agent for university admissions automation
• Implements document validation, application completeness checking, and personalized guidance
• Uses PraisonAI Agents framework with role-based prompting for admissions counseling
• Includes examples for missing document detection and general admissions questions