-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Refactor GitHub RAG application: added URL validation, improved error… #109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… handling, and enhanced logging. Introduced custom exception handling and streamlined repository processing logic. Updated chat functionality for better user experience.
WalkthroughThis pull request refactors and extends the GitHub RAG application functionality. The changes add a custom exception class and new functions to validate GitHub URLs, extract repository names, process repository data, and create a query engine. Additionally, the repository loading logic has been restructured to include a progress spinner, improved logging, and enhanced session state management. The chat input handling has been updated to catch and log processing errors, ensuring that exceptions are managed more gracefully throughout the application. Changes
Sequence Diagram(s)sequenceDiagram
participant U as User
participant A as App (app.py)
participant S as Spinner
participant L as Logger
U->>A: Submit GitHub URL
A->>A: validate_github_url(url)
A->>A: get_repo_name(url)
A->>S: Start progress spinner
A->>A: process_with_gitingets(github_url)
alt Processing Successful
A->>A: create_query_engine(content_path, repo_name)
A->>L: Log success message
A->>U: Return successful response
else Processing Fails
A->>A: Raise GitHubRAGError
A->>L: Log error message
A->>U: Return error response
end
sequenceDiagram
participant U as User
participant A as App (app.py)
participant L as Logger
U->>A: Submit Chat Input
A->>A: Process input with error handling
alt Processing Successful
A->>U: Return processed response
else Exception Occurs
A->>L: Log exception details
A->>U: Return error message
end
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (5)
github-rag/app.py (5)
6-7: Remove unused imports fromtyping.Static analysis indicates that
OptionalandDictare never referenced in this file. Consider removing these if no future usage is planned.Here's a diff to address this:
- from typing import Optional, Dict, Any + from typing import Any🧰 Tools
🪛 Ruff (0.8.2)
6-6:
typing.Optionalimported but unusedRemove unused import
(F401)
6-6:
typing.Dictimported but unusedRemove unused import
(F401)
10-10: Remove unused importSettings.
Settingsappears unused according to static analysis. You can remove it to keep the import list clean.- from llama_index.core import Settings, PromptTemplate, VectorStoreIndex, SimpleDirectoryReader + from llama_index.core import PromptTemplate, VectorStoreIndex, SimpleDirectoryReader🧰 Tools
🪛 Ruff (0.8.2)
10-10:
llama_index.core.Settingsimported but unusedRemove unused import:
llama_index.core.Settings(F401)
22-23: Unused constants detected.Although defining constants for maximum repository size and supported repo types is forward-thinking, they appear unused in this file. Consider removing them or using them if planned for future validations.
29-32: Improve GitHub URL validation.Currently, the check only confirms the URL starts with "https://github.com/" or "http://github.com/". Optional improvement: incorporate a more robust parsing or pattern matching to ensure the URL references a valid GitHub repo.
132-160: Merge nested context managers.Lines 133–134 use nested
withstatements. You can simplify by merging them into a single statement for clarity.- with st.spinner("Processing your repository..."): - with tempfile.TemporaryDirectory() as temp_dir: + with st.spinner("Processing your repository..."), tempfile.TemporaryDirectory() as temp_dir:🧰 Tools
🪛 Ruff (0.8.2)
133-134: Use a single
withstatement with multiple contexts instead of nestedwithstatements(SIM117)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
github-rag/app.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
github-rag/app.py (1)
github-rag/app_local.py (2)
reset_chat(31-34)process_with_gitingets(36-39)
🪛 Ruff (0.8.2)
github-rag/app.py
6-6: typing.Optional imported but unused
Remove unused import
(F401)
6-6: typing.Dict imported but unused
Remove unused import
(F401)
10-10: llama_index.core.Settings imported but unused
Remove unused import: llama_index.core.Settings
(F401)
38-38: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
49-49: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
60-60: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
100-100: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
133-134: Use a single with statement with multiple contexts instead of nested with statements
(SIM117)
🔇 Additional comments (13)
github-rag/app.py (13)
16-17: Logging configuration looks good.The logging setup with
INFOlevel andloggerinitialization is clear and standard.
25-28: Custom exception definition looks fine.The
GitHubRAGErrorclass clearly communicates application-specific errors.
102-109: Session state initialization looks correct.Storing a unique UUID and preparing caches is well-structured.
112-118: GitHub URL input and header are fine.The UI elements appear straightforward with a helpful placeholder and prompt.
120-120: Load repository button setup is clear.No issues found with this logic.
124-127: URL validation feedback.Positive approach to guard invalid URLs early. Looks good.
129-130: Repository name extraction is consistent.Handles creation of a unique key for caching with minimal overhead.
162-164: Exception handling for repository loading is appropriate.Catching unexpected errors and logging them is a good practice.
166-166: Section heading is well-noted.No issues identified here.
170-170: Header styling is acceptable.Clear indicator for the chat section.
172-173: Chat reset button usage is clean.Ties nicely to the
reset_chatfunction.
175-175: Comment for displaying chat history.Nothing problematic found.
181-226: Main chat logic is well-structured.The code gracefully handles user input, updates the session state, and uses the query engine. Adequate exception handling is present.
| def get_repo_name(url: str) -> str: | ||
| """Extract repository name from URL""" | ||
| try: | ||
| return url.split('/')[-1].replace('.git', '') | ||
| except Exception as e: | ||
| raise GitHubRAGError(f"Invalid repository URL: {str(e)}") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Raise original traceback for clarity.
When re-raising an exception in line 38, add from e to preserve the original traceback context.
except Exception as e:
- raise GitHubRAGError(f"Invalid repository URL: {str(e)}")
+ raise GitHubRAGError(f"Invalid repository URL: {str(e)}") from e📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def get_repo_name(url: str) -> str: | |
| """Extract repository name from URL""" | |
| try: | |
| return url.split('/')[-1].replace('.git', '') | |
| except Exception as e: | |
| raise GitHubRAGError(f"Invalid repository URL: {str(e)}") | |
| def get_repo_name(url: str) -> str: | |
| """Extract repository name from URL""" | |
| try: | |
| return url.split('/')[-1].replace('.git', '') | |
| except Exception as e: | |
| raise GitHubRAGError(f"Invalid repository URL: {str(e)}") from e |
🧰 Tools
🪛 Ruff (0.8.2)
38-38: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
| def process_with_gitingets(github_url: str) -> tuple: | ||
| """Process GitHub repository using gitingest""" | ||
| try: | ||
| summary, tree, content = ingest(github_url) | ||
| if not all([summary, tree, content]): | ||
| raise GitHubRAGError("Failed to process repository: Missing data") | ||
| return summary, tree, content | ||
| except Exception as e: | ||
| logger.error(f"Error processing repository: {str(e)}") | ||
| raise GitHubRAGError(f"Failed to process repository: {str(e)}") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Exception chaining recommended.
At line 60, re-throw your custom error with from e to chain the exception properly.
except Exception as e:
logger.error(f"Error processing repository: {str(e)}")
- raise GitHubRAGError(f"Failed to process repository: {str(e)}")
+ raise GitHubRAGError(f"Failed to process repository: {str(e)}") from e📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def process_with_gitingets(github_url: str) -> tuple: | |
| """Process GitHub repository using gitingest""" | |
| try: | |
| summary, tree, content = ingest(github_url) | |
| if not all([summary, tree, content]): | |
| raise GitHubRAGError("Failed to process repository: Missing data") | |
| return summary, tree, content | |
| except Exception as e: | |
| logger.error(f"Error processing repository: {str(e)}") | |
| raise GitHubRAGError(f"Failed to process repository: {str(e)}") | |
| def process_with_gitingets(github_url: str) -> tuple: | |
| """Process GitHub repository using gitingest""" | |
| try: | |
| summary, tree, content = ingest(github_url) | |
| if not all([summary, tree, content]): | |
| raise GitHubRAGError("Failed to process repository: Missing data") | |
| return summary, tree, content | |
| except Exception as e: | |
| logger.error(f"Error processing repository: {str(e)}") | |
| raise GitHubRAGError(f"Failed to process repository: {str(e)}") from e |
🧰 Tools
🪛 Ruff (0.8.2)
60-60: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
| """Reset chat session and clean up resources""" | ||
| try: | ||
| st.session_state.messages = [] | ||
| st.session_state.context = None | ||
| gc.collect() | ||
| logger.info("Chat session reset successfully") | ||
| except Exception as e: | ||
| logger.error(f"Error resetting chat: {str(e)}") | ||
| raise GitHubRAGError("Failed to reset chat session") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Raise from the original exception.
At line 49, consider chaining the original exception with from e to provide more debugging insight.
except Exception as e:
logger.error(f"Error resetting chat: {str(e)}")
- raise GitHubRAGError("Failed to reset chat session")
+ raise GitHubRAGError("Failed to reset chat session") from e📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| """Reset chat session and clean up resources""" | |
| try: | |
| st.session_state.messages = [] | |
| st.session_state.context = None | |
| gc.collect() | |
| logger.info("Chat session reset successfully") | |
| except Exception as e: | |
| logger.error(f"Error resetting chat: {str(e)}") | |
| raise GitHubRAGError("Failed to reset chat session") | |
| """Reset chat session and clean up resources""" | |
| try: | |
| st.session_state.messages = [] | |
| st.session_state.context = None | |
| gc.collect() | |
| logger.info("Chat session reset successfully") | |
| except Exception as e: | |
| logger.error(f"Error resetting chat: {str(e)}") | |
| raise GitHubRAGError("Failed to reset chat session") from e |
🧰 Tools
🪛 Ruff (0.8.2)
49-49: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
| def create_query_engine(content_path: str, repo_name: str) -> Any: | ||
| """Create and configure query engine""" | ||
| try: | ||
| loader = SimpleDirectoryReader(input_dir=content_path) | ||
| docs = loader.load_data() | ||
| node_parser = MarkdownNodeParser() | ||
| index = VectorStoreIndex.from_documents( | ||
| documents=docs, | ||
| transformations=[node_parser], | ||
| show_progress=True | ||
| ) | ||
|
|
||
| qa_prompt_tmpl_str = """ | ||
| You are an AI assistant specialized in analyzing GitHub repositories. | ||
| Repository structure: | ||
| {tree} | ||
| --------------------- | ||
| Context information from the repository: | ||
| {context_str} | ||
| --------------------- | ||
| Given the repository structure and context above, provide a clear and precise answer to the query. | ||
| Focus on the repository's content, code structure, and implementation details. | ||
| If the information is not available in the context, respond with 'I don't have enough information about that aspect of the repository.' | ||
| Query: {query_str} | ||
| Answer: """ | ||
|
|
||
| qa_prompt_tmpl = PromptTemplate(qa_prompt_tmpl_str) | ||
| query_engine = index.as_query_engine(streaming=True) | ||
| query_engine.update_prompts( | ||
| {"response_synthesizer:text_qa_template": qa_prompt_tmpl} | ||
| ) | ||
| return query_engine | ||
| except Exception as e: | ||
| logger.error(f"Error creating query engine: {str(e)}") | ||
| raise GitHubRAGError(f"Failed to create query engine: {str(e)}") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve exception transparency.
At line 100, add from e to preserve traceback details when this custom exception is raised.
except Exception as e:
logger.error(f"Error creating query engine: {str(e)}")
- raise GitHubRAGError(f"Failed to create query engine: {str(e)}")
+ raise GitHubRAGError(f"Failed to create query engine: {str(e)}") from e📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def create_query_engine(content_path: str, repo_name: str) -> Any: | |
| """Create and configure query engine""" | |
| try: | |
| loader = SimpleDirectoryReader(input_dir=content_path) | |
| docs = loader.load_data() | |
| node_parser = MarkdownNodeParser() | |
| index = VectorStoreIndex.from_documents( | |
| documents=docs, | |
| transformations=[node_parser], | |
| show_progress=True | |
| ) | |
| qa_prompt_tmpl_str = """ | |
| You are an AI assistant specialized in analyzing GitHub repositories. | |
| Repository structure: | |
| {tree} | |
| --------------------- | |
| Context information from the repository: | |
| {context_str} | |
| --------------------- | |
| Given the repository structure and context above, provide a clear and precise answer to the query. | |
| Focus on the repository's content, code structure, and implementation details. | |
| If the information is not available in the context, respond with 'I don't have enough information about that aspect of the repository.' | |
| Query: {query_str} | |
| Answer: """ | |
| qa_prompt_tmpl = PromptTemplate(qa_prompt_tmpl_str) | |
| query_engine = index.as_query_engine(streaming=True) | |
| query_engine.update_prompts( | |
| {"response_synthesizer:text_qa_template": qa_prompt_tmpl} | |
| ) | |
| return query_engine | |
| except Exception as e: | |
| logger.error(f"Error creating query engine: {str(e)}") | |
| raise GitHubRAGError(f"Failed to create query engine: {str(e)}") | |
| def create_query_engine(content_path: str, repo_name: str) -> Any: | |
| """Create and configure query engine""" | |
| try: | |
| loader = SimpleDirectoryReader(input_dir=content_path) | |
| docs = loader.load_data() | |
| node_parser = MarkdownNodeParser() | |
| index = VectorStoreIndex.from_documents( | |
| documents=docs, | |
| transformations=[node_parser], | |
| show_progress=True | |
| ) | |
| qa_prompt_tmpl_str = """ | |
| You are an AI assistant specialized in analyzing GitHub repositories. | |
| Repository structure: | |
| {tree} | |
| --------------------- | |
| Context information from the repository: | |
| {context_str} | |
| --------------------- | |
| Given the repository structure and context above, provide a clear and precise answer to the query. | |
| Focus on the repository's content, code structure, and implementation details. | |
| If the information is not available in the context, respond with 'I don't have enough information about that aspect of the repository.' | |
| Query: {query_str} | |
| Answer: """ | |
| qa_prompt_tmpl = PromptTemplate(qa_prompt_tmpl_str) | |
| query_engine = index.as_query_engine(streaming=True) | |
| query_engine.update_prompts( | |
| {"response_synthesizer:text_qa_template": qa_prompt_tmpl} | |
| ) | |
| return query_engine | |
| except Exception as e: | |
| logger.error(f"Error creating query engine: {str(e)}") | |
| raise GitHubRAGError(f"Failed to create query engine: {str(e)}") from e |
🧰 Tools
🪛 Ruff (0.8.2)
100-100: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
… handling, and enhanced logging. Introduced custom exception handling and streamlined repository processing logic. Updated chat functionality for better user experience.
Summary by CodeRabbit
New Features
Bug Fixes