feat: refactor LLM model selection and attack surface analysis #233
base: release/2.2.0
Conversation
The changes involve refactoring the codebase to replace references to "GPT" with "LLM" for generating vulnerability reports and related functionalities. This includes renaming functions, classes, variables, and configuration settings to reflect the new terminology. The update affects various components such as tasks, views, models, templates, and configuration files across the application.
- Introduced a new modal for selecting LLM models for attack surface analysis, allowing users to choose from a list of available models.
- Added functionality to update the selected model in the database before proceeding with analysis.
- Implemented a new API endpoint to fetch available LLM models and the currently selected model (see the sketch after this list).
- Refactored the show_attack_surface_modal function to include model selection and handle errors more robustly.
- Removed hardcoded LLM definitions and prompts from definitions.py and moved them to a new configuration file.
- Added new classes and methods for managing LLM configurations and generating vulnerability reports and attack suggestions.
- Updated tests to cover new functionalities and removed obsolete tests related to LLM attack suggestions.
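To make the endpoint concrete, here is a minimal sketch of what fetching the available models plus the current selection could look like, assuming Django REST Framework and an `OllamaSettings`-style settings model (the model catalogue, import path, and field names are illustrative, not the PR's actual code):

```python
from rest_framework.views import APIView
from rest_framework.response import Response

from dashboard.models import OllamaSettings  # assumed import path

# Illustrative static catalogue; the PR moves the real definitions into a config file
AVAILABLE_LLM_MODELS = ['gpt-3.5-turbo', 'gpt-4', 'llama3', 'mistral']

class LLMModelsView(APIView):
    def get(self, request):
        # A single-row settings table is assumed to hold the last selected model
        settings_row = OllamaSettings.objects.first()
        return Response({
            'status': True,
            'models': AVAILABLE_LLM_MODELS,
            'selected_model': settings_row.selected_model if settings_row else None,
        })
```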
Reviewer's Guide by Sourcery

This PR enhances the LLM (Large Language Model) integration by introducing a new model selection modal and implementing a more robust API endpoint. The changes include refactoring the GPT-specific code into a more generic LLM framework, improving error handling, and adding support for multiple LLM providers including OpenAI and Ollama.

Updated Class Diagram for LLM Vulnerability Report

```mermaid
classDiagram
class LLMVulnerabilityReport {
+String url_path
+String title
+Text description
+Text impact
+Text remediation
+Text references
+String formatted_description()
+String formatted_impact()
+String formatted_remediation()
+String formatted_references()
}
class Vulnerability {
+String name
+String http_url
+Text description
+Text impact
+Text remediation
+Text references
+boolean is_llm_used
+String formatted_description()
+String formatted_impact()
+String formatted_remediation()
+String formatted_references()
}
LLMVulnerabilityReport --|> Vulnerability : updates
note for LLMVulnerabilityReport "Replaces GPTVulnerabilityReport with LLM support"
```
- Enhanced logging in tasks.py to include both the title and path of vulnerabilities when exceptions occur (a sketch follows this list).
- Updated variable naming from llm to vuln for clarity in vulnerability handling.
- Minor text punctuation corrections in custom.js to ensure consistency in user messages.
- Removed an unused import.
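For illustration only, the enhanced exception logging could look like this (the helper name is hypothetical; `name` and `http_url` are the fields shown in the class diagram above):

```python
import logging

logger = logging.getLogger(__name__)

def log_llm_report_failure(vuln, exc):
    # Include both the vulnerability title and its URL path, per this commit
    logger.error('LLM report failed for "%s" (%s): %s', vuln.name, vuln.http_url, exc)
```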
- Improved the UI of the LLM toolkit by updating the layout and adding badges for model status and capabilities.
- Refactored the model management logic to use a centralized API call for fetching model data, improving error handling and reducing code duplication.
- Updated the model requirements configuration to enhance readability and consistency in the description of model capabilities.
- Adjusted the modal size for displaying model options to provide a better user experience.
- Introduced a method to convert markdown to HTML with added Bootstrap classes in the LLMAttackSuggestion class (a sketch follows this list).
- Implemented HTML sanitization using DOMPurify in various JavaScript functions to prevent XSS vulnerabilities.
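A rough sketch of the markdown-to-HTML conversion with Bootstrap classes, assuming the Python-Markdown package (the tag-to-class mapping here is an assumption, not LLMAttackSuggestion's exact one):

```python
import markdown

# Illustrative tag-to-Bootstrap-class mapping; the real one lives in the PR
BOOTSTRAP_CLASSES = {
    '<table>': '<table class="table table-striped">',
    '<ul>': '<ul class="list-group">',
    '<li>': '<li class="list-group-item">',
}

def markdown_to_bootstrap_html(text: str) -> str:
    html = markdown.markdown(text, extensions=['tables'])
    for plain_tag, styled_tag in BOOTSTRAP_CLASSES.items():
        html = html.replace(plain_tag, styled_tag)
    return html
```

Server-side class injection pairs naturally with the client-side DOMPurify.sanitize() call mentioned above, so the styled markup still passes through sanitization before being rendered.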
…on features
- Updated the URL validation function to correctly escape backslashes in the regex pattern.
- Enhanced the send_llm__attack_surface_api_request function to support additional parameters for model selection and analysis options.
- Introduced new functions regenerateAttackSurface, deleteAttackSurfaceAnalysis, and showAttackSurfaceModal to manage the attack surface analysis lifecycle, including regeneration and deletion.
- Refactored the modal handling logic to improve user interaction for model selection and analysis display.
- Added input validation for model names in the LLMAttackSuggestionGenerator class and updated the attack suggestion generation process to accommodate model-specific requests (a validation sketch follows this list).
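The input validation for model names could be a whitelist-style regex; a sketch under that assumption (the pattern is illustrative):

```python
import re

# Ollama-style names: alphanumerics, dots, dashes, underscores, optional ":tag"
MODEL_NAME_RE = re.compile(r'[\w.\-]+(?::[\w.\-]+)?')

def is_valid_model_name(model_name: str) -> bool:
    """Reject names that could smuggle path segments or shell metacharacters."""
    return bool(model_name) and MODEL_NAME_RE.fullmatch(model_name) is not None
```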
- Enhanced markdown rendering by adding new extensions for better list handling and definition-list support (a rendering sketch follows this list).
- Improved HTML formatting by converting ordered lists to unordered and cleaning up line breaks.
- Removed the "Beta" label from the LLM Toolkit page title and modal dialog title in the UI.
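Assuming the Python-Markdown package, its bundled `def_list` and `sane_lists` extensions match the behaviors described; a sketch (the list conversion and line-break cleanup rules are approximations):

```python
import markdown

def render_section_html(text: str) -> str:
    # def_list adds definition-list support; sane_lists improves list handling
    html = markdown.markdown(text, extensions=['def_list', 'sane_lists'])
    # Convert ordered lists to unordered, as this commit describes
    html = html.replace('<ol>', '<ul>').replace('</ol>', '</ul>')
    # Collapse doubled line breaks (exact cleanup rule assumed)
    return html.replace('<br />\n<br />', '<br />')
```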
- Replaced the get_vulnerability_llm_report function with llm_vulnerability_report for generating and storing vulnerability reports using LLM.
- Enhanced the LLM vulnerability report generation process by splitting it into distinct sections: technical description, business impact, remediation steps, and references (a schematic version follows this list).
- Updated the data model to store references as text fields instead of using a separate VulnerabilityReference model.
- Improved the HTML rendering of vulnerability descriptions, impacts, and remediations by converting markdown to HTML with proper styling.
- Refactored the LLM response handling to use a dictionary format for easier manipulation and storage.
- Removed redundant code and streamlined the process of updating vulnerabilities with LLM-generated data.
- Adjusted the configuration and prompts for LLM to support more detailed and structured report generation.
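Schematically, splitting the report into sections means one prompt per section with the results collected into a dict; a sketch where the prompt texts and the `ask_llm` callable are placeholders for the PR's actual configuration:

```python
# Placeholder prompts; the real ones live in the PR's LLM configuration file
SECTION_PROMPTS = {
    'description': 'Write a technical description of: {title}',
    'impact': 'Describe the business impact of: {title}',
    'remediation': 'List remediation steps for: {title}',
    'references': 'List reference URLs for: {title}',
}

def generate_vulnerability_report(title: str, ask_llm) -> dict:
    """Build the report section by section; a dict keeps storage and updates simple."""
    return {
        section: ask_llm(prompt.format(title=title))
        for section, prompt in SECTION_PROMPTS.items()
    }
```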
Hey @psyray - I've reviewed your changes and found some issues that need to be addressed.
Blocking issues:
- Avoid exposing raw exception messages in API responses (link)
Overall Comments:
- Consider adding more comprehensive test coverage for the new LLM functionality, particularly around error handling and edge cases.
- The key LLM interface methods would benefit from more detailed docstrings documenting expected inputs, outputs, and error scenarios.
Here's what I looked at during the review
- 🟡 General issues: 3 issues found
- 🔴 Security: 1 blocking issue, 1 other issue
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
```python
if query.strip():
    qs = qs & self.special_lookup(query.strip())
elif '|' in search_value:
    qs = Subdomain.objects.none()
```
issue (bug_risk): Incorrect model class used for empty queryset initialization
This should be EndPoint.objects.none() to maintain type consistency with the rest of the method.
return Response({"dorks": serializer.data}) | ||
def get(self, request, format=None): | ||
req = self.request | ||
scan_id = safe_int_cast(req.query_params.get('scan_id')) |
issue (code-quality): We've found these issues:
- Use named expression to simplify assignment and conditional (use-named-expression)
- Hoist repeated code outside conditional statement [×2] (hoist-statement-from-if)
```python
scan_id = safe_int_cast(req.query_params.get('scan_id'))
if scan_id:
```
suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)

```diff
- scan_id = safe_int_cast(req.query_params.get('scan_id'))
- if scan_id:
+ if scan_id := safe_int_cast(req.query_params.get('scan_id')):
```
```python
def get(self, request, format=None):
    req = self.request
    scan_id = safe_int_cast(req.query_params.get('scan_id'))
    type = req.query_params.get('type')
```
issue (code-quality): Don't assign to builtin variable `type` (avoid-builtin-shadow)

Explanation: Python has a number of builtin variables: functions and constants that form a part of the language, such as `list`, `getattr`, and `type` (see https://docs.python.org/3/library/functions.html). It is valid, in the language, to re-bind such variables:

```python
list = [1, 2, 3]
```

However, this is considered poor practice.
- It will confuse other developers.
- It will confuse syntax highlighters and linters.
- It means you can no longer use that builtin for its original purpose.

How can you solve this? Rename the variable to something more specific, such as `integers`. In a pinch, `my_list` and similar names are colloquially-recognized placeholders.
```python
scan_id = safe_int_cast(req.query_params.get('scan_id'))
if scan_id:
```
suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)

```diff
- scan_id = safe_int_cast(req.query_params.get('scan_id'))
- if scan_id:
+ if scan_id := safe_int_cast(req.query_params.get('scan_id')):
```
```python
subdomain_names = []
for id in subdomain_ids:
    subdomain_names.append(Subdomain.objects.get(id=id).name)

if subdomain_names:
    return Response({'status': True, "results": subdomain_names})
```
suggestion (code-quality): We've found these issues:
- Convert for loop into list comprehension (list-comprehension)
- Use named expression to simplify assignment and conditional (use-named-expression)

```diff
- subdomain_names = []
- for id in subdomain_ids:
-     subdomain_names.append(Subdomain.objects.get(id=id).name)
- if subdomain_names:
+ if subdomain_names := [
+     Subdomain.objects.get(id=id).name for id in subdomain_ids
+ ]:
```
Too many Sourcery reviews here.
Uncommented the code responsible for generating technical, impact, and remediation sections in the LLMVulnerabilityReportGenerator class.
- Removed extensive data from targetApp.json, including historical IPs, related domains, registrars, domain registrations, WHOIS status, nameservers, DNS records, and domain info.
- Updated auth.json to modify permission names and codenames, and removed several user and group entries.
- Added new todo notes in recon_note.json and updated existing ones.
- Updated dashboard.json to modify project details and remove unused settings.
- Modified scanEngine.json to update YAML configurations and add a new interesting lookup model.
- Enabled remote debugging in docker-compose.dev.yml.
- Updated a test in test_vulnerability.py to patch a different method for generating vulnerability reports.
Overview
- Improved handling of CVE references by parsing string representations of arrays and generating appropriate HTML content (a parsing sketch follows this list).
- Updated the template to handle references more flexibly, displaying them as a list or paragraph based on their format.
- Converted markdown to HTML for various sections in the LLM vulnerability report.
- Removed redundant code in the LLM vulnerability report generator.
- Simplified the get_refs_str method in the Vulnerability model to return references directly.
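Parsing "string representations of arrays" suggests something like `ast.literal_eval` with a plain-text fallback; a sketch (the PR's actual parsing may differ):

```python
import ast
import html

def references_to_html(references: str) -> str:
    """Render references as a list when they parse as one, else as a paragraph."""
    try:
        parsed = ast.literal_eval(references)
    except (ValueError, SyntaxError):
        parsed = None
    if isinstance(parsed, list):
        items = ''.join(f'<li>{html.escape(str(ref))}</li>' for ref in parsed)
        return f'<ul>{items}</ul>'
    return f'<p>{html.escape(references)}</p>'
```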
Modified the llm_vulnerability_report function in tasks.py to convert the references field from a list to a single string before converting it to HTML.
Didn't have time yet to do an extensive test, but this is what I initially found. Will continue tomorrow.
Adjusted the logic for displaying the dropdown menu in the LLM Toolkit settings to only show when a model is not selected.
- Introduced a new modal for adding and managing models in the LLM Toolkit, allowing users to view and select recommended models with detailed information on RAM requirements.
- Enhanced the model download process with progress tracking and error handling, including the ability to cancel downloads.
- Implemented a new API endpoint to fetch available models from the Ollama library, caching the results for improved performance (a caching sketch follows this list).
- Updated the server configuration to support streaming responses for model downloads, improving user feedback during long operations.
- Added new recommended models to the configuration, providing descriptions and size options for each model.
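Caching the fetched model list is a natural fit for Django's cache framework; a sketch assuming the endpoint scrapes or proxies the Ollama library (the URL, TTL, and parser helper are hypothetical):

```python
import requests
from django.core.cache import cache

CACHE_KEY = 'ollama_library_models'
CACHE_TTL = 60 * 60  # one hour; the PR's actual TTL may differ

def get_ollama_library_models():
    models = cache.get(CACHE_KEY)
    if models is None:
        resp = requests.get('https://ollama.com/library', timeout=10)
        resp.raise_for_status()
        models = parse_library_page(resp.text)  # hypothetical HTML parser
        cache.set(CACHE_KEY, models, CACHE_TTL)
    return models
```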
- Introduced WebSocket support for real-time updates during model operations, including download and deletion processes (a consumer sketch follows this list).
- Refactored JavaScript to enhance user interaction with model operations using SweetAlert2 for better UI feedback.
- Added new API endpoints and views to handle WebSocket connections and model operations.
- Updated the server configuration to support ASGI and WebSocket connections, including changes to the entrypoint script and Nginx configuration.
- Enhanced the Django application with Channels for WebSocket communication, including necessary package updates and settings adjustments.
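With Django Channels, the real-time progress updates would typically flow through a consumer; a minimal sketch (the `pull_model` async generator and the message shapes are assumptions):

```python
from channels.generic.websocket import AsyncJsonWebsocketConsumer

class ModelDownloadConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        await self.accept()

    async def receive_json(self, content):
        # Client requests a download; stream progress events back over the socket
        model_name = content.get('model')
        async for progress in pull_model(model_name):  # hypothetical async generator
            await self.send_json({'model': model_name, 'progress': progress})
        await self.send_json({'model': model_name, 'status': 'done'})
```

Wiring it up is then a single route in the ASGI application, e.g. `path('ws/models/download/', ModelDownloadConsumer.as_asgi())` (route assumed).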
- Added setup for OllamaSettings in TestOllamaManager to initialize the test environment (a schematic example follows this list).
- Enhanced test cases in TestOllamaManager to include additional mock setups and assertions for model management.
- Refactored TestLLMVulnerabilityReport and TestLLMAttackSuggestion to improve test coverage and clarity, including renaming methods and updating mock responses.
- Made minor adjustments to HTML templates and JavaScript comments for clarity and consistency.
- Removed the unnecessary selected field from the OllamaDetailManager view defaults.
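The test setup described presumably seeds an OllamaSettings row and mocks the HTTP calls to Ollama; a schematic example (the import path, field names, and patched target are assumptions):

```python
from unittest.mock import MagicMock, patch

from django.test import TestCase

from dashboard.models import OllamaSettings  # assumed import path

class TestOllamaManager(TestCase):
    def setUp(self):
        # Seed the settings row the views under test read from (fields assumed)
        OllamaSettings.objects.create(selected_model='llama3', use_ollama=True)

    @patch('requests.post')
    def test_delete_model_succeeds(self, mock_post):
        mock_post.return_value = MagicMock(status_code=200)
        # ... call the view under test and assert on the JSON response
```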
- Removed the "Cancel Download" button and its associated functionality from the llm_toolkit.html template. - Added a newline in the OllamaManager class in views.py for formatting consistency.
Updated the UI text to include the model name in the download progress popup.
Modified API endpoint usage in JavaScript to include the selected model in the URL path.
- Improved URL handling by encoding model names in API requests across multiple JavaScript files.
- Modified the URL pattern in the Django application to support path parameters for model names (a sketch follows).
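On the Django side, supporting model names as path parameters comes down to a path() converter; a sketch (the view name is borrowed from the commits above, the exact route is assumed):

```python
from django.urls import path

from .views import OllamaDetailManager  # view referenced earlier in this PR

urlpatterns = [
    # <str:model_name> accepts any non-slash segment, so tagged names like
    # "llama3:8b" survive once the client encodes them with encodeURIComponent
    path('api/tool/ollama/<str:model_name>/', OllamaDetailManager.as_view()),
]
```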
Summary
Fixes #232
Enhance the LLM model selection process and attack surface analysis by introducing a new modal for model selection and implementing a new API endpoint. Refactor existing functions to support these changes and improve error handling. Update tests to reflect the new functionalities and remove outdated tests.
New Features:
Introduce a new modal for selecting LLM models for attack surface analysis, allowing users to choose from a list of available models.
Implement a new API endpoint to fetch available LLM models and the currently selected model.
Display a recommended list of models in the LLM Toolkit
Add a download progress bar during model download
Enhancements:
Tests:
Todo