External data loader class #155
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
    "status": "error",
    "message": safe_msg
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Copilot Autofix
AI 7 months ago
To fix the issue, we need to ensure that no sensitive information is exposed to the user, even if an error message matches one of the predefined patterns. The best approach is to replace all error messages returned to the user with generic, non-sensitive messages, while logging the full error details on the server for debugging purposes. This ensures that sensitive information, such as stack traces or internal server details, is not leaked to the client.
Specifically:
- Modify the sanitize_db_error_message function to return only generic error messages to the user, regardless of the error type.
- Log the full error message on the server for debugging purposes.
- Update the analyze_table, data_loader_list_data_loaders, and data_loader_list_tables functions to use the updated sanitization logic.
@@ -690,32 +690,7 @@

-    # Define patterns for known safe errors
-    safe_error_patterns = {
-        # Database table errors
-        r"Table.*does not exist": (error_msg, 404),
-        r"Table.*already exists": (error_msg, 409),
-        # Query errors
-        r"syntax error": (error_msg, 400),
-        r"Catalog Error": (error_msg, 404),
-        r"Binder Error": (error_msg, 400),
-        r"Invalid input syntax": (error_msg, 400),
-
-        # File errors
-        r"No such file": (error_msg, 404),
-        r"Permission denied": ("Access denied", 403),
-
-        # Data loader errors
-        r"Entity ID": (error_msg, 500),
-        r"session_id": ("session_id not found, please refresh the page", 500),
-    }
-
-    # Check if error matches any safe pattern
-    for pattern, (safe_msg, status_code) in safe_error_patterns.items():
-        if re.search(pattern, error_msg, re.IGNORECASE):
-            return safe_msg, status_code
-
     # Log the full error for debugging
-    logger.error(f"Unexpected error occurred: {error_msg}")
+    logger.error(f"Error occurred: {error_msg}")

-    # Return a generic error message for unknown errors
-    return "An unexpected error occurred", 500
+    # Return a generic error message for all errors
+    return "An internal error has occurred. Please contact support if the issue persists.", 500

@@ -739,4 +714,4 @@
             "status": "error",
-            "message": safe_msg
-        }), status_code
+            "message": "An internal error has occurred. Please contact support if the issue persists."
+        }), 500

@@ -769,4 +744,4 @@
             "status": "error",
-            "message": safe_msg
-        }), status_code
+            "message": "An internal error has occurred. Please contact support if the issue persists."
+        }), 500
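The suggestion above boils down to one rule: the raw exception text goes to the server log, and the client always gets the same fixed string. A minimal sketch of that rule follows, using only the standard library. The logger name and the GENERIC_MESSAGE constant are assumptions made for illustration; the function name and message text are taken from the diff.

```python
import logging

# Hypothetical logger name; the project's real logger setup is not shown in this thread.
logger = logging.getLogger("data_loader_errors")

GENERIC_MESSAGE = (
    "An internal error has occurred. Please contact support if the issue persists."
)


def sanitize_db_error_message(error: Exception) -> tuple[str, int]:
    """Log the full error server-side and return a fixed client-facing message."""
    # The raw exception text is written to the server log only.
    logger.error("Error occurred: %s", error)
    # The client always receives the same message and status code.
    return GENERIC_MESSAGE, 500
```

The trade-off is that every failure becomes a 500, so the client can no longer tell a missing table (404) from a malformed query (400).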
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
    "status": "error",
    "message": safe_msg
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Copilot Autofix
AI 7 months ago
To fix the issue, we will modify the sanitize_db_error_message function to ensure that no sensitive information is exposed to the user. Instead of returning error messages derived from the exception, we will always return a generic error message to the user. The detailed error message and stack trace will be logged on the server for debugging purposes. This approach ensures that no sensitive information is leaked while still allowing developers to diagnose issues.
Changes to be made:
- Update the sanitize_db_error_message function to always return a generic error message to the user, regardless of the exception.
- Log the full error message and stack trace on the server for debugging purposes.
@@ -687,34 +687,6 @@
     """
-    # Convert error to string
-    error_msg = str(error)
-
-    # Define patterns for known safe errors
-    safe_error_patterns = {
-        # Database table errors
-        r"Table.*does not exist": (error_msg, 404),
-        r"Table.*already exists": (error_msg, 409),
-        # Query errors
-        r"syntax error": (error_msg, 400),
-        r"Catalog Error": (error_msg, 404),
-        r"Binder Error": (error_msg, 400),
-        r"Invalid input syntax": (error_msg, 400),
-
-        # File errors
-        r"No such file": (error_msg, 404),
-        r"Permission denied": ("Access denied", 403),
-
-        # Data loader errors
-        r"Entity ID": (error_msg, 500),
-        r"session_id": ("session_id not found, please refresh the page", 500),
-    }
-
-    # Check if error matches any safe pattern
-    for pattern, (safe_msg, status_code) in safe_error_patterns.items():
-        if re.search(pattern, error_msg, re.IGNORECASE):
-            return safe_msg, status_code
-
     # Log the full error for debugging
-    logger.error(f"Unexpected error occurred: {error_msg}")
+    logger.error(f"Error occurred: {str(error)}", exc_info=True)

-    # Return a generic error message for unknown errors
+    # Always return a generic error message to the user
     return "An unexpected error occurred", 500
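This variant adds exc_info=True to the logging call. In the standard logging module, that flag attaches the exception currently being handled to the log record, so the traceback is only captured when the call runs inside an except block. The sketch below illustrates this with an in-memory log stream; risky_query, handle_request, and the logger name are hypothetical stand-ins, not names from the project.

```python
import io
import logging

# Hypothetical logger wired to an in-memory stream so the output can be inspected.
log_stream = io.StringIO()
logger = logging.getLogger("exc_info_demo")
logger.setLevel(logging.ERROR)
logger.addHandler(logging.StreamHandler(log_stream))
logger.propagate = False


def risky_query() -> None:
    """Stand-in for a database call that fails with an internal detail."""
    raise ValueError("Table 'secret_table' does not exist")


def handle_request() -> tuple[str, int]:
    try:
        risky_query()
    except Exception as error:
        # exc_info=True appends the active traceback to the log record.
        logger.error("Error occurred: %s", error, exc_info=True)
        # Only the generic text leaves the server.
        return "An unexpected error occurred", 500
    return "ok", 200
```

After handle_request() runs, log_stream.getvalue() contains the error line followed by the full traceback, while the returned message carries no detail from the exception.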
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
    "status": "error",
    "message": safe_msg
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Copilot Autofix
AI 7 months ago
To address the issue, we will modify the sanitize_db_error_message function to ensure that no sensitive information is exposed to the user, even for known error patterns. Instead of returning the original error message for matched patterns, we will return predefined, generic messages for each pattern. This approach eliminates the risk of exposing sensitive details while still providing meaningful feedback to the user.
Additionally, we will ensure that all error messages are logged on the server for debugging purposes, but only sanitized messages are sent to the client.
@@ -693,17 +693,17 @@
         # Database table errors
-        r"Table.*does not exist": (error_msg, 404),
-        r"Table.*already exists": (error_msg, 409),
+        r"Table.*does not exist": ("The specified table does not exist.", 404),
+        r"Table.*already exists": ("The table already exists.", 409),
         # Query errors
-        r"syntax error": (error_msg, 400),
-        r"Catalog Error": (error_msg, 404),
-        r"Binder Error": (error_msg, 400),
-        r"Invalid input syntax": (error_msg, 400),
+        r"syntax error": ("There was a syntax error in the query.", 400),
+        r"Catalog Error": ("The requested catalog item was not found.", 404),
+        r"Binder Error": ("There was an error binding the query.", 400),
+        r"Invalid input syntax": ("The input syntax is invalid.", 400),

         # File errors
-        r"No such file": (error_msg, 404),
-        r"Permission denied": ("Access denied", 403),
+        r"No such file": ("The specified file was not found.", 404),
+        r"Permission denied": ("Access denied.", 403),

         # Data loader errors
-        r"Entity ID": (error_msg, 500),
-        r"session_id": ("session_id not found, please refresh the page", 500),
+        r"Entity ID": ("An error occurred with the entity ID.", 500),
+        r"session_id": ("Session ID not found. Please refresh the page.", 500),
     }
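Unlike the all-generic variants, this suggestion keeps the pattern table and swaps each raw error_msg for a fixed string, so the client still learns the error category and status code. A self-contained sketch of that lookup follows. The pattern table is a subset of the one in the diff; the logger name and the upper-case constant name are assumptions.

```python
import logging
import re

# Hypothetical logger name; the project's real logger setup is not shown in this thread.
logger = logging.getLogger("data_loader_errors")

# Pattern -> (fixed client message, HTTP status). Subset of the table in the diff.
SAFE_ERROR_PATTERNS = {
    r"Table.*does not exist": ("The specified table does not exist.", 404),
    r"Table.*already exists": ("The table already exists.", 409),
    r"syntax error": ("There was a syntax error in the query.", 400),
    r"Permission denied": ("Access denied.", 403),
}


def sanitize_db_error_message(error: Exception) -> tuple[str, int]:
    """Map a raw error to a predefined message; never echo the raw text."""
    error_msg = str(error)
    # Keep the raw text in the server log only.
    logger.error("Error occurred: %s", error_msg)
    for pattern, (safe_msg, status_code) in SAFE_ERROR_PATTERNS.items():
        if re.search(pattern, error_msg, re.IGNORECASE):
            return safe_msg, status_code
    # Fallback for anything that matches no known pattern.
    return "An unexpected error occurred", 500
```

Because the returned strings are constants, nothing from the exception (table names, file paths, SQL fragments) can reach the response body, which is the property the CodeQL alert asks for.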
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
    "status": "error",
    "message": safe_msg
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Copilot Autofix
AI 7 months ago
To fix the issue, we will ensure that no sensitive information is exposed to the client. This involves modifying the sanitize_db_error_message function to avoid returning the raw error_msg to the client, even for known safe patterns. Instead, we will return predefined, generic messages for all error cases. Additionally, we will ensure that detailed error information is logged securely on the server for debugging purposes.
@@ -690,20 +690,20 @@

-    # Define patterns for known safe errors
+    # Define patterns for known safe errors with predefined generic messages
     safe_error_patterns = {
         # Database table errors
-        r"Table.*does not exist": (error_msg, 404),
-        r"Table.*already exists": (error_msg, 409),
+        r"Table.*does not exist": ("The requested table does not exist.", 404),
+        r"Table.*already exists": ("The table already exists.", 409),
         # Query errors
-        r"syntax error": (error_msg, 400),
-        r"Catalog Error": (error_msg, 404),
-        r"Binder Error": (error_msg, 400),
-        r"Invalid input syntax": (error_msg, 400),
+        r"syntax error": ("There was a syntax error in the query.", 400),
+        r"Catalog Error": ("The requested catalog entry was not found.", 404),
+        r"Binder Error": ("There was an error binding the query.", 400),
+        r"Invalid input syntax": ("The input syntax is invalid.", 400),

         # File errors
-        r"No such file": (error_msg, 404),
-        r"Permission denied": ("Access denied", 403),
+        r"No such file": ("The requested file was not found.", 404),
+        r"Permission denied": ("Access to the requested resource is denied.", 403),

         # Data loader errors
-        r"Entity ID": (error_msg, 500),
-        r"session_id": ("session_id not found, please refresh the page", 500),
+        r"Entity ID": ("An error occurred with the entity ID.", 500),
+        r"session_id": ("Session ID not found. Please refresh the page.", 500),
     }
@@ -713,2 +713,3 @@
         if re.search(pattern, error_msg, re.IGNORECASE):
+            logger.error(f"Sanitized error occurred: {safe_msg} (Original: {error_msg})")
             return safe_msg, status_code
@@ -719,3 +720,3 @@
     # Return a generic error message for unknown errors
-    return "An unexpected error occurred", 500
+    return "An unexpected error occurred. Please contact support.", 500
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
    "status": "error",
    "message": safe_msg
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Copilot Autofix
AI 7 months ago
To fix the issue, we will ensure that no sensitive information is exposed to the user, even for errors that match predefined patterns. Instead of returning the original error message (error_msg) for matched patterns, we will return a generic, user-friendly message. The detailed error information will still be logged on the server for debugging purposes. This approach ensures that sensitive information is not leaked while maintaining developer access to diagnostic details.
@@ -693,17 +693,17 @@
         # Database table errors
-        r"Table.*does not exist": (error_msg, 404),
-        r"Table.*already exists": (error_msg, 409),
+        r"Table.*does not exist": ("The specified table does not exist.", 404),
+        r"Table.*already exists": ("The specified table already exists.", 409),
         # Query errors
-        r"syntax error": (error_msg, 400),
-        r"Catalog Error": (error_msg, 404),
-        r"Binder Error": (error_msg, 400),
-        r"Invalid input syntax": (error_msg, 400),
+        r"syntax error": ("There was a syntax error in the query.", 400),
+        r"Catalog Error": ("The requested catalog item was not found.", 404),
+        r"Binder Error": ("There was an error binding the query.", 400),
+        r"Invalid input syntax": ("The input syntax is invalid.", 400),

         # File errors
-        r"No such file": (error_msg, 404),
-        r"Permission denied": ("Access denied", 403),
+        r"No such file": ("The specified file was not found.", 404),
+        r"Permission denied": ("Access denied.", 403),

         # Data loader errors
-        r"Entity ID": (error_msg, 500),
-        r"session_id": ("session_id not found, please refresh the page", 500),
+        r"Entity ID": ("An error occurred with the entity ID.", 500),
+        r"session_id": ("Session ID not found. Please refresh the page.", 500),
     }
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
    "status": "error",
    "message": safe_msg
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Copilot Autofix
AI 7 months ago
To fix the issue, we need to ensure that no sensitive information is exposed to the client, even if an error matches one of the predefined patterns. Instead of returning the matched error message (safe_msg) directly, we should return a generic error message for all cases. The detailed error information should only be logged on the server for debugging purposes. This approach ensures that no internal details are leaked to the client.
@@ -713,3 +713,4 @@
         if re.search(pattern, error_msg, re.IGNORECASE):
-            return safe_msg, status_code
+            logger.error(f"Sanitized error occurred: {safe_msg}")
+            return "An error occurred while processing your request.", status_code

@@ -719,3 +720,3 @@
     # Return a generic error message for unknown errors
-    return "An unexpected error occurred", 500
+    return "An error occurred while processing your request.", 500
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
    "status": "error",
    "sample": [],
    "message": safe_msg
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Copilot Autofix
AI 7 months ago
To fix the issue, we will modify the sanitize_db_error_message function to ensure that no sensitive information is exposed to the user, even for errors that match the predefined patterns. Instead of returning the original error message (error_msg) for "safe" errors, we will return a generic message that describes the error type without revealing internal details. This ensures that all error messages sent to the client are sanitized and generic, while detailed error information is logged on the server for debugging purposes.
@@ -693,17 +693,17 @@
         # Database table errors
-        r"Table.*does not exist": (error_msg, 404),
-        r"Table.*already exists": (error_msg, 409),
+        r"Table.*does not exist": ("The specified table does not exist.", 404),
+        r"Table.*already exists": ("The specified table already exists.", 409),
         # Query errors
-        r"syntax error": (error_msg, 400),
-        r"Catalog Error": (error_msg, 404),
-        r"Binder Error": (error_msg, 400),
-        r"Invalid input syntax": (error_msg, 400),
+        r"syntax error": ("There was a syntax error in the query.", 400),
+        r"Catalog Error": ("The requested catalog item was not found.", 404),
+        r"Binder Error": ("There was an error binding the query.", 400),
+        r"Invalid input syntax": ("The input syntax is invalid.", 400),

         # File errors
-        r"No such file": (error_msg, 404),
-        r"Permission denied": ("Access denied", 403),
+        r"No such file": ("The specified file was not found.", 404),
+        r"Permission denied": ("Access denied.", 403),

         # Data loader errors
-        r"Entity ID": (error_msg, 500),
-        r"session_id": ("session_id not found, please refresh the page", 500),
+        r"Entity ID": ("An error occurred with the entity ID.", 500),
+        r"session_id": ("Session ID not found. Please refresh the page.", 500),
     }
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
    "status": "error",
    "sample": [],
    "message": safe_msg
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Copilot Autofix
AI 7 months ago
To address the issue, we will modify the sanitize_db_error_message function to ensure that no raw error messages are returned to the user, even if they match a pattern in safe_error_patterns. Instead, we will return predefined, generic messages for each pattern. This approach eliminates the risk of exposing sensitive information while still providing meaningful feedback to the user.
Steps to fix:
- Update the sanitize_db_error_message function to replace error_msg with predefined, safe messages for each pattern in safe_error_patterns.
- Ensure that the fallback mechanism for unknown errors remains intact, returning a generic error message ("An unexpected error occurred").
- Log the full error message on the server for debugging purposes, but do not expose it to the user.
@@ -693,17 +693,17 @@
         # Database table errors
-        r"Table.*does not exist": (error_msg, 404),
-        r"Table.*already exists": (error_msg, 409),
+        r"Table.*does not exist": ("The specified table does not exist.", 404),
+        r"Table.*already exists": ("The table already exists.", 409),
         # Query errors
-        r"syntax error": (error_msg, 400),
-        r"Catalog Error": (error_msg, 404),
-        r"Binder Error": (error_msg, 400),
-        r"Invalid input syntax": (error_msg, 400),
+        r"syntax error": ("There is a syntax error in the query.", 400),
+        r"Catalog Error": ("The requested catalog item was not found.", 404),
+        r"Binder Error": ("There was an error binding the query.", 400),
+        r"Invalid input syntax": ("The input syntax is invalid.", 400),

         # File errors
-        r"No such file": (error_msg, 404),
-        r"Permission denied": ("Access denied", 403),
+        r"No such file": ("The specified file was not found.", 404),
+        r"Permission denied": ("Access denied.", 403),

         # Data loader errors
-        r"Entity ID": (error_msg, 500),
-        r"session_id": ("session_id not found, please refresh the page", 500),
+        r"Entity ID": ("An error occurred with the entity ID.", 500),
+        r"session_id": ("Session ID not found. Please refresh the page.", 500),
     }
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
    "status": "error",
    "message": safe_msg
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Copilot Autofix
AI 7 months ago
To address the issue, we will modify the sanitize_db_error_message function to ensure that no part of the original error message is exposed to the client. Instead of returning the original error message for known patterns, we will return predefined, generic messages for each pattern. For unknown errors, we will continue to return a generic error message. This approach ensures that sensitive information is never exposed to the client, regardless of the error type.
Additionally, we will ensure that detailed error information is logged on the server for debugging purposes, but this information will not be sent to the client.
@@ -693,17 +693,17 @@
         # Database table errors
-        r"Table.*does not exist": (error_msg, 404),
-        r"Table.*already exists": (error_msg, 409),
+        r"Table.*does not exist": ("The specified table does not exist.", 404),
+        r"Table.*already exists": ("The table already exists.", 409),
         # Query errors
-        r"syntax error": (error_msg, 400),
-        r"Catalog Error": (error_msg, 404),
-        r"Binder Error": (error_msg, 400),
-        r"Invalid input syntax": (error_msg, 400),
+        r"syntax error": ("There was a syntax error in the query.", 400),
+        r"Catalog Error": ("The requested catalog item was not found.", 404),
+        r"Binder Error": ("There was an error binding the query.", 400),
+        r"Invalid input syntax": ("The input syntax is invalid.", 400),

         # File errors
-        r"No such file": (error_msg, 404),
-        r"Permission denied": ("Access denied", 403),
+        r"No such file": ("The specified file was not found.", 404),
+        r"Permission denied": ("Access denied.", 403),

         # Data loader errors
-        r"Entity ID": (error_msg, 500),
-        r"session_id": ("session_id not found, please refresh the page", 500),
+        r"Entity ID": ("An error occurred with the entity ID.", 500),
+        r"session_id": ("Session ID not found. Please refresh the page.", 500),
     }
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
    "status": "error",
    "message": safe_msg
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Copilot Autofix
AI 7 months ago
To fix the issue, we need to ensure that no sensitive information, such as stack traces or internal error details, is exposed to the user. The sanitize_db_error_message function should be modified to always return a generic error message to the client, regardless of whether the error matches a predefined pattern. The original error message can still be logged for debugging purposes. This approach guarantees that sensitive information is not leaked while maintaining the ability to debug issues internally.
@@ -711,5 +711,6 @@
     # Check if error matches any safe pattern
-    for pattern, (safe_msg, status_code) in safe_error_patterns.items():
+    for pattern, (_, status_code) in safe_error_patterns.items():
         if re.search(pattern, error_msg, re.IGNORECASE):
-            return safe_msg, status_code
+            logger.error(f"Matched error pattern: {pattern}. Original error: {error_msg}")
+            return "An error occurred while processing your request.", status_code

@@ -719,3 +720,3 @@
     # Return a generic error message for unknown errors
-    return "An unexpected error occurred", 500
+    return "An error occurred while processing your request.", 500
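This last variant sits between the two extremes: the pattern match is used only to pick an HTTP status code, and the message text is identical for every error. A sketch of that idea follows, with the table reduced to pattern-to-status pairs. The constant names and the logger name are assumptions; the message text and log format follow the diff.

```python
import logging
import re

# Hypothetical logger name; the project's real logger setup is not shown in this thread.
logger = logging.getLogger("data_loader_errors")

GENERIC_MESSAGE = "An error occurred while processing your request."

# Pattern -> HTTP status only (illustrative subset of the patterns in the diff).
STATUS_BY_PATTERN = {
    r"Table.*does not exist": 404,
    r"Table.*already exists": 409,
    r"syntax error": 400,
    r"Permission denied": 403,
}


def sanitize_db_error_message(error: Exception) -> tuple[str, int]:
    """Return one fixed message; use pattern matching only to choose the status."""
    error_msg = str(error)
    for pattern, status_code in STATUS_BY_PATTERN.items():
        if re.search(pattern, error_msg, re.IGNORECASE):
            # The matched pattern and raw text are recorded server-side only.
            logger.error(
                "Matched error pattern: %s. Original error: %s", pattern, error_msg
            )
            return GENERIC_MESSAGE, status_code
    logger.error("Unexpected error occurred: %s", error_msg)
    return GENERIC_MESSAGE, 500
```

A client can still branch on the status code (for example, retry on 500 but not on 400), while the response body stays free of exception text.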
@Chenglong-MS, a post-merge comment - it seems the
Good catch, will patch this up.
Introduces the external data loader class, which extends Data Formulator so that it can load data directly from other sources.
Feature:
Developers:
external_data_loader.mov