@Chenglong-MS
Collaborator

Introduce the external data loader class, which lets Data Formulator connect directly to data from other sources.

Features:

  • Connect to an external data source and load data tables directly into the local DuckDB for exploration.
  • Run a query against the data source and load the resulting view for analysis (AI completion of the query is supported).
  • Two example data loaders are provided: MySQL and Azure Data Explorer (Kusto).

Developers:

  • Extend the data loader class to support more data sources.
  • Contribute to the project by building new data loaders.
external_data_loader.mov
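
For developers considering a new loader, here is a minimal sketch of what extending such a class might look like. The actual base class introduced by this PR is not shown in this conversation, so `ExternalDataLoader`, `list_tables`, `fetch_rows`, and the toy `InMemoryLoader` are all hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class ExternalDataLoader(ABC):
    """Hypothetical base class; the real class in the PR may differ."""

    def __init__(self, params: Dict[str, Any]):
        # Connection parameters (host, credentials, ...) for the source.
        self.params = params

    @abstractmethod
    def list_tables(self) -> List[str]:
        """Return the names of tables available in the external source."""

    @abstractmethod
    def fetch_rows(self, query: str) -> List[Dict[str, Any]]:
        """Run a query against the source and return the resulting rows."""


class InMemoryLoader(ExternalDataLoader):
    """Toy loader backed by a dict, standing in for MySQL or Kusto."""

    def list_tables(self) -> List[str]:
        return sorted(self.params["tables"])

    def fetch_rows(self, query: str) -> List[Dict[str, Any]]:
        # In this toy example the "query" is simply a table name.
        return list(self.params["tables"][query])


loader = InMemoryLoader({"tables": {"sales": [{"region": "EU", "total": 10}]}})
print(loader.list_tables())
print(loader.fetch_rows("sales"))
```

A real loader would replace the dict lookup with a driver call for its source and hand the rows to the local DuckDB instance; that wiring is outside the scope of this sketch.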

Comment on lines +738 to +741
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
"status": "error",
"message": safe_msg

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI 7 months ago

To fix the issue, we need to ensure that no sensitive information is exposed to the user, even if an error message matches one of the predefined patterns. The best approach is to replace all error messages returned to the user with generic, non-sensitive messages, while logging the full error details on the server for debugging purposes. This ensures that sensitive information, such as stack traces or internal server details, is not leaked to the client.

Specifically:

  1. Modify the sanitize_db_error_message function to return only generic error messages to the user, regardless of the error type.
  2. Log the full error message on the server for debugging purposes.
  3. Update the analyze_table, data_loader_list_data_loaders, and data_loader_list_tables functions to use the updated sanitization logic.
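
As a standalone illustration of steps 1 and 2, the sketch below shows one way the function could behave once every error maps to the same client-facing message. It is a simplified stand-in rather than the project's code: the logger name and the `GENERIC_MESSAGE` constant are assumptions, and the example error text is made up.

```python
import logging

logger = logging.getLogger("tables_routes_sketch")  # assumed logger name

GENERIC_MESSAGE = (
    "An internal error has occurred. "
    "Please contact support if the issue persists."
)


def sanitize_db_error_message(error):
    """Log full details server-side; return only a generic message and 500."""
    logger.error("Error occurred: %s", error)
    return GENERIC_MESSAGE, 500


try:
    # Made-up error text containing details a client should not see.
    raise RuntimeError("Table secret_customers does not exist in /srv/db/prod.duckdb")
except RuntimeError as exc:
    safe_msg, status_code = sanitize_db_error_message(exc)

print(safe_msg, status_code)
```

The trade-off is that the client can no longer tell a missing table from a syntax error, since every failure becomes a 500 with the same text.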

Suggested changeset 1
py-src/data_formulator/tables_routes.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/py-src/data_formulator/tables_routes.py b/py-src/data_formulator/tables_routes.py
--- a/py-src/data_formulator/tables_routes.py
+++ b/py-src/data_formulator/tables_routes.py
@@ -690,32 +690,7 @@
     
-    # Define patterns for known safe errors
-    safe_error_patterns = {
-        # Database table errors
-        r"Table.*does not exist": (error_msg, 404),
-        r"Table.*already exists": (error_msg, 409),
-        # Query errors
-        r"syntax error": (error_msg, 400),
-        r"Catalog Error": (error_msg, 404), 
-        r"Binder Error": (error_msg, 400),
-        r"Invalid input syntax": (error_msg, 400),
-        
-        # File errors
-        r"No such file": (error_msg, 404),
-        r"Permission denied": ("Access denied", 403),
-
-        # Data loader errors
-        r"Entity ID": (error_msg, 500),
-        r"session_id": ("session_id not found, please refresh the page", 500),
-    }
-    
-    # Check if error matches any safe pattern
-    for pattern, (safe_msg, status_code) in safe_error_patterns.items():
-        if re.search(pattern, error_msg, re.IGNORECASE):
-            return safe_msg, status_code
-            
     # Log the full error for debugging
-    logger.error(f"Unexpected error occurred: {error_msg}")
+    logger.error(f"Error occurred: {error_msg}")
     
-    # Return a generic error message for unknown errors
-    return "An unexpected error occurred", 500
+    # Return a generic error message for all errors
+    return "An internal error has occurred. Please contact support if the issue persists.", 500
 
@@ -739,4 +714,4 @@
             "status": "error", 
-            "message": safe_msg
-        }), status_code
+            "message": "An internal error has occurred. Please contact support if the issue persists."
+        }), 500
 
@@ -769,4 +744,4 @@
             "status": "error", 
-            "message": safe_msg
-        }), status_code
+            "message": "An internal error has occurred. Please contact support if the issue persists."
+        }), 500
 
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
Comment on lines +738 to +741
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
"status": "error",
"message": safe_msg

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI 7 months ago

To fix the issue, we will modify the sanitize_db_error_message function to ensure that no sensitive information is exposed to the user. Instead of returning error messages derived from the exception, we will always return a generic error message to the user. The detailed error message and stack trace will be logged on the server for debugging purposes. This approach ensures that no sensitive information is leaked while still allowing developers to diagnose issues.

Changes to be made:

  1. Update the sanitize_db_error_message function to always return a generic error message to the user, regardless of the exception.
  2. Log the full error message and stack trace on the server for debugging purposes.
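
To make step 2 concrete, here is a small sketch of how `exc_info=True` attaches the stack trace to the server log while the caller still receives only the generic message. The in-memory `log_stream` and the logger name are assumptions made so the example can run on its own.

```python
import io
import logging

# In-memory stream standing in for the server log (assumption for the demo).
log_stream = io.StringIO()
logger = logging.getLogger("exc_info_sketch")
logger.addHandler(logging.StreamHandler(log_stream))
logger.propagate = False  # keep the demo output out of the root logger


def sanitize_db_error_message(error):
    """Log the message plus traceback; return only a generic message."""
    # exc_info=True attaches the exception currently being handled,
    # so this function must be called from inside an `except` block.
    logger.error("Error occurred: %s", error, exc_info=True)
    return "An unexpected error occurred", 500


try:
    {}["missing_table"]
except KeyError as exc:
    safe_msg, status_code = sanitize_db_error_message(exc)

server_log = log_stream.getvalue()
print(safe_msg, status_code)
print("Traceback" in server_log)
```

Because `exc_info=True` reads the exception that is currently being handled, calling the function outside an `except` block would log the message without a useful traceback.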

Suggested changeset 1
py-src/data_formulator/tables_routes.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/py-src/data_formulator/tables_routes.py b/py-src/data_formulator/tables_routes.py
--- a/py-src/data_formulator/tables_routes.py
+++ b/py-src/data_formulator/tables_routes.py
@@ -687,34 +687,6 @@
     """
-    # Convert error to string
-    error_msg = str(error)
-    
-    # Define patterns for known safe errors
-    safe_error_patterns = {
-        # Database table errors
-        r"Table.*does not exist": (error_msg, 404),
-        r"Table.*already exists": (error_msg, 409),
-        # Query errors
-        r"syntax error": (error_msg, 400),
-        r"Catalog Error": (error_msg, 404), 
-        r"Binder Error": (error_msg, 400),
-        r"Invalid input syntax": (error_msg, 400),
-        
-        # File errors
-        r"No such file": (error_msg, 404),
-        r"Permission denied": ("Access denied", 403),
-
-        # Data loader errors
-        r"Entity ID": (error_msg, 500),
-        r"session_id": ("session_id not found, please refresh the page", 500),
-    }
-    
-    # Check if error matches any safe pattern
-    for pattern, (safe_msg, status_code) in safe_error_patterns.items():
-        if re.search(pattern, error_msg, re.IGNORECASE):
-            return safe_msg, status_code
-            
     # Log the full error for debugging
-    logger.error(f"Unexpected error occurred: {error_msg}")
+    logger.error(f"Error occurred: {str(error)}", exc_info=True)
     
-    # Return a generic error message for unknown errors
+    # Always return a generic error message to the user
     return "An unexpected error occurred", 500
EOF
Unable to commit as this autofix suggestion is now outdated
Comment on lines +768 to +771
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
"status": "error",
"message": safe_msg

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI 7 months ago

To address the issue, we will modify the sanitize_db_error_message function to ensure that no sensitive information is exposed to the user, even for known error patterns. Instead of returning the original error message for matched patterns, we will return predefined, generic messages for each pattern. This approach eliminates the risk of exposing sensitive details while still providing meaningful feedback to the user.

Additionally, we will ensure that all error messages are logged on the server for debugging purposes, but only sanitized messages are sent to the client.
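
The pattern-based approach described here can be sketched in a self-contained form as follows. The pattern table is abbreviated, the logger name is an assumption, and the two sample error strings are invented for the demonstration.

```python
import logging
import re

logger = logging.getLogger("pattern_sketch")  # assumed logger name

# Each pattern maps to a predefined message, never to the raw error text.
SAFE_ERROR_PATTERNS = {
    r"Table.*does not exist": ("The specified table does not exist.", 404),
    r"Table.*already exists": ("The table already exists.", 409),
    r"syntax error": ("There was a syntax error in the query.", 400),
    r"Permission denied": ("Access denied.", 403),
}


def sanitize_db_error_message(error):
    error_msg = str(error)
    # The full message stays on the server.
    logger.error("Error occurred: %s", error_msg)
    for pattern, (safe_msg, status_code) in SAFE_ERROR_PATTERNS.items():
        if re.search(pattern, error_msg, re.IGNORECASE):
            return safe_msg, status_code
    # Fallback for anything that matches no known pattern.
    return "An unexpected error occurred", 500


known = sanitize_db_error_message(
    Exception("Catalog Error: Table with name orders_2024 does not exist!")
)
unknown = sanitize_db_error_message(Exception("connection reset by peer"))
print(known)
print(unknown)
```

Compared with a single generic message, this keeps the status codes and a short hint about the failure class, while the table name from the original error never reaches the client.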


Suggested changeset 1
py-src/data_formulator/tables_routes.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/py-src/data_formulator/tables_routes.py b/py-src/data_formulator/tables_routes.py
--- a/py-src/data_formulator/tables_routes.py
+++ b/py-src/data_formulator/tables_routes.py
@@ -693,17 +693,17 @@
         # Database table errors
-        r"Table.*does not exist": (error_msg, 404),
-        r"Table.*already exists": (error_msg, 409),
+        r"Table.*does not exist": ("The specified table does not exist.", 404),
+        r"Table.*already exists": ("The table already exists.", 409),
         # Query errors
-        r"syntax error": (error_msg, 400),
-        r"Catalog Error": (error_msg, 404), 
-        r"Binder Error": (error_msg, 400),
-        r"Invalid input syntax": (error_msg, 400),
+        r"syntax error": ("There was a syntax error in the query.", 400),
+        r"Catalog Error": ("The requested catalog item was not found.", 404), 
+        r"Binder Error": ("There was an error binding the query.", 400),
+        r"Invalid input syntax": ("The input syntax is invalid.", 400),
         
         # File errors
-        r"No such file": (error_msg, 404),
-        r"Permission denied": ("Access denied", 403),
+        r"No such file": ("The specified file was not found.", 404),
+        r"Permission denied": ("Access denied.", 403),
 
         # Data loader errors
-        r"Entity ID": (error_msg, 500),
-        r"session_id": ("session_id not found, please refresh the page", 500),
+        r"Entity ID": ("An error occurred with the entity ID.", 500),
+        r"session_id": ("Session ID not found. Please refresh the page.", 500),
     }
EOF
Unable to commit as this autofix suggestion is now outdated
Comment on lines +768 to +771
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
"status": "error",
"message": safe_msg

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI 7 months ago

To fix the issue, we will ensure that no sensitive information is exposed to the client. This involves modifying the sanitize_db_error_message function to avoid returning the raw error_msg to the client, even for known safe patterns. Instead, we will return predefined, generic messages for all error cases. Additionally, we will ensure that detailed error information is logged securely on the server for debugging purposes.

Suggested changeset 1
py-src/data_formulator/tables_routes.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/py-src/data_formulator/tables_routes.py b/py-src/data_formulator/tables_routes.py
--- a/py-src/data_formulator/tables_routes.py
+++ b/py-src/data_formulator/tables_routes.py
@@ -690,20 +690,20 @@
     
-    # Define patterns for known safe errors
+    # Define patterns for known safe errors with predefined generic messages
     safe_error_patterns = {
         # Database table errors
-        r"Table.*does not exist": (error_msg, 404),
-        r"Table.*already exists": (error_msg, 409),
+        r"Table.*does not exist": ("The requested table does not exist.", 404),
+        r"Table.*already exists": ("The table already exists.", 409),
         # Query errors
-        r"syntax error": (error_msg, 400),
-        r"Catalog Error": (error_msg, 404), 
-        r"Binder Error": (error_msg, 400),
-        r"Invalid input syntax": (error_msg, 400),
+        r"syntax error": ("There was a syntax error in the query.", 400),
+        r"Catalog Error": ("The requested catalog entry was not found.", 404), 
+        r"Binder Error": ("There was an error binding the query.", 400),
+        r"Invalid input syntax": ("The input syntax is invalid.", 400),
         
         # File errors
-        r"No such file": (error_msg, 404),
-        r"Permission denied": ("Access denied", 403),
+        r"No such file": ("The requested file was not found.", 404),
+        r"Permission denied": ("Access to the requested resource is denied.", 403),
 
         # Data loader errors
-        r"Entity ID": (error_msg, 500),
-        r"session_id": ("session_id not found, please refresh the page", 500),
+        r"Entity ID": ("An error occurred with the entity ID.", 500),
+        r"session_id": ("Session ID not found. Please refresh the page.", 500),
     }
@@ -713,2 +713,3 @@
         if re.search(pattern, error_msg, re.IGNORECASE):
+            logger.error(f"Sanitized error occurred: {safe_msg} (Original: {error_msg})")
             return safe_msg, status_code
@@ -719,3 +720,3 @@
     # Return a generic error message for unknown errors
-    return "An unexpected error occurred", 500
+    return "An unexpected error occurred. Please contact support.", 500
 
EOF
Unable to commit as this autofix suggestion is now outdated
Comment on lines +799 to +802
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
"status": "error",
"message": safe_msg

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI 7 months ago

To fix the issue, we will ensure that no sensitive information is exposed to the user, even for errors that match predefined patterns. Instead of returning the original error message (error_msg) for matched patterns, we will return a generic, user-friendly message. The detailed error information will still be logged on the server for debugging purposes. This approach ensures that sensitive information is not leaked while maintaining developer access to diagnostic details.

Suggested changeset 1
py-src/data_formulator/tables_routes.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/py-src/data_formulator/tables_routes.py b/py-src/data_formulator/tables_routes.py
--- a/py-src/data_formulator/tables_routes.py
+++ b/py-src/data_formulator/tables_routes.py
@@ -693,17 +693,17 @@
         # Database table errors
-        r"Table.*does not exist": (error_msg, 404),
-        r"Table.*already exists": (error_msg, 409),
+        r"Table.*does not exist": ("The specified table does not exist.", 404),
+        r"Table.*already exists": ("The specified table already exists.", 409),
         # Query errors
-        r"syntax error": (error_msg, 400),
-        r"Catalog Error": (error_msg, 404), 
-        r"Binder Error": (error_msg, 400),
-        r"Invalid input syntax": (error_msg, 400),
+        r"syntax error": ("There was a syntax error in the query.", 400),
+        r"Catalog Error": ("The requested catalog item was not found.", 404), 
+        r"Binder Error": ("There was an error binding the query.", 400),
+        r"Invalid input syntax": ("The input syntax is invalid.", 400),
         
         # File errors
-        r"No such file": (error_msg, 404),
-        r"Permission denied": ("Access denied", 403),
+        r"No such file": ("The specified file was not found.", 404),
+        r"Permission denied": ("Access denied.", 403),
 
         # Data loader errors
-        r"Entity ID": (error_msg, 500),
-        r"session_id": ("session_id not found, please refresh the page", 500),
+        r"Entity ID": ("An error occurred with the entity ID.", 500),
+        r"session_id": ("Session ID not found. Please refresh the page.", 500),
     }
EOF
Unable to commit as this autofix suggestion is now outdated
Comment on lines +799 to +802
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
"status": "error",
"message": safe_msg

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI 7 months ago

To fix the issue, we need to ensure that no sensitive information is exposed to the client, even if an error matches one of the predefined patterns. Instead of returning the matched error message (safe_msg) directly, we should return a generic error message for all cases. The detailed error information should only be logged on the server for debugging purposes. This approach ensures that no internal details are leaked to the client.
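
A variant of this idea, sketched below, keeps one generic message for every case but still lets the matched pattern pick the HTTP status code. The names `GENERIC_MESSAGE`, `STATUS_BY_PATTERN`, and `error_response` are hypothetical, a plain dict stands in for the `jsonify` payload, and the sketch logs the original error text, which is an assumption about what the server log should contain.

```python
import logging
import re

logger = logging.getLogger("generic_with_status_sketch")  # assumed name

GENERIC_MESSAGE = "An error occurred while processing your request."

# Abbreviated pattern table: only the status code varies per pattern.
STATUS_BY_PATTERN = {
    r"Table.*does not exist": 404,
    r"Table.*already exists": 409,
    r"syntax error": 400,
    r"Permission denied": 403,
}


def sanitize_db_error_message(error):
    error_msg = str(error)
    logger.error("Error occurred: %s", error_msg)
    for pattern, status_code in STATUS_BY_PATTERN.items():
        if re.search(pattern, error_msg, re.IGNORECASE):
            return GENERIC_MESSAGE, status_code
    return GENERIC_MESSAGE, 500


def error_response(error):
    """Build the (body, status) pair a route would pass on to jsonify."""
    safe_msg, status_code = sanitize_db_error_message(error)
    return {"status": "error", "message": safe_msg}, status_code


body, status = error_response(
    Exception("Parser Error: syntax error at or near 'SELEC'")
)
print(body, status)
```

With this shape the client can still branch on the status code, while the message text carries nothing derived from the exception.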

Suggested changeset 1
py-src/data_formulator/tables_routes.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/py-src/data_formulator/tables_routes.py b/py-src/data_formulator/tables_routes.py
--- a/py-src/data_formulator/tables_routes.py
+++ b/py-src/data_formulator/tables_routes.py
@@ -713,3 +713,4 @@
         if re.search(pattern, error_msg, re.IGNORECASE):
-            return safe_msg, status_code
+            logger.error(f"Sanitized error occurred: {safe_msg}")
+            return "An error occurred while processing your request.", status_code
             
@@ -719,3 +720,3 @@
     # Return a generic error message for unknown errors
-    return "An unexpected error occurred", 500
+    return "An error occurred while processing your request.", 500
 
EOF
Unable to commit as this autofix suggestion is now outdated
Comment on lines +830 to +834
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
"status": "error",
"sample": [],
"message": safe_msg

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI 7 months ago

To fix the issue, we will modify the sanitize_db_error_message function to ensure that no sensitive information is exposed to the user, even for errors that match the predefined patterns. Instead of returning the original error message (error_msg) for "safe" errors, we will return a generic message that describes the error type without revealing internal details. This ensures that all error messages sent to the client are sanitized and generic, while detailed error information is logged on the server for debugging purposes.

Suggested changeset 1
py-src/data_formulator/tables_routes.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/py-src/data_formulator/tables_routes.py b/py-src/data_formulator/tables_routes.py
--- a/py-src/data_formulator/tables_routes.py
+++ b/py-src/data_formulator/tables_routes.py
@@ -693,17 +693,17 @@
         # Database table errors
-        r"Table.*does not exist": (error_msg, 404),
-        r"Table.*already exists": (error_msg, 409),
+        r"Table.*does not exist": ("The specified table does not exist.", 404),
+        r"Table.*already exists": ("The specified table already exists.", 409),
         # Query errors
-        r"syntax error": (error_msg, 400),
-        r"Catalog Error": (error_msg, 404), 
-        r"Binder Error": (error_msg, 400),
-        r"Invalid input syntax": (error_msg, 400),
+        r"syntax error": ("There was a syntax error in the query.", 400),
+        r"Catalog Error": ("The requested catalog item was not found.", 404), 
+        r"Binder Error": ("There was an error binding the query.", 400),
+        r"Invalid input syntax": ("The input syntax is invalid.", 400),
         
         # File errors
-        r"No such file": (error_msg, 404),
-        r"Permission denied": ("Access denied", 403),
+        r"No such file": ("The specified file was not found.", 404),
+        r"Permission denied": ("Access denied.", 403),
 
         # Data loader errors
-        r"Entity ID": (error_msg, 500),
-        r"session_id": ("session_id not found, please refresh the page", 500),
+        r"Entity ID": ("An error occurred with the entity ID.", 500),
+        r"session_id": ("Session ID not found. Please refresh the page.", 500),
     }
EOF
Unable to commit as this autofix suggestion is now outdated
Comment on lines +830 to +834
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
"status": "error",
"sample": [],
"message": safe_msg

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI 7 months ago

To address the issue, we will modify the sanitize_db_error_message function to ensure that no raw error messages are returned to the user, even if they match a pattern in safe_error_patterns. Instead, we will return predefined, generic messages for each pattern. This approach eliminates the risk of exposing sensitive information while still providing meaningful feedback to the user.

Steps to fix:

  1. Update the sanitize_db_error_message function to replace error_msg with predefined, safe messages for each pattern in safe_error_patterns.
  2. Ensure that the fallback mechanism for unknown errors remains intact, returning a generic error message ("An unexpected error occurred").
  3. Log the full error message on the server for debugging purposes, but do not expose it to the user.

Suggested changeset 1
py-src/data_formulator/tables_routes.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/py-src/data_formulator/tables_routes.py b/py-src/data_formulator/tables_routes.py
--- a/py-src/data_formulator/tables_routes.py
+++ b/py-src/data_formulator/tables_routes.py
@@ -693,17 +693,17 @@
         # Database table errors
-        r"Table.*does not exist": (error_msg, 404),
-        r"Table.*already exists": (error_msg, 409),
+        r"Table.*does not exist": ("The specified table does not exist.", 404),
+        r"Table.*already exists": ("The table already exists.", 409),
         # Query errors
-        r"syntax error": (error_msg, 400),
-        r"Catalog Error": (error_msg, 404), 
-        r"Binder Error": (error_msg, 400),
-        r"Invalid input syntax": (error_msg, 400),
+        r"syntax error": ("There is a syntax error in the query.", 400),
+        r"Catalog Error": ("The requested catalog item was not found.", 404), 
+        r"Binder Error": ("There was an error binding the query.", 400),
+        r"Invalid input syntax": ("The input syntax is invalid.", 400),
         
         # File errors
-        r"No such file": (error_msg, 404),
-        r"Permission denied": ("Access denied", 403),
+        r"No such file": ("The specified file was not found.", 404),
+        r"Permission denied": ("Access denied.", 403),
 
         # Data loader errors
-        r"Entity ID": (error_msg, 500),
-        r"session_id": ("session_id not found, please refresh the page", 500),
+        r"Entity ID": ("An error occurred with the entity ID.", 500),
+        r"session_id": ("Session ID not found. Please refresh the page.", 500),
     }
EOF
Unable to commit as this autofix suggestion is now outdated
Comment on lines +863 to +866
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
"status": "error",
"message": safe_msg

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI 7 months ago

To address the issue, we will modify the sanitize_db_error_message function to ensure that no part of the original error message is exposed to the client. Instead of returning the original error message for known patterns, we will return predefined, generic messages for each pattern. For unknown errors, we will continue to return a generic error message. This approach ensures that sensitive information is never exposed to the client, regardless of the error type.

Additionally, we will ensure that detailed error information is logged on the server for debugging purposes, but this information will not be sent to the client.
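
A minimal sketch of the approach described here, using the patterns and predefined messages from this suggestion's patch. The page only shows diff hunks, so the function body around the table is an assumption pieced together from those hunks, and `SAFE_ERROR_PATTERNS` is a name chosen for this example.

```python
import re

# Patterns and messages as they appear in the suggested patch.
# Dict order matters: the first matching pattern wins.
SAFE_ERROR_PATTERNS = {
    # Database table errors
    r"Table.*does not exist": ("The specified table does not exist.", 404),
    r"Table.*already exists": ("The table already exists.", 409),
    # Query errors
    r"syntax error": ("There was a syntax error in the query.", 400),
    r"Catalog Error": ("The requested catalog item was not found.", 404),
    r"Binder Error": ("There was an error binding the query.", 400),
    r"Invalid input syntax": ("The input syntax is invalid.", 400),
    # File errors
    r"No such file": ("The specified file was not found.", 404),
    r"Permission denied": ("Access denied.", 403),
    # Data loader errors
    r"Entity ID": ("An error occurred with the entity ID.", 500),
    r"session_id": ("Session ID not found. Please refresh the page.", 500),
}


def sanitize_db_error_message(error: Exception) -> tuple[str, int]:
    """Map an exception to a predefined (message, status code) pair."""
    error_msg = str(error)
    for pattern, (safe_msg, status_code) in SAFE_ERROR_PATTERNS.items():
        if re.search(pattern, error_msg, re.IGNORECASE):
            # Only the predefined text is returned, never error_msg itself.
            return safe_msg, status_code
    # Unknown errors fall back to a generic message.
    return "An unexpected error occurred", 500
```

Because every value in the table is a fixed string, no part of the original exception text can reach the client through this function.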


Suggested changeset 1
py-src/data_formulator/tables_routes.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/py-src/data_formulator/tables_routes.py b/py-src/data_formulator/tables_routes.py
--- a/py-src/data_formulator/tables_routes.py
+++ b/py-src/data_formulator/tables_routes.py
@@ -693,17 +693,17 @@
         # Database table errors
-        r"Table.*does not exist": (error_msg, 404),
-        r"Table.*already exists": (error_msg, 409),
+        r"Table.*does not exist": ("The specified table does not exist.", 404),
+        r"Table.*already exists": ("The table already exists.", 409),
         # Query errors
-        r"syntax error": (error_msg, 400),
-        r"Catalog Error": (error_msg, 404), 
-        r"Binder Error": (error_msg, 400),
-        r"Invalid input syntax": (error_msg, 400),
+        r"syntax error": ("There was a syntax error in the query.", 400),
+        r"Catalog Error": ("The requested catalog item was not found.", 404), 
+        r"Binder Error": ("There was an error binding the query.", 400),
+        r"Invalid input syntax": ("The input syntax is invalid.", 400),
         
         # File errors
-        r"No such file": (error_msg, 404),
-        r"Permission denied": ("Access denied", 403),
+        r"No such file": ("The specified file was not found.", 404),
+        r"Permission denied": ("Access denied.", 403),
 
         # Data loader errors
-        r"Entity ID": (error_msg, 500),
-        r"session_id": ("session_id not found, please refresh the page", 500),
+        r"Entity ID": ("An error occurred with the entity ID.", 500),
+        r"session_id": ("Session ID not found. Please refresh the page.", 500),
     }
EOF
Unable to commit as this autofix suggestion is now outdated
Comment on lines +863 to +866
safe_msg, status_code = sanitize_db_error_message(e)
return jsonify({
"status": "error",
"message": safe_msg

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI 7 months ago

To fix the issue, we need to ensure that no sensitive information, such as stack traces or internal error details, is exposed to the user. The sanitize_db_error_message function should be modified to always return a generic error message to the client, regardless of whether the error matches a predefined pattern. The original error message can still be logged for debugging purposes. This approach guarantees that sensitive information is not leaked while maintaining the ability to debug issues internally.
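
A hedged sketch of this stricter variant: the pattern table is used only to choose a status code, and the client always receives the same message. `STATUS_BY_PATTERN`, `GENERIC`, and `sanitize_generic` are hypothetical names, the table is abbreviated, and logging in the unmatched branch is an assumption rather than something the patch shows.

```python
import logging
import re

logger = logging.getLogger(__name__)

GENERIC = "An error occurred while processing your request."

# Abbreviated, hypothetical table: only the status codes are used here.
STATUS_BY_PATTERN = {
    r"Table.*does not exist": 404,
    r"Table.*already exists": 409,
    r"syntax error": 400,
    r"Permission denied": 403,
}


def sanitize_generic(error: Exception) -> tuple[str, int]:
    """Return one generic message for every error; keep only the status code."""
    error_msg = str(error)
    for pattern, status_code in STATUS_BY_PATTERN.items():
        if re.search(pattern, error_msg, re.IGNORECASE):
            # The original text goes to the server log, not to the client.
            logger.error("Matched error pattern: %s. Original error: %s",
                         pattern, error_msg)
            return GENERIC, status_code
    # Assumption: unmatched errors are logged as well before the fallback.
    logger.error("Unmatched error: %s", error_msg)
    return GENERIC, 500
```

The trade-off is that clients can still distinguish error classes by status code, but they no longer get a hint about what went wrong from the message text.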


Suggested changeset 1
py-src/data_formulator/tables_routes.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/py-src/data_formulator/tables_routes.py b/py-src/data_formulator/tables_routes.py
--- a/py-src/data_formulator/tables_routes.py
+++ b/py-src/data_formulator/tables_routes.py
@@ -711,5 +711,6 @@
     # Check if error matches any safe pattern
-    for pattern, (safe_msg, status_code) in safe_error_patterns.items():
+    for pattern, (_, status_code) in safe_error_patterns.items():
         if re.search(pattern, error_msg, re.IGNORECASE):
-            return safe_msg, status_code
+            logger.error(f"Matched error pattern: {pattern}. Original error: {error_msg}")
+            return "An error occurred while processing your request.", status_code
             
@@ -719,3 +720,3 @@
     # Return a generic error message for unknown errors
-    return "An unexpected error occurred", 500
+    return "An error occurred while processing your request.", 500
 
EOF
Unable to commit as this autofix suggestion is now outdated
@Chenglong-MS merged commit 0d3e6c0 into main on May 13, 2025
7 checks passed
@lucaslfranco

@Chenglong-MS, a post-merge comment: it seems the azure-kusto-data package is missing from the requirements.txt file. I could not get the app running without installing that package first.

@Chenglong-MS
Collaborator Author

Good catch, will patch this up.
