Skip to content

Conversation

@raphael-intugle
Copy link
Collaborator

Potential fix for https://github.com/Intugle/data-tools/security/code-scanning/5

To robustly defend against any path traversal or escaping, after constructing the new CSV path, normalize and check that it still resides within the intended MODIFIED_DIR. This means:

  • After _csv_path_for(new_name, MODIFIED_DIR) produces new_path, resolve it to its absolute form (using .resolve() from pathlib).
  • Also resolve MODIFIED_DIR to its absolute path.
  • Before any filesystem operation (like replace() or writing), confirm that new_path is a child of MODIFIED_DIR via is_relative_to (Python 3.9+) or using a manual fallback.
  • If the check fails, abort the operation and inform the user, rather than proceeding.
  • This code should be added just after the assignment to new_path, before any filesystem modification.

No new methods or imports are required as Path.resolve() and (if available) Path.is_relative_to() are standard, but for broad compatibility (<3.9) we can use a try/except or manual check.

Edit region:

  • streamlit_app/main.py
    Just after new_path = _csv_path_for(new_name, MODIFIED_DIR) (line 671, in the RENAME TABLE button block).

Suggested fixes powered by Copilot Autofix. Review carefully before merging.

…n path expression

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

# Check that new_path is within MODIFIED_DIR to prevent path traversal or escapes
resolved_dir = MODIFIED_DIR.resolve()
resolved_path = new_path.resolve()

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI about 1 month ago

To robustly address the risk, we should:

  1. Update safe_filename in streamlit_app/helper.py so it rejects/filters any input that could result in absolute paths or traversal (".." or separators), not just forbidden characters.
  2. In _csv_path_for, after constructing the path, normalize (resolve) and check that the resulting path is inside MODIFIED_DIR, returning None or raising an exception if not.
  3. In the Streamlit UI (in main.py), before proceeding to rename, ensure the result of _csv_path_for is not None or is valid, and display an error if the filename is unsafe.

Concrete implementation:

  • Edit safe_filename: Reject any names containing path separators or "..", return a default safe name.
  • Edit _csv_path_for: After constructing the path, normalize with .resolve() and check if path is within MODIFIED_DIR, otherwise signal unsafe.
  • Edit UI logic in main.py: Ensure we abort with a clear message if the new filename/path is deemed unsafe by _csv_path_for.

Suggested changeset 2
streamlit_app/main.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/streamlit_app/main.py b/streamlit_app/main.py
--- a/streamlit_app/main.py
+++ b/streamlit_app/main.py
@@ -668,24 +668,10 @@
             if new_name == current_name:
                 st.toast("Table name unchanged.")
             else:
-                new_path = _csv_path_for(new_name,MODIFIED_DIR)
-
-                # Check that new_path is within MODIFIED_DIR to prevent path traversal or escapes
-                resolved_dir = MODIFIED_DIR.resolve()
-                resolved_path = new_path.resolve()
                 try:
-                    # Python 3.9+ has is_relative_to
-                    if hasattr(resolved_path, "is_relative_to"):
-                        if not resolved_path.is_relative_to(resolved_dir):
-                            st.error("Unsafe file name: resulting path is outside the allowed directory.")
-                            st.stop()
-                    else:
-                        # Fallback for Python <3.9
-                        if str(resolved_path).find(str(resolved_dir)) != 0:
-                            st.error("Unsafe file name: resulting path is outside the allowed directory.")
-                            st.stop()
-                except Exception as check_exc:
-                    st.error(f"Error verifying target path: {check_exc}")
+                    new_path = _csv_path_for(new_name,MODIFIED_DIR)
+                except ValueError as bad_path_exc:
+                    st.error(str(bad_path_exc))
                     st.stop()
 
                 # Move/rename the CSV on disk (if it exists). If it doesn't, create from in-memory df.
EOF
@@ -668,24 +668,10 @@
if new_name == current_name:
st.toast("Table name unchanged.")
else:
new_path = _csv_path_for(new_name,MODIFIED_DIR)

# Check that new_path is within MODIFIED_DIR to prevent path traversal or escapes
resolved_dir = MODIFIED_DIR.resolve()
resolved_path = new_path.resolve()
try:
# Python 3.9+ has is_relative_to
if hasattr(resolved_path, "is_relative_to"):
if not resolved_path.is_relative_to(resolved_dir):
st.error("Unsafe file name: resulting path is outside the allowed directory.")
st.stop()
else:
# Fallback for Python <3.9
if str(resolved_path).find(str(resolved_dir)) != 0:
st.error("Unsafe file name: resulting path is outside the allowed directory.")
st.stop()
except Exception as check_exc:
st.error(f"Error verifying target path: {check_exc}")
new_path = _csv_path_for(new_name,MODIFIED_DIR)
except ValueError as bad_path_exc:
st.error(str(bad_path_exc))
st.stop()

# Move/rename the CSV on disk (if it exists). If it doesn't, create from in-memory df.
streamlit_app/helper.py
Outside changed files

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/streamlit_app/helper.py b/streamlit_app/helper.py
--- a/streamlit_app/helper.py
+++ b/streamlit_app/helper.py
@@ -173,6 +173,7 @@
     - Replaces sequences of forbidden characters with underscores.
     - Ensures the base name is non-empty (falls back to 'table').
     - Ensures `ext` starts with a '.'.
+    - Rejects path traversal and separator, always returns a safe basename.
 
     Parameters
     ----------
@@ -186,9 +187,13 @@
     str
         A sanitized filename like 'my_table.csv'.
     """
-    base = re.sub(r"[^A-Za-z0-9_.-]+", "_", name).strip("._")
-    if not base:
+    # Reject path traversal or absolute path attempts
+    if os.path.isabs(name) or ".." in name or "/" in name or "\\" in name:
         base = "table"
+    else:
+        base = re.sub(r"[^A-Za-z0-9_.-]+", "_", name).strip("._")
+        if not base:
+            base = "table"
 
     # Normalize extension: ensure a single leading dot if ext was provided
     ext = ext.strip()
@@ -720,8 +724,20 @@
     """
     Return the output CSV path for a given table name within MODIFIED_DIR.
     Requires `MODIFIED_DIR` and `safe_filename` to be defined globally.
+    Always guarantees containment within MODIFIED_DIR or raises ValueError.
     """
-    return MODIFIED_DIR / safe_filename(name, ".csv")  # type: ignore[name-defined]
+    filename = safe_filename(name, ".csv")
+    path = MODIFIED_DIR / filename
+    resolved_dir = MODIFIED_DIR.resolve()
+    resolved_path = path.resolve()
+    # Python 3.9+ supports is_relative_to
+    if hasattr(resolved_path, "is_relative_to"):
+        if not resolved_path.is_relative_to(resolved_dir):
+            raise ValueError("Unsafe file name: resulting path is outside the allowed directory.")
+    else:
+        if str(resolved_path).find(str(resolved_dir)) != 0:
+            raise ValueError("Unsafe file name: resulting path is outside the allowed directory.")
+    return path
 
 
 # ------------------------------ UI callbacks ------------------------------
EOF
@@ -173,6 +173,7 @@
- Replaces sequences of forbidden characters with underscores.
- Ensures the base name is non-empty (falls back to 'table').
- Ensures `ext` starts with a '.'.
- Rejects path traversal and separator, always returns a safe basename.

Parameters
----------
@@ -186,9 +187,13 @@
str
A sanitized filename like 'my_table.csv'.
"""
base = re.sub(r"[^A-Za-z0-9_.-]+", "_", name).strip("._")
if not base:
# Reject path traversal or absolute path attempts
if os.path.isabs(name) or ".." in name or "/" in name or "\\" in name:
base = "table"
else:
base = re.sub(r"[^A-Za-z0-9_.-]+", "_", name).strip("._")
if not base:
base = "table"

# Normalize extension: ensure a single leading dot if ext was provided
ext = ext.strip()
@@ -720,8 +724,20 @@
"""
Return the output CSV path for a given table name within MODIFIED_DIR.
Requires `MODIFIED_DIR` and `safe_filename` to be defined globally.
Always guarantees containment within MODIFIED_DIR or raises ValueError.
"""
return MODIFIED_DIR / safe_filename(name, ".csv") # type: ignore[name-defined]
filename = safe_filename(name, ".csv")
path = MODIFIED_DIR / filename
resolved_dir = MODIFIED_DIR.resolve()
resolved_path = path.resolve()
# Python 3.9+ supports is_relative_to
if hasattr(resolved_path, "is_relative_to"):
if not resolved_path.is_relative_to(resolved_dir):
raise ValueError("Unsafe file name: resulting path is outside the allowed directory.")
else:
if str(resolved_path).find(str(resolved_dir)) != 0:
raise ValueError("Unsafe file name: resulting path is outside the allowed directory.")
return path


# ------------------------------ UI callbacks ------------------------------
Copilot is powered by AI and may make mistakes. Always verify output.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants