Skip to content

Streamlit#2

Open
Ovler-Young wants to merge 6 commits intomainfrom
streamlit
Open

Streamlit#2
Ovler-Young wants to merge 6 commits intomainfrom
streamlit

Conversation

@Ovler-Young
Copy link
Copy Markdown
Contributor

No description provided.

Copilot AI review requested due to automatic review settings March 10, 2026 05:20
async with httpx.AsyncClient(timeout=300) as client:
if from_local:
usr_dir = jid.split("@")[0]
with open(f"user_backups/{usr_dir}/notes.json", "r") as f:

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI 29 days ago

In general, to fix uncontrolled path usage you should (1) define a safe base directory, (2) derive the user-specified component in a restricted/normalized manner, and (3) construct the final path using os.path.join and validate that the normalized result is still inside the base directory. Optionally, further restrict the user-derived component (e.g., only allow certain characters).

For this specific code, the best minimal fix is:

  • Define a constant base directory for all backups, e.g. BASE_BACKUP_DIR = "user_backups".
  • Sanitize usr_dir so it cannot contain path separators or traversal like "..". A simple, non-breaking approach is to strip it down to a whitelist of safe characters (letters, digits, underscore, dash), and fall back to an error if the result is empty.
  • Build the path using os.path.join(BASE_BACKUP_DIR, safe_usr_dir, "notes.json").
  • Normalize the path with os.path.normpath and verify that it starts with the normalized base backup directory path (using os.path.abspath or similar), raising an exception or showing an error if not.

All of this can be done directly in app.py, just above the open(...) call. We already import os, so no new imports are needed. We should keep the behavior of using the part before "@" for usr_dir but ensure it’s sanitized before use and checked against the base directory. For minimal impact and clarity inside this snippet, we can implement a small helper function (within the shown region) to construct and validate the backup path, then use it in place of the f-string path at line 59.

Suggested changeset 1
app.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/app.py b/app.py
--- a/app.py
+++ b/app.py
@@ -52,11 +52,24 @@
 from_local = st.checkbox("从本地 notes.json 加载(跳过 API 拉取)")
 
 if st.button("开始导出") and jid:
+    def _get_safe_notes_path(jid_value: str) -> str:
+        # Use the part before "@" as user directory, as before
+        raw_usr_dir = jid_value.split("@")[0]
+        # Restrict to a safe subset of characters to avoid path traversal and separators
+        safe_usr_dir = "".join(ch for ch in raw_usr_dir if ch.isalnum() or ch in ("-", "_"))
+        if not safe_usr_dir:
+            raise ValueError("Invalid JID: user directory is empty or contains only unsafe characters")
+        base_dir = os.path.abspath("user_backups")
+        candidate = os.path.abspath(os.path.normpath(os.path.join(base_dir, safe_usr_dir, "notes.json")))
+        if not candidate.startswith(base_dir + os.sep):
+            raise ValueError("Invalid path derived from JID")
+        return candidate
+
     async def _export():
         async with httpx.AsyncClient(timeout=300) as client:
             if from_local:
-                usr_dir = jid.split("@")[0]
-                with open(f"user_backups/{usr_dir}/notes.json", "r") as f:
+                notes_path = _get_safe_notes_path(jid)
+                with open(notes_path, "r") as f:
                     notes = json.load(f)
             else:
                 r = await client.get(API_BASE + "notes", params={"jid": jid})
EOF
@@ -52,11 +52,24 @@
from_local = st.checkbox("从本地 notes.json 加载(跳过 API 拉取)")

if st.button("开始导出") and jid:
def _get_safe_notes_path(jid_value: str) -> str:
# Use the part before "@" as user directory, as before
raw_usr_dir = jid_value.split("@")[0]
# Restrict to a safe subset of characters to avoid path traversal and separators
safe_usr_dir = "".join(ch for ch in raw_usr_dir if ch.isalnum() or ch in ("-", "_"))
if not safe_usr_dir:
raise ValueError("Invalid JID: user directory is empty or contains only unsafe characters")
base_dir = os.path.abspath("user_backups")
candidate = os.path.abspath(os.path.normpath(os.path.join(base_dir, safe_usr_dir, "notes.json")))
if not candidate.startswith(base_dir + os.sep):
raise ValueError("Invalid path derived from JID")
return candidate

async def _export():
async with httpx.AsyncClient(timeout=300) as client:
if from_local:
usr_dir = jid.split("@")[0]
with open(f"user_backups/{usr_dir}/notes.json", "r") as f:
notes_path = _get_safe_notes_path(jid)
with open(notes_path, "r") as f:
notes = json.load(f)
else:
r = await client.get(API_BASE + "notes", params={"jid": jid})
Copilot is powered by AI and may make mistakes. Always verify output.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a Streamlit-based UI for searching users and exporting backups, while making the backend API endpoint configurable via an environment variable.

Changes:

  • Add a new app.py Streamlit UI for user search + export workflow.
  • Make API_BASE configurable through the API_BASE environment variable in existing scripts.
  • Add streamlit and pandas dependencies and wire up a new takeout_ui service in Docker Compose.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
app.py New Streamlit UI that calls existing search/export helpers to provide an interactive workflow.
compose.yml Adds a takeout_ui service to run the Streamlit app alongside draws_index.
takeout.py Switches API base URL to be environment-configurable.
search.py Switches API base URL to be environment-configurable.
pyproject.toml Adds Streamlit and Pandas dependencies required by the UI.
uv.lock Updates the lockfile to include Streamlit/Pandas and their transitive dependencies.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +58 to +60
usr_dir = jid.split("@")[0]
with open(f"user_backups/{usr_dir}/notes.json", "r") as f:
notes = json.load(f)
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jid (user input) is used to derive usr_dir and then interpolated into filesystem paths (e.g., user_backups/{usr_dir}/...). Without validation, a crafted jid containing path separators could write/read outside the intended backup directory. Please sanitize/validate usr_dir (e.g., allowlist characters like [A-Za-z0-9_-] and reject anything containing /, \\, or ..) before using it in paths.

Copilot uses AI. Check for mistakes.
)

API_BASE = "http://127.0.0.63:8539/api/"
API_BASE = os.environ.get("API_BASE", "http://127.0.0.63:8539/api/")
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since API_BASE is now configurable via API_BASE, the code assumes the provided value ends with a trailing / (because calls do API_BASE + "notes", etc.). To avoid subtle misconfiguration bugs, normalize this value (e.g., API_BASE = API_BASE.rstrip('/') + '/') or build URLs via urllib.parse.urljoin.

Suggested change
API_BASE = os.environ.get("API_BASE", "http://127.0.0.63:8539/api/")
API_BASE = os.environ.get("API_BASE", "http://127.0.0.63:8539/api/")
API_BASE = API_BASE.rstrip("/") + "/"

Copilot uses AI. Check for mistakes.
import csv

API_BASE = "http://127.0.0.63:8539/api/"
API_BASE = os.environ.get("API_BASE", "http://127.0.0.63:8539/api/")
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API_BASE is now read from the environment, but downstream code concatenates endpoints via API_BASE + "search"/"notes", which will break if the env var omits the trailing /. Consider normalizing API_BASE (e.g., rstrip('/') + '/') or using urljoin when constructing request URLs.

Suggested change
API_BASE = os.environ.get("API_BASE", "http://127.0.0.63:8539/api/")
API_BASE = os.environ.get("API_BASE", "http://127.0.0.63:8539/api/").rstrip("/") + "/"

Copilot uses AI. Check for mistakes.
Comment on lines +24 to +25
command: sh -c "apk add --no-cache pandoc zip && uv run streamlit run app.py --server.address=0.0.0.0 --server.baseUrlPath=/ui"
depends_on:
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Installing pandoc and zip via apk add in the container command means every restart requires network access and adds startup latency (and can fail if mirrors are unavailable). Consider baking these OS packages into a small custom image (Dockerfile) and running uv sync/uv run without mutating the container at runtime.

Copilot uses AI. Check for mistakes.
@Ovler-Young Ovler-Young requested a review from yzqzss March 11, 2026 22:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants