Conversation
| async with httpx.AsyncClient(timeout=300) as client: | ||
| if from_local: | ||
| usr_dir = jid.split("@")[0] | ||
| with open(f"user_backups/{usr_dir}/notes.json", "r") as f: |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 29 days ago
In general, to fix uncontrolled path usage you should (1) define a safe base directory, (2) derive the user-specified component in a restricted/normalized manner, and (3) construct the final path using os.path.join and validate that the normalized result is still inside the base directory. Optionally, further restrict the user-derived component (e.g., only allow certain characters).
For this specific code, the best minimal fix is:
- Define a constant base directory for all backups, e.g.
BASE_BACKUP_DIR = "user_backups". - Sanitize
usr_dirso it cannot contain path separators or traversal like"..". A simple, non-breaking approach is to strip it down to a whitelist of safe characters (letters, digits, underscore, dash), and fall back to an error if the result is empty. - Build the path using
os.path.join(BASE_BACKUP_DIR, safe_usr_dir, "notes.json"). - Normalize the path with
os.path.normpathand verify that it starts with the normalized base backup directory path (usingos.path.abspathor similar), raising an exception or showing an error if not.
All of this can be done directly in app.py, just above the open(...) call. We already import os, so no new imports are needed. We should keep the behavior of using the part before "@" for usr_dir but ensure it’s sanitized before use and checked against the base directory. For minimal impact and clarity inside this snippet, we can implement a small helper function (within the shown region) to construct and validate the backup path, then use it in place of the f-string path at line 59.
| @@ -52,11 +52,24 @@ | ||
| from_local = st.checkbox("从本地 notes.json 加载(跳过 API 拉取)") | ||
|
|
||
| if st.button("开始导出") and jid: | ||
| def _get_safe_notes_path(jid_value: str) -> str: | ||
| # Use the part before "@" as user directory, as before | ||
| raw_usr_dir = jid_value.split("@")[0] | ||
| # Restrict to a safe subset of characters to avoid path traversal and separators | ||
| safe_usr_dir = "".join(ch for ch in raw_usr_dir if ch.isalnum() or ch in ("-", "_")) | ||
| if not safe_usr_dir: | ||
| raise ValueError("Invalid JID: user directory is empty or contains only unsafe characters") | ||
| base_dir = os.path.abspath("user_backups") | ||
| candidate = os.path.abspath(os.path.normpath(os.path.join(base_dir, safe_usr_dir, "notes.json"))) | ||
| if not candidate.startswith(base_dir + os.sep): | ||
| raise ValueError("Invalid path derived from JID") | ||
| return candidate | ||
|
|
||
| async def _export(): | ||
| async with httpx.AsyncClient(timeout=300) as client: | ||
| if from_local: | ||
| usr_dir = jid.split("@")[0] | ||
| with open(f"user_backups/{usr_dir}/notes.json", "r") as f: | ||
| notes_path = _get_safe_notes_path(jid) | ||
| with open(notes_path, "r") as f: | ||
| notes = json.load(f) | ||
| else: | ||
| r = await client.get(API_BASE + "notes", params={"jid": jid}) |
There was a problem hiding this comment.
Pull request overview
This PR introduces a Streamlit-based UI for searching users and exporting backups, while making the backend API endpoint configurable via an environment variable.
Changes:
- Add a new
app.pyStreamlit UI for user search + export workflow. - Make
API_BASEconfigurable through theAPI_BASEenvironment variable in existing scripts. - Add
streamlitandpandasdependencies and wire up a newtakeout_uiservice in Docker Compose.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
app.py |
New Streamlit UI that calls existing search/export helpers to provide an interactive workflow. |
compose.yml |
Adds a takeout_ui service to run the Streamlit app alongside draws_index. |
takeout.py |
Switches API base URL to be environment-configurable. |
search.py |
Switches API base URL to be environment-configurable. |
pyproject.toml |
Adds Streamlit and Pandas dependencies required by the UI. |
uv.lock |
Updates the lockfile to include Streamlit/Pandas and their transitive dependencies. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| usr_dir = jid.split("@")[0] | ||
| with open(f"user_backups/{usr_dir}/notes.json", "r") as f: | ||
| notes = json.load(f) |
There was a problem hiding this comment.
jid (user input) is used to derive usr_dir and then interpolated into filesystem paths (e.g., user_backups/{usr_dir}/...). Without validation, a crafted jid containing path separators could write/read outside the intended backup directory. Please sanitize/validate usr_dir (e.g., allowlist characters like [A-Za-z0-9_-] and reject anything containing /, \\, or ..) before using it in paths.
| ) | ||
|
|
||
| API_BASE = "http://127.0.0.63:8539/api/" | ||
| API_BASE = os.environ.get("API_BASE", "http://127.0.0.63:8539/api/") |
There was a problem hiding this comment.
Since API_BASE is now configurable via API_BASE, the code assumes the provided value ends with a trailing / (because calls do API_BASE + "notes", etc.). To avoid subtle misconfiguration bugs, normalize this value (e.g., API_BASE = API_BASE.rstrip('/') + '/') or build URLs via urllib.parse.urljoin.
| API_BASE = os.environ.get("API_BASE", "http://127.0.0.63:8539/api/") | |
| API_BASE = os.environ.get("API_BASE", "http://127.0.0.63:8539/api/") | |
| API_BASE = API_BASE.rstrip("/") + "/" |
| import csv | ||
|
|
||
| API_BASE = "http://127.0.0.63:8539/api/" | ||
| API_BASE = os.environ.get("API_BASE", "http://127.0.0.63:8539/api/") |
There was a problem hiding this comment.
API_BASE is now read from the environment, but downstream code concatenates endpoints via API_BASE + "search"/"notes", which will break if the env var omits the trailing /. Consider normalizing API_BASE (e.g., rstrip('/') + '/') or using urljoin when constructing request URLs.
| API_BASE = os.environ.get("API_BASE", "http://127.0.0.63:8539/api/") | |
| API_BASE = os.environ.get("API_BASE", "http://127.0.0.63:8539/api/").rstrip("/") + "/" |
| command: sh -c "apk add --no-cache pandoc zip && uv run streamlit run app.py --server.address=0.0.0.0 --server.baseUrlPath=/ui" | ||
| depends_on: |
There was a problem hiding this comment.
Installing pandoc and zip via apk add in the container command means every restart requires network access and adds startup latency (and can fail if mirrors are unavailable). Consider baking these OS packages into a small custom image (Dockerfile) and running uv sync/uv run without mutating the container at runtime.
No description provided.