Skip to content

Web UI #634

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 140 commits into
base: main
Choose a base branch
from
Open

Web UI #634

wants to merge 140 commits into from

Conversation

hasan7n
Copy link
Contributor

@hasan7n hasan7n commented Dec 21, 2024

Comments/discussions to check:

To be merged before merging to main: #618, which depends on #615

@mhmdk0 mhmdk0 had a problem deploying to testing-external-code March 23, 2025 01:42 — with GitHub Actions Failure
@mhmdk0 mhmdk0 had a problem deploying to testing-external-code April 3, 2025 21:13 — with GitHub Actions Failure
@mhmdk0 mhmdk0 had a problem deploying to testing-external-code April 15, 2025 16:19 — with GitHub Actions Failure
@mhmdk0 mhmdk0 had a problem deploying to testing-external-code April 23, 2025 00:12 — with GitHub Actions Failure
@mhmdk0 mhmdk0 requested a deployment to testing-external-code May 15, 2025 11:30 — with GitHub Actions Waiting
@mhmdk0 mhmdk0 requested a deployment to testing-external-code May 26, 2025 20:41 — with GitHub Actions Waiting
# List directories inside the path and sort them
sorted_folders = []
sorted_files = []
for item in os.listdir(full_path):

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI 6 days ago

To fix the issue, we will enhance the validation of the path parameter to ensure it does not allow directory traversal attacks. Specifically:

  1. Normalize the path using os.path.normpath to remove any .. segments.
  2. Verify that the normalized full_path starts with BASE_DIR to ensure it is contained within the base directory.
  3. Raise an exception if the validation fails.

This approach ensures that even if a malicious user provides a crafted input, the resulting path will not escape the intended base directory.


Suggested changeset 1
cli/medperf/web_ui/api/routes.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/cli/medperf/web_ui/api/routes.py b/cli/medperf/web_ui/api/routes.py
--- a/cli/medperf/web_ui/api/routes.py
+++ b/cli/medperf/web_ui/api/routes.py
@@ -21,10 +21,11 @@
 ):
-    full_path = os.path.abspath(os.path.join(BASE_DIR, path))
-
-    if not os.path.exists(full_path) or not os.path.isdir(full_path):
-        raise HTTPException(status_code=404, detail="Directory not found")
-
-    # Ensure path is within the base directory
-    if not os.path.commonpath([BASE_DIR, full_path]) == BASE_DIR:
-        raise HTTPException(status_code=403, detail="Access denied")
+    # Normalize the path to prevent directory traversal
+    normalized_path = os.path.normpath(path)
+    full_path = os.path.abspath(os.path.join(BASE_DIR, normalized_path))
+
+    if not full_path.startswith(BASE_DIR):
+        raise HTTPException(status_code=403, detail="Access denied")
+
+    if not os.path.exists(full_path) or not os.path.isdir(full_path):
+        raise HTTPException(status_code=404, detail="Directory not found")
 
EOF
@@ -21,10 +21,11 @@
):
full_path = os.path.abspath(os.path.join(BASE_DIR, path))

if not os.path.exists(full_path) or not os.path.isdir(full_path):
raise HTTPException(status_code=404, detail="Directory not found")

# Ensure path is within the base directory
if not os.path.commonpath([BASE_DIR, full_path]) == BASE_DIR:
raise HTTPException(status_code=403, detail="Access denied")
# Normalize the path to prevent directory traversal
normalized_path = os.path.normpath(path)
full_path = os.path.abspath(os.path.join(BASE_DIR, normalized_path))

if not full_path.startswith(BASE_DIR):
raise HTTPException(status_code=403, detail="Access denied")

if not os.path.exists(full_path) or not os.path.isdir(full_path):
raise HTTPException(status_code=404, detail="Directory not found")

Copilot is powered by AI and may make mistakes. Always verify output.
sorted_files = []
for item in os.listdir(full_path):
item_path = os.path.join(full_path, item)
if os.path.isdir(item_path):

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI 6 days ago

To fix the issue, we need to ensure that the full_path is normalized before performing the containment check against BASE_DIR. This can be achieved by using os.path.normpath or os.path.realpath on the full_path before comparing it with BASE_DIR. This normalization step will resolve any .. sequences or symbolic links in the path, ensuring that the containment check is robust.

Additionally, we should ensure that the full_path is not only within the BASE_DIR but also a valid directory before proceeding with further operations.

Suggested changeset 1
cli/medperf/web_ui/api/routes.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/cli/medperf/web_ui/api/routes.py b/cli/medperf/web_ui/api/routes.py
--- a/cli/medperf/web_ui/api/routes.py
+++ b/cli/medperf/web_ui/api/routes.py
@@ -21,10 +21,10 @@
 ):
-    full_path = os.path.abspath(os.path.join(BASE_DIR, path))
-
-    if not os.path.exists(full_path) or not os.path.isdir(full_path):
-        raise HTTPException(status_code=404, detail="Directory not found")
-
-    # Ensure path is within the base directory
-    if not os.path.commonpath([BASE_DIR, full_path]) == BASE_DIR:
-        raise HTTPException(status_code=403, detail="Access denied")
+    full_path = os.path.realpath(os.path.join(BASE_DIR, path))
+
+    # Ensure path is within the base directory
+    if not full_path.startswith(BASE_DIR):
+        raise HTTPException(status_code=403, detail="Access denied")
+
+    if not os.path.exists(full_path) or not os.path.isdir(full_path):
+        raise HTTPException(status_code=404, detail="Directory not found")
 
EOF
@@ -21,10 +21,10 @@
):
full_path = os.path.abspath(os.path.join(BASE_DIR, path))

if not os.path.exists(full_path) or not os.path.isdir(full_path):
raise HTTPException(status_code=404, detail="Directory not found")

# Ensure path is within the base directory
if not os.path.commonpath([BASE_DIR, full_path]) == BASE_DIR:
raise HTTPException(status_code=403, detail="Access denied")
full_path = os.path.realpath(os.path.join(BASE_DIR, path))

# Ensure path is within the base directory
if not full_path.startswith(BASE_DIR):
raise HTTPException(status_code=403, detail="Access denied")

if not os.path.exists(full_path) or not os.path.isdir(full_path):
raise HTTPException(status_code=404, detail="Directory not found")

Copilot is powered by AI and may make mistakes. Always verify output.
folders = []
for item in sorted_items:
item_path = os.path.join(full_path, item)
if os.path.isdir(item_path):

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI 6 days ago

To fix the issue, we need to ensure that the path parameter is properly sanitized and validated before being used to construct file paths. This involves normalizing the path using os.path.normpath and ensuring that the resulting path is strictly within the BASE_DIR. Additionally, we should validate item_path to ensure it does not point outside the allowed directory structure.

Steps to fix:

  1. Normalize path using os.path.normpath before constructing full_path.
  2. Revalidate that the normalized full_path starts with BASE_DIR after normalization.
  3. Ensure that item_path is also validated to prevent any potential misuse.

Suggested changeset 1
cli/medperf/web_ui/api/routes.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/cli/medperf/web_ui/api/routes.py b/cli/medperf/web_ui/api/routes.py
--- a/cli/medperf/web_ui/api/routes.py
+++ b/cli/medperf/web_ui/api/routes.py
@@ -21,10 +21,11 @@
 ):
-    full_path = os.path.abspath(os.path.join(BASE_DIR, path))
-
-    if not os.path.exists(full_path) or not os.path.isdir(full_path):
-        raise HTTPException(status_code=404, detail="Directory not found")
-
-    # Ensure path is within the base directory
-    if not os.path.commonpath([BASE_DIR, full_path]) == BASE_DIR:
-        raise HTTPException(status_code=403, detail="Access denied")
+    normalized_path = os.path.normpath(path)
+    full_path = os.path.abspath(os.path.join(BASE_DIR, normalized_path))
+
+    if not os.path.exists(full_path) or not os.path.isdir(full_path):
+        raise HTTPException(status_code=404, detail="Directory not found")
+
+    # Ensure path is within the base directory
+    if not os.path.commonpath([BASE_DIR, full_path]) == BASE_DIR:
+        raise HTTPException(status_code=403, detail="Access denied")
 
@@ -48,4 +49,7 @@
     for item in sorted_items:
-        item_path = os.path.join(full_path, item)
-        if os.path.isdir(item_path):
+        item_path = os.path.join(full_path, item)
+        # Ensure item_path is within full_path
+        if not os.path.commonpath([full_path, item_path]) == full_path:
+            continue
+        if os.path.isdir(item_path):
             folders.append({"name": item, "path": item_path, "type": "dir"})
EOF
@@ -21,10 +21,11 @@
):
full_path = os.path.abspath(os.path.join(BASE_DIR, path))

if not os.path.exists(full_path) or not os.path.isdir(full_path):
raise HTTPException(status_code=404, detail="Directory not found")

# Ensure path is within the base directory
if not os.path.commonpath([BASE_DIR, full_path]) == BASE_DIR:
raise HTTPException(status_code=403, detail="Access denied")
normalized_path = os.path.normpath(path)
full_path = os.path.abspath(os.path.join(BASE_DIR, normalized_path))

if not os.path.exists(full_path) or not os.path.isdir(full_path):
raise HTTPException(status_code=404, detail="Directory not found")

# Ensure path is within the base directory
if not os.path.commonpath([BASE_DIR, full_path]) == BASE_DIR:
raise HTTPException(status_code=403, detail="Access denied")

@@ -48,4 +49,7 @@
for item in sorted_items:
item_path = os.path.join(full_path, item)
if os.path.isdir(item_path):
item_path = os.path.join(full_path, item)
# Ensure item_path is within full_path
if not os.path.commonpath([full_path, item_path]) == full_path:
continue
if os.path.isdir(item_path):
folders.append({"name": item, "path": item_path, "type": "dir"})
Copilot is powered by AI and may make mistakes. Always verify output.
# Check if user is already authenticated
if token == security_token:
# User is already authenticated, redirect to original URL
return RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND)

Check warning

Code scanning / CodeQL

URL redirection from remote source Medium

Untrusted URL redirection depends on a
user-provided value
.
redirect_url: str = Form("/"),
):
if token == security_token:
response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND)

Check warning

Code scanning / CodeQL

URL redirection from remote source Medium

Untrusted URL redirection depends on a
user-provided value
.

Copilot Autofix

AI 6 days ago

To fix the issue, we need to validate the redirect_url parameter before using it in the RedirectResponse. A safe approach is to ensure that the redirect_url is either a relative URL or matches a predefined list of allowed URLs. This can be achieved using Python's urlparse module to check that the redirect_url does not include an explicit host name or scheme, ensuring it is a relative path. If the validation fails, the application should redirect to a default safe URL (e.g., the home page).

The changes will be made in the access_web_ui function where the redirect_url is used. We will also add a utility function to perform the validation.


Suggested changeset 1
cli/medperf/web_ui/security_check.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/cli/medperf/web_ui/security_check.py b/cli/medperf/web_ui/security_check.py
--- a/cli/medperf/web_ui/security_check.py
+++ b/cli/medperf/web_ui/security_check.py
@@ -5,4 +5,9 @@
 from medperf.web_ui.common import templates, api_key_cookie
-
-router = APIRouter()
+from urllib.parse import urlparse
+
+def is_safe_redirect_url(url: str) -> bool:
+    """Validate that the URL is a relative path or matches allowed hosts."""
+    url = url.replace("\\", "")  # Normalize backslashes
+    parsed = urlparse(url)
+    return not parsed.netloc and not parsed.scheme
 
@@ -32,6 +37,8 @@
 ):
-    if token == security_token:
-        response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND)
-        response.set_cookie(key=AUTH_COOKIE_NAME, value=token)
-        return response
+    if token == security_token:
+        if not is_safe_redirect_url(redirect_url):
+            redirect_url = "/"  # Default to home page if validation fails
+        response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND)
+        response.set_cookie(key=AUTH_COOKIE_NAME, value=token)
+        return response
     else:
EOF
@@ -5,4 +5,9 @@
from medperf.web_ui.common import templates, api_key_cookie

router = APIRouter()
from urllib.parse import urlparse

def is_safe_redirect_url(url: str) -> bool:
"""Validate that the URL is a relative path or matches allowed hosts."""
url = url.replace("\\", "") # Normalize backslashes
parsed = urlparse(url)
return not parsed.netloc and not parsed.scheme

@@ -32,6 +37,8 @@
):
if token == security_token:
response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND)
response.set_cookie(key=AUTH_COOKIE_NAME, value=token)
return response
if token == security_token:
if not is_safe_redirect_url(redirect_url):
redirect_url = "/" # Default to home page if validation fails
response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND)
response.set_cookie(key=AUTH_COOKIE_NAME, value=token)
return response
else:
Copilot is powered by AI and may make mistakes. Always verify output.
):
if token == security_token:
response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND)
response.set_cookie(key=AUTH_COOKIE_NAME, value=token)

Check warning

Code scanning / CodeQL

Failure to use secure cookies Medium

Cookie is added without the Secure and HttpOnly attributes properly set.

Copilot Autofix

AI 6 days ago

To fix the issue, the set_cookie method should explicitly set the secure, httponly, and samesite attributes. Specifically:

  • secure=True ensures the cookie is only sent over HTTPS.
  • httponly=True prevents JavaScript from accessing the cookie.
  • samesite='Lax' or samesite='Strict' mitigates CSRF risks by restricting cross-origin requests.

The fix involves modifying the set_cookie call on line 35 to include these attributes.


Suggested changeset 1
cli/medperf/web_ui/security_check.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/cli/medperf/web_ui/security_check.py b/cli/medperf/web_ui/security_check.py
--- a/cli/medperf/web_ui/security_check.py
+++ b/cli/medperf/web_ui/security_check.py
@@ -34,3 +34,3 @@
         response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND)
-        response.set_cookie(key=AUTH_COOKIE_NAME, value=token)
+        response.set_cookie(key=AUTH_COOKIE_NAME, value=token, secure=True, httponly=True, samesite='Lax')
         return response
EOF
@@ -34,3 +34,3 @@
response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND)
response.set_cookie(key=AUTH_COOKIE_NAME, value=token)
response.set_cookie(key=AUTH_COOKIE_NAME, value=token, secure=True, httponly=True, samesite='Lax')
return response
Copilot is powered by AI and may make mistakes. Always verify output.
):
if token == security_token:
response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND)
response.set_cookie(key=AUTH_COOKIE_NAME, value=token)

Check warning

Code scanning / CodeQL

Construction of a cookie using user-supplied input Medium

Cookie is constructed from a
user-supplied input
.

Copilot Autofix

AI 6 days ago

To fix the issue, we need to ensure that the token value is sanitized or validated before being used to construct the cookie. A secure approach would involve:

  1. Validating the token against a predefined set of acceptable values or formats.
  2. Ensuring that the token does not contain any malicious or unexpected content.
  3. Optionally, encoding the token to prevent injection attacks.

In this case, we will validate the token by ensuring it matches the security_token and then use a secure, predefined value (e.g., security_token) to set the cookie instead of the raw user input.


Suggested changeset 1
cli/medperf/web_ui/security_check.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/cli/medperf/web_ui/security_check.py b/cli/medperf/web_ui/security_check.py
--- a/cli/medperf/web_ui/security_check.py
+++ b/cli/medperf/web_ui/security_check.py
@@ -32,6 +32,6 @@
 ):
-    if token == security_token:
-        response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND)
-        response.set_cookie(key=AUTH_COOKIE_NAME, value=token)
-        return response
+    if token == security_token:
+        response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND)
+        response.set_cookie(key=AUTH_COOKIE_NAME, value=security_token)
+        return response
     else:
EOF
@@ -32,6 +32,6 @@
):
if token == security_token:
response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND)
response.set_cookie(key=AUTH_COOKIE_NAME, value=token)
return response
if token == security_token:
response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND)
response.set_cookie(key=AUTH_COOKIE_NAME, value=security_token)
return response
else:
Copilot is powered by AI and may make mistakes. Always verify output.
@mhmdk0 mhmdk0 requested a deployment to testing-external-code May 27, 2025 14:55 — with GitHub Actions Waiting
@mhmdk0 mhmdk0 requested a deployment to testing-external-code May 27, 2025 17:15 — with GitHub Actions Waiting

# raw_data_path should be provided if the imported dataset is in dev
if self.dataset.state == "DEVELOPMENT" and (
self.raw_data_path is None or os.path.exists(self.raw_data_path)
self.raw_data_path is None
or os.path.isfile(self.raw_data_path)

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI 5 days ago

To fix the issue, we need to validate the raw_data_path to ensure it is within a predefined safe root directory. This can be achieved by:

  1. Normalizing the raw_data_path using os.path.realpath or Path.resolve to remove any .. segments or symbolic links.
  2. Verifying that the normalized path starts with a predefined safe root directory (e.g., a directory dedicated to storing raw data).
  3. Raising an exception if the validation fails.

The validation should be added in the validate_input method of the ImportDataset class, as this is where the input parameters are initially checked.


Suggested changeset 2
cli/medperf/commands/dataset/import_dataset.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/cli/medperf/commands/dataset/import_dataset.py b/cli/medperf/commands/dataset/import_dataset.py
--- a/cli/medperf/commands/dataset/import_dataset.py
+++ b/cli/medperf/commands/dataset/import_dataset.py
@@ -32,10 +32,22 @@
         # raw_data_path should be provided if the imported dataset is in dev
-        if self.dataset.state == "DEVELOPMENT" and (
-            self.raw_data_path is None
-            or os.path.isfile(self.raw_data_path)
-            or (os.path.exists(self.raw_data_path) and os.listdir(self.raw_data_path))
-        ):
-            raise InvalidArgumentError(
-                "Output raw data path must be specified and, the directory should be empty or does not exist."
-            )
+        if self.dataset.state == "DEVELOPMENT":
+            if self.raw_data_path is None:
+                raise InvalidArgumentError(
+                    "Output raw data path must be specified."
+                )
+
+            # Normalize and validate raw_data_path
+            safe_root = config.raw_data_storage  # Define a safe root directory in the config
+            normalized_path = str(Path(self.raw_data_path).resolve())
+            if not normalized_path.startswith(str(Path(safe_root).resolve())):
+                raise InvalidArgumentError(
+                    f"Invalid raw data path: {self.raw_data_path}. Path must be within {safe_root}."
+                )
+
+            if os.path.isfile(normalized_path) or (
+                os.path.exists(normalized_path) and os.listdir(normalized_path)
+            ):
+                raise InvalidArgumentError(
+                    "Output raw data path must be an empty directory or not exist."
+                )
 
EOF
@@ -32,10 +32,22 @@
# raw_data_path should be provided if the imported dataset is in dev
if self.dataset.state == "DEVELOPMENT" and (
self.raw_data_path is None
or os.path.isfile(self.raw_data_path)
or (os.path.exists(self.raw_data_path) and os.listdir(self.raw_data_path))
):
raise InvalidArgumentError(
"Output raw data path must be specified and, the directory should be empty or does not exist."
)
if self.dataset.state == "DEVELOPMENT":
if self.raw_data_path is None:
raise InvalidArgumentError(
"Output raw data path must be specified."
)

# Normalize and validate raw_data_path
safe_root = config.raw_data_storage # Define a safe root directory in the config
normalized_path = str(Path(self.raw_data_path).resolve())
if not normalized_path.startswith(str(Path(safe_root).resolve())):
raise InvalidArgumentError(
f"Invalid raw data path: {self.raw_data_path}. Path must be within {safe_root}."
)

if os.path.isfile(normalized_path) or (
os.path.exists(normalized_path) and os.listdir(normalized_path)
):
raise InvalidArgumentError(
"Output raw data path must be an empty directory or not exist."
)

cli/medperf/web_ui/datasets/routes.py
Outside changed files

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/cli/medperf/web_ui/datasets/routes.py b/cli/medperf/web_ui/datasets/routes.py
--- a/cli/medperf/web_ui/datasets/routes.py
+++ b/cli/medperf/web_ui/datasets/routes.py
@@ -434,3 +434,3 @@
     try:
-        ImportDataset.run(dataset_id, input_path, raw_dataset_path)
+        ImportDataset.run(dataset_id, input_path, raw_dataset_path or config.raw_data_storage)
         return_response["status"] = "success"
EOF
@@ -434,3 +434,3 @@
try:
ImportDataset.run(dataset_id, input_path, raw_dataset_path)
ImportDataset.run(dataset_id, input_path, raw_dataset_path or config.raw_data_storage)
return_response["status"] = "success"
Copilot is powered by AI and may make mistakes. Always verify output.
self.raw_data_path is None or os.path.exists(self.raw_data_path)
self.raw_data_path is None
or os.path.isfile(self.raw_data_path)
or (os.path.exists(self.raw_data_path) and os.listdir(self.raw_data_path))

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI 5 days ago

To fix the issue, we need to validate the raw_data_path to ensure it is within a safe root directory and does not allow directory traversal. This can be achieved by:

  1. Defining a safe root directory for raw_data_path.
  2. Normalizing the user-provided path using os.path.normpath or Path.resolve to eliminate any .. segments.
  3. Verifying that the normalized path starts with the safe root directory.

The changes will be made in the validate_input method of the ImportDataset class in cli/medperf/commands/dataset/import_dataset.py. Additionally, we will update the import_dataset function in cli/medperf/web_ui/datasets/routes.py to define a safe root directory for raw_data_path.


Suggested changeset 2
cli/medperf/commands/dataset/import_dataset.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/cli/medperf/commands/dataset/import_dataset.py b/cli/medperf/commands/dataset/import_dataset.py
--- a/cli/medperf/commands/dataset/import_dataset.py
+++ b/cli/medperf/commands/dataset/import_dataset.py
@@ -32,10 +32,18 @@
         # raw_data_path should be provided if the imported dataset is in dev
-        if self.dataset.state == "DEVELOPMENT" and (
-            self.raw_data_path is None
-            or os.path.isfile(self.raw_data_path)
-            or (os.path.exists(self.raw_data_path) and os.listdir(self.raw_data_path))
-        ):
-            raise InvalidArgumentError(
-                "Output raw data path must be specified and, the directory should be empty or does not exist."
-            )
+        if self.dataset.state == "DEVELOPMENT":
+            if self.raw_data_path is None:
+                raise InvalidArgumentError("Output raw data path must be specified.")
+
+            # Normalize and validate the raw_data_path
+            safe_root = config.safe_root  # Safe root directory defined in config
+            normalized_path = Path(self.raw_data_path).resolve()
+            if not str(normalized_path).startswith(str(Path(safe_root).resolve())):
+                raise InvalidArgumentError(
+                    f"Invalid raw data path: {self.raw_data_path}. Path must be within {safe_root}."
+                )
+
+            if normalized_path.is_file() or (normalized_path.exists() and any(normalized_path.iterdir())):
+                raise InvalidArgumentError(
+                    "Output raw data path must be an empty directory or not exist."
+                )
 
EOF
@@ -32,10 +32,18 @@
# raw_data_path should be provided if the imported dataset is in dev
if self.dataset.state == "DEVELOPMENT" and (
self.raw_data_path is None
or os.path.isfile(self.raw_data_path)
or (os.path.exists(self.raw_data_path) and os.listdir(self.raw_data_path))
):
raise InvalidArgumentError(
"Output raw data path must be specified and, the directory should be empty or does not exist."
)
if self.dataset.state == "DEVELOPMENT":
if self.raw_data_path is None:
raise InvalidArgumentError("Output raw data path must be specified.")

# Normalize and validate the raw_data_path
safe_root = config.safe_root # Safe root directory defined in config
normalized_path = Path(self.raw_data_path).resolve()
if not str(normalized_path).startswith(str(Path(safe_root).resolve())):
raise InvalidArgumentError(
f"Invalid raw data path: {self.raw_data_path}. Path must be within {safe_root}."
)

if normalized_path.is_file() or (normalized_path.exists() and any(normalized_path.iterdir())):
raise InvalidArgumentError(
"Output raw data path must be an empty directory or not exist."
)

cli/medperf/web_ui/datasets/routes.py
Outside changed files

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/cli/medperf/web_ui/datasets/routes.py b/cli/medperf/web_ui/datasets/routes.py
--- a/cli/medperf/web_ui/datasets/routes.py
+++ b/cli/medperf/web_ui/datasets/routes.py
@@ -434,3 +434,8 @@
     try:
-        ImportDataset.run(dataset_id, input_path, raw_dataset_path)
+        # Define a safe root directory for raw_dataset_path
+        safe_root = config.safe_root  # Safe root directory defined in config
+        if raw_dataset_path:
+            raw_dataset_path = str(Path(safe_root).joinpath(raw_dataset_path).resolve())
+
+        ImportDataset.run(dataset_id, input_path, raw_dataset_path)
         return_response["status"] = "success"
EOF
@@ -434,3 +434,8 @@
try:
ImportDataset.run(dataset_id, input_path, raw_dataset_path)
# Define a safe root directory for raw_dataset_path
safe_root = config.safe_root # Safe root directory defined in config
if raw_dataset_path:
raw_dataset_path = str(Path(safe_root).joinpath(raw_dataset_path).resolve())

ImportDataset.run(dataset_id, input_path, raw_dataset_path)
return_response["status"] = "success"
Copilot is powered by AI and may make mistakes. Always verify output.
self.raw_data_path is None or os.path.exists(self.raw_data_path)
self.raw_data_path is None
or os.path.isfile(self.raw_data_path)
or (os.path.exists(self.raw_data_path) and os.listdir(self.raw_data_path))

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.

Copilot Autofix

AI 5 days ago

To fix the issue, we need to validate the raw_data_path to ensure it is safe to use. This involves:

  1. Normalizing the path using os.path.normpath or Path.resolve to remove any .. segments or symbolic links.
  2. Ensuring the normalized path is contained within a predefined safe root directory (e.g., a specific directory for raw data).
  3. Raising an exception if the path is invalid or outside the allowed directory.

The changes will be made in the validate_input method of the ImportDataset class in cli/medperf/commands/dataset/import_dataset.py.


Suggested changeset 1
cli/medperf/commands/dataset/import_dataset.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/cli/medperf/commands/dataset/import_dataset.py b/cli/medperf/commands/dataset/import_dataset.py
--- a/cli/medperf/commands/dataset/import_dataset.py
+++ b/cli/medperf/commands/dataset/import_dataset.py
@@ -32,10 +32,23 @@
         # raw_data_path should be provided if the imported dataset is in dev
-        if self.dataset.state == "DEVELOPMENT" and (
-            self.raw_data_path is None
-            or os.path.isfile(self.raw_data_path)
-            or (os.path.exists(self.raw_data_path) and os.listdir(self.raw_data_path))
-        ):
-            raise InvalidArgumentError(
-                "Output raw data path must be specified and, the directory should be empty or does not exist."
-            )
+        if self.dataset.state == "DEVELOPMENT":
+            if self.raw_data_path is None:
+                raise InvalidArgumentError(
+                    "Output raw data path must be specified."
+                )
+
+            # Normalize and validate the raw_data_path
+            safe_root = config.raw_data_storage  # Define a safe root directory
+            normalized_path = Path(self.raw_data_path).resolve()
+            if not str(normalized_path).startswith(str(Path(safe_root).resolve())):
+                raise InvalidArgumentError(
+                    f"Invalid raw data path: {self.raw_data_path}. Path must be within {safe_root}."
+                )
+
+            # Ensure the directory is empty or does not exist
+            if os.path.isfile(normalized_path) or (
+                os.path.exists(normalized_path) and os.listdir(normalized_path)
+            ):
+                raise InvalidArgumentError(
+                    "Output raw data path must be an empty directory or not exist."
+                )
 
EOF
@@ -32,10 +32,23 @@
# raw_data_path should be provided if the imported dataset is in dev
if self.dataset.state == "DEVELOPMENT" and (
self.raw_data_path is None
or os.path.isfile(self.raw_data_path)
or (os.path.exists(self.raw_data_path) and os.listdir(self.raw_data_path))
):
raise InvalidArgumentError(
"Output raw data path must be specified and, the directory should be empty or does not exist."
)
if self.dataset.state == "DEVELOPMENT":
if self.raw_data_path is None:
raise InvalidArgumentError(
"Output raw data path must be specified."
)

# Normalize and validate the raw_data_path
safe_root = config.raw_data_storage # Define a safe root directory
normalized_path = Path(self.raw_data_path).resolve()
if not str(normalized_path).startswith(str(Path(safe_root).resolve())):
raise InvalidArgumentError(
f"Invalid raw data path: {self.raw_data_path}. Path must be within {safe_root}."
)

# Ensure the directory is empty or does not exist
if os.path.isfile(normalized_path) or (
os.path.exists(normalized_path) and os.listdir(normalized_path)
):
raise InvalidArgumentError(
"Output raw data path must be an empty directory or not exist."
)

Copilot is powered by AI and may make mistakes. Always verify output.
@mhmdk0
Copy link
Contributor

mhmdk0 commented May 28, 2025

Notes:
Should we do some modifications like:

  • Change notifications to job lists (running, finished, etc..)
  • Changes in tooltips and placeholders to match medperf --help (Place the text in separate file).
  • Refactor CSS as needed
  • model comp. test running multiple times causes an error (found existing predictions)
  • Notifications/Finished tasks logs (last 10 maybe) to be saved in database

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants