-
Notifications
You must be signed in to change notification settings - Fork 36
Web UI #634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…d add placeholders + tooltips to forms.
# List directories inside the path and sort them | ||
sorted_folders = [] | ||
sorted_files = [] | ||
for item in os.listdir(full_path): |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 6 days ago
To fix the issue, we will enhance the validation of the path
parameter to ensure it does not allow directory traversal attacks. Specifically:
- Normalize the
path
usingos.path.normpath
to remove any..
segments. - Verify that the normalized
full_path
starts withBASE_DIR
to ensure it is contained within the base directory. - Raise an exception if the validation fails.
This approach ensures that even if a malicious user provides a crafted input, the resulting path will not escape the intended base directory.
-
Copy modified lines R22-R30
@@ -21,10 +21,11 @@ | ||
): | ||
full_path = os.path.abspath(os.path.join(BASE_DIR, path)) | ||
|
||
if not os.path.exists(full_path) or not os.path.isdir(full_path): | ||
raise HTTPException(status_code=404, detail="Directory not found") | ||
|
||
# Ensure path is within the base directory | ||
if not os.path.commonpath([BASE_DIR, full_path]) == BASE_DIR: | ||
raise HTTPException(status_code=403, detail="Access denied") | ||
# Normalize the path to prevent directory traversal | ||
normalized_path = os.path.normpath(path) | ||
full_path = os.path.abspath(os.path.join(BASE_DIR, normalized_path)) | ||
|
||
if not full_path.startswith(BASE_DIR): | ||
raise HTTPException(status_code=403, detail="Access denied") | ||
|
||
if not os.path.exists(full_path) or not os.path.isdir(full_path): | ||
raise HTTPException(status_code=404, detail="Directory not found") | ||
|
sorted_files = [] | ||
for item in os.listdir(full_path): | ||
item_path = os.path.join(full_path, item) | ||
if os.path.isdir(item_path): |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 6 days ago
To fix the issue, we need to ensure that the full_path
is normalized before performing the containment check against BASE_DIR
. This can be achieved by using os.path.normpath
or os.path.realpath
on the full_path
before comparing it with BASE_DIR
. This normalization step will resolve any ..
sequences or symbolic links in the path, ensuring that the containment check is robust.
Additionally, we should ensure that the full_path
is not only within the BASE_DIR
but also a valid directory before proceeding with further operations.
-
Copy modified lines R22-R29
@@ -21,10 +21,10 @@ | ||
): | ||
full_path = os.path.abspath(os.path.join(BASE_DIR, path)) | ||
|
||
if not os.path.exists(full_path) or not os.path.isdir(full_path): | ||
raise HTTPException(status_code=404, detail="Directory not found") | ||
|
||
# Ensure path is within the base directory | ||
if not os.path.commonpath([BASE_DIR, full_path]) == BASE_DIR: | ||
raise HTTPException(status_code=403, detail="Access denied") | ||
full_path = os.path.realpath(os.path.join(BASE_DIR, path)) | ||
|
||
# Ensure path is within the base directory | ||
if not full_path.startswith(BASE_DIR): | ||
raise HTTPException(status_code=403, detail="Access denied") | ||
|
||
if not os.path.exists(full_path) or not os.path.isdir(full_path): | ||
raise HTTPException(status_code=404, detail="Directory not found") | ||
|
folders = [] | ||
for item in sorted_items: | ||
item_path = os.path.join(full_path, item) | ||
if os.path.isdir(item_path): |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 6 days ago
To fix the issue, we need to ensure that the path
parameter is properly sanitized and validated before being used to construct file paths. This involves normalizing the path using os.path.normpath
and ensuring that the resulting path is strictly within the BASE_DIR
. Additionally, we should validate item_path
to ensure it does not point outside the allowed directory structure.
Steps to fix:
- Normalize
path
usingos.path.normpath
before constructingfull_path
. - Revalidate that the normalized
full_path
starts withBASE_DIR
after normalization. - Ensure that
item_path
is also validated to prevent any potential misuse.
-
Copy modified lines R22-R30 -
Copy modified lines R50-R54
@@ -21,10 +21,11 @@ | ||
): | ||
full_path = os.path.abspath(os.path.join(BASE_DIR, path)) | ||
|
||
if not os.path.exists(full_path) or not os.path.isdir(full_path): | ||
raise HTTPException(status_code=404, detail="Directory not found") | ||
|
||
# Ensure path is within the base directory | ||
if not os.path.commonpath([BASE_DIR, full_path]) == BASE_DIR: | ||
raise HTTPException(status_code=403, detail="Access denied") | ||
normalized_path = os.path.normpath(path) | ||
full_path = os.path.abspath(os.path.join(BASE_DIR, normalized_path)) | ||
|
||
if not os.path.exists(full_path) or not os.path.isdir(full_path): | ||
raise HTTPException(status_code=404, detail="Directory not found") | ||
|
||
# Ensure path is within the base directory | ||
if not os.path.commonpath([BASE_DIR, full_path]) == BASE_DIR: | ||
raise HTTPException(status_code=403, detail="Access denied") | ||
|
||
@@ -48,4 +49,7 @@ | ||
for item in sorted_items: | ||
item_path = os.path.join(full_path, item) | ||
if os.path.isdir(item_path): | ||
item_path = os.path.join(full_path, item) | ||
# Ensure item_path is within full_path | ||
if not os.path.commonpath([full_path, item_path]) == full_path: | ||
continue | ||
if os.path.isdir(item_path): | ||
folders.append({"name": item, "path": item_path, "type": "dir"}) |
# Check if user is already authenticated | ||
if token == security_token: | ||
# User is already authenticated, redirect to original URL | ||
return RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND) |
Check warning
Code scanning / CodeQL
URL redirection from remote source Medium
user-provided value
redirect_url: str = Form("/"), | ||
): | ||
if token == security_token: | ||
response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND) |
Check warning
Code scanning / CodeQL
URL redirection from remote source Medium
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 6 days ago
To fix the issue, we need to validate the redirect_url
parameter before using it in the RedirectResponse
. A safe approach is to ensure that the redirect_url
is either a relative URL or matches a predefined list of allowed URLs. This can be achieved using Python's urlparse
module to check that the redirect_url
does not include an explicit host name or scheme, ensuring it is a relative path. If the validation fails, the application should redirect to a default safe URL (e.g., the home page).
The changes will be made in the access_web_ui
function where the redirect_url
is used. We will also add a utility function to perform the validation.
-
Copy modified lines R6-R12 -
Copy modified lines R38-R43
@@ -5,4 +5,9 @@ | ||
from medperf.web_ui.common import templates, api_key_cookie | ||
|
||
router = APIRouter() | ||
from urllib.parse import urlparse | ||
|
||
def is_safe_redirect_url(url: str) -> bool: | ||
"""Validate that the URL is a relative path or matches allowed hosts.""" | ||
url = url.replace("\\", "") # Normalize backslashes | ||
parsed = urlparse(url) | ||
return not parsed.netloc and not parsed.scheme | ||
|
||
@@ -32,6 +37,8 @@ | ||
): | ||
if token == security_token: | ||
response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND) | ||
response.set_cookie(key=AUTH_COOKIE_NAME, value=token) | ||
return response | ||
if token == security_token: | ||
if not is_safe_redirect_url(redirect_url): | ||
redirect_url = "/" # Default to home page if validation fails | ||
response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND) | ||
response.set_cookie(key=AUTH_COOKIE_NAME, value=token) | ||
return response | ||
else: |
): | ||
if token == security_token: | ||
response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND) | ||
response.set_cookie(key=AUTH_COOKIE_NAME, value=token) |
Check warning
Code scanning / CodeQL
Failure to use secure cookies Medium
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 6 days ago
To fix the issue, the set_cookie
method should explicitly set the secure
, httponly
, and samesite
attributes. Specifically:
secure=True
ensures the cookie is only sent over HTTPS.httponly=True
prevents JavaScript from accessing the cookie.samesite='Lax'
orsamesite='Strict'
mitigates CSRF risks by restricting cross-origin requests.
The fix involves modifying the set_cookie
call on line 35 to include these attributes.
-
Copy modified line R35
@@ -34,3 +34,3 @@ | ||
response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND) | ||
response.set_cookie(key=AUTH_COOKIE_NAME, value=token) | ||
response.set_cookie(key=AUTH_COOKIE_NAME, value=token, secure=True, httponly=True, samesite='Lax') | ||
return response |
): | ||
if token == security_token: | ||
response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND) | ||
response.set_cookie(key=AUTH_COOKIE_NAME, value=token) |
Check warning
Code scanning / CodeQL
Construction of a cookie using user-supplied input Medium
user-supplied input
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 6 days ago
To fix the issue, we need to ensure that the token
value is sanitized or validated before being used to construct the cookie. A secure approach would involve:
- Validating the
token
against a predefined set of acceptable values or formats. - Ensuring that the
token
does not contain any malicious or unexpected content. - Optionally, encoding the
token
to prevent injection attacks.
In this case, we will validate the token
by ensuring it matches the security_token
and then use a secure, predefined value (e.g., security_token
) to set the cookie instead of the raw user input.
-
Copy modified lines R33-R36
@@ -32,6 +32,6 @@ | ||
): | ||
if token == security_token: | ||
response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND) | ||
response.set_cookie(key=AUTH_COOKIE_NAME, value=token) | ||
return response | ||
if token == security_token: | ||
response = RedirectResponse(url=redirect_url, status_code=status.HTTP_302_FOUND) | ||
response.set_cookie(key=AUTH_COOKIE_NAME, value=security_token) | ||
return response | ||
else: |
|
||
# raw_data_path should be provided if the imported dataset is in dev | ||
if self.dataset.state == "DEVELOPMENT" and ( | ||
self.raw_data_path is None or os.path.exists(self.raw_data_path) | ||
self.raw_data_path is None | ||
or os.path.isfile(self.raw_data_path) |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 5 days ago
To fix the issue, we need to validate the raw_data_path
to ensure it is within a predefined safe root directory. This can be achieved by:
- Normalizing the
raw_data_path
usingos.path.realpath
orPath.resolve
to remove any..
segments or symbolic links. - Verifying that the normalized path starts with a predefined safe root directory (e.g., a directory dedicated to storing raw data).
- Raising an exception if the validation fails.
The validation should be added in the validate_input
method of the ImportDataset
class, as this is where the input parameters are initially checked.
-
Copy modified lines R33-R52
@@ -32,10 +32,22 @@ | ||
# raw_data_path should be provided if the imported dataset is in dev | ||
if self.dataset.state == "DEVELOPMENT" and ( | ||
self.raw_data_path is None | ||
or os.path.isfile(self.raw_data_path) | ||
or (os.path.exists(self.raw_data_path) and os.listdir(self.raw_data_path)) | ||
): | ||
raise InvalidArgumentError( | ||
"Output raw data path must be specified and, the directory should be empty or does not exist." | ||
) | ||
if self.dataset.state == "DEVELOPMENT": | ||
if self.raw_data_path is None: | ||
raise InvalidArgumentError( | ||
"Output raw data path must be specified." | ||
) | ||
|
||
# Normalize and validate raw_data_path | ||
safe_root = config.raw_data_storage # Define a safe root directory in the config | ||
normalized_path = str(Path(self.raw_data_path).resolve()) | ||
if not normalized_path.startswith(str(Path(safe_root).resolve())): | ||
raise InvalidArgumentError( | ||
f"Invalid raw data path: {self.raw_data_path}. Path must be within {safe_root}." | ||
) | ||
|
||
if os.path.isfile(normalized_path) or ( | ||
os.path.exists(normalized_path) and os.listdir(normalized_path) | ||
): | ||
raise InvalidArgumentError( | ||
"Output raw data path must be an empty directory or not exist." | ||
) | ||
|
-
Copy modified line R435
@@ -434,3 +434,3 @@ | ||
try: | ||
ImportDataset.run(dataset_id, input_path, raw_dataset_path) | ||
ImportDataset.run(dataset_id, input_path, raw_dataset_path or config.raw_data_storage) | ||
return_response["status"] = "success" |
self.raw_data_path is None or os.path.exists(self.raw_data_path) | ||
self.raw_data_path is None | ||
or os.path.isfile(self.raw_data_path) | ||
or (os.path.exists(self.raw_data_path) and os.listdir(self.raw_data_path)) |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 5 days ago
To fix the issue, we need to validate the raw_data_path
to ensure it is within a safe root directory and does not allow directory traversal. This can be achieved by:
- Defining a safe root directory for
raw_data_path
. - Normalizing the user-provided path using
os.path.normpath
orPath.resolve
to eliminate any..
segments. - Verifying that the normalized path starts with the safe root directory.
The changes will be made in the validate_input
method of the ImportDataset
class in cli/medperf/commands/dataset/import_dataset.py
. Additionally, we will update the import_dataset
function in cli/medperf/web_ui/datasets/routes.py
to define a safe root directory for raw_data_path
.
-
Copy modified lines R33-R48
@@ -32,10 +32,18 @@ | ||
# raw_data_path should be provided if the imported dataset is in dev | ||
if self.dataset.state == "DEVELOPMENT" and ( | ||
self.raw_data_path is None | ||
or os.path.isfile(self.raw_data_path) | ||
or (os.path.exists(self.raw_data_path) and os.listdir(self.raw_data_path)) | ||
): | ||
raise InvalidArgumentError( | ||
"Output raw data path must be specified and, the directory should be empty or does not exist." | ||
) | ||
if self.dataset.state == "DEVELOPMENT": | ||
if self.raw_data_path is None: | ||
raise InvalidArgumentError("Output raw data path must be specified.") | ||
|
||
# Normalize and validate the raw_data_path | ||
safe_root = config.safe_root # Safe root directory defined in config | ||
normalized_path = Path(self.raw_data_path).resolve() | ||
if not str(normalized_path).startswith(str(Path(safe_root).resolve())): | ||
raise InvalidArgumentError( | ||
f"Invalid raw data path: {self.raw_data_path}. Path must be within {safe_root}." | ||
) | ||
|
||
if normalized_path.is_file() or (normalized_path.exists() and any(normalized_path.iterdir())): | ||
raise InvalidArgumentError( | ||
"Output raw data path must be an empty directory or not exist." | ||
) | ||
|
-
Copy modified lines R435-R440
@@ -434,3 +434,8 @@ | ||
try: | ||
ImportDataset.run(dataset_id, input_path, raw_dataset_path) | ||
# Define a safe root directory for raw_dataset_path | ||
safe_root = config.safe_root # Safe root directory defined in config | ||
if raw_dataset_path: | ||
raw_dataset_path = str(Path(safe_root).joinpath(raw_dataset_path).resolve()) | ||
|
||
ImportDataset.run(dataset_id, input_path, raw_dataset_path) | ||
return_response["status"] = "success" |
self.raw_data_path is None or os.path.exists(self.raw_data_path) | ||
self.raw_data_path is None | ||
or os.path.isfile(self.raw_data_path) | ||
or (os.path.exists(self.raw_data_path) and os.listdir(self.raw_data_path)) |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 5 days ago
To fix the issue, we need to validate the raw_data_path
to ensure it is safe to use. This involves:
- Normalizing the path using
os.path.normpath
orPath.resolve
to remove any..
segments or symbolic links. - Ensuring the normalized path is contained within a predefined safe root directory (e.g., a specific directory for raw data).
- Raising an exception if the path is invalid or outside the allowed directory.
The changes will be made in the validate_input
method of the ImportDataset
class in cli/medperf/commands/dataset/import_dataset.py
.
-
Copy modified lines R33-R53
@@ -32,10 +32,23 @@ | ||
# raw_data_path should be provided if the imported dataset is in dev | ||
if self.dataset.state == "DEVELOPMENT" and ( | ||
self.raw_data_path is None | ||
or os.path.isfile(self.raw_data_path) | ||
or (os.path.exists(self.raw_data_path) and os.listdir(self.raw_data_path)) | ||
): | ||
raise InvalidArgumentError( | ||
"Output raw data path must be specified and, the directory should be empty or does not exist." | ||
) | ||
if self.dataset.state == "DEVELOPMENT": | ||
if self.raw_data_path is None: | ||
raise InvalidArgumentError( | ||
"Output raw data path must be specified." | ||
) | ||
|
||
# Normalize and validate the raw_data_path | ||
safe_root = config.raw_data_storage # Define a safe root directory | ||
normalized_path = Path(self.raw_data_path).resolve() | ||
if not str(normalized_path).startswith(str(Path(safe_root).resolve())): | ||
raise InvalidArgumentError( | ||
f"Invalid raw data path: {self.raw_data_path}. Path must be within {safe_root}." | ||
) | ||
|
||
# Ensure the directory is empty or does not exist | ||
if os.path.isfile(normalized_path) or ( | ||
os.path.exists(normalized_path) and os.listdir(normalized_path) | ||
): | ||
raise InvalidArgumentError( | ||
"Output raw data path must be an empty directory or not exist." | ||
) | ||
|
Notes:
|
Comments/discussions to check:
To be merged before merging to main: #618, which depends on #615