Improve CI reliability, attachment data handling, and safe string utilities#41
Improve CI reliability, attachment data handling, and safe string utilities#41zeel-codder merged 9 commits intomainfrom
Conversation
…achments_data function
…ncoding and decoding
- Change "runs-on" to "ubuntu-latest" in build-test.yml - Change "runs-on" to "ubuntu-latest" in linters.yml Signed-off-by: xieyt <alokmilenium@gmail.com>
…n permissions Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
ci: Update runner type for workflows
…fix/s3-attachments
Enhance attachment handling by retrieving latest file URLs in get_attachments_data function
…fix/surrogate_escape
Fix surrogate handling in safe_str function to ensure proper string encoding and decoding
There was a problem hiding this comment.
Pull request overview
This pull request improves CI/CD reliability, email attachment handling, and string processing utilities. The changes migrate workflows from self-hosted runners to GitHub-hosted ubuntu-latest runners, introduce dynamic attachment URL retrieval to prevent stale links, and refine string encoding/decoding logic.
Key Changes:
- Migrated CI/CD workflows from
self-hostedtoubuntu-latestrunners with explicitcontents: readpermissions for better reliability - Implemented dynamic attachment
file_urlretrieval from the database instead of storing URLs directly in attachment data - Enhanced
safe_strfunction to handleNone, bytes, and string types more explicitly
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
.github/workflows/build-test.yml |
Updated runner from self-hosted to ubuntu-latest and added explicit contents: read permissions |
.github/workflows/linters.yml |
Changed runner from self-hosted to ubuntu-latest for improved portability |
.pre-commit-config.yaml |
Corrected default_stages from non-standard [commit] to standard [pre-commit] |
frappe_gmail_thread/api/activity.py |
Added get_attachments_data function to fetch current file_url values dynamically from the database |
frappe_gmail_thread/utils/helpers.py |
Removed file_url from stored attachment data and refined safe_str function with explicit type handling |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
|
|
||
| def get_attachments_data(email): | ||
| attachments_data = json.loads(email.attachments_data) |
There was a problem hiding this comment.
The get_attachments_data function doesn't handle the case where email.attachments_data is None or an empty string. If an email has no attachments, json.loads() will fail with a TypeError or JSONDecodeError. Consider adding a check to return an empty list when attachments_data is falsy.
| attachments_data = json.loads(email.attachments_data) | |
| attachments_data_raw = getattr(email, "attachments_data", None) | |
| if not attachments_data_raw: | |
| return [] | |
| try: | |
| attachments_data = json.loads(attachments_data_raw) | |
| except (TypeError, json.JSONDecodeError): | |
| return [] |
| for attachment in attachments_data: | ||
| file_doc_name = attachment.get("file_doc_name") | ||
| if file_doc_name: | ||
| file_url = frappe.db.get_value("File", file_doc_name, "file_url") | ||
| attachment["file_url"] = file_url | ||
| return attachments_data |
There was a problem hiding this comment.
When a File document is deleted or doesn't exist, frappe.db.get_value() will return None, but the code doesn't handle this case. The attachment will have file_url set to None, which could cause issues in the frontend or when attempting to access the attachment. Consider either filtering out attachments with missing files or providing a fallback value.
| for attachment in attachments_data: | |
| file_doc_name = attachment.get("file_doc_name") | |
| if file_doc_name: | |
| file_url = frappe.db.get_value("File", file_doc_name, "file_url") | |
| attachment["file_url"] = file_url | |
| return attachments_data | |
| filtered_attachments = [] | |
| for attachment in attachments_data: | |
| file_doc_name = attachment.get("file_doc_name") | |
| if not file_doc_name: | |
| # Skip attachments without a linked File document | |
| continue | |
| file_url = frappe.db.get_value("File", file_doc_name, "file_url") | |
| if not file_url: | |
| # File document is missing or has no URL; skip this attachment | |
| continue | |
| attachment["file_url"] = file_url | |
| filtered_attachments.append(attachment) | |
| return filtered_attachments |
| if isinstance(val, str): | ||
| return val.encode("utf-8", "surrogateescape").decode("utf-8", "replace") | ||
| return val | ||
| return val.encode("utf-8", errors="replace").decode("utf-8") |
There was a problem hiding this comment.
The encoding/decoding operation on strings using errors="replace" may introduce replacement characters (�) for valid unicode strings that contain surrogate pairs or certain special characters. The original implementation with "surrogateescape" was specifically designed to handle surrogates properly. The new approach could corrupt valid unicode data by replacing characters unnecessarily.
| # instead of using file_url from attachments_data, we have to use frappe.get_value to get the latest file_url | ||
| for attachment in attachments_data: | ||
| file_doc_name = attachment.get("file_doc_name") | ||
| if file_doc_name: | ||
| file_url = frappe.db.get_value("File", file_doc_name, "file_url") | ||
| attachment["file_url"] = file_url |
There was a problem hiding this comment.
The function makes individual database queries for each attachment's file_url within a loop. When emails have multiple attachments and multiple emails are being processed in get_linked_gmail_threads, this creates an N+1 query problem. Consider using frappe.db.get_all() or frappe.get_list() to batch fetch all file URLs in a single query for better performance.
| # instead of using file_url from attachments_data, we have to use frappe.get_value to get the latest file_url | |
| for attachment in attachments_data: | |
| file_doc_name = attachment.get("file_doc_name") | |
| if file_doc_name: | |
| file_url = frappe.db.get_value("File", file_doc_name, "file_url") | |
| attachment["file_url"] = file_url | |
| # instead of using file_url from attachments_data, we have to use the latest file_url from the File doc | |
| # Collect all file_doc_names first to avoid N+1 queries | |
| file_doc_names = [ | |
| attachment.get("file_doc_name") | |
| for attachment in attachments_data | |
| if attachment.get("file_doc_name") | |
| ] | |
| file_url_by_name = {} | |
| if file_doc_names: | |
| # Batch fetch all file URLs in a single query | |
| files = frappe.db.get_all( | |
| "File", | |
| filters={"name": ["in", file_doc_names]}, | |
| fields=["name", "file_url"], | |
| ) | |
| file_url_by_name = {f.name: f.file_url for f in files} | |
| for attachment in attachments_data: | |
| file_doc_name = attachment.get("file_doc_name") | |
| if file_doc_name: | |
| # Use the latest file_url from the File doc; default to None if not found | |
| attachment["file_url"] = file_url_by_name.get(file_doc_name) |
Description
This pull request introduces several improvements and fixes across CI/CD workflows and the email handling logic. The main themes are workflow reliability, attachment data accuracy, and safer string handling.
CI/CD Workflow Improvements:
runs-onsetting fromself-hostedtoubuntu-latestin both.github/workflows/build-test.ymland.github/workflows/linters.ymlto improve reliability and portability of CI jobs. [1] [2]permissionsforcontents: readin.github/workflows/build-test.ymlto ensure proper access control for workflows.Email Attachment Handling:
get_attachments_datafunction infrappe_gmail_thread/api/activity.pyto fetch the latestfile_urlfor each attachment from the database, ensuring attachment links are always up-to-date.get_linked_gmail_threadsfunction to use the newget_attachments_datamethod for more reliable attachment information.file_urlin the attachments data withinprocess_attachmentsto avoid stale links and rely on dynamic retrieval instead.String Handling Improvements:
safe_strfunction infrappe_gmail_thread/utils/helpers.pyto handleNone, bytes, and string types more robustly, preventing encoding issues when saving email data.Pre-commit Configuration:
default_stagesin.pre-commit-config.yamlfrom[commit]to[pre-commit]for more standard pre-commit hook behavior.Relevant Technical Choices
Testing Instructions
Additional Information:
Screenshot/Screencast
Checklist
Fixes #