Skip to content

Improve CI reliability, attachment data handling, and safe string utilities#41

Merged
zeel-codder merged 9 commits intomainfrom
develop
Jan 7, 2026
Merged

Improve CI reliability, attachment data handling, and safe string utilities#41
zeel-codder merged 9 commits intomainfrom
develop

Conversation

@zeel-codder
Copy link
Copy Markdown
Contributor

Description

This pull request introduces several improvements and fixes across CI/CD workflows and the email handling logic. The main themes are workflow reliability, attachment data accuracy, and safer string handling.

CI/CD Workflow Improvements:

  • Changed the runs-on setting from self-hosted to ubuntu-latest in both .github/workflows/build-test.yml and .github/workflows/linters.yml to improve reliability and portability of CI jobs. [1] [2]
  • Added explicit permissions for contents: read in .github/workflows/build-test.yml to ensure proper access control for workflows.

Email Attachment Handling:

  • Introduced the get_attachments_data function in frappe_gmail_thread/api/activity.py to fetch the latest file_url for each attachment from the database, ensuring attachment links are always up-to-date.
  • Updated the get_linked_gmail_threads function to use the new get_attachments_data method for more reliable attachment information.
  • Removed the direct inclusion of file_url in the attachments data within process_attachments to avoid stale links and rely on dynamic retrieval instead.

String Handling Improvements:

  • Refined the safe_str function in frappe_gmail_thread/utils/helpers.py to handle None, bytes, and string types more robustly, preventing encoding issues when saving email data.

Pre-commit Configuration:

  • Changed the default_stages in .pre-commit-config.yaml from [commit] to [pre-commit] for more standard pre-commit hook behavior.

Relevant Technical Choices

Testing Instructions

Additional Information:

Screenshot/Screencast

Checklist

  • I have carefully reviewed the code before submitting it for review.
  • This code is adequately covered by unit tests to validate its functionality.
  • I have conducted thorough testing to ensure it functions as intended.
  • A member of the QA team has reviewed and tested this PR (To be checked by QA or code reviewer)

Fixes #

shivamsn97 and others added 9 commits December 29, 2025 13:34
- Change "runs-on" to "ubuntu-latest" in build-test.yml
- Change "runs-on" to "ubuntu-latest" in linters.yml

Signed-off-by: xieyt <alokmilenium@gmail.com>
…n permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
ci: Update runner type for workflows
Enhance attachment handling by retrieving latest file URLs in get_attachments_data function
Fix surrogate handling in safe_str function to ensure proper string encoding and decoding
@zeel-codder zeel-codder merged commit e9bf4ff into main Jan 7, 2026
13 checks passed
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request improves CI/CD reliability, email attachment handling, and string processing utilities. The changes migrate workflows from self-hosted runners to GitHub-hosted ubuntu-latest runners, introduce dynamic attachment URL retrieval to prevent stale links, and refine string encoding/decoding logic.

Key Changes:

  • Migrated CI/CD workflows from self-hosted to ubuntu-latest runners with explicit contents: read permissions for better reliability
  • Implemented dynamic attachment file_url retrieval from the database instead of storing URLs directly in attachment data
  • Enhanced safe_str function to handle None, bytes, and string types more explicitly

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
.github/workflows/build-test.yml Updated runner from self-hosted to ubuntu-latest and added explicit contents: read permissions
.github/workflows/linters.yml Changed runner from self-hosted to ubuntu-latest for improved portability
.pre-commit-config.yaml Corrected default_stages from non-standard [commit] to standard [pre-commit]
frappe_gmail_thread/api/activity.py Added get_attachments_data function to fetch current file_url values dynamically from the database
frappe_gmail_thread/utils/helpers.py Removed file_url from stored attachment data and refined safe_str function with explicit type handling

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.



def get_attachments_data(email):
attachments_data = json.loads(email.attachments_data)
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get_attachments_data function doesn't handle the case where email.attachments_data is None or an empty string. If an email has no attachments, json.loads() will fail with a TypeError or JSONDecodeError. Consider adding a check to return an empty list when attachments_data is falsy.

Suggested change
attachments_data = json.loads(email.attachments_data)
attachments_data_raw = getattr(email, "attachments_data", None)
if not attachments_data_raw:
return []
try:
attachments_data = json.loads(attachments_data_raw)
except (TypeError, json.JSONDecodeError):
return []

Copilot uses AI. Check for mistakes.
Comment on lines +10 to +15
for attachment in attachments_data:
file_doc_name = attachment.get("file_doc_name")
if file_doc_name:
file_url = frappe.db.get_value("File", file_doc_name, "file_url")
attachment["file_url"] = file_url
return attachments_data
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a File document is deleted or doesn't exist, frappe.db.get_value() will return None, but the code doesn't handle this case. The attachment will have file_url set to None, which could cause issues in the frontend or when attempting to access the attachment. Consider either filtering out attachments with missing files or providing a fallback value.

Suggested change
for attachment in attachments_data:
file_doc_name = attachment.get("file_doc_name")
if file_doc_name:
file_url = frappe.db.get_value("File", file_doc_name, "file_url")
attachment["file_url"] = file_url
return attachments_data
filtered_attachments = []
for attachment in attachments_data:
file_doc_name = attachment.get("file_doc_name")
if not file_doc_name:
# Skip attachments without a linked File document
continue
file_url = frappe.db.get_value("File", file_doc_name, "file_url")
if not file_url:
# File document is missing or has no URL; skip this attachment
continue
attachment["file_url"] = file_url
filtered_attachments.append(attachment)
return filtered_attachments

Copilot uses AI. Check for mistakes.
Comment on lines 134 to +135
if isinstance(val, str):
return val.encode("utf-8", "surrogateescape").decode("utf-8", "replace")
return val
return val.encode("utf-8", errors="replace").decode("utf-8")
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The encoding/decoding operation on strings using errors="replace" may introduce replacement characters (�) for valid unicode strings that contain surrogate pairs or certain special characters. The original implementation with "surrogateescape" was specifically designed to handle surrogates properly. The new approach could corrupt valid unicode data by replacing characters unnecessarily.

Copilot uses AI. Check for mistakes.
Comment on lines +9 to +14
# instead of using file_url from attachments_data, we have to use frappe.get_value to get the latest file_url
for attachment in attachments_data:
file_doc_name = attachment.get("file_doc_name")
if file_doc_name:
file_url = frappe.db.get_value("File", file_doc_name, "file_url")
attachment["file_url"] = file_url
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function makes individual database queries for each attachment's file_url within a loop. When emails have multiple attachments and multiple emails are being processed in get_linked_gmail_threads, this creates an N+1 query problem. Consider using frappe.db.get_all() or frappe.get_list() to batch fetch all file URLs in a single query for better performance.

Suggested change
# instead of using file_url from attachments_data, we have to use frappe.get_value to get the latest file_url
for attachment in attachments_data:
file_doc_name = attachment.get("file_doc_name")
if file_doc_name:
file_url = frappe.db.get_value("File", file_doc_name, "file_url")
attachment["file_url"] = file_url
# instead of using file_url from attachments_data, we have to use the latest file_url from the File doc
# Collect all file_doc_names first to avoid N+1 queries
file_doc_names = [
attachment.get("file_doc_name")
for attachment in attachments_data
if attachment.get("file_doc_name")
]
file_url_by_name = {}
if file_doc_names:
# Batch fetch all file URLs in a single query
files = frappe.db.get_all(
"File",
filters={"name": ["in", file_doc_names]},
fields=["name", "file_url"],
)
file_url_by_name = {f.name: f.file_url for f in files}
for attachment in attachments_data:
file_doc_name = attachment.get("file_doc_name")
if file_doc_name:
# Use the latest file_url from the File doc; default to None if not found
attachment["file_url"] = file_url_by_name.get(file_doc_name)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants