
Fix google calendar and notion errors #768

Open
manojag115 wants to merge 2 commits into MODSetter:main from manojag115:bugs_prod

Conversation

@manojag115
Contributor

@manojag115 manojag115 commented Feb 2, 2026

Description

Fixes 2 production errors affecting connector sync operations and adds debug logging to help diagnose remaining issues.

  • Google Calendar date range error - Fixed "start_date must be strictly before end_date" errors by auto-adjusting end_date when it equals start_date (happens when last_indexed_at is today)
  • Notion API limitation handling - Improved error messages for unsupported block types (transcription, ai_block) to clarify these are known Notion API limitations, not application errors
  • Webcrawler connector - Added detailed logging for "No URLs provided for indexing" errors, including connector name, config keys, and INITIAL_URLS raw value
  • Greenlet/SQLAlchemy errors - Added _handle_greenlet_error() helper to Celery tasks with specific detection and logging for async/sync context issues
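
For readers following along, the date adjustment in the first bullet can be sketched roughly as below. This is an illustration only, not the code in this PR: the function name `ensure_valid_range` and the constant `DATE_FMT` are hypothetical, and the sketch assumes both dates arrive as plain `YYYY-MM-DD` strings.

```python
from datetime import datetime, timedelta

# Assumed input format; the real indexer may receive other formats.
DATE_FMT = "%Y-%m-%d"


def ensure_valid_range(start_date_str: str, end_date_str: str) -> tuple[str, str]:
    """Return (start, end), moving end one day forward when it equals start."""
    if start_date_str == end_date_str:
        # An empty range is rejected upstream, so widen it by one day.
        end_dt = datetime.strptime(end_date_str, DATE_FMT) + timedelta(days=1)
        end_date_str = end_dt.strftime(DATE_FMT)
    return start_date_str, end_date_str
```

With this sketch, a range that is already valid is returned unchanged, so the adjustment only affects the case where last_indexed_at is today.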

Motivation and Context

FIX #
The issues being fixed, or for which debug logging is being added, are:

  • Sync failed: No URLs provided for indexing
  • Sync failed: greenlet_spawn has not been called; can't call await_only() here. Was IO attempted in an unexpected place? (Background on this error at: https://sqlalche.me/e/20/xd2s)
  • Sync failed: Failed to get Notion pages: Block type transcription is not supported via the API.
  • Sync failed: Failed to get Google Calendar events: start_date (2026-01-21T00:00:00+00:00) must be strictly before end_date (2026-01-21T00:00:00+00:00).
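
One possible way to single out the greenlet error from the list above is to match on its message text before logging. The sketch below is an assumption about the general approach, not the `_handle_greenlet_error()` helper from this PR: the names `GREENLET_MARKER`, `is_greenlet_error`, and `log_sync_failure`, and their parameters, are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)

# Substring taken from the production error message quoted above.
GREENLET_MARKER = "greenlet_spawn has not been called"


def is_greenlet_error(exc: BaseException) -> bool:
    """Return True when the exception text matches the greenlet error."""
    return GREENLET_MARKER in str(exc)


def log_sync_failure(exc: BaseException, connector_id: int, task_name: str) -> bool:
    """Log a sync failure; return True if it was the greenlet error."""
    if is_greenlet_error(exc):
        # Flag async/sync context problems separately so they are easy to find.
        logger.error(
            "Async/sync context issue in %s for connector %s: %s",
            task_name, connector_id, exc,
        )
        return True
    logger.error("Sync failed in %s for connector %s: %s", task_name, connector_id, exc)
    return False
```

String matching is used here only to keep the sketch free of third-party imports; a real helper might check the exception type instead.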

Screenshots

API Changes

  • This PR includes API changes

Change Type

  • Bug fix
  • New feature
  • Performance improvement
  • Refactoring
  • Documentation
  • Dependency/Build system
  • Breaking change
  • Other (specify):

Testing Performed

  • Tested locally
  • Manual/QA verification
    Verified locally that the Google Calendar fix works. The Notion failure is expected behavior, so that change only adds basic error handling.

Checklist

  • Follows project coding standards and conventions
  • Documentation updated as needed
  • Dependencies updated as needed
  • No lint/build errors or new warnings
  • All relevant tests are passing

High-level PR Summary

This PR fixes production errors affecting connector sync operations and enhances debugging capabilities. It resolves a Google Calendar date range validation error that occurred when start_date equals end_date by automatically adjusting the end date to be one day later. For Notion, it improves error handling by distinguishing unsupported block types (transcription, ai_block) as known API limitations rather than application errors. Additionally, the PR adds comprehensive debug logging for webcrawler connector URL issues and introduces a helper function to detect and log SQLAlchemy greenlet errors with detailed context for easier troubleshooting.

⏱️ Estimated Review Time: 15-30 minutes

💡 Review Order Suggestion
Order File Path
1 surfsense_backend/app/tasks/connector_indexers/base.py
2 surfsense_backend/app/tasks/connector_indexers/google_calendar_indexer.py
3 surfsense_backend/app/tasks/connector_indexers/notion_indexer.py
4 surfsense_backend/app/tasks/connector_indexers/webcrawler_indexer.py
5 surfsense_backend/app/tasks/celery_tasks/connector_tasks.py

@vercel

vercel bot commented Feb 2, 2026

@manojag115 is attempting to deploy a commit to the Rohan Verma's projects Team on Vercel.

A member of the Team first needs to authorize it.

@recurseml recurseml bot left a comment

Review by RecurseML

🔍 Review performed on 6c94ffe..48e6466

Severity | Location | Issue
High | surfsense_backend/app/tasks/connector_indexers/base.py:171 | Unhandled datetime parsing error
High | surfsense_backend/app/tasks/connector_indexers/google_calendar_indexer.py:229 | Unhandled datetime parsing error

✅ Files analyzed, no issues (3)

surfsense_backend/app/tasks/celery_tasks/connector_tasks.py
surfsense_backend/app/tasks/connector_indexers/notion_indexer.py
surfsense_backend/app/tasks/connector_indexers/webcrawler_indexer.py

"adjusting end date to next day to ensure valid date range"
)
# Parse end_date and add 1 day
end_dt = datetime.strptime(end_date_str, "%Y-%m-%d")

Critical runtime error: datetime.strptime() will crash with ValueError if end_date_str is not in 'YYYY-MM-DD' format. This occurs when both start_date and end_date are provided by the user in a non-standard format (lines 122-123 return them as-is without validation). If these equal dates are in ISO format (e.g., '2024-01-21T00:00:00+00:00') or any other format, the code will crash when trying to parse them at line 171.

The flow: User provides both dates → Line 123 returns them without validation → Lines 165-171 detect they're equal and try to parse → ValueError crash if format is not 'YYYY-MM-DD'.

Fix: Add try-except around datetime.strptime() or validate/normalize date format before this check.
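
A minimal sketch of what such a fix could look like, assuming the only two formats that need handling are `YYYY-MM-DD` and ISO 8601 timestamps. The function name `bump_end_date` is hypothetical and is not taken from the PR.

```python
from datetime import datetime, timedelta


def bump_end_date(end_date_str: str) -> str:
    """Add one day to end_date_str, accepting 'YYYY-MM-DD' or ISO 8601 input."""
    try:
        end_dt = datetime.strptime(end_date_str, "%Y-%m-%d")
    except ValueError:
        # Fall back to ISO 8601, e.g. '2024-01-21T00:00:00+00:00'.
        # fromisoformat raises ValueError itself if this also fails.
        end_dt = datetime.fromisoformat(end_date_str)
    return (end_dt + timedelta(days=1)).strftime("%Y-%m-%d")
```

Input that matches neither format still raises ValueError, so a caller would need to decide whether to skip the adjustment or fail the sync in that case.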


Contributor

@manojag115, can you check this?

"adjusting end date to next day to ensure valid date range"
)
# Parse end_date and add 1 day
end_dt = datetime.strptime(end_date_str, "%Y-%m-%d")

Critical runtime error: datetime.strptime() will crash with ValueError if end_date_str is not in 'YYYY-MM-DD' format. This Google Calendar indexer has its own date calculation logic (lines 179-218) that returns user-provided dates directly at lines 217-218 without format validation. If a user provides dates in ISO format (e.g., '2024-01-21T00:00:00+00:00') or any other non-standard format, and they happen to be equal, the code will crash at line 229 when trying to parse them with the strict 'YYYY-MM-DD' format.

The flow: User provides both dates in non-standard format → Line 218 assigns them as-is → Line 223 detects they're equal → Line 229 tries to parse → ValueError crash.

Fix: Add try-except around datetime.strptime() or validate/normalize date format before this check.



@AnishSarkar22
Contributor

AnishSarkar22 commented Feb 2, 2026

@manojag115 If a Notion page contains transcription/ai blocks, will it still skip those blocks and index the other available blocks?

The Google Calendar fix should already work. Can you please check search_source_connectors_routes.py, starting from line 742? I might be wrong, so please let me know.

@manojag115
Contributor Author

> @manojag115 If a Notion page contains transcription/ai blocks, will it still skip those blocks and index the other available blocks?
>
> The Google Calendar fix should already work. Can you please check search_source_connectors_routes.py, starting from line 742? I might be wrong, so please let me know.

@AnishSarkar22 Good catch on the Google Calendar fix! You're right, it's already handled in search_source_connectors_routes for the manual sync path. My changes to base.py and google_calendar_indexer.py are redundant for route-triggered syncs, but they provide a defense for periodic/scheduled syncs that may call the indexer directly via calculate_date_range. Happy to remove them if you prefer to keep it DRY.

For Notion: Yes, it does skip unsupported blocks and continue indexing the rest of the page. The handling is in notion_history.py:

  • Known block types (transcription, ai_block) are detected and replaced with a placeholder, then processing continues
  • API errors about unsupported blocks are caught and the page continues with available blocks

The errors we're seeing in prod are likely edge cases where the unsupported block is at the page root and Notion fails the entire blocks.children.list call before returning any blocks. My change just improves the error message clarity to indicate it's a known Notion API limitation, not an application bug.
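
To illustrate the skip-and-continue behavior described above, here is a simplified sketch. It is not the code in notion_history.py: the function name `render_blocks`, the placeholder text, and the flat `{"type": ..., "text": ...}` block shape are all assumptions made for the example (real Notion block objects are nested differently).

```python
# Block types named in this PR as unsupported by the Notion API.
UNSUPPORTED_BLOCK_TYPES = {"transcription", "ai_block"}


def render_blocks(blocks: list[dict]) -> list[str]:
    """Turn blocks into text lines, replacing unsupported blocks with a placeholder."""
    lines = []
    for block in blocks:
        block_type = block.get("type", "unknown")
        if block_type in UNSUPPORTED_BLOCK_TYPES:
            # Keep a marker in the output and carry on with the next block.
            lines.append(f"[Unsupported Notion block: {block_type}]")
            continue
        lines.append(block.get("text", ""))
    return lines
```

This only covers the case where the API returns the block list at all. It does not help when the blocks.children.list call itself fails, which is the edge case the improved error message targets.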

@AnishSarkar22
Contributor

AnishSarkar22 commented Feb 2, 2026

> @AnishSarkar22 Good catch on the Google Calendar fix! You're right, it's already handled in search_source_connectors_routes for the manual sync path. My changes to base.py and google_calendar_indexer.py are redundant for route-triggered syncs, but they provide a defense for periodic/scheduled syncs that may call the indexer directly via calculate_date_range. Happy to remove them if you prefer to keep it DRY.

@manojag115 I think the Google Calendar fix can be removed; it is redundant.

> For Notion: Yes, it does skip unsupported blocks and continue indexing the rest of the page. The handling is in notion_history.py:
>
>   • Known block types (transcription, ai_block) are detected and replaced with a placeholder, then processing continues
>   • API errors about unsupported blocks are caught and the page continues with available blocks
>
> The errors we're seeing in prod are likely edge cases where the unsupported block is at the page root and Notion fails the entire blocks.children.list call before returning any blocks. My change just improves the error message clarity to indicate it's a known Notion API limitation, not an application bug.

Yeah, it's better to show an improved message so the cause is clearer to the user.

Thank you so much for your hard work.

@MODSetter
Owner

@CREDO23 Please review this PR and let me know if we need any changes.

Contributor

@CREDO23 CREDO23 left a comment

@manojag115, please resolve recurseml[bot] comments!

@manojag115
Contributor Author

@CREDO23 the recurseml comments are now fixed.

4 participants