feat: added retry policy for mediwiki activities! #16
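Walkthrough
The pull request introduces a RetryPolicy (1-minute interval, maximum of 3 attempts) for each of the four activities called by MediaWikiETLWorkflow.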
Sequence Diagram(s)
sequenceDiagram
participant W as MediaWikiETLWorkflow
participant P as get_hivemind_mediawiki_platforms
participant E as extract_mediawiki
participant T as transform_mediawiki_data
participant L as load_mediawiki_data
W->>P: Call with RetryPolicy (1-min interval, max 3 attempts)
alt On Success
P-->>W: Return platforms
else On Failure
P-->>W: Error after 3 attempts
end
W->>E: Call with RetryPolicy (1-min interval, max 3 attempts)
alt On Success
E-->>W: Return extraction result
else On Failure
E-->>W: Error after 3 attempts
end
W->>T: Call with RetryPolicy (1-min interval, max 3 attempts)
alt On Success
T-->>W: Return transformed data
else On Failure
T-->>W: Error after 3 attempts
end
W->>L: Call with RetryPolicy (1-min interval, max 3 attempts)
alt On Success
L-->>W: Confirmation of load
else On Failure
L-->>W: Error after 3 attempts
end
Actionable comments posted: 0
🧹 Nitpick comments (3)
hivemind_etl/mediawiki/workflows.py (3)
53-56: Consider exponential backoff for long-running extract activity
The retry policy is appropriately added to the extract activity. Because this activity has a 5-day timeout, consider an exponential backoff strategy instead of a fixed 1-minute interval, so that retries of this long-running operation are spaced more efficiently.
 retry_policy=RetryPolicy(
     initial_interval=timedelta(minutes=1),
     maximum_attempts=3,
+    backoff_coefficient=2.0,
+    maximum_interval=timedelta(minutes=10),
 ),
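To make the effect of this suggestion concrete, here is a minimal sketch of the wait schedule it would produce. It assumes the usual Temporal rule that the wait before retry n is initial_interval × backoff_coefficient^(n-1), capped at maximum_interval; retry_delays is a hypothetical helper, not part of temporalio. It is also worth checking the SDK documentation before adopting the change: as far as I recall, RetryPolicy already defaults backoff_coefficient to 2.0, in which case the existing policy is not strictly a fixed interval.

```python
from datetime import timedelta


def retry_delays(initial_interval, backoff_coefficient, maximum_interval, maximum_attempts):
    """Return the wait before each retry (i.e. before attempts 2, 3, ...).

    Hypothetical helper for illustration only. Assumes the delay grows
    geometrically and is capped at maximum_interval.
    """
    delays = []
    for retry_index in range(maximum_attempts - 1):
        delay = initial_interval * (backoff_coefficient ** retry_index)
        delays.append(min(delay, maximum_interval))
    return delays


# Suggested policy: coefficient 2.0, cap of 10 minutes, 3 attempts.
suggested = retry_delays(timedelta(minutes=1), 2.0, timedelta(minutes=10), 3)

# A truly fixed interval corresponds to a coefficient of 1.0.
fixed = retry_delays(timedelta(minutes=1), 1.0, timedelta(minutes=10), 3)

# With more attempts the 10-minute cap starts to matter.
longer = retry_delays(timedelta(minutes=1), 2.0, timedelta(minutes=10), 6)
```

Under these assumptions, three attempts mean only two waits (1 minute, then 2 minutes), so the 10-minute cap never takes effect unless maximum_attempts is also raised.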
64-67: Consider error filtering in retry policy
The transform activity's retry policy looks good, but consider adding non-retryable error types for cases where retrying would not help, such as data validation errors. The activity's implementation already has exception handling, so this complements it well.
 retry_policy=RetryPolicy(
     initial_interval=timedelta(minutes=1),
     maximum_attempts=3,
+    non_retryable_error_types=["ValueError", "KeyError"],
 ),
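One caveat with listing error types by name: the sketch below assumes the policy matches on the exact type-name string, and that for ordinary Python exceptions this string is the exception's class name (my understanding of the Python SDK, not something this PR confirms). is_retryable is a hypothetical stand-in for that matching rule, not a temporalio function.

```python
import json


def is_retryable(error, non_retryable_error_types):
    """Hypothetical stand-in: compare the exception's class name only.

    Assumption: matching is by exact name, with no awareness of
    Python subclass relationships.
    """
    return type(error).__name__ not in non_retryable_error_types


NON_RETRYABLE = ["ValueError", "KeyError"]

# A plain ValueError is filtered out, so it would not be retried.
plain_value_error = is_retryable(ValueError("bad field"), NON_RETRYABLE)

# json.JSONDecodeError subclasses ValueError, but its class name differs,
# so under exact-name matching it would still be retried.
try:
    json.loads("{not json")
except ValueError as exc:
    decode_error_retryable = is_retryable(exc, NON_RETRYABLE)

# Transient failures such as ConnectionError remain retryable.
connection_error_retryable = is_retryable(ConnectionError("reset"), NON_RETRYABLE)
```

If that assumption holds, subclasses such as JSONDecodeError or UnicodeDecodeError would need to be listed explicitly to be treated as non-retryable.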
31-39: Consider extracting the retry policy to a constant
All four activities use the same retry policy configuration. To improve maintainability and ensure consistency, consider extracting it to a shared constant at the top of the file.
 import logging
 from datetime import timedelta
 from temporalio import workflow
 from temporalio.common import RetryPolicy
+# Standard retry policy for MediaWiki ETL activities
+STANDARD_RETRY_POLICY = RetryPolicy(
+    initial_interval=timedelta(minutes=1),
+    maximum_attempts=3,
+)
 with workflow.unsafe.imports_passed_through():
     from hivemind_etl.mediawiki.activities import (
Then in each activity execution:
 retry_policy=RetryPolicy(
     initial_interval=timedelta(minutes=1),
     maximum_attempts=3,
 ),
Would become:
 retry_policy=STANDARD_RETRY_POLICY,
Also applies to: 49-57, 60-68, 71-80
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
hivemind_etl/mediawiki/workflows.py (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
hivemind_etl/mediawiki/workflows.py (1)
hivemind_etl/mediawiki/activities.py (1)
transform_mediawiki_data (75-85)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: ci / test / Test
- GitHub Check: ci / lint / Lint
🔇 Additional comments (3)
hivemind_etl/mediawiki/workflows.py (3)
5-5: Appropriate import added for RetryPolicy
The RetryPolicy import from temporalio.common is correctly added to support the retry capabilities being implemented in the workflow activities.
35-38: Good addition of retry policy for get_hivemind_mediawiki_platforms
Adding retry capabilities to the platform retrieval activity improves resilience against transient failures. The retry policy with 3 maximum attempts and a 1-minute interval is appropriate for this activity, which has a 1-minute timeout.
76-79: Consistent retry policy for load activity
The retry policy for the load activity maintains consistency with the other activities, which is good for maintainability. The 3 maximum attempts with a 1-minute interval aligns well with the 30-minute timeout of this activity.
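To judge how the policy interacts with the timeouts mentioned in these comments, a rough upper bound on wall-clock time per activity can be sketched as follows. It assumes each stated timeout applies per attempt, that the backoff coefficient is 2.0 (which I believe is the SDK default), and that queueing time and any interval cap can be ignored; worst_case_duration is a hypothetical helper.

```python
from datetime import timedelta


def worst_case_duration(attempt_timeout, initial_interval, maximum_attempts,
                        backoff_coefficient=2.0):
    """Hypothetical upper bound: every attempt runs until it times out.

    Assumptions: the timeout is per attempt, waits grow geometrically,
    and no maximum_interval cap or queueing delay is applied.
    """
    waits = sum(
        (initial_interval * backoff_coefficient ** i
         for i in range(maximum_attempts - 1)),
        timedelta(),
    )
    return attempt_timeout * maximum_attempts + waits


one_minute = timedelta(minutes=1)

# Timeouts as stated in the review comments above.
platforms_bound = worst_case_duration(timedelta(minutes=1), one_minute, 3)
load_bound = worst_case_duration(timedelta(minutes=30), one_minute, 3)
extract_bound = worst_case_duration(timedelta(days=5), one_minute, 3)
```

Under these assumptions the bounds are 6 minutes for platform retrieval, 93 minutes for load, and 15 days plus 3 minutes for extract, so the retry waits themselves are negligible next to the extract activity's timeout.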