@lionakhnazarov
Collaborator

Modified pkg/tbtcpg/tbtcpg.go - Generate method
Before: If any task's Run method returned an error, the entire proposal generation would fail and return that error.
After:
• If a task fails with an error, it logs the error with context (action name, wallet PKH, error details)
• Continues to the next task in the checklist instead of returning
• Only returns a NoopProposal if all tasks fail or complete without result
• Never returns an error from Generate (always succeeds with either a proposal or Noop)

Impact
• Resilience: Proposal generation continues even if individual tasks fail
• Better coordination: Leaders can still generate proposals (Noop or other actions) even when some tasks encounter errors
• Improved debugging: Errors are logged with full context for investigation
• No coordination failures: Proposal generation always succeeds, preventing leader routine failures that cause follower timeouts
This ensures that coordination messages are always sent, preventing the "coordination message not received on time" errors you were seeing.

- Updated the error handling in the proposal generation process to log errors for failed tasks while allowing subsequent tasks to continue execution.
- Modified test cases to reflect changes in expected behavior, including scenarios where the first task fails but subsequent tasks succeed, and where all tasks fail.
Member

@lrsaturnino lrsaturnino left a comment

Nice work!

Two things for your evaluation:

  1. The retry loop in generateProposal() is now dead code. Since Generate() never returns an error anymore, the 2-attempt retry with a 1-minute delay never actually retries; it always succeeds on the first try. Consider returning an error only when all tasks fail. That way, single-task failures are still tolerated (your change), but if everything fails at once (for example, a transient RPC outage), the retry loop has a chance to help.

  2. The "all tasks completed without result" log at line 122 now also covers "all tasks errored". Might be worth distinguishing the two so operators don't see "completed without result" when in reality everything failed.

- Enhanced error handling in the proposal generation process to provide detailed error messages when all tasks fail, while allowing for partial successes.
- Updated test cases to cover new scenarios for proposal task outcomes, ensuring robust validation of the changes.
- Upgraded several dependencies in go.mod, including major updates for libp2p and protobuf packages, enhancing compatibility and performance.
- Modified the test for provider addresses in libp2p to account for potential order variations, ensuring robustness against changes in address sorting.
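
One common way to make such an address assertion independent of ordering is to compare sorted copies. The sketch below assumes addresses are plain strings and uses a hypothetical helper name, `sameAddresses`; the actual libp2p test may compare different types or use another approach.

```go
package main

import (
	"fmt"
	"slices"
)

// sameAddresses reports whether got and want contain the same addresses,
// ignoring order. It sorts copies so the callers' slices stay untouched.
func sameAddresses(got, want []string) bool {
	g := slices.Clone(got)
	w := slices.Clone(want)
	slices.Sort(g)
	slices.Sort(w)
	return slices.Equal(g, w)
}

func main() {
	got := []string{"/ip4/10.0.0.2/tcp/3919", "/ip4/127.0.0.1/tcp/3919"}
	want := []string{"/ip4/127.0.0.1/tcp/3919", "/ip4/10.0.0.2/tcp/3919"}
	fmt.Println(sameAddresses(got, want))
}
```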
Member

@lrsaturnino lrsaturnino left a comment

LGTM

@lionakhnazarov lionakhnazarov merged commit 138c970 into threshold-network:main Feb 10, 2026
29 checks passed
2 participants