feat: Fix upgrade height, add dry run mode and clarify API URL#2
Draft
feat: Fix upgrade height, add dry run mode and clarify API URL#2
Conversation
This commit introduces a DRY_RUN mode to prevent actual retagging operations, which is useful for testing. It also renames the RPC_URL environment variable to API_URL to more accurately reflect that it connects to the Cosmos SDK REST API, not the Tendermint RPC endpoint. The type has also been changed to *url.URL for better validation. Additionally, this commit includes: - A fix for an off-by-one error in the upgrade height calculation. - Updates to the README to document the new and changed environment variables.
09d7584 to
04061d2
Compare
- Make DockerHub credentials optional when in dry-run mode. - Add unit tests for the new configuration validation. - Create a sample Cosmos upgrade proposal file. - Update Dockerfile builder WORKDIR to /src.
04061d2 to
bffdac0
Compare
The previous fix to use the v1 API endpoint was incomplete. The JSON response structure for v1 proposals is different from v1beta1, requiring updated Go structs and parsing logic to correctly identify passed software upgrade proposals.
The previous fix for the v1 API was incomplete. The upgrade plan is nested within a 'content' object inside the 'messages' array. This commit updates the structs and parsing logic to correctly handle this nested structure, definitively fixing the 'no passed proposals found' bug.
To diagnose API parsing issues, this commit introduces debug logging to print the raw response body from the proposals endpoint. The log level is now configurable via the LOG_LEVEL environment variable, which is set to 'debug' in the docker-compose file for development.
6423b0c to
f539d58
Compare
The previous parsing logic for v1 proposals was incorrect and broke the tests. This commit reverts the faulty logic and structs, leaving only the debug logger in place. The goal is to capture the raw API response in the next E2E run to inform the correct implementation.
f539d58 to
d4186b1
Compare
The DAEMON_RESTART_AFTER_UPGRADE variable was incorrectly set to true. For a GitOps/Kubernetes workflow where the orchestrator handles restarts, this must be set to 'false'. This definitive fix ensures Cosmovisor will stop after an upgrade, allowing the orchestrator to take over as intended.
When the chain halts for an upgrade, a race condition can occur where the proposal is found but the subsequent block height query fails. This commit fixes the logic to handle this specific error gracefully. The updater now logs a warning and proceeds to process the upgrade, ensuring the Docker image is retagged as intended.
The previous race condition fix was too aggressive. This commit implements a much safer logic. The updater now stores the last known block height. If an API call fails, it only proceeds with the upgrade if the last known height is within 5 blocks of a known, passed upgrade proposal. This prevents false positives while correctly handling the chain halt race condition. The poll interval has also been reduced to 1s for faster E2E testing.
The updater was failing with 501 errors because it was using a legacy endpoint to get the block height. This commit updates the client to use the correct '/cosmos/base/tendermint/v1beta1/blocks/latest' endpoint. It also corrects the associated structs and integration tests to match the modern API response, definitively fixing the recurring API incompatibility.
The updater logic was flawed, causing it to fail during the chain halt race condition. This commit refactors the check to be more robust. It now only attempts to get the block height *after* finding passed proposals. If the height query fails, it then safely checks if the last known height is within a 5-block threshold of a specific proposal before proceeding. This is the definitive fix for the E2E test.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This commit introduces a DRY_RUN mode to prevent actual retagging operations, which is useful for testing.
It also renames the RPC_URL environment variable to API_URL to more accurately reflect that it connects to the Cosmos SDK REST API, not the Tendermint RPC endpoint. The type has also been changed to *url.URL for better validation.
Additionally, this commit includes: