Skip to content

feat: Fix upgrade height, add dry run mode and clarify API URL#2

Draft
mcamou wants to merge 15 commits intomainfrom
feat/dry-run-and-api-url
Draft

feat: Fix upgrade height, add dry run mode and clarify API URL#2
mcamou wants to merge 15 commits intomainfrom
feat/dry-run-and-api-url

Conversation

@mcamou
Copy link
Contributor

@mcamou mcamou commented Oct 23, 2025

This commit introduces a DRY_RUN mode to prevent actual retagging operations, which is useful for testing.

It also renames the RPC_URL environment variable to API_URL to more accurately reflect that it connects to the Cosmos SDK REST API, not the Tendermint RPC endpoint. The type has also been changed to *url.URL for better validation.

Additionally, this commit includes:

  • A fix for an off-by-one error in the upgrade height calculation.
  • Updates to the README to document the new and changed environment variables.

This commit introduces a DRY_RUN mode to prevent actual retagging
operations, which is useful for testing.

It also renames the RPC_URL environment variable to API_URL to more
accurately reflect that it connects to the Cosmos SDK REST API, not the
Tendermint RPC endpoint. The type has also been changed to *url.URL
for better validation.

Additionally, this commit includes:
- A fix for an off-by-one error in the upgrade height calculation.
- Updates to the README to document the new and changed environment
  variables.
@mcamou mcamou marked this pull request as draft October 24, 2025 15:05
@mcamou mcamou force-pushed the feat/dry-run-and-api-url branch from 09d7584 to 04061d2 Compare October 24, 2025 15:33
- Make DockerHub credentials optional when in dry-run mode.
- Add unit tests for the new configuration validation.
- Create a sample Cosmos upgrade proposal file.
- Update Dockerfile builder WORKDIR to /src.
@mcamou mcamou force-pushed the feat/dry-run-and-api-url branch from 04061d2 to bffdac0 Compare October 24, 2025 16:08
The previous fix to use the v1 API endpoint was incomplete. The JSON response structure for v1 proposals is different from v1beta1, requiring updated Go structs and parsing logic to correctly identify passed software upgrade proposals.
The previous fix for the v1 API was incomplete. The upgrade plan is nested within a 'content' object inside the 'messages' array. This commit updates the structs and parsing logic to correctly handle this nested structure, definitively fixing the 'no passed proposals found' bug.
To diagnose API parsing issues, this commit introduces debug logging to print the raw response body from the proposals endpoint. The log level is now configurable via the LOG_LEVEL environment variable, which is set to 'debug' in the docker-compose file for development.
@mcamou mcamou force-pushed the feat/dry-run-and-api-url branch from 6423b0c to f539d58 Compare October 24, 2025 18:09
The previous parsing logic for v1 proposals was incorrect and broke the tests. This commit reverts the faulty logic and structs, leaving only the debug logger in place. The goal is to capture the raw API response in the next E2E run to inform the correct implementation.
@mcamou mcamou force-pushed the feat/dry-run-and-api-url branch from f539d58 to d4186b1 Compare October 24, 2025 18:13
The DAEMON_RESTART_AFTER_UPGRADE variable was incorrectly set to true. For a GitOps/Kubernetes workflow where the orchestrator handles restarts, this must be set to 'false'. This definitive fix ensures Cosmovisor will stop after an upgrade, allowing the orchestrator to take over as intended.
When the chain halts for an upgrade, a race condition can occur where the proposal is found but the subsequent block height query fails. This commit fixes the logic to handle this specific error gracefully. The updater now logs a warning and proceeds to process the upgrade, ensuring the Docker image is retagged as intended.
The previous race condition fix was too aggressive. This commit implements a much safer logic. The updater now stores the last known block height. If an API call fails, it only proceeds with the upgrade if the last known height is within 5 blocks of a known, passed upgrade proposal. This prevents false positives while correctly handling the chain halt race condition. The poll interval has also been reduced to 1s for faster E2E testing.
The updater was failing with 501 errors because it was using a legacy endpoint to get the block height. This commit updates the client to use the correct '/cosmos/base/tendermint/v1beta1/blocks/latest' endpoint. It also corrects the associated structs and integration tests to match the modern API response, definitively fixing the recurring API incompatibility.
The updater logic was flawed, causing it to fail during the chain halt race condition. This commit refactors the check to be more robust. It now only attempts to get the block height *after* finding passed proposals. If the height query fails, it then safely checks if the last known height is within a 5-block threshold of a specific proposal before proceeding. This is the definitive fix for the E2E test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant