Race condition between cosmovisor and upgrade handler #8964

Closed

Description

Summary of Bug

Under certain conditions, the cosmovisor process can terminate the blockchain executable before the upgrade info file is flushed to disk. The upgrade then fails because the upgrade info file is missing.

Version

Tested against 0.42.2
Issue is present in latest/master

Steps to Reproduce

Create a local instance of the network running under cosmovisor. Use a local binary for the upgrade so that there is no latency from downloading a binary. cosmovisor will terminate the process as soon as the log message is received, before the upgrade info file can be persisted to disk.

In the following code, the upgrade-required message is written to the log on line 45, while the upgrade info file is written to disk on line 49.

    ctx.Logger().Error(upgradeMsg)
    // Write the upgrade info to disk. The UpgradeStoreLoader uses this info to perform or skip
    // store migrations.
    err := k.DumpUpgradeInfoToDisk(ctx.BlockHeight(), plan.Name)
    if err != nil {
        panic(fmt.Errorf("unable to write upgrade info to filesystem: %s", err.Error()))
    }

Meanwhile, in the cosmovisor process, the monitor performs an unclean process termination to force an immediate exit as soon as the message appears in the logs:

    upgrade, err := WaitForUpdate(scan)
    if err != nil {
        res.SetError(err)
    } else if upgrade != nil {
        res.SetUpgrade(upgrade)
        // now we need to kill the process
        _ = cmd.Process.Kill()
    }

Remediation

Move the log message at abci.go line 45 so that it is emitted after the k.DumpUpgradeInfoToDisk call on line 49.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned

Labels

C:Cosmovisor, T:Bug