Skip to content

Non-atomicity of operations #68

Closed
@andrey-kuprianov

Description

@andrey-kuprianov

Surfaced from Informal Systems IBC Audit of cosmos-sdk hash cosmos/cosmos-sdk@6cbbe0d

Description

In multiple places throughout the Cosmos-SDK, functions contain several sub-calls that may fail due to various reasons. At the same time, the sub-calls update the shared state, and the functions do not backtrack the state changes done by the first sub-calls if one of the subsequent sub-calls fails.

Consider this example of SubtractCoins():

for _, coin := range amt {
    balance := k.GetBalance(ctx, addr, coin.Denom)
    locked := sdk.NewCoin(coin.Denom, lockedCoins.AmountOf(coin.Denom))
    spendable := balance.Sub(locked)
    _, hasNeg := sdk.Coins{spendable}.SafeSub(sdk.Coins{coin})
    if hasNeg {
        return sdkerrors.Wrapf(sdkerrors.ErrInsufficientFunds, "%s is smaller than %s", spendable, coin)
    }
    newBalance := balance.Sub(coin)
    err := k.SetBalance(ctx, addr, newBalance)
    if err != nil {
        return err
    }
}

The function iterates over the set of coins and for each coin checks whether enough coins are available for the current denomination. The balance is updated for each coin as the loop progresses. Consider the scenario where the balance for the first coin is updated successfully, but for the second coin setting of the balance fails because of the negative amount of coins. The function returns the error, but the balance for the first coin remains updated in the shared state.

This violates one of the most basic assumptions of function atomicity, namely that either

  1. the function updates the state and succeeds, or
  2. the function returns an error, but the shared state is unmodified.

We have found multiple occasions of such non-atomic functions; here are some, besides the example above:

Bank:

  • AddCoins similar issue with the difference it panics when overflow happens for some denomination.
  • SendCoins first subtracts from the sender account and then adds to the receiver. If subtract is successful and add fails the state is changed.
  • DelegateCoins: delegation is processed denomination by denomination: the insufficient funds is check inside the loop allowing state updates even when an error is reached.
  • UndelegateCoins similar scenario.
  • BurnCoins and MintCoins use SubtractCoins and AddCoins.

ICS20:

The problem is that all above functions have implicit assumption on the behavior of the caller. This implicit assumption is that whenever any such function returns an error, the only correct behavior is to propagate this error up the stack.

While this assumption seems indeed to be satisfied by the present Cosmos SDK codebase (with one exception, see the problem scenario below), it is not documented anywhere in the Cosmos SDK developer documentation. There are only hints to this in the form similar to this paragraph in the Cosmos SDK handler documentation:

The Context contains all the necessary information needed to process the msg, as well as a cache-wrapped copy of the latest state. If the msg is succesfully processed, the modified version of the temporary state contained in the ctx will be written to the main state.

Such hints do not constitute enough developer guidance to avoid introducing severe bugs, especially for Cosmos SDK newcomers.

Problem Scenario

We have found one particular place where this non-atomicity has almost led to the real bug. Namely, in ICS20 OnRecvPacket we have two consecutive operations, minting and sending.

// mint new tokens if the source of the transfer is the same chain
if err := k.bankKeeper.MintCoins(
    ctx, types.ModuleName, sdk.NewCoins(voucher),
); err != nil {
    return err
}


// send to receiver
if err := k.bankKeeper.SendCoinsFromModuleToAccount(
    ctx, types.ModuleName, receiver, sdk.NewCoins(voucher),
); err != nil {
    return err
}

If the minting succeeds, but the sending fails, the function returns an error, while the funds are moved to the module account.

This is how this function is called in applications/transfer/module.go:

err := am.keeper.OnRecvPacket(ctx, packet, data)
if err != nil {
    acknowledgement = channeltypes.NewErrorAcknowledgement(err.Error())
}

We see that the error is turned into a negative acknowledgement. If sending of the coins above was to fail with an error, then the negative acknowledgement would be sent, but also the funds would be silently moved to the module account under the hood. We have carefully examined the code of SendCoinsFromModuleToAccount, and found out that its current implementation can only panic, e.g. when the receiver account overflows. But if there was a possibility for it to return an error this scenario would constitute a real problem. Please note that these two functions are probably written by two different developers, and it would be perfectly legitimate for SendCoinsFromModuleToAccount to return an error -- this is how close it comes to being a real bug. Please see also the https://github.com/cosmos/ics/issues/504 for more details on this issue.

Recommendation

  • Short term: properly explain in the developer documentation the implicit assumption of propagating all returned errors up the stack , as well as in the inline documentation for all functions exhibiting non-atomic behavior.
  • Long term: one of the following needs to be done:
    • either make the SDK functions atomic as described above;
    • or introduce a separate explicit step for handlers, say CommitState, that the handler will need to call to write state changes to the store.

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions