Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IBC Rate Limiting Audit Fixes #818

Merged
merged 24 commits into from
Jun 13, 2023
Merged

IBC Rate Limiting Audit Fixes #818

merged 24 commits into from
Jun 13, 2023

Conversation

sampocs
Copy link
Collaborator

@sampocs sampocs commented Jun 7, 2023

Context and purpose of the change

This PR addresses two issues from the informal audit (note these are in a private repo):

Diagram

Excalidraw

Issue #​1 - Rolling back outflow

Overview

Currently when packets are sent, the outflow is incremented by the transfer amount; however, if the packet fails or times out, it is not decremented. This opens the door for someone to clog up the rate limit by sending packets with a short timeout.

Solution

The obvious solution here is to decrement the outflow during the timeout or ack failure. But the complexity lies in that we need to make sure that we're in the same rate limit time window as the original packet send, otherwise the outflow will have already been reset to 0 and any additional decrement would be incorrect. In other words, if the rate limit was reset before the packet response was relayed (but after it was sent), then the outflow should not be modified during the ack.

Implementation

This was implemented by saving down the sequence numbers of packets that have been sent in the given rate limit window, and then checking for the presence of that sequence number when handling the acknowledgment or timeout. More specifically, the high level changes were:

SendPacket:           + Stores sequence number for packet
ResetRateLimit:       + Clears all sequence numbers for pending packets
OnAckFailure/Timeout: + Check if sequence exists, and if so, decrement the outflow
                      + Removes sequence number (not entirely necessary, but decreases the size of the state)

Changelog

  • Added function UndoSendPacket which checks if the packet's sequence number exists in the pending store (i.e. the packet was sent in the current time window), and if so, decrements the outflow
  • Added the following keeper functions to keep track of the pending packets:
    • SetPendingSendPacket
    • RemovePendingSendPacket
    • CheckPacketSendDuringCurrentQuota
    • RemoveAllChannelPendingSendPackets
  • Modified SendRateLimitedPacket to store a pending packet
  • Modified ResetRateLimit to remove all pending packets
  • Implemented TimeoutRateLimitedPacket which calls UndoSendPacket
  • Implemented AcknowledgeRateLimitedPacket which checks if the ack failed, and if so, calls UndoSendPacket
  • Added helper function ParsePacketInfo to group some redundant packet parsing logic that was required in each IBC callback

Issue #​2 - Channel bottlenecks

Overview

In Stride's stakeibc flow, a single transfer is issued every epoch with all the aggregate liquid stake deposits from the previous epoch window. This transfer is large enough that it would likely exceed the rate limit quota depending on how strict it was set.

Solution

Transfers to or from Stride module account's or ICA's should be whitelisted such that they are not included in the Inflow/Outflow calculation and cannot be rate limited.

Implementation

This was implemented by adding a list of whitelisted addresses to the store, and adding a check to Send/Receive packet for whether the sender or receiver is in the list of addresses.

Changelog

  • Added keeper functions for the whitelist:
    • AddAddressToWhitelist
    • RemoveAddressFromWhitelist
    • IsAddressWhitelisted
    • GetAllWhitelistedAddresses
  • Modified CheckRateLimitAndUpdateFlow to check if the sender or receiver is in the whitelist, and if so, allows the packet to pass without updating the flow
  • Modified stakeibc to whitelist each module account when registering a host zone

Additional Misc Changelog

  • Updated and added new unit tests
  • Updated readme
  • Added TODO in upgrade handler to white list host zone addresses
  • Added queries for blacklisted denoms and whitelisted addresses

Testing

Ack Failure/Timeout

  • Set the host zones as (GAIA JUNO OSMO) in config.sh
  • Start dockernet
git submodule update --init --recursive
make start-docker
  • Go into dockernet/scripts/ratelimit/run_all_tests.sh and comment out all the tests besides the last one "test_send_failures"
  • In a separate terminal window, run the following to constantly print the ustrd rate limit
while true; do echo "=======================" && build/strided --home dockernet/state/stride1 q ratelimit rate-limit channel-2 --denom ustrd && sleep 1; done
  • Finally, run the ack timeout/failure tests which will simulate a timed out transfer and a failed transfer. In your other terminal window, you'll see the Outflow temporarily increase along the rate limit, and then shortly after get reset back to 0 as the timeout/failure is returned to stride. You'll have to look closely or you might miss it!
bash dockernet/scripts/ratelimit/run_all_tests.sh

Address Whitelist

  • After running the setup above, run the following in a different terminal window to display the atom rate limit flow
while true; do echo "=======================" && build/strided --home dockernet/state/stride1 q ratelimit rate-limit channel-2 --denom ustrd && sleep 1; done
  • Run the dockernet scripts to transfer and liquid stake
bash dockernet/scripts/1.sh
bash dockernet/scripts/2.sh
  • Confirm the rate limit flow was updated from the original transfer in script 1, but was not modified from the epochly transfer

@github-actions github-actions bot added the C:docs label Jun 7, 2023
@sampocs sampocs force-pushed the sam/ratelimit-audit-fix-timeout branch from a07f1cb to 1e5cbd8 Compare June 7, 2023 01:53
@sampocs sampocs added the v10 label Jun 7, 2023
dockernet/config.sh Outdated Show resolved Hide resolved
dockernet/config.sh Outdated Show resolved Hide resolved
switch {
// withdrawal address
case portID == withdrawalAddress:
zoneInfo.WithdrawalAccount = &types.ICAAccount{Address: address, Target: types.ICAAccountType_WITHDRAWAL}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiniest pico nit (definitely don't need to change) I find it slightly cleaner to repeat im.keeper.RatelimitKeeper.AddAddressToWhitelist(ctx, address) on each line

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

x/ratelimit/README.md Outdated Show resolved Hide resolved
x/ratelimit/README.md Show resolved Hide resolved
x/ratelimit/keeper/rate_limit.go Show resolved Hide resolved
func (k Keeper) SetPendingSendPacket(ctx sdk.Context, channelId string, sequence uint64) {
store := prefix.NewStore(ctx.KVStore(k.storeKey), types.PendingSendPacketPrefix)
key := types.GetPendingSendPacketKey(channelId, sequence)
store.Set(key, []byte{1})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the []byte{1} needed for?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All we really need to store is the key, the value doesn't matter as long as it's non-empty.

The implementation can be thought of more as a a set in python, rather than a map


// Check if the sender or receiver are white listed
// If so, return a success without modifying the quota
if k.IsAddressWhitelisted(ctx, packetInfo.Sender) || k.IsAddressWhitelisted(ctx, packetInfo.Receiver) {
Copy link
Contributor

@asalzmann asalzmann Jun 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if k.IsAddressWhitelisted(ctx, packetInfo.Sender) || k.IsAddressWhitelisted(ctx, packetInfo.Receiver) {
if k.IsAddressWhitelisted(ctx, packetInfo.Sender) && direction == types.PACKET_SEND || k.IsAddressWhitelisted(ctx, packetInfo.Receiver) && directino == types.PACKET_RECEIVE {

I think this condition is more specific, it's unnecessarily broad right now (e.g. sending to stride's module account address on a counterparty chain would be whitelisted)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asalzmann thinking about this a bit more, I'm wondering if we should just remove the ICA accounts from the whitelist now that there's the extra clause that specifies the packet type?

The epochly transfer would be a SEND_PACKET with Sender == module account, and the prop 8 transfer would be a RECV_PACKET with Receiver == module account. I'm not sure there's any other cases to worry about

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems cleaner to me. Agree that those are the only two cases. Note: I believe prop 8 rev goes into a separate module account, that will need to be whitelisted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually an even slightly safer design imo would be

Whitelist

  • Both must be true: (SENDER == DELEGATION_ICA) && (RECEIVER == MAIN STAKEIBC MODULE ACCOUNT)
  • Both must be true: (SENDER == WITHDRAWAL_ICA) && (RECEIVER == PROP8 STAKEIBC MODULE ACCOUNT)

With the "either or" whitelist approach and whitelisting the main stakeibc mod acct and the delegation ICA, I think we've disabled rate limits along these paths:

  • transfers from stakeibc module account to malicious cosmoshub address
  • transfers from delegation acct to malicious stride address

x/ratelimit/keeper/packet.go Show resolved Hide resolved
}

// If the ack was successful, remove the pending packet
if ackResponse.Status == icacallbacktypes.AckResponseStatus_SUCCESS {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this check if the packet exists before removing it? e.g. what if it's just a normal ack (from a channel that has no rate limit)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the stored pending packets is only used for rate limiting to determine if the packet was sent in the same rate limit window. This code isn't deleting the actual packet, just the record that it was sent this window. It has no affect on other modules

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, but I think we only set the pending packet if we update a flow

	if updatedFlow {
		k.SetPendingSendPacket(ctx, packetInfo.ChannelID, packet.Sequence)
	}

However, this will always try and delete a packet (unless I'm missing something). If you call Delete on a key that doesn't exist, does it throw an error?

Copy link
Collaborator Author

@sampocs sampocs Jun 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, sorry I misunderstood your original question. Yeah this was intentional since calling Delete on a key that doesn't exist does not error

I figured it was simpler to just call Remove, but if you think it would help readability, I can add the check that it exists first before removing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha! Makes sense, either way seems fine to me

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, I'll just leave it as is

@sampocs
Copy link
Collaborator Author

sampocs commented Jun 11, 2023

Added the testing instructions to the PR description

@riley-stride
Copy link
Contributor

riley-stride commented Jun 11, 2023

I believe autopilot liquid staking routes funds through the stakeibc module account. If we whiltelist that account to not be restricted by rate limiting when it performs protocol-initiated ibc-transfers to the delegation account, have we also unintentionally removed rate limiting from autopilot liquid staking? Or are we only whitelisting the host ICA accounts (seems like the right approach)?

Copy link
Contributor

@riley-stride riley-stride left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix #1 looks pretty good to me, added a couple questions around fix #2

Awesome unit test coverage on all of this

@sampocs
Copy link
Collaborator Author

sampocs commented Jun 12, 2023

I believe autopilot liquid staking routes funds through the stakeibc module account. If we whiltelist that account to not be restricted by rate limiting when it performs protocol-initiated ibc-transfers to the delegation account, have we also unintentionally removed rate limiting from autopilot liquid staking? Or are we only whitelisting the host ICA accounts (seems like the right approach)?

Sorry @riley-stride, just noticed this! I don't think autopilot LS uses a module account - I think it sends it to the user's stride address

store.Delete(key)
}

// Check if an address is currently whitelisted
func (k Keeper) IsAddressWhitelisted(ctx sdk.Context, address string) bool {
// Check if an address pair is currently whitelisted
Copy link
Contributor

@riley-stride riley-stride Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Can we call them routes or paths in some other way make it more clear that they're directional?

For example, if I read that (A,B) was a whitelisted pair, I would assume that both A->B and B->A would be excluded from rate limiting, but in reality only A->B is excluded

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm that's a good point. But I think Address/Account should still be specified, otherwise Route may easily be conflated with a channel, and Path is already used to describe the channel

proto/stride/ratelimit/ratelimit.proto Show resolved Hide resolved
@@ -49,3 +49,8 @@ message RateLimit {
Quota quota = 2;
Flow flow = 3;
}

message WhitelistedAddressPair {
Copy link
Contributor

@asalzmann asalzmann Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we make an assumption here that only one of the two addresses will use the stride... address space. Otherwise, the address pair will whitelist two paths. E.g. for (address-A, address-B), the following paths are valid

From Stride
address-A -> address-B

To Stride
address-A -> address-B

Solutions

  1. Ignore - maybe not relevant since other chains won't have stride addresses?
  2. Add a check when we add a rate limit that both addresses don't have the same prefix
  3. Add a "StrideIsSource" bool to WhitelistedAddressPair

Copy link
Collaborator Author

@sampocs sampocs Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote is option 1, I don't see any reason to add a restriction here since either a) the other chains don't have a stride address and a transfer would not be possible or b) they do have a stride address, in which case having a restriction here would break the transfer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sgtm - should we just add a comment above the data structure linking to this discussion?

@sampocs sampocs merged commit 5383d57 into main Jun 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants