Gateway discovery and verification protocol #342

Open
@gusinacio

Problem statement

Today, indexers must manually add the aggregator endpoint for each gateway they serve. We'd like to make onboarding new gateways easier.

However, new gateways may be malicious, so we need to make sure that the cost of an attack is higher than the value an attacker can extract. We currently limit the amount an indexer can lose with the max_amount_willing_to_lose setting, but this value must be scaled according to the number of queries per second. If an attacker targets large indexers, max_amount_willing_to_lose alone isn't enough to make the attack unprofitable.

Expectation proposal

We propose an off-chain agreement protocol in which gateways send a registration request and indexers verify whether receipts can be aggregated.

Gateway TAP state machine

graph TD;
    Unregistered-->Verifying;
    Unregistered-->Blocked;
    Verifying-->Blocked;
    Blocked-->Verifying;
    Verifying-->Allowed;
    Allowed-->Denied;
    Denied-->Allowed;
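
As a rough illustration, the diagram above could be encoded as a small Rust enum with a transition check. This is only a sketch: the names SenderState and can_transition_to are hypothetical and not taken from the existing codebase.

```rust
/// Gateway (sender) states from the diagram above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SenderState {
    Unregistered,
    Verifying,
    Blocked,
    Allowed,
    Denied,
}

impl SenderState {
    /// Returns true when the diagram contains an edge `self --> next`.
    pub fn can_transition_to(self, next: SenderState) -> bool {
        use SenderState::*;
        matches!(
            (self, next),
            (Unregistered, Verifying)
                | (Unregistered, Blocked)
                | (Verifying, Blocked)
                | (Blocked, Verifying)
                | (Verifying, Allowed)
                | (Allowed, Denied)
                | (Denied, Allowed)
        )
    }
}
```

Keeping the edges in one place makes it easy to reject transitions the diagram does not allow, such as Blocked to Allowed without passing through Verifying.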
Behaviour

Currently, our system has only two states: Allowed and Denied. Every gateway that is not in the tap_aggregator_endpoints map is denied as soon as tap-agent tries to create a SenderAccount and can't find the aggregator value.

With this proposal, the system behaves differently depending on which state a sender is in.

Unregistered state

In this state, the sender must include a tap-aggregator header in its first request so that we can register the sender and start aggregating. If a query sent by an unregistered sender doesn't have the header, we deny it right away, possibly with the error "Sender not registered".

We should aggregate the first receipt to verify that the tap-aggregator is working and that we can communicate with it. If it works, we transition to the Verifying state; otherwise, we transition to the Blocked state.
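
The handling of a query from an unregistered sender could be sketched as follows. This is a minimal sketch under assumptions: Outcome, handle_unregistered, and the try_aggregate callback (which stands in for a real call to the tap-aggregator with the first receipt) are hypothetical names; only the "Sender not registered" message comes from the text above.

```rust
/// Result of handling one query from an unregistered sender (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// No tap-aggregator header was sent: the query is denied right away.
    Rejected(&'static str),
    /// The first receipt was aggregated successfully.
    Verifying,
    /// The aggregator could not be reached or the aggregation failed.
    Blocked,
}

/// `aggregator_header` is the value of the tap-aggregator header, if present.
/// `try_aggregate` is a stand-in for aggregating the first receipt against
/// the given endpoint; it returns true on success.
pub fn handle_unregistered<F>(aggregator_header: Option<&str>, try_aggregate: F) -> Outcome
where
    F: FnOnce(&str) -> bool,
{
    match aggregator_header {
        None => Outcome::Rejected("Sender not registered"),
        Some(endpoint) => {
            if try_aggregate(endpoint) {
                Outcome::Verifying
            } else {
                Outcome::Blocked
            }
        }
    }
}
```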

Verifying state

In this state, every RAV request must succeed until we reach a configurable amount, at which point we can trust the sender and transition it to the Allowed state. If any RAV request fails, we transition to the Blocked state.
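
One possible reading of the Verifying rule is sketched below. It assumes the configurable amount is a total aggregated value (the proposal does not say whether it is a value or a number of RAV requests); Verification, VerifyingStep, and trust_threshold are hypothetical names.

```rust
/// What to do after one RAV request while in the Verifying state.
#[derive(Debug, PartialEq, Eq)]
pub enum VerifyingStep {
    StillVerifying,
    Allowed,
    Blocked,
}

/// Tracks how much value has been aggregated successfully so far.
pub struct Verification {
    aggregated: u128,
    /// Assumed to be the configurable amount mentioned above.
    trust_threshold: u128,
}

impl Verification {
    pub fn new(trust_threshold: u128) -> Self {
        Self { aggregated: 0, trust_threshold }
    }

    /// `result` is Ok(value aggregated by this RAV) or Err(()) on failure.
    pub fn on_rav_result(&mut self, result: Result<u128, ()>) -> VerifyingStep {
        match result {
            // Any failed RAV request moves the sender to Blocked.
            Err(()) => VerifyingStep::Blocked,
            Ok(value) => {
                self.aggregated = self.aggregated.saturating_add(value);
                if self.aggregated >= self.trust_threshold {
                    VerifyingStep::Allowed
                } else {
                    VerifyingStep::StillVerifying
                }
            }
        }
    }
}
```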

Blocked state

In this state, we run a retry process with backoff in which we keep trying to aggregate the receipts that we have. If an aggregation succeeds, we transition to the Verifying state.
We should not spend many resources trying to aggregate, so after some backoff period (configurable, but with a small default such as 1 day), we stop trying.

Blocked senders can request an update to their tap-aggregator by sending another query, but that query won't be processed. When the indexer gets a new receipt, the system should tentatively try to verify the gateway again (stopping after the same backoff period).
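
The retry schedule could look like the sketch below. It assumes exponential backoff (the proposal only says "backoff") and treats the 1 day default as a cap on the total time spent retrying; RetryPolicy, initial_delay, and max_window are hypothetical names.

```rust
use std::time::Duration;

/// Hypothetical retry settings for a Blocked sender.
pub struct RetryPolicy {
    /// Delay before the first retry.
    pub initial_delay: Duration,
    /// Total time we are willing to keep retrying (e.g. 1 day).
    pub max_window: Duration,
}

impl RetryPolicy {
    /// Delay before retry number `attempt` (0-based), doubling each time.
    /// Returns None when waiting that long would go past `max_window`
    /// (given the time already `elapsed`), meaning we should stop retrying.
    pub fn next_delay(&self, attempt: u32, elapsed: Duration) -> Option<Duration> {
        let factor = 2u32.checked_pow(attempt)?;
        let delay = self.initial_delay.checked_mul(factor)?;
        let total = elapsed.checked_add(delay)?;
        if total > self.max_window {
            None
        } else {
            Some(delay)
        }
    }
}
```

With an initial delay of 60 seconds, the retries would happen after 60 s, 120 s, 240 s, and so on, until the next wait would exceed the window. A new receipt from the sender would reset both the attempt counter and the elapsed time.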

Allowed state

This is the current normal operation: each sender has a maximum unaggregated fee that we are willing to serve before denying it. If the pending fees exceed the escrow balance, or the unaggregated fee is greater than max_amount_willing_to_lose, we transition to the Denied state.

Denied state

We already have this state in the current system. In this state, we wait for the escrow balance to update, or we retry a RAV request every 30 seconds, which lowers the unaggregated fee when it succeeds. Once the sender is back within limits, we transition to the Allowed state and resume operation.
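
The condition that moves a sender between these last two states can be written as a single predicate. This is a sketch only: SenderFees and should_deny are hypothetical names, amounts are assumed to be plain integers in the smallest token unit, and the limit corresponds to the max_amount_willing_to_lose setting from the problem statement.

```rust
/// Hypothetical snapshot of a sender's fee accounting.
pub struct SenderFees {
    /// Fees not yet redeemed on-chain.
    pub pending_fees: u128,
    /// Current escrow balance of the sender.
    pub escrow_balance: u128,
    /// Fees from receipts not yet aggregated into a RAV.
    pub unaggregated_fees: u128,
}

/// Returns true when the sender should be (or stay) denied.
pub fn should_deny(fees: &SenderFees, max_amount_willing_to_lose: u128) -> bool {
    fees.pending_fees > fees.escrow_balance
        || fees.unaggregated_fees > max_amount_willing_to_lose
}
```

A successful RAV request lowers unaggregated_fees and an escrow deposit raises escrow_balance, so re-evaluating the predicate after either event is enough to decide whether the sender returns to Allowed.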

Tap Aggregator Header

For a gateway to update its tap-aggregator, it must send a receipt signed by one of its signers on the TAP contracts. The query handler already requires a receipt.

Database modification

  • New table responsible for storing tap-aggregator endpoints.
  • Upgrade the deny_list table to a sender_state table.

New error responses

  • We should notify the sender that it must send the header updating the tap-aggregator in its next request.
  • It would be nice to tell the sender whether it is denied or blocked, and for what reason:
    • Low escrow funds
    • Too many pending fees (tap-aggregator not working)
    • Verification failed
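
For illustration, the error cases listed above could be modeled as one enum with user-facing messages. Everything here is an assumption except the "Sender not registered" wording: the SenderError name, the variant names, and the other message strings are hypothetical.

```rust
use std::fmt;

/// Hypothetical error type returned to a sender whose query is not served.
#[derive(Debug, PartialEq, Eq)]
pub enum SenderError {
    /// The sender must send the tap-aggregator header in its next request.
    NotRegistered,
    /// Denied: pending fees exceed the escrow balance.
    LowEscrowFunds,
    /// Denied: too many pending fees, the tap-aggregator may not be working.
    TooManyPendingFees,
    /// Blocked: the verification of the tap-aggregator failed.
    VerificationFailed,
}

impl fmt::Display for SenderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let msg = match self {
            SenderError::NotRegistered => {
                "Sender not registered: send the tap-aggregator header in the next request"
            }
            SenderError::LowEscrowFunds => "Sender denied: low escrow funds",
            SenderError::TooManyPendingFees => "Sender denied: too many pending fees",
            SenderError::VerificationFailed => "Sender blocked: verification failed",
        };
        f.write_str(msg)
    }
}
```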

Alternative considerations

Register route

Instead of sending the aggregator through a header, a dedicated /register route could be used. This would save traffic on the query handler, since the aggregator would arrive in a direct request rather than in a query header. The new request would need to carry a receipt, which would be aggregated and verified before moving the sender to the Verifying state.

Tap Registry

We also have #94 as another possible solution, but it requires a new contract and a new subgraph, which is not worth it at the current time. We'd also still need the same verification protocol to guarantee stability.

Synchronous communication between indexer-service and tap-agent

We use asynchronous communication between these two components by sharing the same database. If synchronous communication turns out to be necessary, we should consider using #84. This deserves a thorough discussion, because indexer-service would then need to know the tap-agent address.
