Reduce competitive spam by having sentries always send full transactions to validators #292

ajb · 2022-01-10T16:46:01Z

Competitive spam is a huge problem on Polygon. There are many blocks where over 90% of the transactions are from arbitrageurs trying to get a "backrun", where they land immediately after a target transaction, by sending a transaction with the same exact gas price.

This is not a new thing. Geth has dealt with this before, which is why they implemented ethereum/go-ethereum#21358. In addition, there have been similar issues opened for BSC bnb-chain/bsc#269, and here in the Polygon repo: #209 (cc @ferranbt @moneyoriented)

This has also been highlighted on Twitter by threads such as this one: https://twitter.com/bertcmiller/status/1412579402345586696 from @bertmiller

However, Polygon actually implements the geth PR already! So why does this continue to happen, and why is it so much worse on Polygon than on other chains?

It's not entirely because of the low gas price, as many may think.

The geth solution doesn't work on Polygon because of the sentry / validator setup that most validators have. I will explain more below:

Imagine you are an arbitrage trader and you see a target transaction in the mempool, and immediately broadcast your arbitrage tx at the same gas price.

Now imagine you happen to be connected to the current validator's sentry node.

It actually doesn't give you an advantage in terms of ordering, because the sentry node only broadcasts the full transaction to numDirect := int(math.Sqrt(float64(len(peers)))) of its peers, and sends announcements to the rest of its peers, including its own validator!

This means that validators are essentially ordering transactions in a random order, hence, there is no way to get your desired outcome besides filling the mempool with more and more spam transactions. In fact, as you start to send more transactions, then so do your competitors, and you end up sending as many transactions as the current gas price will allow until it becomes unprofitable.
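As a rough illustration of the split described above, the following sketch divides a peer list into a "full transaction" group of size sqrt(n) and an "announcement only" group. `splitPeers` is a hypothetical helper, and taking the first sqrt(n) entries of the slice is a simplification; it only models the fact that the validator gets no special treatment.

```go
package main

import (
	"fmt"
	"math"
)

// splitPeers models the behaviour described above: roughly sqrt(n) peers
// receive the full transaction, everyone else only receives an announcement.
// Which peers land in the direct group is a simplification here (slice order).
func splitPeers(peers []string) (direct, announce []string) {
	numDirect := int(math.Sqrt(float64(len(peers))))
	return peers[:numDirect], peers[numDirect:]
}

func main() {
	// 49 ordinary peers plus the sentry's own validator.
	peers := make([]string, 0, 50)
	for i := 0; i < 49; i++ {
		peers = append(peers, fmt.Sprintf("peer-%d", i))
	}
	peers = append(peers, "validator")

	direct, announce := splitPeers(peers)
	fmt.Println("full tx sent to:", len(direct), "peers")
	fmt.Println("announcement sent to:", len(announce), "peers")
}
```

With 50 peers only 7 receive the full transaction, so in this model the validator has to request the transaction after seeing the announcement, which adds a round trip and decouples its arrival order from the order in which the sentry saw the transactions.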

There are many different ways to address this, but the changes in this PR would perhaps be the easiest to implement from a theoretical standpoint -- it could just be included in a bor upgrade, with no config change for validators, since they should all have the sentry/validator set up as static peers already.
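The idea can be sketched as follows: static peers always go into the "full transaction" group, and the usual sqrt(n) rule applies to everyone else. This is a simplified model under stated assumptions, not the actual diff; the `peer` type, its `Static` flag, and `splitPeersStaticFirst` are hypothetical names, and whether static peers should count against the sqrt(n) budget is a design choice this sketch answers with "no".

```go
package main

import (
	"fmt"
	"math"
)

// peer is a hypothetical, simplified peer record.
type peer struct {
	ID     string
	Static bool // assumed to be true for the sentry's validator
}

// splitPeersStaticFirst always sends full transactions to static peers and
// applies the sqrt(n) rule to the remaining peers.
func splitPeersStaticFirst(peers []peer) (direct, announce []peer) {
	numDirect := int(math.Sqrt(float64(len(peers))))
	for _, p := range peers {
		switch {
		case p.Static:
			direct = append(direct, p) // static peers always get the full tx
		case numDirect > 0:
			direct = append(direct, p)
			numDirect--
		default:
			announce = append(announce, p)
		}
	}
	return direct, announce
}

func main() {
	peers := make([]peer, 0, 50)
	for i := 0; i < 49; i++ {
		peers = append(peers, peer{ID: fmt.Sprintf("peer-%d", i)})
	}
	peers = append(peers, peer{ID: "validator", Static: true})

	direct, announce := splitPeersStaticFirst(peers)
	fmt.Println("full tx sent to:", len(direct), "peers")
	fmt.Println("announcement sent to:", len(announce), "peers")
}
```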

I'm not totally sure if this is a good idea -- I don't actually run a validator node or understand the exact resource consumption of bor for validators specifically.

But my guess is that if anything, it might decrease the amount of resources needed, since the validators won't have to deal with GetPooledTransactionsMsg / PooledTransactionsMsg if they are simply receiving direct transactions from the sentry.

I am opening this PR to see if there is any interest in addressing this issue, and to see what others think of the proposed solution.

ajb · 2022-01-10T19:23:07Z

Here is a tool for analyzing the spam % in a given block: https://github.com/ajb/polygon-spam-analysis. (It uses the Polygonscan API to find the transaction counts of any smart contracts that are being called.)

One example:

CodeForcer · 2022-01-11T00:28:21Z

@ajb would be really interesting to see a historical analysis of this (maybe over a week's worth of blocks) and what % of the network activity is bot spam. This is worth reporting publicly as spam is a corrupting influence on chain usage metrics

ajb · 2022-01-11T08:10:10Z

@ajb would be really interesting to see a historical analysis of this (maybe over a week's worth of blocks) and what % of the network activity is bot spam. This is worth reporting publicly as spam is a corrupting influence on chain usage metrics

Thanks, I agree.

I have updated the analysis script to use these rules to mark spam:

Contract does not have a verified source on Polygonscan
More than 5 calls to the same contract in the same block
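A minimal sketch of how these two rules could be combined per block is shown below. It is not the code from the analysis repository; the `call` type, the `Verified` field (assumed to be looked up elsewhere, for example from a block explorer), and `countSpam` are hypothetical names for illustration.

```go
package main

import "fmt"

// call is a hypothetical record of one transaction calling a contract.
type call struct {
	To       string // contract address being called
	Verified bool   // whether the contract has verified source (assumed to be looked up elsewhere)
}

// maxCallsPerBlock is the threshold from the rule above: more than 5 calls
// to the same contract in the same block.
const maxCallsPerBlock = 5

// countSpam returns how many calls in a block match both rules:
// the target contract is unverified AND is called more than 5 times.
func countSpam(block []call) int {
	counts := make(map[string]int)
	for _, c := range block {
		counts[c.To]++
	}
	spam := 0
	for _, c := range block {
		if !c.Verified && counts[c.To] > maxCallsPerBlock {
			spam++
		}
	}
	return spam
}

func main() {
	var block []call
	for i := 0; i < 6; i++ {
		block = append(block, call{To: "0xbot", Verified: false})
	}
	for i := 0; i < 4; i++ {
		block = append(block, call{To: "0xdex", Verified: true})
	}
	spam := countSpam(block)
	fmt.Printf("%d of %d txs marked as spam\n", spam, len(block))
}
```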

I'm not prepared to run it on a week's data quite yet, and would probably need some pointers on how to best analyze that amount of data. (I think this would be around 250k blocks or so.)

I did just run it for 2,000 blocks (> 1 hour) and here are some of the results:

Analyzed 2000 blocks. Average spam rate 32%
Spam rate ranged from 0% to 94% (23430104)
Highest spam count was 295 txs in the same block (23431972)

out.csv

ferranbt · 2022-01-11T15:04:44Z

Hello,

First of all, thank you for the work you have put into both the spam analysis tool and the PR.

Unfortunately, as of now I cannot merge this PR. These are the core points of my thinking:

I do think you are right on the problem and the sentry-validator pattern is affecting the ordering.
Though the analysis tool is solid and I think it does represent certain spam in the network, I think it is too coarse and it might be overrepresenting spam to some extent.
Though the PR is really small, it is related to a core part of the sync protocol and it might cause unexpected CPU and memory loads. Besides, the static set of nodes can also be used outside of the sentry-validator pattern.
We already have plans down the line to optimize the sentry architecture that would remove this problem without any change to the Bor/Geth codebase.

Besides, we are working on in-house observability tools like this that will help us to be more proactive in these kinds of situations.

ajb · 2022-01-11T15:56:44Z

Hi @ferranbt, thanks for the detailed response.

Though the PR is really small, it is related to a core part of the sync protocol and it might cause unexpected CPU and memory loads. Besides, the static set of nodes can also be used outside of the sentry-validator pattern.

This is definitely understandable. But if we agree on the core points, then what about a PR that adds a config option for sentries to specify exactly who the validators are?

I think it is too coarse and it might be overrepresenting spam to some extent.

Do you have different parameters that you'd recommend using in order to identify spam? I am happy to adjust the heuristic that I am using in that repository.

In my opinion, the overall spam problem is pretty low, i.e. during "normal" periods, it is around 10-20%.

But during "high traffic" periods, such as market downturns, the spam rate is way higher, sometimes above 50% for long ranges.

We already have plans down the line to optimize the sentry architecture that would remove this problem without any change to the Bor/Geth codebase.

Can you share details of these plans?

ferranbt · 2022-01-11T17:21:59Z

Can you share details of these plans?

Just this week I did a full refactor of the Bor network stack here. Once we have control of the network stack and we are free to make changes (that is, we do not have to worry about upstream Go-ethereum changes), we are open to several options:

1. Add a special type of sentry node. We could PR your change again for this new sentry node instead of static.
2. Remove the current sentry nodes (full nodes) and use reverse proxies (we have to research the feasibility of this). This would also fix the problem you try to solve in the PR.
3. A similar option as 2 but built on top of LibP2P.

Besides, as you also mentioned, I want to make sentry nodes a native entity of the Bor client and make them configurable from a config file or the CLI.

Do you have different parameters that you'd recommend using in order to identify spam?

I am still a bit new to MEV so I cannot give you specific parameters to test. This is the reason why we want to invest in observability, to be more proactive instead of reactive to the problem.

ajb · 2022-01-11T18:21:34Z

Cool, sounds like an awesome update you are working on. I understand and agree that waiting for those changes might make the most sense.

When I get some time, I will work on making the spam analysis more solid and verifying that it is producing the correct output, i.e. not marking any legitimate contract usage as spam. I can also share my definition of "spam" so that it is clearer.

p.s. feel free to close this PR if you'd like

ferranbt · 2022-01-12T14:10:11Z

I can also share my definition of "spam" so that it is clearer.

That sounds perfect to me. The more understanding we have of the problem the better we can target the solution.

p.s. feel free to close this PR if you'd like

Yes, I will be closing the PR for now. We already keep track of this internally.

ajb · 2022-01-12T17:32:47Z

Thanks Ferran! Btw, I'm not affiliated, but this event tomorrow looks like it will discuss this problem if you have the time to tune in: flashbots/mev-research#68 (see "MEV on Polygon in 2021: A Case-Study of MEV on Low-Fee Chains by Supragya Raj")

ajb · 2022-01-13T04:16:18Z

Just published a bunch more data to https://github.com/ajb/polygon-spam-analysis.

I can verify that I've manually confirmed that the labeling of spam vs. not spam is working, at least in terms of my personal definition of spam. I might be missing some bots, too, as I only included contracts that were called at least 5 times in a block.

The high level summary is that when you zoom out to a large time period, the level of spam is not that high, maybe like 5-10% of all transactions and a similar percentage of unique EOAs.

However, whenever there is an arbitrage opportunity (for example), you will get a few blocks in a row that look like this:

Whether or not it's a "problem" I guess just depends on who you are. If you are a user who is trying to get a transaction confirmed during one of these periods, it might be frustrating and definitely leads to a degraded UX.

ethermachine · 2022-01-13T10:38:56Z

What I conclude from this is that MEV on Polygon is exclusive to the validators. These are the fights between 100% validators seeking the same opportunity. They don't need this PR because they surely are already running their own tweaked Bor builds.

Best solution I think would be just open the validator pool to everyone, like all other networks do!

ajb · 2022-01-13T14:46:21Z

What I conclude from this is that MEV on Polygon is exclusive to the validators. These are the fights between 100% validators seeking the same opportunity. They don't need this PR because they surely are already running their own tweaked Bor builds.

Best solution I think would be just open the validator pool to everyone, like all other networks do!

What do you draw this conclusion from? I am not seeing any evidence of this, despite the fact that validators could, of course, capture all MEV themselves if they wanted to.

That said, there are just a handful of validators that own > 50% of the network share, so it's really a matter of whether those specific validators choose to do it.

ethermachine · 2022-01-13T14:59:29Z

What do you draw this conclusion from? I am not seeing any evidence of this, despite the fact that validators could, of course, capture all MEV themselves if they wanted to.

I meant to say that your proposal would directly just facilitate MEV for the validators (the only ones doing MEV here). Instead of having to spam the entire network to ensure the tx is broadcast to their own sentry/validator, a single tx would be enough to guarantee the purpose. Indirectly, all users would benefit because the entire network would not be stressed.

That said, there are just a handful of validators that own > 50% of the network share, so it's really a matter of whether those specific validators choose to do it.

If the network were more distributed and there were a lot of different validators, maybe the spam would not reach the current magnitude, because it would be unlikely that the next block would be mined by you (or by a friend). So each operator would just fire a few (maybe dozens of) txs, instead of hundreds like currently happens.

ethermachine · 2022-01-13T15:07:03Z

You seem to be an analytics guy, so I shouldn't be telling you about the correlation between spammy networks and closed / centralised validation. I don't see spam on other networks and I believe this is the reason.

alexandre-chirouze · 2022-07-12T12:35:46Z

I see that this PR is closed, but I was wondering if any changes were made during that time? I can still see a lot of spam transactions inside some blocks. Can transactions with the same gas price still not be ordered by arrival time because of the sentry configuration?

ajb · 2022-08-07T15:33:56Z

Just this week I did a full refactor of the Bor network stack here. Once we have control of the network stack and we are free to make changes (that is, we do not have to worry about upstream Go-ethereum changes), we are open to several options:

Hey @ferranbt do you still work for Polygon? Were any of these changes eventually implemented? It's been around 7-8 months now.

ajb · 2022-11-30T20:41:03Z

@ferranbt I think it is time to revisit this issue. Wdyt?

ajb · 2022-11-30T20:43:38Z

@temaniarpit27 @Raneet10 @0xsharma

you guys seem to be the active contributors, can you please take a look at this? I think it is still relevant and as needed as ever.

codecov-commenter · 2022-12-03T04:39:21Z

Codecov Report

❗ No coverage uploaded for pull request base (master@fbd2de7).
Patch has no changes to coverable lines.

Additional details and impacted files

@@            Coverage Diff            @@
##             master     #292   +/-   ##
=========================================
  Coverage          ?   56.70%           
=========================================
  Files             ?      578           
  Lines             ?    68316           
  Branches          ?        0           
=========================================
  Hits              ?    38737           
  Misses            ?    26225           
  Partials          ?     3354


temaniarpit27 · 2022-12-03T07:37:43Z

@temaniarpit27 @Raneet10 @0xsharma

you guys seem to be the active contributors, can you please take a look at this? I think it is still relevant and as needed as ever.

Sure. Will check

ajb · 2022-12-03T18:50:10Z

@temaniarpit27 @Raneet10 @0xsharma
you guys seem to be the active contributors, can you please take a look at this? I think it is still relevant and as needed as ever.

Sure. Will check

Thanks. Let me know if you would like some more or updated background info.

I am very confident that this change (or another change that effectively does the same thing - makes sentries always broadcast txs to the validator) would drastically reduce the load on the blockchain during times of high volatility.

github-actions · 2022-12-26T00:17:39Z

This PR is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 14 days.

ajb · 2022-12-26T12:26:46Z

Can we get an updated review from the Polygon team?

temaniarpit27 · 2022-12-26T13:36:24Z

@ajb: the team is currently on global off and we were working on release v0.3.2. We will try to act upon this in the first week of January and try to include it in the version after that. Thanks for your patience.

ajb · 2022-12-26T13:58:43Z

No problem. I just got the “stale” ping from the GitHub issues bot, which is what I was responding to.


JekaMas · 2023-01-13T11:14:07Z

Started work on it. I agree with @ferranbt that the problem can be addressed with a more general solution, although this PR can be researched as a first and easy step in dealing with spam transactions. I believe we can start by introducing spam transaction metrics as a service and then test the PR against this metric.

@ajb Thank you for your help. Really appreciate it. We'll keep the PR updated.

ajb · 2023-01-13T14:15:17Z

@ajb Thank you for your help. Really appreciate it. We'll keep the PR updated.

sounds good, thanks for the update!

ajb · 2023-01-26T23:53:50Z

@JekaMas @ferranbt any updated thoughts on this? (now that the hardfork has shipped)

JekaMas · 2023-01-27T02:57:53Z

@ajb the metric is still on the to-do list for the nearest sprints. HF work is still in progress, hoping to finish this week.

ajb · 2023-01-27T14:34:53Z

What's the metric that you're looking at? I might have some data I can share already. The problem is that this PR would in theory change the behavior that the network is incentivizing, but you won't actually see a change in how much spam the MEV bots are sending until the change is deployed across the network.

Alternatively, you could try to instrument metrics around how much time / memory / ingress / egress / CPU load the validators and sentries spend passing txhashes back and forth to each other, when they could just be sending the full transactions instead. I think this would be a valuable data point to decide whether or not this deserves implementation.

github-actions · 2023-03-02T00:19:40Z

This PR is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 14 days.

ajb · 2023-03-02T01:57:42Z

It's disappointing to see this issue get continuously marked stale... any word on the "MEV working group"? When's the next meeting?

temaniarpit27 · 2023-03-02T07:37:58Z

@ajb thanks for your patience. We are working continuously on exploring MEV and how to take this forward internally. It might take some time but we will make sure to keep you in the loop.

github-actions · 2023-03-24T00:16:35Z

This PR is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 14 days.

temaniarpit27 · 2023-04-04T04:37:53Z

@ajb: Thanks for raising the PR, putting out the results and waiting patiently. We have decided not to merge this PR for now, as this particular change can affect the current network's resource consumption. We don't have any metrics for that and there is no way to simulate this on a full mainnet. We will be revisiting the current validator / sentry architecture (not a priority though) and will look to incorporate this change. Thanks for all the info provided in the PR.

Let me know if you have any concerns or suggestions and I will be happy to discuss that.
cc: @JekaMas @0xsharma

Closing this for now.

always send full transaction to static peers

3a2501c

ferranbt closed this Jan 12, 2022

TxByTime mentioned this pull request May 28, 2022

core: change ordering of txs of equal gas price from arrival time to hash bnb-chain/bsc#915

Closed

pratikspatil024 reopened this Dec 3, 2022

temaniarpit27 changed the base branch from master to develop December 4, 2022 15:50

github-actions bot added the Stale label Dec 26, 2022

temaniarpit27 removed the Stale label Dec 26, 2022

0xKrishna requested a review from a team January 12, 2023 06:49

JekaMas self-requested a review January 13, 2023 11:23

temaniarpit27 added the MEV label Feb 8, 2023

ajb mentioned this pull request Feb 20, 2023

Make txArriveTimeout Configurable w/ CLI Flag #734

Merged

7 tasks

github-actions bot added the Stale label Mar 2, 2023

temaniarpit27 removed the Stale label Mar 2, 2023

github-actions bot added the Stale label Mar 24, 2023

pratikspatil024 removed the Stale label Mar 24, 2023

temaniarpit27 closed this Apr 4, 2023
