
ajb

@ajb ajb commented Jan 10, 2022

Competitive spam is a huge problem on Polygon. There are many blocks where over 90% of the transactions are from arbitrageurs trying to get a "backrun", where they land immediately after a target transaction, by sending a transaction with exactly the same gas price as the target.

This is not a new thing. Geth has dealt with this before, which is why they implemented ethereum/go-ethereum#21358. In addition, there have been similar issues opened for BSC bnb-chain/bsc#269, and here in the Polygon repo: #209 (cc @ferranbt @moneyoriented)

This has also been highlighted on Twitter by threads such as this one: https://twitter.com/bertcmiller/status/1412579402345586696 from @bertmiller

However, Polygon actually implements the geth PR already! So why does this continue to happen, and why is it so much worse on Polygon than on other chains?

It's not entirely because of the low gas price, as many may think.

The geth solution doesn't work on Polygon because of the sentry/validator setup that most validators have. I will explain more below:

Imagine you are an arbitrage trader and you see a target transaction in the mempool, and immediately broadcast your arbitrage tx at the same gas price.

Now imagine you happen to be connected to the current validator's sentry node.

It actually doesn't give you an advantage in terms of ordering, because the sentry node only broadcasts the full transaction to numDirect := int(math.Sqrt(float64(len(peers)))) of its peers, and sends only announcements (transaction hashes) to the rest of its peers, including its own validator!
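A simplified model of that split, written in Go since bor is a go-ethereum fork, is sketched below. It is a minimal sketch under assumptions: Peer and splitBroadcast are hypothetical names, not bor or geth APIs, and the real broadcast code also skips peers that already know the transaction. In upstream geth the candidate peer list is built by iterating over a map, so which peers fall into the direct subset is effectively arbitrary.

```go
package main

import (
	"fmt"
	"math"
)

// Peer is a hypothetical stand-in for a connected devp2p peer.
type Peer struct {
	ID string
}

// splitBroadcast models the sqrt(n) rule: the first numDirect peers receive
// the full transaction, every remaining peer only receives its hash.
func splitBroadcast(peers []Peer) (direct, announce []Peer) {
	numDirect := int(math.Sqrt(float64(len(peers))))
	return peers[:numDirect], peers[numDirect:]
}

func main() {
	peers := make([]Peer, 50)
	for i := range peers {
		peers[i] = Peer{ID: fmt.Sprintf("peer-%d", i)}
	}
	direct, announce := splitBroadcast(peers)
	fmt.Printf("full tx: %d peers, hash only: %d peers\n", len(direct), len(announce))
}
```

With 50 peers, only 7 receive the full transaction; the other 43, which may include the sentry's own validator, only receive the hash.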

This means that validators are essentially ordering these transactions randomly, so there is no way to get your desired outcome besides filling the mempool with more and more spam transactions. In fact, as you start to send more transactions, so do your competitors, and you all end up sending as many transactions as the current gas price allows, until it becomes unprofitable.

There are many different ways to address this, but the changes in this PR would perhaps be the easiest to implement from a theoretical standpoint -- it could just be included in a bor upgrade, with no config change for validators, since they should all have the sentry/validator set up as static peers already.
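A rough sketch of what "always send full transactions to static peers" could look like follows. It is hypothetical: this is not the diff in this PR, the names are invented, and whether the sqrt(n) rule should count static peers is an assumption (here it is applied only to the non-static peers).

```go
package main

import (
	"fmt"
	"math"
)

// Peer is a hypothetical stand-in for a connected peer. Static marks peers the
// node is configured to keep permanently connected (e.g. a sentry's validator).
type Peer struct {
	ID     string
	Static bool
}

// splitBroadcastStaticFirst sketches the proposed rule: static peers always get
// the full transaction; the usual sqrt(n) rule is applied to everyone else.
func splitBroadcastStaticFirst(peers []Peer) (direct, announce []Peer) {
	var others []Peer
	for _, p := range peers {
		if p.Static {
			direct = append(direct, p)
		} else {
			others = append(others, p)
		}
	}
	numDirect := int(math.Sqrt(float64(len(others))))
	direct = append(direct, others[:numDirect]...)
	return direct, others[numDirect:]
}

func main() {
	peers := []Peer{{ID: "validator", Static: true}}
	for i := 0; i < 16; i++ {
		peers = append(peers, Peer{ID: fmt.Sprintf("public-%d", i)})
	}
	direct, announce := splitBroadcastStaticFirst(peers)
	fmt.Println("full tx:", len(direct), "hash only:", len(announce))
	fmt.Println("validator gets full tx:", direct[0].ID == "validator")
}
```

With one static validator and 16 other peers, the validator always receives the full transaction, 4 public peers also do, and 12 receive only the hash.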

I'm not totally sure this is a good idea -- I don't actually run a validator node, nor do I understand the exact resource consumption of bor for validators specifically.

But my guess is that if anything, it might decrease the amount of resources needed, since the validators won't have to deal with GetPooledTransactionsMsg / PooledTransactionsMsg if they are simply receiving direct transactions from the sentry.

I am opening this PR to see if there is any interest in addressing this issue, and to see what others think of the proposed solution.

@ajb
Author

ajb commented Jan 10, 2022

Here is a tool for analyzing the spam % in a given block: https://github.com/ajb/polygon-spam-analysis. (It uses the Polygonscan API to find the transaction counts of any smart contracts that are being called.)

One example:

[Screenshot: spam analysis output for one block]

@CodeForcer

@ajb it would be really interesting to see a historical analysis of this (maybe over a week's worth of blocks) and what % of network activity is bot spam. This is worth reporting publicly, as spam is a corrupting influence on chain usage metrics.

@ajb
Author

ajb commented Jan 11, 2022

@ajb it would be really interesting to see a historical analysis of this (maybe over a week's worth of blocks) and what % of network activity is bot spam. This is worth reporting publicly, as spam is a corrupting influence on chain usage metrics.

Thanks, I agree.

I have updated the analysis script to use these rules to mark spam:

  • Contract does not have a verified source on Polygonscan
  • More than 5 calls to the same contract in the same block

I'm not prepared to run it on a week's worth of data quite yet, and would probably need some pointers on how best to analyze that amount of data. (I think this would be around 250k blocks.)

I did just run it for 2,000 blocks (> 1 hour) and here are some of the results:

  • Analyzed 2000 blocks. Average spam rate 32%
  • Spam rate ranged from 0% to 94% (block 23430104)
  • Highest spam count was 295 txs in the same block (block 23431972)
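Given per-block results, the three summary figures above could be computed as sketched here. The BlockStat and Summary types are hypothetical, the input below is made-up illustrative data rather than the real run, and "average spam rate" is assumed to be the unweighted mean of per-block rates (a transaction-weighted average would instead divide total spam transactions by total transactions).

```go
package main

import "fmt"

// BlockStat is a hypothetical per-block result from the spam labelling step.
type BlockStat struct {
	Number uint64
	Total  int // transactions in the block
	Spam   int // transactions labelled spam
}

// Rate is the fraction of the block's transactions labelled spam.
func (b BlockStat) Rate() float64 {
	if b.Total == 0 {
		return 0
	}
	return float64(b.Spam) / float64(b.Total)
}

// Summary holds the mean per-block rate, the blocks with the lowest and
// highest rate, and the block with the most spam transactions.
type Summary struct {
	AvgRate          float64
	MinRate, MaxRate BlockStat
	MaxCount         BlockStat
}

func summarize(stats []BlockStat) Summary {
	var s Summary
	if len(stats) == 0 {
		return s
	}
	s.MinRate, s.MaxRate, s.MaxCount = stats[0], stats[0], stats[0]
	sum := 0.0
	for _, b := range stats {
		sum += b.Rate()
		if b.Rate() < s.MinRate.Rate() {
			s.MinRate = b
		}
		if b.Rate() > s.MaxRate.Rate() {
			s.MaxRate = b
		}
		if b.Spam > s.MaxCount.Spam {
			s.MaxCount = b
		}
	}
	s.AvgRate = sum / float64(len(stats))
	return s
}

func main() {
	s := summarize([]BlockStat{
		{Number: 1, Total: 10, Spam: 0},
		{Number: 2, Total: 10, Spam: 5},
		{Number: 3, Total: 4, Spam: 4},
	})
	fmt.Printf("avg %.0f%%, min block %d, max block %d, most spam in block %d\n",
		100*s.AvgRate, s.MinRate.Number, s.MaxRate.Number, s.MaxCount.Number)
}
```

Note that the block with the highest spam rate need not be the block with the most spam transactions, which is why the two figures are reported separately.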

[Screenshot: results of the 2,000-block analysis]

out.csv

@ferranbt
Contributor

Hello,

First of all, thanks for the work you have put into both the spam analysis tool and the PR.

Unfortunately, as of now I cannot merge this PR. These are the core points of my thinking:

  • I do think you are right about the problem, and that the sentry-validator pattern is affecting the ordering.
  • Though the analysis tool is solid and I think it does capture some real spam in the network, I think it is too coarse and might be over-representing spam to some extent.
  • Though the PR is really small, it touches a core part of the sync protocol and might cause unexpected CPU and memory load. Besides, the static set of nodes can also be used outside of the sentry-validator pattern.
  • We already have plans down the line to optimize the sentry architecture that would remove this problem without any change to the Bor/Geth codebase.

Besides, we are working on in-house observability tools like this that will help us be more proactive in these kinds of situations.

@ajb
Author

ajb commented Jan 11, 2022

Hi @ferranbt, thanks for the detailed response.

Though the PR is really small, it touches a core part of the sync protocol and might cause unexpected CPU and memory load. Besides, the static set of nodes can also be used outside of the sentry-validator pattern.

This is definitely understandable. But if we agree on the core points, then what about a PR that adds a config option for sentries to specify exactly who the validators are?

I think it is too coarse and might be over-representing spam to some extent.

Do you have different parameters that you'd recommend using in order to identify spam? I am happy to adjust the heuristic that I am using in that repository.

In my opinion, the overall level of spam is pretty low, i.e. during "normal" periods it is around 10-20%.

But during "high traffic" periods, such as market downturns, the spam rate is way higher, sometimes above 50% for long ranges.

We already have plans down the line to optimize the sentry architecture that would remove this problem without any change to the Bor/Geth codebase.

Can you share details of these plans?

@ferranbt
Contributor

ferranbt commented Jan 11, 2022

Can you share details of these plans?

Just this week I did a full refactor of the Bor network stack here. Once we have control of the network stack and are free to make changes (that is, we do not have to worry about upstream Go-ethereum changes), we are open to several options:

  1. Add a special type of sentry node. We could PR your change again, targeting this new sentry node type instead of static peers.
  2. Remove the current sentry nodes (full nodes) and use reverse proxies (We have to research the feasibility of this). This would also fix the problem you try to solve in the PR.
  3. A similar option as 2 but built on top of LibP2P.

Besides, as you also mentioned, I want to make sentry nodes a native entity of the Bor client and make it configurable from config file or CLI.

Do you have different parameters that you'd recommend using in order to identify spam?

I am still a bit new to MEV, so I cannot give you specific parameters to test. This is the reason why we want to invest in observability, to be proactive instead of reactive about the problem.

@ajb
Author

ajb commented Jan 11, 2022

Cool, sounds like an awesome update you are working on. I understand and agree that waiting for those changes might make the most sense.

When I get some time, I will work on making the spam analysis more solid and verifying that it is producing the correct output, i.e. not marking any legitimate contract usage as spam. I can also share my definition of "spam" so that it is clearer.

p.s. feel free to close this PR if you'd like

@ferranbt
Contributor

I can also share my definition of "spam" so that it is clearer.

That sounds perfect to me. The more understanding we have of the problem the better we can target the solution.

p.s. feel free to close this PR if you'd like

Yes, I will be closing the PR for now. We already keep track of this internally.

@ferranbt ferranbt closed this Jan 12, 2022
@ajb
Author

ajb commented Jan 12, 2022

Thanks Ferran! Btw, I'm not affiliated, but this event tomorrow looks like it will discuss this problem if you have the time to tune in: flashbots/mev-research#68 (see "MEV on Polygon in 2021: A Case-Study of MEV on Low-Fee Chains by Supragya Raj")

@ajb
Author

ajb commented Jan 13, 2022

Just published a bunch more data to https://github.com/ajb/polygon-spam-analysis.

I have manually confirmed that the labeling of spam vs. not spam is working, at least according to my personal definition of spam. I might be missing some bots, too, as I only included contracts that were called at least 5 times in a block.

The high level summary is that when you zoom out to a large time period, the level of spam is not that high, maybe like 5-10% of all transactions and a similar percentage of unique EOAs.

However, whenever there is an arbitrage opportunity (for example), you will get a few blocks in a row that look like this:

[Screenshot: several consecutive blocks during an arbitrage opportunity]

Whether or not it's a "problem", I guess, just depends on who you are. If you are a user trying to get a transaction confirmed during one of these periods, it can be frustrating and definitely leads to a degraded UX.

@ethermachine

What I conclude from this is that MEV on Polygon is exclusive to the validators. These fights are 100% between validators seeking the same opportunity. They don't need this PR because they are surely already running their own tweaked Bor clients.

Best solution I think would be just open the validator pool to everyone, like all other networks do!

@ajb
Author

ajb commented Jan 13, 2022

What I conclude from this is that MEV on Polygon is exclusive to the validators. These fights are 100% between validators seeking the same opportunity. They don't need this PR because they are surely already running their own tweaked Bor clients.

Best solution I think would be just open the validator pool to everyone, like all other networks do!

What are you basing this conclusion on? I am not seeing any evidence of this, although validators could, of course, capture all MEV themselves if they wanted to.

That said, there are just a handful of validators that own > 50% of the network share, so it's really a matter of whether those specific validators choose to do it.

@ethermachine

ethermachine commented Jan 13, 2022

What are you basing this conclusion on? I am not seeing any evidence of this, although validators could, of course, capture all MEV themselves if they wanted to.

I meant to say that your proposal would directly facilitate MEV for the validators (the only ones doing MEV here). Instead of having to spam the entire network to make sure the tx reaches their own sentry/validator, a single tx would be enough. Indirectly, all users would benefit because the entire network would not be stressed.

That said, there are just a handful of validators that own > 50% of the network share, so it's really a matter of whether those specific validators choose to do it.

If the network were more distributed, with many different validators, the spam might not reach its current magnitude, because it would be unlikely that the next block would be mined by you (or by a friend). So each operator would fire just a few (maybe dozens of) txs, instead of hundreds as currently happens.

@ethermachine

You seem to be an analytics guy, so I shouldn't be telling you about the correlation between spammy networks and closed / centralised validation. I don't see spam on other networks and I believe this is the reason.

@alexandre-chirouze

alexandre-chirouze commented Jul 12, 2022

I see that this PR is closed, but I was wondering whether any changes were made since then? I can still see a lot of spam transactions inside some blocks. Are transactions with the same gas price still not being ordered by arrival time because of the sentry configuration?

@ajb
Author

ajb commented Aug 7, 2022

Just this week I did a full refactor of the Bor network stack here. Once we have control of the network stack and are free to make changes (that is, we do not have to worry about upstream Go-ethereum changes), we are open to several options:

Hey @ferranbt do you still work for Polygon? Were any of these changes eventually implemented? It's been around 7-8 months now.

@ajb
Author

ajb commented Nov 30, 2022

@ferranbt I think it is time to revisit this issue. Wdyt?

@ajb
Author

ajb commented Nov 30, 2022

@temaniarpit27 @Raneet10 @0xsharma

you guys seem to be the active contributors, can you please take a look at this? I think it is still relevant and as needed as ever.

@codecov-commenter

Codecov Report

❗ No coverage uploaded for pull request base (master@fbd2de7).
Patch has no changes to coverable lines.

@@            Coverage Diff            @@
##             master     #292   +/-   ##
=========================================
  Coverage          ?   56.70%           
=========================================
  Files             ?      578           
  Lines             ?    68316           
  Branches          ?        0           
=========================================
  Hits              ?    38737           
  Misses            ?    26225           
  Partials          ?     3354           


@temaniarpit27
Contributor

@temaniarpit27 @Raneet10 @0xsharma

you guys seem to be the active contributors, can you please take a look at this? I think it is still relevant and as needed as ever.

Sure. Will check

@ajb
Author

ajb commented Dec 3, 2022

@temaniarpit27 @Raneet10 @0xsharma
you guys seem to be the active contributors, can you please take a look at this? I think it is still relevant and as needed as ever.

Sure. Will check

Thanks. Let me know if you would like some more or updated background info.

I am very confident that this change (or another change that effectively does the same thing - makes sentries always broadcast txs to the validator) would drastically reduce the load on the blockchain during times of high volatility.

@temaniarpit27 temaniarpit27 changed the base branch from master to develop December 4, 2022 15:50
@github-actions

This PR is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 14 days.

@github-actions github-actions bot added the Stale label Dec 26, 2022
@ajb
Author

ajb commented Dec 26, 2022

Can we get an updated review from the Polygon team?

@temaniarpit27
Contributor

@ajb: the team is currently on company-wide time off, and we were working on release v0.3.2. We will try to act on this in the first week of January and try to include it in the following version. Thanks for your patience.

@ajb
Author

ajb commented Dec 26, 2022 via email

@0xKrishna 0xKrishna requested a review from a team January 12, 2023 06:49
@JekaMas
Contributor

JekaMas commented Jan 13, 2023

Started work on it. I agree with @ferranbt that the problem can be addressed with a more general solution, although this PR can be explored as a simple first step toward dealing with spam transactions. I believe we can start by introducing spam transaction metrics as a service and then test the PR against that metric.

@ajb Thank you for your help. Really appreciate it. We'll keep the PR updated.

@JekaMas JekaMas self-requested a review January 13, 2023 11:23
@ajb
Author

ajb commented Jan 13, 2023

@ajb Thank you for your help. Really appreciate it. We'll keep the PR updated.

sounds good, thanks for the update!

@ajb
Author

ajb commented Jan 26, 2023

@JekaMas @ferranbt any updated thoughts on this? (now that the hardfork has shipped)

@JekaMas
Contributor

JekaMas commented Jan 27, 2023

@ajb the metric is still on the to-do list for the upcoming sprints. Hard-fork work is still in progress; we hope to finish it this week.

@ajb
Author

ajb commented Jan 27, 2023

What's the metric that you're looking at? I might have some data I can share already. The problem is that this PR would in theory change the behavior that the network is incentivizing, but you won't actually see a change in how much spam the MEV bots are sending until the change is deployed across the network.

Alternatively, you could instrument metrics around how much time / memory / ingress / egress / CPU load the validators and sentries spend passing tx hashes back and forth to each other, when they could just be sending the full transactions instead. I think this would be a valuable data point for deciding whether this deserves implementation.

@github-actions

github-actions bot commented Mar 2, 2023

This PR is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 14 days.

@github-actions github-actions bot added the Stale label Mar 2, 2023
@ajb
Author

ajb commented Mar 2, 2023

It's disappointing to see this issue get continuously marked stale... any word on the "MEV working group"? When's the next meeting?

@temaniarpit27
Contributor

@ajb thanks for your patience. We are continuously exploring MEV and how to take this forward internally. It might take some time, but we will make sure to keep you in the loop.

@github-actions

This PR is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 14 days.

@temaniarpit27
Contributor

temaniarpit27 commented Apr 4, 2023

@ajb: Thanks for raising the PR, sharing the results and waiting patiently. We have decided not to merge this PR for now, as this particular change could affect resource consumption on the current network. We don't have any metrics for that, and there is no way to simulate this at full mainnet scale. We will be revisiting the current validator-sentry architecture (not a priority, though) and will look to incorporate this change. Thanks for all the info provided in the PR.

Let me know if you have any concerns or suggestions and I will be happy to discuss that.
cc: @JekaMas @0xsharma

Closing this for now.


9 participants