
les: implement new client pool #19745


Merged (1 commit into ethereum:master, Aug 3, 2019)

Conversation

@zsfelfoldi (Contributor) commented Jun 20, 2019

This PR implements a new client pool which replaces (and is based on) the old freeClientPool.
Important features:

  • It puts free and prioritized clients in the same pool using a shared priority model
  • Introduces the concept of positive and negative balance. Positive balance is externally assigned and translates to higher connection priority, while negative balance (which translates to lower priority) is a virtual cost calculated for free service; it is reduced exponentially while the client is not connected (just like recentUsage in the old freeClientPool; see the sketch after this list).
  • Implements prque.LazyQueue, a smart priority queue capable of handling continuously changing priorities and recalculating them on demand. This also makes it possible to charge free clients not only for connection time but also for request cost (which the old pool did not do), which should improve service availability by penalizing heavy requesters.
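A minimal sketch of the negative-balance decay mentioned above; the type, field names and half-life constant are illustrative assumptions rather than the PR's actual code:

package main

import (
	"fmt"
	"math"
	"time"
)

// negBalance is a toy model of the virtual cost accumulated by a free client.
// While the client is disconnected the balance decays exponentially, so past
// usage is gradually forgotten (similar in spirit to recentUsage in the old
// freeClientPool).
type negBalance struct {
	value      float64   // cost units at lastUpdate
	lastUpdate time.Time // time of the last accumulation
}

// halfLife controls how quickly a disconnected client is "forgiven" (made-up value).
const halfLife = time.Hour

// at returns the decayed balance at time t.
func (b negBalance) at(t time.Time) float64 {
	dt := t.Sub(b.lastUpdate).Seconds()
	return b.value * math.Exp2(-dt/halfLife.Seconds())
}

func main() {
	b := negBalance{value: 1000, lastUpdate: time.Now()}
	later := b.lastUpdate.Add(2 * time.Hour)
	fmt.Printf("after two half-lives: %.0f\n", b.at(later)) // prints 250
}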

Note: this PR is based on top of #19543 so only the second commit is relevant.

Note2: this PR removes the half-finished priority handling API (but of course leaves the checkpoint functions in place). The API will be polished a bit more and added in a subsequent PR, probably in a subsequent release. The back end for most of the interesting functionality is already in the new pool, but it would be nice to test its basic functionality a bit before adding new and complex use cases.

@zsfelfoldi (Contributor, Author)

@karalabe @rjl493456442 I removed the csv logger and added metrics instead

@adamschmideg modified the milestones: 1.9.0, 1.9.1 on Jun 27, 2019
func (f *clientPool) finalizeBalance(c *clientInfo, now mclock.AbsTime) {
	c.balanceTracker.stop(now)
	pos, neg := c.balanceTracker.getBalance(now)
	pb := f.getPosBalance(c.id)
Member:

Can we also store the posBalance into the database when a client is dropped? Currently we maintain the positive balances in a queue (which is a cache) and flush items only when the size of the queue exceeds the limit (and of course we flush all positive balances when the client pool is closed). So if the Geth node crashes, we can lose a lot of positive balances.

Contributor Author:

We save them immediately when the balance is increased but I think we can also save at disconnection. Will add it soon.


s.fcManager.SetCapacityLimits(s.freeClientCap, s.maxCapacity, s.freeClientCap*2)
s.clientPool = newClientPool(s.chainDb, s.freeClientCap, 10000, mclock.System{}, func(id enode.ID) { go s.protocolManager.removePeer(peerIdToString(id)) })
s.clientPool.setPriceFactors(priceFactors{0, 1, 1}, priceFactors{0, 1, 1})
Member:

I think the time cost is a bit too high.

E.g. for a paid client (cap = 1,000,000): if the client stays connected for 1s, the time cost is 10^9, while the cost of a normal request is much lower.

Besides, we also take the time cost into account for the estimated balance. The refresh interval in lazyQueue is 10s, so the estimated balance will be much higher than the actual balance. That can mean we always need to dump all items from the estimated queue and push them into the pop queue in order to pick an item.

Contributor Author:

Currently I think there is probably no "theoretically optimal" value for the ratio of time and request costs; the current default is just a wild guess, but I think it is in a reasonable range. As you wrote, 1s of connection with 1M capacity costs 10^9 units. 1M capacity means that 10^9 request cost per second is guaranteed, so if a client uses exactly its guaranteed amount, the request cost equals the time cost. If only a few clients are requesting, they also get extra buffer recharge which allows an up to 10 times higher request rate, so a client sucking data at the maximum rate can incur a request cost up to 10 times higher than the time cost.
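A quick back-of-the-envelope check of the ratio described in the reply above; the unit scaling (cost units per capacity-second) is an assumption used only to reproduce the numbers in the discussion, not taken from the PR's code:

package main

import "fmt"

func main() {
	capacity := uint64(1000000) // paid client capacity from the example above

	// One connected second at this capacity costs 10^9 units (time cost).
	timeCostPerSec := capacity * 1000

	// The same capacity guarantees roughly 10^9 request-cost units per second,
	// so a client using exactly its guaranteed rate pays equal time and request cost.
	guaranteedReqCostPerSec := timeCostPerSec

	// With few competing clients, buffer recharge allows up to ~10x the guaranteed
	// request rate, so request cost can exceed time cost by about an order of magnitude.
	maxBurstReqCostPerSec := 10 * guaranteedReqCostPerSec

	fmt.Println(timeCostPerSec, guaranteedReqCostPerSec, maxBurstReqCostPerSec)
}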

Member:

Okay. Can we add a metric here to collect the average number of items popped per MultiPop call? I have a bad feeling that we will pop the whole queue.
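One possible way to record what MultiPop does, sketched with go-ethereum's metrics package; the metric name and the recording helper are hypothetical, not what the PR actually added:

package main

import "github.com/ethereum/go-ethereum/metrics"

// clientPopMeter would track how many items each MultiPop call removes, so its
// rate reflects how much of the queue is being churned. The name is made up.
var clientPopMeter = metrics.NewRegisteredMeter("les/clientpool/multipop", nil)

// recordMultiPop is a hypothetical hook called after each MultiPop with the
// number of popped items.
func recordMultiPop(popped int) {
	clientPopMeter.Mark(int64(popped))
}

func main() {
	recordMultiPop(3) // example call
}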

@rjl493456442 (Member)

I am wondering whether it's reasonable to evict a paid client (positive balance > 0) from the client pool.
If the positive balance is > 0, the client has paid for the service and the money is not yet used up. It's not user-friendly to kick out a paying client.

@zsfelfoldi (Contributor, Author)

@rjl493456442 positive balances can also be very small. I don't think we should treat someone with 1 wei of positive balance drastically differently from someone with 0. Kicking someone out (even a paid client) is always acceptable because you can always turn off your server, or technical problems can happen. It is always up to the client to decide whether the service was worth the price and whether it wants to pay again. If you must upset one of your paying clients a little because you have limited capacity, then at least be nice to the best clients.
Also, in my model the server can always put a price tag on letting a certain client in. The client will be able to ping the server to ask for its status (maybe a cheap UDP exchange is enough) and the server can reply "pay me this much before attempting to connect". This can save a lot of failed connection attempts and also make it easier for clients to find the currently least overloaded server (a bit similar to the reasoning behind flow control). This client pool priority design makes it easy to determine the amount anyone has to pay in order to connect, and that value is going to be a good signal when there is too much competition to connect to a certain server.

@rjl493456442 (Member)

@zsfelfoldi Thanks for the reply.
Regarding the positive balance: yes, unifying the logic for paid and free clients can clean up the code a bit, so let's do this. Even when a paid client is kicked out, I think its positive balance information can still be kept, so the next time the client connects we can again assign it a "high" priority.

@rjl493456442 (Member)

I am running two Geth instances locally with this PR. If everything works well, I think we can merge it in 1.9.2 :))

@rjl493456442 (Member) left a review:

Please rebase and fix these issues, and also please save the positive balance when we drop a client. Otherwise the PR looks good to me.

	}
	c.priority = false
	if c.capacity != f.freeClientCap {
		f.connectedCapacity += f.freeClientCap - c.capacity
Member:

Does it work?

I have a test snippet:

a := uint64(64)
a += uint64(10) - uint64(32)
fmt.Println(a)

and the compiler rejects it with constant -22 overflows uint64.

Contributor Author:

Sure, at the machine-code level there is absolutely no difference between signed and unsigned addition/subtraction. If you specify constants then the compiler checks for overflow, but that is not the case here. I could cast to int64, but a cast is also an implicit assumption about the value range and I do not think it would make the code look better. If it is an established standard though (or at least done everywhere else in our codebase) then I can add a cast in these cases. In that case I would have to check the rest of my code too, because I think I do this regularly :)
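A quick check of the wrap-around behaviour described above: with runtime values (rather than typed constants) the intermediate underflow cancels out and the result is the intended one:

package main

import "fmt"

func main() {
	a := uint64(64)
	x, y := uint64(10), uint64(32)
	// x - y wraps around to 2^64 - 22; adding it to a wraps back, so the
	// combined expression behaves like 64 + 10 - 32.
	a += x - y
	fmt.Println(a) // prints 42
}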

Contributor Author:

One other example is prque.Prque, which uses a "wrap-around" priority scheme where the comparison is int64(a-b) in order to allow perpetually moving offsets. Binary integers are technically modulo values, and I have always considered it perfectly "legal" to think about and use them that way.
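A small sketch of the wrap-around comparison mentioned above; the helper name and sample values are made up for illustration:

package main

import (
	"fmt"
	"math"
)

// before reports whether priority a is "smaller" than b when priorities are
// interpreted modulo 2^64, i.e. the ordering survives the counter wrapping.
func before(a, b uint64) bool {
	return int64(a-b) < 0
}

func main() {
	near := uint64(math.MaxUint64 - 5) // just below the wrap point
	wrapped := near + 10               // has already wrapped past zero
	fmt.Println(before(near, wrapped)) // true: ordering is preserved
}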

	}
	if setTotal {
		if pb.value+amount > pb.lastTotal {
			pb.value += amount - pb.lastTotal
Member:

Also, a test case here:

pb.value, amount, pb.lastTotal := uint64(5), uint64(10), uint64(14)

Here amount - pb.lastTotal is negative, which with constants would give constant -4 overflows uint64.

Contributor Author:

see above

@karalabe modified the milestones: 1.9.1, 1.9.2 on Jul 23, 2019
@rjl493456442 (Member) left a review:

LGTM

@zsfelfoldi force-pushed the new-clientpool branch 2 times, most recently from 936e30d to b2aa249 on August 2, 2019 at 14:56
@zsfelfoldi merged commit a7de796 into ethereum:master on Aug 3, 2019