PBS spawns an unlimited number of goroutines #3754

Open
linux019 opened this issue Jun 14, 2024 · 5 comments

@linux019
Contributor

linux019 commented Jun 14, 2024

Under high RPS to the /openrtb2/auction endpoint, PBS spawns more and more goroutines. To cope with the huge amount of traffic, I had to put an RPS limit on the load balancer.
During normal operation of PBS, the number of goroutines is ~5K.

PBS spawns a goroutine:

  • to make a call to a bidder adapter (for each bid request)
  • to call module hooks

I propose limiting the number of concurrently running goroutines and switching to goroutine pools. If we run out of workers, PBS will return HTTP 503 instead of taking on more and more traffic.

With a large number of goroutines, the Go scheduler spends a lot of CPU time picking the next goroutine to run.

I can do this in a fork, but it's a significant change to the PBS core and will cause many merge conflicts.
@bretg
Contributor

bretg commented Jun 28, 2024

@linux019 - what library for goroutine pools are you proposing?

@zhongshixi has offered to provide a pointer to a potential solution.

@bsardo will coordinate a decision.

@SyntaxNode
Contributor

> I had to put an RPS limit on the load balancer.

IMHO, it's good practice to use backpressure-limiting layers in front of Prebid Server. This is the approach we use to avoid the situation described here.

I have no issue with adding a goroutine-limiting feature to PBS, provided it doesn't add latency when unused, i.e. when it is either disabled (if we want to provide that option) or configured with a high limit.

@Slind14

Slind14 commented Jun 28, 2024

I think this is more about reducing the compute spent in mcall (the Go runtime routine used to switch into the scheduler) under normal load.

Handling traffic spikes better would be a side effect.

There should be no need to add a library for this.

bretg moved this from Triage to Research in Prebid Server Prioritization Jul 1, 2024
@linux019
Contributor Author

linux019 commented Jul 2, 2024

@bretg we don't need a third-party library; many of them are overcomplicated. There is a good implementation, https://github.com/panjf2000/ants, that can be taken as an example.

@zhongshixi
Contributor

zhongshixi commented Jul 3, 2024

We use https://github.com/panjf2000/ants.

It works very well in our system since it preallocates the resources for the goroutines you need. Some improvements we made:

  1. We have a separate ants pool in each part of the system, so that not all concurrent execution competes for the same pool.
  2. You don't want to shoot yourself in the foot with a strict limit on the number of goroutines. You need a soft limit plus the capacity to grow beyond it; otherwise execution can get stuck waiting for a goroutine to become available.

Projects
Status: Research
Development

No branches or pull requests

5 participants