This implements a simple queuing theory model of a "serverless" setting, where we can scale the number of workers up and down at any point in time.
We make the following assumptions:
- Any worker can be busy, idle, or starting
- Serving time is exponentially distributed with rate parameter R_i
- Startup time is exponentially distributed with rate parameter R_s
- Arrivals follow a Poisson process with rate parameter R_a
These assumptions make the system "memoryless" (Markovian), so the full state reduces to the following:
- Total number of starting servers
- Total number of busy servers
- Total number of idle servers
- Current queue size
We can further reduce the state space by noting that it never makes sense to have idle servers and a nonempty queue at the same time: an idle server would immediately pick up a queued request.
These counts are unbounded in principle, but we keep the state space finite by capping each of them.
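As a minimal sketch of such an encoding (the caps and names here are our own choices, not fixed by the model), we can fold idle count and queue length into a single signed coordinate and enumerate the capped space:

```python
# Hypothetical caps; the real values trade accuracy against state-space size.
MAX_STARTING = 4    # servers currently starting up
MAX_BUSY = 8        # servers currently serving a request
MAX_EXCESS = 8      # bound on both the idle count and the queue length

# Since idle servers and a nonempty queue never coexist, fold both into one
# signed coordinate: excess > 0 means idle servers, excess < 0 means queued
# requests, excess == 0 means neither.
states = [
    (starting, busy, excess)
    for starting in range(MAX_STARTING + 1)
    for busy in range(MAX_BUSY + 1)
    for excess in range(-MAX_EXCESS, MAX_EXCESS + 1)
]
state_index = {s: i for i, s in enumerate(states)}
N = len(states)  # total number of states in the capped chain
```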
In each time step, we can take one of three actions:
- Add another server
- Shut down an idle server
- Do nothing and wait until the next time quantum
We can encode the policy itself as a matrix as well: one row per state, giving a distribution over the three actions.
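One way to make that concrete (shapes and initialization here are our assumptions): keep one row of logits per state and softmax each row into action probabilities, which also makes the policy differentiable for the optimization described below.

```python
import jax
import jax.numpy as jnp

NUM_ACTIONS = 3  # add a server / shut down an idle server / wait

# One row of logits per state (N from the enumeration above); softmax turns
# each row into a probability distribution over the three actions. A
# deterministic policy is the limiting case of a near-one-hot row.
key = jax.random.PRNGKey(0)
policy_logits = jax.random.normal(key, (N, NUM_ACTIONS))
policy = jax.nn.softmax(policy_logits, axis=-1)  # shape (N, 3); rows sum to 1
```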
We simulate this as a Markov chain model: the dynamics become a transition matrix, and power iteration yields the equilibrium state, which is a probability distribution over all states of the model.
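The power iteration itself is only a few lines of JAX. This sketch assumes the row-stochastic transition matrix P of shape (N, N) has already been assembled from the rates R_a, R_s, R_i and the policy; that assembly is elided here.

```python
def stationary_distribution(P, iters=5_000):
    """Power-iterate a row-stochastic matrix P until pi ~= pi @ P."""
    pi = jnp.full(P.shape[0], 1.0 / P.shape[0])  # start from uniform

    def step(pi, _):
        return pi @ P, None

    # lax.scan keeps the loop on-device and differentiable, which matters
    # once we take gradients through the equilibrium below.
    pi, _ = jax.lax.scan(step, pi, None, length=iters)
    return pi
```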
We can compute the utilization and latency of each policy from the equilibrium distribution: take its dot product with the per-state worker counts and queue sizes to get expected values.
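Concretely, these are dot products with the equilibrium distribution. In this sketch the per-state count vectors follow from the enumeration above, and R_a is the arrival rate from the assumptions (placeholder value):

```python
R_a = 1.0  # arrival rate; placeholder value for illustration

# Per-state counts, aligned with the `states` list above.
n_starting = jnp.array([s for s, b, e in states], dtype=jnp.float32)
n_busy     = jnp.array([b for s, b, e in states], dtype=jnp.float32)
n_idle     = jnp.array([max(e, 0) for s, b, e in states], dtype=jnp.float32)
n_queue    = jnp.array([max(-e, 0) for s, b, e in states], dtype=jnp.float32)

def metrics(pi):
    """Expected utilization and queueing delay under equilibrium pi."""
    servers = pi @ (n_starting + n_busy + n_idle)     # mean server count
    utilization = (pi @ n_busy) / jnp.maximum(servers, 1e-9)
    mean_queue = pi @ n_queue                         # mean queue length
    latency = mean_queue / R_a                        # Little's law: W = L / lambda
    return utilization, latency
```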
In order to find the optimal policy, we use a gradient-based optimizer and solve for the policy that minimizes latency + alpha * waste.
Alpha is a tunable multiplier that lets us trace out the latency-utilization tradeoff curve.
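A sketch of that optimization loop, using optax as one possible optimizer library; `build_transition_matrix` is a hypothetical placeholder for the (elided) assembly of the chain from the rates and the current policy:

```python
import optax

ALPHA = 0.1  # tradeoff multiplier; sweep this to trace out the curve

def loss_fn(policy_logits):
    policy = jax.nn.softmax(policy_logits, axis=-1)
    P = build_transition_matrix(policy)   # hypothetical helper, elided here
    pi = stationary_distribution(P)
    _, latency = metrics(pi)
    waste = pi @ (n_idle + n_starting)    # servers paid for but not serving
    return latency + ALPHA * waste

optimizer = optax.adam(learning_rate=1e-2)
opt_state = optimizer.init(policy_logits)

@jax.jit
def train_step(policy_logits, opt_state):
    loss, grads = jax.value_and_grad(loss_fn)(policy_logits)
    updates, opt_state = optimizer.update(grads, opt_state)
    policy_logits = optax.apply_updates(policy_logits, updates)
    return policy_logits, opt_state, loss

for _ in range(1_000):
    policy_logits, opt_state, loss = train_step(policy_logits, opt_state)
```

Re-running this sweep for a range of ALPHA values gives one point per value on the tradeoff curve.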
This is extremely compute-intensive, so we use Modal to run it on hundreds of GPUs, using JAX with GPU support.