
Estimating BinderHub cluster size #60

@TomasBeuzen


I'm trying to get a good estimate of the required cluster size using this guide in Z2JH.

Here are my assumptions:

Memory

  • Max users = 50
  • Max expected concurrent users = 60% * max users = 30 (because it is unlikely that everyone will be using it at the same time)
  • Expected memory usage per user:
    • I used nbresuse to estimate a user's memory usage in the notebook (see the sketch after this list).
    • A notebook by itself uses about 120 MB. I tried to take it to the extreme, executing all the code in multiple chapters and loading in plenty of datasets, and was pushing ~300 MB of memory usage.
    • A single chapter was more commonly 100-200 MB (including data and plots).
    • Let's be conservative and assume 300 MB (we can downgrade in the future).
    • If a user uses more than the available amount of memory, their notebook kernel will restart and memory will be flushed.
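
nbresuse shows this number live in the notebook UI; for a quick manual check from a cell, something like the sketch below (assuming psutil is installed, which nbresuse itself relies on) reports the kernel's resident memory:

```python
# Quick manual check of a notebook kernel's memory footprint.
# Run in a notebook cell after loading a chapter's data and figures.
import psutil

rss_mb = psutil.Process().memory_info().rss / 1024**2
print(f"Kernel resident memory: {rss_mb:.0f} MB")
```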

memory = max concurrent users * memory per user + 128 MB (for JupyterHub overhead) = 30 * 300 MB + 128 MB = ~9 GB
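
The same arithmetic in a few lines of Python (every value here is just one of the assumptions listed above, not a measurement of the running cluster):

```python
# Memory estimate from the assumptions above.
max_users = 50
concurrent_fraction = 0.60      # ~60% of users active at the same time
mem_per_user_mb = 300           # conservative per-user estimate from nbresuse
hub_overhead_mb = 128           # JupyterHub overhead

concurrent_users = round(max_users * concurrent_fraction)           # 30
total_gb = (concurrent_users * mem_per_user_mb + hub_overhead_mb) / 1024
print(f"{concurrent_users} concurrent users -> ~{total_gb:.1f} GB")  # ~8.9 GB
```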

CPU

  • This is harder to estimate, but it's also less of an issue: if we're running low on CPU, things will just run slower, but nothing will break.
  • I took a look at the JupyterHub Tiffany set up for MDS: it has had a peak CPU usage of just 5% since we started MDS, so it's clearly a very conservatively sized instance.
  • That JupyterHub is using an m5.12xlarge.

Summary

To meet the memory and CPU requirements I'm going to start with 2 x m5.2xlarge instances (the cluster can scale to 4 if needed). I think this is conservative, but we'll see. I'll report back.

Here's a comparison of the two instances I mentioned:

Instance      CPU   RAM   Memory (GB)
m5.2xlarge      8    37            32
m5.12xlarge    48   168           192
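
As a sanity check on the node count, a rough sketch using the numbers above (the 3 GB per node reserved for the OS and kubelet is an assumption, not a measured figure):

```python
import math

# How many m5.2xlarge workers (32 GB each) does the ~9 GB estimate imply?
required_gb = 9                 # memory estimate from above
node_memory_gb = 32             # m5.2xlarge
system_reserved_gb = 3          # assumed headroom for OS / kubelet per node

min_nodes = math.ceil(required_gb / (node_memory_gb - system_reserved_gb))
print(min_nodes)  # 1 -- so starting with 2 and scaling to 4 leaves plenty of headroom
```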
