I'm trying to get a good estimate of the required cluster size using this guide in Z2JH.
Here are my assumptions:
## Memory
- Max users = 50
- Max expected concurrent users = 60% * max users = 30 (it's unlikely that everyone will be using the hub at the same time)
- Expected memory usage per user:
  - I used nbresuse to estimate a user's memory usage in the notebook.
  - A notebook by itself is about 120 MB. I tried to take it to the extreme, executing all the code in multiple chapters and loading in plenty of datasets, and I was pushing ~300 MB of memory usage.
  - A single chapter was more commonly 100-200 MB (including data and plots).
  - Let's be conservative and assume 300 MB per user (we can revise this downward in the future); a rough cross-check sketch follows this list.
- If a user uses more than the available memory, their notebook kernel will restart and its memory will be flushed.
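As a rough cross-check of the nbresuse numbers, something like this can be run in a notebook cell (a minimal sketch, assuming psutil is available in the single-user image):

```python
# Minimal sketch: report the current kernel's resident memory (RSS).
# Assumes psutil is installed in the single-user image; run this in a notebook cell
# before and after executing a chapter to see how much memory the work adds.
import psutil

rss_mb = psutil.Process().memory_info().rss / 1024 ** 2
print(f"Kernel resident memory: {rss_mb:.0f} MB")
```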
memory = max concurrent users × memory per user + 128 MB (for JupyterHub overhead) = 30 × 300 MB + 128 MB ≈ 9 GB
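The same arithmetic as a quick script (a sketch only; the 60%, 300 MB, and 128 MB figures are just the assumptions above):

```python
# Sketch of the memory estimate above; all constants are assumptions from this issue.
MAX_USERS = 50
CONCURRENT_FRACTION = 0.60      # not everyone is active at once
MEM_PER_USER_MB = 300           # conservative nbresuse-based estimate
HUB_OVERHEAD_MB = 128           # JupyterHub itself

concurrent_users = int(MAX_USERS * CONCURRENT_FRACTION)
total_mb = concurrent_users * MEM_PER_USER_MB + HUB_OVERHEAD_MB
print(f"{concurrent_users} concurrent users -> ~{total_mb / 1024:.1f} GB")
# -> 30 concurrent users -> ~8.9 GB
```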
## CPU
- This is harder to estimate but also less of an issue: if we're running low on CPU, things will just run slower, but nothing will break.
- I took a look at the JupyterHub that Tiffany set up for MDS; it has had a peak CPU usage of just 5% since we started MDS, so it's clearly a very conservative instance.
- That hub is running on an m5.12xlarge (see the comparison table below).
## Summary
To meet the memory and CPU requirements, I'm going to start with 2 x m5.2xlarge instances (the cluster can scale to 4 if needed). I think this is conservative, but we'll see. I'll report back.
Here's a comparison of the two instances I mentioned:
| Instance | vCPUs | ECU | Memory (GiB) |
|---|---|---|---|
| m5.2xlarge | 8 | 37 | 32 |
| m5.12xlarge | 48 | 168 | 192 |
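And a quick sanity check that two m5.2xlarge nodes cover the estimate (a sketch; user pods only, ignoring system daemons and other per-node overhead):

```python
# Sketch: how many 300 MB users fit on the proposed nodes (assumptions from this issue).
NODE_MEM_GB = 32            # m5.2xlarge
NODES = 2
MEM_PER_USER_MB = 300
REQUIRED_GB = 9             # estimate from the memory section

users_per_node = (NODE_MEM_GB * 1024) // MEM_PER_USER_MB
print(f"~{users_per_node} users per node, {users_per_node * NODES} across {NODES} nodes")
print(f"Total memory {NODE_MEM_GB * NODES} GB vs ~{REQUIRED_GB} GB required")
# -> ~109 users per node, 218 across 2 nodes
# -> Total memory 64 GB vs ~9 GB required
```

By that arithmetic even a single m5.2xlarge would cover the ~9 GB estimate, so two nodes give plenty of headroom, which matches the conservative intent above.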