feat: Enhance JupyterHub Performance: GPU Acceleration and Time-Slicing Support #277

Merged: 19 commits, Aug 15, 2023

Conversation

lusoal (Contributor) commented Aug 2, 2023

What does this PR do?

🛑 Please open an issue first to discuss any significant work and flesh out details/direction - we would hate for your time to be wasted.
Consult the CONTRIBUTING guide for submitting pull-requests.

This PR adds GPU support, an additional authentication mechanism, and Karpenter integration. With GPU support, users can use GPU resources to accelerate their AI/ML workloads within JupyterHub. To facilitate testing and demonstrations, I've included the dummy authentication mechanism, which lets users without their own domain and certificate try out the Blueprint easily. The PR installs Karpenter for dynamic provisioning and configures its provisioners for the examples that I'll be adding soon. For GPU instances, I have configured Time-Slicing support, which lets workloads scheduled on oversubscribed GPUs interleave with each other and so maximizes GPU utilization. I also updated website/docs/blueprints/ai-ml/jupyterhub.md to reflect the Terraform change.

Motivation

This PR brings GPU-based instances, configured through the NVIDIA gpu-operator. With the addition of time-slicing support, users can efficiently share GPUs, optimizing resource utilization even in cases where MIG support is limited. These enhancements will be crucial for upcoming blogs and demos, empowering users with accelerated AI/ML workloads within JupyterHub. 🚀
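
As an illustration of the time-slicing setup described above, here is a minimal Terraform sketch, not taken from the PR itself. It assumes the Terraform `kubernetes` provider is already configured and that the NVIDIA gpu-operator runs in a `gpu-operator` namespace. The resource name `time_slicing`, the ConfigMap name `time-slicing-config`, the data key `any`, and the replica count of 4 are all hypothetical.

```hcl
# Hypothetical ConfigMap holding a time-slicing config for the NVIDIA device plugin.
# Names, namespace, and replica count are assumptions for illustration only.
resource "kubernetes_config_map" "time_slicing" {
  metadata {
    name      = "time-slicing-config"
    namespace = "gpu-operator"
  }

  data = {
    # Each physical GPU is advertised as 4 schedulable nvidia.com/gpu resources,
    # so up to 4 pods can share one GPU by interleaving in time.
    any = yamlencode({
      version = "v1"
      sharing = {
        timeSlicing = {
          resources = [
            { name = "nvidia.com/gpu", replicas = 4 }
          ]
        }
      }
    })
  }
}
```

The gpu-operator would still need to be pointed at this ConfigMap. In the upstream Helm chart this is typically done through the `devicePlugin.config.name` value, though the PR may wire it up differently. Time-slicing provides no memory or fault isolation between the workloads sharing a GPU.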

More

  • Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
  • Mandatory for new blueprints. Yes, I have added an example to support my blueprint PR
  • Mandatory for new blueprints. Yes, I have updated the website/docs or website/blog section for this feature
  • Yes, I ran pre-commit run -a with this PR. Link for installing pre-commit locally

For Moderators

  • E2E Test successfully complete before merge?

Additional Notes

@lusoal lusoal temporarily deployed to DoEKS Test August 2, 2023 17:02 — with GitHub Actions Inactive
@lusoal lusoal changed the title from "Feat/jupyterhub gpu slicing nvidia" to "Enhance JupyterHub Performance: GPU Acceleration and Time-Slicing Support" on Aug 2, 2023
ovaleanu (Contributor) left a comment

Please modify the PR title. Available types are:

feat: A new feature
fix: A bug fix
docs: Documentation only changes
style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
refactor: A code change that neither fixes a bug nor adds a feature
perf: A code change that improves performance
test: Adding missing tests or correcting existing tests
build: Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm)
ci: Changes to our CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs)
chore: Other changes that don't modify src or test files
revert: Reverts a previous commit

@lusoal lusoal changed the title from "Enhance JupyterHub Performance: GPU Acceleration and Time-Slicing Support" to "feat: Enhance JupyterHub Performance: GPU Acceleration and Time-Slicing Support" on Aug 3, 2023
ovaleanu (Contributor) left a comment

LGTM 🚀

@lusoal lusoal temporarily deployed to DoEKS Test August 4, 2023 15:33 — with GitHub Actions Inactive
@lusoal lusoal requested a review from ovaleanu August 4, 2023 15:34
ovaleanu (Contributor) left a comment

I think you need to add provider “random” for resource “random_string”.
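
A minimal sketch of what that declaration could look like in versions.tf, assuming the HashiCorp `random` provider. The version constraint is illustrative and not taken from the PR:

```hcl
terraform {
  required_providers {
    # Needed for the random_string resource; the version constraint is an assumption.
    random = {
      source  = "hashicorp/random"
      version = ">= 3.1"
    }
  }
}
```

In a real versions.tf this entry would sit alongside the blueprint's existing `required_providers` entries rather than in a separate `terraform` block.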

vara-bonthu (Collaborator) left a comment

@lusoal versions.tf file needs updating with the random provider. Checks are failing with this error

vara-bonthu (Collaborator) left a comment

@lusoal I have added a few minor comments and questions on the PR for when you get a chance.

ai-ml/jupyterhub/variables.tf (resolved)
ai-ml/jupyterhub/variables.tf (outdated, resolved)
ai-ml/jupyterhub/addons.tf (outdated, resolved)
ai-ml/jupyterhub/addons.tf (outdated, resolved)
ai-ml/jupyterhub/addons.tf (outdated, resolved)
@lusoal lusoal temporarily deployed to DoEKS Test August 14, 2023 18:27 — with GitHub Actions Inactive
@lusoal lusoal temporarily deployed to DoEKS Test August 14, 2023 21:43 — with GitHub Actions Inactive
@lusoal lusoal temporarily deployed to DoEKS Test August 14, 2023 21:51 — with GitHub Actions Inactive
@lusoal lusoal temporarily deployed to DoEKS Test August 14, 2023 21:56 — with GitHub Actions Inactive
@lusoal lusoal temporarily deployed to DoEKS Test August 14, 2023 22:42 — with GitHub Actions Inactive
@lusoal lusoal temporarily deployed to DoEKS Test August 14, 2023 23:06 — with GitHub Actions Inactive
lusoal (Contributor, Author) commented Aug 14, 2023

Adjusted PR based on feedback

vara-bonthu (Collaborator) left a comment

@lusoal left a few minor comments

ai-ml/jupyterhub/variables.tf (resolved)
ai-ml/jupyterhub/variables.tf (outdated, resolved)
ai-ml/jupyterhub/jupyterhub.tf (outdated, resolved)
ai-ml/jupyterhub/jupyterhub.tf (resolved)
@lusoal lusoal temporarily deployed to DoEKS Test August 15, 2023 15:25 — with GitHub Actions Inactive
vara-bonthu (Collaborator) left a comment

LGTM🔥

ovaleanu (Contributor) left a comment

LGTM 🚀

@ovaleanu ovaleanu merged commit be15f25 into awslabs:main Aug 15, 2023
43 of 44 checks passed
3 participants