feat: support remote sessions on HPC clusters #984
Merged
leafty added a commit to SwissDataScienceCenter/renku-data-services that referenced this pull request on Aug 22, 2025:
See also: SwissDataScienceCenter/amalthea#984 _Feature branch_
leafty added a commit to SwissDataScienceCenter/renku that referenced this pull request on Aug 22, 2025:
See also: SwissDataScienceCenter/amalthea#984 _Feature branch_
Force-pushed from `3701ad5` to `a041bdc`.
leafty added a commit that referenced this pull request on Sep 2, 2025:
This changes the k8s resource name for sessions to be `HpcAmaltheaSession`. It is done to allow for experimenting with the session CRD without impacting parallel work on sessions. This commit should be removed or reverted before the feature PR #984 is merged.
leafty added a commit to SwissDataScienceCenter/renku that referenced this pull request on Sep 3, 2025:
See also: SwissDataScienceCenter/amalthea#984 _Feature branch_
leafty added a commit to SwissDataScienceCenter/renku-data-services that referenced this pull request on Sep 3, 2025:
See also: SwissDataScienceCenter/amalthea#984 _Feature branch_
This was referenced Sep 3, 2025
leafty added a commit to SwissDataScienceCenter/renku-data-services that referenced this pull request on Sep 5, 2025:
See also: SwissDataScienceCenter/amalthea#984 _Feature branch_
leafty added a commit to SwissDataScienceCenter/renku-data-services that referenced this pull request on Sep 16, 2025:
See also: SwissDataScienceCenter/amalthea#984 _Feature branch_
leafty added a commit to SwissDataScienceCenter/renku-data-services that referenced this pull request on Sep 18, 2025:
See also: SwissDataScienceCenter/amalthea#984 _Feature branch_
leafty added a commit to SwissDataScienceCenter/renku-data-services that referenced this pull request on Sep 29, 2025:
See also: SwissDataScienceCenter/amalthea#984 _Feature branch_
Force-pushed from `3571de7` to `b839361`.
leafty added a commit that referenced this pull request on Sep 30, 2025:
This changes the k8s resource name for sessions to be `HpcAmaltheaSession`. It is done to allow for experimenting with the session CRD without impacting parallel work on sessions. This commit should be removed or reverted before the feature PR #984 is merged.
Force-pushed from `4070212` to `2eea81e`.
leafty added a commit that referenced this pull request on Sep 30, 2025:
This changes the k8s resource name for sessions to be `HpcAmaltheaSession`. It is done to allow for experimenting with the session CRD without impacting parallel work on sessions. This commit should be removed or reverted before the feature PR #984 is merged.
Force-pushed from `2eea81e` to `a87a3cb`.
leafty added a commit to SwissDataScienceCenter/renku that referenced this pull request on Oct 6, 2025:
See also: SwissDataScienceCenter/amalthea#984 _Feature branch_
leafty added a commit to SwissDataScienceCenter/renku that referenced this pull request on Oct 6, 2025:
See also: SwissDataScienceCenter/amalthea#984 _Feature branch_
Closes #983. _Feature branch_
This changes the k8s resource name for sessions to be `HpcAmaltheaSession`. It is done to allow for experimenting with the session CRD without impacting parallel work on sessions. This commit should be removed or reverted before the feature PR #984 is merged.
This change adds a new `location` field on the Amalthea session CRD which has two accepted values:
* `local`: the interactive session process runs inside the session pod
* `remote`: the interactive session process runs remotely and is controlled from the session pod
Remote sessions are first implemented to support running sessions in HPC environments, though this can be generalized to many environment types.
Only the `location` field is added; no further changes are contained here.
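The two accepted values can be sketched as a Go string type with validation. This is a minimal sketch, not Amalthea's code: the names `SessionLocation`, `SessionSpecFragment` and `ParseLocation` are hypothetical, and the kubebuilder markers only show one common way such an enum with a `local` default is declared in a CRD. Treating an empty value as `local` follows the PR description, which calls `local` the default.

```go
package main

import "fmt"

// SessionLocation mirrors the two values of the `location` field.
// Hypothetical type name; Amalthea's actual CRD types may differ.
type SessionLocation string

const (
	// LocationLocal: the interactive session process runs inside the session pod.
	LocationLocal SessionLocation = "local"
	// LocationRemote: the process runs remotely and is controlled from the session pod.
	LocationRemote SessionLocation = "remote"
)

// SessionSpecFragment shows how such a field is commonly declared with
// kubebuilder markers (assumption: enum validation, default "local").
type SessionSpecFragment struct {
	// +kubebuilder:validation:Enum=local;remote
	// +kubebuilder:default=local
	Location SessionLocation `json:"location,omitempty"`
}

// ParseLocation validates a raw value; an empty value is treated as "local",
// which keeps the existing behavior for sessions that do not set the field.
func ParseLocation(raw string) (SessionLocation, error) {
	switch SessionLocation(raw) {
	case "", LocationLocal:
		return LocationLocal, nil
	case LocationRemote:
		return LocationRemote, nil
	default:
		return "", fmt.Errorf("invalid location %q: expected %q or %q", raw, LocationLocal, LocationRemote)
	}
}

func main() {
	for _, raw := range []string{"", "local", "remote", "cloud"} {
		loc, err := ParseLocation(raw)
		fmt.Printf("%q -> %q (err: %v)\n", raw, loc, err)
	}
}
```

Keeping the value a plain string (rather than a boolean such as `remote: true`) leaves room for the generalization to other environment types mentioned above.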
* experimental: remote sessions
* update
* fix types
* error formatting
* remove 'not implemented error'
* exp: use a dev name for sessions
* update
* fix e2e?
* revert non-important changes
* rerun some make targets
* feat: support remote sessions on HPC clusters Closes #983. _Feature branch_
* exp: use a dev name for sessions
* more updates
* feat: support remote sessions on HPC clusters Closes #983. _Feature branch_
* feat: install wstunnel in sidecars
* feat: add tunnel using wstunnel and os.exec.Command
* feat: add tunnel command to sidecars
* feat: basic testing of tunnel in sidecars
* refactor: use TARGETOS and TARGETARCH instead of WSTUNNEL_PLATFORM
* update
* fix e2e?
* feat: support remote sessions on HPC clusters Closes #983. _Feature branch_
* feat: support remote sessions on HPC clusters Closes #983. _Feature branch_
* Revert chartpress e2e leftovers

---------

Co-authored-by: Flora Thiebaut <flora.thiebaut@sdsc.ethz.ch>
Add the remote session controller sidecar command and start it in the Amalthea session.
Runs the tunnel container in remote sessions and sets up the HPC job to connect to it. This allows remote HPC sessions to start and makes their frontend accessible.
Co-authored-by: Salim Kayal <salim.kayal@idiap.ch>
* fix: ensure `NVIDIA_VISIBLE_DEVICES` is set to void for enroot on eiger
* squashme: cosmetic space
  Co-authored-by: Samuel Gaist <samuel.gaist@idiap.ch>

---------

Co-authored-by: Samuel Gaist <samuel.gaist@idiap.ch>
Handles Git repositories for remote sessions:
1. The Git repositories are collected by the remote session controller from the `RENKU_WORKING_DIR` folder.
2. The Git repositories are configured in the remote session job.

---------

Co-authored-by: Salim Kayal <salim.kayal@idiap.ch>
…controller (#1005)
Improvements to the remote session controller and remote session:
* Improve handling of user-defined environment variables: use the prefix `USER_ENV_` so that the remote session controller can handle them robustly
* Handle the system name and partition parameters
* Use the `RSC_` prefix to configure the remote session controller from env vars
* Use the project slug to determine the session path on the HPC cluster, e.g. `$SCRATCH/renku/sessions/flora.thiebaut/demo-hpc/flora-thieba-37d8d415cbe9`, which makes it easier for users to find their files offline (using `ssh` on the HPC cluster)
* Write the session script before starting it, which lets users see how HPC sessions work
* Temporary fix: revert to `wstunnel v10.1.10` on aarch64 nodes (issue with the memory allocator)

---------

Co-authored-by: Salim Kayal <salim.kayal@idiap.ch>
Force-pushed from `ee826b4` to `2bb97fd`.
Mount scratch, project and home directories based on the system response from the FirecREST API.
Also, attempt to save and rescue the session if the container is killed and restarted. Session recovery may not succeed, as killing the remote session controller may result in the remote job being cancelled. However, this should help recover the session if the remote session controller runs out of memory.

---------

Co-authored-by: Salim Kayal <salim.kayal@idiap.ch>
Force-pushed from `2bb97fd` to `558c331`.
Undo all changes related to using "HpcAmaltheaSession" for development.
leafty added a commit to SwissDataScienceCenter/renku-data-services that referenced this pull request on Oct 8, 2025:
See also: SwissDataScienceCenter/amalthea#984 _Feature branch_
leafty added a commit to SwissDataScienceCenter/renku that referenced this pull request on Oct 8, 2025:
See also: SwissDataScienceCenter/amalthea#984 _Feature branch_
olevski approved these changes on Oct 8, 2025.
leafty added a commit to SwissDataScienceCenter/renku-data-services that referenced this pull request on Oct 8, 2025:
Add support for remote sessions on HPC clusters.
See SwissDataScienceCenter/amalthea#984 for related changes in Amalthea.
See also: SwissDataScienceCenter/amalthea#984
* Add a new `remote` field on Resource Pools, which is not set for local resource pools (existing behavior) and can be set to contain the configuration to start remote sessions. When the `remote` field is set, the resource pool will start Amalthea sessions with the `location` field set to `remote`.
* Handle setting configuration and other bits to support launching remote sessions.
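The rule linking a resource pool to the session's `location` can be sketched in a few lines. renku-data-services itself is written in Python; this Go sketch only illustrates the mapping described above. The types `ResourcePool` and `RemoteConfig`, and the fields inside `RemoteConfig`, are hypothetical placeholders, since the contents of the `remote` configuration are not listed here.

```go
package main

import "fmt"

// RemoteConfig stands in for the configuration held by a pool's `remote`
// field. The fields are hypothetical placeholders.
type RemoteConfig struct {
	Kind     string
	Endpoint string
}

// ResourcePool models only the part relevant here: Remote is nil for local
// pools (existing behavior) and non-nil for pools that start remote sessions.
type ResourcePool struct {
	Name   string
	Remote *RemoteConfig
}

// sessionLocation maps a pool to the `location` value of the AmaltheaSession.
func sessionLocation(pool ResourcePool) string {
	if pool.Remote != nil {
		return "remote"
	}
	return "local"
}

func main() {
	pools := []ResourcePool{
		{Name: "default"},
		{Name: "hpc", Remote: &RemoteConfig{Kind: "firecrest", Endpoint: "https://firecrest.example.org"}},
	}
	for _, p := range pools {
		fmt.Printf("%s -> %s\n", p.Name, sessionLocation(p))
	}
}
```

Using the presence of the `remote` configuration as the switch, rather than a separate flag, means a pool cannot be marked remote without also carrying the settings needed to start a remote session.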
leafty added a commit to SwissDataScienceCenter/renku that referenced this pull request on Oct 8, 2025:
See also: SwissDataScienceCenter/amalthea#984 _Feature branch_
leafty added a commit to SwissDataScienceCenter/renku that referenced this pull request on Oct 8, 2025:
See also: SwissDataScienceCenter/amalthea#984 _Feature branch_
leafty added a commit to SwissDataScienceCenter/renku that referenced this pull request on Oct 8, 2025:
See also: SwissDataScienceCenter/amalthea#984 _Feature branch_
Closes #983.
Feature branch
Add support for remote sessions on HPC clusters.
* Add a `location` field with values `local` (default, current session behavior) and `remote` (starts sessions on remote compute resources)
* Add `remote-session-controller` to the sidecar containers
  * `remote-session-controller` can start HPC sessions using the FirecREST API
* Add `wstunnel` to the sidecar containers
  * `wstunnel` allows the remote session to connect to the Amalthea session resources and allows traffic from the user to be routed to the remote session frontend via the session ingress

More details have been added to the new `README.md`.
Contents: