Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[draft] Azure Batch worker pool supports managed identity #5670

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

adamrtalbot
Copy link
Collaborator

@adamrtalbot adamrtalbot commented Jan 14, 2025

This feature adds an option to tell Azure Batch the worker pool has a managed identity attached.

azcopy needs to be updated, since the current version isn't recent enough.

When using this feature, you can tell Nextflow that a node pool has a managed identity attached:

azure {

    storage {
        accountName = 'seqeralabs'
    }

    batch {
        location = 'eastus'
        accountName = 'seqeralabs'
        copyToolInstallMode = "off"
        pools {
            "hello-world-entra-mi" {
                vmType = "standard_e2ds_v5"
                managedIdentityId = "myManagedIdentityId"
            }

        }
    }
}

it will then use the native azcopy managed identity authentication to download and upload files.

I haven't handled all the old code yet, so Nextflow will generate a SAS which it only uses sometimes.

Importantly, this 'fixes' #5669, because it doesn't need a SAS token to access the files (tested).

To use:

  1. Create a node pool in Azure Batch and attach a user-assigned managed identity. Follow the instructions here or try this if you want to use Terraform
  2. Change the start task to have the following settings to update azcopy:
    • Command line: bash -c "tar -xzvf azcopy.tar.gz && chmod +x azcopy*/azcopy && mkdir -p $AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy*/azcopy $AZ_BATCH_NODE_SHARED_DIR/bin/"
    • Resource file URL: https://aka.ms/downloadazcopy-v10-linux, path azcopy.tar.gz
  3. Retrieve the managed identity client ID
  4. Use the following Nextflow config, where the pool name is my-pool:
process.queue = 'my-pool'
azure {

    storage {
        accountName = 'seqeralabs'
    }

    batch {
        location = 'eastus'
        accountName = 'seqeralabs'
        copyToolInstallMode = "off"
        pools {
            "my-pool" {
                managedIdentityId = "myManagedIdentityId"
            }

        }
    }
}
  1. Launch pipeline and 🤞

To do:

  • See if we can apply the managed ID to pools created by Nextflow. This might need an update from Microsoft
  • Documentation
  • Remove creation of SAS when using managed identity to make everything more secure and allow this to work if we switch off SAS tokens

This feature adds an option to tell Azure Batch the worker pool has a managed identity attached.

Currently, the only feature is it passes the managed identity client id to the task as an environment variable but this could be extended to support file staging in or out.

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Copy link

netlify bot commented Jan 14, 2025

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit c93dc78
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/67e59e2e60dfd500083c6449

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
No longer requires SAS tokens for file staging, it can use the managed identity to authenticate.

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
@adamrtalbot adamrtalbot requested a review from a team as a code owner January 14, 2025 18:18
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
@Takadonet
Copy link

Hello,
This new feature will be very helpful for our project to avoid long duration SaS token.
Just wondering if a timeline or we can provide some assistance.

@pditommaso pditommaso force-pushed the master branch 2 times, most recently from 5a93547 to 27345a6 Compare February 10, 2025 21:46
@adamrtalbot adamrtalbot changed the title feat: Azure Batch worker pool supports managed identity [draft] Azure Batch worker pool supports managed identity Feb 17, 2025
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Comment on lines +500 to +503
// This is a bad solution and breaks Fusion for everyone
if (!(pool.opts.managedIdentityId && it.key == "AZURE_STORAGE_SAS_TOKEN")) {
opts += "-e $it.key=$it.value "
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks Fusion for everyone who isn't using managed identities. Ideally we take the pool.opts when creating the launcher but I've ran out of time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale? remove AZURE_STORAGE_SAS_TOKEN when pool.opts.managedIdentityId is provided?

Comment on lines +529 to +530
env.put('AZCOPY_AUTO_LOGIN_TYPE', 'MSI') // azcopy
env.put('AZCOPY_MSI_CLIENT_ID', pool.opts.managedIdentityId) // azcopy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These looks unrelated

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be affected

Comment on lines +500 to +503
// This is a bad solution and breaks Fusion for everyone
if (!(pool.opts.managedIdentityId && it.key == "AZURE_STORAGE_SAS_TOKEN")) {
opts += "-e $it.key=$it.value "
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale? remove AZURE_STORAGE_SAS_TOKEN when pool.opts.managedIdentityId is provided?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants