initial open-source code drop #1

Merged (3 commits) on Jul 13, 2021
15 changes: 15 additions & 0 deletions .github/workflows/build.yml
@@ -0,0 +1,15 @@
name: build
on: [push, pull_request]

jobs:
  lint:
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0
      - name: deps
        run: sudo pip3 install --system pre-commit black flake8
      - name: pre-commit
        run: pre-commit run --all-files
11 changes: 11 additions & 0 deletions .gitignore
@@ -0,0 +1,11 @@
*.swp
package-lock.json
__pycache__
.pytest_cache
.env
.venv
*.egg-info

# CDK asset staging directory
.cdk.staging
cdk.out
17 changes: 17 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,17 @@
repos:
  - repo: local
    hooks:
      - id: black
        name: black
        language: system
        files: \.py$
        verbose: true
        entry: black
        args: [-l, '100']
      - id: flake8
        name: flake8
        language: system
        files: \.py$
        verbose: true
        entry: flake8
        args: [--max-line-length, "100", "--ignore=E501,W503,E722,E203"]
61 changes: 60 additions & 1 deletion README.md
@@ -1 +1,60 @@
# miniwdl-aws-studio

# miniwdl + GWFCore + SageMaker Studio

This repository is a recipe for deploying **[miniwdl-aws](https://github.com/miniwdl-ext/miniwdl-aws)** and [GWFCore](https://github.com/aws-samples/aws-genomics-workflows) to use within [Amazon SageMaker Studio](https://aws.amazon.com/sagemaker/studio/), a web IDE with a terminal and filesystem browser. You can use the terminal to operate `miniwdl run` against GWFCore's AWS Batch stack, the filesystem browser to manage the inputs and outputs on EFS, and the Jupyter notebooks to further analyze the outputs.
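For example, once the stack is deployed, launching a workflow from the Studio terminal might look like the following (the WDL file and its `who` input are hypothetical; consult the miniwdl-aws README for the exact invocation on your setup):

```
$ miniwdl run hello.wdl who=world
```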

## CDK boilerplate

The `cdk.json` file tells the CDK Toolkit how to execute your app.

This project is set up like a standard Python project. The initialization
process also creates a virtualenv within this project, stored under the `.venv`
directory. To create the virtualenv it assumes that there is a `python3`
(or `python` for Windows) executable in your path with access to the `venv`
package. If for any reason the automatic creation of the virtualenv fails,
you can create the virtualenv manually.

To manually create a virtualenv on macOS and Linux:

```
$ python3 -m venv .venv
```

After the init process completes and the virtualenv is created, you can use the following
step to activate your virtualenv.

```
$ source .venv/bin/activate
```

If you are on a Windows platform, you would activate the virtualenv like this:

```
% .venv\Scripts\activate.bat
```

Once the virtualenv is activated, you can install the required dependencies.

```
$ pip install -r requirements.txt
```

At this point you can now synthesize the CloudFormation template for this code.

```
$ cdk synth
```

To add additional dependencies, for example other CDK libraries, just add
them to your `setup.py` file and rerun the `pip install -r requirements.txt`
command.

## Useful commands

* `cdk ls` list all stacks in the app
* `cdk synth` emits the synthesized CloudFormation template
* `cdk deploy` deploy this stack to your default AWS account/region
* `cdk diff` compare deployed stack with current state
* `cdk docs` open CDK documentation

Enjoy!
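Workflow outputs accumulate in timestamped run directories on the Studio EFS. As a minimal sketch of the "analyze outputs in a notebook" step, assuming miniwdl's standard run-directory layout (each run directory holds an `outputs.json` once the run completes), one might locate the latest run's outputs like this; the function name and layout handling are illustrative, not part of this repo:

```python
import json
from pathlib import Path


def latest_run_outputs(runs_dir):
    """Return the parsed outputs.json of the most recent miniwdl run directory
    under runs_dir, or None if no completed run is found. miniwdl names run
    directories with a sortable timestamp prefix and writes outputs.json only
    on successful completion, so lexicographic order is chronological order."""
    run_dirs = sorted(p for p in Path(runs_dir).iterdir() if (p / "outputs.json").is_file())
    if not run_dirs:
        return None
    with open(run_dirs[-1] / "outputs.json") as infile:
        return json.load(infile)
```

In Studio, `runs_dir` would be the directory given to `miniwdl run --dir` (or its default) under the home EFS mount, which the filesystem browser also displays.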
103 changes: 103 additions & 0 deletions app.py
@@ -0,0 +1,103 @@
#!/usr/bin/env python3
import os
import boto3

from aws_cdk import core as cdk

from miniwdl_gwfcore_studio.miniwdl_gwfcore_studio_stack import (
    MiniwdlGwfcoreStudioStack,
)

DEFAULT_GWFCORE_VERSION = "v3.0.7"
gwfcore_version = os.environ.get("GWFCORE_VERSION", DEFAULT_GWFCORE_VERSION)

env = {}
if "CDK_DEFAULT_ACCOUNT" in os.environ:
    env["account"] = os.environ["CDK_DEFAULT_ACCOUNT"]
if "CDK_DEFAULT_REGION" in os.environ:
    env["region"] = os.environ["CDK_DEFAULT_REGION"]

###################################################################################################
# First perform some ops that aren't convenient to do via CDK/Cfn for various reasons.
###################################################################################################

# Describe existing studio domain+user for VPC/IAM/EFS details
studio_domain_id = os.environ.get("STUDIO_DOMAIN_ID", None)
studio_user_profile_name = os.environ.get("STUDIO_USER_PROFILE_NAME", None)
assert (
    studio_domain_id and studio_user_profile_name
), "set environment STUDIO_DOMAIN_ID and STUDIO_USER_PROFILE_NAME to reflect SageMaker Studio"
client_opts = {}
if "region" in env:
    client_opts["region_name"] = env["region"]
sagemaker = boto3.client("sagemaker", **client_opts)
domain_desc = sagemaker.describe_domain(DomainId=studio_domain_id)
user_profile_desc = sagemaker.describe_user_profile(
    DomainId=studio_domain_id, UserProfileName=studio_user_profile_name
)

# Find the Studio EFS' security group, named "security-group-for-inbound-nfs-{studio_domain_id}"
# Nice-to-have: describe Studio EFS' mount targets to double-check they're in this security group.
ec2 = boto3.client("ec2", **client_opts)
sg_desc = ec2.describe_security_groups(
    Filters=[
        dict(
            Name="group-name",
            Values=[f"security-group-for-inbound-nfs-{studio_domain_id}"],
        )
    ]
)
assert (
    len(sg_desc.get("SecurityGroups", [])) == 1
), f"Failed to look up SageMaker Studio EFS security group named 'security-group-for-inbound-nfs-{studio_domain_id}'"
studio_efs_sg_id = sg_desc["SecurityGroups"][0]["GroupId"]

# Log the detected details
print(f"studio_domain_id = {studio_domain_id}")
print(f"studio_user_profile_name = {studio_user_profile_name}")
detected = dict(
    vpc_id=domain_desc["VpcId"],
    studio_efs_id=domain_desc["HomeEfsFileSystemId"],
    studio_efs_uid=user_profile_desc["HomeEfsFileSystemUid"],
    studio_efs_sg_id=studio_efs_sg_id,
)
for k, v in detected.items():
    print(f"{k} = {v}")

# Add necessary policies to the Studio ExecutionRole. We don't do this through CDK because of:
# https://github.com/aws/aws-cdk/blob/486f2e5518ab5abb69a3e3986e4f3581aa42d15b/packages/%40aws-cdk/aws-iam/lib/role.ts#L225-L227
studio_execution_role_arn = user_profile_desc.get("UserSettings", {}).get("ExecutionRole", "")
assert studio_execution_role_arn.startswith(
    "arn:aws:iam::"
), "Failed to detect SageMaker Studio ExecutionRole ARN"
studio_execution_role_name = studio_execution_role_arn[studio_execution_role_arn.rindex("/") + 1 :]
iam = boto3.client("iam", **client_opts)

for policy_arn in (
    "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
    "arn:aws:iam::aws:policy/AWSBatchFullAccess",
    "arn:aws:iam::aws:policy/AmazonElasticFileSystemFullAccess",
):
    # TODO: constrain these to the specific EFS & Batch queues
    print(f"Adding to {studio_execution_role_name}: {policy_arn}")
    iam.attach_role_policy(
        RoleName=studio_execution_role_name,
        PolicyArn=policy_arn,
    )


###################################################################################################
# CDK stack to do the rest
###################################################################################################


app = cdk.App()
MiniwdlGwfcoreStudioStack(
    app,
    "MiniwdlGwfcoreStudioStack",
    gwfcore_version=gwfcore_version,
    env=env,
    **detected,
)

app.synth()
17 changes: 17 additions & 0 deletions cdk.json
@@ -0,0 +1,17 @@
{
  "app": "python3 app.py",
  "context": {
    "@aws-cdk/aws-apigateway:usagePlanKeyOrderInsensitiveId": true,
    "@aws-cdk/core:enableStackNameDuplicates": "true",
    "aws-cdk:enableDiffNoFail": "true",
    "@aws-cdk/core:stackRelativeExports": "true",
    "@aws-cdk/aws-ecr-assets:dockerIgnoreSupport": true,
    "@aws-cdk/aws-secretsmanager:parseOwnedSecretName": true,
    "@aws-cdk/aws-kms:defaultKeyPolicies": true,
    "@aws-cdk/aws-s3:grantWriteWithoutAcl": true,
    "@aws-cdk/aws-ecs-patterns:removeDefaultDesiredCount": true,
    "@aws-cdk/aws-rds:lowercaseDbIdentifier": true,
    "@aws-cdk/aws-efs:defaultEncryptionAtRest": true,
    "@aws-cdk/aws-lambda:recognizeVersionProps": true
  }
}
Empty file.
128 changes: 128 additions & 0 deletions miniwdl_gwfcore_studio/miniwdl_gwfcore_studio_stack.py
@@ -0,0 +1,128 @@
import os
import tempfile
import boto3
from contextlib import ExitStack
from aws_cdk import (
    core as cdk,
    cloudformation_include as cdk_cfn_inc,
    aws_ec2 as cdk_ec2,
    aws_iam as cdk_iam,
    aws_efs as cdk_efs,
)


class MiniwdlGwfcoreStudioStack(cdk.Stack):
    def __init__(
        self,
        scope: cdk.Construct,
        construct_id: str,
        *,
        vpc_id: str,
        studio_efs_id: str,
        studio_efs_uid: str,
        studio_efs_sg_id: str,
        gwfcore_version: str = "latest",
        env,
        **kwargs,
    ) -> None:
        super().__init__(scope, construct_id, env=env, **kwargs)

        # Prepare temp dir
        self._cleanup = ExitStack()
        self._tmpdir = self._cleanup.enter_context(tempfile.TemporaryDirectory())

        # Detect VPC subnets
        vpc = cdk_ec2.Vpc.from_lookup(self, "Vpc", vpc_id=vpc_id)
        subnet_ids = vpc.select_subnets(subnet_type=cdk_ec2.SubnetType.PUBLIC).subnet_ids

        # Deploy gwfcore sub-stacks
        batch_sg = self._gwfcore(gwfcore_version, vpc_id, subnet_ids, studio_efs_id, env)

        # Modify Studio EFS security group to allow access from gwfcore's Batch compute environment
        studio_efs_sg = cdk_ec2.SecurityGroup.from_security_group_id(
            self, "StudioEFSSecurityGroup", studio_efs_sg_id
        )
        studio_efs_sg.add_ingress_rule(batch_sg, cdk_ec2.Port.tcp(2049))

        # Add EFS Access Point to help Batch jobs "see" the user's EFS directory in the same way
        # SageMaker Studio presents it. Inside Studio, miniwdl-aws can detect this by filtering
        # access points for the correct EFS ID, uid, and path.
        studio_efs = cdk_efs.FileSystem.from_file_system_attributes(
            self,
            "StudioEFS",
            file_system_id=studio_efs_id,
            security_group=studio_efs_sg,
        )
        fsap = cdk_efs.AccessPoint(
            self,
            "StudioFSAP",
            file_system=studio_efs,
            posix_user=cdk_efs.PosixUser(uid=studio_efs_uid, gid=studio_efs_uid),
            path="/" + studio_efs_uid + "/miniwdl",
        )
        assert fsap

    def __del__(self):
        # clean up temp dir
        if self._cleanup:
            try:
                self._cleanup.close()
            except:
                pass

    def _gwfcore(self, version, vpc_id, subnet_ids, studio_efs_id, env):
        # Import gwfcore CloudFormation templates from the aws-genomics-workflows S3 bucket
        s3 = boto3.client("s3", region_name="us-east-1")

        def _template(basename):
            # CfnInclude needs a local filename, so download template to temp dir
            tfn = os.path.join(self._tmpdir, basename)
            s3.download_file(
                "aws-genomics-workflows", f"{version}/templates/gwfcore/{basename}", tfn
            )
            return tfn

        cfn_gwfcore = cdk_cfn_inc.CfnInclude(
            self,
            "gwfcore",
            template_file=_template("gwfcore-root.template.yaml"),
            load_nested_stacks=dict(
                (s, {"templateFile": _template(fn)})
                for (s, fn) in (
                    ("BatchStack", "gwfcore-batch.template.yaml"),
                    ("S3Stack", "gwfcore-s3.template.yaml"),
                    ("IamStack", "gwfcore-iam.template.yaml"),
                    ("CodeStack", "gwfcore-code.template.yaml"),
                    ("LaunchTplStack", "gwfcore-launch-template.template.yaml"),
                )
            ),
            parameters={
                "VpcId": vpc_id,
                "SubnetIds": subnet_ids,
                "S3BucketName": f"minwidl-gwfcore-studio-{env['account']}-{env['region']}",
            },
        )

        # Add EFS client access policy to the Batch instance role
        included_gwfcore_iam_stack = cfn_gwfcore.get_nested_stack("IamStack")
        gwfcore_iam_template = included_gwfcore_iam_stack.included_template
        gwfcore_batch_instance_role = gwfcore_iam_template.get_resource("BatchInstanceRole")
        assert isinstance(gwfcore_batch_instance_role, cdk_iam.CfnRole)
        gwfcore_batch_instance_role.managed_policy_arns.append(
            cdk_iam.ManagedPolicy.from_aws_managed_policy_name(
                "AmazonElasticFileSystemClientReadWriteAccess"
            ).managed_policy_arn
        )

        # Set a tag on the batch queue to help miniwdl-aws identify it as the default
        gwfcore_batch_template = cfn_gwfcore.get_nested_stack("BatchStack").included_template
        cdk.Tags.of(gwfcore_batch_template.get_resource("DefaultQueue")).add(
            "MiniwdlStudioEfsId", studio_efs_id
        )

        batch_sg = cdk_ec2.SecurityGroup.from_security_group_id(
            self,
            "BatchSecurityGroup",
            gwfcore_batch_template.get_resource("SecurityGroup").attr_group_id,
        )
        return batch_sg
1 change: 1 addition & 0 deletions requirements.txt
@@ -0,0 +1 @@
-e .
36 changes: 36 additions & 0 deletions setup.py
@@ -0,0 +1,36 @@
import setuptools


with open("README.md") as fp:
    long_description = fp.read()

CDK_MIN_VERSION = "1.110.0"

setuptools.setup(
    name="miniwdl-aws-studio",
    version="0.0.1",
    description="AWS CDK app to add miniwdl+GWFCore to existing SageMaker Studio",
    long_description=long_description,
    long_description_content_type="text/markdown",
    author="Wid L. Hacker",
    package_dir={"": "miniwdl_gwfcore_studio"},
    packages=setuptools.find_packages(where="miniwdl_gwfcore_studio"),
    install_requires=["boto3"]
    + [
        f"aws-cdk.{m}>={CDK_MIN_VERSION}"
        for m in ("core", "aws_iam", "aws_ec2", "aws_efs", "cloudformation_include")
    ],
    python_requires=">=3.6",
    classifiers=[
        "Development Status :: 4 - Beta",
        "Intended Audience :: Developers",
        "Programming Language :: JavaScript",
        "Programming Language :: Python :: 3 :: Only",
        "Programming Language :: Python :: 3.6",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Topic :: Software Development :: Code Generators",
        "Topic :: Utilities",
        "Typing :: Typed",
    ],
)