Component to submit SageMaker Processing jobs directly from a Kubeflow Pipelines workflow. See https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html
Use it to run data processing workloads, such as feature engineering, data validation, model evaluation, and model interpretation, on Amazon SageMaker.
Argument | Description | Optional | Data type | Accepted values | Default |
---|---|---|---|---|---|
region | The region where the cluster launches | No | String | ||
endpoint_url | The endpoint URL for the private link VPC endpoint | Yes | String | ||
assume_role | The ARN of an IAM role to assume when connecting to SageMaker | Yes | String | ||
job_name | The name of the Processing job. Must be unique within the same AWS account and AWS region | Yes | String | | ProcessingJob-[datetime]-[random id] |
role | The Amazon Resource Name (ARN) that Amazon SageMaker assumes to perform tasks on your behalf | No | String | ||
image | The registry path of the Docker image that contains the processing script | Yes | String | ||
instance_type | The ML compute instance type | Yes | String | ml.m4.xlarge, ml.m4.2xlarge, ml.m4.4xlarge, ml.m4.10xlarge, ml.m4.16xlarge, ml.m5.large, ml.m5.xlarge, ml.m5.2xlarge, ml.m5.4xlarge, ml.m5.12xlarge, ml.m5.24xlarge, ml.c4.xlarge, ml.c4.2xlarge, ml.c4.4xlarge, ml.c4.8xlarge, ml.p2.xlarge, ml.p2.8xlarge, ml.p2.16xlarge, ml.p3.2xlarge, ml.p3.8xlarge, ml.p3.16xlarge, ml.c5.xlarge, ml.c5.2xlarge, ml.c5.4xlarge, ml.c5.9xlarge, ml.c5.18xlarge and many more | ml.m4.xlarge |
instance_count | The number of ML compute instances to use in each processing job | Yes | Int | ≥ 1 | 1 |
volume_size | The size of the ML storage volume that you want to provision in GB | Yes | Int | ≥ 1 | 30 |
resource_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance(s) | Yes | String | ||
output_encryption_key | The AWS KMS key that Amazon SageMaker uses to encrypt the processing job output | Yes | String | ||
max_run_time | The maximum run time in seconds per processing job | Yes | Int | ≤ 432000 (5 days) | 86400 (1 day) |
environment | The environment variables to set in the Docker container | Yes | Dict | Maximum length of 1024. Key pattern: [a-zA-Z_][a-zA-Z0-9_]* . Value pattern: [\S\s]* . Up to 16 key-value entries in the map | |
container_entrypoint | The entrypoint for the processing job, given as a list of strings that make up a command | Yes | List of Strings | | |
container_arguments | A list of string arguments to be passed to a processing job | Yes | List of Strings | | |
input_config | Parameters that specify Amazon S3 inputs for a processing job | No | List of Dicts | | [] |
output_config | Parameters that specify Amazon S3 outputs for a processing job | No | List of Dicts | | [] |
vpc_security_group_ids | A comma-delimited list of security group IDs, in the form sg-xxxxxxxx | Yes | String | ||
vpc_subnets | A comma-delimited list of subnet IDs in the VPC to which you want to connect your processing job | Yes | String | ||
network_isolation | Isolates the processing container if true | No | Boolean | False, True | True |
traffic_encryption | Encrypts all communications between ML compute instances in distributed processing if true | No | Boolean | False, True | False |
tags | Key-value pairs to categorize AWS resources | Yes | Dict | | {} |
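
The numeric limits and the environment rules in the table can be checked locally before a job is submitted. The following is a minimal sketch and is not part of the component: the helper name `validate_processing_args` is hypothetical, and treating only `region` and `role` as required is an assumption based on the rows that are not optional and have no default.

```python
import re

# Limits copied from the argument table above.
MAX_RUN_TIME_LIMIT = 432000  # 5 days, in seconds
MAX_ENV_ENTRIES = 16
MAX_ENV_LENGTH = 1024
ENV_KEY_PATTERN = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def validate_processing_args(args):
    """Return a list of problems found in args; an empty list means none were found.

    Hypothetical helper for illustration only.
    """
    problems = []

    # Assumption: region and role are the arguments that must always be supplied.
    for name in ("region", "role"):
        if not args.get(name):
            problems.append(f"{name} is required")

    # Defaults below mirror the Default column of the table.
    if args.get("instance_count", 1) < 1:
        problems.append("instance_count must be >= 1")
    if args.get("volume_size", 30) < 1:
        problems.append("volume_size must be >= 1")
    if args.get("max_run_time", 86400) > MAX_RUN_TIME_LIMIT:
        problems.append(f"max_run_time must be <= {MAX_RUN_TIME_LIMIT}")

    env = args.get("environment", {})
    if len(env) > MAX_ENV_ENTRIES:
        problems.append(f"environment may hold at most {MAX_ENV_ENTRIES} entries")
    for key, value in env.items():
        if not ENV_KEY_PATTERN.fullmatch(key):
            problems.append(f"environment key {key!r} does not match the key pattern")
        if len(key) > MAX_ENV_LENGTH or len(value) > MAX_ENV_LENGTH:
            problems.append(f"environment entry {key!r} exceeds {MAX_ENV_LENGTH} characters")

    return problems
```

Calling the helper with a dictionary of the arguments you intend to pass returns every violated constraint at once, which can be easier to act on than failing at the first one.
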
Notes:
- For more information about how the container entrypoint and arguments are used, see the Build Your Own Processing Container documentation.
- Each key and value in the environment parameter (a string-to-string map) can be up to 1024 characters long. SageMaker supports up to 16 entries in the map.
- The format for the input_config field is:
    [
      {
        'InputName': 'string',
        'S3Input': {
          'S3Uri': 'string',
          'LocalPath': 'string',
          'S3DataType': 'ManifestFile'|'S3Prefix',
          'S3InputMode': 'Pipe'|'File',
          'S3DataDistributionType': 'FullyReplicated'|'ShardedByS3Key',
          'S3CompressionType': 'None'|'Gzip'
        }
      },
    ]
- The format for the output_config field is:
    [
      {
        'OutputName': 'string',
        'S3Output': {
          'S3Uri': 'string',
          'LocalPath': 'string',
          'S3UploadMode': 'Continuous'|'EndOfJob'
        }
      },
    ]
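
As a concrete illustration of the two formats above, the sketch below fills in one input entry and one output entry and checks the enumerated fields against the values listed in the formats. The bucket name, prefixes, local paths, and the helper `invalid_fields` are placeholder assumptions for illustration, not values or functions provided by the component.

```python
# Placeholder bucket and paths, chosen only for illustration.
input_config = [
    {
        "InputName": "raw-data",
        "S3Input": {
            "S3Uri": "s3://example-bucket/raw/",
            "LocalPath": "/opt/ml/processing/input",
            "S3DataType": "S3Prefix",
            "S3InputMode": "File",
            "S3DataDistributionType": "FullyReplicated",
            "S3CompressionType": "None",
        },
    },
]

output_config = [
    {
        "OutputName": "features",
        "S3Output": {
            "S3Uri": "s3://example-bucket/features/",
            "LocalPath": "/opt/ml/processing/output",
            "S3UploadMode": "EndOfJob",
        },
    },
]

# Allowed values for the enumerated fields, as listed in the formats above.
ALLOWED_INPUT_VALUES = {
    "S3DataType": {"ManifestFile", "S3Prefix"},
    "S3InputMode": {"Pipe", "File"},
    "S3DataDistributionType": {"FullyReplicated", "ShardedByS3Key"},
    "S3CompressionType": {"None", "Gzip"},
}
ALLOWED_OUTPUT_VALUES = {"S3UploadMode": {"Continuous", "EndOfJob"}}


def invalid_fields(entries, section, allowed):
    """Return (entry name, field) pairs whose value is not one of the allowed choices."""
    bad = []
    for entry in entries:
        name = entry.get("InputName") or entry.get("OutputName")
        for field, choices in allowed.items():
            value = entry[section].get(field)
            if value is not None and value not in choices:
                bad.append((name, field))
    return bad


print(invalid_fields(input_config, "S3Input", ALLOWED_INPUT_VALUES))
print(invalid_fields(output_config, "S3Output", ALLOWED_OUTPUT_VALUES))
```

Both lists are plain Python data, so they can be passed as the input_config and output_config arguments once the placeholders are replaced with real locations.
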
Name | Description |
---|---|
job_name | Processing job name |
output_artifacts | A dictionary that maps each OutputName in output_config to its S3Uri |
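
The shape of output_artifacts can be pictured as a projection of output_config. This is only a sketch of the resulting structure under that reading of the description, not the component's actual implementation; the helper name and sample values are hypothetical.

```python
def output_artifacts_from_config(output_config):
    """Map each OutputName to the S3Uri of its S3Output entry."""
    return {entry["OutputName"]: entry["S3Output"]["S3Uri"] for entry in output_config}


# Sample configuration with placeholder names and locations.
sample_output_config = [
    {
        "OutputName": "features",
        "S3Output": {
            "S3Uri": "s3://example-bucket/features/",
            "LocalPath": "/opt/ml/processing/output/features",
            "S3UploadMode": "EndOfJob",
        },
    },
    {
        "OutputName": "report",
        "S3Output": {
            "S3Uri": "s3://example-bucket/report/",
            "LocalPath": "/opt/ml/processing/output/report",
            "S3UploadMode": "EndOfJob",
        },
    },
]

artifacts = output_artifacts_from_config(sample_output_config)
print(artifacts)
```

A downstream pipeline step would then look up an output location by its OutputName, for example `artifacts["features"]`.
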