Skip to content

Conversation

@madelgi
Copy link

@madelgi madelgi commented Sep 22, 2020

Summary

Adds BatchSubmit and AWSClientWait tasks to the AWS task library.

Changes

This PR adds three main components:

  • BatchSubmit: Task for submitting jobs to AWS batch
  • AWSClientWait: A general task-interface for boto3 waiters. Allows users to wait for long-running AWS jobs to complete
  • Custom batch waiters: The original motivation for this PR was adding a BatchSubmit Task to Prefect, and a key feature is waiting on a batch job to complete. Unfortunately, the boto3 library does not have built-in waiters for AWS batch, so this PR also adds three custom waiters (JobExists, JobRunning, and JobComplete) that users can use along with with AWSClientWait task to wait on some exit condition for a batch job.

Although all of the waiter logic was motivated by batch (and only tested with batch job submission), the AWSClientWait task should theoretically be able to use all built-in boto3 waiters + custom user-defined waiters (e.g., you could use it to wait on an EC2 instance to be instantiated, for example)

Importance

Adds AWS Batch and boto waiter functionality to prefect.

Checklist

This PR:

  • adds new tests (if appropriate)
  • adds a change file in the changes/ directory (if appropriate)
  • updates docstrings for any new functions or function arguments, including docs/outline.toml for API reference docs (if appropriate)

@madelgi madelgi marked this pull request as draft September 22, 2020 17:50
@joshmeek
Copy link

Hi @madelgi thanks for the PR! I haven't taken a look just yet but I want to add that the codecov failures will be fixed with #3360 so you can ignore those for now. I'll take a look at this PR sometime in the next day or two to give you a better answer on the waiter question (if I can answer it, not very familiar with it's implication in AWS Batch)

@madelgi
Copy link
Author

madelgi commented Sep 22, 2020

Hey @joshmeek, thanks for the reply! Really no rush, whenever you get around to it -- the past couple of weeks I've just been formalizing some hacky AWS code in my pipelines into tasks. Hopefully I haven't been bombarding you all with too many PRs.

Just to quickly give you some background on waiters, they are basically what they sound like. For certain long running AWS jobs, you may want to wait until the job reaches a certain state. E.g., in the context of submitting a job to batch, you may want to wait for the job to start running before you proceed with the rest of your code. The manual way to do this is to just repeatedly poll the relevant API for the job state and proceed once you reach the desired state. Waiters just encapsulate this pattern, letting you define what conditions you're waiting for, how many times you want to poll the relevant API, and how long you want to wait between polling the API (along with some other settings). boto provides pre-defined waiters for a bunch of AWS services, but Batch is unfortunately not among them, so I defined a few custom waiters in that attached json file (there is a PR that formally adds these waiters to boto, but its been stale for a while now)

So I guess the question, as far as prefect is concerned, is whether to extract this waiter pattern into its own generalized Waiter task, e.g., convert this:

# submit job and wait for specified exit condition in the same task (this is how the PR currently works)
job_id = batch_submit_task(
    job_name, 
    job_definition, 
    job_queue, 
    exit_condition='running', 
    waiter_delay=10, 
    waiter_max_attempts=100
)

into something like this:

# exit immediately upon job submission
job_id = batch_submit_task(job_name, job_definition, job_queue) 
# wait in a separate task
wait_result = batch_wait_task(job_id, 'running', waiter_kwargs={'delay': 10, 'max_attempts': 100})

or do something completely different! Those are just the two ideas I've considered. The reason I think doing something closer to the second option could be better is because this waiter pattern has implications for a significant number of AWS services.

@joshmeek joshmeek mentioned this pull request Sep 25, 2020
@PrefectHQ PrefectHQ deleted a comment from codecov bot Oct 5, 2020
@codecov
Copy link

codecov bot commented Oct 5, 2020

Codecov Report

Merging #3366 into master will not change coverage.
The diff coverage is n/a.

@joshmeek
Copy link

joshmeek commented Oct 9, 2020

@madelgi Sorry for the delayed response here and thank you for explaining the waiter concept so nicely! I'm on board with separating this into two tasks, one for submitting and one for waiting. This is similar to how something like the Docker tasks operate where the create container task is only responsible for creating the container then the container ID is returned for all other Docker tasks to use.

Also I think the path to the JSON needs to be included in the MANIFEST.in in order for it to be packaged with the pip installation 🙂

@madelgi madelgi marked this pull request as ready for review October 12, 2020 05:14
@madelgi
Copy link
Author

madelgi commented Oct 12, 2020

Hey @joshmeek, this is probably ready to be looked at now, whenever you get a minute

@madelgi madelgi changed the title [WIP] AWS BatchSubmit task AWS BatchSubmit and AWSClientWait tasks Oct 12, 2020
Co-authored-by: Jim Crist-Harif <jcrist@users.noreply.github.com>
Copy link

@joshmeek joshmeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @madelgi

@joshmeek joshmeek merged commit f03d9cb into PrefectHQ:master Oct 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants