
Support for heterogeneous concurrency limits #7834

Open
3 tasks done
felix-ht opened this issue Dec 9, 2022 · 4 comments
Labels: concurrency, enhancement (An improvement of an existing feature)

Comments


felix-ht commented Dec 9, 2022

First check

  • I added a descriptive title to this issue.
  • I used the GitHub search to find a similar request and didn't find it.
  • I searched the Prefect documentation for this feature.

Prefect Version

2.x

Describe the current behavior

Currently, I can only set concurrency limits flatly per agent or per queue.

Describe the proposed behavior

I want to specify that a machine can support one flow that requires a GPU, but many flows that do not require a GPU.

Example Use

The typical use case for this is a GPU node.

  • Run at most one GPU workflow on the GPU node
  • Run many CPU-only workflows concurrently on the GPU node

Let's take the following example:
Preprocessing -> Training -> Postprocessing

Preprocessing and Postprocessing would launch many concurrent flows (on the same machine), while Training might have only one flow that runs on the GPU. It must be ensured that no more than one GPU flow runs at the same time; otherwise there will be resource conflicts.

So the CPU-only concurrency limit would be, say, 10, while the limit for GPU-enabled tasks would be 1 (see the sketch below).
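For illustration, a minimal Python sketch of this pipeline expressed with Prefect 2 task tags (the task and tag names are hypothetical); the point of the request is that the "gpu" tag's limit should apply per agent/machine rather than globally:

from prefect import flow, task

@task(tags=["cpu"])
def preprocess(chunk):
    ...

@task(tags=["gpu"])  # desired: at most 1 concurrent run on the GPU node
def train(datasets):
    ...

@task(tags=["cpu"])
def postprocess(model):
    ...

@flow
def pipeline(chunks):
    # many CPU-bound task runs may execute concurrently (limit of, say, 10)
    datasets = [preprocess.submit(c) for c in chunks]
    # at most one GPU-bound task run at a time
    model = train.submit(datasets)
    return postprocess.submit(model)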

Additional context

No response

felix-ht added the enhancement and status:triage labels on Dec 9, 2022

felix-ht commented Dec 15, 2022

I just noticed that you added https://docs.prefect.io/concepts/tasks/?h=conc#task-run-concurrency-limits

If we could specify task concurrency limits per agent as well, this issue would be resolved. That would also align quite nicely with how queue concurrency can be limited per queue or per agent, e.g.:

prefect agent start --tag-concurrency-limit GPU 1
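For reference, tag-based task run concurrency limits can already be created with the Prefect 2 CLI per the linked docs, though they apply globally rather than per agent; the --tag-concurrency-limit flag above is the proposal, not an existing option. A rough sketch with the existing commands (the GPU tag is just the example from above):

# existing, global task-run concurrency limit for the "GPU" tag
prefect concurrency-limit create GPU 1

# inspect or remove it later
prefect concurrency-limit inspect GPU
prefect concurrency-limit delete GPU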

felix-ht commented

@madkinsz just tagging you so this doesn't get lost.

zanieb (Contributor) commented Jan 12, 2023

This should be addressed fully by the "Work pool" concept that is currently experimental.


felix-ht commented Feb 13, 2023

@madkinsz so work pools just dropped; however, I cannot see how I am supposed to use them to achieve the desired result.

The only option I see is to keep doing it the way we are currently doing it:

Each node has two agents running:

  • one agent that polls from a GPU queue, with the agent's concurrency limit set to 1
  • another agent that polls from CPU queues (sketched below)
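A sketch of that two-agent setup with the current CLI, assuming hypothetical queue names gpu-queue and cpu-queue (an agent-level flow run limit, if available in your Prefect version, could be used instead of the queue-level limit):

# cap the GPU work queue at one concurrent flow run
prefect work-queue set-concurrency-limit gpu-queue 1

# agent dedicated to the GPU queue
prefect agent start -q gpu-queue

# second agent on the same node for the CPU-only work
prefect agent start -q cpu-queue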

This has the big shortcoming that the whole flow running on the GPU machine has to run with a concurrency limit of 1, and that on a machine that might have as many as 240 CPU threads and 2 TB of RAM. The training itself uses the cores, but the fact that the rest of the flow cannot is weird, to say the least.
