
Maintain a respectful queue of jobs to be run on Quantum Engine #2821

Open
mpharrigan opened this issue Mar 6, 2020 · 10 comments
Labels: area/google, area/performance, kind/feature-request, priority/p2, status/needs-agreed-design, triage/accepted

Comments

@mpharrigan (Collaborator) commented Mar 6, 2020

Sometimes you have a whole host of jobs to run. Instead of submitting all of them at once and filling up the shared queue, you could run them one at a time, but you can save on latency and classical-processing overhead by keeping a respectful local queue that only ever has a few jobs in flight. I've been using this function:

import asyncio

async def execute_in_queue(func, params, num_workers: int):
    """Run func(param) for each param, with at most num_workers jobs in flight."""
    queue = asyncio.Queue()

    async def worker():
        while True:
            param = await queue.get()
            print(f"Processing {param}. Current queue size: {queue.qsize()}")
            await func(param)
            print(f"{param} completed")
            queue.task_done()

    # Start the workers, then fill the queue and wait for it to drain.
    tasks = [asyncio.create_task(worker()) for _ in range(num_workers)]
    for param in params:
        await queue.put(param)
    print(f"Added everything to the queue. Current queue size: {queue.qsize()}")
    await queue.join()
    # All jobs are done and the workers are idle; cancel and collect them.
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)

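A hypothetical call site, for context (run_job is a placeholder coroutine that submits one parameterized job and awaits its result):

import asyncio

async def run_job(param):
    ...  # placeholder: submit one parameterized circuit and await its results

params = [{'theta': t / 10} for t in range(100)]  # placeholder parameter sweep
asyncio.run(execute_in_queue(run_job, params, num_workers=4))
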
Would something like this be welcome inside cirq.google?

@dabacon (Collaborator) commented May 8, 2020

Yes!

@kevinsung (Collaborator) commented May 14, 2020

Can't we achieve the same thing using concurrent.futures.ThreadPoolExecutor? I.e.,

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=num_workers) as executor:
    for param in params:
        executor.submit(func, param)  # queue each job on the thread pool

@mpharrigan (Collaborator, Author) commented:
I swear I scoured the Python async docs looking for something like this!

@kevinsung (Collaborator) commented:
I didn't know about this until @mrwojtek showed me the other day 😛 .

@mrwojtek (Collaborator) commented:
concurrent.futures.ThreadPoolExecutor is conceptually orthogonal to the asyncio library; they aim to solve slightly different problems. By design, asyncio is a single-threaded library that allows for concurrent execution of Python code. ThreadPoolExecutor allows for actual parallel execution, where different functions run on different threads. This doesn't matter much for pure Python code, since it's protected by the global interpreter lock anyway, but it matters a lot in our case, where network I/O happens.

Matthew, your code looks nice, but it depends on how the func callback is implemented. If all func instances run on the same thread, they'll block on the same I/O operation. Could you give an example of what func looks like in your case?
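
One way to keep func from blocking the loop, as a minimal sketch: hand the blocking call to a worker thread. Here blocking_run is a hypothetical synchronous Engine call, and asyncio.to_thread requires Python 3.9+ (earlier versions can use loop.run_in_executor instead):

import asyncio

def blocking_run(param):
    ...  # hypothetical synchronous (blocking) call to Quantum Engine

async def func(param):
    # Hand the blocking call off to a worker thread; the event loop keeps
    # scheduling the other workers while this one waits on I/O.
    return await asyncio.to_thread(blocking_run, param)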

balopat added the kind/feature-request, triage/needs-more-evidence, area/google, area/performance, and triage/discuss labels and removed the triage/needs-more-evidence label on Sep 21, 2020
@balopat (Contributor) commented Sep 21, 2020

I'd be curious to see what you put in func as well, @mpharrigan: parallel execution will help only when we are network-bound, waiting on the remote service's response. Do you have stats on this?
Before introducing any parallel/concurrent processing primitive, we should figure out under what circumstances it helps and whether we can leverage existing features. It would be interesting to study the latency/throughput of our jobs depending on job size (circuit depth, parameters, measurements, etc.).

@mpharrigan (Collaborator, Author) commented:

func is a call to Quantum Engine. You need a variant of EngineSampler that has an async run method: instead of blocking while polling for a job to complete, you yield. When I had a bunch of 50k-shot jobs to run, this gave almost a 2x speedup (heuristically), through a combination of keeping the engine-side queue warm and pipelining the client-side classical processing. Batching might give you a bigger performance boost, but it still puts the onus on the developer to split circuits into appropriately sized chunks, and it doesn't pipeline the client-side processing.
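
A minimal sketch of that idea (AsyncEngineSampler, the submit call, and the job.done() status check are all assumptions here, not the real cirq.google API):

import asyncio

class AsyncEngineSampler:
    """Hypothetical async variant of EngineSampler; a sketch, not the real API."""

    def __init__(self, engine, poll_interval: float = 2.0):
        self.engine = engine
        self.poll_interval = poll_interval

    async def run_async(self, program, repetitions: int):
        # Submitting is assumed to return quickly with a job handle.
        job = self.engine.submit(program, repetitions=repetitions)
        # Yield to the event loop between status checks instead of
        # blocking the thread while the job waits in the engine queue.
        while not job.done():
            await asyncio.sleep(self.poll_interval)
        return job.results()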

Really, it would be sweet if we had an auto-batcher that uses async trickery to build up a local queue until it collects enough jobs to batch up and send.
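
A minimal sketch of such an auto-batcher, assuming a send_batch coroutine that submits a list of jobs as a single batched Engine call: collect items from a local queue until either the batch is full or max_wait seconds elapse, then flush:

import asyncio

async def auto_batcher(queue: asyncio.Queue, send_batch, batch_size: int, max_wait: float):
    loop = asyncio.get_running_loop()
    while True:
        # Block until at least one job arrives, then start a batch.
        batch = [await queue.get()]
        deadline = loop.time() + max_wait
        while len(batch) < batch_size:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), remaining))
            except asyncio.TimeoutError:
                break
        # send_batch is an assumed coroutine that submits one batched job.
        await send_batch(batch)
        for _ in batch:
            queue.task_done()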

balopat added the priority/p2, status/needs-agreed-design, and triage/accepted labels and removed the triage/discuss label on Oct 1, 2020
@MichaelBroughton (Collaborator) commented:
Has this been superseded by the feature-testbed stuff, @mpharrigan?

@mpharrigan (Collaborator, Author) commented:
That would be a natural place for it, although it doesn't exist yet. Probably blocked by #5023.

@dstrain115 (Collaborator) commented:
@verult Do we think this feature is obsolete now that we have streaming?
