[CONTRIB] PopenPoolExecutor #6959
PopenPoolExecutor implements a ProcessPoolExecutor backed by popen.
- Only handles invoking functions in the tvm namespace.
- Unlike multiprocessing, it does not require a __main__ block, which means it can run directly in a Jupyter notebook.
- Comes with timeout and fault-tolerance support to time out long-running jobs and restart the process when an error happens.

Recommended usage: create a pool and reuse it in a long-running job (e.g. autotuning) so that the processes are reused when possible.
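For readers unfamiliar with the approach, here is a minimal sketch of running a function in a popen-started interpreter with a timeout. It is not the code in this PR: it uses only the Python standard library, the names `run_in_popen` and `_WORKER_SRC` are hypothetical, and it starts a fresh process per call, whereas the pool described above keeps workers alive and reuses them. It only handles functions that the child can import by name, which loosely mirrors the "functions in the tvm namespace" restriction.

```python
import math
import pickle
import subprocess
import sys

# Source of the child program: read one pickled (fn, args) request from
# stdin, call it, and write the pickled result back on stdout.
_WORKER_SRC = """
import pickle, sys
fn, args = pickle.load(sys.stdin.buffer)
result = fn(*args)
sys.stdout.buffer.write(pickle.dumps(result))
sys.stdout.buffer.flush()
"""


def run_in_popen(fn, args=(), timeout=None):
    """Run fn(*args) in a new interpreter started via subprocess.Popen.

    Hypothetical helper for illustration only. fn must be importable in
    the child (plain pickle sends it by module and name).
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", _WORKER_SRC],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    try:
        out, err = proc.communicate(pickle.dumps((fn, args)), timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()         # stop the long-running job
        proc.communicate()  # reap the child and close the pipes
        raise TimeoutError(f"job exceeded {timeout} s") from None
    if proc.returncode != 0:
        # The child crashed or raised; surface its stderr to the caller.
        raise ChildProcessError(err.decode(errors="replace"))
    return pickle.loads(out)
```

Because the child is started from a command line rather than by re-importing the parent script, a call such as `run_in_popen(math.sqrt, (16.0,))` needs no `__main__` guard, which is the property that makes this style usable from a notebook cell.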
Additional note: the system overhead of the popen pool and multiprocessing.Pool is around 1e-4 sec/item, which means they can be used for heavy-duty tasks like compilation but are not intended for fine-grained parallelism. parallel_for in C++ should be used in those cases. |
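To make the overhead argument concrete, the following back-of-the-envelope sketch computes what share of wall time the dispatch overhead takes for a given task duration. The 1e-4 sec/item figure is the one quoted above; the function name `overhead_fraction` and the example task durations are assumptions chosen for illustration, not measurements.

```python
POOL_OVERHEAD_SEC = 1e-4  # per-item overhead quoted in the note above


def overhead_fraction(task_sec, overhead_sec=POOL_OVERHEAD_SEC):
    """Share of total time spent on pool overhead for one task."""
    return overhead_sec / (overhead_sec + task_sec)


# Illustrative task durations (assumed, not measured):
for label, task_sec in [("1 s compile job", 1.0), ("10 us loop body", 1e-5)]:
    print(f"{label}: {overhead_fraction(task_sec):.2%} overhead")
```

For a task that takes about a second the overhead is negligible, while for a task of a few microseconds the overhead dominates, which is why fine-grained loops are better served by in-process parallelism.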
How does this work when a user registers a function? Will the registered function be available in the subprocess? |
In that case the function needs to be registered at startup, when tvm is imported (the popen worker also imports tvm during startup). Otherwise it won't be available in the subprocess. Closures can still be passed via cloudpickle. To make registration from anywhere work we would need to use fork (note that multiprocessing + spawn also only works when registration happens in global scope). We could support an additional closure for registration during pool creation, if there is really a need to do so. PopenPool is not intended to serve as a general-purpose pool, but it could be used to solve the particular problem of tir compilation, where we can control the behavior inside tvm. |
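The difference between "available at import time" and "closure" can be illustrated with the standard `pickle` module, which sends functions by reference (module plus qualified name), so the receiving process must be able to import them. This is a sketch under that assumption only: `can_pickle` and `make_adder` are hypothetical names, and the code does not call cloudpickle or tvm. cloudpickle serializes the function body by value instead, which is why closures can cross the process boundary with it.

```python
import math
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # Local functions cannot be looked up by name in another process.
        return False


def make_adder(n):
    """Build a closure; it exists only in this process's memory."""
    def add(x):
        return x + n
    return add


print(can_pickle(math.sqrt))      # importable by name in any interpreter
print(can_pickle(make_adder(1)))  # a closure, not reachable by name
```

The same reasoning applies to registration: a worker started from scratch only sees what its own imports set up, so anything registered later in the parent has to be shipped explicitly or registered again in the child.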
cc @tkonolige @merrymercy @junrushao1994 let me know if we want to review, merge and try it out |
This seems reasonable, but I'm not really sure how well it will work. Have you tested it with autoscheduler or autotvm? |
It looks promising :-) I can try it out with Jupyter later today |
@tkonolige I do not have the bandwidth to test it out. A preliminary benchmark shows it is close to multiprocessing.Pool on most platforms. Given that it is mostly self-contained, we could try to merge it in. |
I can try and use it with autotvm today. |
Just tested with Jupyter notebook - it works smoothly! |
LGTM
going to merge after two days |
Just tested on macOS. It appears to work!