Changes to enable threadsafe operation #1064
Thanks! This is highly appreciated. I haven't been able to make the time to do a careful review yet, but it's upcoming.
Multithreading support would be great, but even in a single thread, maintaining multiple computational graphs in parallel would help me a lot, since it would enable model ensembling without having to reset the CG between querying different models. That would be great too!
… oir-threadsafe-rebase
Thanks again for contributing, and I'm super-sorry for taking so long to get to this! Here are a few comments/questions:
@oir FYI: If you're busy and don't have time to handle this, we can pick things up and do the rest on our side. Of course, if you're willing to help, we'll be happy to have you.
@neubig Hey! Sorry for the late response. I am willing to pick this up. I will go through your comments and address them, as well as attempt a rebase, hopefully soon enough.
@neubig To keep you updated: this week I am starting to look at this again (possibly alongside NAACL). We have noticed another minor issue with the PR that needs to be fixed (about guarding shared parameter pools), which will also be part of this PR after I do the rebase.
@oir Great, thanks!
Has any progress been made on this in the past two years?
If my comments above could be addressed I'd be happy to merge a PR!
Please see also oir#1. It doesn't appear that the modifications are sufficient.
for (size_t t = 0; t < 4; ++t) { threads[t].join(); }
for (size_t t = 0; t < 4; ++t) {
  for (size_t i = 1; i < results[t].size(); ++i) {
    BOOST_CHECK_CLOSE(results[t][0], results[t][i], 0.0001);
This code never runs because results[t].size() is always 1. Thread safety is never tested (unless the test crashes, which it does).
In dynet/tests/test-exec-dynamic.cc (lines 143 and 144 in 2da4a05), results[t].size() is always 1, so the loop is not entered.
If the code is changed to
for (size_t t = 1; t < threadCount; ++t) {
  BOOST_CHECK_CLOSE(results[0][0], results[t][0], 0.0001);
}
the check will pass when the threads are all processed serially, which is not much of a surprise. When they are processed in parallel, the test crashes and the check is never performed. The PR contains some promising code, but it does not appear to be usable/correct. It should not be merged.
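For reference, here is a minimal sketch of the shape such a cross-thread check could take. It is an assumption-laden illustration, not the PR's test: `run_forward`, `run_in_parallel`, and `all_threads_agree` are hypothetical names, and `run_forward` is a plain read-only computation standing in for building a graph and running a forward pass. The point is only that each thread writes one result into its own slot and the comparison is made across threads (thread 0 against thread t), not within a single thread's results.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a forward pass. It only reads the shared
// weights, so concurrent calls do not race.
float run_forward(const std::vector<float>& weights) {
  float sum = 0.0f;
  for (float w : weights) sum += std::tanh(w);
  return sum;
}

// Runs the same computation on thread_count threads and returns one
// result per thread.
std::vector<float> run_in_parallel(const std::vector<float>& weights,
                                   std::size_t thread_count) {
  assert(thread_count > 0);
  std::vector<float> results(thread_count, 0.0f);
  std::vector<std::thread> threads;
  threads.reserve(thread_count);
  for (std::size_t t = 0; t < thread_count; ++t) {
    // Each thread writes only to its own slot, so no lock is needed here.
    threads.emplace_back(
        [&results, &weights, t] { results[t] = run_forward(weights); });
  }
  for (auto& th : threads) th.join();
  return results;
}

// Returns true when every thread's result matches thread 0 within a
// relative tolerance (a fraction, not a percentage).
bool all_threads_agree(const std::vector<float>& results, float rel_tol) {
  for (std::size_t t = 1; t < results.size(); ++t) {
    float scale = std::fmax(std::fabs(results[0]), std::fabs(results[t]));
    if (std::fabs(results[0] - results[t]) > rel_tol * scale) return false;
  }
  return true;
}
```

Calling `all_threads_agree(run_in_parallel(weights, 4), 1e-6f)` exercises the cross-thread comparison. Note that `BOOST_CHECK_CLOSE` takes its tolerance as a percentage, so 0.0001 there corresponds to a relative tolerance of 1e-6 in this sketch.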
This PR includes changes to (optionally) enable threadsafe operation of dynet, providing the ability to run multiple dynet models within a single application, or to execute a single dynet model over multiple data instances (computation graphs) concurrently.
This includes:
Multithreaded data parallelism without copying a single model in memory works as follows:
A dynet::ParameterCollection object, shared between threads, contains the (physical) model parameters.
… LSTMBuilder). This copy causes copies of model parameters, but that is okay because these are just shells that contain pointers to the same physical storage.
The main motivation was multithreaded inference, but the changes might apply to training time as well, similar to asynchronous SGD training (which I did not test).
My implementation is limited to (and tested only on) the SimpleExecutionEngine (so no autobatch) and CPU devices.
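The idea of copying lightweight shells that point at shared physical storage can be sketched in plain C++ as follows. This is an illustration under stated assumptions and does not use the dynet API: `Storage`, `BuilderShell`, and `score_concurrently` are hypothetical names, the "model" is a single dot product, and the sketch covers read-only inference only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for the physical parameter storage shared by all threads.
using Storage = std::vector<float>;

// Hypothetical "shell": copying it copies only a pointer, not the weights.
struct BuilderShell {
  std::shared_ptr<const Storage> params;

  // Read-only use of the shared weights (a dot product with the input).
  float score(const std::vector<float>& x) const {
    assert(x.size() == params->size());
    float s = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) s += (*params)[i] * x[i];
    return s;
  }
};

// Scores every input on its own thread. Each thread gets its own copy of
// the shell, and all copies point at the same storage.
std::vector<float> score_concurrently(
    const BuilderShell& proto,
    const std::vector<std::vector<float>>& inputs) {
  std::vector<float> out(inputs.size(), 0.0f);
  std::vector<std::thread> threads;
  threads.reserve(inputs.size());
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    // proto is captured by value: a cheap per-thread shell copy.
    threads.emplace_back(
        [&out, &inputs, proto, i] { out[i] = proto.score(inputs[i]); });
  }
  for (auto& th : threads) th.join();
  return out;
}
```

Because the storage is held as `const` and only read, no locking is needed in this sketch. Anything that writes to shared state (for example gradient accumulation during training) would need additional synchronization, which this example does not attempt to show.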