use a process pool to calculate profile_uncertainty in parallel #223
base: master
Conversation
Currently the parallel argument doesn't get set in any of the existing usage contexts, so it will use the default value of 0.
pkienzle left a comment:
New ticket: The mapping code belongs in bumps since it is not specific to reflectometry. Even better if we extend the existing mapper so that it can return arbitrary python objects instead of just the nllf, so that we can use whatever parallel pool (MPI, multiprocessing) we already have set up.
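(A minimal sketch of what such a generalized mapper could look like if built directly on multiprocessing. The names GenericMapper and eval_extended are hypothetical, not part of the bumps API; the existing bumps mappers such as MPMapper would need their own extension.)

```python
from multiprocessing import Pool

def eval_extended(point):
    """Placeholder: set the problem parameters for *point* and return an
    arbitrary picklable object (profiles, residuals, ...), not just the nllf."""
    raise NotImplementedError

class GenericMapper:
    """Map a function over points with a process pool and collect whatever
    objects the function returns."""
    def __init__(self, processes=None):
        self.pool = Pool(processes=processes)

    def map(self, fn, points):
        # fn may return any picklable object, not only a float nllf.
        return self.pool.map(fn, points)

    def stop(self):
        self.pool.close()
        self.pool.join()
```

With the spawn start method the mapper would need to be created under an `if __name__ == "__main__":` guard and the mapped function must be importable at module level.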
```python
def _worker_eval_point(point):
    return _eval_point(_shared_problem, point)
```
Please rename _shared_problem to _worker_problem. Each worker has its own version, which it needs so that setp doesn't interfere between processes. The global is ugly, but I don't see a nice way to do this.
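(As an aside, the usual way to populate such a per-worker global without sharing it between processes is through the pool's initializer. A sketch below; _eval_point mirrors the PR's helper, the rest is illustrative.)

```python
from multiprocessing import Pool

_worker_problem = None  # each worker process holds its own copy

def _worker_init(problem):
    # Runs once in every worker when the pool starts; setp on this copy
    # cannot interfere with the parent process or with other workers.
    global _worker_problem
    _worker_problem = problem

def _worker_eval_point(point):
    return _eval_point(_worker_problem, point)

# pool = Pool(processes=nproc, initializer=_worker_init, initargs=(problem,))
# results = pool.map(_worker_eval_point, points)
```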
What if we're still fitting? Do we want to use the same pool for making plots as we're currently using for fitting?
There may be some advantages with MPI, particularly when you have several nodes in the allocation. The work is distributed across the nodes so that they each have roughly the same number of points to evaluate. If the node with the root process is busy evaluating functions for plots then it will be slow compared to the other nodes, which will become idle while waiting for the root node to catch up. The worst case is if the children of the root process are tied to a single processor (I don't know enough about slurm/MPI/fork to know whether this is likely); then the whole allocation will wait while that one processor evaluates 50 or 100 points. Granted, this is what happens now, in that we don't generate plots as part of the checkpoint and update!

Ignoring the MPI case, I'm not sure there is any performance advantage either way. The same amount of work is being done, so throughput should be the same. I don't know what cost/complexity is involved in setting up the processing pool, but if it is slow we could keep the pool around for the life of the server. You're right that a plot request coming from the client while the fit thread is running will be much easier to manage with separate pools.

So keep what you've got for now, but in future we may move the complexity of using the pool to bumps so that other applications can more easily do the same sort of thing.
A single data point here. The point being that there may be some complications in using the same pool, or in trying to start a new pool, etc.
MPMapper is storing global state as class attributes, so I'm not surprised it fails if you have two of them. Brian's separate pool should work fine if you are using only a single node on TACC. Obviously both the fit and the plot will be slower as they compete with each other for resources, but this is handled by the unix scheduler.

Rather than recompute the function we could keep a running set of extended outputs. These would be available on demand from the fitness function, and we could add a method to the mapper to request them. So long as they are serializable, bumps can track them and store them in HDF. This would work for MPI as well as multiprocessing.

As a user I can imagine flipping to an old fit while a new fit is running to compare plots. Even better if I could see them side by side. This may require having a different fit problem in the process pool. Again, easy enough with a separate pool; much harder with MPI.

Hmmm... I wonder if we want to keep the plot serialization in HDF as well so that the server doesn't need to do any work when showing plots from old fits.
Using an HDF5 dataset as a "live" datastore is a little tricky - there is a SWMR mode (Single Writer, Multiple Reader) which allows concurrent access. I think for contiguous data (not chunked, not compressed) you could probably make it fast. You'd have to know the dataset size at dataset creation to allocate contiguous storage. You can then write to any section of it whenever you want, and read it at any time. If the data is chunked, I think you have to call the refresh method regularly from the consumers to get updates to the metadata (the B-tree of chunk addresses, which changes when chunks are invalidated).
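(A minimal h5py sketch of that SWMR pattern, with the dataset sized at creation time; the file and dataset names are illustrative.)

```python
import numpy as np
import h5py

# Writer: allocate the full dataset up front, then switch on SWMR mode.
with h5py.File("live_state.h5", "w", libver="latest") as f:
    dset = f.create_dataset("profiles", shape=(100, 256), dtype="f8")
    f.swmr_mode = True
    for i in range(100):
        dset[i, :] = np.random.rand(256)  # stand-in for a computed profile
        dset.flush()                       # make this row visible to readers

# Reader (normally a separate process): open with swmr=True and refresh
# to pick up metadata updates before reading.
with h5py.File("live_state.h5", "r", libver="latest", swmr=True) as f:
    dset = f["profiles"]
    dset.refresh()
    latest = dset[:]
```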
The calculated model state would be much like the MCMC state. You wouldn't need to save it as live data to the HDF, but it could be useful to save it at the end of the fit so that you can more quickly produce plots from saved fit files. Make sure that "best" is one of the samples, since that will be needed for the reflectivity and profile plots. This is out of scope for the current PR. Open a new ticket if you think this is worthwhile.
Note: one difference from MCMC state is that the state for the different points will have different sizes, so you won't be able to allocate space for them in advance. It is more like a list of strings of arbitrary length.
I was actually wondering about that - ragged arrays are hard in HDF5.
You could serialize it as a list of dicts so there is only one large JSON blob to store.
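(For example, with illustrative names: collect the ragged per-point state as a list of dicts, dump it to one JSON string, and store it as a single scalar string dataset.)

```python
import json
import h5py

# Hypothetical per-point results with ragged sizes.
point_state = [
    {"point": [1.0, 2.0], "profile": [0.1, 0.2, 0.3]},
    {"point": [1.1, 2.1], "profile": [0.1, 0.2, 0.3, 0.4, 0.5]},
]

with h5py.File("fit_state.h5", "a") as f:
    f["profile_uncertainty/state"] = json.dumps(point_state)

# Reading it back:
with h5py.File("fit_state.h5", "r") as f:
    state = json.loads(f["profile_uncertainty/state"][()])
```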
Profile uncertainty can take a long time to calculate, and this PR adds a "parallel" keyword argument to control the parallelism. A process pool is launched when parallel is not equal to 1.

Note that this change is incompatible with using bumps.calc_errors to calculate the profile uncertainty, and so requires that #222 be merged first.
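(A rough sketch of the control flow described above; the function name and the reading of parallel=0 as "use all cores" are assumptions, not necessarily the PR's exact behaviour.)

```python
from multiprocessing import Pool

def map_profile_points(eval_point, points, parallel=0):
    # parallel == 1: evaluate serially in the current process.
    # parallel != 1: launch a process pool; 0 is assumed to mean "use all
    # available cores", which is what Pool(processes=None) does.
    if parallel == 1:
        return [eval_point(p) for p in points]
    processes = parallel if parallel > 1 else None
    with Pool(processes=processes) as pool:
        return pool.map(eval_point, points)
```

Note that eval_point has to be picklable (a module-level function) for multiprocessing to map it across workers.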