
Conversation

@bmaranville (Member)

Profile uncertainty can take a long time to calculate, and this PR adds a "parallel" keyword argument to control parallelism.

A process pool is launched when "parallel" is not equal to 1.

Note that this change is incompatible with using bumps.calc_errors to calculate the profile uncertainty, and so requires that #222 be merged first.

Currently the "parallel" argument doesn't get set in any of the existing usage contexts, so it will use the default value of 0.
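
A minimal sketch of how the "parallel" argument gates pool creation (illustrative only, not the code in this PR; the helper name and the reading of 0 as "use all available cores" are placeholders):

```python
from multiprocessing import Pool, cpu_count

def _eval_point_stub(point):
    # Placeholder for the real per-point profile evaluation.
    return sum(x * x for x in point)

def map_points(points, parallel=0):
    if parallel == 1:
        # Serial path: no pool is created.
        return [_eval_point_stub(p) for p in points]
    # parallel=0 is interpreted here as "use all available cores".
    nprocs = cpu_count() if parallel == 0 else parallel
    with Pool(processes=nprocs) as pool:
        return pool.map(_eval_point_stub, points)

if __name__ == "__main__":
    print(map_points([[1.0, 2.0], [3.0, 4.0]], parallel=0))
```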
@pkienzle (Member) left a comment


New ticket: The mapping code belongs in bumps since it is not specific to reflectometry. Even better if we extend the existing mapper so that it can return arbitrary Python objects instead of just the nllf, so that we can use whatever parallel pool (MPI, multiprocessing) we already have set up.



def _worker_eval_point(point):
    return _eval_point(_shared_problem, point)
@pkienzle (Member)

Please rename _shared_problem to _worker_problem. Each worker has its own version, which it needs so that setp doesn't interfere between processes. The global is ugly, but I don't see a nice way to do this.
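
A small self-contained sketch of the per-worker global pattern being discussed (the `_DemoProblem` class and the pool scaffolding are illustrative, not the PR's code; only the `_worker_problem` / `_eval_point` / `_worker_eval_point` names follow the diff and the suggestion above):

```python
from multiprocessing import Pool

class _DemoProblem:
    """Stand-in for a fit problem with a setp() method (illustration only)."""
    def setp(self, point):
        self.point = list(point)
    def value(self):
        return sum(self.point)

# Per-process global: the pool initializer runs once in each worker, so every
# worker ends up with its own copy of the problem and setp() calls in one
# process cannot disturb the parameters seen by another.
_worker_problem = None

def _init_worker(problem):
    global _worker_problem
    _worker_problem = problem   # already a private, unpickled copy in this process

def _eval_point(problem, point):
    problem.setp(point)
    return problem.value()

def _worker_eval_point(point):
    return _eval_point(_worker_problem, point)

if __name__ == "__main__":
    points = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    with Pool(processes=2, initializer=_init_worker,
              initargs=(_DemoProblem(),)) as pool:
        print(pool.map(_worker_eval_point, points))
```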

@bmaranville (Member Author)

What if we're still fitting? Do we want to use the same pool for making plots as we're currently using for fitting?

@pkienzle (Member)

There may be some advantages with MPI, particularly when you have several nodes in the allocation. The work is distributed across the nodes so that they each have roughly the same number of points to evaluate. If the node with the root process is busy evaluating functions for plots then it will be slow compared to the other nodes, which will become idle while waiting for the root node to catch up.

Worst case is if the children of the root process are tied to a single processor (I don't know enough about slurm/mpi/fork to know whether this is likely). Then the whole allocation will wait while that one processor evaluates 50 or 100 points. Granted, this is what happens now, which is why we don't generate plots as part of the checkpoint and update!

Ignoring the MPI case I'm not sure there is any performance advantage either way. The same amount of work is being done so throughput should be the same. I don't know what cost/complexity is involved in setting up the processing pool, but if it is slow we could keep the pool around for the life of the server.

You're right that a plot request coming from the client while the fit thread is running will be much easier to manage with separate pools.

So keep what you've got for now, but in the future we may move the complexity of using the pool to bumps so that other applications can more easily do the same sort of thing.

@hoogerheide (Contributor)

A single data point here. The molgroups plugin generates uncertainty plots during fitting using multiprocessing. I am using the MPMapper to do this. On TACC Stampede (using multiprocessing, not MPI), trying to start the uncertainty pool hangs (at least for several minutes) when the fit is running, but works fine when it's not. On Windows I think this works okay but I haven't tested it recently. There's the fork vs. spawn issue which might be playing a role here.

Point being that there may be some complications in terms of using the same pool, or trying to start a new pool, etc.

@pkienzle (Member)

MPMapper is storing global state as class attributes. I'm not surprised it fails if you have two of them.

Brian's separate pool should work fine if you are using only a single node on TACC. Obviously both the fit and the plot will be slower as they compete with each other for resources, but this is handled by the Unix scheduler.

Rather than recompute the function, we could keep a running set of extended outputs. These would be available on demand from the fitness function, and we could add a method to the mapper to request them. So long as they are serializable, bumps can track them and store them in HDF. This would work for MPI as well as multiprocessing.
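
A rough sketch of that idea (this is not bumps' existing mapper API; the class, the method names, and the (nllf, extras) return convention are all assumptions for illustration):

```python
from multiprocessing import Pool

def _eval(point):
    nllf = sum(x * x for x in point)                    # stand-in objective
    extras = {"point": list(point), "dim": len(point)}  # arbitrary picklable extras
    return nllf, extras

class CachingMapper:
    """Map points to nllf values while keeping the latest extended outputs."""
    def __init__(self, processes=2):
        self.pool = Pool(processes=processes)
        self._extras = []

    def __call__(self, points):
        results = self.pool.map(_eval, points)
        nllfs = [nllf for nllf, _ in results]
        self._extras = [extras for _, extras in results]  # keep extended outputs
        return nllfs

    def extended_outputs(self):
        # On-demand access to the serializable extras from the last map call.
        return self._extras

    def close(self):
        self.pool.close()
        self.pool.join()

if __name__ == "__main__":
    mapper = CachingMapper()
    print(mapper([[1.0, 2.0], [3.0, 4.0]]))
    print(mapper.extended_outputs())
    mapper.close()
```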

As a user I can imagine flipping to an old fit while a new fit is running to compare plots. Even better if I could see them side by side. This may require having a different fit problem in the process pool. Again, easy enough with a separate pool. Much harder on MPI. Hmmm... I wonder if we want to keep the plot serialization in HDF as well so that the server doesn't need to do any work when showing plots from old fits.

@bmaranville (Member Author)

Using an HDF5 dataset as a "live" datastore is a little tricky: there is a SWMR (Single Writer, Multiple Reader) mode which allows concurrent access. I think for contiguous data (not chunked, not compressed) you could probably make it fast. You'd have to know the dataset size at dataset creation to allocate contiguous storage. You can then write to any section of it whenever you want, and read at any time.

If the data is chunked I think you have to call the refresh method regularly from the consumers to get updates to the metadata (B-tree of chunk addresses, which changes if chunks are invalidated).
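
A minimal sketch of the SWMR pattern with a contiguous dataset, using h5py (the file name, sizes, and random data are placeholders; in real use the writer and reader would be separate processes):

```python
import numpy as np
import h5py

n_points, n_z = 100, 512   # sizes known up front so contiguous storage can be allocated

# Writer: create the contiguous (unchunked, uncompressed) dataset first,
# then switch the file into SWMR mode so readers can attach concurrently.
with h5py.File("profiles.h5", "w", libver="latest") as f:
    dset = f.create_dataset("profiles", shape=(n_points, n_z), dtype="f8")
    f.swmr_mode = True
    for i in range(n_points):
        dset[i, :] = np.random.rand(n_z)   # stand-in for a computed profile
        dset.flush()                       # make the new row visible to readers

# Reader (normally a separate process): open with swmr=True; refresh() picks up
# metadata updates, which matters mostly for chunked datasets.
with h5py.File("profiles.h5", "r", libver="latest", swmr=True) as f:
    dset = f["profiles"]
    dset.refresh()
    latest = dset[...]
    print(latest.shape)
```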

@pkienzle (Member)

The calculated model state would be much like the MCMC state. You wouldn't need to save it as live data to the HDF, but it could be useful to save it at the end of the fit so that you can more quickly produce plots from saved fit files. Make sure that "best" is one of the samples, since that will be needed for the reflectivity and profile plots.

This is out of scope for the current PR. Open a new ticket if you think this is worthwhile.

@pkienzle (Member)

Note: one difference from MCMC state is that the state for the different points will have different sizes, so you won't be able to allocate space for them in advance. It is more like a list of strings of arbitrary length.

@bmaranville (Member Author)

I was actually wondering about that - ragged arrays are hard in HDF5.

@pkienzle (Member)

You could serialize it as a list of dicts so there is only one large JSON blob to store.
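
For example (the dataset name, file name, and record contents are made up; the point is just storing one JSON blob as a variable-length string):

```python
import json
import numpy as np
import h5py

# Ragged per-point outputs: each sampled point produced a profile of a
# different length, so they don't fit a single rectangular array.
records = [
    {"point": [0.10, 2.3], "profile": np.random.rand(50).tolist()},
    {"point": [0.12, 2.1], "profile": np.random.rand(75).tolist()},
]

with h5py.File("fit_state.h5", "a") as f:
    f.create_dataset("profile_uncertainty", data=json.dumps(records),
                     dtype=h5py.string_dtype(encoding="utf-8"))
```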
