
Emcee does not use spawned multithreaded processes #669

Closed
matthiasfabry opened this issue Aug 27, 2020 · 13 comments
@matthiasfabry

Description

When using minimizer.emcee() and supplying a pool of workers, multiple processes are spawned but do no work. In my case (macOS, 16 threads), sixteen instances of python3.8 appear in Activity Monitor, fifteen of which stay idle at 0.0% CPU usage. In practice, then, using processes=1 or processes=os.cpu_count() makes no difference in execution time.

A Minimal, Complete, and Verifiable example
import os
from multiprocessing.pool import Pool

import lmfit
import scipy.optimize  # 'import scipy' alone does not guarantee scipy.optimize is importable

def cost_fun(params, **kwargs):
    # gradient of the Rosenbrock function, used as a stand-in residual
    return scipy.optimize.rosen_der([params['a'].value, params['b'].value])


if __name__ == '__main__':
    params = lmfit.Parameters()
    params.add('a', 1, min=-5, max=5, vary=True)
    params.add('b', 1, min=-5, max=5, vary=True)

    fitter = lmfit.Minimizer(cost_fun, params)
    with Pool(processes=os.cpu_count()) as pool:
        MC_results = fitter.emcee(workers=pool, steps=10000)
Version information

lmfit: 1.0.1, scipy: 1.5.0, numpy: 1.19.1, asteval: 0.9.16, uncertainties: 3.1.4

@newville
Member

newville commented Aug 27, 2020

Is this related to #666?

For both @matthiasfabry and @odstrcilt: you need to read #601 and understand that I, personally, find emcee to be a wart in lmfit. If you want to use it and want any further changes, you will have to provide and support them.

@matthiasfabry
Author

@newville It is not completely related to #666, since I use the multiprocessing builtin Pool, but the main point here is that the pool subprocesses sit idle while one process (probably the main one) executes the MCMC sampling on its own.

I would say, however, that it is unrelated to #601. I have a clear use case for posterior sampling, which I believe emcee offers correctly and easily. With this issue I merely report that the implementation of the underlying parallelization is somehow incomplete. I'm not at all an expert in these things, so I wouldn't know whether this issue is on the lmfit side or the emcee side (or even deeper, in multiprocessing, though I doubt that), but I would like to use multithreaded capabilities to speed up my MCMC posterior distribution sampling.

@odstrcilt

@matthiasfabry it could be an issue on the emcee side.
Try changing the emcee source code: in the file ensemble.py, in the __getstate__ method, replace
d = self.__dict__
with
d = dict(self.__dict__)
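If I read the patch right, the copy matters because `__getstate__` usually strips unpicklable attributes (such as the pool) before pickling, and doing that on `self.__dict__` directly also strips them from the live sampler. A minimal sketch (an illustrative class, not emcee's actual code):

```python
import pickle

class Sampler:
    # Illustrative stand-in for emcee's sampler: __getstate__ must copy the
    # instance dict before stripping attributes that cannot be pickled.
    def __init__(self, pool):
        self.pool = pool

    def __getstate__(self):
        d = dict(self.__dict__)  # 'd = self.__dict__' would alias the live object
        d['pool'] = None         # strip the (normally unpicklable) pool
        return d

s = Sampler(pool=object())
pickle.dumps(s)                  # the pickled copy carries pool=None
assert s.pool is not None        # the live sampler keeps its pool
```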

@newville
Member

@matthiasfabry "I want it to be faster, so I'll use multiprocessing" while also confusing multiprocessing and multithreading is not at all a reassuring start. Multithreading is almost certainly not worth pursuing. Multiprocessing with lmfit (and possibly with emcee) is definitely complicated by the fact that multiprocessing relies on pickle. You can probably make it work for a silly example like the rosenbrook function, but once you are trying to solve a real problem, dragons will quickly be revealed.

So, sure, maybe multiprocessing will make it be faster, and maybe it will be worth the effort.

@matthiasfabry
Author

@newville I agree multiprocessing is what we're after. For emcee, I don't actually think dragons should appear when doing this... You can perfectly well sample a posterior distribution with workers independent of each other, no matter what function you are minimizing or optimizing. Again, I have no deep understanding of Python or pickle and what that might entail, but at least from a naive mathematical standpoint it should be possible. In the worst case I can imagine writing a wrapper function that spawns subprocesses, each calling emcee(steps=totalsteps / cpus), dividing the total number of steps over the different processes. Combining the results into one uncertainty interval, however, is not trivial with this idea.

@odstrcilt your proposed fix actually slows the execution. The subprocesses do seem to run now, but not at their full speed (i.e. CPU usage is not close to 100% per core). Apparently this fix causes a lot of overhead.

@odstrcilt

@matthiasfabry yes, multiprocessing has a large overhead. All large arguments of your cost function should be passed as global variables to reduce the overhead of creating the pickles. If your cost function executes in less than ~100 ms, multiprocessing can be useless.

I'm using one more trick to reduce the overhead. Instead of executing each function call in a pool worker, I split all tasks among the workers, and each worker then computes its share serially. Here is an example:


import numpy as np
from multiprocessing import Pool


class fast_pool:
    # vectorised pool; reduces multiprocessing overhead by splitting the
    # tasks among the workers so each worker evaluates its share serially
    def __init__(self, pool):
        self.pool = pool

    def map(self, f, arg):
        npool = len(self.pool._pool)

        arg_list = np.array_split(arg, npool)
        vf = np.vectorize(f, signature='(n)->()')

        res_list = self.pool.map(vf, arg_list)

        return np.hstack(res_list)


global large_data

fitter = lmfit.Minimizer(cost_fun, **fitter_kwds)
fitter.emcee(workers=fast_pool(Pool(n_processes)), steps=10000, **emcee_args)

@newville
Member

@matthiasfabry

You can perfectly sample a posterior distribution with workers independently of each other, no matter what function you are minimizing or optimizing for. Again, I have no deep understanding of python or pickle and what that might entail, but at least from a naive mathematical standpoint, it should be possible.

From a "naive mathematical standpoint" one can say almost anything should be possible. We don't deal with naive mathematical standpoints.

Python multiprocessing creates new Python processes, sends data from one process to another to do some work, and then sends data back. If that sounds simple, then you're not thinking very deeply about how to share and send objects between processes.
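One concrete consequence: only objects with importable names survive the pickle round-trip that multiprocessing relies on. A small sketch of this limitation:

```python
import pickle

def model(x):
    # Module-level functions pickle by reference (their importable name)...
    return x ** 2

roundtrip = pickle.loads(pickle.dumps(model))

try:
    # ...while lambdas and nested closures have no importable name and fail
    # to pickle, one of the "dragons" once a real model function is involved.
    pickle.dumps(lambda x: x ** 2)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
```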

Like @odstrcilt says, it is not at all unusual for a naive use of multiprocessing to slow down a complex calculation.

@matthiasfabry
Author

@newville There is absolutely no need to lecture me. I am only reporting unwanted behavior here, and I never claimed that there is a practical solution for it. As a regular user of your software, I simply noticed that the workers argument of minimizer.emcee() doesn't work as advertised in the documentation. It is up to you and the development team to decide whether to fix this. If not, fine, but then remove the feature you claim to provide and this issue turns into a feature request. If you do, I'm sure the community would greatly appreciate it, as it will speed up not only my research but also other people's work.

@odstrcilt I will check whether your fix reduces the overhead enough to speed up the execution in my case.

@newville
Member

@matthiasfabry @odstrcilt As mentioned earlier and discussed in #601, I lean more toward deprecating Minimizer.emcee than toward trying to make it work better. It simply does not belong with the other methods of Minimizer, which are actually solvers - emcee() is not a solver.

Again, if anyone is expecting Minimizer.emcee() to work well and be generally useful then there is real work to do. That will have to be done (and supported) by someone other than me. Perhaps it would make sense to move these routines out of Minimizer. Whether lmfit's emcee method supports multiprocessing could certainly be part of that effort if someone wants to do that.

@matthiasfabry
Author

@newville That's fair. emcee is indeed not a solver, but I agree with @reneeotten in #601 that it has its place within lmfit (but indeed maybe not as part of the Minimizer class, that's an implementation issue the developers need to decide on). Doing MCMC posterior distribution sampling is a natural continuation of any high-dimensional minimization problem with a regular minimizer (say Nelder-Mead or Levenberg-Marquardt), where brute forcing is simply too expensive. If I'm not mistaken MCMC also takes autocorrelations naturally into account.

Finally then, emcee does work correctly, mind you, albeit on a single thread. I repeat one last time that I noticed the multiprocessed capabilities don't function as advertised.

@odstrcilt Your custom pool class seems to have a bug in it. array_split() throws the exception:

 File "/Users/matthiasf/Anaconda3/envs/spinOS/lib/python3.8/site-packages/numpy/lib/shape_base.py", line 769, in array_split
    Ntotal = len(ary)
TypeError: object of type 'generator' has no len()

Do you have an idea for a fix or workaround?

@odstrcilt

@matthiasfabry
try replacing the line
arg_list = np.array_split(arg, npool)
with
arg_list = np.array_split(list(arg), npool)

@newville
Member

@matthiasfabry is this resolved? I cannot tell...

@matthiasfabry
Author

Not exactly, but it seems this issue is not directly related to lmfit. I will close this; thanks for your input.
