emcee

The Python ensemble sampling toolkit for affine-invariant MCMC

Seriously kick-ass MCMC sampling

Once upon a time, there was an algorithm. This algorithm became very popular but it turned out that there were better ones. Some folks came up with a better one and some other folks implemented it in Python. It's a little crazy how much it rocks.

emcee is an extensible, pure-Python implementation of of Goodman & Weare's Affine Invariant Markov chain Monte Carlo (MCMC) Ensemble sampler. It's designed for Bayesian parameter estimation and it's really sweet!

Prerequisites

  • numpy: Numerical Python (you can use pip to install this)
  • pip (optional): The Python package manager

Standard Install

Do one of the following:

  1. sudo pip install emcee or
  2. Download the source and run

    sudo python setup.py install

    in the unzipped directory.

Then run the tests to make sure that all went as planned:

python -c 'import emcee; emcee.test()'

Bleeding Edge

For all the hackers in the house... you know what to do!

Fork emcee on GitHub

Using emcee is dead simple...
import numpy as np
import emcee

def lnprob(x, ivar):
    return -0.5*np.sum(ivar * x**2)

ndim, nwalkers = 10, 100
ivar = 1./np.random.rand(ndim)
p0 = [np.random.rand(ndim) for i in xrange(nwalkers)]

sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob, args=[ivar])
sampler.run_mcmc(p0, 1000)

...and it's blazingly fast!

Warning: If you used a previous version of this code, you'll find that the chain property is a different shape (nwalkers, nsteps, ndim) instead of (nwalkers, ndim, nsteps).

So you want some more info? Why don't you check out some examples?

Simple: Multivariate Gaussian

Sample a Gaussian in 50 dimensions.

Advanced: Rosenbrock Density

Sample a highly anisotropic density in 2-dimensions using multiprocessing.

Parallelization

In principle, adding multiprocessing support is as simple as instantiating the sampler object with the threads argument set to an integer greater than 1.

In practice, this only works when your likelihood function is pickleable.

sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob, args=[means, icov], threads=15)

The annotated source code is an excellent place to get information about how this code works. Most of the core functionality is in the EnsembleSampler object that inherits from the Sampler abstract base class. There is also a simple Metropolis-Hastings sampler for comparison. You can also take a look at the unit tests.

emcee.Sampler (dim, lnprobfn, args=[])

An abstract sampler object that implements various helper functions

Arguments

  • dim (int): Number of dimensions in the parameter space.
  • lnpostfn (callable): A function that takes a vector in the parameter space as input and returns the natural logarithm of the posterior probability for that position.

Keyword Arguments

  • args (list): Optional list of extra arguments for lnpostfn. lnpostfn will be called with the sequence lnpostfn(p, *args).

Properties

  • acceptance_fraction: An array (length: k) of the fraction of steps accepted for each walker.
  • acor: The autocorrelation time of each parameter in the chain (length: dim) as estimated by the acor module.
  • chain: A pointer to the Markov chain itself. The shape of this array is (k, dim, iterations).
  • flatchain: A shortcut for accessing chain flattened along the zeroth (walker) axis.
  • lnprobability: A pointer to the matrix of the value of lnprobfn produced at each step for each walker. The shape is (k, iterations).
  • random_state: The state of the internal random number generator. In practice, it's the result of calling get_state() on a numpy.random.mtrand.RandomState object. You can try to set this property but be warned that if you do this and it fails, it will do so silently.

Methods

    emcee.Sampler.clear_chain ()

    An alias for reset kept for backwards compatibility.

    emcee.Sampler.get_lnprob (p)

    Return the log-probability at the given position.

    emcee.Sampler.reset ()

    Clear chain, lnprobability and the bookkeeping parameters.

    emcee.Sampler.run_mcmc (pos0, N, rstate0=None, lnprob0=None)

    Iterate sample for N iterations and return the result. The arguments are passed directly to sample so see the parameter details given in sample.

emcee.EnsembleSampler (k, *args, **kwargs)

A generalized Ensemble sampler that uses 2 ensembles for parallelization.

This is a subclass of the Sampler object. See the Sampler object for more details about the inherited properties.

Arguments

  • k (int): The number of Goodman & Weare "walkers".
  • dim (int): Number of dimensions in the parameter space.
  • lnpostfn (callable): A function that takes a vector in the parameter space as input and returns the natural logarithm of the posterior probability for that position.

Keyword Arguments

  • a (float): The proposal scale parameter. (default: 2.0)
  • args (list): Optional list of extra arguments for lnpostfn. lnpostfn will be called with the sequence lnpostfn(p, *args).
  • postargs (list): Alias of args for backwards compatibility.
  • threads (int): The number of threads to use for parallelization. If threads == 1, then the multiprocessing module is not used but if threads > 1, then a Pool object is created and calls to lnpostfn are run in parallel. (default: 1)
  • pool (multiprocessing.Pool): An alternative method of using the parallelized algorithm. If pool is not None, the value of threads is ignored and the provided Pool is used for all parallelization. (default: None)

Exceptions

  • AssertionError: If k < 2*dim or if k is not even.

Properties

  • acceptance_fraction: An array (length: k) of the fraction of steps accepted for each walker.
  • acor: The autocorrelation time of each parameter in the chain (length: dim) as estimated by the acor module.
  • chain: A pointer to the Markov chain itself. The shape of this array is (k, dim, iterations).
  • flatchain: A shortcut for accessing chain flattened along the zeroth (walker) axis.
  • lnprobability: A pointer to the matrix of the value of lnprobfn produced at each step for each walker. The shape is (k, iterations).
  • random_state: The state of the internal random number generator. In practice, it's the result of calling get_state() on a numpy.random.mtrand.RandomState object. You can try to set this property but be warned that if you do this and it fails, it will do so silently.

Methods

    emcee.EnsembleSampler.clear_chain ()

    An alias for reset kept for backwards compatibility.

    emcee.EnsembleSampler.get_lnprob (p)

    Return the log-probability at the given position.

    emcee.EnsembleSampler.reset ()

    Clear chain, lnprobability and the bookkeeping parameters.

    emcee.EnsembleSampler.run_mcmc (pos0, N, rstate0=None, lnprob0=None)

    Iterate sample for N iterations and return the result. The arguments are passed directly to sample so see the parameter details given in sample.

    emcee.EnsembleSampler.sample (p0, lnprob0=None, rstate0=None, iterations=1, **kwargs)

    Advance the chain iterations steps as an iterator.

    Arguments
    • pos0 (numpy.ndarray): A list of the initial positions of the walkers in the parameter space. The shape is (k, dim).
    Keyword Arguments
    • lnprob0 (numpy.ndarray): The list of log posterior probabilities for the walkers at positions given by p0. If lnprob is None, the initial values are calculated. The shape is (k, dim).
    • rstate0 (tuple): The state of the random number generator. See the Sampler.random_state property for details.
    • iterations (int): The number of steps to run. (default: 1)
    Yields
    • pos (numpy.ndarray): A list of the current positions of the walkers in the parameter space. The shape is (k, dim).
    • lnprob (numpy.ndarray): The list of log posterior probabilities for the walkers at positions given by pos. The shape is (k, dim).
    • rstate (tuple): The state of the random number generator.

emcee.Ensemble (sampler)

A sub-ensemble object that actually does the heavy lifting of the likelihood calculations and proposals of a new position.

Arguments

  • sampler (Sampler): The sampler object that this sub-ensemble should be connected to.

Methods

    emcee.Ensemble.get_lnprob (pos=None)

    Calculate the vector of log-probability for the walkers.

    Keyword Arguments
    • pos (numpy.ndarray): The position vector in parameter space where the probability should be calculated. This defaults to the current position unless a different one is provided.
    Returns
    • lnprob (numpy.ndarray): A vector of log-probabilities with one entry for each walker in this sub-ensemble.

    emcee.Ensemble.propose_position (ensemble)

    Propose a new position for another ensemble given the current positions

    Arguments
    • ensemble (Ensemble): The ensemble to be advanced.
    Returns
    • q (numpy.array): The new proposed positions for the walkers in ensemble.
    • newlnprob (numpy.ndarray): The vector of log-probabilities at the positions given by q.
    • accept (numpy.ndarray): A vector of bools indicating whether or not the proposed position for each walker should be accepted.

What are you talking about when you say "walkers"?

Walkers are the members of the ensemble. They are almost like separate Metropolis-Hastings chains but, of course, the proposal distribution for a given walker depends on the positions of all the other walkers in the ensemble. See Goodman & Weare (2010) for more details.

How should I initialize the walkers?

The best technique seems to be to start in a small ball around the a priori preferred position. Don't worry, the walkers quickly branch out and explore the rest of the space.

I'm getting weird spikes in my data/I have low acceptance fractions/both... what should I do?

Double the number of walkers. If that doesn't work, double it again. And again. Until you run out of RAM. At that point, I don't know!

The walkers are getting stuck in "islands" of low likelihood. How can I fix that?

Try increasing the number of walkers. If that doesn't work, you can try pruning using a clustering algorithm like the one found in arxiv:1104.2612.

This is awesome! Who do I cite?

If you find this useful, please cite us. Also, please fork us on GitHub so we can all benefit from you changes!