Integration with emcee

PyCOMPSs can also be used with emcee in order to enable its execution in distributed environments.

Usage

Enabling emcee with PyCOMPSs is easy. Assuming that you have emcee and COMPSs installed, there are two requirements:

Define the sampling function as task
Import the PyCOMPSs map module (from pycompss.functions import map as pycompss_pool) and use it in the EnsembleSampler pool parameter.

Sample Application

The following code (Code 132) shows how to enable emcee applications with PyCOMPSs, highlighting the modifications required.

Code 132 emcee with PyCOMPSs application example (sampling_pycompss.py)

import time
import numpy as np
import emcee
from pycompss.api.task import task
from pycompss.functions import map as pycompss_pool


def execution_params():
    """Define execution parameters."""
    np.random.seed(42)
    initial = np.random.randn(32, 5)
    nwalkers, ndim = initial.shape
    nsteps = 10
    return initial, nwalkers, ndim, nsteps


@task(returns=1)
def log_prob(theta):
    """Sampling function to apply."""
    time.sleep(0.2)  # Computation load simulation
    return -0.5 * np.sum(theta**2)


def emcee_pycompss(params):
    """emcee usage with PyCOMPSs."""
    initial, nwalkers, ndim, nsteps = params
    sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, pool=pycompss_pool)
    start = time.time()
    result = sampler.run_mcmc(initial, nsteps, progress=True)
    end = time.time()
    print("PyCOMPSs took {0:.1f} seconds".format(end - start))
    return result

if __name__ == "__main__":
      params = execution_params()
      result_pycompss = emcee_pycompss(params)

Tip

The integration is not limited to its usage with the pycompss_pool. It is possible to define more tasks and invoke them from the emcee_pycompss function in order to parallelize any preprocessing of the initial data or any postprocessing of the result.

Execution

An emcee application parallelized with PyCOMPSs MUST be executed as any COMPSs application (for full description about the execution environments and options please check the Execution Environments Section.).

For example, we can run Code 132 locally (using the PyCOMPSs CLI) with the following script:

pycompss run \
  --graph \
  sampling_pycompss.py

The execution output is:

[ INFO ] Inferred PYTHON language
[ INFO ] Using default location for project file: /opt/COMPSs//Runtime/configuration/xml/projects/default_project.xml
[ INFO ] Using default location for resources file: /opt/COMPSs//Runtime/configuration/xml/resources/default_resources.xml
[ INFO ] Using default execution type: compss

----------------- Executing sampling_pycompss.py --------------------------

WARNING: COMPSs Properties file is null. Setting default values
[(647)    API]  -  Starting COMPSs Runtime v2.10.rc2205 (build 20220527-0842.r791bf7461bad1a1fab8f45853be7ba1c28b7bf93)
100%|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX| 10/10 [00:25<00:00,  2.51s/it]
PyCOMPSs took 31.8 seconds
[(34243)    API]  -  Execution Finished

------------------------------------------------------------

And the task dependency graph achieved:

The task dependency graph of the :numref:`code_emcee_pycompss` execution

Figure 6 Task dependency graph of the Code 132 execution

Tip

The larger the initial array, the more parallelism can be achieved (larger width in the task dependency graph enabling to be executed in more resources).

If the nsteps is increased, the more iterations will be performed (larger height in the task dependency graph).