Creating a cluster for distributed processing

esm_analysis supports creating HPC-style clusters for distributed data processing using dask-mpi. Currently, only clusters managed by the Slurm workload manager are supported.

class esm_analysis.MPICluster(script, workdir, submit_time=None, batch_system=None, job_id=None)

Create a cluster of distributed workers.

classmethod load(workdir)

Load the information of a running cluster.

This method can be used to connect to an already running cluster.

from esm_analysis import MPICluster
cluster = MPICluster.load('/tmp/old_cluster')
Parameters

workdir (str) – Directory in which the information about the previously created cluster is stored. The work directory of an existing cluster can be retrieved from its workdir property.

Returns

Instance of the MPICluster object

Return type

esm_analysis.MPICluster
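load() works because the cluster's information is persisted in workdir. The storage format esm_analysis uses is not documented here, so the following is only a minimal sketch of the idea, assuming a hypothetical JSON file named cluster.json that holds the job ID, submit time and job script. The helpers save_cluster_info and load_cluster_info are illustrative names, not part of esm_analysis.

```python
import json
from datetime import datetime
from pathlib import Path

# Hypothetical file name; esm_analysis may store its state differently.
STATE_FILE = "cluster.json"


def save_cluster_info(workdir, job_id, submit_time, script):
    """Persist the information needed to reconnect to a running job."""
    state = {
        "job_id": job_id,
        # datetime is not JSON serialisable, so store it as ISO 8601 text.
        "submit_time": submit_time.isoformat(),
        "script": script,
    }
    path = Path(workdir) / STATE_FILE
    path.write_text(json.dumps(state))
    return path


def load_cluster_info(workdir):
    """Read the stored state back, restoring submit_time as a datetime."""
    state = json.loads((Path(workdir) / STATE_FILE).read_text())
    state["submit_time"] = datetime.fromisoformat(state["submit_time"])
    state["workdir"] = str(workdir)
    return state
```

Keeping the state in the work directory rather than in memory is what allows a new Python session to reattach to a job that is still running.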

classmethod slurm(account, queue, *, slurm_extra=[''], memory='140G', workdir=None, walltime='01:00:00', cpus_per_task=48, name='dask_job', nworkers=1, job_extra=None)

Create an MPI cluster using slurm.

This method sets up a cluster with the help of the Slurm workload manager.

from esm_analysis import MPICluster
cluster = MPICluster.slurm('account', 'express', nworkers=10)

The job is submitted to the workload manager immediately when the instance is created.
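Submitting a batch script and recording its job ID can be sketched with the standard sbatch --parsable option, which prints only the job ID, optionally followed by ";cluster_name". This is an illustration of the general pattern, not the esm_analysis implementation; submit_job and parse_job_id are hypothetical names.

```python
import subprocess
from datetime import datetime


def parse_job_id(sbatch_output):
    """Extract the job ID from `sbatch --parsable` output ("id" or "id;cluster")."""
    return sbatch_output.strip().split(";")[0]


def submit_job(script_path):
    """Submit a batch script to Slurm and return (job_id, submit_time)."""
    result = subprocess.run(
        ["sbatch", "--parsable", str(script_path)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sbatch rejects the script
    )
    return parse_job_id(result.stdout), datetime.now()
```

Using --parsable avoids scraping the human-readable "Submitted batch job N" message.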

Parameters
  • account (str) – Account name

  • queue (str) – partition the job should be submitted to

  • walltime (str, optional (default: '01:00:00')) – length of the job (wall-clock time limit)

  • name (str, optional (default: 'dask_job')) – name of the job

  • workdir (str, optional (default: None)) – name of the work directory; if None is given, a temporary directory is used.

  • cpus_per_task (int, optional (default: 48)) – number of CPUs per node

  • memory (str, optional (default: '140G')) – allocated memory per node

  • nworkers (int, optional (default: 1)) – number of nodes used in the job

  • job_extra (str, optional (default: None)) – additional commands that should be executed in the run script

  • slurm_extra (list, optional (default: [''])) – additional Slurm directives
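To make the mapping from these parameters to Slurm more concrete, here is a minimal sketch of how such a batch script could be assembled. It is not the script esm_analysis generates: the function name render_job_script, the one-task-per-node layout, the scheduler.json file name and the srun dask-mpi launch line are assumptions. The #SBATCH options themselves (--account, --partition, --time, --job-name, --nodes, --ntasks-per-node, --cpus-per-task, --mem) are standard Slurm directives.

```python
def render_job_script(account, queue, *, memory="140G", walltime="01:00:00",
                      cpus_per_task=48, name="dask_job", nworkers=1,
                      slurm_extra=None, job_extra=None,
                      scheduler_file="scheduler.json"):
    """Build a hypothetical Slurm batch script that starts dask-mpi."""
    directives = [
        f"--account={account}",
        f"--partition={queue}",
        f"--time={walltime}",
        f"--job-name={name}",
        f"--nodes={nworkers}",          # assumption: nworkers maps to nodes
        "--ntasks-per-node=1",          # assumption: one MPI task per node
        f"--cpus-per-task={cpus_per_task}",
        f"--mem={memory}",              # Slurm's --mem is memory per node
    ]
    # Extra directives such as ["--exclusive"]; empty strings are skipped.
    directives += [d for d in (slurm_extra or []) if d]
    lines = ["#!/bin/bash"] + [f"#SBATCH {d}" for d in directives]
    if job_extra:
        # Additional shell commands, e.g. loading modules or an environment.
        lines.append(job_extra)
    # Start dask-mpi on all allocated tasks; the scheduler records its
    # address in scheduler_file so that clients can connect to it.
    lines.append(f"srun dask-mpi --scheduler-file {scheduler_file}")
    return "\n".join(lines) + "\n"
```

For example, render_job_script('account', 'express', nworkers=10) yields a script requesting ten nodes on the express partition. The sketch defaults slurm_extra to None rather than a list to avoid a shared mutable default argument.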

Returns

Instance of the MPICluster object

Return type

esm_analysis.MPICluster

job_script

A representation of the job script that was submitted

submit_time

datetime.datetime object representing the time the job script was submitted

workdir

The working directory that was used to submit the job to the cluster

job_id

The ID of the submitted job
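Because submit_time is a datetime.datetime, it can be combined with the walltime passed to slurm() to estimate how long the job may still run. The sketch below is a hypothetical helper, not part of esm_analysis. It only handles the "HH:MM:SS" form (Slurm also accepts formats such as "D-HH:MM:SS"), and because a job starts at or after its submission time, the value it returns is a lower bound on the true remaining walltime.

```python
from datetime import datetime, timedelta


def parse_walltime(walltime):
    """Convert an "HH:MM:SS" walltime string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def remaining_walltime(submit_time, walltime, now=None):
    """Lower bound on the time left before the job hits its walltime limit.

    Measured from submission, so time spent waiting in the queue is counted
    as used; the real remaining time can only be equal or larger.
    """
    now = now or datetime.now()
    remaining = submit_time + parse_walltime(walltime) - now
    return max(remaining, timedelta(0))
```

With a cluster object at hand, a call would look like remaining_walltime(cluster.submit_time, '01:00:00').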