Creating a cluster for distributed processing
esm_analysis supports creating HPC-style clusters for distributed data processing using dask-mpi. At the moment, only clusters created with the Slurm workload manager are supported.
- class esm_analysis.MPICluster(script, workdir, submit_time=None, batch_system=None, job_id=None)
Create a cluster of distributed workers.
- classmethod load(workdir)
Load the information of a running cluster.
This method can be used to connect to an already running cluster.
    from esm_analysis import MPICluster
    cluster = MPICluster.load('/tmp/old_cluster')
- Parameters
workdir (str) – Name of the directory where the information of the previously created cluster is stored. The work directory of a cluster can be retrieved from its workdir property.
- Returns
Instance of the MPICluster object
- Return type
MPICluster
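A long-running analysis may want to fail early when the stored work directory no longer exists. The following is a minimal sketch, assuming only that `MPICluster.load` accepts the directory path as documented above. The helper name `reconnect` is hypothetical and not part of esm_analysis, and the import is deferred so that the directory check runs first.

```python
from pathlib import Path


def reconnect(workdir):
    """Return a cluster object for ``workdir`` or raise if the directory is missing.

    Hypothetical helper, not part of esm_analysis.
    """
    path = Path(workdir).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"no cluster work directory at {path}")
    # Imported here so that the check above does not require the package.
    from esm_analysis import MPICluster
    return MPICluster.load(str(path))
```

Calling `reconnect('/tmp/old_cluster')` would then behave like the `load` call above, with an explicit error if the directory has been removed.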
- classmethod slurm(account, queue, *, slurm_extra=[''], memory='140G', workdir=None, walltime='01:00:00', cpus_per_task=48, name='dask_job', nworkers=1, job_extra=None)
Create an MPI cluster using Slurm.
This method sets up a cluster with help of the workload manager slurm.
    from esm_analysis import MPICluster
    cluster = MPICluster.slurm('account', 'express', nworkers=10)
The jobs will immediately be submitted to the workload manager upon creation of the instance.
- Parameters
account (str) – Account name
queue (str) – partition the job should be submitted to
walltime (str, optional (default: '01:00:00')) – length of the job
name (str, optional (default: dask_job)) – name of the job
workdir (str, optional (default: None)) – name of the work directory; if None is given, a temporary directory is used.
cpus_per_task (int, optional (default: 48)) – number of CPUs per node
memory (str, optional (default: 140G)) – allocated memory per node
nworkers (int, optional (default: 1)) – number of nodes used in the job
job_extra (str, optional (default: None)) – additional commands that should be executed in the run script
slurm_extra (list, optional (default: [''])) – additional Slurm directives
- Returns
Instance of the MPICluster object
- Return type
MPICluster
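To make the parameters above more concrete, the sketch below maps them to standard `sbatch` options. It is an illustration only: the function `slurm_directives` is hypothetical, the mapping is an assumption rather than the script esm_analysis generates, and each `slurm_extra` entry is assumed to be a complete option such as `--exclusive`. The script that was actually submitted is available through the `job_script` property.

```python
def slurm_directives(account, queue, *, walltime="01:00:00", name="dask_job",
                     nworkers=1, cpus_per_task=48, memory="140G",
                     workdir=None, slurm_extra=()):
    """Return '#SBATCH' lines for the given settings (illustrative only)."""
    options = [
        f"--account={account}",
        f"--partition={queue}",
        f"--time={walltime}",
        f"--job-name={name}",
        f"--nodes={nworkers}",
        f"--cpus-per-task={cpus_per_task}",
        f"--mem={memory}",
    ]
    if workdir is not None:
        options.append(f"--chdir={workdir}")
    # Skip empty entries such as the '' in the documented default [''].
    options.extend(extra for extra in slurm_extra if extra)
    return [f"#SBATCH {option}" for option in options]


for line in slurm_directives("account", "express", nworkers=10):
    print(line)
```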
- job_script
A representation of the job script that was submitted
- submit_time
datetime.datetime object representing the time the job script was submitted
- workdir
The working directory that was used to submit the job to the cluster
- job_id
The ID of the submitted job script
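The `submit_time` property can be combined with the requested walltime to estimate how long a job has left. The sketch below is a rough illustration with assumptions: `parse_walltime` and `time_left` are hypothetical helpers, the walltime is assumed to use the `HH:MM:SS` form of the documented default, and `submit_time` is assumed to be a naive local `datetime`. Because Slurm counts the walltime from the job start rather than from submission, the result is a conservative estimate.

```python
from datetime import datetime, timedelta
from types import SimpleNamespace


def parse_walltime(walltime):
    """Convert an 'HH:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def time_left(cluster, walltime, now=None):
    """Return the time remaining until submit_time + walltime (never negative)."""
    now = now or datetime.now()
    deadline = cluster.submit_time + parse_walltime(walltime)
    return max(deadline - now, timedelta(0))


# Stand-in object; a real MPICluster instance would be passed instead.
demo = SimpleNamespace(submit_time=datetime(2020, 1, 1, 12, 0, 0))
print(time_left(demo, "01:00:00", now=datetime(2020, 1, 1, 12, 45, 0)))  # prints 0:15:00
```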
-
classmethod