Reading data files

You will normally access data from a model experiment, which is stored in a directory containing netCDF files. You can load the metadata associated with such an experiment by calling RunDirectory(). Calling RunDirectory() only loads metadata, that is, which model was used, how many data files are present, and any other metadata that is important for the experiment.

class esm_analysis.RunDirectory(run_dir, *, prefix=None, model_type=None, overwrite=False, f90name_list=None, filetype='nc', client=None)[source]

Open data in experiment folder.

__init__(run_dir, *, prefix=None, model_type=None, overwrite=False, f90name_list=None, filetype='nc', client=None)[source]

Create a RunDirectory object from a given input directory.

run = RunDirectory('/work/mh0066/precip-project/3-hourly/CMORPH')

The RunDirectory object gathers all necessary information on the data that is stored in the run directory. Once loaded, the most important metadata will be stored in the run directory for faster access the second time.
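The caching behaviour described above can be pictured with a small, library-independent sketch. It does not reproduce the internals of esm_analysis: the function name collect_metadata, the cache file name .run_info.json and the stored keys are all hypothetical and only illustrate the idea of scanning once and reusing the result.

```python
import json
from pathlib import Path


def collect_metadata(run_dir, *, prefix=None, filetype="nc", overwrite=False):
    """Return basic metadata for run_dir, caching it in the directory itself.

    Hypothetical stand-in for the idea behind RunDirectory, not its real code.
    """
    run_path = Path(run_dir)
    cache = run_path / ".run_info.json"  # assumed cache file name
    if cache.is_file() and not overwrite:
        # Second and later calls: reuse what was stored on disk.
        return json.loads(cache.read_text())
    # First call (or overwrite=True): scan the directory for data files.
    pattern = f"{prefix or ''}*.{filetype}"
    files = sorted(str(p) for p in run_path.glob(pattern))
    info = {"run_dir": str(run_path), "files": files, "n_files": len(files)}
    cache.write_text(json.dumps(info))
    return info
```

In this sketch a file added after the first call is only picked up when overwrite=True is passed, which mirrors the role of the overwrite parameter documented below.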

Parameters
  • run_dir (str) – Name of the directory where the data that should be read is stored.

  • prefix (str, optional (default: None)) – Filename prefix.

  • model_type (str, optional (default: None)) – Name of the model or observation product that created the data. This is used to generate a variable lookup table, which is useful for loading datasets from several models and comparing them while accessing the data with a single set of variable names. By default no lookup table is generated.

  • overwrite (bool, optional (default: False)) – If True, the metadata will be generated again even if it has already been stored to disk.

  • f90name_list (str, optional (default: None)) – Path to an optional f90 namelist with additional information about the data.

  • filetype (str, optional (default: nc)) – Input data file format

  • client (dask.distributed client, optional (default: None)) – Configuration that is used to create a dask client which receives tasks for multiprocessing. By default (None) a local client will be started.
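To illustrate what a variable lookup table driven by model_type could look like, here is a minimal sketch. The table contents, the model names MODEL_A and MODEL_B, and the helper translate are all hypothetical; they are not taken from esm_analysis and only show how one set of common names might be mapped to model-specific ones.

```python
# Hypothetical lookup tables: neither the model names nor the variable
# names below are taken from esm_analysis.
LOOKUP = {
    "MODEL_A": {"precip": "pr", "temp": "tas"},
    "MODEL_B": {"precip": "rain_rate", "temp": "t_2m"},
}


def translate(name, model_type=None):
    """Map a common variable name to the name used by model_type."""
    if model_type is None:
        # No model type given: no lookup table, names pass through unchanged.
        return name
    # Names missing from the table are also passed through unchanged.
    return LOOKUP[model_type].get(name, name)
```

With such a table, the same common name (here "precip") can be used regardless of which dataset is being read.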

run_dir

The name of the directory that has been loaded

files

All files that have been opened

close_client()[source]

Close the opened dask client.

restart_client()[source]

Restart the opened dask client.

status

Returns the status of the associated dask worker client

remap(grid_description, inp=None, out_dir=None, *, method='weighted', weightfile=None, options='-f nc4', grid_file=None)[source]

Regrid the input data to a different grid.

run.remap('echam_griddes.txt', method='remapbil')

Parameters
  • grid_description (str) – Path to file containing the output grid description

  • inp ((collection of) str, xarray.Dataset, xarray.DataArray) – Filenames or datasets that are to be remapped.

  • out_dir (str (default: None)) – Directory name for the output

  • weight_file (str (default: None)) – Path to file containing grid weights

  • method (str (default: weighted)) – Remap method that is applied to the data. Can be one of weighted (default), bil, con, laf or nn. If weighted is chosen, this class should have been instantiated either with a given weightfile or via the gen_weights method.

  • weightfile (str (default: None)) – File containing the weights for the distance weighted remapping.

  • grid_file (str (default: None)) – File containing the source grid description

  • options (str (default: -f nc4)) – additional file options that are passed to cdo

Returns

Collection of output

Return type

(str, xarray.DataArray, xarray.Dataset)
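Since the options are passed on to cdo, the following sketch shows how a cdo command line for one input file could be assembled. It is an illustration only: the helper build_remap_command and the mapping from the method names above to the cdo operators remapbil, remapcon, remaplaf and remapnn are assumptions, not the library's actual code. The command is built but not executed.

```python
import shlex

# Assumed mapping from the method names listed above to cdo operators.
CDO_OPERATORS = {
    "bil": "remapbil",
    "con": "remapcon",
    "laf": "remaplaf",
    "nn": "remapnn",
}


def build_remap_command(grid_description, infile, outfile, *,
                        method="weighted", weightfile=None, options="-f nc4"):
    """Return the cdo command line (as a list) for one input file."""
    if method == "weighted":
        if weightfile is None:
            raise ValueError("method='weighted' needs a weightfile")
        # cdo's generic remap operator takes the target grid and the weights.
        operator = f"remap,{grid_description},{weightfile}"
    else:
        operator = f"{CDO_OPERATORS[method]},{grid_description}"
    return ["cdo", *shlex.split(options), operator, infile, outfile]
```

The returned list could be handed to subprocess.run on a system where cdo is installed.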

static apply_function(mappable, collection, *, args=None, client=None, **kwargs)[source]

Apply function to given collection.

result = run.apply_function(lambda d, v: d[v].sum(dim='time'),
                            run.dataset, args=('temp',))

Parameters
  • mappable (method) – method that is applied

  • collection (collection) – collection that is distributed in a thread pool

  • args – additional arguments passed into the method

  • client (dask distributed client (default: None)) – worker scheduler client that submits the jobs. If None is given a new client is started

  • progress (bool (default: True)) – display tqdm progress bar

  • **kwargs (optional) – additional keyword arguments controlling the progress bar parameter

Returns

combined output of the thread-pool processes

Return type

collection
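The pattern behind apply_function, calling one function on every item of a collection and gathering the results, can be sketched with the standard library alone. The library itself submits the work through a dask distributed client; the thread pool used here is only a stand-in, and the name apply_to_collection and the max_workers parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_to_collection(mappable, collection, *, args=None, max_workers=4):
    """Call mappable(item, *args) for every item and return results in order."""
    args = tuple(args or ())
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per item; extra positional arguments are appended.
        futures = [pool.submit(mappable, item, *args) for item in collection]
        # Collect in submission order so the output lines up with the input.
        return [future.result() for future in futures]
```

A call such as apply_to_collection(lambda x, n: x ** n, [1, 2, 3], args=(2,)) squares every item of the list.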

property files

Return all files that have been opened.

classmethod gen_weights(griddes, run_dir, *, prefix=None, model_type='ECHAM', infile=None, overwrite=False, client=None)[source]

Create grid weights from a grid description and instantiate the class.

run = RunDirectory.gen_weights('echam_grid.txt',
                '/work/mh0066/precip-project/3-hourly/CMORPH/',
                infile='griddes.nc')

Parameters
  • griddes (str) – Filename containing the desired output grid information

  • run_dir (str) – path to the experiment directory

  • prefix (str) – filename prefix

  • model_type (str) – Model/Product name of the dataset to be read

  • infile (str) – Path to input file. By default the method looks for appropriate input files

  • overwrite (bool, optional (default: False)) – Whether an existing weight file should be overwritten

Returns

RunDirectory

Return type

RunDirectory object
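As a rough picture of the weight-generation step, the sketch below assembles a cdo call that uses the gendis operator, which generates distance-weighted remapping weights. Whether gen_weights uses exactly this operator is an assumption, as are the helper name build_weight_command and the weight file name remapweights.nc. The command is built but not executed.

```python
from pathlib import Path


def build_weight_command(griddes, infile, run_dir, *,
                         weight_name="remapweights.nc"):
    """Return (command, weightfile) for generating remap weights with cdo."""
    # Assumed convention: keep the weight file inside the run directory.
    weightfile = str(Path(run_dir) / weight_name)
    command = ["cdo", f"gendis,{griddes}", infile, weightfile]
    return command, weightfile
```

The weight file path returned here is the kind of value that would later be supplied as weightfile to a weighted remap.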

load_data(filenames=None, **kwargs)[source]

Open a multi-file dataset using xarray's open_mfdataset.

dset = run.load_data('*2008*.nc')

Parameters
  • filenames (collection/str) – collection of filenames, filename or glob pattern for filenames that should be read. Default behavior is reading all dataset files

  • **kwargs (optional) – Additional keyword arguments passed to xarray’s open_mfdataset

Returns

Xarray (multi-file) dataset

Return type

xarray.Dataset

property run_dir

Get the name of the experiment path.

property status

Query the status of the dask client.

Loading the Data

Creating an instance of the RunDirectory() object won’t load any data. To get access to the netCDF data, the load_data() method has to be applied.

class esm_analysis.RunDirectory[source]
load_data(filenames=None, **kwargs)[source]

Open a multi-file dataset using xarray's open_mfdataset.

dset = run.load_data('*2008*.nc')

Parameters
  • filenames (collection/str) – collection of filenames, filename or glob pattern for filenames that should be read. Default behavior is reading all dataset files

  • **kwargs (optional) – Additional keyword arguments passed to xarray’s open_mfdataset

Returns

Xarray (multi-file) dataset

Return type

xarray.Dataset
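The filenames argument accepts a single name, a glob pattern or a collection of either. The following sketch shows one way such a selection could be resolved against the files of a run before they are opened. It is not the library's implementation; the helper select_files is hypothetical and uses only the standard library.

```python
from fnmatch import fnmatch
from pathlib import Path


def select_files(files, filenames=None):
    """Return the files selected by a name, a glob pattern or a collection."""
    if filenames is None:
        return sorted(files)  # default: every file of the run
    if isinstance(filenames, str):
        filenames = [filenames]
    # Match each pattern against the base name, ignoring the directory part.
    return sorted(
        f for f in files
        if any(fnmatch(Path(f).name, pattern) for pattern in filenames)
    )
```

The resulting list is the kind of input that xarray.open_mfdataset accepts, together with any additional keyword arguments.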

dataset

xarray dataset that contains the model data
