Reading data files
You will normally access data from a model experiment, which is stored in a
directory containing netCDF files. You can load the metadata associated with
such an experiment by calling RunDirectory().
Calling RunDirectory() only loads metadata, that is, which model produced the
data, how many data files are present, and any other metadata that is
important for the experiment.
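To make the distinction between metadata and data concrete, here is a minimal sketch of what a metadata-only scan of a directory could look like. It is not the esm_analysis implementation: `scan_run_dir` and the keys of the returned dictionary are hypothetical names chosen for illustration, and only the `prefix` and `filetype` arguments mirror parameters of RunDirectory().

```python
import tempfile
from pathlib import Path


def scan_run_dir(run_dir, prefix=None, filetype="nc"):
    """Collect basic metadata about a run directory without opening any file.

    Hypothetical helper for illustration only, not part of esm_analysis.
    """
    pattern = f"{prefix or ''}*.{filetype}"
    # Only file names are inspected; the file contents are never read.
    files = sorted(str(path) for path in Path(run_dir).glob(pattern))
    return {"run_dir": str(run_dir), "num_files": len(files), "files": files}


# Demonstration on a throw-away directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("exp_2008.nc", "exp_2009.nc", "notes.txt"):
        (Path(tmp) / name).touch()
    meta = scan_run_dir(tmp, prefix="exp")

print(meta["num_files"])  # number of files matching exp*.nc
```

Because nothing but the directory listing is touched, such a scan stays cheap even when the individual files are large.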
-
class
esm_analysis.
RunDirectory
(run_dir, *, prefix=None, model_type=None, overwrite=False, f90name_list=None, filetype='nc', client=None)[source]¶ Open data in experiment folder.
-
__init__
(run_dir, *, prefix=None, model_type=None, overwrite=False, f90name_list=None, filetype='nc', client=None)[source]¶ Create a RunDirectory object from a given input directory.
run = RunDirectory('/work/mh0066/precip-project/3-hourly/CMORPH')
The RunDirectory object gathers all necessary information on the data that is stored in the run directory. Once loaded, the most important metadata will be stored in the run directory for faster access the second time.
- Parameters
run_dir (str) – Name of the directory where the data that should be read is stored.
prefix (str, optional (default: None)) – filename prefix
model_type (str, optional (default: None)) – model name / observation product that created the data. This will be used to generate a variable lookup table, which is useful for loading datasets from various models and comparing them while accessing the data with a single set of variable names. By default no lookup table will be generated.
overwrite (bool, optional (default: False)) – If True, the metadata will be generated again even if it has already been stored to disk.
f90name_list (str, optional (default: None)) – Filename of an optional f90 namelist with additional information about the data
filetype (str, optional (default: nc)) – Input data file format
client (dask.distributed client, optional (default: None)) – Configuration that is used to create a dask client which receives tasks for multiprocessing. By default (None) a local client will be started.
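The interplay between the metadata stored in the run directory and the overwrite flag can be sketched as follows. This is an illustration under assumptions, not the library's code: the cache file name `.run_info.json`, the JSON format, and the helper `load_or_build_metadata` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

CACHE_NAME = ".run_info.json"  # hypothetical cache file name


def load_or_build_metadata(run_dir, overwrite=False):
    """Return (metadata, from_cache); rebuild the cache if missing or overwrite=True."""
    cache = Path(run_dir) / CACHE_NAME
    if cache.is_file() and not overwrite:
        return json.loads(cache.read_text()), True
    files = sorted(path.name for path in Path(run_dir).glob("*.nc"))
    info = {"num_files": len(files), "files": files}
    cache.write_text(json.dumps(info))
    return info, False


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.nc").touch()
    _, cached_first = load_or_build_metadata(tmp)   # builds the cache
    (Path(tmp) / "b.nc").touch()                    # directory changes afterwards
    stale, cached_second = load_or_build_metadata(tmp)
    fresh, cached_third = load_or_build_metadata(tmp, overwrite=True)
```

In this sketch the second call returns the stored information and therefore misses the new file, which is the situation in which passing overwrite=True is worthwhile.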
-
run_dir
¶ The name of the directory that has been loaded
-
files
¶ All files that have been opened
-
status
¶ Returns the status of the associated dask worker client
-
remap
(grid_description, inp=None, out_dir=None, *, method='weighted', weightfile=None, options='-f nc4', grid_file=None)[source]¶ Regrid to a different input grid.
run.remap('echam_griddes.txt', method='remapbil')
- Parameters
grid_description (str) – Path to file containing the output grid description
inp ((collection of) str, xarray.Dataset, xarray.DataArray) – Filenames that are to be remapped.
out_dir (str (default: None)) – Directory name for the output
method (str (default: weighted)) – Remap method that is applied to the data; can be one of weighted (default), bil, con, laf, nn. If weighted is chosen, this class should have been instantiated either with a given weightfile or via the gen_weights method.
weightfile (str (default: None)) – Path to the file containing the grid weights for the distance-weighted remapping.
grid_file (str (default: None)) – File containing the source grid description
options (str (default: -f nc4)) – additional file options that are passed to cdo
- Returns
Collection of output
- Return type
(str, xarray.DataArray, xarray.Dataset)
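Since the options argument is passed on to cdo, it can help to see how a cdo remapping call is typically assembled. The sketch below only builds the argument list and does not execute anything. `build_remap_command` is a hypothetical helper, and how esm_analysis maps its method names to cdo operators internally is an assumption here; the cdo operators `remapbil` and `remap` themselves are standard.

```python
import shlex


def build_remap_command(grid_description, infile, outfile, *, method="remapbil",
                        weightfile=None, options="-f nc4"):
    """Assemble a cdo remapping call as an argument list (nothing is executed)."""
    if weightfile is not None:
        # cdo's generic 'remap' operator takes the target grid and a weight file.
        operator = f"remap,{grid_description},{weightfile}"
    else:
        # Operators such as remapbil or remapcon only need the target grid.
        operator = f"{method},{grid_description}"
    return ["cdo", *shlex.split(options), operator, infile, outfile]


cmd = build_remap_command("echam_griddes.txt", "in.nc", "out.nc")
print(" ".join(cmd))  # cdo -f nc4 remapbil,echam_griddes.txt in.nc out.nc

weighted = build_remap_command("echam_griddes.txt", "in.nc", "out.nc",
                               weightfile="weights.nc")
```

An argument list of this form could be handed to `subprocess.run` on a system where cdo is installed.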
-
static
apply_function
(mappable, collection, *, args=None, client=None, **kwargs)[source]¶ Apply function to given collection.
result = run.apply_function(lambda d, v: d[v].sum(dim='time'), run.dataset, args=('temp',))
- Parameters
mappable (method) – method that is applied
collection (collection) – collection that is distributed in a thread pool
args – additional arguments passed into the method
client (dask distributed client (default: None)) – worker scheduler client that submits the jobs. If None is given a new client is started
progress (bool (default: True)) – display tqdm progress bar
**kwargs (optional) – additional keyword arguments controlling the progress bar parameter
- Returns
combined output of the thread-pool processes
- Return type
collection
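The calling convention described above, where the mappable receives one item of the collection followed by the extra args, can be illustrated with a small stand-in. Note that this sketch swaps in `concurrent.futures.ThreadPoolExecutor` from the standard library instead of a dask client, and `apply_in_threads` and `max_workers` are hypothetical names, so it shows the pattern rather than the library's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_in_threads(mappable, collection, *, args=None, max_workers=4):
    """Call mappable(item, *args) for every item and return results in input order."""
    args = args or ()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(mappable, item, *args) for item in collection]
        # Collecting in submission order keeps results aligned with the input.
        return [future.result() for future in futures]


series = {"temp": [1.0, 2.0, 3.0], "precip": [0.5, 0.5]}
totals = apply_in_threads(lambda name, data: sum(data[name]),
                          ["temp", "precip"], args=(series,))
print(totals)  # [6.0, 1.0]
```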
-
property
files
¶ Return all files that have been opened.
-
classmethod
gen_weights
(griddes, run_dir, *, prefix=None, model_type='ECHAM', infile=None, overwrite=False, client=None)[source]¶ Create grid weights from grid description and instantiate class.
run = RunDirectory.gen_weights('echam_grid.txt', '/work/mh0066/precip-project/3-hourly/CMORPH/', infile='griddes.nc')
- Parameters
griddes (str) – filename containing the desired output grid information
run_dir (str) – path to the experiment directory
prefix (str) – filename prefix
model_type (str) – Model/Product name of the dataset to be read
infile (str) – Path to input file. By default the method looks for appropriate input files
overwrite (bool, optional (default: False)) – should an existing weight file be overwritten
- Returns
RunDirectory
- Return type
RunDirectory object
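The role of the overwrite flag when generating weights can be sketched like this. The example assumes that the weights are produced with cdo's `gendis` operator (distance-weighted remap weights); which operator gen_weights actually uses is not stated here, and `weight_command` as well as the weight file name are hypothetical.

```python
import tempfile
from pathlib import Path


def weight_command(griddes, infile, weightfile, *, overwrite=False):
    """Return the cdo call that would create weightfile, or None if it can be reused."""
    if Path(weightfile).is_file() and not overwrite:
        return None  # keep the existing weight file
    # Assumption: gendis generates distance-weighted remap weights for the target grid.
    return ["cdo", f"gendis,{griddes}", str(infile), str(weightfile)]


with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "remapweights.nc"
    first = weight_command("echam_grid.txt", "griddes.nc", weights)
    weights.touch()  # pretend cdo has written the file
    reused = weight_command("echam_grid.txt", "griddes.nc", weights)
    forced = weight_command("echam_grid.txt", "griddes.nc", weights, overwrite=True)
```

Here `reused` is None because the weight file already exists, while `forced` contains the same command as `first`.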
-
load_data
(filenames=None, **kwargs)[source]¶ Open a multifile dataset using xarray open_mfdataset.
dset = run.load_data('*2008*.nc')
- Parameters
filenames (collection/str) – collection of filenames, filename or glob pattern for filenames that should be read. Default behavior is reading all dataset files
**kwargs (optional) – Additional keyword arguments passed to xarray’s open_mfdataset
- Returns
Xarray (multi-file) dataset
- Return type
xarray.Dataset
-
property
run_dir
¶ Get the name of the experiment path.
-
property
status
¶ Query the status of the dask client.
-
Loading the Data
Creating an instance of the RunDirectory() object won’t load any data. To get
access to the netCDF data, the load_data() method has to be applied.
-
class
esm_analysis.
RunDirectory
[source]¶ -
load_data
(filenames=None, **kwargs)[source]¶ Open a multifile dataset using xarray open_mfdataset.
dset = run.load_data('*2008*.nc')
- Parameters
filenames (collection/str) – collection of filenames, filename or glob pattern for filenames that should be read. Default behavior is reading all dataset files
**kwargs (optional) – Additional keyword arguments passed to xarray’s open_mfdataset
- Returns
Xarray (multi-file) dataset
- Return type
xarray.Dataset
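The three accepted forms of the filenames argument (nothing, a single name or glob pattern, or a collection) can be resolved to a concrete file list roughly as shown below. `select_files` is a hypothetical helper based on the standard library's `fnmatch`; it illustrates the documented behavior and is not taken from esm_analysis.

```python
from fnmatch import fnmatch


def select_files(all_files, filenames=None):
    """Resolve the filenames argument to a concrete list of files."""
    if filenames is None:
        return list(all_files)  # default: every file of the run
    if isinstance(filenames, str):
        # A single string is treated as a glob pattern; a name without
        # wildcards matches only itself.
        return [name for name in all_files if fnmatch(name, filenames)]
    return list(filenames)  # an explicit collection is used as given


available = ["cmorph_2007.nc", "cmorph_2008.nc", "cmorph_2009.nc"]
print(select_files(available, "*2008*.nc"))  # ['cmorph_2008.nc']
```

A list resolved in this way is the kind of input that xarray's open_mfdataset accepts, together with any additional keyword arguments.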
-
dataset
¶ xarray dataset that contains the model data
-