So much data from such little models…

It’s been my experience that even simple models can generate lots of data. If you’re a regular reader of this blog, I can imagine you’ve had similar experiences as well. My most recent experience with this is the work I’ve done with the Dynamic Integrated Climate-Economic model (DICE). I had inherited a port of the 2007 version of the model, which would print relevant output to the screen. During my initial runs with the model, I would simply redirect the output to ascii files for post-processing. I knew that eventually I would be adding all sorts of complexity to this model, ultimately leading to high-dimensional model output and rendering the use of ascii files as impractical. I knew that I would need a better way to handle all this data. So in updating the model to the 2013 version, I decided to incorporate support for netCDF file generation.

You can find details about the netCDF file format through Unidata (a University Cooperation for Atmospheric Research [UCAR] Community Program) and through some of our previous blog posts (here, here, and here). What’s important to note here is that netCDF is a self-describing file format designed to manage high-dimensional hierarchical data sets.

I had become accustomed to netCDF files in my previous life as a meteorologist. Output from complex numerical weather prediction models would often come in netCDF format. While I had never needed to generate my own netCDF output files, I found it incredibly easy and convenient to process them in R (my preferred post-processing and visualization software). Trying to incorporate netCDF output support in my simple model seemed daunting at first, but after a few examples I found online and a little persistence, I had netCDF support incorporated into the DICE model.

The goal of this post is to guide you through the steps to generate and process a netCDF file. Some of our earlier posts go through a similar process using the Python and Matlab interfaces to the netCDF library. While I use R for post-processing, I generally use C/C++ for the modeling; thus I’ll step through generating a netCDF file in C and processing the generated netCDF file in R on a Linux machine.

Edit: I originally put a link to following code at the bottom of this post. For convenience, here’s a link to the bitbucket repository that contains the code examples below.

Writing a netCDF file in C…

Confirm netCDF installation

First, be sure that netCDF is installed on your computing platform. Most scientific computing clusters will have the netCDF library already installed. If not, contact your system administrator to install the library as a module. If you would like to install it yourself, Unidata provides the source code and great documentation to step you through the process. The example I provide here isn’t all that complex, so any recent version (4.0+) should be able to handle this with no problem.

Setup and allocation

Include the header files

With the netCDF libraries installed, you can now begin to code netCDF support into your model. Again, I’ll be using C for this example. Begin by including the netCDF header file with your other include statements:

#include <stdlib.h> #include <stdio.h> #include <netcdf.h>

Setup an error handler

The netCDF library includes a nice way of handling possible errors from the various netCDF functions. I recommend writing a simple wrapper function that can take the returned values of the netCDF functions and produce the appropriate error message if necessary:

void ncError(int val) { printf("Error: %s

", nc_strerror(val)); exit(2); }

Generate some example data

Normally, your model will have generated important data at this point. For the sake of the example, let’s generate some data to put into a netCDF file:

// Loop control variables int i, j, k; // Define the dimension sizes for // the example data. int dim1_size = 10; int dim2_size = 5; int dim3_size = 20; // Define the number of dimensions int ndims = 3; // Allocate the 3D vectors of example data float x[dim1_size][dim2_size][dim3_size]; float y[dim1_size][dim2_size][dim3_size]; float z[dim1_size][dim2_size][dim3_size]; // Generate some example data for(i = 0; i < dim1_size; i++) { for(j = 0; j < dim2_size; j++) { for(k = 0; k < dim3_size; k++) { x[i][j][k] = (i+j+k) * 0.2; y[i][j][k] = (i+j+k) * 1.7; z[i][j][k] = (i+j+k) * 2.4; } } }

This generates three variables, each with three different size dimensions. Think of this, for example, as variables on a 3-D map with dimensions of [latitude, longitude, height]. In my modeling application, my dimensions were [uncertain state-of-the-world, BORG archive solution, time].

Allocate variables for IDs

Everything needed in creating a netCDF file depends on integer IDs, so the next step is to allocate variables for the netCDF file id, the dimension ids, and the variable ids:

// Allocate space for netCDF dimension ids int dim1id, dim2id, dim3id; // Allocate space for the netcdf file id int ncid; // Allocate space for the data variable ids int xid, yid, zid;

Each one of these IDs will be returned through reference by the netCDF functions. While we’re at it, let’s make a variable to hold the return status of the netCDF function calls:

// Allocate return status variable int retval;

Define the meta-data

Now we will start to build the netCDF file. This is a two-part process. The first part is defining the meta-data for the file and the second part is assigning the data.

Create an empty netCDF file

First, create the file:

// Setup the netcdf file if((retval = nc_create("example.nc", NC_NETCDF4, &ncid))) { ncError(retval); }

Note that we store the return status of the function call in retval and test the return status for an error. If there’s an error, we pass retval to our error handler. The first parameter to the function call is the name of the netCDF file. The second parameter is a flag that determines the type of netCDF file. Here we use the latest-and-greatest type of NETCDF4, which includes the HDF5/zlib compression features. If you don’t need these features, or you need a version compatible with older versions of netCDF libraries, then use the default or 64-bit offset (NC_64BIT_OFFSET) versions. The third parameter is the netCDF integer ID used for assigning variables to this file.

Add the dimensions

Now that we have a clean netCDF file to work with, let’s add the dimensions we’ll be using:

// Define the dimensions in the netcdf file if((retval = nc_def_dim(ncid, "dim1_size", dim1_size, &dim1id))) { ncError(retval); } if((retval = nc_def_dim(ncid, "dim2_size", dim2_size, &dim2id))) { ncError(retval); } if((retval = nc_def_dim(ncid, "dim3_size", dim3_size, &dim3id))) { ncError(retval); } // Gather the dimids into an array for defining variables in the netcdf file int dimids[ndims]; dimids[0] = dim1id; dimids[1] = dim2id; dimids[2] = dim3id;

Just as before, we catch and test the function return status for any errors. The function nc_def_dim() takes four parameters. First is the netCDF file ID returned when we created the file. The second parameter is the name of the dimension. Here we’re using “dimX_size” – you would want to use something descriptive of this dimension (i.e. latitude, time, solution, etc.). The third parameter is the size of this dimension (i.e. number of latitude, number of solutions, etc.). The last is the ID for this dimension, which will be used in the next step of assigning variables. Note that we create an array of the dimension IDs to use in the next step.

Add the variables

The last step in defining the meta-data for the netCDF file is to add the variables:

// Define the netcdf variables if((retval = nc_def_var(ncid, "x", NC_FLOAT, ndims, dimids, &xid))) { ncError(retval); } if((retval = nc_def_var(ncid, "y", NC_FLOAT, ndims, dimids, &yid))) { ncError(retval); } if((retval = nc_def_var(ncid, "z", NC_FLOAT, ndims, dimids, &zid))) { ncError(retval); }

The nc_def_var() function takes 6 parameters. These include (in order) the netCDF file ID, the variable name to be displayed in the file, the type of data the variable contains, the number of dimensions of the variable, the IDs for each of the dimensions, and the variable ID (which is returned through reference). The type of data in our example is NC_FLOAT, which is a 32-bit floating point. The netCDF documentation describes the full set of data types covered. The IDs for each dimension are passed as that combined array of dimension IDs we made earlier.

Optional: Add variable attributes

This part is optional, but is incredibly useful and true to the spirit of making a netCDF file. When sharing a netCDF file, the person receiving the file should have all the information they need about the data within the file itself. This can be done by adding “attributes”. For example, let’s add a “units” attribute to each of the variables:

// OPTIONAL: Give these variables units if((retval = nc_put_att_text(ncid, xid, "units", 2, "cm"))) { ncError(retval); } if((retval = nc_put_att_text(ncid, yid, "units", 4, "degC"))) { ncError(retval); } if((retval = nc_put_att_text(ncid, zid, "units", 1, "s"))) { ncError(retval); }

The function nc_put_att_text() puts a text-based attribute onto a variable. The function takes the netCDF ID, the variable ID, the name of the attribute, the length of the string of characters for the attribute, and the text associated with the attribute. In this case, we’re adding an attribute called “units”. Variable ‘x’ has units of “cm”, which has a length of 2. Variable ‘y’ has units of “degC”, which has a length of 4 (and so on). You can apply text-based attributes as shown here or numeric-based attributes using the appropriate nc_put_att_X() function (see documentation for the full list of numeric attribute functions). You can also apply attributes to dimensions by using the appropriate dimension ID or set a global attribute using the ID “0” (zero).

End the meta-data definition portion

At this point, we’ve successfully created a netCDF file and defined the necessary meta-data. We can now end the meta-data portion:

// End "Metadata" mode if((retval = nc_enddef(ncid))) { ncError(retval); }

…and move on to the part 2 of the netCDF file creation process.

Populate the file with data

Put your data into the netCDF file

Here, all we do is put data into the variables we defined in the file:

// Write the data to the file if((retval = nc_put_var(ncid, xid, &x[0][0][0]))) { ncError(retval); } if((retval = nc_put_var(ncid, yid, &y[0][0][0]))) { ncError(retval); } if((retval = nc_put_var(ncid, zid, &z[0][0][0]))) { ncError(retval); }

The function nc_put_var() takes three parameters: the netCDF file ID, the variable ID, and the memory address of the start of the multi-dimensional data array. At this point, the data will be written to the variable in the netCDF file. There is a way to write to the netCDF file in data chunks, which can help with memory management, and a way to use parallel I/O for writing data in parallel to the file, but I have no experience with that (yet). I refer those interested in these features to the netCDF documentation.

Finalize the netCDF file

That’s it! We’re done writing to the netCDF file. Time to close it completely:

// Close the netcdf file if((retval = nc_close(ncid))) { ncError(retval); }

Compile and run the code

Let’s compile and run the code to generate the example netCDF file:

gcc -o netcdf_example netcdf_write_example.c -lnetcdf

Some common problems people run into here are not including the netCDF library flag at the end of the compilation call, not having the header files in the include-path, and/or not having the netCDF library in the library-path. Check your user environment to make sure the netCDF paths are included in your C_INCLUDE_PATH and LIBRARY_PATH:

env | grep –i netcdf

Once the code compiles, run it to generate the example netCDF file:

./netcdf_example

If everything goes according to plan, there should be a file called “example.nc” in the same directory as your compiled code. Let’s load this up in R for some post-processing.

Reading a netCDF file in R…

Install and load the “ncdf4” package

To start using netCDF files in R, be sure to install the netCDF package “ncdf4”:

install.packages("ncdf4") library(ncdf4)

Note that there’s also an “ncdf” package. The “ncdf” package reads and writes the classic (default) and 64-bit offset versions of netCDF file. I recommend against using this package as the new package “ncdf4” can handle the old file versions as well as the new netCDF4 version. Turns out the “ncdf” package has been removed from the CRAN repository. It’s just as well since the new “ncdf4” package obsoletes the “ncdf” package.





Open the netCDF file

With the library installed and sourced, let’s open the example netCDF file we just created:

nc <- nc_open("example.nc")

This stores an open file handle to the netCDF file.

View summary of netCDF file

Calling or printing the open file handle will produce a quick summary of the contents of the netCDF file:

print(nc)

This summary produces the names of the available variables, the appropriate dimensions, and any global/dimension/variable attributes.

Extract variables from the netCDF file

To extract those variables, use the command:

x <- ncvar_get(nc, "x") y <- ncvar_get(nc, "y") z <- ncvar_get(nc, "z")

At this point, the data you extracted from the netCDF file are loaded into your R environment as 3-dimensional arrays. You can treat these the same as you would any multi-dimensional array of data (i.e. subsetting, plotting, etc.). Note that the dimensions are reported in reverse order from which you created the variables.

dim(x)

Close the netCDF file

When you’re done, close the netCDF file:

nc_close(nc)

And there you have it! Hopefully this step-by-step tutorial has helped you incorporate netCDF support into your project. The code I described here is available through bitbucket.

Happy computing!

~Greg