The main product delivered by ESMF is the ESMF library that allows application developers to write programs based on the ESMF API. In addition to the programming library, ESMF distributions come with a small set of applications that are of general interest to the community. These applications utilize the ESMF library to implement features such as printing general information about the ESMF installation, or generating regrid weight files. The provided ESMF applications are intended to be used as standard command line tools.
The bundled ESMF applications are built and installed during the usual ESMF installation process, which is described in detail in the ESMF User's Guide section "Building and Installing the ESMF". After the installation the applications will be located in the ESMF_APPSDIR directory, which can be found as a Makefile variable in the esmf.mk file. The esmf.mk file can be found in the ESMF_INSTALL_LIBDIR directory after a successful installation. The ESMF User's Guide discusses the esmf.mk mechanism to access the bundled applications in more detail in section "Using Bundled ESMF Applications".
The following sections provide in-depth documentation of the bundled ESMF
applications. In addition, each application supports the standard
--help
command line argument, providing a brief description of how
to invoke the program.
The ESMF_Info application prints basic information about the ESMF installation to stdout.
The application usage is as follows:
ESMF_Info [--help] where --help prints a brief usage message
This section describes the offline regrid weight generation application provided by ESMF (for a description of ESMF regridding in general see Section 24.2). Regridding, also called remapping or interpolation, is the process of changing the grid that underlies data values while preserving qualities of the original data. Different kinds of transformations are appropriate for different problems. Regridding may be needed when communicating data between Earth system model components such as land and atmosphere, or between different data sets to support operations such as visualization.
Regridding can be broken into two stages. The first stage is generation of an interpolation weight matrix that describes how points in the source grid contribute to points in the destination grid. The second stage is the multiplication of values on the source grid by the interpolation weight matrix to produce values on the destination grid. This occurs through a parallel sparse matrix multiply.
There are two options for accessing ESMF regridding functionality: integrated and offline. Integrated regridding is a process whereby interpolation weights are generated via subroutine calls during the execution of the user's code. The integrated regridding can also perform the parallel sparse matrix multiply. In other words, ESMF integrated regridding allows a user to perform the whole process of interpolation within their code. For a further description of ESMF integrated regridding please see Section 26.3.24. In contrast to integrated regridding, offline regridding is a process whereby interpolation weights are generated by a separate ESMF application, not within the user code. The ESMF offline regridding application also only generates the interpolation matrix, the user is responsible for reading in this matrix and doing the actual interpolation (multiplication by the sparse matrix) in their code. The rest of this section further describes ESMF offline regridding.
For a discussion of installing and accessing ESMF applications such as this one please see the beginning of this part of the reference manual (Section II) or for the quickest approach to just building and accessing the applications please refer to the "Building and using bundled ESMF applications" Section in the ESMF User's Guide.
As described above, this tool reads in two grid files and outputs weights for interpolation between the two grids. The input and output files are all in NetCDF format. The grid files can be defined in four different formats: the SCRIP format 12.4.1 as is used as an input to SCRIP [4], the GRIDSPEC Tile grid file 12.4.3 following the CF metadata conventions, the ESMF unstructured grid format 12.4.2 or the proposed CF unstructured grid data model (UGRID) 12.4.4 in the current ESMF release. GRIDSPEC is a proposed CF extension for the annotation of complex Earth system grids. In the current ESMF release, we only support a single tile grid file for rectangular lat/lon grid. For UGRID, we support the 2D flexible mesh topology with mixed triangles and quadrilaterals and fully 3D unstructured mesh topology with hexahedrons and tetrahedrons.
The weight file is the same format 12.5 as is output by SCRIP. The interpolation weights can be generated with the bilinear, patch, or first order conservative methods described below. Masking is supported for 2D logically rectangular (i.e. with grid_rank=2) grids in the SCRIP format. This application can do regrid weight generation from a global or regional source grid to a global or regional destination grid. It assumes that the source and destination grids are on a sphere and that the coordinates given in the files are latitude and longitude values. The coordinates can either be in degrees or radians (this is indicated by the "units" attribute attached to the value). As is true with many global models, this application currently assumes the latitude and longitude refer to positions on a perfect sphere, as opposed to a more complex and accurate representation of the earth's true shape such as would be used in a GIS system. (ESMF's current user base doesn't require this level of detail in representing the earth's shape, but it could be added in the future if necessary.) This file based regrid weight generation application is parallel.
This application requires the NetCDF library to read the grid files and to write out the weight files in NetCDF format. To compile ESMF with the NetCDF library, please refer to the "Third Party Libraries" Section in the ESMF User's Guide for more information.
Internally this application uses the ESMF public API to generate the interpolation weights. If a source or destination grid is logically rectangular, then ESMF_GridCreate() 31.3.8 is used to create an ESMF_Grid object. The cell center coordinates of the input grid are put into the center stagger location (ESMF_STAGGERLOC_CENTER). In addition, the corner coordinates are also put into the corner stagger location (ESMF_STAGGERLOC_CORNER) for conservative regridding. The method ESMF_MeshCreate() 33.3.6 is used to create an ESMF_Mesh object, if the source or destination grid is a cubed sphere grid or an unstructured grid. When making this call, the flag convert3D is set to TRUE to convert the 2D coordinates into 3D Cartesian coordinates. Currently, ESMF only supports triangle or quadrilateral element types for a 2D Mesh. Therefore, when the cells in an unstructured grid contain more than four edges, they are broken into multiple triangle elements before ESMF_MeshCreate() is called to create the ESMF_Mesh object. After the calculation of the weight matrix based on the broken up cells, the matrix entries for the triangles are merged together, so that the output matrix is in terms of the original cells. Internally ESMF_FieldRegridStore() is used to generate the weight table and indices table representing the interpolation matrix.
The regridding occurs in 3D to avoid problems with periodicity and with the pole singularity. This application supports four options for handling the pole region (i.e. the empty area above the top row of the source grid or below the bottom row of the source grid). Note that all of these pole options currently only work for logically rectangular grids (i.e. SCRIP format grids with grid_rank=2 or GRIDSPEC format grids). The first option is to leave the pole region empty ("-p none"), in this case if a destination point lies above or below the top row of the source grid, it will fail to map, yielding an error (unless "-i" is specified). With the next two options, the pole region is handled by constructing an artificial pole in the center of the top and bottom row of grid points and then filling in the region from this pole to the edges of the source grid with triangles. The pole is located at the average of the position of the points surrounding it, but moved outward to be at the same radius as the rest of the points in the grid. The difference between these two artificial pole options is what value is used at the pole. The default pole option ("-p all") sets the value at the pole to be the average of the values of all of the grid points surrounding the pole. For the other option ("-p N"), the user chooses a number N from 1 to the number of source grid points around the pole. For each destination point, the value at the pole is then the average of the N source points surrounding that destination point. For the last pole option ("-p teeth") no artificial pole is constructed, instead the pole region is covered by connecting points across the top and bottom row of the source Grid into triangles. As this makes the top and bottom of the source sphere flat, for a big enough difference between the size of the source and destination pole regions, this can still result in unmapped destination points. Only pole option "none" is currently supported with the conservative interpolation method ("-m conserve") and with the nearest neighbor interpolation methods ("-m nearestdtos" and "-m neareststod").
Masking is supported for both the logically rectangular grids and the unstructured grids.
If the grid file is in the SCRIP format, the variable "grid_imask" is used as the mask.
If the value is set to 0 for a grid point, then that point is considered masked out and
won't be used in the weights generated by the application. If the grid file is in the ESMF format, the variable "element Mask" is used as the mask. For a grid defined in the GRIDSPEC
Tile grid or in the UGRID convention, there is no mask variable defined.
However, a GRIDSPEC or a UGRID file may contain both the grid definition and the data.
The grid mask is usually constructed using the missing values defined in the data variable.
The regridding application provides the argument "--
src_missingvalue" or
"--
dst_missingvalue" for users to specify the variable name from where the mask can be
constructed.
If a destination point can't be mapped because it falls outside the unmasked source grid, then the default behavior of the application is to stop with an error. By specifying "-i" or the equivalent "--
ignore_unmapped" the user can cause the application to ignore unmapped destination points. In this case, the output matrix won't contain entries for the unmapped destination points. Note, that the unmapped point detection doesn't
currently work for nearest destination to source method ("-m nearestdtos"), so when using that method it is as if ``-i'' is always on.
Another variation in the regridding supported with spherical grids is line type. This is controlled by the "--
line_type" or ``-l'' flag. This switch allows the user to select the path of the line which connects
two points on a sphere surface. This in turn controls the path along which distances are calculated and the shape of
the edges that make up a cell. Both of these quantities can influence how interpolation weights are calculated, for example in
bilinear interpolation the distances are used to calculate the weights and the cell edges are used to determine to which source
cell a destination point should be mapped.
ESMF currently supports two line types: ``cartesian'' and ``greatcircle''. The ``cartesain'' option specifies that the line between two points follows a straight path through the 3D Cartesian space in which the sphere is embedded. Distances are measured along this 3D Cartesian line. Under this option cells are approximated by planes in 3D space, and their boundaries are 3D Cartesian lines between their corner points. The ``greatcircle'' option specifies that the line between two points follows a great circle path along the sphere surface. (A great circle is the shortest path between two points on a sphere.) Distances are measured along the great circle path. Under this option cells are on the sphere surface, and their boundaries are great circle paths between their corner points.
This regridding application can be used to generate bilinear, patch, nearest neighbor, or first-order conservative interpolation weights. The following is a description of these interpolation methods as relevant to the offline weight generation application, for a more in-depth description see Section 24.2.
The default interpolation method for the weight generation application is bilinear. The algorithm used by this application to generate the bilinear weights is the standard one found in many textbooks. Each destination point is mapped to a location in the source Mesh, the position of the destination point relative to the source points surrounding it is used to calculate the interpolation weights. A restriction on bilinear interpolation is that ESMF doesn't support self-intersecting cells (e.g. a cell twisted into a bow tie).
This application can also be used to generate patch interpolation weights. Patch interpolation is the ESMF version of a technique called "patch recovery" commonly used in finite element modeling [19] [15]. It typically results in better approximations to values and derivatives when compared to bilinear interpolation. Patch interpolation works by constructing multiple polynomial patches to represent the data in a source element. For 2D grids, these polynomials are currently 2nd degree 2D polynomials. The interpolated value at the destination point is the weighted average of the values of the patches at that point.
The patch interpolation process works as follows. For each source element containing a destination point we construct a patch for each corner node that makes up the element (e.g. 4 patches for quadrilateral elements, 3 for triangular elements). To construct a polynomial patch for a corner node we gather all the elements around that node. (Note that this means that the patch interpolation weights depends on the source element's nodes, and the nodes of all elements neighboring the source element.) We then use a least squares fitting algorithm to choose the set of coefficients for the polynomial that produces the best fit for the data in the elements. This polynomial will give a value at the destination point that fits the source data in the elements surrounding the corner node. We then repeat this process for each corner node of the source element generating a new polynomial for each set of elements. To calculate the value at the destination point we do a weighted average of the values of each of the corner polynomials evaluated at that point. The weight for a corner's polynomial is the bilinear weight of the destination point with regard to that corner. The patch method has a larger stencil than the bilinear, for this reason the patch weight matrix can be correspondingly larger than the bilinear matrix (e.g. for a quadrilateral grid the patch matrix is around 4x the size of the bilinear matrix). This can be an issue when performing a regrid weight generation operation close to the memory limit on a machine. A restriction on patch interpolation is that ESMF doesn't support self-intersecting cells (e.g. a cell twisted into a bow tie).
The nearest neighbor interpolation options work by associating a point in one set with the closest point in another set. If two points are equally close then the point with the smallest index is arbitrarily used (i.e. the point with that would have the smallest index in the weight matrix). There are two versions of this type of interpolation available in the regrid weight generation application. One of these is the nearest source to destination method ("-m neareststod"). In this method each destination point is mapped to the closest source point. The other of these is the nearest destination to source method ("-m nearestdtos"). In this method each source point is mapped to the closest destination point. Note, that with this method the unmapped destination point detection doesn't work, so no error will be returned even if there destination points which don't map to any source point.
First-order conservative interpolation [22] is also available as a regridding method. This method will typically have
a larger local interpolation error than the previous two methods, but will do a much better job of preserving the value of the
integral of data between the source and destination grid. In this method the value across each source cell
is treated as a constant. The weights for a particular destination cell are the area of intersection of each
source cell with the destination cell divided by the area of the destination cell.
Areas in this case are calculated by connecting the corner coordinates of each grid cell (obtained from the grid file) with great circles. If the user doesn't specify
the user area's option ("--
user_areas"), then the conservation will hold for the great circle areas calculated by
ESMF (and these are output to the weight file). This means the following equation will hold: sum-over-all-source-cells(Vsi*Asi) = sum-over-all-destination-cells(Vdj*A'dj), where
V is the variable being regridded and A' is the area of a cell as calculated by ESMF. The subscripts s and d refer to source and destination values, and the i and j are the source
and destination grid cell indices (flattening the arrays to 1 dimension). If the user does specify the user area's option, then the conservation will be adjusted to work for the areas
provided by the user in the grid files (and these areas are output to the weight file). This means the following equation will hold: sum-over-all-source-cells(Vsi*Asi) = sum-over-all-destination-cells(Vdj*Adj), where A is the area of a cell as provided by the user.
Note that since the conservative assumes great circle edges to cells, the edges of a cell won't necessarily be the same as a straight line in latitude longitude. For small edges, this difference will be small, but for long edges it could be significant. This means if the user expects cell edges as straight lines in latitude longitude space, they should avoid using one large cell with long edges to compute an average over a region (e.g. over an ocean basin). The user should also avoid using cells which contain one edge that runs half way or more around the earth, because the regrid weight calculation assumes the edge follows the shorter great circle path. Also, there isn't a unique great circle edge defined between points on the exact opposite side of the earth from one another (antipodal points). However, the user can work around both of these problem by breaking the long edge into two smaller edges by inserting an extra node, or by breaking the large target grid cells into two or more smaller grid cells. This allows the application to resolve the ambiguity in edge direction. Another restriction is that ESMF doesn't support self-intersecting cells (e.g. a cell twisted into a bow tie).
It is important to note that by default (i.e. using destination area normalization) conservative regridding doesn't normalize the interpolation weights by the destination fraction. This means that for a destination grid which only partially overlaps the source grid the destination field which is output from the regrid operation should be divided by the corresponding destination fraction to yield the true interpolated values for cells which are only partially covered by the source grid.
The fraction also needs to be included when computing the total source and destination integrals. To include the fraction in the conservative weights, the user can specify the fraction area normalization type. This can be done by specifying "--
norm_type fracarea'' on the command line.
For weights generated using destination area normalization (either by not specifying any normalization type or by specifying "--
norm_type dstarea"),
the following pseudo-code shows how to adjust a destination field (dst_field) by the destination fraction (dst_frac) called frac_b in the weight file:
for each destination element i if (dst_frac(i) not equal to 0.0) then dst_field(i)=dst_field(i)/dst_frac(i) end if end for
For weights generated using destination area normalization (either by not specifying any normalization type or by specifying "--
norm_type dstarea"),
the following pseudo-code shows how to compute the total destination integral (dst_total) given the destination field values (dst_field) resulting
from the sparse matrix multiply of the weights in the weight file by the source field, the destination area (dst_area) called area_b in the
weight file, and the destination fraction (dst_frac) called frac_b in the weight file. As in the previous paragraph, it also
shows how to adjust the destination field (dst_field) resulting from the sparse matrix multiply by the fraction
(dst_frac) called frac_b in the weight file:
dst_total=0.0 for each destination element i if (dst_frac(i) not equal to 0.0) then dst_total=dst_total+dst_field(i)*dst_area(i) dst_field(i)=dst_field(i)/dst_frac(i) ! If mass computed here after dst_field adjust, would need to be: ! dst_total=dst_total+dst_field(i)*dst_area(i)*dst_frac(i) end if end for
For weights generated using fraction area normalization (set by specifying "--
norm_type fracarea"), no adjustment of the destination field (dst_field) by the destination fraction is necessary. The following pseudo-code shows how to compute the total destination integral (dst_total) given the destination field values (dst_field) resulting
from the sparse matrix multiply of the weights in the weight file by the source field, the destination area (dst_area) called area_b in the
weight file, and the destination fraction (dst_frac) called frac_b in the weight file:
dst_total=0.0 for each destination element i dst_total=dst_total+dst_field(i)*dst_area(i)*dst_frac(i) end for
For either normalization type, the following pseudo-code shows how to compute the total source integral (src_total) given the source field values (src_field), the source area (src_area) called area_a in the weight file, and the source fraction (src_frac) called frac_a in the weight file:
src_total=0.0 for each source element i src_total=src_total+src_field(i)*src_area(i)*src_frac(i) end for
The interpolation weights generated by this application are output to a NetCDF file (specified by the "-w" or "--
weight"
keywords). The format of this file is the same as that generated by SCRIP. See Section 12.5 for a description of the format.
Note that the sequence of the weights in the file can
vary with the number of processors used to run the application. This means that two weight files generated by using different
numbers of processors can contain exactly the same interpolation matrix, but can appear different in a direct line by line
comparison (such as would be done by ncdiff).
The command line arguments are all keyword based. Both the long keyword prefixed with '--'
or the
one character short keyword prefixed with '-' are supported. The format to run the application is
as follows:
ESMF_RegridWeightGen --source|-s src_grid_filename --destination|-d dst_grid_filename --weight|-w out_weight_file [--method|-m bilinear|patch|nearestdtos|neareststod|conserve] [--pole|-p none|all|teeth|1|2|..] [--line_type|-l cartesian|greatcircle] [--norm_type dstarea|fracarea] [--ignore_unmapped|-i] [--ignore_degenerate] [--src_type SCRIP|GRIDSPEC|ESMF|UGRID] [--dst_type SCRIP|GRIDSPEC|ESMF|UGRID] [-t SCRIP|GRIDSPEC|ESMF|UGRID] [-r] [--src_regional] [--dst_regional] [--64bit_offset] [--netcdf4] [--src_meshname dummy_var_name] [--dst_meshname dummy_var_name] [--src_missingvalue var_name] [--dst_missingvalue var_name] [--src_coordinates lon_name,lat_name] [--dst_coordinates lon_name,var_name] [--src_loc center|corner] [--dst_loc center|corner] [--user_areas] [--check] [--no_log] [--help] [--version] [-V] where: --source or -s - a required argument specifying the source grid file name --destination or -d - a required argument specifying the destination grid file name --weight or -w - a required argument specifying the output regridding weight file name --method or -m - an optional argument specifying which interpolation method is used. The value can be one of the following: bilinear - for bilinear interpolation, also the default method if not specified. patch - for patch recovery interpolation conserve - for first-order conservative interpolation --pole or -p - an optional argument indicating what to do with the pole. The value can be one of the following: none - No pole, the source grid ends at the top (and bottom) row of nodes specified in <source grid>. all - Construct an artificial pole placed in the center of the top (or bottom) row of nodes, but projected onto the sphere formed by the rest of the grid. The value at this pole is the average of all the pole values. This is the default option. teeth - No new pole point is constructed, instead the holes at the poles are filled by constructing triangles across the top and bottom row of the source Grid. This can be useful because no averaging occurs, however, because the top and bottom of the sphere are now flat, for a big enough mismatch between the size of the destination and source pole regions, some destination points may still not be able to be mapped to the source Grid. <N> - Construct an artificial pole placed in the center of the top (or bottom) row of nodes, but projected onto the sphere formed by the rest of the grid. The value at this pole is the average of the N source nodes next to the pole and surrounding the destination point (i.e. the value may differ for each destination point. Here N ranges from 1 to the number of nodes around the pole. --line_type or -l - an optional argument indicating the type of path lines (e.g. cell edges) follow on a spherical surface. The default value depends on the regrid method. For non-conservative methods the default is cartesian. For conservative methods the default is greatcircle. --norm_type - an optional argument indicating the type of normal- ization to do when generating conservative weights. The default value is dstarea. --ignore_unmapped or -i - ignore unmapped destination points. If not specified the default is to stop with an error if an unmapped point is found. --ignore_degenerate - ignore degenerate cells in the input grids. If not specified the default is to stop with an error if an degenerate cell is found. --src_type - an optional argument specifying the source grid file type. The value could be one of SCRIP, GRIDSPEC, ESMF or UGRID. The SCRIP file can be either structured or unstructured grid. The GRIDSPEC is the only for the structured grid defined in the CF convention. The ESMF and UGRID file types are only available for the unstructured grid. The default option is SCRIP. --dst_type - an optional argument specifying the destination grid file type. The value could be one of SCRIP, GRIDSPEC, ESMF or UGRID. The SCRIP file can be either structured or unstructured grid. The GRIDSPEC is the only for the structured grid defined in the CF convention. UGRID. The ESMF and UGRID file types are only available for the unstructured grid. The default option is SCRIP. -t - an optional argument specifying the file types for both the source and the destination grid files. The default option is SCRIP. If both -t and --src_type or --dst_type are given at the same time and they disagree with each other, an error message will be generated. -r - an optional argument specifying that the source and destination grids are regional grids. If the argument is not given, the grids are assumed to be global. --src_regional - an optional argument specifying that the source is a regional grid and the destination is a global grid. --dst_regional - an optional argument specifying that the destination is a regional grid and the source is a global grid. --64bit_offset - an optional argument specifying that the weight file will be created in the NetCDF 64-bit offset format to allow variables larger than 2GB. Note the 64-bit offset format is not supported in the NetCDF version earlier than 3.6.0. An error message will be generated if this flag is specified while the application is linked with a NetCDF library earlier than 3.6.0. --netcdf4 - an optional argument specifying that the output weight will be created in the NetCDF4 format. This option only works with NetCDF library version 4.1 and above that was compiled with the NetCDF4 file format enabled (with HDF5 compression). An error message will be generated if these conditions are not met. --src_meshname - this argument is required if the source grid type is UGRID. It defines the dummy variable name that has all the topology information stored in its attributes. --dst_meshname - this argument is required if the destination grid type is UGRID. It defines the dummy variable name that has all the topology information stored in its attributes. --src_missingvalue - an optional argument that defines the variable name in the source grid file if the file type is GRIDSPEC or UGRID. The regridder will generate a mask using the missing values of the data variable. The missing value is defined using an attribute called "_FillValue" or "missing_value". --dst_missingvalue - an optional argument that defines the variable name in the destination grid file if the file type is GRIDSPEC or UGRID. The regridder will generate a mask using the missing values of the data variable. The missing value is defined using an attribute called "_FillValue" or "missing_value" --src_coordinates - an optional argument that defines the longitude and latitude variable names in the source grid file if the file type is GRIDSPEC. The variable names are separated by comma. This argument is required in case there are multiple sets of coordinate variables defined in the file. Without this argument, the offline regrid application will terminate with an error message when multiple coordinate variables are found in the file. --dst_coordinates - an optional argument that defines the longitude and latitude variable names in the destination grid file if the file type is GRIDSPEC. The variable names are separated by comma. This argument is required in case there are multiple sets of coordinate variables defined in the file. Without this argument, the offline regrid application will terminate with an error message when multiple coordinate variables are found in the file. --src_center - an optional argument that specifies whether to use the center coordinates or corner coordinates to do the regridding. Currently, this flag is only required for non-conservative regridding when the source grid is an unstructured grid in ESMF or UGRID format. For all other cases, only the center coordinates is supported and that is also the default value if this argument is not specified. --dst_center - an optional argument that specifies whether to use the center coordinates or corner coordinates to do the regridding. Currently, this flag is only required for non-conservative regridding when the destination grid is an unstructured grid in ESMF or UGRID format. For all other cases, only the center coordinates is supported and that is also the default value if this argument is not specified. --user_areas - an optional argument specifying that the conservation is adjusted to hold for the user areas provided in the grid files. If not specified, then the conservation will hold for the ESMF calculated (great circle) areas. Whichever areas the conservation holds for are output to the weight file. --check - Check that the generated weights produce reasonable regridded fields. This is done by calling ESMF_Regrid() on an analytic source field using the weights generated by this application. The mean relative error between the destination and analytic field is computed, as well as the relative error between the mass of the source and destination fields in the conservative case. --no_log - Turn off the ESMF Log files. By default, ESMF creates multiple log files, one per PET. --help - Print the usage message and exit. --version - Print ESMF version and license information and exit. -V - Print ESMF version number and exit.
The example below shows the command to generate a set of conservative interpolation weights between a global SCRIP format source grid file (src.nc) and a global SCRIP format destination grid file (dst.nc). The weights are written into file w.nc. In this case the ESMF library and applications have been compiled using an MPI parallel communication library (e.g. setting ESMF_COMM to openmpi) to enable it to run in parallel. To demonstrate running in parallel the mpirun script is used to run the application in parallel on 4 processors.
mpirun -np 4 ./ESMF_RegridWeightGen -s src.nc -d dst.nc -m conserve -w w.nc
The next example below shows the command to do the same thing as the previous example except for three changes. The first
change is this time the source grid is regional ("--
src_regional"). The second change is that
for this example bilinear interpolation ("-m bilinear") is being used. Because bilinear is the default, we could also
omit the "-m bilinear". The third change is that in this example some of the destination points are expected to
not be found in the source grid, but the user is ok with that and just wants those points to not appear in the weight file instead of causing an error ("-i").
mpirun -np 4 ./ESMF_RegridWeightGen -i --src_regional -s src.nc -d dst.nc \ -m bilinear -w w.nc
The default grid file format is SCRIP, to use a grid file in another grid format, you
need to use the switches "--
src_type", "--
dst_type" or "-t". For example, if the
source grid is in UGRID format and the destination grid is in GRIDSPEC format, the command
to run the application is:
mpirun -np 4 ./ESMF_RegridWeightGen -s src.nc -d dst.nc \ -m conserve -w w.nc --src_type UGRID --dst_type GRIDSPEC \ --src_meshname mesh_dummy
Since the source grid is a UGRID, an additional argument "--
src_meshname" needs to be provided. This is the dummy variable used to define all the mesh topology information in the
grid file.
The last example shows how to use the missing values of a data variable to generate the
grid mask for a GRIDSPEC file, how to specify the coordinate variable names
using "--
src_coordinates"
and use user defined area for the conservative regridding.
mpirun -np 4 ./ESMF_RegridWeightGen -s src.nc -d dst.nc -m conserve \ -w w.nc --src_type GRIDSPEC --src_missingvalue datavar \ --src_coordinates lon,lat --user_areas
In the above example, "datavar" is the variable name defined in the source grid that will be used to construct the mask using its missing values. In addition, "lon" and "lat" are the variable names for the longitude and latitude values, respectively.
This section describes the grid file formats supported by ESMF. These are typically used either to describe grids to ESMF_RegridWeightGen or to create grids within ESMF. The following table summarizes the features supported by each of the grid file formats.
Feature | SCRIP | ESMF Unstruct. | GRIDSPEC | UGRID |
Unstructured Grids | YES | YES | NO | YES |
Logically-Rectangular Grids | YES | NO | YES | NO |
2D Grids | YES | YES | YES | YES |
3D Grids | NO | YES | NO | YES |
Spherical Coordinates | YES | YES | YES | YES |
Cartesian Coordinates | NO | YES | NO | NO |
Non-Conserv Regrid on Corners | NO | YES | NO | YES |
The rest of this section contains a detailed descriptions of each grid file format along with a simple example of the format.
A SCRIP format grid file is a NetCDF file for describing grids. This format is the same as is used by the SCRIP [4] package, and so grid files which work with that package should also work here. SCRIP format files are capable of storing either 2D logically rectangular grids or 2D unstructured grids. The basic format for both of these grids is the same and they are distinguished by the value of the grid_rank variable. Logically rectangular grids have grid_rank set to 2, whereas unstructured grids have this variable set to 1.
The following is a sample header of a logically rectangular grid file:
netcdf remap_grid_T42 { dimensions: grid_size = 8192 ; grid_corners = 4 ; grid_rank = 2 ; variables: int grid_dims(grid_rank) ; double grid_center_lat(grid_size) ; grid_center_lat:units = "radians"; double grid_center_lon(grid_size) ; grid_center_lon:units = "radians" ; int grid_imask(grid_size) ; grid_imask:units = "unitless" ; double grid_corner_lat(grid_size, grid_corners) ; grid_corner_lat:units = "radians" ; double grid_corner_lon(grid_size, grid_corners) ; grid_corner_lon:units ="radians" ; // global attributes: :title = "T42 Gaussian Grid" ; }
The grid_size dimension is the total number of cells in the grid; grid_rank refers to the number of dimensions. In this case grid_rank is 2 for a 2D logically rectangular grid. The integer array grid_dims gives the number of grid cells along each dimension. The number of corners (vertices) in each grid cell is given by grid_corners. The grid corner coordinates need to be listed in an order such that the corners are in counter-clockwise order. Also, note that if your grid has a variable number of corners on grid cells, then you should set grid_corners to be the highest value and use redundant points on cells with fewer corners.
The integer array grid_imask is used to mask out grid cells which should not participate in the regridding. The array should by zero for any points that do not participate in the regridding and one for all other points. Coordinate arrays provide the latitudes and longitudes of cell centers and cell corners. The unit of the coordinates can be either "radians" or "degrees".
Here is a sample header from a SCRIP unstructured grid file:
netcdf ne4np4-pentagons { dimensions: grid_size = 866 ; grid_corners = 5 ; grid_rank = 1 ; variables: int grid_dims(grid_rank) ; double grid_center_lat(grid_size) ; grid_center_lat:units = "degrees" ; double grid_center_lon(grid_size) ; grid_center_lon:units = "degrees" ; double grid_corner_lon(grid_size, grid_corners) ; grid_corner_lon:units = "degrees"; grid_corner_lon:_FillValue = -9999. ; double grid_corner_lat(grid_size, grid_corners) ; grid_corner_lat:units = "degrees" ; grid_corner_lat:_FillValue = -9999. ; int grid_imask(grid_size) ; grid_imask:_FillValue = -9999. ; double grid_area(grid_size) ; grid_area:units = "radians^2" ; grid_area:long_name = "area weights" ; }
The variables are the same as described above, however, here (grid_rank = 1). In this format there is no notion of which cells are next to which, so to construct the unstructured mesh the connection between cells is defined by searching for cells with the same corner coordinates. (e.g. the same grid_corner_lat and grid_corner_lon values).
Both the SCRIP grid file format and the SCRIP weight file format work with the SCRIP 1.4 tools.
ESMF supports a custom unstructured grid file format for describing meshes. This format is more compatible than the SCRIP format with the methods used to create an ESMF Mesh object, so less conversion needs to be done to create a Mesh. The ESMF format is thus more efficient than SCRIP when used with ESMF codes (e.g. the ESMF_RegridWeightGen application). However, this format doesn't currently support conversion to a dual mesh which means that non-conservative regridding (e.g. Bilinear) is only supported on nodes.
The following is a sample header in the ESMF format followed by a description:
netcdf mesh-esmf { dimensions: nodeCount = 9 ; elementCount = 5 ; maxNodePElement = 4 ; coordDim = 2 ; variables: double nodeCoords(nodeCount, coordDim); nodeCoords:units = "degrees" ; int elementConn(elementCount, maxNodePElement) ; elementConn:long_name = "Node Indices that define the element / connectivity"; elementConn:_FillValue = -1 ; byte numElementConn(elementCount) ; numElementConn:long_name = "Number of nodes per element" ; double centerCoords(elementCount, coordDim) ; centerCoords:units = "degrees" ; double elementArea(elementCount) ; elementArea:units = "radians^2" ; elementArea:long_name = "area weights" ; int elementMask(elementCount) ; elementMask:_FillValue = -9999. ; // global attributes: :gridType="unstructured"; :version = "0.9" ;
In the ESMF format the netCDF dimensions have the following meanings. The nodeCount dimension is the number of nodes in the mesh. The elementCount dimension is the number of elements in the mesh. The maxNodePElement dimension is the maximum number of nodes in any element in the mesh. For example, in a mesh containing just triangles, then maxNodePElement would be 3. However, if the mesh contained one quadrilateral then maxNodePElement would need to be 4. The coordDim dimension is the number of dimensions of the points making up the mesh (i.e. the spatial dimension of the mesh). For example, a 2D planar mesh would have coordDim equal to 2.
In the ESMF format the netCDF variables have the following meanings. The nodeCoords variable contains the coordinates for each node. nodeCoords is a two-dimensional array of dimension (nodeCount,coordDim). For a 2D Grid, coordDim is 2 and the grid can be either spherical or Cartisian. If the units attribute is either degrees or radians, it is spherical. nodeCoords(:,1) contains the longitude coordinates and nodeCoords(:,2) contains the latitude coordinates. If the value of the units attribute is km, kilometers or meters, the grid is in 2D Cartesian coordinates. nodeCoords(:,1) contains the x coordinates and nodeCoords(:,2) contains the y coordinates. The same order applies to centerCoords. For a 3D Grid, coordDim is 3 and the grid is assumed to be Cartesian. nodeCoords(:,1) contains the x coordinates, nodeCoords(:,2) contains the y coordinates, and nodeCoords(:,3) contains the z coordinates. The same order applies to centerCoords. A 2D grid in the Cartesian coordinate can only be regridded into another 2D grid in the Cartesian coordinate.
The elementConn variable describes how the nodes are connected together to form each element. For each element (i.e. each iteration of the first dimension of the variable), this variable contains a list of 1-based indices into the nodeCoords variable pointing to the nodes which make up that element. For example, if elementConn(2,1..3) contains (1,2,4), then that means that the second element consists of the nodes whose coordinates are described by nodeCoords(1), nodeCoords(2), and nodeCoords(4). The order of the indices describing the element is important. The proper order for elements available in an ESMF mesh can be found in Section 33.2.1. The file format does support 2D polygons with more corners than those in that section, but internally these are broken into triangles. For these polygons, the corners should be listed such that they are in counter-clockwise order around the element. Because variables have to be rectangular, the second dimension of the elementConn variable has to be the size of the largest number of nodes in any element (i.e. maxNodePElement), the actual number of nodes in an element is given by the numElementConn variable. For a given dimension (i.e. coordDim) the number of nodes in the element indicates the element shape, for example in 2D, if numElementConn is 4 then the element is a quadrilateral. In 3D, if numElementConn is 8 then the element is a hexahedron.
The rest of the variables in the format are optional. The centerCoords variable gives the coordinates of the center of the corresponding element. This variable is currently not used by ESMF. The elementArea variable gives the area (or volume in 3D) of the corresponding element. This area is used by ESMF during conservative interpolation. If not specified, ESMF calculates the area (or volume) based on the coordinates of the nodes making up the element. The final variable is the elementMask variable. This variable allows the user to specify a mask value for the corresponding element. If the value is 1, then the element is unmasked and if the value is 0 the element is masked. If not specified, ESMF assumes that no elements are masked.
The following is a picture of a small example mesh and a sample ESMF format header using non-optional variables describing that mesh:
2.0 7 ------- 8 ------- 9 | | | | 4 | 5 | | | | 1.0 4 ------- 5 ------- 6 | | \ 3 | | 1 | \ | | | 2 \ | 0.0 1 ------- 2 ------- 3 0.0 1.0 2.0 Node indices at corners Element indices in centers netcdf mesh-esmf { dimensions: nodeCount = 9 ; elementCount = 5 ; maxNodePElement = 4 ; coordDim = 2 ; variables: double nodeCoords(nodeCount, coordDim); nodeCoords:units = "degrees" ; int elementConn(elementCount, maxNodePElement) ; elementConn:long_name = "Node Indices that define the element / connectivity"; elementConn:_FillValue = -1 ; byte numElementConn(elementCount) ; numElementConn:long_name = "Number of nodes per element" ; // global attributes: :gridType="unstructured"; :version = "0.9" ; data: nodeCoords= 0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 1.0, 0.0, 2.0, 1.0, 2.0, 2.0, 2.0 ; elementConn= 1, 2, 5, 4, 2, 3, 5, -1, 3, 6, 5, -1, 4, 5, 8, 7, 5, 6, 9, 8 ; numElementConn= 4, 3, 3, 4, 4 ; }
ESMF_RegridWeightGen supports NetCDF files that follow the CF GRIDSPEC convention for logically rectangular lat/lon grids.
GRIDSPEC is a draft proposal to extend the Climate and Forecast (CF) metadata conventions for the representation of gridded data for Earth System Models. The original GRIDSPEC standard was proposed by V. Balaji and Z. Liang of GFDL (see ref). It was further developed by D. Kindig and A. Pletzer of Tech X. The ESMF implementation is based on the 02/09/2012 version of the proposal developed at Tech X (Unfortunately, the link to this document is now broken and we have been unable to find a new one that works, so we can't currently point to it).
GRIDSPEC extends the current CF convention to support grid mosaics, i.e., a grid consisting of multiple logically rectangular grid tiles. It also provides a mechanism for storing a grid dataset in multiple files. Therefore, it introduces different types of files, such as a mosaic file that defines the multiple tiles and their connectivity, a host file that aggregates the grid mosiac and the data files, and a tile file for a single tile grid definition.
Currently, ESMF only supports the Grid creation and regridding from a single tile grid file. A tile file
is a CF compliant NetCDF file for a logically rectangular lat/lon grid based on the
CF Metadata Conventions V1.6. An example grid file is shown below.
The cell center coordinate variables are determined by the value of its attribute units. The longitude
variable has the attribute value set to either degrees_east, degree_east, degrees_E, degree_E,
degreesE or degreeE. The latitude variable has the attribute value set to degrees_north, degree_north, degrees_N,
degree_N, degreesN or degreeN. The latitude and the longitude variables are one-dimensional arrays if the grid is a regular lat/lon grid, two-dimensional arrays if the grid is curvilinear. The bound coordinate
variables define the bound or the corner coordinates of a cell. The bound variable name is specified in the
bounds attribute of the latitude and longitude variables. In the following example, the latitude bound
variable is lat_bnds and the longitude bound variable is lon_bnds. The bound variables are 2D
arrays for a regular lat/lon grid and a 3D array for a curvilinear grid. The first dimension of the bound
array is 2 for a regular lat/lon grid and 4 for a curvilinear grid. The bound coordinates for a curvilinear
grid is defined in counterclockwise order. In the example below, the grid is a regular lat/lon
grid, thus the coordinate variables are 1D and the bound variables are 2D with the first dimension equal to 2.
The bound coordinates will be read in and stored in a ESMF Grid object as the corner stagger coordinates when doing a conservative regrid. In case there are multiple sets of coordinate variables defined in a grid file,
the offline regrid application will return an error for duplicate latitude or longitude variables unless the user
uses "--
src_coordinates" or "--
src_coordinates" to specify the coordinate variable names
to be used in the regrid.
Since a GRIDSPEC tile file does not have a way to specify the grid mask, the mask is usually derived by the missing values stored in a data variable. ESMF_RegridWeightGen provides an option for users to derive the grid mask from a data variable's missing values. The value of the missing value is defined by the variable attribute missing_value or _FillValue. If the value of the data point is equal to the missing value, the grid mask for that grid point is set to 0, otherwise, it is set to 1. In the following grid, the variable so can be used to derive the grid mask. A data variable could be a 2D, 3D or 4D. For example, it may have additional depth and time dimensions. It is assumed that the first and the second dimensions of the data variable should be the longitude and the latitude dimension. ESMF_RegridWeightGen will use the first 2D data values to derive the grid mask.
netcdf single_tile_grid { dimensions: time = 1 ; bound = 2 ; lat = 181 ; lon = 360 ; variables: double lat(lat) ; lat:bounds = "lat_bnds" ; lat:units = "degrees_north" ; lat:long_name = "latitude" ; lat:standard_name = "latitude" ; double lat_bnds(lat, bound) ; double lon(lon) ; lon:bounds = "lon_bnds" ; lon:long_name = "longitude" ; lon:standard_name = "longitude" ; lon:units = "degrees_east" ; double lon_bnds(lon, bound) ; float so(time, lat, lon) ; so:standard_name = "sea_water_salinity" ; so:units = "psu" ; so:missing_value = 1.e+20f ; }
ESMF_RegridWeightGen supports NetCDF files that follow the UGRID conventions for unstructured grids.
The UGRID file format is a proposed extension to the CF metadata conventions for the unstructured grid data model. The latest proposal can be found at https://github.com/ugrid-conventions/ugrid-conventions. The proposal is still evolving, the Mesh creation API and ESMF_RegridWeightGen in the current ESMF release is based on UGRID Version 0.9.0 published on October 29, 2013.
In the UGRID proposal, a 1D, 2D, or 3D mesh topology can be defined for an unstructured grid. Currently, ESMF supports two types of meshes: (1) the 2D flexible mesh topology where each cell (a.k.a. "face" as defined in the UGRID document) in the mesh is either a triangle or a quadrilateral, and (2) the fully 3D unstructured mesh topology where each cell (a.k.a. "volume" as defined in the UGRID document) in the mesh is either a tetrahedron or a hexahedron. Pyramids and wedges are not currently supported in ESMF, but they can be defined as degenerate hexahedrons.
The main addition of the UGRID extension is a dummy variable that defines the mesh topology. This additional variable has a required attribute cf_role with value "mesh_topology". In addition, it has two more required attributes: topology_dimension and node_coordinates. If it is a 2D mesh (i.e., topology_dimension equals to 2), an additional attribute face_node_connectivity is required. If it is a 3D mesh (i.e., topology_dimension equals to 3), two additional attributes volume_node_connectivity and volume_shape_type are required. The value of attribute node_coordinates is a list of the names of the node longitude and latitude variables, plus the elevation variable if it is a 3D mesh. The value of attribute face_node_connectivity or volume_node_connectivity is the variable name that defines the corner node indices for each mesh cell. The additional attribute volume_shape_type for the 3D mesh points to a flag variable that specifies the shape type of each cell in the mesh.
Below is a sample 2D mesh called. FVCOM_grid2d. the dummy mesh topology variable is fvcom_mesh. As described above, its cf_role attribute has to be mesh_topology and topology_dimension attribute has to be 2 for a 2D mesh. It defines the node coordinate variable names to be lon and lat. It also specifies the face/node connectivity variable name as nv.
The variable nv is a two-dimensional array that defines the node indices of each face. The first dimension defines the maximal number of nodes for each face. In this example, it is a triangle mesh so the number of nodes per face is 3. Since each face may have different number of corner nodes, some of the cells may have fewer nodes than the specified dimension. In that case, it is filled with the missing values defined by the attribute _FillValue. If _FillValue is not defined, the default value is -1. The nodes are in counter clockwise order. An optional attribute start_index defines whether the node index is 1-based or 0-based. If start_index is not defined, the default node index is 0-based.
The coordinate variables follows the CF metadata convention for coordinates. They are 1D array with attribute standard_name being either latitude or longitude. The units of the coordinates can be either degrees or radians.
The UGRID files may also contain data variables. The data may be located at the nodes or at the faces. Two additional attributes are introduced in the UGRID extension for the data variables: location and mesh. The location attribute defines where the data is located, it can be either face or node. The mesh attribute defines which mesh topology this variable belongs to since multiple mesh topologies may be defined in one file. The coordinates attribute defined in the CF conventions can also be used to associate the variables to their locations. ESMF checks both location and coordinates attributes to determine where the data variable is defined upon. If both attributes are present, location attribute takes the precedence. ESMF_RegridWeightGen uses the data variable on the face to derive the element masks for the mesh cell and variable on the node to derive the node masks for the mesh.
When creating a ESMF Mesh from a UGRID file, user has to provide the mesh topology mesh variable name to ESMF_MeshCreate().
netcdf FVCOM_grid2d { dimensions: node = 417642 ; nele = 826866 ; three = 3 ; time = 1 ; variables: // Mesh topology int fvcom_mesh; fvcom_mesh:cf_role = "mesh_topology" ; fvcom_mesh:topology_dimension = 2. ; fvcom_mesh:node_coordinates = "lon lat" ; fvcom_mesh:face_node_connectivity = "nv" ; int nv(nele, three) ; nv:standard_name = "face_node_connectivity" ; nv:start_index = 1. ; // Mesh node coordinates float lon(node) ; lon:standard_name = "longitude" ; lon:units = "degrees_east" ; float lat(node) ; lat:standard_name = "latitude" ; lat:units = "degrees_north" ; // Data variable float ua(time, nele) ; ua:standard_name = "barotropic_eastward_sea_water_velocity" ; ua:missing_value = -999. ; ua:location = "face" ; ua:mesh = "fvcom_mesh" ; float va(time, nele) ; va:standard_name = "barotropic_northward_sea_water_velocity" ; va:missing_value = -999. ; va:location = "face" ; va:mesh = "fvcom_mesh" ; }
Following is a sample 3D UGRID file containing hexahedron cells. the dummy mesh topology variable is fvcom_mesh. Its cf_role attribute has to be mesh_topology and topology_dimension attribute has to be 3 for a 3D mesh. There are two additional required attributes: volume_node_connectivity specifies a variable name that defines the corner indices of the mesh cells and volume_shape_type specifies a variable name that defines the type of the mesh cells.
The node coordinates are defined by variables nodelon, nodelat and height. Currently, the units attribute for the height variable is either kilometers, km or meters. The variable vertids is a two-dimensional array that defines the corner node indices of each mesh cell. The first dimension defines the maximal number of nodes for each cell. There is only one type of cells in the sample grid, i.e. hexahedrons, so the maximal number of nodes is 8. The node order is defined in 33.2.1. The index can be either 1-based or 0-based and the default is 0-based. Setting an optional attribute start_index to 1 changed it to 1-based index scheme. The variable meshtype is a one-dimensional integer array that defines the shape type of each cell. Currently, ESMF only supports tetrahedron and hexahedron shapes. There are three attributes in meshtype: flag_range, flag_values, and flag_meanings representing the range of the flag values, all the possible flag values, and the meaning of each flag value, respectively. flag_range and flag_values are either a scalar or an array of integers. flag_meanings is a text string containing a list of shape types separated by space. In this example, there is only one shape type, thus, the values of meshtype are all 1.
netcdf wam_ugrid100_110 { dimensions: nnodes = 78432 ; ncells = 66030 ; eight = 8 ; variables: int mesh ; mesh:cf_role = "mesh_topology" ; mesh:topology_dimension = 3. ; mesh:node_coordinates = "nodelon nodelat height" ; mesh:volume_node_connectivity = "vertids" ; mesh:volume_shape_type = "meshtype" ; double nodelon(nnodes) ; nodelon:standard_name = "longitude" ; nodelon:units = "degrees_east" ; double nodelat(nnodes) ; nodelat:standard_name = "latitude" ; nodelat:units = "degrees_north" ; double height(nnodes) ; height:standard_name = "elevation" ; height:units = "kilometers" ; int vertids(ncells, eight) ; vertids:cf_role = "volume_node_connectivity" ; vertids:start_index = 1. ; int meshtype(ncells) ; meshtype:cf_role = "volume_shape_type" ; meshtype:flag_range = 1. ; meshtype:flag_values = 1. ; meshtype:flag_meanings = "hexahedron" ; }
The regridding weight output file is in NetCDF format and contain some grid information from each grid as well as the regridding indices and weights. Following is the header of a sample output weight file that was generated by regridding a logically rectangular 2D grid to a triangle mesh unstructured grid:
netcdf t42mpas-bilinear { dimensions: n_a = 8192 ; n_b = 20480 ; n_s = 42456 ; nv_a = 4 ; nv_b = 3 ; num_wgts = 1 ; src_grid_rank = 2 ; dst_grid_rank = 1 ; variables: int src_grid_dims(src_grid_rank) ; int dst_grid_dims(dst_grid_rank) ; double yc_a(n_a) ; yc_a:units = "degrees" ; double yc_b(n_b) ; yc_b:units = "radians" ; double xc_a(n_a) ; xc_a:units = "degrees" ; double xc_b(n_b) ; xc_b:units = "radians" ; double yv_a(n_a, nv_a) ; yv_a:units = "degrees" ; double xv_a(n_a, nv_a) ; xv_a:units = "degrees" ; double yv_b(n_b, nv_b) ; yv_b:units = "radians" ; double xv_b(n_b, nv_b) ; xv_b:units = "radians" ; int mask_a(n_a) ; mask_a:units = "unitless" ; int mask_b(n_b) ; mask_b:units = "unitless" ; double area_a(n_a) ; area_a:units = "square radians" ; double area_b(n_b) ; area_b:units = "square radians" ; double frac_a(n_a) ; frac_a:units = "unitless" ; double frac_b(n_b) ; frac_b:units = "unitless" ; int col(n_s) ; int row(n_s) ; double S(n_s) ; // global attributes: :title = "ESMF Offline Regridding Weight Generator" ; :normalization = "destarea" ; :map_method = "Bilinear remapping" ; :conventions = "NCAR-CSM" ; :domain_a = "T42_grid.nc" ; :domain_b = "grid-dual.nc" ; :grid_file_src = "T42_grid.nc" ; :grid_file_dst = "grid-dual.nc" ; :CVS_revision = "5.3.0 beta snapshot" ; }
Variables ended with "_a" are the variables for the source grid and the ones ended with "_b" are the variables for the destination grid. For instance, xc_a and yc_a are corresponding to the grid_center_lon and grid_center_lat variables in the source grid file. The grid information includes the center and corner coordinates and the grid mask arrays (mask_a and mask_b) from the input grid files. The grid areas (area_a and area_b) are either provided by the user or computed by ESMF_RegridWeightGen. The grid area array is only output when the conservative remapping option is used. The values of the area array are set to zero for bilinear and patch remappings. The grid frac arrays (frac_a and frac_b) are calculated by ESMF_RegridWeightGen. For conservative remapping, the grid frac array returns the area fraction of the grid cell which participates in the remapping. For bilinear and patch remapping, the destination grid frac array is one where the grid point participates in the remapping and zero otherwise. For bilinear and patch remapping, the source grid frac array is always set to zero.
The indices and weights generated by ESMF_FieldRegridStore() are stored in the output file as variables col, row and S. Where col and row are the indices to the source and the destination grid cells. These are a one-dimension array with length defined by dimension n_s. S is the weight which is multiplied by the source value indicated by col and then summed with the destination value indicated by row to build the final interpolated value of the destination.
The global attributes in the interpolation weight file describe some information about the process which generated the weights. The title is always fixed at ``ESMF Offline Regridding Weight Generator'' for a weight file generated by the ESMF offline weight generation application. The normalization attribute describes how the conservative weights are calculated, currently this is always set to ``destarea'' because this is the only option which we currently support. The setting ``destarea'' means that the conservative weights are calculated by dividing the area of the intersection of the source and destination cells by the area of the destination cell. This is set even when the weights are not conservative in which case it can be ignored. The map_method attribute indicates the interpolation type. The format of the interpolation weight file was developed by a group outside of ESMF, because of its use by utilities outside of ESMF control, the range of some of the meta data is constrained. The map_method is one of these. Because of this constraint, there is no map method corresponding to patch interpolation. A weight file generated with the ``patch'' interpolation method will have map_method set to ``Bilinear remapping''. The conventions attribute indicates the format of the weight file, currently only ``NCAR-CSM'' is supported. The attributes domain_a and grid_file_src both indicate the source grid file name. The attributes domain_b and grid_file_dst both indicate the destination grid file name. CVS_revision indicates the version of ESMF containing the application which was used to generate the weight file.
The following code shows how to apply the weights in the weight file to interpolate a source field (src_field) defined over the source grid to a destination field (dst_field) defined over the destination grid. The variables n_s, n_b, row, col, and S are from the weight file.
! Initialize destination field to 0.0 do i=1, n_b dst_field(i)=0.0 enddo ! Apply weights do i=1, n_s dst_field(row(i))=dst_field(row(i))+S(i)*src_field(col(i)) enddo
If the first-order conservative interpolation method was specified ("-m conserve") then the destination field may need to be adjusted by the destination fraction (frac_b) if the destination grid extends outside the unmasked source grid. (If it isn't known if the destination extends outside the source, then it doesn't hurt to apply the destination fraction in either case. If it doesn't extend outside, then the fraction will be 1.0 everywhere anyway.) The following code shows how to adjust an already interpolated destination field (dst_field) by the destination fraction. The variables n_b, and frac_b are from the weight file:
! Adjust destination field by fraction do i=1, n_b if (frac_b(i) .ne. 0.0) then dst_field(i)=dst_field(i)/frac_b(i) endif enddo
The ESMF_RegridWeightGen application is used in the ESMF_RegridWeightGenCheck external demo to generate interpolation weights. These weights are then tested by using them for a regridding operation and then comparing them against an analytic function on the destination grid. This external demo is also used to regression test ESMF regridding, and it is run nightly on over 90 combinations of structured and unstructured, regional and global grids, and regridding methods.
This section describes the file-based regridding application provided by ESMF (for a description of ESMF regridding in general see Section 24.2). Regridding, also called remapping or interpolation, is the process of changing the grid that underlies data values while preserving qualities of the original data. Different kinds of transformations are appropriate for different problems. Regridding may be needed when communicating data between Earth system model components such as land and atmosphere, or between different data sets to support operations such as visualization.
Regridding can be broken into two stages. The first stage is generation of an interpolation weight matrix that describes how points in the source grid contribute to points in the destination grid. The second stage is the multiplication of values on the source grid by the interpolation weight matrix to produce values on the destination grid. This occurs through a parallel sparse matrix multiply.
The ESMF_RegridWeightGen application described in Section 12 performs the first stage of the regridding process - generate the interpolation weight matrix. This tool not only calculates the interpolation weights, it also applies the weights to a variable stored in the source grid file and produces the interpolated values on the destination grid. The interpolated output variable is written out to the destination grid file. This tool only supports two CF compliant file formats: the GRIDSPEC Tile grid file format( 12.4.3) for a logically rectangular grid and the UGRID file format( 12.4.4) for unstructured grid. The SCRIP format( 12.4.1) and the ESMF unstructured grid format( 12.4.2) are not supported because there is no way to define a variable field using these two formats. Currently, the tool only works with 2D grids, the support for the 3D grid will be made available in the future release. The variable array can be up to four dimensions. The variable type is currently limited to single or double precision real numbers. The support for other data types, such as integer or short will be added in the future release.
The user interface of this tool is greatly simplified from ESMF_RegridWeightGen. User only needs to provide two input file names, the source and the destination variable names and the regrid method. The tool will figure out the type of the grid file automatically based on the attributes of the variable. If the variable has a coordinates attribute, the grid file is a GRIDSPEC file and the value of the coordinates defines the longitude and latitude variable's names. For example, following is a simple GRIDSPEC file with a variable named PSL and coordinate variables named lon and lat.
netcdf simple_gridspec { dimensions: lat = 192 ; lon = 288 ; variables: float PSL(lat, lon) ; PSL:time = 50. ; PSL:units = "Pa" ; PSL:long_name = "Sea level pressure" ; PSL:cell_method = "time: mean" ; PSL:coordinates = "lon lat" ; double lat(lat) ; lat:long_name = "latitude" ; lat:units = "degrees_north" ; double lon(lon) ; lon:long_name = "longitude" ; lon:units = "degrees_east" ; }
If the variable has a mesh attribute and a location attribute, the grid file is in UGRID format( 12.4.4). The value of mesh attribute is the name of a dummy variable that defines the mesh topology. If the application performs a conservative regridding, the value of the location attribute has to be face, otherwise, it has to be node. This is because ESMF only supports non-conservative regridding on the data stored at the nodes of a ESMF_Mesh object, and conservative regridding on the data stored at the cells of a ESMF_Mesh object.
Here is an example 2D UGRID file:
netcdf simple_ugrid { dimensions: node = 4176 ; nele = 8268 ; three = 3 ; time = 2 ; variables: float lon(node) ; lon:units = "degrees_east" ; float lat(node) ; lat:units = "degrees_north" ; float lonc(nele) ; lonc:units = "degrees_east" ; float latc(nele) ; latc:units = "degrees_north" ; int nv(nele, three) ; nv:standard_name = "face_node_connectivity" ; nv:start_index = 1. ; float zeta(time, node) ; zeta:standard_name = "sea_surface_height_above_geoid" ; zeta:_FillValue = -999. ; zeta:location = "node" ; zeta:mesh = "fvcom_mesh" ; float ua(time, nele) ; ua:standard_name = "barotropic_eastward_sea_water_velocity" ; ua:_FillValue = -999. ; ua:location = "face" ; ua:mesh = "fvcom_mesh" ; float va(time, nele) ; va:standard_name = "barotropic_northward_sea_water_velocity" ; va:_FillValue = -999. ; va:location = "face" ; va:mesh = "fvcom_mesh" ; int fvcom_mesh(node) ; fvcom_mesh:cf_role = "mesh_topology" ; fvcom_mesh:dimension = 2. ; fvcom_mesh:locations = "face node" ; fvcom_mesh:node_coordinates = "lon lat" ; fvcom_mesh:face_coordinates = "lonc latc" ; fvcom_mesh:face_node_connectivity = "nv" ; }
There are three variables defined in the above UGRID file - zeta on the node of the mesh, ua and va on the face of the mesh. All three variables have one extra time dimension.
If the variable specified for the destination file does not already exist in the file, the file type is determined as follows: First search for a variable that has a cf_role attribute of value mesh_topology. If successful, the file is a UGRID file. The destination variable will be created on the nodes if the regrid method is non-conservative and the source variable is located on the nodes. Otherwise, the destination variable will be created on the face. If the destination file is not a UGRID file, check if there is a variable with its units attribute set to degrees_east and another variable with it's units attribute set to degrees_west. If such a pair is found, the file is a GRIDSPEC file and the above two variables will be used as the coordinate variables for the variable to be created. If more than one pair of coordinate variables are found in the file, the application will fail with an error message.
If the destination variable exists in the destination grid file, it has to have the same number of dimensions and the same type as the source variable. Except for the latitude and longitude dimensions, the size of the destination variable's extra dimensions (e.g., time and vertical layers) has to match with the source variable. If the destination varialbe does not exist in the destination grid file, a new variable will be created with the same type and matching dimensions as the source variable. All the attributes of the source variable will be copied to the destination variable except those related to the grid definition (i.e. coordinates attribute if the destination file is in GRIDSPEC format or mesh and location attributes if the destination file is in UGRID format.
Additional rules beyond the CF convention are adopted to determine whether there is a time dimension defined in the source and destination files. In this application, only a dimension with a name time is considered as a time dimension. If the source variable has a time dimension and the destination variable is not already defined, the application first checks if there is a time dimension defined in the destination file. If so, the values of the time dimension in both files have to be identical. If the time dimension values don't match, the application terminates with an error message. The application does not check the existence of a time variable or if the units attribute of the time variable match in two input files. If the destination file does not have a time dimension, it will be created. UNLIMITED time dimension is allowed in the source file, but the time dimension created in the destination file is not UNLIMITED.
This application requires the NetCDF library to read the grid files and write out the interpolated variable. To compile ESMF with the NetCDF library, please refer to the "Third Party Libraries" Section in the ESMF User's Guide for more information.
Internally this application uses the ESMF public API to perform regridding. If a source or destination grid is logically rectangular, then ESMF_GridCreate()( 31.6.13) is used to create an ESMF_Grid object from the file. The coordinate variables are stored at the center stagger location (ESMF_STAGGERLOC_CENTER). If the application performs a conservative regridding, the addCornerStager argument is set to TRUE and the bound variables in the grid file will be read in and stored at the corner stagger location (ESMF_STAGGERLOC_CORNER). If the variable has an _FillValue attribute defined, a mask will be generated using the missing values of the variable. The data variable is defined as a ESMF_Field object at the center stagger location (ESMF_STAGGERLOC_CENTER) of the grid.
If the source grid is an unstructured grid and the the regrid method is nearest neighbor, or if the destination grid is unstructured and the regrid method is non-conservative, ESMF_LocStreamCreate()( 32.4.14 is used to create an ESMF_LocStream object. Otherwise, ESMF_MeshCreate()( 33.4.7) is used to create an ESMF_Mesh object for the unstructured input grids. Currently, only the 2D unstructured grid is supported. If the application performs a conservative regridding, the variable has to be defined on the face of the mesh cells, i.e., its location attribute has to be set to face. Otherwise, the variable has to be defined on the node and its (location attribute is set to node).
Similar to ESMF_RegridWeightGen application (Section 12), this application supports bilinear, patch, nearest neighbor, or first-order conservative interpolation. The descriptions of different interpolation methods can be found at Section 24.2 and Section 12. It also supports different pole methods for non-conservative interpolation and allows user to choose to ignore the errors when some of the destination points cannot be mapped by any source points.
The command line arguments are all keyword based. Both the long keyword prefixed with '--'
or the
one character short keyword prefixed with '-' are supported. The format to run the application is
as follows:
ESMF_Regrid --source|-s src_grid_filename --destination|-d dst_grid_filename --src_var var_name --dst_var var_name [--method|-m bilinear|patch|nearestdtos|neareststod|conserve] [--pole|-p none|all|teeth|1|2|..] [--ignore_unmapped|-i] [--ignore_degenerate] [-r] [--src_regional] [--dst_regional] [--no_log] [--help] [--version] [-V] where --source or -s - a required argument specifying the source grid file name --destination or -d - a required argument specifying the destination grid file name -- src_var - a required argument specifying the name of the variable in the src grid file to be interpolated from -- dst_var - a required argument specifying the name of the variable to be interpolated to. The variable may or may not exist in the destination grid file --method or -m - an optional argument specifying which interpolation method is used. The value can be one of the following: bilinear - for bilinear interpolation, also the default method if not specified. patch - for patch recovery interpolation nearstdtos - for nearest destination to source interpolation nearststod - for nearest source to destination interpolation conserve - for first-order conservative interpolation --pole or -p - an optional argument indicating what to do with the pole. The value can be one of the following: none - No pole, the source grid ends at the top (and bottom) row of nodes specified in <source grid>. all - Construct an artificial pole placed in the center of the top (or bottom) row of nodes, but projected onto the sphere formed by the rest of the grid. The value at this pole is the average of all the pole values. This is the default option. teeth - No new pole point is constructed, instead the holes at the poles are filled by constructing triangles across the top and bottom row of the source Grid. This can be useful because no averaging occurs, however, because the top and bottom of the sphere are now flat, for a big enough mismatch between the size of the destination and source pole regions, some destination points may still not be able to be mapped to the source Grid. <N> - Construct an artificial pole placed in the center of the top (or bottom) row of nodes, but projected onto the sphere formed by the rest of the grid. The value at this pole is the average of the N source nodes next to the pole and surrounding the destination point (i.e. the value may differ for each destination point. Here N ranges from 1 to the number of nodes around the pole. --ignore_unmapped or -i - ignore unmapped destination points. If not specified the default is to stop with an error if an unmapped point is found. --ignore_degenerate - ignore degenerate cells in the input grids. If not specified the default is to stop with an error if an degenerate cell is found. -r - an optional argument specifying that the source and destination grids are regional grids. If the argument is not given, the grids are assumed to be global. --src_regional - an optional argument specifying that the source is a regional grid and the destination is a global grid. --dst_regional - an optional argument specifying that the destination is a regional grid and the source is a global grid. --no_log - Turn off the ESMF error log. --help - Print the usage message and exit. --version - Print ESMF version and license information and exit. -V - Print ESMF version number and exit.
The example below regrids the node variable zeta defined in the sample UGRID file(13.1) to the destination grid defined in the sample GRIDSPEC file(13.1) using bilinear regridding method and write the interpolated data into a variable named zeta.
mpirun -np 4 ESMF_Regrid -s simple_ugrid.nc -d simple_gridspec.nc \ --src_var zeta --dst_var zeta
In this case, the destination variable does not exist in simple_ugrid.nc and the time dimension is not defined in the destination file. The resulting output file has a new time dimension and a new variable zeta. The attributes from the source variable zeta are copied to the destination variable except for mesh and location. A new attribute coordinates is created for the destination variable to specify the names of the coordinate variables. The header of the output file looks like:
netcdf simple_gridspec { dimensions: lat = 192 ; lon = 288 ; time = 2 ; variables: float PSL(lat, lon) ; PSL:time = 50. ; PSL:units = "Pa" ; PSL:long_name = "Sea level pressure" ; PSL:cell_method = "time: mean" ; PSL:coordinates = "lon lat" ; double lat(lat) ; lat:long_name = "latitude" ; lat:units = "degrees_north" ; double lon(lon) ; lon:long_name = "longitude" ; lon:units = "degrees_east" ; float zeta(time, lat, lon) ; zeta:standard_name = "sea_surface_height_above_geoid" ; zeta:_FillValue = -999. ; zeta:coordinates = "lon lat" ; }
The next example shows the command to do the same thing as the previous example but for a different variable ua. Since ua is defined on the face, we can only do a conservative regridding.
mpirun -np 4 ESMF_Regrid -s simple_ugrid.nc -d simple_gridspec.nc \ --src_var ua --dst_var ua -m conserve
The ESMF_Scrip2Unstruct application is a parallel program that converts a SCRIP format grid file 12.4.1 into an unstructured grid file in the ESMF Unstructured file format 12.4.2 or in the UGRID file format 12.4.4. This application program can be used together with ESMF_RegridWeightGen 12 application for the unstructured SCRIP format grid files. An unstructured SCRIP grid file will be converted into the ESMF unstrutured file format internally in ESMF_RegridWeightGen. The conversion subroutine used in ESMF_RegridWeightGen is sequential and could be slow if the grid file is very big. It will be more efficient to run the ESMF_Scrip2Unstruct first and then regrid the output ESMF or UGRID file using ESMF_RegridWeightGen. Note that a logically rectangular grid file in the SCRIP format (i.e. the dimension grid_rank is equal to 2) can also be converted into an unstructured grid file with this application.
The application usage is as follows:
ESMF_Scrip2Unstruct inputfile outputfile dualflag [fileformat] where inputfile - a SCRIP format grid file outputfile - the output file name dualflag - 0 for straight conversion and 1 for dual mesh. A dual mesh is a mesh constructed by putting the corner coordinates in the center of the elements and using the center coordinates to form the mesh corner vertices. fileformat - an optional argument for the output file format. It could be either ESMF or UGRID. If not specified, the output file is in the ESMF format.