API¶

Classes¶

ESMPy uses a Field object to represent data variables built on an underlying spatial discretization, which is represented by a Grid, Mesh or LocStream. Regridding between Fields is accomplished with the Regrid class. All of these classes are explained in more detail in the sections provided by the links in the following table.

Class	Description
`Manager`	A manager class to initialize and finalize ESMF
`Field`	A data field built on a `Grid`, `Mesh`, or `LocStream`
`Grid`	A class to represent a logically rectangular grid
`Mesh`	A class to represent an unstructured grid
`LocStream`	A class to represent observational data as a collection of disconnected points
`Regrid`	The regridding utility
`RegridFromFile`	The from file regridding utility

Named Constants¶

ESMPy follows the ESMF convention of using “named constants” to represent the available options for parameters that expect a variety of specific inputs. The following table lists the available named constants and provides links to pages that further explain the available values.

Named constants	Description
`CoordSys`	Specify the coordinate system of a `Grid`
`ExtrapMethod`	Specify the extrapolation method
`FileFormat`	Specify the format of a data file
`FileMode`	Specify the mode of a data file
`GridItem`	Specify a mask or area item on a `Grid`
`LineType`	Specify the type of line that connects two points on a sphere
`LogKind`	Specify how much logging should be done
`MeshElemType`	Specify the type of the `Mesh` elements
`MeshLoc`	Specify a nodal or elemental `Mesh`
`NormType`	Specify the type of normalization to use for conservative regridding weights
`PoleKind`	Specify the type of connection that appears at the poles of the `Grid`
`PoleMethod`	Specify which type of artificial pole to construct on the source `Grid` for regridding
`Region`	Specify various regions in the data layout of
`RegridMethod`	Specify which interpolation method to use regridding
`StaggerLoc`	Specify the position for data in a `Grid` cell
`TypeKind`	Specify the type and kind of data
`UnmappedAction`	Specify which action to take with respect to unmapped destination points

Manager¶

The Manager is used by ESMPy to simplify a number of low-level calls used by the underlying ESMF framework to allocate resources, enable logging, and control garbage collection.

Resource Allocation¶

The ESMF Virtual Machine (VM) is created at the beginning of each ESMPy execution, and contains information about the topology and characteristics of the underlying computer. The VM allocates computational resources in the form of Persistent Execution Threads, or PETs. These are equivalent to operating system threads with a lifetime of at least that of the ESMPy execution. In the simplest, and most common case, a PET is equivalent to an MPI process. The number of PETs and the current PET can be queried from the Manager:

mg = esmpy.Manager()
pet_count = mg.pet_count()
local_pet = mg.local_pet()

Refer to the VM Class of the ESMF Reference Manual for more information.

Logging¶

The Manager is also used to enable logging:

mg = esmpy.Manager(debug=True)
local_pet = mg.local_pet

The output will be logged in files named PET<local_pet>.ESMF_LogFile.

Memory management¶

The underlying ESMF framework needs to be initialized and finalized once and only once per execution. This is handled internally by the Manager and does not require any explicit user intervention. However, the ESMF garbage collection feature is not triggered until the finalization routine is invoked, which may not happen until the Manager goes out of scope at the end of the program execution.

If memory deallocation of ESMPy objects is required prior to the end of the program, the class level destroy routines should be invoked:

mg = esmpy.Manager()

mg.destroy()

This is commonly required when reusing a Regrid object to interpolate data between many Field pairs.

MOAB Mesh backend¶

The Manager can be used to enable the MOAB mesh backend to the Mesh.

mg.set_moab(moab_on=True)

The MOAB mesh is an alternative to the native ESMF mesh, and does not yet have full support.

Spatial Discretization Objects¶

There are three different objects used for spatial coordinate representation: Grid, Mesh, and LocStream. Grids are used to represent logically rectangular grids, Meshes are used for unstructured collections of polygons, and LocStreams are used for unstructured collections of individual points. These objects are nearly identical counterparts to the objects of the same name in ESMF, with some simplifications for ease of use in the Python environment.

Grid¶

The Grid is used to represent the geometry and discretization of logically rectangular physical grids. The Grid can also hold information that can used in calculations involving the Grid, like a mask or the cell areas. Refer to the Grid Class of the ESMF Reference Manual for more information.

Staggering¶

Staggering is a finite difference technique in which the values of different physical quantities are placed at different locations within a grid cell.

The ESMF Grid class supports a variety of stagger locations, including cell centers, corners, and edge centers. The default stagger location in ESMF is the cell center, and cell counts in Grid are based on this assumption. Combinations of the 2D ESMF stagger locations are sufficient to specify any of the Arakawa staggers. ESMF also supports staggering in 3D and higher dimensions. There are shortcuts for standard staggers, and interfaces through which users can create custom staggers.

As a default the ESMF Grid class provides symmetric staggering, so that cell centers are enclosed by cell perimeter (e.g. corner) stagger locations. This means the coordinate arrays for stagger locations other than the center will have an additional element of padding in order to enclose the cell center locations. However, to achieve other types of staggering, the user may alter or eliminate this padding by using the appropriate options when adding coordinates to a Grid.

Grid staggers are indicated using StaggerLoc.

grid = esmpy.Grid(np.array([3,4]), staggerloc=esmpy.StaggerLoc.CENTER)

Spherical Coordinates¶

In the case that the Grid is on a sphere (coord_sys = esmpy.api.constants.CoordSys.SPH_DEG or esmpy.api.constants.CoordSys.SPH_RAD) then the coordinates given in the Grid are interpreted as latitude and longitude values. The coordinates can either be in degrees or radians as indicated by the coord_sys flag set during Grid creation. As is true with many global models, this application currently assumes the latitude and longitude refer to positions on a perfect sphere.

The Grid coordinate system is represented using CoordSys.

grid = esmpy.Grid(np.array([3,4]), staggerloc=esmpy.StaggerLoc.CENTER,
                    coord_sys=esmpy.CoordSys.SPH_DEG)

Periodicity¶

A periodic connection can be specified when building Grids in spherical coordinates. The num_peri_dims parameter indicates the total number of periodic dimensions and periodic_dim is used to identify which dimensions should be considered periodic. There must always be at least one non-periodic dimension. For example, to create a global latitude-longitude Grid there would be one periodic dimension, dimension 0 (longitude).

grid = esmpy.Grid(np.array([3,4]), staggerloc=esmpy.StaggerLoc.CENTER,
                    coord_sys=esmpy.CoordSys.SPH_DEG,
                    num_peri_dims=1, periodic_dim=0)

Pole Generation¶

The Grid can generate an artificial pole by using the pole_dim parameter. This can be helpful for regridding operations to smooth out the interpolated values in the polar region. For the example of creating a global latitude-longitude Grid, the pole dimension would be 1 (latitude).

grid = esmpy.Grid(np.array([3,4]), staggerloc=esmpy.StaggerLoc.CENTER,
                    coord_sys=esmpy.CoordSys.SPH_DEG,
                    num_peri_dims=1, periodic_dim=0, pole_dim=1)

Masking¶

Masking is the process used to mark parts of a Grid to be ignored during an operation. Marking Grid cells as masked can affect the Field values that are represented by those cells. Masking is specified by assigning an integer value to a Grid cell. This allows many different masks to be defined on the same Grid, any combination of which may be also activated on the Field by specifying the corresponding integer values. The activation of Field masks with respect to the underlying Grid mask is handled by Regrid, and a more general discussion of masking is covered in the masking section.

In [1]: import numpy as np
   ...: import esmpy
   ...: grid = esmpy.Grid(np.array([3,4]), staggerloc=esmpy.StaggerLoc.CENTER,
   ...:                                coord_sys=esmpy.CoordSys.SPH_DEG,
   ...:                                num_peri_dims=1, periodic_dim=0, pole_dim=1)
   ...:
   ...: mask = grid.add_item(esmpy.GridItem.MASK, staggerloc=esmpy.StaggerLoc.CENTER)
   ...: mask
   ...:
Out[1]:
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=int32)

Cell Areas¶

Grid cell areas can be calculated by ESMPy. Space must first be allocated for this calculation by adding an AREA item to the Grid. Then a Field must be created, and the get_area() function called.

Note

The Grid area calculation assumes the Grid is a unit sphere.

Grid cell areas may also be set to user-defined values after the AREA item has been allocated and retrieved using get_item().

In [1]: grid = esmpy.Grid(np.array([3,4]), staggerloc=[esmpy.StaggerLoc.CENTER, esmpy.StaggerLoc.CORNER],
   ...:                  coord_sys=esmpy.CoordSys.SPH_DEG,
   ...:                  num_peri_dims=1, periodic_dim=0, pole_dim=1)
   ...:
   ...:
   ...: gridLon = grid.get_coords(0)
   ...: gridLat = grid.get_coords(1)
   ...: gridLonCorner = grid.get_coords(0, staggerloc=esmpy.StaggerLoc.CORNER)
   ...: gridLatCorner = grid.get_coords(1, staggerloc=esmpy.StaggerLoc.CORNER)
   ...:
   ...: lon = np.linspace(-120,120,3)
   ...: lat = np.linspace(-67.5, 67.5,4)
   ...: lon_corner = np.arange(-180,180,120)
   ...: lat_corner = np.linspace(-90, 90, 5)
   ...:
   ...: lonm, latm = np.meshgrid(lon, lat, indexing='ij')
   ...: lonm_corner, latm_corner = np.meshgrid(lon_corner, lat_corner, indexing='ij')
   ...:
   ...: gridLon[:] = lonm
   ...: gridLat[:] = latm
   ...: gridLonCorner[:] = lonm_corner
   ...: gridLatCorner[:] = latm_corner
   ...:
   ...: field = esmpy.Field(grid)
   ...: field.get_area()
   ...: field.data
   ...:
Out[1]:
array([[ 0.32224085,  1.02707409,  1.02707409,  0.32224085],
       [ 0.32224085,  1.02707409,  1.02707409,  0.32224085],
       [ 0.32224085,  1.02707409,  1.02707409,  0.32224085]])

Mesh¶

A Mesh is an object for representing unstructured grids. Refer to the Mesh Class of the ESMF Reference Manual for more information.

A Mesh is constructed of nodes and elements. A node, also known as a vertex or corner, is a part of a Mesh which represents a single point. An element, also known as a cell, is a part of a Mesh which represents a small region of space. Elements are described in terms of a connected set of nodes which represent locations along their boundaries.

Field data may be located on either the nodes or elements of a Mesh. Fields created on a Mesh can also be used as either the source or destination or both of a regridding operation.

The dimension of a Mesh in ESMF is specified with two parameters: the parametric dimension and the spatial dimension.

The parametric dimension of a Mesh is the dimension of the topology of the Mesh. This can be thought of as the dimension of the elements which make up the Mesh. For example, a Mesh composed of triangles would have a parametric dimension of 2, and a Mesh composed of tetrahedra would have a parametric dimension of 3.

The spatial dimension of a Mesh is the dimension of the space in which the Mesh is embedded. In other words, it is the number of coordinate dimensions needed to describe the location of the nodes making up the Mesh.

For example, a Mesh constructed of squares on a plane would have a parametric dimension of 2 and a spatial dimension of 2. If that same Mesh were used to represent the 2D surface of a sphere, then the Mesh would still have a parametric dimension of 2, but now its spatial dimension would be 3.

Only Meshes whose number of coordinate dimensions (spatial dimension) is 2 or 3 are supported. The dimension of the elements in a Mesh (parametric dimension) must be less than or equal to the spatial dimension, but also must be either 2 or 3. This means that a Mesh may be either 2D elements in 2D space, 3D elements in 3D space, or a manifold constructed of 2D elements embedded in 3D space.

For a parametric dimension of 2, the native supported element types are triangles and quadrilaterals. In addition to these, ESMF supports 2D polygons with any number of sides. Internally these are represented as sets of triangles, but to the user should behave like any other element. For a parametric dimension of 3, the supported element types are tetrahedrons and hexahedrons. The Mesh supports any combination of element types within a particular dimension, but types from different dimensions may not be mixed. For example, a Mesh cannot be constructed of both quadrilaterals and tetrahedra.

Mesh Creation¶

To create a Mesh we need to set some properties of the Mesh as a whole, some properties of each node in the Mesh and then some properties of each element which connects the nodes.

For the Mesh as a whole we set its parametric dimension and spatial dimension. A Mesh's parametric dimension can be thought of as the dimension of the elements which make up the Mesh. A Mesh's spatial dimension, on the other hand, is the number of coordinate dimensions needed to describe the location of the nodes making up the Mesh.

The structure of the per node and element information used to create a Mesh is influenced by the Mesh distribution strategy. The Mesh class is distributed by elements. This means that a node must be present on any PET that contains an element associated with that node, but not on any other PET (a node can’t be on a PET without an element “home”). Since a node may be used by two or more elements located on different PETs, a node may be duplicated on multiple PETs. When a node is duplicated in this manner, one and only one of the PETs that contain the node must “own” the node. The user sets this ownership when they define the nodes during Mesh creation. When a Field is created on a Mesh (i.e. on the Mesh nodes), on each PET the Field is only created on the nodes which are owned by that PET. This means that the size of the Field memory on the PET can be smaller than the number of nodes used to create the Mesh on that PET.

Three properties need to be defined for each Mesh node: the global id of the node (node_ids), node coordinates (node_coords), and which PET owns the node (node_owners). The node id is a unique (across all PETs) integer attached to the particular node. It is used to indicate which nodes are the same when connecting together pieces of the Mesh on different PETs. The node coordinates indicate the location of a node in space and are used in the Regrid functionality when interpolating. The node owner indicates which PET is in charge of the node. This is used when creating a Field on the Mesh to indicate which PET should contain a Field location for the data.

Three properties need to be defined for each Mesh element: the global id of the element (element_ids), the topology type of the element (element_types), and which nodes are connected together to form the element (element_conn). The element id is a unique (across all PETs) integer attached to the particular element. The element type describes the topology of the element (e.g. a triangle vs. a quadrilateral). The range of choices for the topology of the elements in a Mesh are restricted by the Mesh's parametric dimension (e.g. a Mesh can’t contain a 2D element like a triangle, when its parametric dimension is 3D), but it can contain any combination of elements appropriate to its dimension. In particular, in 2D ESMF supports two native element types triangle and quadrilateral, but also provides support for polygons with any number of sides. These polygons are represented internally as sets of triangles, but to the user should behave like other elements. To specify a polygon with more than four sides, the element type should be set to the number of corners of the polygon (e.g. element type=6 for a hexagon). The element connectivity indicates which nodes are to be connected together to form the element. The number of nodes connected together for each element is implied by the elements topology type (element_types). It is IMPORTANT to note, that the entries in this list are NOT the global ids of the nodes, but are indices into the PET local lists of node info used in the Mesh creation. In other words, the element connectivity isn’t specified in terms of the global list of nodes, but instead is specified in terms of the locally described node info. One other important point about connectivities is that the order of the nodes in the connectivity list of an element is important. In general, when specifying an element with parametric dimension 2, the nodes should be given in counterclockwise order around the element.

The three step Mesh creation process starts with a call to the Mesh constructor. It is then followed by the add_nodes() call to specify nodes, and then the add_elements() call to specify elements.

#  2.5        8        10 --------11
#          /     \   /            |
#  2.1   7         9              12
#        |         |      5       /
#        |    4    |            /
#        |         |          /
#  1.0   4 ------- 5 ------- 6
#        |         |  \   3  |
#        |    1    |    \    |
#        |         |  2   \  |
# -0.1   1 ------- 2 ------- 3
#
#      -0.1       1.0       2.1   2.5
#
#          Node Ids at corners
#          Element Ids in centers

# Two parametric dimensions, and two spatial dimensions
mesh = esmpy.Mesh(parametric_dim=2, spatial_dim=2, coord_sys=coord_sys)

num_node = 12
num_elem = 5
nodeId = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
nodeCoord = np.array([-0.1,-0.1,  #node id 1
                      1.0,-0.1,  #node id 2
                      2.1,-0.1,  #node id 3
                      0.1, 1.0,  #node id 4
                      1.0, 1.0,  #node id 5
                      2.1, 1.0,  #node id 6
                      0.1, 2.1,  #node id 7
                      0.5, 2.5,  #node id 8
                      1.0, 2.1,  #node id 9
                      1.5, 2.5,  #node id 10
                      2.5, 2.5,  #node id 11
                      2.5, 2.1]) #node id 12


nodeOwner = np.zeros(num_node)

elemId = np.array([1,2,3,4,5])
elemType=np.array([esmpy.MeshElemType.QUAD,
                   esmpy.MeshElemType.TRI,
                   esmpy.MeshElemType.TRI, 5, 6])

elemConn=np.array([0,1,4,3,         # elem id 1
                   1,2,4,           # elem id 2
                   2,5,4,           # elem id 3
                   3,4,8,7,6,       # elem id 4
                   4,5,11,10,9,8])  # elem id 5

mesh.add_nodes(num_node,nodeId,nodeCoord,nodeOwner)

mesh.add_elements(num_elem,elemId,elemType,elemConn)

Masking¶

There are two types of masking available in Mesh: node masking and element masking. These both work in a similar manner, but vary slightly in the details of setting the mask information during Mesh creation.

For node masking, the mask information is set using the node_mask parameter. When a Regrid object is created the mask values arguments src_mask_values and dst_mask_values can then be used to indicate which particular values set in the node_mask array indicate that the node should be masked. For example, if dst_mask_values has been set to 1, then any node in the destination Mesh whose corresponding node_mask value is 1 will be masked out (a node with any other value than 1 will not be masked).

For element masking, the mask information is set using the element_mask parameter when adding elements to the Mesh. In a similar manner to node masking, the mask values parameters to Regrid, src_mask_values and dst_mask_values can then be used to indicate which particular values set in the element_mask array indicate that the element should be masked. For example, if dst_mask_values has been set to 1, then any element in the destination Mesh whose corresponding element_mask value is 1 will be masked out (an element with any other value than 1 will not be masked).

Areas¶

Mesh cell areas can be specified using the element_areas parameter to add_elements().

If cell areas are not specified by the user they can be calculated by ESMPy using get_area().

LocStream¶

A LocStream can be used to represent the locations of a set of data points. For example, in the data assimilation world, LocStreams can be used to represent a set of observations. The values of the data points are stored within a Field created using the LocStream. Refer to the LocStream Class of the ESMF Reference Manual for more information.

The locations are generally described using Cartesian (x, y, z), or (lat, lon, radius) coordinates. The coordinates are stored using constructs called keys. A key is essentially a list of point descriptors, one for each data point. They may hold other information besides the coordinates - a mask, for example. They may also hold a second set of coordinates. Keys are referenced by name. Each key must contain the same number of elements as there are data points in the LocStream. While there is no assumption in the ordering of the points, the order chosen must be maintained in each of the keys.

A LocStream can be very large. Data assimilation systems might use LocStreams with up to 10^8 observations, so efficiency is critical. LocStreams can be created from file.

A LocStream is similar to a Mesh in that both are collections of irregularly positioned points. However, the two structures differ because a Mesh also has connectivity: each data point represents either a center or corner of a cell. There is no requirement that the points in a LocStream have connectivity, in fact there is no requirement that any two points have any particular spatial relationship at all.

locstream = esmpy.LocStream(16, coord_sys=coord_sys)

deg_rad = pi
if coord_sys == esmpy.CoordSys.SPH_DEG:
    deg_rad = 180

locstream["ESMF:Lon"] = [0.0, 0.5*deg_rad, 1.5*deg_rad, 2*deg_rad, 0.0, 0.5*deg_rad, 1.5*deg_rad, 2*deg_rad, 0.0, 0.5*deg_rad, 1.5*deg_rad, 2*deg_rad, 0.0, 0.5*deg_rad, 1.5*deg_rad, 2*deg_rad]
locstream["ESMF:Lat"] = [deg_rad/-2.0, deg_rad/-2.0, deg_rad/-2.0, deg_rad/-2.0, -0.25*deg_rad, -0.25*deg_rad, -0.25*deg_rad, -0.25*deg_rad, 0.25*deg_rad, 0.25*deg_rad, 0.25*deg_rad, 0.25*deg_rad, deg_rad/2.0, deg_rad/2.0, deg_rad/2.0, deg_rad/2.0]
if domask:
    locstream["ESMF:Mask"] = np.array([1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=np.int32)

Create a Grid or Mesh from File¶

File Formats¶

ESMPy can create Grid or Mesh objects from NetCDF files in a variety of formats. A Mesh can be created from files in SCRIP, ESMFMESH, and UGRID formats. Grid files can be in SCRIP and GRIDSPEC format.

SCRIP¶

This file format is used by the SCRIP [1], package, grid files that work with that package should also work here. SCRIP format files are capable of storing either 2D logically rectangular grids or 2D unstructured grids. More information can be found in the ESMF Reference Manual.

ESMFMESH¶

ESMF has a custom unstructured grid file format for describing Meshes. This format is more compatible than the SCRIP format with the methods used to create a Mesh object, so less conversion needs to be done to create a Mesh. The ESMFMESH format is thus more efficient than SCRIP when used with ESMPy. More information can be found in the ESMF Reference Manual.

GRIDSPEC¶

GRIDSPEC is an extension to the Climate and Forecast (CF) metadata conventions for the representation of gridded data for Earth System Models. ESMPy supports NetCDF files that follow the CF GRIDSPEC convention to support logically rectangular lat/lon grids. More information can be found in the ESMF Reference Manual.

UGRID¶

UGRID is an extension to the CF metadata conventions for the unstructured grid data model. ESMPy support NetCDF files that follow the CF UGRID convention for unstructured grids. More information can be found in the ESMF Reference Manual.

Meshes from File¶

When creating a Mesh from a SCRIP format file, there are a number of options to control the output Mesh. The data is located at the center of the grid cell in a SCRIP grid. Therefore, when the Mesh will be part of a conservative regridding operation, the convert_to_dual flag must be set to True to properly generate coordinates at the the cell corners.

A Mesh may also be created with boolean flags to specify whether or not to add an area property to the Mesh add_user_area, or to add a mask add_mask held by the NetCDF variable indicated in the optional argument, varname. These argument are only valid for UGRID formatted files. The mask generated for a Mesh created from file will have 0 for the masked values and 1 for the unmasked values.

Grids from File¶

A number of optional boolean arguments are also supported to create a structured Grid from a file. These include is_sphere to indicate whether the grid is spherical or regional, add_corner_stagger to add the corner stagger information to the Grid for conservative regridding, and add_user_area to specify whether to read in the cell area from the NetCDF file or to calculate them.

For GRIDSPEC formated files there is the add_mask optional argument to add a mask held by the NetCDF variable indicated in optional argument, varname, and the coord_names argument to specify the longitude and latitude variable names in a GRIDSPEC file containing multiple sets of coordinates.

For SCRIP formated files the integer array grid_imask is used to mask out grid cells which should not participate in the regridding.

The mask generated for a Grid created from file (any format) will have 0 for the masked values and 1 for the unmasked values.

Regridding¶

The following table describe the regridding methods and options that are available in ESMPy, the flag that is required to use it and a short description. More information can be found on these options in the ESMF Reference Manual.

Class	Description
`BILINEAR`	Linear regridding in two dimensions
`PATCH`	Higher-order least squares method
`NEAREST_STOD`	Nearest source point used for each destination
`NEAREST_DTOS`	Nearest destination point used for each source
`CONSERVE`	First-order conservative
`CONSERVE_2ND`	Second-order conservative
`NormType`	Normalization options for integral conservation
`LineType`	Line types for spherical and Cartesian space
`UnmappedAction`	Unmapped destination point handling options
`CoordSys`	Spherical grids and pole handling

Great Circle Cells¶

For Grids and Meshes on a sphere some combinations of interpolation options (e.g. first and second-order conservative methods) use cells whose edges are great circles. This section describes some behavior that the user may not expect from these cells and some potential solutions. A great circle edge isn’t necessarily the same as a straight line in latitude longitude space. For small edges, this difference will be small, but for long edges it could be significant. This means if the user expects cell edges as straight lines in latitude longitude space, they should avoid using one large cell with long edges to compute an average over a region (e.g. over an ocean basin).

Also, the user should also avoid using cells that contain one edge that runs half way or more around the earth, because the regrid weight calculation assumes the edge follows the shorter great circle path. There isn’t a unique great circle edge defined between points on the exact opposite side of the earth from one another (antipodal points). However, the user can work around both of these problem by breaking the long edge into two smaller edges by inserting an extra node, or by breaking the large target grid cells into two or more smaller grid cells. This allows the application to resolve the ambiguity in edge direction.

Masking¶

Masking is the process whereby parts of a Grid, Mesh or LocStream can be marked to be ignored during an operation, such as when they are used in regridding. Masking can be used on a Field created from a regridding source to indicate that certain portions should not be used to generate regridded data. This is useful, for example, if a portion of the source contains unusable values. Masking can also be used on a Field created from a regridding destination to indicate that a certain portion should not receive regridded data. This is useful, for example, when part of the destination isn’t being used (e.g. the land portion of an ocean grid).

The user may mask out points in the source Field or destination Field or both. To do masking the user sets mask information in the Grid, Mesh, or LocStream upon which the Fields passed into the Regrid call are built. The src_mask_values and dst_mask_values arguments to that call can then be used to specify which values in that mask information indicate that a location should be masked out. For example, if dst_mask_values is set to [1,2], then any location that has a value of 1 or 2 in the mask information of the Grid, Mesh or LocStream upon which the destination Field is built will be masked out.

Masking behavior differs slightly between regridding methods. For non-conservative regridding methods (e.g. bilinear or high-order patch), masking is done on points. For these methods, masking a destination point means that the point will not participate in regridding. For these methods, masking a source point means that the entire source cell using that point is masked out. In other words, if any corner point making up a source cell is masked then the cell is masked. For conservative regridding methods masking is done on cells. Masking a destination cell means that the cell won’t participate in regridding. Similarly, masking a source cell means that the cell won’t participate in regridding. For any type of interpolation method (conservative or non-conservative) the masking is set on the location upon which the Fields passed into the regridding call are built. For example, if Fields built on StaggerLoc.CENTER are passed into Regrid then the masking should also be set on StaggerLoc.CENTER.

The mask generated for a Grid, Mesh or LocStream created from file will have 0 for the masked values and 1 for the unmasked values.

Note

The Region.SELECT flag to the

zero_region parameter of Regrid can be used to maintain Fields values on locations that do not participate in the regridding operation. This is useful when setting an uninitialized value to help identify masked locations within the Fields data.

Numpy Slicing and Indexing¶

Numpy arrays are used to represent Grid, Mesh and LocStream coordinates and Field data, among other things. Standard numpy conventions for array indexing and slicing can be expected. There are some exceptions when it comes to fancy indexing, index arrays, and multi-dimensional slicing. Significant effort has been put into raising exceptions where inappropriate indexing or slicing operations are attempted.

It is very important to remember that all indexing and slicing operations apply ONLY to the ESMPy level objects, and these operations do not propagate down to the lower-level Fortran- and C-based representations of the ESMF objects. One example of where this could come up is when passing a Field slice into regridding. The entire original Field will still be run through the ESMF regridding engine, and only the appropriate portion of the Field slice will be updated with the regridded values.

Dimension Ordering¶

Warning

The underlying ESMF library is built with a mix of Fortran and C/C++ and follows Fortran conventions with respect to array indexing and dimension ordering. Some effort has been made to make ESMPy feel more natural to the Python user where possible. This means that ESMPy uses 0-based indexing, which is translated to the 1-based indexing used by the ESMPy backend. The only exception to this is NetCDF files that are generated by ESMPy will continue to use the 1-based indexing from the underlying ESMF code. Likewise, the dimension ordering still follows Fortran conventions. This means that longitude comes before latitude, which also comes before temporal dimensions, when in use.

In [1]: import numpy as np
   ...: import esmpy
   ...:
   ...: grid = esmpy.Grid(np.array([3,4]), staggerloc=esmpy.StaggerLoc.CENTER)
   ...:
   ...: gridLon = grid.get_coords(0)
   ...: gridLat = grid.get_coords(1)
   ...:
   ...: lon = np.linspace(-120,120,3)
   ...: lat = np.linspace(-67.5, 67.5,4)
   ...:
   ...: lonm, latm = np.meshgrid(lon, lat, indexing='ij')
   ...:
   ...: gridLon[:] = lonm
   ...: gridLat[:] = latm
   ...:

In [2]: grid.coords[esmpy.StaggerLoc.CENTER][0].shape
Out[2]: (3, 4)

In [3]: lon.shape
Out[3]: (3,)

In [4]: lat.shape
Out[4]: (4,)

In [5]: grid.coords[esmpy.StaggerLoc.CENTER][0]
Out[5]:
array([[-120., -120., -120., -120.],
       [   0.,    0.,    0.,    0.],
       [ 120.,  120.,  120.,  120.]])

In [6]: grid.coords[esmpy.StaggerLoc.CENTER][1]
Out[6]:
array([[-67.5, -22.5,  22.5,  67.5],
       [-67.5, -22.5,  22.5,  67.5],
       [-67.5, -22.5,  22.5,  67.5]])

In [7]: field = esmpy.Field(grid, ndbounds=[10]) # create a Field with a time dimension

In [8]: field.data.shape
Out[8]: (3, 4, 10)

Decomposition Elements (DEs)¶

In ESMF, data classes are distributed over DEs, or Decomposition Elements. DEs are virtual units, not necessarily having a 1-to-1 correspondence to the Persistent Execution Threads (PETs) of a VM or the physical Processing Elements (PEs) in the underlying physical machine. In ESMPy data classes, there is typically exactly one DE per PET. The main exception to this is cubed-sphere grids that are represented as multi-tile grids; for these, there can be multiple DEs on a single PET (with the default decomposition, this occurs for < 6 PETs) or zero DEs on a PET (with the default decomposition, this occurs for > 6 PETs). (In the case of a non-cubed-sphere grid that is simply too small to decompose across the PETs, there is still 1 DE on each PET but some of these DEs will have data with size 0.)

For ESMPy classes that support multiple DEs per PET (Grid and Field), there are two versions of data accessor properties for any data that is decomposed across DEs. Examples are a Field’s data vs. all_data and a Grid’s area vs. all_areas. Versions of these properties without the all_ prefix can only be used in the common case where there is exactly one DE per PET; these properties return the data on the single DE for the current PET. Versions of these properties with the all_ prefix return lists indexed by DE. For example, to work with a Field’s data in the situation where there may be multiple DEs per PET, user code should loop over DEs like:

for de in range(myfield.local_de_count):
    this_data = myfield.all_data[de]
    # operate on this_data

Some methods, such as a Grid’s get_item(), default to retrieving values for the first DE (localde=0). Thus, for the common case with 1 DE per PET, these can be called without a localde argument. But in cases where there may be multiple DEs per PET, these should be called in a loop like in the above example. Furthermore, these methods with an optional localde argument should not be called in cases where there are zero DEs on a given PET; this can be handled via the same looping structure, or if it is known that the only possible cases are zero or one DE per PET, then via a conditional like if grid.local_de_count == 1.

Finally, note that slicing of a Field or Grid is currently not supported with multiple DEs per PET.

Parallel Execution¶

ESMPy is a thin wrapper on top of ESMF, which was designed for high performance and scalable computing. The ESMF virtual machine is used to manage the available resources of the execution environment in a layer that is transparent to the ESMPy user. This allows the full power of the high performance computing environment to be utilized by the ESMPy user with little use of specialized parallel programming techniques.

ESMPy objects will be distributed across the available computing resources with no additional parameters required. The Grid, Mesh, LocStream, and Field classes will all be transparently “parallelized” with no need for user calls to a message passing interface. Likewise, the Regrid class will compute and apply the interpolation weights using all available computing resources with no need for user intervention.

However, it is useful to remember that resulting Field values will only be accessible on certain PETs. The mpi4py package may be necessary for post processing tasks that require access to global Field values.

mpirun vs. MPI.Spawn¶

There are a few different options for using ESMPy in a parallel environment. Using mpirun to specify the desired number of computing cores is probably the easiest way to start a parallel ESMPy job. Another option is to call the MPI.Spawn() function from the mpi4py Python package from within a serial Python script or interpreter. It has been observed that MPI.Spawn() may not work properly when mpi4py is built with an underlying mpich library, openmpi has seen better success. A third option is to call mpirun using a system call from within a serial Python script or interpreter, however this method is not highly recommended.

The following two examples demonstrate how to execute an ESMPy script in parallel. Any of the scripts found in the examples directory of the ESMPy source code can be run in parallel using mpirun as well as in serial mode.

mpirun¶

mpirun -n 4 python hello_world.py

MPI.Spawn¶

import sys
from mpi4py import MPI

# Parent
if len(sys.argv) == 1:

    # Spawn workers
    comm = MPI.COMM_WORLD.Spawn(
        sys.executable,
        args=[sys.argv[0], 'worker'],
        maxprocs=4)

    # Shutdown
    comm.Disconnect()

# Worker
elif sys.argv[1] == 'worker':

    # Connect to parent
    try:
        comm = MPI.Comm.Get_parent()
        rank = comm.Get_rank()
    except:
        raise ValueError('Could not connect to parent - ' + usage)

    # worker code goes here, regridding etc..
    print("Hello World from PET #"+str(rank))

    # Shutdown
    comm.Disconnect()

# Catch
else:
    raise ValueError('Program should be started without arguments')

A more detailed example of using MPI.Spawn() can be found in the Tutorials section of the documentation.