The ESMF utilities are a set of tools for quickly assembling modeling applications.
The ESMF Info class enables models to be self-describing via metadata, which are instances of JSON-compatible key-value pairs.
The Time Management Library provides utilities for time and time interval representation and calculation, as well as higher-level utilities that control model time stepping via clocks and alarms.
The ESMF Config class provides configuration management based on NASA DAO's Inpak package, a collection of methods for accessing files containing input parameters stored in an ASCII format.
The ESMF LogErr class consists of a variety of methods for writing error, warning, and informational messages to log files. A default Log is created during ESMF initialization. Other Logs can be created later in the code by the user.
The DELayout class provides a layer of abstraction on top of the Virtual Machine (VM) layer. DELayout does this by introducing DEs (Decomposition Elements) as logical resource units. The DELayout object keeps track of the relationship between its DEs and the resources of the associated VM object. A DELayout can be shaped by the user at creation time to best match the computational problem or other design criteria.
The ESMF VM (Virtual Machine) class is a generic representation of hardware and system software resources. There is exactly one VM object per ESMF Component, providing the execution environment for the Component code. The VM class handles all resource management tasks for the Component class and provides a description of the underlying configuration of the compute resources used by a Component. In addition to resource description and management, the VM class offers the lowest level of ESMF communication methods.
The ESMF Fortran I/O utilities provide portable methods to access capabilities which are often implemented in different ways amongst different environments. Currently, two utility methods are implemented: one to find an unopened unit number, and one to flush an I/O buffer.
All ESMF base objects (i.e. Array, ArrayBundle, Field, FieldBundle, Grid, Mesh, DistGrid) contain a key-value attribute storage object called ESMF_Info. ESMF_Info objects may also be created independently of a base object. ESMF_Info supports setting and getting key-value pairs where the key is a string and the value is a scalar or a list of common data types. An ESMF_Info object may have a flat or nested data structure. The purpose of ESMF_Info is to support I/O-compatible metadata structures (e.g. netCDF), to provide internal record-keeping for model execution (NUOPC), and to provide a mechanism for custom user metadata attributes.
ESMF_Info is designed for interoperability. To achieve this goal, ESMF_Info adopted the JSON (JavaScript Object Notation) specification. Internally, ESMF_Info uses JSON for Modern C++ [1] to manage its storage map. There are numerous resources for JSON on the web [6]. The json.org site [6] introduces the format as follows:
JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language. JSON is built on two structures:
A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
These are universal data structures. Virtually all modern programming languages support them in one form or another. It makes sense that a data format that is interchangeable with programming languages also be based on these structures.
By adopting JSON compliance for ESMF_Info, ESMF made its core metadata capabilities explicitly interoperable with a widely used data structure. If data may be represented with JSON, then it is compatible with ESMF_Info.
There are some aspects of the ESMF_Info implementation related to JSON and JSON for Modern C++ that should be noted:
Below are examples for setting and getting an attribute using ESMF_Info and the legacy ESMF_Attribute. The ESMF_Info interfaces are not overloaded for ESMF object types but rather work off a handle retrieved via a get call.
With ESMF_Attribute:
  call ESMF_AttributeSet(array, "aKey", 15, rc=rc)
With ESMF_Info:
  call ESMF_InfoGetFromHost(array, info, rc=rc)
  call ESMF_InfoSet(info, "aKey", 15, rc=rc)
Notice that the legacy ESMF_Attribute API expects the usage of what was called an "Attribute Package". This essentially corresponds to a namespace similar to what ESMF_Info provides for keys via the JSON Pointer syntax (see 40.2). In the above ESMF_AttributeSet() call, without specification of convention and purpose arguments, the resulting JSON pointer of the key is "/ESMF/General/aKey". This is important to account for when mixing deprecated ESMF_Attribute calls with the ESMF_Info API.
With ESMF_Attribute:
  call ESMF_AttributeGet(array, "aKey", aKeyValue, rc=rc)
With ESMF_Info:
  call ESMF_InfoGetFromHost(array, info, rc=rc)
  call ESMF_InfoGet(info, "aKey", aKeyValue, rc=rc)
Notice again that the ESMF_Attribute API automatically prepends "/ESMF/General/" to the JSON pointer used for key in the absence of convention and purpose arguments.
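The namespace behavior described above can be pictured with a small stand-alone model. The following is a minimal Python sketch, not ESMF code: it treats the storage as nested dictionaries and a key as a slash-separated path. The function names are hypothetical, keys without a leading slash are assumed to name top-level entries, and JSON Pointer escape sequences are ignored. Only the "/ESMF/General/" prefix is taken from the text.

```python
# Illustrative model only: nested dictionaries stand in for the storage map.

LEGACY_PREFIX = "/ESMF/General/"  # default namespace named in the text


def pointer_tokens(key):
    """Split a key such as "/ESMF/General/aKey" into its path tokens."""
    # Assumption of this sketch: a key without a leading slash is top-level.
    path = key if key.startswith("/") else "/" + key
    return path.split("/")[1:]


def info_set(store, key, value):
    """Store value under key, creating nested objects as needed."""
    tokens = pointer_tokens(key)
    node = store
    for token in tokens[:-1]:
        node = node.setdefault(token, {})
    node[tokens[-1]] = value


def info_get(store, key):
    """Return the value stored under key; raises KeyError if it is absent."""
    node = store
    for token in pointer_tokens(key):
        node = node[token]
    return node


def legacy_attribute_set(store, name, value):
    """Hypothetical stand-in for a legacy set without convention/purpose."""
    info_set(store, LEGACY_PREFIX + name, value)


store = {}
legacy_attribute_set(store, "aKey", 15)
value = info_get(store, "/ESMF/General/aKey")  # found under the namespace
```

In this model a lookup of plain "aKey" fails after the legacy-style set, because the value lives under the "ESMF" and "General" levels. That is the situation to watch for when the two APIs are mixed.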
Some examples for valid "key" arguments:
The ESMF Time Manager utility includes software for time and date representation and calculations, model time advancement, and the identification of unique and periodic events. Since multi-component geophysical applications often require synchronization across the time management schemes of the individual components, the Time Manager's standard calendars and consistent time representation promote component interoperability.
Key Features
Drift-free timekeeping through an integer-based internal time representation. Both integers and reals can be specified at the interface.
The ability to represent time as a rational fraction, to support exact timekeeping in applications that involve grid refinement.
Support for many calendar kinds, including user-customized calendars.
Support for both concurrent and sequential modes of component execution.
Support for varying and negative time steps.
In the remainder of this section, we briefly summarize the functionality that the Time Manager classes provide. Detailed descriptions and usage examples precede the API listing for each class.
TimeIntervals and Time instants (simply called Times) are the computational building blocks of the Time Manager utility. TimeIntervals support operations such as add, subtract, compare size, reset value, copy value, and subdivide by a scalar. Times, which are moments in time associated with specific Calendars, can be incremented or decremented by TimeIntervals, compared to determine which of two Times is later, differenced to obtain the TimeInterval between two Times, copied, reset, and manipulated in other useful ways. Times support a host of different queries, both for values of individual Time components such as year, month, day, and second, and for derived values such as day of year, middle of current month and Julian day. It is also possible to retrieve the value of the hardware realtime clock in the form of a Time. See Sections 43.1 and 44.1, respectively, for use and examples of Times and TimeIntervals.
Since climate modeling, numerical weather prediction and other Earth and space applications have widely varying time scales and require different sorts of calendars, Times and TimeIntervals must support a wide range of time specifiers, spanning nanoseconds to years. The interfaces to these time classes are defined so that the user can specify a time using a combination of units selected from the list shown in Table 41.4.
Unit | Meaning |
<yy|yy_i8> | Year. |
mm | Month of the year. |
dd | Day of the month. |
<d|d_i8|d_r8> | Julian or Modified Julian day. |
<h|h_r8> | Hour. |
<m|m_r8> | Minute. |
<s|s_i8|s_r8> | Second. |
<ms|ms_r8> | Millisecond. |
<us|us_r8> | Microsecond. |
<ns|ns_r8> | Nanosecond. |
O | Time zone offset in integer number of hours and minutes. |
<sN|sN_i8> | Numerator for times of the form s + sN/sD, where s is seconds and s, sN, and sD are integers. This format provides a mechanism for supporting exact behavior. |
<sD|sD_i8> | Denominator for times of the form s + sN/sD, where s is seconds and s, sN, and sD are integers. |
The result of this strategy is that TimeIntervals and Times gain a consistent core representation of time as well as a set of basic methods.
The BaseTime class can be designed with a minimum number of elements to represent any required time. The design is based on the idea used in the real-time POSIX 1003.1b-1993 standard: represent time simply as a pair of integers, one for whole seconds and one for fractional nanoseconds. These can then be converted at the interface level to any desired format.
For ESMF, this idea can be modified and extended, in order to handle the requirements for a large time range (> 200,000 years) and to exactly represent any rational fraction, not just nanoseconds. To handle the large time range, a 64-bit or greater integer is used for whole seconds. Any rational fractional second is expressed using two additional integers: a numerator and a denominator. Both the whole seconds and fractional numerator are signed to handle negative time intervals and instants. For arithmetic consistency both must carry the same sign (both positive or both negative), except, of course, for zero values. The fractional seconds element (numerator) is bounded with respect to whole seconds. If the absolute value of the numerator becomes greater than or equal to the denominator, whole seconds are incremented or decremented accordingly and the numerator is reset to the remainder. Conversions are performed upon demand by interface methods within the TimeInterval and Time classes. This is done because different applications require different representations of time intervals and time instants. Floating point values as well as integers can be specified for the various time units in the interfaces (see Table 41.4). Floating point values are represented internally as integer-based rational fractions.
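The sign and bounding rules above can be made concrete with a short stand-alone sketch. The Python below is an illustration under stated assumptions, not the BaseTime implementation: a time is modeled as a tuple (whole seconds, numerator, denominator), the denominator is assumed positive, and the function names are hypothetical.

```python
def normalize(whole, num, den):
    """Return (whole, num, den) with |num| < den and matching signs."""
    if den <= 0:
        raise ValueError("denominator must be positive in this sketch")
    # Fold everything into one integer count of 1/den-second units.
    total = whole * den + num
    sign = -1 if total < 0 else 1
    # Split the magnitude back into whole seconds and a bounded remainder.
    whole_abs, num_abs = divmod(abs(total), den)
    return sign * whole_abs, sign * num_abs, den


def add(a, b):
    """Add two (whole, num, den) times exactly, using integer arithmetic."""
    (w1, n1, d1), (w2, n2, d2) = a, b
    return normalize(w1 + w2, n1 * d2 + n2 * d1, d1 * d2)


# One third of a second plus two thirds of a second is exactly one second.
one_third = (0, 1, 3)
two_thirds = (0, 2, 3)
total = add(one_third, two_thirds)
```

Because only integers are involved, repeatedly adding a fractional time step in this model accumulates no rounding error, which is the property the text calls drift-free timekeeping.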
The BaseTime class defines increment and decrement methods for basic TimeInterval calculations between Time instants. It is done here rather than in the Calendar class because it can be done with simple second-based arithmetic that is calendar independent.
Comparison methods can also be defined in the BaseTime class. These perform equality/inequality, less than, and greater than comparisons between any two TimeIntervals or Times. These methods capture the common comparison logic between TimeIntervals and Times and hence are defined here for sharing.
The following is a simplified UML diagram showing the structure of the Time Manager utility. See Appendix A, A Brief Introduction to UML, for a translation table that lists the symbols in the diagram and their meaning.
The Calendar class represents the standard calendars used in geophysical modeling: Gregorian, Julian, Julian Day, Modified Julian Day, no-leap, 360-day, and no-calendar. It also supports a user-customized calendar. Brief descriptions are provided for each calendar below. For more information on standard calendars, see [20] and [17].
DESCRIPTION:
Supported calendar kinds.
The type of this flag is:
type(ESMF_CalKind_Flag)
The valid values are:
MJD = JD - 2400000.5
The half day is subtracted so that the day starts at midnight.
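As a quick check of the relation above, the following Python sketch converts between the two day counts. The function names are hypothetical; only the offset 2400000.5 comes from the text. The sample value is the well-known epoch J2000.0, which falls at noon on 1 January 2000 and has Julian Day 2451545.0.

```python
MJD_OFFSET = 2400000.5  # from the relation MJD = JD - 2400000.5


def jd_to_mjd(jd):
    """Convert a Julian Day number to a Modified Julian Day number."""
    return jd - MJD_OFFSET


def mjd_to_jd(mjd):
    """Convert a Modified Julian Day number back to a Julian Day number."""
    return mjd + MJD_OFFSET


# Julian Days start at noon, so noon has a whole-number JD. The same instant
# has a half-day fraction in MJD, whose days start at midnight.
j2000_mjd = jd_to_mjd(2451545.0)
```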
In most multi-component Earth system applications, the timekeeping in each component must refer to the same standard calendar in order for the components to properly synchronize. It therefore makes sense to create as few ESMF Calendars as possible, preferably one per application. A typical strategy would be to create a single Calendar at the start of an application, and use that Calendar in all subsequent calls that accept a Calendar, such as ESMF_TimeSet.
The following example shows how to set up an ESMF Calendar.
A Time represents a specific point in time. In order to accommodate the range of time scales in Earth system applications, Times in the ESMF can be specified in many different ways, from years to nanoseconds. The Time interface is designed so that you select one or more options from a list of time units in order to specify a Time. The options for specifying a Time are shown in Table 41.4.
There are Time methods defined for setting and getting a Time, incrementing and decrementing a Time by a TimeInterval, taking the difference between two Times, and comparing Times. Special quantities such as the middle of the month and the day of the year associated with a particular Time can be retrieved. There is a method for returning the Time value as a string in the ISO 8601 format YYYY-MM-DDThh:mm:ss [15].
A Time that is specified in hours, minutes, seconds, or subsecond intervals does not need to be associated with a standard calendar; a Time whose specification includes time units of a day and greater must be. The ESMF representation of a calendar, the Calendar class, is described in Section 42.1. The ESMF_TimeSet method is used to initialize a Time as well as associate it with a Calendar. If a Time method is invoked in which a Calendar is necessary and one has not been set, the ESMF method will return an error condition.
In the ESMF the TimeInterval class is used to represent time periods. This class is frequently used in combination with the Time class. The Clock class, for example, advances model time by incrementing a Time with a TimeInterval.
Times are most frequently used to represent start, stop, and current model times. The following examples show how to create, initialize, and manipulate Time.
For fractional seconds, a signed 64-bit integer will handle a resolution of +/- (2^63 - 1), or +/- 9,223,372,036,854,775,807 parts of a second.
There are TimeInterval methods defined for setting and getting a TimeInterval, for incrementing and decrementing a TimeInterval by another TimeInterval, and for multiplying and dividing TimeIntervals by integers, reals, fractions and other TimeIntervals. Methods are also defined to take the absolute value and negative absolute value of a TimeInterval, and for comparing the length of two TimeIntervals.
The class used to represent time instants in ESMF is Time, and this class is frequently used in operations along with TimeIntervals. For example, the difference between two Times is a TimeInterval.
When a TimeInterval is used in calculations that involve an absolute reference time, such as incrementing a Time with a TimeInterval, calendar dependencies may be introduced. The length of the time period that the TimeInterval represents will depend on the reference Time and the standard calendar that is associated with it. The calendar dependency becomes apparent when, for example, adding a TimeInterval of 1 day to the Time of February 28, 1996, at 4:00pm EST. In a 360 day calendar, the resulting date would be February 29, 1996, at 4:00pm EST. In a no-leap calendar, the result would be March 1, 1996, at 4:00pm EST.
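The February 28, 1996 example can be reproduced with a few lines of stand-alone Python. This is a sketch of the calendar rules only, not of the ESMF Calendar class. The calendar labels "360_day", "noleap", and "gregorian" are names chosen for this illustration, and the time of day is omitted because a one-day increment leaves it unchanged.

```python
NOLEAP_DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def days_in_month(year, month, calendar):
    """Return the length of a month under the given calendar label."""
    if calendar == "360_day":
        return 30  # every month has 30 days
    if calendar == "noleap":
        return NOLEAP_DAYS[month - 1]  # February always has 28 days
    if calendar == "gregorian":
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        return 29 if (month == 2 and leap) else NOLEAP_DAYS[month - 1]
    raise ValueError(f"unknown calendar label: {calendar!r}")


def add_one_day(year, month, day, calendar):
    """Advance a (year, month, day) date by one day under a calendar."""
    day += 1
    if day > days_in_month(year, month, calendar):
        day = 1
        month += 1
        if month > 12:
            month = 1
            year += 1
    return year, month, day


result_360 = add_one_day(1996, 2, 28, "360_day")    # February 29, 1996
result_noleap = add_one_day(1996, 2, 28, "noleap")  # March 1, 1996
```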
TimeIntervals are used by other parts of the ESMF timekeeping system, such as Clocks (Section 45.1) and Alarms (Section 46.1).
A typical use for a TimeInterval in a geophysical model is representation of the time step by which the model is advanced. Some models change the size of their time step as the model run progresses; this could be done by incrementing or decrementing the original time step by another TimeInterval, or by dividing or multiplying the time step by an integer value. An example of advancing model time using a TimeInterval representation of a time step is shown in Section 45.1.
The following brief example shows how to create, initialize and manipulate TimeInterval.
For fractional seconds, a signed 64-bit integer will handle a resolution of +/- (2^63 - 1), or +/- 9,223,372,036,854,775,807 parts of a second.
The Clock class advances model time and tracks its associated date on a specified Calendar. It stores start time, stop time, current time, previous time, and a time step. It can also store a reference time, typically the time instant at which a simulation originally began. For a restart run, the reference time can differ from the start time, which is the time at which application execution resumes.
A user can call the ESMF_ClockSet method and reset the time step as desired.
A Clock also stores a list of Alarms, which can be set to flag events that occur at a specified time instant or at a specified time interval. See Section 46.1 for details on how to use Alarms.
There are methods for setting and getting the Times and Alarms associated with a Clock. Methods are defined for advancing the Clock's current time, checking if the stop time has been reached, reversing direction, and synchronizing with a real clock.
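The state a Clock carries and the basic advance loop can be sketched in a few lines. The Python below is a toy model built on the standard datetime module. The class and attribute names are hypothetical and it is not the ESMF_Clock interface; alarms, reference time, and reverse mode are left out.

```python
from datetime import datetime, timedelta


class ToyClock:
    """Toy model of a clock: start, stop, current, previous, and time step."""

    def __init__(self, start, stop, step):
        self.start_time = start
        self.stop_time = stop
        self.time_step = step
        self.curr_time = start
        self.prev_time = start
        self.advance_count = 0

    def advance(self):
        """Move the current time forward by one time step."""
        self.prev_time = self.curr_time
        self.curr_time = self.curr_time + self.time_step
        self.advance_count += 1

    def is_stop_time(self):
        """Report whether the stop time has been reached or passed."""
        # A negative time step approaches the stop time from above.
        if self.time_step >= timedelta(0):
            return self.curr_time >= self.stop_time
        return self.curr_time <= self.stop_time


# One model day advanced in six-hour steps.
clock = ToyClock(datetime(2000, 1, 1), datetime(2000, 1, 2), timedelta(hours=6))
while not clock.is_stop_time():
    clock.advance()  # a model would do its work for this step here
```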
DESCRIPTION:
Specifies the time-stepping direction of a clock. Use with "direction"
argument to methods ESMF_ClockSet() and ESMF_ClockGet().
Cannot be used with method ESMF_ClockCreate(), since it only
initializes a clock in the default forward mode; a clock must be advanced
(timestepped) at least once before reversing direction via
ESMF_ClockSet(). This also holds true for negative timestep clocks
which are initialized (created) with stopTime < startTime, since "forward"
means timestepping from startTime towards stopTime
(see ESMF_DIRECTION_FORWARD below).
"Forward" and "reverse" directions are distinct from positive and negative timesteps. "Forward" means timestepping in the direction established at ESMF_ClockCreate(), from startTime towards stopTime, regardless of the timestep sign. "Reverse" means timestepping in the opposite direction, back towards the clock's startTime, regardless of the timestep sign.
Clocks and alarms run in reverse in such a way that the state of a clock and its alarms after each time step is precisely replicated as it was in forward time-stepping mode. All methods which query clock and alarm state will return the same result for a given timeStep, regardless of the direction of arrival.
The type of this flag is:
type(ESMF_Direction_Flag)
The valid values are:
The following is a typical sequence for using a Clock in a geophysical model.
At initialize:
At run:
At finalize:
The following code example illustrates Clock usage.
The Alarm class identifies events that occur at specific Times or specific TimeIntervals by returning a true value at those times or subsequent times, and a false value otherwise.
DESCRIPTION:
Specifies the characteristics of Alarms that populate
a retrieved Alarm list.
The type of this flag is:
type(ESMF_AlarmList_Flag)
The valid values are:
Alarms are used in conjunction with Clocks (see Section 45.1). Multiple Alarms can be associated with a Clock. During the ESMF_ClockAdvance() method, a Clock iterates over its internal Alarms to determine if any are ringing. Alarms ring when a specified Alarm time is reached or exceeded, taking into account whether the time step is positive or negative. In ESMF_DIRECTION_REVERSE (see Section 45.1), alarms ring in reverse, i.e., they begin ringing when they originally ended, and end ringing when they originally began. On completion of the time advance call, the Clock optionally returns a list of ringing alarms.
Each ringing Alarm can then be processed using Alarm methods for identifying, turning off, disabling or resetting the Alarm.
Alarm methods are defined for obtaining the ringing state, turning the ringer on/off, enabling/disabling the Alarm, and getting/setting associated times.
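The ring-and-process cycle described above can be illustrated with a small stand-alone model. The Python sketch below assumes positive time steps only. It uses hypothetical class and function names, and disables a one-shot alarm after it fires so that it does not ring again. It is a simplification for illustration, not the ESMF_Alarm behavior in full.

```python
from datetime import datetime, timedelta


class ToyAlarm:
    """Toy alarm that rings once the ring time is reached or exceeded."""

    def __init__(self, ring_time, ring_interval=None):
        self.ring_time = ring_time
        self.ring_interval = ring_interval  # None means a one-shot alarm
        self.enabled = True
        self.ringing = False

    def check(self, curr_time):
        """Update and return the ringing state for the given current time."""
        if self.enabled and curr_time >= self.ring_time:
            self.ringing = True
            if self.ring_interval is None:
                self.enabled = False  # sketch choice: one-shot fires once
            else:
                # Schedule the next ring time after the current time.
                while self.ring_time <= curr_time:
                    self.ring_time += self.ring_interval
        return self.ringing

    def ringer_off(self):
        """Silence the alarm after it has been processed."""
        self.ringing = False


def run(start, stop, step, alarms):
    """Step from start to stop; return a list of ringing alarm indexes per step."""
    history = []
    curr = start
    while curr < stop:
        curr += step
        ringing = [i for i, alarm in enumerate(alarms) if alarm.check(curr)]
        for i in ringing:
            alarms[i].ringer_off()  # process, then turn the ringer off
        history.append(ringing)
    return history


alarms = [
    ToyAlarm(datetime(2000, 1, 1, 12)),                      # one-shot at noon
    ToyAlarm(datetime(2000, 1, 1, 6), timedelta(hours=12)),  # every 12 hours
]
history = run(datetime(2000, 1, 1), datetime(2000, 1, 2),
              timedelta(hours=6), alarms)
```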
The following example shows how to set and process Alarms.
The Alarm class is designed as a deep, dynamically allocatable class, based on a pointer type. This allows for both indirect and direct manipulation of alarms. Indirect alarm manipulation is where ESMF_Alarm API methods, such as ESMF_AlarmRingerOff(), are invoked on alarm references (pointers) returned from ESMF_Clock queries such as "return ringing alarms." Since the method is performed on an alarm reference, the actual alarm held by the clock is affected, not just a user's local copy. Direct alarm manipulation is the more common case where alarm API methods are invoked on the original alarm objects created by the user.
For consistency, the ESMF_Clock class is also designed as a deep, dynamically allocatable class.
An additional benefit from this approach is that Clocks and Alarms can be created and used from anywhere in a user's code without regard to the scope in which they were created. In contrast, statically created Alarms and Clocks would disappear if created within a user's routine that returns, whereas dynamically allocated Alarms and Clocks will persist until explicitly destroyed by the user.
ESMF Configuration Management is based on NASA DAO's Inpak 90 package, a Fortran 90 collection of routines/functions for accessing Resource Files in ASCII format. The package is optimized for minimizing formatted I/O, performing all of its string operations in memory using Fortran intrinsic functions.
The ESMF Configuration Management Package was developed by Leonid Zaslavsky and Arlindo da Silva from the Inpak 90 package created by Arlindo da Silva at NASA DAO.
Back in the 70's Eli Isaacson wrote IOPACK in Fortran 66. In June of 1987 Arlindo da Silva wrote Inpak77 using Fortran 77 string functions; Inpak 77 is a vastly simplified IOPACK, but has its own goodies not found in IOPACK. Inpak 90 removes some obsolete functionality in Inpak77, and parses the whole resource file in memory for performance.
A Resource File (RF) is a text file consisting of a list of label-value pairs. There is a buffer limit of 256,000 characters for the entire Resource File. Each label is limited to 1,000 characters. Each label should be followed by some data, the value. An example Resource File follows. It is the file used in the example below.
# This is an example Resource File.
# It contains a list of <label,value> pairs.
# The colon after the label is required.

# The values after the label can be a list.
# Multiple types are authorized.

 my_file_names:         jan87.dat jan88.dat jan89.dat  # all strings
 constants:             3.1415   25                    # float and integer
 my_favorite_colors:    green blue 022

# Or, the data can be a list of single value pairs.
# It is simpler to retrieve data in this format:

 radius_of_the_earth:   6.37E6
 parameter_1:           89
 parameter_2:           78.2
 input_file_name:       dummy_input.nc

# Or, the data can be located in a table using the following
# syntax:

 my_table_name::
  1000     3000     263.0
   925     3000     263.0
   850     3000     263.0
   700     3000     269.0
   500     3000     287.0
   400     3000     295.8
   300     3000     295.8
 ::
Note that the colon after the label is required and that the double colon is required to declare tabular data.
Resource files are intended for random access (except between ::'s in a table definition). This means that the order in which a particular label-value pair is retrieved does not depend on the original order of the pairs. The only exception is when the same label appears multiple times within the Resource File.
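To make the file layout concrete, here is a minimal Python sketch of a reader for the syntax shown above: comments introduced by #, a required colon after each label, and a double colon opening and closing a table. It is an illustration only, not the Config implementation. The function name is hypothetical, values are kept as strings, and repeated labels are stored as a list of occurrences in file order, which is a choice made for this sketch.

```python
def parse_resource_file(text):
    """Return (pairs, tables) parsed from Resource File text."""
    pairs, tables = {}, {}
    table_name = None  # name of the table currently being read, if any
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        if table_name is not None:
            if line == "::":
                table_name = None  # closing double colon ends the table
            else:
                tables[table_name].append(line.split())
            continue
        label, sep, rest = line.partition(":")
        if not sep:
            raise ValueError(f"missing colon after label: {line!r}")
        if rest.startswith(":"):
            table_name = label.strip()  # "label::" opens a table
            tables[table_name] = []
        else:
            # Each occurrence of a label is kept, in file order.
            pairs.setdefault(label.strip(), []).append(rest.split())
    return pairs, tables


SAMPLE = """\
# A short Resource File for the sketch.
constants:   3.1415 25   # float and integer
parameter_1: 89
my_table_name::
 1000 3000 263.0
  925 3000 263.0
::
parameter_1: 90
"""

pairs, tables = parse_resource_file(SAMPLE)
```

Once parsed, any label can be looked up directly regardless of where it appeared. File order matters only for a repeated label such as parameter_1 in the sample, and for the rows of a table.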
The ESMF HConfig class implements a hierarchical configuration facility that is compatible with YAML Ain't Markup Language (YAML™). ESMF HConfig can be understood as a Fortran interface to YAML. However, no claim is made that all YAML language features are supported in their entirety.
The purpose of the HConfig class under ESMF is to provide a migration path toward more standard configuration management for ESMF applications. To this end ESMF_HConfig integrates with the traditional ESMF_Config class. Through this integration the traditional Config class API offers basic access to YAML configuration files, in addition to providing backward compatible support of the traditional config file format. This is discussed in more detail in the Config class section. For more complete YAML support, applications are encouraged to migrate to the HConfig API discussed in this section.
DESCRIPTION:
Indicates the level to which two HConfig variables match.
The type of this flag is:
type(ESMF_HConfigMatch_Flag)
The valid values in ascending order are:
The following examples demonstrate how a user typically interacts with the HConfig API. The HConfig class introduces two derived types:
ESMF_HConfig objects can be created explicitly by the user, or they can be accessed from an existing ESMF_Config object, e.g. queried from a Component. They can play a number of roles when interacting with a HConfig hierarchy:
ESMF_HConfigIter objects are iterators, referencing a specific node within the hierarchy. They are created from ESMF_HConfig objects. The iterator approach allows convenient sequential traversal of a particular location in the HConfig hierarchy. There are two flavors of iterators in HConfig: sequence and map iterators. Both are represented by the same ESMF_HConfigIter derived type, and the distinction is made at run-time.
Notice that there are redundancies built into the HConfig API, where different ways are available to achieve the same goal. This is mostly done for convenience, allowing the user to pick the approach most suitable to their needs.
For instance, while it can be convenient to use iterators in some cases, in others, it might be more appropriate to access elements directly by index (for sequences) or key (for maps). Both options are available.
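The two access styles can be contrasted with a small stand-alone example. The Python sketch below uses nested dictionaries and lists as a stand-in for a parsed hierarchy. The data and the function name are invented for illustration, and nothing here is the HConfig API. The traversal function decides at run time whether a node is a map or a sequence, mirroring the run-time distinction described above.

```python
# Hypothetical configuration content, standing in for a parsed hierarchy.
config = {
    "solver": {"tolerance": 1e-6, "max_iterations": 200},
    "output_fields": ["temperature", "pressure", "humidity"],
}


def walk(node, path=""):
    """Yield (path, scalar) pairs by iterating over the whole hierarchy."""
    if isinstance(node, dict):      # map: iterate over key/value pairs
        for key, child in node.items():
            yield from walk(child, f"{path}/{key}")
    elif isinstance(node, list):    # sequence: iterate over elements
        for index, child in enumerate(node):
            yield from walk(child, f"{path}/{index}")
    else:                           # scalar: nothing further to traverse
        yield path, node


# Iterator style: visit every element in turn.
flattened = dict(walk(config))

# Direct style: reach one element by key (map) and by index (sequence).
second_field = config["output_fields"][1]
tolerance = config["solver"]["tolerance"]
```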
The ESMF HConfig class is implemented on top of YAML-CPP (https://github.com/jbeder/yaml-cpp). A copy of YAML-CPP is included in the ESMF source tree under ./src/prologue/yaml-cpp. It is used by a number of ESMF/NUOPC functions, including HConfig.
The Log class consists of a variety of methods for writing error, warning, and informational messages to files. A default Log is created at ESMF initialization. Other Logs can be created later in the code by the user. Most Log methods take a Log as an optional argument and apply to the default Log when another Log is not specified. A set of standard return codes and associated messages are provided for error handling.
Log provides capabilities to store message entries in a buffer, which is flushed to a file, either when the buffer is full, or when the user calls an ESMF_LogFlush() method. Currently, the default is for the Log to flush after every ten entries. This can easily be changed by using the ESMF_LogSet() method and setting the maxElements property to another value. The ESMF_LogFlush() method is automatically called when the program exits by any means (program completion, halt on error, or when the Log is closed).
The user can abort the program on conditions such as an error or a warning by using the ESMF_LogSet() method with the logmsgAbort argument. For example, if the logmsgAbort array is set to (ESMF_LOGMSG_ERROR, ESMF_LOGMSG_WARNING), the program will stop on any warning or error. When the logmsgAbort argument is set to ESMF_LOGMSG_ERROR, the program will only abort on errors. Lastly, the user can choose to never abort by using ESMF_LOGMSG_NONE; this is the default.
Log will automatically put the PET number into the Log. Also, the user can either specify ESMF_LOGKIND_SINGLE which writes all the entries to a single Log or ESMF_LOGKIND_MULTI which writes entries to multiple Logs according to the PET number. To distinguish Logs from each other when using ESMF_LOGKIND_MULTI, the PET number (in the format PETx.) will be prepended to the file name where x is the PET number.
Opening multiple log files and writing log messages from all the processors may affect the application performance while running on a large number of processors. For that reason, ESMF_LOGKIND_NONE is provided to switch off the Log capability. All the Log methods have no effect in the ESMF_LOGKIND_NONE mode.
A tracing capability may be enabled by setting the trace flag by using the ESMF_LogSet() method. When tracing is enabled, calls to methods such as ESMF_LogFoundError, ESMF_LogFoundAllocError, and ESMF_LogFoundDeallocError are logged in the default log file. This can result in voluminous output. It is typically used only around areas of code which are being debugged.
Other options that are planned for Log are to adjust the verbosity of output, and to optionally write to stdout instead of file(s).
The valid values are:
DESCRIPTION:
Specifies a single log file, multiple log files (one per PET), or no log files.
The type of this flag is:
type(ESMF_LogKind_Flag)
The valid values are:
DESCRIPTION:
Specifies a message level
The type of this flag is:
type(ESMF_LogMsg_Flag)
The valid values are:
Valid predefined named array constant values are:
By default ESMF_Initialize() opens a default Log in ESMF_LOGKIND_MULTI mode. ESMF handles the initialization and finalization of the default Log so the user can immediately start using it. If additional Log objects are desired, they must be explicitly created or opened using ESMF_LogOpen().
ESMF_LogOpen() requires a Log object and filename argument. Additionally, the user can specify single or multi Logs by setting the logkindflag property to ESMF_LOGKIND_SINGLE or ESMF_LOGKIND_MULTI. This is useful as the PET numbers are automatically added to the Log entries. A single Log will put all entries, regardless of PET number, into a single log while a multi Log will create multiple Logs with the PET number prepended to the filename and all entries will be written to their corresponding Log by their PET number.
By default, the Log file is not truncated at the start of a new run; it just gets appended each time. Future functionality may include an option to either truncate or append to the Log file.
In all cases where a Log is opened, a Fortran unit number is assigned to a specific Log. A Log is assigned an unused unit number using the algorithm described in the ESMF_IOUnitGet() method.
The user can then set or get options on how the Log should be used with the ESMF_LogSet() and ESMF_LogGet() methods. These are partially implemented at this time.
Depending on how the options are set, ESMF_LogWrite() either writes user messages directly to a Log file or writes to a buffer that can be flushed when full or by using the ESMF_LogFlush() method. The default is to flush after every ten entries because maxElements is initialized to ten (which means the buffer reaches its full state after every ten writes and then flushes).
A message filtering option may be set with ESMF_LogSet() so that only selected message types are actually written to the log. One key use of this feature is to allow placing informational log write requests into the code for debugging or tracing. Then, when the informational entries are not needed, the messages at that level may be turned off, leaving only warning and error messages in the logs.
For every ESMF_LogWrite(), a time and date stamp is prepended to the Log entry. The time is given in microsecond precision. The user can call other methods to write to the Log. In every case, all methods eventually make a call implicitly to ESMF_LogWrite() even though the user may never explicitly call it.
When calling ESMF_LogWrite(), the user can supply an optional line, file and method. These arguments can be passed in explicitly or with the help of cpp macros. In the latter case, a define for an ESMF_FILENAME must be placed at the beginning of a file and a define for ESMF_METHOD must be placed at the beginning of each method. The user can then use the ESMF_CONTEXT cpp macro in place of line, file and method to insert the parameters into the method. The user does not have to specify line number as it is a value supplied by cpp.
An example of Log output is given below running with logkindflag property set to ESMF_LOGKIND_MULTI (default) using the default Log:
(Log file PET0.ESMF_LogFile)
20041105 163418.472210 INFO PET0 Running with ESMF Version 2.2.1
(Log file PET1.ESMF_LogFile)
20041105 163419.186153 ERROR PET1 ESMF_Field.F90 812 ESMF_FieldGet No Grid or Bad Grid attached to Field
The first entry shows date and time stamp. The time is given in microsecond precision. The next item shown is the type of message (INFO in this case). Next, the PET number is added. Lastly, the content is written.
The second entry shows something slightly different. In this case, we have an ERROR. The file name (ESMF_Field.F90), the line number (812), and the method name (ESMF_FieldGet) are automatically provided from the cpp macros. Then the content of the message is written.
When done writing messages, close the default Log by calling ESMF_LogFinalize(), or close a user-created Log by calling ESMF_LogClose(). Both methods release the assigned unit number.
The properties for a Log are set with the ESMF_LogSet() method and retrieved with the ESMF_LogGet() method.
Additionally, buffering is enabled. Buffering allows ESMF to manage output data streams in a desired way. Writing to the buffer is transparent to the user because all the Log entries are handled automatically by the ESMF_LogWrite() method. All the user has to do is specify the buffer size (the default is ten) by setting the maxElements property. Every time the ESMF_LogWrite() method is called, a LogEntry element is populated with the ESMF_LogWrite() information. When the buffer is full (i.e., when all the LogEntry elements are populated), the buffer will be flushed and all the contents will be written to file. If buffering is not needed, that is maxElements=1 or flushImmediately=ESMF_TRUE, the ESMF_LogWrite() method will immediately write to the Log file(s).
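The buffering behavior described above can be illustrated with a small Python model. This is only a sketch of the flush-when-full logic, not ESMF code: the class BufferedLog, its sink callback, and the attribute names max_elements and flush_immediately are hypothetical stand-ins for the Log, its file, and the maxElements and flushImmediately properties.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BufferedLog:
    """Toy model of a Log that buffers entries and flushes when full."""
    sink: Callable[[str], None]       # hypothetical: receives each flushed entry
    max_elements: int = 10            # stands in for maxElements (default ten)
    flush_immediately: bool = False   # stands in for flushImmediately
    entries: List[str] = field(default_factory=list, init=False)

    def write(self, message: str) -> None:
        # Every write populates one buffer element.
        self.entries.append(message)
        # Flush when the buffer is full or buffering is switched off.
        if self.flush_immediately or len(self.entries) >= self.max_elements:
            self.flush()

    def flush(self) -> None:
        # Hand all buffered entries to the sink, then empty the buffer.
        for entry in self.entries:
            self.sink(entry)
        self.entries.clear()


written: List[str] = []
log = BufferedLog(sink=written.append, max_elements=3)
log.write("first")
log.write("second")
pending_before_full = len(written)    # buffer not yet full, nothing flushed
log.write("third")                    # third entry fills the buffer
flushed_when_full = len(written)
```

With max_elements set to 1, or flush_immediately set to True, every write reaches the sink at once, which corresponds to the unbuffered case in the text.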
The following is a simplified UML diagram showing the structure of the Log class. See Appendix A, A Brief Introduction to UML, for a translation table that lists the symbols in the diagram and their meaning.
The DELayout class provides an additional layer of abstraction on top of the Virtual Machine (VM) layer. DELayout does this by introducing DEs (Decomposition Elements) as logical resource units. The DELayout object keeps track of the relationship between its DEs and the resources of the associated VM object.
The relationship between DEs and VM resources (PETs (Persistent Execution Threads) and VASs (Virtual Address Spaces)) contained in a DELayout object is defined during its creation and cannot be changed thereafter. There are, however, a number of hint and specification arguments that can be used to shape the DELayout during its creation.
Contrary to the number of PETs and VASs contained in a VM object, which are fixed by the available resources, the number of DEs contained in a DELayout can be chosen freely to best match the computational problem or other design criteria. Creating a DELayout with fewer DEs than there are PETs in the associated VM object can be used to share resources between decomposed objects within an ESMF component. Creating a DELayout with more DEs than there are PETs in the associated VM object can be used to evenly partition the computation over the available resources.
The simplest case, however, is where the DELayout contains the same number of DEs as there are PETs in the associated VM context. In this case the DELayout may be used to re-label the hardware and operating system resources held by the VM. For instance, it is possible to order the resources so that specific DEs have best available communication paths. The DELayout will map the DEs to the PETs of the VM according to the resource details provided by the VM instance.
Furthermore, general DE to PET mapping can be used to offer computational resources with finer granularity than the VM does. The DELayout can be queried for computational and communication capacities of DEs and DE pairs, respectively. This information can be used to best utilize the DE resources when partitioning the computational problem. In combination with other ESMF classes, general DE to PET mapping can be used to realize cache blocking, communication hiding and dynamic load balancing.
Finally, the DELayout layer offers primitives that allow a work queue style dynamic load balancing between DEs.
DESCRIPTION:
Specifies which VM resource DEs are pinned to (PETs, VASs, SSIs).
The type of this flag is:
type(ESMF_Pin_Flag)
The valid values are:
DESCRIPTION:
Reply when a PET offers to service a DE.
The type of this flag is:
type(ESMF_ServiceReply_Flag)
The valid values are:
The following examples demonstrate how to create, use and destroy DELayout objects.
The DELayout class is a lightweight object. It stores the DE to PET and VAS mapping for all DEs within all PET instances and a list of local DEs for each PET instance. The DELayout does not store the computational and communication weights optionally provided as arguments to the create method. These hints are only used during creation, while they are available in user-owned arrays.
The ESMF VM (Virtual Machine) class is a generic representation of hardware and system software resources. There is exactly one VM object per ESMF Component, providing the execution environment for the Component code. The VM class handles all resource management tasks for the Component class and provides a description of the underlying configuration of the compute resources used by a Component.
In addition to resource description and management, the VM class offers the lowest level of ESMF communication methods. The VM communication calls closely resemble MPI, with many equivalent point-to-point and collective communication calls. Data references in VM communication calls must be provided as raw, language-specific, one-dimensional, contiguous data arrays. However, unlike MPI, the VM communication calls support communication between threaded PETs in a completely transparent fashion.
Many ESMF applications interact very little with the VM class directly. The resource management aspect is wrapped completely transparently into the ESMF Component concept. Often the only reason that user code queries a Component object for the associated VM object is to inquire about resource information, such as the localPet or the petCount. Further, for most applications the higher level communication APIs, such as those provided by Array and Field, are much more convenient than the low level VM communication calls.
The basic elements of a VM are called PETs, which stands for Persistent Execution Threads. These are equivalent to OS threads with a lifetime of at least that of the associated component. All VM functionality is expressed in terms of PETs. In the simplest, and most common case, a PET is equivalent to an MPI process. However, ESMF also supports multi-threading, where multiple PETs run as Pthreads inside the same virtual address space (VAS).
The resource management functions of the VM class become visible when a component, or the driver code, creates sub-components. Section 16.4.3 discusses this aspect from the Superstructure perspective and provides links to the relevant Component examples in the documentation.
There are two parts to resource management, the parent and the child. When the parent component creates a child component, the parent VM object provides the resources on which the child is created with ESMF_GridCompCreate() or ESMF_CplCompCreate(). The optional petList argument to these calls limits the resources that the parent gives to a specific child. The child component may specify, during its optional ESMF_<Grid/Cpl>CompSetVM() method, how it wants to arrange the inherited resources in its own VM. After this, all standard ESMF methods of the Component, including ESMF_<Grid/Cpl>CompSetServices(), will execute in the child VM. Notice that the ESMF_<Grid/Cpl>CompSetVM() routine, although part of the child Component, must execute before the child VM has been started up. It runs in the parent VM context. The child VM is created and started up just before the user-written set services routine, specified as an argument to ESMF_<Grid/Cpl>CompSetServices(), is entered.
DESCRIPTION:
Specifies the kind of VM Epoch being entered.
The type of this flag is:
type(ESMF_VMEpoch_Flag)
The valid values are:
The concept of the ESMF Virtual Machine (VM) is so fundamental to the framework that every ESMF application uses it. However, for many user applications the VM class is transparently hidden behind the ESMF Component concept and higher data classes (e.g. Array, Field). The interaction between user code and VM is often only indirect. The following examples provide an overview of where the VM class can come into play in user code.
The VM class provides an additional layer of abstraction on top of the POSIX machine model, making it suitable for HPC applications. There are four key aspects the VM class deals with.
Definition of terms used in the diagram
The POSIX machine abstraction, while a very powerful concept, needs augmentation when applied to HPC applications. Key elements of the POSIX abstraction are processes, which provide virtually unlimited resources (memory, I/O, sockets, ...) to possibly multiple threads of execution. Similarly POSIX threads create the illusion that there is virtually unlimited processing power available to each POSIX process. While the POSIX abstraction is very suitable for many multi-user/multi-tasking applications that need to share limited physical resources, it does not directly fit the HPC workload where over-subscription of resources is one of the most expensive modes of operation.
ESMF's virtual machine abstraction is based on the POSIX machine model but holds additional information about the available physical processing units in terms of Processing Elements (PEs). A PE is the smallest physical processing unit and encapsulates the hardware details (Cores, CPUs and SSIs).
There is exactly one physical machine layout for each application, and all VM instances have access to this information. The PE is the smallest processing unit which, in today's microprocessor technology, corresponds to a single Core. Cores are arranged in CPUs which in turn are arranged in SSIs. The setup of the physical machine layout is part of the ESMF initialization process.
On top of the PE concept, the key abstraction provided by the VM is the PET. All user code is executed by PETs while OS and hardware details are hidden. The VM class contains a number of methods which allow the user to prescribe how the PETs of a desired virtual machine should be instantiated on the OS level and how they should map onto the hardware. This prescription is kept in a private virtual machine plan object which is created at the same time the associated component is being created. Each time component code is entered through one of the component's registered top-level methods (Initialize/Run/Finalize), the virtual machine plan, along with a pointer to the respective user function, is used to instantiate the user code on the PETs of the associated VM in the form of single- or multi-threaded POSIX processes.
The process of starting, entering, exiting and shutting down a VM is transparent: all spawning and joining of threads is handled by VM methods "behind the scenes". Furthermore, fundamental synchronization and communication primitives are provided on the PET level through a uniform API, hiding details related to the actual instantiation of the participating PETs.
Within a VM object each PE of the physical machine maps to 0 or 1 PETs. Allowing unassigned PEs provides a means to prevent over-subscription between multiple concurrently running virtual machines. Similarly a maximum of one PET per PE prevents over-subscription within a single VM instance. However, over-subscription is possible by subscribing PETs from different virtual machines to the same PE. This type of over-subscription can be desirable for PETs associated with I/O workloads expected to be used infrequently and to block often on I/O requests.
On the OS level, each PET of a VM object is represented by a POSIX thread (Pthread), belonging to either a single- or multi-threaded process, and maps to at least 1 PE of the physical machine, ensuring its execution. Mapping a single PET to multiple PEs provides resources for user-level multi-threading. In this case the user code inquires how many PEs are associated with its PET, and if multiple PEs are available, it can spawn an equal number of threads (e.g. OpenMP) without risking over-subscription. Typically these user spawned threads are short-lived and used for fine-grained parallelization in the form of TETs. All PEs mapped against a single PET must be part of a unique SSI in order to allow user-level multi-threading!
In addition to discovering the physical machine, the ESMF initialization process sets up the default global virtual machine. This VM object, which is the ultimate parent of all VMs created during the course of execution, contains as many PETs as there are PEs in the physical machine. All of its PETs are instantiated in the form of single-threaded MPI processes, and a 1:1 mapping of PETs to PEs is used for the default global VM.
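The two mapping rules for a single VM (each PE serves 0 or 1 PETs, and each PET maps to at least 1 PE) can be checked with a few lines of Python. This is a hypothetical illustration, not part of the ESMF API: the function check_vm_mapping and the dictionary layout (PET number to list of PE numbers) are assumptions made for this sketch.

```python
from collections import Counter
from typing import Dict, List


def check_vm_mapping(pet_to_pes: Dict[int, List[int]]) -> List[str]:
    """Return the rule violations found in one VM's PET-to-PE mapping."""
    problems: List[str] = []
    # Rule 1: every PET maps to at least one PE.
    for pet, pes in sorted(pet_to_pes.items()):
        if not pes:
            problems.append(f"PET {pet} has no PE")
    # Rule 2: within one VM, a PE serves at most one PET.
    usage = Counter(pe for pes in pet_to_pes.values() for pe in set(pes))
    for pe in sorted(pe for pe, pet_count in usage.items() if pet_count > 1):
        problems.append(f"PE {pe} is assigned to more than one PET")
    return problems


# Hypothetical layout: PET 0 owns PEs 0 and 1 (room for two user-level
# threads), PET 1 owns PE 2, and PE 3 stays unassigned for another VM.
valid_mapping = {0: [0, 1], 1: [2]}

# Hypothetical broken layout: PE 1 is claimed twice and PET 2 has no PE.
broken_mapping = {0: [0, 1], 1: [1], 2: []}

valid_problems = check_vm_mapping(valid_mapping)
broken_problems = check_vm_mapping(broken_mapping)
```

The check only covers one VM at a time. Sharing a PE between PETs of different VMs, which the text describes as sometimes desirable, is deliberately not flagged.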
The VM design and implementation is based on the POSIX process and thread model as well as the MPI-1.2 standard. As a consequence of the latter standard the number of processes is static during the course of execution and is determined at start-up. The VM implementation further requires that the user starts up the ESMF application with as many MPI processes as there are PEs in the available physical machine using the platform dependent mechanism to ensure proper process placement.
All MPI processes participating in a VM are grouped together by means of an MPI_Group object and their context is defined via an MPI_Comm object (MPI intra-communicator). The PET local process id within each virtual machine is equal to the MPI_Comm_rank in the local MPI_Comm context whereas the PET process id is equal to the MPI_Comm_rank in MPI_COMM_WORLD. The PET process id is used within the VM methods to determine the virtual memory space a PET is operating in.
In order to provide a migration path for legacy MPI applications, the VM offers accessor functions to its MPI_Comm object. Once obtained, this object may be used in explicit user-code MPI calls within the same context.
ESMF's built-in profiling capability collects runtime statistics of an executing ESMF application through both automatic and manual code instrumentation. Timing information for all phases of all ESMF components executing in an application can be automatically collected using the ESMF_RUNTIME_PROFILE environment variable (see below for settings). Additionally, arbitrary user-defined code regions can be timed by manually instrumenting code with special API calls. Timing profiles of component phases and user-defined regions can be output in several different formats:
The following table lists important environment variables that control aspects of ESMF profiling.
Environment Variable | Description | Example Values | Default |
ESMF_RUNTIME_PROFILE | Enables/disables all profiling functions | ON or OFF | OFF |
ESMF_RUNTIME_PROFILE_PETLIST | Limits profiling to an explicit list of PETs | “0-9 50 99” | profile all PETs |
ESMF_RUNTIME_PROFILE_OUTPUT | Controls output format of profiles; multiple can be specified in a space separated list | TEXT, SUMMARY, BINARY | TEXT |
Whereas profiling collects summary information from an application, tracing records a more detailed set of events for later analysis. Trace analysis can be used to understand what happened during a program's execution and is often used for diagnosing problems, debugging, and performance analysis.
ESMF has a built-in tracing capability that records events into special binary log files. Unlike log files written by the ESMF_Log class, which are primarily for human consumption (see Section 49.1), the trace output files are recorded in a compact binary representation and are processed by tools to produce various analyses. ESMF event streams are recorded in the Common Trace Format (CTF). CTF traces include one or more event streams, as well as a metadata file describing the events in the streams.
Several tools are available for reading in the CTF traces output by ESMF. Of the tools listed below, the first one is designed specifically for analyzing ESMF applications and the second two are general purpose tools for working with all CTF traces.
Events that can be captured by the ESMF tracer include the following. Events are recorded with a high-precision timestamp to allow timing analyses.
The following table lists important environment variables that control aspects of ESMF tracing.
Environment Variable | Description | Example Values | Default |
ESMF_RUNTIME_TRACE | Enables/disables all tracing functions | ON or OFF | OFF |
ESMF_RUNTIME_TRACE_CLOCK | Sets the type of clock for timestamping events (see Section 52.2.6). | REALTIME or MONOTONIC or MONOTONIC_SYNC | REALTIME |
ESMF_RUNTIME_TRACE_PETLIST | Limits tracing to an explicit list of PETs | “0-9 50 99” | trace all PETs |
ESMF_RUNTIME_TRACE_COMPONENT | Enables/disables tracing of Component phase_enter and phase_exit events | ON or OFF | ON |
ESMF_RUNTIME_TRACE_FLUSH | Controls frequency of event stream flushing to file | DEFAULT or EAGER | DEFAULT |
ESMF profiling is disabled by default. To profile an application, set the ESMF_RUNTIME_PROFILE variable to ON prior to executing the application. You do not need to recompile your code to enable profiling.
# csh shell
$ setenv ESMF_RUNTIME_PROFILE ON
# bash shell
$ export ESMF_RUNTIME_PROFILE=ON
# (from now on, only the csh shell version will be shown)
Then execute the application in the usual way. At the end of the run the profile information will be available at the end of each PET log (if ESMF Logs are turned on) or in a set of separate files, one per PET, with names ESMF_Profile.XXX where XXX is the PET number. Below is an example timing profile. Some regions are left out for brevity.
Region                            Count  Total (s)   Self (s)   Mean (s)    Min (s)    Max (s)
[esm] Init 1                          1     4.0878     0.0341     4.0878     4.0878     4.0878
  [OCN-TO-ATM] IPDv05p6b              1     2.6007     2.6007     2.6007     2.6007     2.6007
  [ATM-TO-OCN] IPDv05p6b              1     1.4333     1.4333     1.4333     1.4333     1.4333
  [ATM] IPDv00p2                      1     0.0055     0.0055     0.0055     0.0055     0.0055
  [OCN] IPDv00p2                      1     0.0023     0.0023     0.0023     0.0023     0.0023
  [ATM] IPDv00p1                      1     0.0011     0.0011     0.0011     0.0011     0.0011
  [OCN] IPDv00p1                      1     0.0009     0.0009     0.0009     0.0009     0.0009
  [ATM-TO-OCN] IPDv05p3               1     0.0008     0.0008     0.0008     0.0008     0.0008
  [ATM-TO-OCN] IPDv05p1               1     0.0008     0.0008     0.0008     0.0008     0.0008
  [ATM-TO-OCN] IPDv05p2b              1     0.0007     0.0007     0.0007     0.0007     0.0007
  [ATM-TO-OCN] IPDv05p4               1     0.0007     0.0007     0.0007     0.0007     0.0007
  [ATM-TO-OCN] IPDv05p2a              1     0.0007     0.0007     0.0007     0.0007     0.0007
  [ATM-TO-OCN] IPDv05p5               1     0.0007     0.0007     0.0007     0.0007     0.0007
  [OCN-TO-ATM] IPDv05p3               1     0.0006     0.0006     0.0006     0.0006     0.0006
  [OCN-TO-ATM] IPDv05p4               1     0.0006     0.0006     0.0006     0.0006     0.0006
  [OCN-TO-ATM] IPDv05p2b              1     0.0006     0.0006     0.0006     0.0006     0.0006
  [OCN-TO-ATM] IPDv05p2a              1     0.0006     0.0006     0.0006     0.0006     0.0006
  [OCN-TO-ATM] IPDv05p5               1     0.0006     0.0006     0.0006     0.0006     0.0006
  [OCN-TO-ATM] IPDv05p1               1     0.0005     0.0005     0.0005     0.0005     0.0005
[esm] RunPhase1                       1     2.7423     0.9432     2.7423     2.7423     2.7423
  [OCN-TO-ATM] RunPhase1            864     0.6094     0.6094     0.0007     0.0006     0.0179
  [ATM] RunPhase1                   864     0.5296     0.2274     0.0006     0.0005     0.0011
    ATM:ModelAdvance                864     0.3022     0.3022     0.0003     0.0003     0.0005
  [ATM-TO-OCN] RunPhase1            864     0.3345     0.3345     0.0004     0.0002     0.0299
  [OCN] RunPhase1                   864     0.3256     0.3256     0.0004     0.0003     0.0010
[esm] FinalizePhase1                  1     0.0029     0.0020     0.0029     0.0029     0.0029
  [OCN-TO-ATM] FinalizePhase1         1     0.0006     0.0006     0.0006     0.0006     0.0006
  [ATM-TO-OCN] FinalizePhase1         1     0.0002     0.0002     0.0002     0.0002     0.0002
  [OCN] FinalizePhase1                1     0.0001     0.0001     0.0001     0.0001     0.0001
  [ATM] FinalizePhase1                1     0.0000     0.0000     0.0000     0.0000     0.0000
A timed region is either an ESMF component phase (e.g., initialize, run, or finalize) or a user-defined region of code surrounded by calls to ESMF_TraceRegionEnter() and ESMF_TraceRegionExit(). (See section for more information on instrumenting user-defined regions.) Regions are organized hierarchically with sub-regions nested. For example, in the profile above, the [OCN] RunPhase1 is a sub-region of [esm] RunPhase1 and is entirely contained inside that region. Regions with the same name may appear at multiple places in the hierarchy, and so would appear in multiple rows in the table. The statistics in each row apply to that region at that location in the hierarchy. Component names appear in square brackets, e.g., [ATM], [OCN], and [ATM-TO-OCN]. By default, timings are based on elapsed wall clock time and are collected on a per-PET basis. Therefore, region timings may differ across PETs. Regions are sorted with the most expensive regions appearing at the top. The following describes the meaning of the statistics in each column:
Count | the number of times the region is executed |
Total | the aggregate time spent in the region, inclusive of all sub-regions |
Self | the aggregate time spent in the region, exclusive of all sub-regions |
Mean | the average amount of time for one execution of the region |
Min | time of the fastest execution of the region |
Max | time of the slowest execution of the region |
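As a rough illustration of how these columns relate to each other, the following Python sketch derives them from a list of per-execution durations. It is not ESMF code: the Region class, the region_stats function, and the sample durations are all hypothetical, and Self is computed here as the region's Total minus the Total of its direct sub-regions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Region:
    """Hypothetical timed region with one duration (seconds) per execution."""
    name: str
    durations: List[float]
    children: List["Region"] = field(default_factory=list)


def region_stats(region: Region) -> Dict[str, float]:
    """Compute the per-PET profile columns for one region."""
    if not region.durations:
        raise ValueError("a region must have been executed at least once")
    total = sum(region.durations)
    # Time attributed to direct sub-regions (their totals are inclusive).
    children_total = sum(sum(child.durations) for child in region.children)
    return {
        "count": len(region.durations),
        "total": total,
        "self": total - children_total,
        "mean": total / len(region.durations),
        "min": min(region.durations),
        "max": max(region.durations),
    }


# Made-up sample data: a run phase executed three times, each execution
# containing one call to a nested user-defined region.
advance = Region("ATM:ModelAdvance", durations=[0.3, 0.5, 0.4])
atm_run = Region("[ATM] RunPhase1", durations=[0.6, 0.9, 0.6], children=[advance])
stats = region_stats(atm_run)
```

The same relation can be read off the example profile above: for [ATM] RunPhase1 the Self value 0.2274 plus the Total of its sub-region ATM:ModelAdvance, 0.3022, gives the region's Total of 0.5296.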
By default, separate timing profiles are generated for each PET in the application. The per-PET profiles can be aggregated together and output to a single file, ESMF_Profile.summary, by setting the ESMF_RUNTIME_PROFILE_OUTPUT environment variable as follows:
$ setenv ESMF_RUNTIME_PROFILE ON                 # turn on profiling
$ setenv ESMF_RUNTIME_PROFILE_OUTPUT SUMMARY     # specify summary output
Note the ESMF_RUNTIME_PROFILE environment variable must also be set to ON since this controls all profiling capabilities. The ESMF_Profile.summary file will contain a tree of timed regions, but aggregated across all PETs. For example:
Region                            PETs  PEs  Count  Mean (s)   Min (s)  Min PET   Max (s)  Max PET
[esm] Init 1                         4    4      1    4.0880    4.0878        2    4.0883        1
  [OCN-TO-ATM] IPDv05p6b             4    4      1    2.6007    2.6007        2    2.6007        3
  [ATM-TO-OCN] IPDv05p6b             4    4      1    1.4335    1.4333        0    1.4337        3
  [ATM-TO-OCN] IPDv05p4              4    4      1    0.0037    0.0007        0    0.0060        1
  [ATM] IPDv00p2                     4    4      1    0.0034    0.0020        1    0.0055        0
  [ATM-TO-OCN] IPDv05p1              4    4      1    0.0020    0.0007        2    0.0033        3
  [OCN] IPDv00p2                     4    4      1    0.0019    0.0015        3    0.0024        2
  [ATM-TO-OCN] IPDv05p3              4    4      1    0.0010    0.0008        0    0.0013        1
  [ATM-TO-OCN] IPDv05p2a             4    4      1    0.0009    0.0007        0    0.0012        3
  [ATM] IPDv00p1                     4    4      1    0.0009    0.0007        3    0.0011        0
  [ATM-TO-OCN] IPDv05p2b             4    4      1    0.0008    0.0007        0    0.0010        3
  [ATM-TO-OCN] IPDv05p5              4    4      1    0.0008    0.0007        0    0.0010        3
  [ATM-TO-OCN] IPDv05p6a             4    4      1    0.0008    0.0005        2    0.0012        3
  [OCN-TO-ATM] IPDv05p3              4    4      1    0.0008    0.0006        2    0.0010        3
  [OCN-TO-ATM] IPDv05p4              4    4      1    0.0008    0.0006        0    0.0009        3
  [OCN-TO-ATM] IPDv05p2b             4    4      1    0.0007    0.0006        2    0.0009        3
  [OCN] IPDv00p1                     4    4      1    0.0007    0.0005        1    0.0009        2
  [OCN-TO-ATM] IPDv05p2a             4    4      1    0.0007    0.0006        2    0.0009        1
  [OCN-TO-ATM] IPDv05p5              4    4      1    0.0007    0.0006        0    0.0009        3
  [OCN-TO-ATM] IPDv05p1              4    4      1    0.0006    0.0005        0    0.0008        1
  [OCN-TO-ATM] IPDv05p6a             4    4      1    0.0006    0.0004        2    0.0007        1
[esm] RunPhase1                      4    4      1    2.7444    2.7423        0    2.7454        1
  [OCN-TO-ATM] RunPhase1             4    4    864    0.6123    0.6004        2    0.6244        1
  [ATM] RunPhase1                    4    4    864    0.5386    0.5296        0    0.5530        1
    ATM:ModelAdvance                 4    4    864    0.3038    0.3022        0    0.3065        1
  [OCN] RunPhase1                    4    4    864    0.3471    0.3256        0    0.3824        1
  [ATM-TO-OCN] RunPhase1             4    4    864    0.2843    0.1956        1    0.3345        0
[esm] FinalizePhase1                 4    4      1    0.0029    0.0029        1    0.0030        2
  [OCN-TO-ATM] FinalizePhase1        4    4      1    0.0007    0.0006        0    0.0008        3
  [ATM-TO-OCN] FinalizePhase1        4    4      1    0.0002    0.0001        3    0.0002        1
  [OCN] FinalizePhase1               4    4      1    0.0001    0.0001        3    0.0001        0
  [ATM] FinalizePhase1               4    4      1    0.0001    0.0000        0    0.0001        2
The meaning of the statistics in each column is as follows:
PETs | the number of reporting PETs that executed the region |
PEs | the number of PEs associated with the reporting PETs that executed the region |
Count | the number of times each reporting PET executed the region or “MULTIPLE” if not all PETs executed the region the same number of times |
Mean | the mean across all reporting PETs of the total time spent in the region |
Min | the minimum across all reporting PETs of the total time spent in the region |
Min PET | the PET that reported the minimum time |
Max | the maximum across all reporting PETs of the total time spent in the region |
Max PET | the PET that reported the maximum time |
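The cross-PET columns can be pictured as a simple reduction over the per-PET totals for one region. The Python sketch below is an assumption-laden illustration, not the ESMF implementation: SummaryRow, summarize, and the sample totals are hypothetical, and ties between PETs are broken by dictionary order here, which may differ from what ESMF does.

```python
from typing import Dict, NamedTuple


class SummaryRow(NamedTuple):
    """Hypothetical container for one row of the summary profile."""
    pets: int        # number of reporting PETs
    mean_s: float    # mean of the per-PET total times
    min_s: float     # smallest per-PET total time
    min_pet: int     # PET that reported the minimum
    max_s: float     # largest per-PET total time
    max_pet: int     # PET that reported the maximum


def summarize(total_by_pet: Dict[int, float]) -> SummaryRow:
    """Aggregate one region's total time (seconds) across reporting PETs."""
    if not total_by_pet:
        raise ValueError("at least one reporting PET is required")
    min_pet = min(total_by_pet, key=lambda pet: total_by_pet[pet])
    max_pet = max(total_by_pet, key=lambda pet: total_by_pet[pet])
    return SummaryRow(
        pets=len(total_by_pet),
        mean_s=sum(total_by_pet.values()) / len(total_by_pet),
        min_s=total_by_pet[min_pet],
        min_pet=min_pet,
        max_s=total_by_pet[max_pet],
        max_pet=max_pet,
    )


# Made-up totals for a single region on four reporting PETs.
row = summarize({0: 2.0, 1: 4.0, 2: 1.0, 3: 3.0})
```

A large gap between the Min and Max columns for the same region is what this reduction makes visible, and it is a common first hint of load imbalance between PETs.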
Note that setting the ESMF_RUNTIME_PROFILE_PETLIST environment variable (described below) may reduce the number of reporting PETs. Only reporting PETs are included in the summary profile. To output both the per-PET and summary timing profiles, set the ESMF_RUNTIME_PROFILE_OUTPUT environment variable as follows:
$ setenv ESMF_RUNTIME_PROFILE_OUTPUT "TEXT SUMMARY"
By default, all PETs in an application are profiled. It may be desirable to only profile a subset of PETs to reduce the amount of output. An explicit list of PETs can be specified by setting the ESMF_RUNTIME_PROFILE_PETLIST environment variable. The syntax of this environment variable is to list PET numbers separated by spaces. PET ranges are also supported using the “X-Y” syntax where X < Y. For example:
# only profile PETs 0, 20, and 35 through 39
$ setenv ESMF_RUNTIME_PROFILE_PETLIST "0 20 35-39"
When used in conjunction with the SUMMARY option above, the summarized profile will only aggregate over the specified set of PETs. The one exception is that PET 0 is always profiled if ESMF_RUNTIME_PROFILE=ON, regardless of the ESMF_RUNTIME_PROFILE_PETLIST setting.
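The PET list syntax is easy to reproduce when generating or validating settings in a job script. The following Python sketch parses a list such as "0 20 35-39" into a set of PET numbers. The function parse_pet_list is hypothetical and only mirrors the rules stated in the text (space-separated numbers, inclusive "X-Y" ranges with X < Y, PET 0 always included); it is not ESMF's own parser.

```python
from typing import Set


def parse_pet_list(spec: str, always_include_pet0: bool = True) -> Set[int]:
    """Expand a space-separated PET list with inclusive X-Y ranges."""
    pets: Set[int] = set()
    for token in spec.split():
        if "-" in token:
            first_text, last_text = token.split("-", 1)
            first, last = int(first_text), int(last_text)
            if first >= last:
                raise ValueError(f"range {token!r} must satisfy X < Y")
            # Ranges are inclusive on both ends.
            pets.update(range(first, last + 1))
        else:
            pets.add(int(token))
    if always_include_pet0:
        # PET 0 is profiled regardless of the list.
        pets.add(0)
    return pets


selected = parse_pet_list("20 35-39")
```

Here selected contains PET 0 even though the list does not name it, matching the exception described in the text.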
MPI functions can be included in the timing profile to indicate how much time is spent inside communication calls. This can also help to determine load imbalance in the system, since large times spent inside MPI may indicate that communication between PETs is not tightly synchronized. This option includes all MPI calls in the application, whether or not they originate from the ESMF library. Here is a partial example summary profile that contains MPI times:
Region                        PETs     Count  Mean (s)   Min (s)  Min PET   Max (s)  Max PET
[esm] RunPhase1                  8         1    4.9307    4.6867        0    4.9656        1
  [OCN] RunPhase1                8      1824    0.8344    0.8164        0    0.8652        1
  [MED] RunPhase1                8      1824    0.8203    0.7900        5    0.8584        1
  [ATM] RunPhase1                8      1824    0.6387    0.6212        5    0.6610        1
  [ATM-TO-MED] RunPhase1         8      1824    0.5975    0.5317        0    0.6583        5
    MPI_Bcast                    8      1824    0.0443    0.0025        4    0.1231        5
    MPI_Wait                     8  MULTIPLE    0.0421    0.0032        0    0.0998        2
  [MED-TO-OCN] RunPhase1         8      1824    0.4879    0.4497        0    0.5362        4
    MPI_Wait                     8  MULTIPLE    0.0234    0.0030        0    0.0821        4
    MPI_Bcast                    8      1824    0.0111    0.0024        4    0.0273        5
  [OCN-TO-MED] RunPhase1         8      1824    0.4541    0.4075        0    0.4918        4
    MPI_Wait                     8  MULTIPLE    0.0339    0.0017        0    0.0824        4
    MPI_Bcast                    8      1824    0.0194    0.0026        4    0.0452        6
  [MED-TO-ATM] RunPhase1         8      1824    0.4487    0.4005        0    0.4911        5
    MPI_Bcast                    8      1824    0.0338    0.0026        4    0.0942        5
    MPI_Wait                     8  MULTIPLE    0.0241    0.0022        1    0.0817        2
[esm] Init 1                     8         1    0.6287    0.6287        1    0.6287        4
  [ATM-TO-MED] IPDv05p6b         8         1    0.1501    0.1500        1    0.1501        2
      MPI_Barrier                8       242    0.0082    0.0006        3    0.0157        7
      MPI_Wait                   8  MULTIPLE    0.0034    0.0010        0    0.0053        7
      MPI_Allreduce              8        62    0.0030    0.0003        3    0.0063        7
      MPI_Alltoall               8         6    0.0015    0.0000        1    0.0022        5
      MPI_Allgather              8        21    0.0010    0.0002        1    0.0017        7
      MPI_Waitall                8  MULTIPLE    0.0006    0.0001        3    0.0015        7
      MPI_Send                   8  MULTIPLE    0.0004    0.0001        7    0.0008        6
      MPI_Allgatherv             8         6    0.0001    0.0001        4    0.0001        0
      MPI_Scatter                8         5    0.0000    0.0000        0    0.0000        7
      MPI_Reduce                 8         5    0.0000    0.0000        1    0.0000        0
      MPI_Recv                   8  MULTIPLE    0.0000    0.0000        0    0.0000        3
      MPI_Bcast                  8         1    0.0000    0.0000        0    0.0000        7
The procedure for including MPI functions in the timing profile depends on whether the application is dynamically or statically linked. Most applications are dynamically linked; however, on some systems (such as Cray), static linking may be used. Note that for either option, ESMF must be built with ESMF_TRACE_LIB_BUILD=ON, which is the default.
In dynamically linked applications, the LD_PRELOAD (Linux) or DYLD_INSERT_LIBRARIES (Darwin) environment variable must be used when executing the MPI application. This instructs the dynamic loader to interpose certain MPI symbols so they can be captured by the ESMF profiler. To simplify this process, a script is provided at $(ESMF_INSTALL_LIBDIR)/preload.sh that sets the appropriate variable.
For example, if you typically execute your application as follows:
$ mpirun -np 8 ./myApp
then you should add the preload.sh script in front of the executable when starting the application as follows:
# replace $(ESMF_INSTALL_LIBDIR) with absolute path
# ... to the ESMF installation lib directory
$ mpirun -np 8 $(ESMF_INSTALL_LIBDIR)/preload.sh ./myApp
An advantage of this approach is that your application does not need to be recompiled. The MPI timing information will be included in the per-PET profiles and/or the summary profile, depending on the setting of environment variable ESMF_RUNTIME_PROFILE_OUTPUT.
Notice that an additional step is required for dynamically linked applications on Darwin systems with System Integrity Protection (SIP) enabled! In addition to using the $(ESMF_INSTALL_LIBDIR)/preload.sh script during launching of the executable as shown above, the executable must also be linked against the dynamic ESMF trace preload library. This must happen during the link step of the executable. It is most easily accomplished by using variable $(ESMF_F90ESMFPRELOADLINKLIBS) instead of the typical $(ESMF_F90ESMFLINKLIBS) variable for the final link command. Both variables are defined in the esmf.mk file that should be imported by the application Makefile. For example:
# import esmf.mk
include $(ESMFMKFILE)

# other makefile targets here...

# example final link command, with $(ESMF_F90ESMFPRELOADLINKLIBS)
myApp: myApp.o driver.o model.o
	$(ESMF_F90LINKER) $(ESMF_F90LINKOPTS) $(ESMF_F90LINKPATHS) \
	  $(ESMF_F90LINKRPATHS) -o $@ $^ $(ESMF_F90ESMFPRELOADLINKLIBS)
In statically linked applications, the application must be re-linked with specific options provided to the linker. These options instruct the linker to wrap the MPI symbols with the ESMF profiling functions. The linking flags that must be provided are included in the esmf.mk Makefile fragment that is part of the ESMF installation. These link flags should be imported into your application Makefile, and included in the final link command. To do this, first import the esmf.mk file into your application Makefile. The path to this file is typically stored in the ESMFMKFILE environment variable. Then, pass the variables $(ESMF_TRACE_STATICLINKOPTS) and $(ESMF_TRACE_STATICLINKLIBS) to the final linking command. For example:
# import esmf.mk
include $(ESMFMKFILE)

# other makefile targets here...

# example final link command, with $(ESMF_TRACE_STATICLINKOPTS)
# ... and $(ESMF_TRACE_STATICLINKLIBS) added
myApp: myApp.o driver.o model.o
	$(ESMF_F90LINKER) $(ESMF_F90LINKOPTS) $(ESMF_F90LINKPATHS) \
	  $(ESMF_F90LINKRPATHS) -o $@ $^ $(ESMF_F90ESMFLINKLIBS) \
	  $(ESMF_TRACE_STATICLINKOPTS) $(ESMF_TRACE_STATICLINKLIBS)
This option will statically wrap all of the MPI functions and include them in the profile output. Execute the application in the normal way with the environment variable ESMF_RUNTIME_PROFILE set to ON. You will see the MPI functions included in the timing profile.
ESMF tracing is disabled by default. To enable tracing, set the ESMF_RUNTIME_TRACE environment variable to ON. You do not need to recompile your code to enable tracing.
# csh shell
$ setenv ESMF_RUNTIME_TRACE ON
# bash shell
$ export ESMF_RUNTIME_TRACE=ON
When enabled, the default behavior is to trace all PETs of the ESMF application. Although the ESMF tracer is designed to write events in a compact form, tracing can produce an extremely large number of events depending on the total number of PETs and the length of the run. To reduce output, it is possible to restrict the PETs that produce trace output by setting the ESMF_RUNTIME_TRACE_PETLIST environment variable. For example, this setting:
$ setenv ESMF_RUNTIME_TRACE_PETLIST "0 101 192-196"
will instruct the tracer to only trace PETs 0, 101, and 192 through 196 (inclusive). The syntax of this environment variable is to list PET numbers separated by spaces. PET ranges are also supported using the “X-Y” syntax where X < Y. For PET counts greater than 100, it is recommended to set this environment variable. The one exception is that PET 0 is always traced, regardless of the ESMF_RUNTIME_TRACE_PETLIST setting.
ESMF's profiling and tracing options can be used together. A typical use would be to set ESMF_RUNTIME_PROFILE=ON for all PETs to capture summary timings, and set ESMF_RUNTIME_TRACE=ON and ESMF_RUNTIME_TRACE_PETLIST to a subset of PETs, such as the root PET of each ESMF component. This helps to keep trace sizes small while still providing timing summaries over all PETs.
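As a sketch of this combined setup in a Bourne-style shell, the settings below enable profiling everywhere and tracing on a few PETs. The PET numbers 0, 64, and 128 are illustrative assumptions standing in for the root PETs of three components; substitute the values that match your own layout.

```shell
# Summary timings for every PET.
export ESMF_RUNTIME_PROFILE=ON

# Detailed event traces, restricted to a subset of PETs.
export ESMF_RUNTIME_TRACE=ON

# Assumed layout for illustration: components rooted at PETs 0, 64, and 128.
export ESMF_RUNTIME_TRACE_PETLIST="0 64 128"
```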
When tracing is enabled, phase_enter and phase_exit events will automatically be recorded for all initialize, run, and finalize phases of all Components in the application. To trace only user-instrumented regions (via the ESMF_TraceRegionEnter() and ESMF_TraceRegionExit() calls), Component-level tracing can be turned off by setting:
$ setenv ESMF_RUNTIME_TRACE_COMPONENT OFF
After running an ESMF application with tracing enabled, a directory called traceout will be created in the run directory and it will contain a metadata file and an event stream file esmf_stream_XXXX for each PET with tracing enabled. Together these files form a valid CTF trace which may be analyzed with any of the tools listed above.
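A quick sanity check of the output directory can be sketched as follows. The count_trace_streams function is a hypothetical helper, not an ESMF tool. It assumes the metadata file is literally named metadata, as is conventional for CTF traces, and that stream files match the esmf_stream_ prefix described above.

```shell
# Hypothetical helper (not part of ESMF): verify that a trace directory
# contains a metadata file and print the number of per-PET stream files.
count_trace_streams() {
    dir=${1:-traceout}
    # Assumption: the CTF metadata file is named "metadata".
    if [ ! -f "$dir/metadata" ]; then
        echo "no metadata file found in $dir" >&2
        return 1
    fi
    n=0
    for f in "$dir"/esmf_stream_*; do
        # Skip the unexpanded pattern when no stream files exist.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Calling count_trace_streams traceout from the run directory should print one count per traced PET, which can be compared against the PET list that was requested.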
Trace events are flushed to file at a regular interval. If the application crashes, some of the most recent events may not be flushed to file. To maximize the number of events appearing in the trace, an option is available to flush events to file more frequently. Because this option may have negative performance implications due to increased file I/O, it is not recommended unless needed. To turn on eager flushing use:
$ setenv ESMF_RUNTIME_TRACE_FLUSH EAGER
There are three options for the kind of clock to use to timestamp events when profiling/tracing an application. These options are controlled by setting the environment variable ESMF_RUNTIME_TRACE_CLOCK.
REALTIME | The REALTIME clock timestamps events with the current time on the system. This is the default clock if the above environment variable is not set. This setting can be useful when tracing PETs that span multiple physical computing nodes assuming that the system clocks on each node are adequately synchronized. On most HPC systems, system clocks are periodically updated to stay in sync. A disadvantage of this clock is that periodic adjustments mean the clock is not monotonically increasing so some timings may be inaccurate if the system clock jumps forward or backward significantly. Testing has shown that this is not typically an issue on most systems. |
MONOTONIC | The MONOTONIC clock is guaranteed to be monotonically increasing and does not suffer from periodic adjustments. The timestamps represent an amount of time since some arbitrary point in the past. There is no guarantee that these timestamps will be synchronized across physical computing nodes, so this option should only be used for tracing a set of PETs running on a single physical machine. |
MONOTONIC_SYNC | The MONOTONIC_SYNC clock is similar to the MONOTONIC clock in that it is guaranteed to be monotonically increasing. In addition, at application startup, all PET clocks are synchronized to a common time by determining a PET-local offset to be applied to timestamps. Therefore this option can be used to compare trace streams across physical nodes. |
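The guidance in the table can be turned into a small selection rule, sketched below. The choose_trace_clock function and its node-count argument are hypothetical, and the rule is only one possible policy: it picks MONOTONIC for a single node and MONOTONIC_SYNC otherwise, leaving REALTIME as the default when the variable is unset.

```shell
# Hypothetical helper (not part of ESMF): pick a trace clock from the
# number of physical nodes used by the run.
choose_trace_clock() {
    if [ "$1" -le 1 ]; then
        # Single node: timestamps need not be synchronized across machines.
        echo MONOTONIC
    else
        # Multiple nodes: use per-PET offsets computed at startup.
        echo MONOTONIC_SYNC
    fi
}

# Example: a run spanning 4 nodes (illustrative value).
export ESMF_RUNTIME_TRACE_CLOCK=$(choose_trace_clock 4)
echo "$ESMF_RUNTIME_TRACE_CLOCK"
```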
The ESMF Fortran I/O and System utilities provide portable methods to access capabilities which are often implemented in different ways amongst different environments. These utility methods are divided into three groups: command line access, Fortran I/O, and sorting.
Command line arguments may be accessed using three methods: ESMF_UtilGetArg() returns a given command line argument, ESMF_UtilGetArgC() returns a count of the number of command line arguments available, and ESMF_UtilGetArgIndex() returns the index of a desired argument value, given its keyword name.
Two I/O methods are implemented: ESMF_UtilIOUnitGet(), which obtains an unopened Fortran unit number within the range of unit numbers that ESMF is allowed to use, and ESMF_UtilIOUnitFlush(), which flushes the I/O buffer associated with a specific Fortran unit.
Finally, the ESMF_UtilSort() method sorts integer, floating point, and character string data types in either ascending or descending order.
call ESMF_UtilIOUnitGet (unit=grid_unit, rc=rc)
open (unit=grid_unit, file='grid_data.dat', status='old', action='read')
By default, unit numbers between 50 and 99 are scanned to find an unopened unit number.
Internally, ESMF also uses ESMF_UtilIOUnitGet() when it needs to open Fortran unit numbers for file I/O. By using the same API for both user and ESMF code, unit number collisions can be avoided.
When integrating ESMF into an application where there are conflicts with other uses of the same unit number range, such as when hard-coded unit number values are used, an alternative unit number range can be specified. The ESMF_Initialize() optional arguments IOUnitLower and IOUnitUpper may be set as needed. Note that IOUnitUpper must be set to a value higher than IOUnitLower, and that both must be non-negative. Otherwise, ESMF_Initialize() returns a return code of ESMF_FAILURE. ESMF itself does not typically need more than about five units for internal use.
call ESMF_Initialize (..., IOUnitLower=120, IOUnitUpper=140)
All current Fortran environments have preconnected unit numbers, such as units 5 and 6 for standard input and output, in the single digit range. So it is recommended that the unit number range is chosen to begin at unit 10 or higher to avoid these preconnected units.
Fortran run-time libraries generally use buffering techniques to improve I/O performance. However, output buffering can be problematic when output is needed but is “trapped” in the buffer because the buffer is not yet full. This is a common occurrence when debugging a program by inserting WRITE statements to track down the bad area of code. If the program crashes before the output buffer has been flushed, the desired debugging output may never be seen, giving a misleading indication of where the problem occurred. It is therefore desirable to ensure that the output buffer is flushed at predictable points in the program. Likewise, in parallel code, predictable flushing of output buffers is a common requirement, often in conjunction with ESMF_VMBarrier() calls.
The ESMF_UtilIOUnitFlush() API is provided to flush a unit as desired. Here is an example of code which prints debug values, and serializes the output to a terminal in PET order:
type(ESMF_VM) :: vm
integer :: tty_unit
integer :: i, me, npets
integer :: rc

call ESMF_Initialize (vm=vm, rc=rc)
call ESMF_VMGet (vm, localPet=me, petCount=npets)

call ESMF_UtilIOUnitGet (unit=tty_unit)
open (unit=tty_unit, file='/dev/tty', status='old', action='write')
...
! Each PET writes in turn; the barrier keeps the output in PET order
call ESMF_VMBarrier (vm=vm)
do i=0, npets-1
  if (i == me) then
    write (tty_unit, *) 'PET: ', i, ', values are: ', a, b, c
    call ESMF_UtilIOUnitFlush (unit=tty_unit)
  end if
  call ESMF_VMBarrier (vm=vm)
end do
When ESMF needs to open a Fortran I/O unit, it calls ESMF_UtilIOUnitGet() to find an unopened unit number. As delivered, the range of unit numbers searched runs from ESMF_LOG_FORTRAN_UNIT_NUMBER (normally set to 50) to ESMF_LOG_UPPER (normally set to 99). Unopened unit numbers are found by using the Fortran INQUIRE statement.
When integrating ESMF into an application where there are conflicts with other uses of the same unit number range, an alternative range can be specified in the ESMF_Initialize() call by setting the IOUnitLower and IOUnitUpper arguments as needed. ESMF_UtilIOUnitGet() will then search the alternate range of unit numbers. Note that IOUnitUpper must be set to a value higher than IOUnitLower, and that both must be non-negative. Otherwise, ESMF_Initialize() returns a return code of ESMF_FAILURE.
Fortran unit numbers are not standardized in the Fortran 90 standard, which only requires that they be non-negative integers. Beyond that, it is up to compiler writers and application developers to provide and use units that work with the particular implementation. For example, units 5 and 6 are a de facto standard for “standard input” and “standard output”, even though this is not specified in the actual Fortran standard. The Fortran standard also does not specify which unit numbers can be used, nor how many can be open simultaneously.
Since all current compilers have preconnected unit numbers, and these are typically found on units lower than 10, it is recommended that applications use unit numbers 10 and higher.
When ESMF needs to flush a Fortran unit, the ESMF_UtilIOUnitFlush() API is used to centralize the file flushing capability, because Fortran has not historically had a standard mechanism for flushing output buffers. Most compilers' run-time libraries support library extensions that provide this functionality, though, being non-standard, their spelling and number of arguments vary between implementations. Fortran 2003 provides a FLUSH statement that is built into the language. When possible, ESMF_UtilIOUnitFlush() uses the F2003 FLUSH statement. With older compilers, the appropriate library call is made.
The ESMF_UtilSort() algorithms are the same as those in the LAPACK sorting procedures SLASRT() and DLASRT(). Two algorithms are used. For small sorts (arrays with 20 or fewer elements), a simple insertion sort is used. For larger sorts, a Quicksort algorithm is used.
Compared to the original LAPACK code, a full Fortran 90 style interface is supported for ease of use and enhanced compile time checking. Additional support is also provided for integer and character string data types.