5 Infrastructure: Utilities

39 Overview of Infrastructure Utility Classes

The ESMF utilities are a set of tools for quickly assembling modeling applications.

The ESMF Info class enables models to be self-describing via metadata, which are instances of JSON-compatible key-value pairs.

The Time Management Library provides utilities for representing and calculating with times and time intervals, and higher-level utilities that control model time stepping via clocks and alarms.

The ESMF Config class provides configuration management based on NASA DAO's Inpak package, a collection of methods for accessing files containing input parameters stored in an ASCII format.

The ESMF LogErr class consists of a variety of methods for writing error, warning, and informational messages to log files. A default Log is created during ESMF initialization. Other Logs can be created later in the code by the user.

The DELayout class provides a layer of abstraction on top of the Virtual Machine (VM) layer. DELayout does this by introducing DEs (Decomposition Elements) as logical resource units. The DELayout object keeps track of the relationship between its DEs and the resources of the associated VM object. A DELayout can be shaped by the user at creation time to best match the computational problem or other design criteria.

The ESMF VM (Virtual Machine) class is a generic representation of hardware and system software resources. There is exactly one VM object per ESMF Component, providing the execution environment for the Component code. The VM class handles all resource management tasks for the Component class and provides a description of the underlying configuration of the compute resources used by a Component. In addition to resource description and management, the VM class offers the lowest level of ESMF communication methods.

The ESMF Fortran I/O utilities provide portable methods to access capabilities which are often implemented in different ways amongst different environments. Currently, two utility methods are implemented: one to find an unopened unit number, and one to flush an I/O buffer.

40 Info Class (Object Attributes)

All ESMF base objects (i.e. Array, ArrayBundle, Field, FieldBundle, Grid, Mesh, DistGrid) contain a key-value attribute storage object called ESMF_Info. ESMF_Info objects may also be created independent of a base object. ESMF_Info supports setting and getting key-value pairs where the key is a string and the value is a scalar or a list of common data types. An ESMF_Info object may have a flat or nested data structure. The purpose of ESMF_Info is to support I/O-compatible metadata structures (e.g. netCDF), to support internal record-keeping for model execution (NUOPC), and to provide a mechanism for custom user metadata attributes.

ESMF_Info is designed for interoperability. To achieve this goal, ESMF_Info adopted the JSON (JavaScript Object Notation) specification. Internally, ESMF_Info uses JSON for Modern C++ [1] to manage its storage map. There are numerous resources for JSON on the web [6]. Quoting from the json.org site [6] when it introduces the format:

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language. JSON is built on two structures:

A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.

These are universal data structures. Virtually all modern programming languages support them in one form or another. It makes sense that a data format that is interchangeable with programming languages also be based on these structures.

By adopting JSON compliance for ESMF_Info, ESMF made its core metadata capabilities explicitly interoperable with a widely used data structure. If data may be represented with JSON, then it is compatible with ESMF_Info.

There are some aspects of the ESMF_Info implementation related to JSON and JSON for Modern C++ that should be noted:

JSON supports 64-bit data types for integers and reals ([3], [2]). I4/R4 is converted to I8/R8 and vice versa. ESMF_Info internally tracks 32-bit sets to ensure the data type may be appropriately queried.
Each JSON object (e.g. a key-value pair) carries the memory overhead of an additional allocator pointer for type generalization [5]. The JSON map trades compact storage for flexibility and is therefore not suited to storing large data.
Keys are stored in a map that does not preserve insertion order; they are kept sorted in lexicographical order.

40.1 Migrating from Attribute

The ESMF_Info class is a replacement for the ESMF_Attribute class and is the preferred way of managing metadata attributes in ESMF moving forward. It is recommended that users migrate existing ESMF_Attribute calls to the new ESMF_Info API. The ESMF_Info class has provided the backend for ESMF_Attribute since ESMF version 8.1. The ESMF_Attribute documentation is located in appendix 57. In practice, users should experience no friction when migrating client code. Please email ESMF support in the case of a migration issue. Some structural changes to ESMF_Attribute did occur:

Changed behavior when getting fixed-size lists. List size in storage must match the size of the outgoing list.
Removed ability to use a default value with list gets.
Removed attPackInstanceName from all interfaces.
Removed attcopyFlag from all interfaces.
Removed ESMF_Attribute-managed object linking.
Modified ESMF_AttributeAdd to set the target key to a null JSON value.
Modified ESMF_AttributeSet to not require an attribute added to an ESMF_AttPack be added through ESMF_AttributeAdd before setting.
Removed support for attribute XML I/O.
Removed ability to add multiple nested Attribute packages.
Removed retrieval of "internal" ESMF object Attributes.

Below are examples for setting and getting an attribute using ESMF_Info and the legacy ESMF_Attribute. The ESMF_Info interfaces are not overloaded for ESMF object types but rather work off a handle retrieved via a get call.

40.1.1 Setting an Attribute

With ESMF_Attribute:

call ESMF_AttributeSet(array, "aKey", 15, rc=rc)

With ESMF_Info:

call ESMF_InfoGetFromHost(array, info, rc=rc)
call ESMF_InfoSet(info, "aKey", 15, rc=rc)

Notice that the legacy ESMF_Attribute API expects the usage of what was called an "Attribute Package". This essentially corresponds to a namespace similar to what ESMF_Info provides for keys via the JSON Pointer syntax (see 40.2). In the above ESMF_AttributeSet() call, without specification of convention and purpose arguments, the resulting JSON pointer of the key is "/ESMF/General/aKey". This is important to account for when mixing deprecated ESMF_Attribute calls with the ESMF_Info API.

40.1.2 Getting an Attribute

With ESMF_Attribute:

call ESMF_AttributeGet(array, "aKey", aKeyValue, rc=rc)

With ESMF_Info:

call ESMF_InfoGetFromHost(array, info, rc=rc)
call ESMF_InfoGet(info, "aKey", aKeyValue, rc=rc)

Notice again that the ESMF_Attribute API automatically prepends "/ESMF/General/" to the JSON pointer used for key in the absence of convention and purpose arguments.
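
The namespace behavior described in 40.1.1 and 40.1.2 can be pictured with a plain nested map. The following is a standalone Python sketch, not ESMF code. The helpers `legacy_attribute_pointer` and `set_by_pointer` are hypothetical, and the general `/<convention>/<purpose>/<key>` layout is an assumption extrapolated from the documented default "/ESMF/General/aKey".

```python
import json


def legacy_attribute_pointer(key, convention="ESMF", purpose="General"):
    """Build the JSON pointer a legacy attribute key is assumed to map to."""
    return "/" + "/".join([convention, purpose, key])


def set_by_pointer(storage, pointer, value):
    """Store value in a nested dict at the location named by pointer."""
    parts = pointer.lstrip("/").split("/")
    node = storage
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


storage = {}
# Legacy-style set: the key lands under the default namespace.
set_by_pointer(storage, legacy_attribute_pointer("aKey"), 15)
# Info-style set: the key is used as given, at the top level.
set_by_pointer(storage, "/aKey", 15)
print(json.dumps(storage, sort_keys=True))
# {"ESMF": {"General": {"aKey": 15}}, "aKey": 15}
```

The two sets end up at different locations in the same map, which is why a value written through one API is not found under the bare key by the other.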

40.2 Key Format Overview

A key in the ESMF_Info interface provides the location of a value to retrieve from the key-value storage. Keys in the ESMF_Info class use the JSON Pointer syntax [4]. A forward slash is prepended to a string key if one is not already present. Hence, "aKey" and "/aKey" are equivalent. Note that the array-indexing part of the JSON Pointer syntax is not supported.

Some examples for valid "key" arguments:

altitude :: A simple key argument with no nesting.
/altitude :: A simple key argument with no nesting with the prepended pointer forward slash.
/altitude/height_above_mean_sea_level :: A key for an attribute "height_above_mean_sea_level" nested in a map identified with key "altitude".
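
The key rules above can be sketched in a few lines of standalone Python. This is an illustration only and not the ESMF implementation: `normalize_key` and `get_by_key` are hypothetical helpers, the stored value 1250.0 is made up, and JSON Pointer escape sequences are ignored for brevity.

```python
def normalize_key(key):
    """Prepend the leading slash when the caller omitted it."""
    return key if key.startswith("/") else "/" + key


def get_by_key(storage, key):
    """Walk a nested dict along the slash-separated parts of key."""
    node = storage
    for part in normalize_key(key).split("/")[1:]:
        if not isinstance(node, dict):
            # Only maps can be traversed; list indexing is not supported.
            raise KeyError(key)
        node = node[part]
    return node


storage = {"altitude": {"height_above_mean_sea_level": 1250.0}}

print(normalize_key("altitude") == normalize_key("/altitude"))  # True
print(get_by_key(storage, "/altitude/height_above_mean_sea_level"))  # 1250.0
```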

40.3 Usage and Examples

40.4 Class API

41 Time Manager Utility

The ESMF Time Manager utility includes software for time and date representation and calculations, model time advancement, and the identification of unique and periodic events. Since multi-component geophysical applications often require synchronization across the time management schemes of the individual components, the Time Manager's standard calendars and consistent time representation promote component interoperability.

Key Features

Drift-free timekeeping through an integer-based internal time representation. Both integers and reals can be specified at the interface.

The ability to represent time as a rational fraction, to support exact timekeeping in applications that involve grid refinement.

Support for many calendar kinds, including user-customized calendars.

Support for both concurrent and sequential modes of component execution.

Support for varying and negative time steps.

41.1 Time Manager Classes

There are five ESMF classes that represent time concepts:

Calendar: A Calendar can be used to keep track of the date as an ESMF Gridded Component advances in time. Standard calendars (such as Gregorian and 360-day) and user-specified calendars are supported. Calendars can be queried for quantities such as seconds per day, days per month, and days per year.
Time: A Time represents a time instant in a particular calendar, such as November 28, 1964, at 7:31pm EST in the Gregorian calendar. The Time class can be used to represent the start and stop time of a time integration.
TimeInterval: TimeIntervals represent a period of time, such as 300 milliseconds. Time steps can be represented using TimeIntervals.
Clock: Clocks collect the parameters and methods used for model time advancement into a convenient package. A Clock can be queried for quantities such as start time, stop time, current time, and time step. Clock methods include incrementing the current time, and determining if it is time to stop.
Alarm: Alarms identify unique or periodic events by "ringing" (returning a true value) at specified times. For example, an Alarm might be set to ring on the day of the year when leaves start falling from the trees in a climate model.

[Figure: TimeMgr_desc]

In the remainder of this section, we briefly summarize the functionality that the Time Manager classes provide. Detailed descriptions and usage examples precede the API listing for each class.

41.2 Calendar

An ESMF Calendar can be queried for seconds per day, days per month and days per year. The flexible definition of Calendars allows them to be defined for planetary bodies other than Earth. The set of supported calendars includes:

Gregorian: The standard Gregorian calendar.
no-leap: The Gregorian calendar with no leap years.
Julian: The standard Julian date calendar.
Julian Day: The standard Julian days calendar.
Modified Julian Day: The Modified Julian days calendar.
360-day: A 30-day-per-month, 12-month-per-year calendar.
no calendar: Tracks only elapsed model time in hours, minutes, seconds.

See Section 42.1 for more details on supported standard calendars, and how to create a customized ESMF Calendar.

41.3 Time Instants and TimeIntervals

TimeIntervals and Time instants (simply called Times) are the computational building blocks of the Time Manager utility. TimeIntervals support operations such as add, subtract, compare size, reset value, copy value, and subdivide by a scalar. Times, which are moments in time associated with specific Calendars, can be incremented or decremented by TimeIntervals, compared to determine which of two Times is later, differenced to obtain the TimeInterval between two Times, copied, reset, and manipulated in other useful ways. Times support a host of different queries, both for values of individual Time components such as year, month, day, and second, and for derived values such as day of year, middle of current month and Julian day. It is also possible to retrieve the value of the hardware realtime clock in the form of a Time. See Sections 43.1 and 44.1, respectively, for use and examples of Times and TimeIntervals.

Since climate modeling, numerical weather prediction and other Earth and space applications have widely varying time scales and require different sorts of calendars, Times and TimeIntervals must support a wide range of time specifiers, spanning nanoseconds to years. The interfaces to these time classes are defined so that the user can specify a time using a combination of units selected from the list shown in Table 3.

41.4 Clocks and Alarms

Although it is possible to repeatedly step a Time forward by a TimeInterval using arithmetic on these basic types, it is useful to identify a higher-level concept to represent this function. We refer to this capability as a Clock, and include in its required features the ability to store the start and stop times of a model run, to check when time advancement should cease, and to query the value of quantities such as the current time and the time at the previous time step. The Time Manager includes a class with methods that return a true value when a periodic or unique event has taken place; we refer to these as Alarms. Applications may contain temporary or multiple Clocks and Alarms. Sections 45.1 and 46.1 describe the use of Clocks and Alarms in detail.

**Table 3:** Specifiers for Times and TimeIntervals
Unit	Meaning
<yy\|yy_i8>	Year.
mm	Month of the year.
dd	Day of the month.
<d\|d_i8\|d_r8>	Julian or Modified Julian day.
<h\|h_r8>	Hour.
<m\|m_r8>	Minute.
<s\|s_i8\|s_r8>	Second.
<ms\|ms_r8>	Millisecond.
<us\|us_r8>	Microsecond.
<ns\|ns_r8>	Nanosecond.
O	Time zone offset in integer number of hours and minutes.
<sN\|sN_i8>	Numerator for times of the form s $+ \frac{{\rm sN}}{{\rm sD}}$ , where s is seconds and s, sN, and sD are integers. This format provides a mechanism for supporting exact behavior.
<sD\|sD_i8>	Denominator for times of the form s $+ \frac{{\rm sN}}{{\rm sD}}$ , where s is seconds and s, sN, and sD are integers.
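
To illustrate how a combination of specifiers from the table adds up to one exact value, here is a standalone Python sketch. It is not the ESMF interface: `to_seconds` is a hypothetical helper, it covers only the calendar-independent units (h, m, s, ms, us, ns, sN, sD), and it uses Python's `fractions.Fraction` as a stand-in for an integer-based rational representation.

```python
from fractions import Fraction

# Length of each calendar-independent unit, in seconds.
SECONDS_PER_UNIT = {
    "h": 3600,
    "m": 60,
    "s": 1,
    "ms": Fraction(1, 1000),
    "us": Fraction(1, 10**6),
    "ns": Fraction(1, 10**9),
}


def to_seconds(sN=0, sD=1, **units):
    """Sum the given units into an exact number of seconds."""
    total = Fraction(sN, sD)
    for name, amount in units.items():
        total += Fraction(amount) * SECONDS_PER_UNIT[name]
    return total


print(to_seconds(h=1, m=30, s=2, sN=1, sD=3))  # 16207/3
print(to_seconds(ms=300))                      # 3/10
```

The first call is 5402 whole seconds plus one third of a second, kept exact because the fraction is never converted to floating point.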

41.5 Design and Implementation Notes

Base TimeIntervals and Times on the same integer representation. It is useful to allow both TimeIntervals and Times to inherit from a single class, BaseTime. In C++, this can be implemented by using inheritance. In Fortran, it can be implemented by having the derived types TimeIntervals and Times contain a derived type BaseTime. In both cases, the BaseTime class can be made private and invisible to the user.
The result of this strategy is that TimeIntervals and Times gain a consistent core representation of time as well as a set of basic methods.
The BaseTime class can be designed with a minimum number of elements to represent any required time. The design is based on the idea used in the real-time POSIX 1003.1b-1993 standard. That is, to represent time simply as a pair of integers: one for seconds (whole) and one for nanoseconds (fractional). These can then be converted at the interface level to any desired format.
For ESMF, this idea can be modified and extended, in order to handle the requirements for a large time range (> 200,000 years) and to exactly represent any rational fraction, not just nanoseconds. To handle the large time range, a 64-bit or greater integer is used for whole seconds. Any rational fractional second is expressed using two additional integers: a numerator and a denominator. Both the whole seconds and fractional numerator are signed to handle negative time intervals and instants. For arithmetic consistency both must carry the same sign (both positive or both negative), except, of course, for zero values. The fractional seconds element (numerator) is bounded with respect to whole seconds. If the absolute value of the numerator becomes greater than or equal to the denominator, whole seconds are incremented or decremented accordingly and the numerator is reset to the remainder. Conversions are performed upon demand by interface methods within the TimeInterval and Time classes. This is done because different applications require different representations of time intervals and time instants. Floating point values as well as integers can be specified for the various time units in the interfaces (see Table 3). Floating point values are represented internally as integer-based rational fractions.
The BaseTime class defines increment and decrement methods for basic TimeInterval calculations between Time instants. It is done here rather than in the Calendar class because it can be done with simple second-based arithmetic that is calendar independent.
Comparison methods can also be defined in the BaseTime class. These perform equality/inequality, less than, and greater than comparisons between any two TimeIntervals or Times. These methods capture the common comparison logic between TimeIntervals and Times and hence are defined here for sharing.
The Time class depends on a calendar. The Time class contains an internal Calendar class. Upon demand by a user, the results of an increment or decrement operation are converted to user units, which may be calendar-dependent, via methods obtained from their internal Calendar.
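
The representation described in these notes (signed whole seconds plus a signed numerator over a denominator, both carrying the same sign, with overflow of the fraction folded into whole seconds) can be sketched as follows. This is a minimal standalone Python illustration under those stated rules, not the ESMF BaseTime source. The attribute names and the reduction of the fraction by its greatest common divisor are assumptions made for readability.

```python
from math import gcd


class BaseTime:
    """Whole seconds plus a rational fraction numer/denom of a second."""

    def __init__(self, seconds=0, numer=0, denom=1):
        if denom <= 0:
            raise ValueError("denominator must be positive")
        # Fold everything into one signed count of 1/denom units, then split
        # it so that |numer| < denom and both parts carry the same sign.
        total = seconds * denom + numer
        sign = -1 if total < 0 else 1
        whole, rest = divmod(abs(total), denom)
        divisor = gcd(rest, denom)
        self.seconds = sign * whole
        self.numer = sign * (rest // divisor)
        self.denom = denom // divisor

    def _scaled(self):
        """Value expressed as an integer count of 1/denom units."""
        return self.seconds * self.denom + self.numer

    def __add__(self, other):
        return BaseTime(self.seconds + other.seconds,
                        self.numer * other.denom + other.numer * self.denom,
                        self.denom * other.denom)

    def __sub__(self, other):
        return self + BaseTime(-other.seconds, -other.numer, other.denom)

    def __eq__(self, other):
        return self._scaled() * other.denom == other._scaled() * self.denom

    def __lt__(self, other):
        return self._scaled() * other.denom < other._scaled() * self.denom

    def __repr__(self):
        return f"BaseTime({self.seconds}, {self.numer}, {self.denom})"


print(BaseTime(1, 2, 3) + BaseTime(0, 2, 3))  # BaseTime(2, 1, 3)
print(BaseTime(0, -5, 3))                     # BaseTime(-1, -2, 3)
third = BaseTime(0, 1, 3)
print(third + third + third == BaseTime(1))   # True
```

Because only integers are involved, three steps of one third of a second sum to exactly one second, which is the drift-free property the design aims for.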

41.6 Object Model

The following is a simplified UML diagram showing the structure of the Time Manager utility. See Appendix A, A Brief Introduction to UML, for a translation table that lists the symbols in the diagram and their meaning.

[Figure: TimeMgr_obj, simplified UML diagram of the Time Manager utility]

42 Calendar Class

42.1 Description

The Calendar class represents the standard calendars used in geophysical modeling: Gregorian, Julian, Julian Day, Modified Julian Day, no-leap, 360-day, and no-calendar. It also supports a user-customized calendar. Brief descriptions are provided for each calendar below. For more information on standard calendars, see [20] and [17].

42.2 Constants

42.2.1 ESMF_CALKIND

DESCRIPTION:
Supported calendar kinds.

The type of this flag is:

type(ESMF_CalKind_Flag)

The valid values are:

ESMF_CALKIND_360DAY

Valid range: machine limits
In the 360-day calendar, there are 12 months, each of which has 30 days. Like the no-leap calendar, this is a simple approximation to the Gregorian calendar sometimes used by modelers.

ESMF_CALKIND_CUSTOM

Valid range: machine limits
The user can set calendar parameters in the generic calendar.

ESMF_CALKIND_GREGORIAN

Valid range: 3/1/4801 BC to 10/29/292,277,019,914
The Gregorian calendar is the calendar currently in use throughout Western countries. Named after Pope Gregory XIII, it is a minor correction to the older Julian calendar. In the Gregorian calendar every fourth year is a leap year in which February has 29 and not 28 days; however, years divisible by 100 are not leap years unless they are also divisible by 400. As in the Julian calendar, days begin at midnight.

ESMF_CALKIND_JULIAN

Valid range: 3/1/4713 BC to 4/24/292,271,018,333
The Julian calendar was introduced by Julius Caesar in 46 B.C., and reached its final form in 4 A.D. The Julian calendar differs from the Gregorian only in the determination of leap years, lacking the correction for years divisible by 100 and 400 in the Gregorian calendar. In the Julian calendar, any year is a leap year if divisible by 4. Days are considered to begin at midnight.

ESMF_CALKIND_JULIANDAY

Valid range: +/- $1 \times 10^{14}$
Julian days simply enumerate the days and fraction of a day which have elapsed since the start of the Julian era, defined as beginning at noon on Monday, 1st January of year 4713 B.C. in the Julian calendar. Julian days, unlike the dates in the Julian and Gregorian calendars, begin at noon.

ESMF_CALKIND_MODJULIANDAY

Valid range: +/- $1 \times 10^{14}$
The Modified Julian Day (MJD) was introduced by space scientists in the late 1950's. It is defined as an offset from the Julian Day (JD):

MJD = JD - 2400000.5

The half day is subtracted so that the day starts at midnight.

ESMF_CALKIND_NOCALENDAR

Valid range: machine limits
The no-calendar option simply tracks the elapsed model time in seconds.

ESMF_CALKIND_NOLEAP

Valid range: machine limits
The no-leap calendar is the Gregorian calendar with no leap years - February is always assumed to have 28 days. Modelers sometimes use this calendar as a simple, close approximation to the Gregorian calendar.
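
The leap-year rules and the Modified Julian Day offset quoted in the descriptions above can be written down directly. The following standalone Python sketch is illustrative only: the string labels for the calendar kinds and the function names are hypothetical, and the leap-year functions are meant for positive (A.D.) year numbers.

```python
def is_leap_year(year, calkind):
    """Apply the leap-year rule of the given calendar kind."""
    if calkind == "gregorian":
        # Every fourth year, except centuries not divisible by 400.
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    if calkind == "julian":
        # Every fourth year, with no century correction.
        return year % 4 == 0
    if calkind in ("noleap", "360day"):
        return False
    raise ValueError(f"unsupported calendar kind: {calkind}")


def days_in_year(year, calkind):
    """Number of days in a year for the given calendar kind."""
    if calkind == "360day":
        return 360  # 12 months of 30 days
    return 366 if is_leap_year(year, calkind) else 365


def jd_to_mjd(jd):
    """Convert a Julian Day to a Modified Julian Day: MJD = JD - 2400000.5."""
    return jd - 2400000.5


print(is_leap_year(1900, "gregorian"), is_leap_year(1900, "julian"))  # False True
print(days_in_year(2000, "gregorian"), days_in_year(2000, "noleap"))  # 366 365
print(jd_to_mjd(2451545.0))                                           # 51544.5
```

The year 1900 shows the only difference between the Gregorian and Julian rules: it is divisible by 100 but not by 400.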

42.3 Use and Examples

In most multi-component Earth system applications, the timekeeping in each component must refer to the same standard calendar in order for the components to properly synchronize. It therefore makes sense to create as few ESMF Calendars as possible, preferably one per application. A typical strategy would be to create a single Calendar at the start of an application, and use that Calendar in all subsequent calls that accept a Calendar, such as ESMF_TimeSet.

The following example shows how to set up an ESMF Calendar.

42.4 Restrictions and Future Work

Months per year set to 12. Due to the requirement of only Earth modeling, the number of months per year is hard-coded at 12. However, for easy modification, this is implemented via a C preprocessor #define MONTHS_PER_YEAR in ESMCI_Calendar.h.
Calendar date conversions. Date conversions are currently defined between the Gregorian, Julian, Julian Day, and Modified Julian Day calendars. Further research and work would need to be done to determine conversion algorithms with and between the other calendars: No Leap, 360 Day, and Custom.
ESMF_CALKIND_CUSTOM. Currently, there is no provision for a custom calendar to define a leap year rule, so ESMF_CalendarIsLeapYear() will always return .false. in this case. However, the arguments daysPerYear, daysPerYearDn, and daysPerYearDd in ESMF_CalendarCreate() and ESMF_CalendarSet() can be used to set a fractional number of days per year, for example, 365.25 = 365 25/100. Also, if further timekeeping precision is required, fractional and/or floating point secondsPerDay and secondsPerYear could be added to the interfaces ESMF_CalendarCreate(), ESMF_CalendarSet(), and ESMF_CalendarGet() and implemented.
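
As a small worked example of the fractional days-per-year arithmetic mentioned for custom calendars, the standalone Python sketch below combines a whole number of days with a numerator and denominator. The function `days_per_year` is hypothetical; its three parameters merely mirror the roles of the daysPerYear, daysPerYearDn, and daysPerYearDd arguments, and 86400 seconds per day is assumed.

```python
from fractions import Fraction


def days_per_year(whole, numerator=0, denominator=1):
    """Exact year length in days: whole + numerator/denominator."""
    return whole + Fraction(numerator, denominator)


year_length = days_per_year(365, 25, 100)
print(year_length, float(year_length))  # 1461/4 365.25
print(year_length * 86400)              # 31557600
```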

42.5 Class API

43 Time Class

43.1 Description

A Time represents a specific point in time. In order to accommodate the range of time scales in Earth system applications, Times in the ESMF can be specified in many different ways, from years to nanoseconds. The Time interface is designed so that you select one or more options from a list of time units in order to specify a Time. The options for specifying a Time are shown in Table 3.

There are Time methods defined for setting and getting a Time, incrementing and decrementing a Time by a TimeInterval, taking the difference between two Times, and comparing Times. Special quantities such as the middle of the month and the day of the year associated with a particular Time can be retrieved. There is a method for returning the Time value as a string in the ISO 8601 format YYYY-MM-DDThh:mm:ss [15].

A Time that is specified in hours, minutes, seconds, or subsecond intervals does not need to be associated with a standard calendar; a Time whose specification includes time units of a day and greater must be. The ESMF representation of a calendar, the Calendar class, is described in Section 42.1. The ESMF_TimeSet method is used to initialize a Time as well as associate it with a Calendar. If a Time method is invoked in which a Calendar is necessary and one has not been set, the ESMF method will return an error condition.

In the ESMF the TimeInterval class is used to represent time periods. This class is frequently used in combination with the Time class. The Clock class, for example, advances model time by incrementing a Time with a TimeInterval.

43.2 Use and Examples

Times are most frequently used to represent start, stop, and current model times. The following examples show how to create, initialize, and manipulate Time.

43.3 Restrictions and Future Work

Limits on size and resolution of Time. The limits on the size and resolution of the time representation are based on the 64-bit integer types used. For seconds, a signed 64-bit integer will have a range of +/- ($2^{63}$ - 1), or +/- 9,223,372,036,854,775,807. This corresponds to a maximum size of +/- ($2^{63}$ - 1)/(86400 * 365.25) or +/- 292,271,023,045 years.
For fractional seconds, a signed 64-bit integer will handle a resolution of +/- ($2^{63}$ - 1), or +/- 9,223,372,036,854,775,807 parts of a second.
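
The figures for the seconds range can be checked with integer arithmetic. This is a standalone Python check of the numbers quoted above, using 365.25 days of 86400 seconds per year as in the text.

```python
MAX_INT64 = 2**63 - 1
# 86400 s/day * 365.25 days/year, kept as an exact integer.
SECONDS_PER_YEAR = 86400 * 36525 // 100

print(f"{MAX_INT64:,}")                      # 9,223,372,036,854,775,807
print(f"{MAX_INT64 // SECONDS_PER_YEAR:,}")  # 292,271,023,045
```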

43.4 Class API

44 TimeInterval Class

44.1 Description

A TimeInterval represents a period between time instants. It can be either positive or negative. Like the Time interface, the TimeInterval interface is designed so that you can choose one or more options from a list of time units in order to specify a TimeInterval. See Table 3 in Section 41.4 for the available options.

There are TimeInterval methods defined for setting and getting a TimeInterval, for incrementing and decrementing a TimeInterval by another TimeInterval, and for multiplying and dividing TimeIntervals by integers, reals, fractions and other TimeIntervals. Methods are also defined to take the absolute value and negative absolute value of a TimeInterval, and for comparing the length of two TimeIntervals.

The class used to represent time instants in ESMF is Time, and this class is frequently used in operations along with TimeIntervals. For example, the difference between two Times is a TimeInterval.

When a TimeInterval is used in calculations that involve an absolute reference time, such as incrementing a Time with a TimeInterval, calendar dependencies may be introduced. The length of the time period that the TimeInterval represents will depend on the reference Time and the standard calendar that is associated with it. The calendar dependency becomes apparent when, for example, adding a TimeInterval of 1 day to the Time of February 28, 1996, at 4:00pm EST. In a 360 day calendar, the resulting date would be February 29, 1996, at 4:00pm EST. In a no-leap calendar, the result would be March 1, 1996, at 4:00pm EST.
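
The February 28 example can be reproduced with a few lines of standalone Python. This sketch is not the ESMF algorithm: the calendar labels and function names are hypothetical, and only the two calendars named in the example are handled.

```python
# Month lengths of a year without a leap day.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def month_length(month, calkind):
    """Days in the given month (1-12) for the given calendar kind."""
    if calkind == "360day":
        return 30
    if calkind == "noleap":
        return DAYS_IN_MONTH[month - 1]
    raise ValueError(f"unsupported calendar kind: {calkind}")


def add_one_day(year, month, day, calkind):
    """Return the (year, month, day) that follows the given date."""
    day += 1
    if day > month_length(month, calkind):
        day = 1
        month += 1
        if month > 12:
            month = 1
            year += 1
    return year, month, day


print(add_one_day(1996, 2, 28, "360day"))  # (1996, 2, 29)
print(add_one_day(1996, 2, 28, "noleap"))  # (1996, 3, 1)
```

The same one-day TimeInterval therefore lands on different dates, because February has 30 days in the 360-day calendar and 28 in the no-leap calendar.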

TimeIntervals are used by other parts of the ESMF timekeeping system, such as Clocks (Section 45.1) and Alarms (Section 46.1).

44.2 Use and Examples

A typical use for a TimeInterval in a geophysical model is representation of the time step by which the model is advanced. Some models change the size of their time step as the model run progresses; this could be done by incrementing or decrementing the original time step by another TimeInterval, or by dividing or multiplying the time step by an integer value. An example of advancing model time using a TimeInterval representation of a time step is shown in Section 45.1.

The following brief example shows how to create, initialize and manipulate TimeInterval.

44.3 Restrictions and Future Work

Limits on time span. The limits on the time span that can be represented are based on the 64-bit integer types used. For seconds, a signed 64-bit integer will have a range of +/- $2^{63}$ -1, or +/- 9,223,372,036,854,775,807. This corresponds to a range of +/- ( $2^{63}$ -1)/(86400 * 365.25) or +/- 292,271,023,045 years.
For fractional seconds, a signed 64-bit integer will handle a resolution of +/- ($2^{63}$ - 1), or +/- 9,223,372,036,854,775,807 parts of a second.

44.4 Class API

45 Clock Class

45.1 Description

The Clock class advances model time and tracks its associated date on a specified Calendar. It stores start time, stop time, current time, previous time, and a time step. It can also store a reference time, typically the time instant at which a simulation originally began. For a restart run, the reference time can differ from the start time, which is the time at which application execution resumes.

A user can call the ESMF_ClockSet method and reset the time step as desired.

A Clock also stores a list of Alarms, which can be set to flag events that occur at a specified time instant or at a specified time interval. See Section 46.1 for details on how to use Alarms.

There are methods for setting and getting the Times and Alarms associated with a Clock. Methods are defined for advancing the Clock's current time, checking if the stop time has been reached, reversing direction, and synchronizing with a real clock.

45.2 Constants

45.2.1 ESMF_DIRECTION

DESCRIPTION:
Specifies the time-stepping direction of a clock. Use with "direction" argument to methods ESMF_ClockSet() and ESMF_ClockGet(). Cannot be used with method ESMF_ClockCreate(), since it only initializes a clock in the default forward mode; a clock must be advanced (timestepped) at least once before reversing direction via ESMF_ClockSet(). This also holds true for negative timestep clocks which are initialized (created) with stopTime < startTime, since "forward" means timestepping from startTime towards stopTime (see ESMF_DIRECTION_FORWARD below).

"Forward" and "reverse" directions are distinct from positive and negative timesteps. "Forward" means timestepping in the direction established at ESMF_ClockCreate(), from startTime towards stopTime, regardless of the timestep sign. "Reverse" means timestepping in the opposite direction, back towards the clock's startTime, regardless of the timestep sign.

Clocks and alarms run in reverse in such a way that the state of a clock and its alarms after each time step is precisely replicated as it was in forward time-stepping mode. All methods which query clock and alarm state will return the same result for a given timeStep, regardless of the direction of arrival.

The type of this flag is:

type(ESMF_Direction_Flag)

The valid values are:

ESMF_DIRECTION_FORWARD: Upon calling ESMF_ClockAdvance(), the clock will timestep from its startTime toward its stopTime. This is the default direction. A user can use either ESMF_ClockIsStopTime() or ESMF_ClockIsDone() methods to determine when stopTime is reached. This forward behavior also holds for negative timestep clocks which are initialized (created) with stopTime < startTime.
ESMF_DIRECTION_REVERSE: Upon calling ESMF_ClockAdvance(), the clock will timestep backwards toward its startTime. Use method ESMF_ClockIsDone() to determine when startTime is reached. This reverse behavior also holds for negative timestep clocks which are initialized (created) with stopTime < startTime.
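
The distinction between direction and timestep sign can be made concrete with a toy model. The standalone Python sketch below is not the ESMF Clock: `ToyClock` and `run` are hypothetical, times are plain numbers, and `is_done` only mimics the documented idea that "done" means reaching stopTime in forward mode and startTime in reverse mode.

```python
class ToyClock:
    """Toy clock with a signed timestep and a forward/reverse direction."""

    def __init__(self, start, stop, step):
        self.start, self.stop, self.step = start, stop, step
        self.current = start
        self.forward = True  # forward is the default direction

    def advance(self):
        # Forward applies the timestep as given; reverse undoes it.
        self.current += self.step if self.forward else -self.step

    def is_done(self):
        target = self.stop if self.forward else self.start
        # Whether "reached" means >= or <= depends on the way time moves.
        moving_up = (self.step > 0) == self.forward
        if moving_up:
            return self.current >= target
        return self.current <= target


def run(clock):
    """Advance until done and return every time value visited."""
    visited = [clock.current]
    while not clock.is_done():
        clock.advance()
        visited.append(clock.current)
    return visited


positive = ToyClock(start=0, stop=3, step=1)
print(run(positive))      # [0, 1, 2, 3]
positive.forward = False
print(run(positive))      # [3, 2, 1, 0]

negative = ToyClock(start=10, stop=7, step=-1)
print(run(negative))      # [10, 9, 8, 7]
negative.forward = False
print(run(negative))      # [7, 8, 9, 10]
```

In both cases "forward" runs from the start time towards the stop time, even though the second clock counts downwards while doing so.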

45.3 Use and Examples

The following is a typical sequence for using a Clock in a geophysical model.

At initialize:

Set a Calendar.
Set start time, stop time and time step as Times and Time Intervals.
Create and Initialize a Clock using the start time, stop time and time step.
Define Times and Time Intervals associated with special events, and use these to set Alarms.

At run:

Advance the Clock, checking for ringing alarms as needed.
Check if it is time to stop.

At finalize:

Since Clocks and Alarms are deep classes, they need to be explicitly destroyed at finalization. Times and TimeIntervals are lightweight classes, so they don't need explicit destruction.

The following code example illustrates Clock usage.

45.4 Restrictions and Future Work

Alarm list allocation factor. The alarm list within a clock is dynamically allocated automatically, 200 alarm references at a time. This constant is defined in both Fortran and C++ with a #define for ease of modification.
Clock variable timesteps in reverse. In order for a clock with variable timesteps to be run in ESMF_DIRECTION_REVERSE, the user must supply those timesteps to ESMF_ClockAdvance(). Essentially, the user must save the timesteps while in forward mode. In a future release, the Time Manager will assume this responsibility by saving the clock state (including the timeStep) at every timestep while in forward mode.
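
One way to meet the variable-timestep restriction is to keep a stack of the timesteps used in forward mode and hand them back in the opposite order. The standalone Python sketch below shows only that bookkeeping pattern. It is not ESMF code, `run_forward` and `run_reverse` are hypothetical, and times are plain numbers.

```python
def run_forward(start, steps):
    """Step forward with varying timesteps, saving each one as it is used."""
    current, saved, visited = start, [], [start]
    for step in steps:
        saved.append(step)
        current += step
        visited.append(current)
    return visited, saved


def run_reverse(current, saved):
    """Step back by replaying the saved timesteps, last one first."""
    visited = [current]
    while saved:
        current -= saved.pop()
        visited.append(current)
    return visited


forward, saved = run_forward(0, [10, 20, 5])
backward = run_reverse(forward[-1], saved)
print(forward)   # [0, 10, 30, 35]
print(backward)  # [35, 30, 10, 0]
```

The reverse pass revisits exactly the times seen in the forward pass, which is possible only because the timesteps were recorded.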

45.5 Class API

46 Alarm Class

46.1 Description

The Alarm class identifies events that occur at specific Times or specific TimeIntervals by returning a true value at those times or subsequent times, and a false value otherwise.

46.2 Constants

46.2.1 ESMF_ALARMLIST

DESCRIPTION:
Specifies the characteristics of Alarms that populate a retrieved Alarm list.

The type of this flag is:

type(ESMF_AlarmList_Flag)

The valid values are:

ESMF_ALARMLIST_ALL: All alarms.
ESMF_ALARMLIST_NEXTRINGING: Alarms that will ring before or at the next timestep.
ESMF_ALARMLIST_PREVRINGING: Alarms that rang at or since the last timestep.
ESMF_ALARMLIST_RINGING: Only ringing alarms.

46.3 Use and Examples

Alarms are used in conjunction with Clocks (see Section 45.1). Multiple Alarms can be associated with a Clock. During the ESMF_ClockAdvance() method, a Clock iterates over its internal Alarms to determine if any are ringing. Alarms ring when a specified Alarm time is reached or exceeded, taking into account whether the time step is positive or negative. In ESMF_DIRECTION_REVERSE (see Section 45.1), alarms ring in reverse, i.e., they begin ringing when they originally ended, and end ringing when they originally began. On completion of the time advance call, the Clock optionally returns a list of ringing alarms.

Each ringing Alarm can then be processed using Alarm methods for identifying, turning off, disabling or resetting the Alarm.

Alarm methods are defined for obtaining the ringing state, turning the ringer on/off, enabling/disabling the Alarm, and getting/setting associated times.

The following example shows how to set and process Alarms.
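As a language-neutral illustration (not ESMF code), the advance-then-check pattern described above can be modeled in Python. The `SketchAlarm` class is hypothetical; it only mimics a one-shot alarm and a repeating alarm with a positive time step, and `ringer_off` stands in for the role of ESMF_AlarmRingerOff().

```python
from datetime import datetime, timedelta

class SketchAlarm:
    """Hypothetical stand-in for an Alarm; not the ESMF API."""
    def __init__(self, name, ring_time, ring_interval=None):
        self.name = name
        self.ring_time = ring_time
        self.ring_interval = ring_interval
        self.enabled = True
        self.ringing = False

    def check(self, now):
        # Ring once the alarm time is reached or exceeded.
        if self.enabled and now >= self.ring_time:
            self.ringing = True
            if self.ring_interval is None:
                self.enabled = False          # one-shot alarm: do not ring again
            else:
                self.ring_time += self.ring_interval
        return self.ringing

    def ringer_off(self):
        self.ringing = False

start = datetime(2004, 1, 1)
step = timedelta(hours=1)
alarms = [
    SketchAlarm("output", start + timedelta(hours=2), timedelta(hours=2)),
    SketchAlarm("restart", start + timedelta(hours=3)),
]

now = start
rang = []
for _ in range(4):
    now += step                               # advance the clock
    for alarm in alarms:                      # then check each alarm
        if alarm.check(now):
            rang.append(((now - start) // step, alarm.name))
            alarm.ringer_off()
```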

46.4 Restrictions and Future Work

Alarm list allocation factor. The alarm list within a clock is dynamically allocated automatically, 200 alarm references at a time. This constant is defined in both Fortran and C++ with a #define for ease of modification.
Sticky alarm end times in reverse. For sticky alarms, there is an implicit limitation that in order to properly reverse timestep through a ring end time, that time must have already been traversed in the forward direction. This is due to the fact that the Time Manager cannot predict when user code will call ESMF_AlarmRingerOff(). An error message will be logged when this limitation is not satisfied.
Sticky alarm ring interval in reverse. For repeating sticky alarms, it is currently assumed that the ringInterval is constant, so that only the time of the last call to ESMF_AlarmRingerOff() is saved. In ESMF_DIRECTION_REVERSE, this information is used to turn sticky alarms back on. In a future release, ringIntervals will be allowed to be variable, by saving alarm state at every timestep.

46.5 Design and Implementation Notes

The Alarm class is designed as a deep, dynamically allocatable class, based on a pointer type. This allows for both indirect and direct manipulation of alarms. Indirect alarm manipulation is where ESMF_Alarm API methods, such as ESMF_AlarmRingerOff(), are invoked on alarm references (pointers) returned from ESMF_Clock queries such as "return ringing alarms." Since the method is performed on an alarm reference, the actual alarm held by the clock is affected, not just a user's local copy. Direct alarm manipulation is the more common case where alarm API methods are invoked on the original alarm objects created by the user.

For consistency, the ESMF_Clock class is also designed as a deep, dynamically allocatable class.

An additional benefit from this approach is that Clocks and Alarms can be created and used from anywhere in a user's code without regard to the scope in which they were created. In contrast, statically created Alarms and Clocks would disappear if created within a user's routine that returns, whereas dynamically allocated Alarms and Clocks will persist until explicitly destroyed by the user.

46.6 Class API

47 Config Class

47.1 Description

ESMF Configuration Management is based on NASA DAO's Inpak 90 package, a Fortran 90 collection of routines/functions for accessing Resource Files in ASCII format. The package is optimized for minimizing formatted I/O, performing all of its string operations in memory using Fortran intrinsic functions.

47.1.1 Package history

The ESMF Configuration Management Package was developed by Leonid Zaslavsky and Arlindo da Silva from the Inpak 90 package created by Arlindo da Silva at NASA DAO.

Back in the 1970s Eli Isaacson wrote IOPACK in Fortran 66. In June of 1987 Arlindo da Silva wrote Inpak 77 using Fortran 77 string functions; Inpak 77 is a vastly simplified IOPACK, but has its own features not found in IOPACK. Inpak 90 removes some obsolete functionality in Inpak 77, and parses the whole resource file in memory for performance.

47.1.2 Resource files

A Resource File (RF) is a text file consisting of a list of label-value pairs. There is a buffer limit of 256,000 characters for the entire Resource File. Each label is limited to 1,000 characters. Each label should be followed by some data, the value. An example Resource File follows. It is the file used in the example below.

 # This is an example Resource File.  
 # It contains a list of <label,value> pairs.
 # The colon after the label is required. 

 # The values after the label can be a list.
 # Multiple types are authorized.
  
  my_file_names:         jan87.dat jan88.dat jan89.dat  # all strings
  constants:             3.1415   25                    # float and integer
  my_favorite_colors:    green blue 022               


 # Or, the data can be a list of single value pairs. 
 # It is simpler to retrieve data in this format:

  radius_of_the_earth:   6.37E6         
  parameter_1:           89
  parameter_2:           78.2
  input_file_name:       dummy_input.nc


 # Or, the data can be located in a table using the following
 # syntax:

  my_table_name::
   1000     3000     263.0
    925     3000     263.0
    850     3000     263.0
    700     3000     269.0
    500     3000     287.0
    400     3000     295.8
    300     3000     295.8
  ::

Note that the colon after the label is required and that the double colon is required to declare tabular data.

Resource files are intended for random access (except between ::'s in a table definition). This means that the order in which a particular label-value pair is retrieved does not depend on the original order of the pairs. The only exception is when the same label appears multiple times within the Resource File.
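To make the format concrete, the following Python sketch parses label-value pairs and tables from text in the layout shown above. It is an illustrative model only, not the Config implementation: the function name `parse_resource_text` is hypothetical, all values are kept as strings until the caller converts them, and repeated labels are not modeled.

```python
def parse_resource_text(text):
    """Parse label-value pairs and tables into a dict (illustrative only)."""
    values, table_name = {}, None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        tokens = line.split()
        if table_name is not None:
            if line == "::":                  # closing double colon
                table_name = None
            else:
                values[table_name].append(tokens)
        elif tokens[0].endswith("::"):        # "name::" opens a table
            table_name = tokens[0][:-2]
            values[table_name] = []
        elif tokens[0].endswith(":"):         # "label:" followed by its values
            values[tokens[0][:-1]] = tokens[1:]
    return values

sample = """
  constants:             3.1415   25      # float and integer
  radius_of_the_earth:   6.37E6
  my_table_name::
   1000     3000     263.0
    925     3000     263.0
  ::
"""
parsed = parse_resource_text(sample)
pi, count = float(parsed["constants"][0]), int(parsed["constants"][1])
```

Storing the result in a dictionary reflects the random-access property: a value is looked up by label regardless of where the label appeared in the file.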

47.2 Use and Examples

47.3 Class API

48 HConfig Class

48.1 Description

The ESMF HConfig class implements a hierarchical configuration facility that is compatible with YAML Ain't Markup Language (YAML™). ESMF HConfig can be understood as a Fortran interface to YAML. However, no claim is made that all YAML language features are supported in their entirety.

The purpose of the HConfig class under ESMF is to provide a migration path toward more standard configuration management for ESMF applications. To this end ESMF_HConfig integrates with the traditional ESMF_Config class. Through this integration the traditional Config class API offers basic access to YAML configuration files, in addition to providing backward compatible support of the traditional config file format. This is discussed in more detail in the Config class section. For more complete YAML support, applications are encouraged to migrate to the HConfig API discussed in this section.

48.2 Constants

48.2.1 ESMF_HCONFIGMATCH

DESCRIPTION:
Indicates the level to which two HConfig variables match.

The type of this flag is:

type(ESMF_HConfigMatch_Flag)

The valid values in ascending order are:

ESMF_HCONFIGMATCH_INVALID: Indicates a non-valid matching level. One or both HConfig objects are invalid.
ESMF_HCONFIGMATCH_NONE: The lowest valid level of HConfig matching. This indicates that the HConfig objects are valid, but their YAML representation does not match.
ESMF_HCONFIGMATCH_EXACT: There is an exact match between the YAML representation of both HConfig objects. They may or may not be aliases to the same object in memory.
ESMF_HCONFIGMATCH_ALIAS: Both HConfig variables are aliases to the exact same HConfig object in memory.
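The ascending order of the four levels can be illustrated with a small Python model. This is a sketch under stated assumptions, not the HConfig implementation: nested dictionaries stand in for YAML content, `None` stands in for an invalid object, and the `Match` enum and `match` function are hypothetical names.

```python
import copy
from enum import IntEnum

class Match(IntEnum):
    """Hypothetical mirror of the four levels, in ascending order."""
    INVALID = 0
    NONE = 1
    EXACT = 2
    ALIAS = 3

def match(a, b):
    # None stands in for an invalid object in this sketch.
    if a is None or b is None:
        return Match.INVALID
    if a is b:                 # same object in memory
        return Match.ALIAS
    if a == b:                 # same content, different objects
        return Match.EXACT
    return Match.NONE

config = {"model": {"dt": 600, "fields": ["u", "v"]}}
alias = config
clone = copy.deepcopy(config)
changed = copy.deepcopy(config)
changed["model"]["dt"] = 300
```

Because the levels are ordered, a caller that only cares whether the content agrees can test for a level of at least EXACT, which accepts both exact copies and aliases.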

48.3 Use and Examples

The following examples demonstrate how a user typically interacts with the HConfig API. The HConfig class introduces two derived types:

ESMF_HConfig
ESMF_HConfigIter

ESMF_HConfig objects can be created explicitly by the user, or they can be accessed from an existing ESMF_Config object, e.g. queried from a Component. They can play a number of roles when interacting with a HConfig hierarchy:

The root node of the entire hierarchy. In YAML terminology, this refers to a document.
Any node within the hierarchy.
Collection of hierarchies, i.e. a set of YAML documents.

ESMF_HConfigIter objects are iterators, referencing a specific node within the hierarchy. They are created from ESMF_HConfig objects. The iterator approach allows convenient sequential traversal of a particular location in the HConfig hierarchy. There are two flavors of iterators in HConfig: sequence and map iterators. Both are represented by the same ESMF_HConfigIter derived type, and the distinction is made at run-time.

Notice that there are redundancies built into the HConfig API, where different ways are available to achieve the same goal. This is mostly done for convenience, allowing the user to pick the approach most suitable to their needs.

For instance, while it can be convenient to use iterators in some cases, in others, it might be more appropriate to access elements directly by index (for sequences) or key (for maps). Both options are available.
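The two access styles can be contrasted in a short Python sketch. Nested lists and dictionaries stand in for YAML sequences and maps; the `traverse` function is a hypothetical analogue of a single iterator type whose sequence or map flavor is decided at run time. None of this is the HConfig API.

```python
def traverse(node):
    """Yield (index_or_key, value); the node kind is decided at run time."""
    if isinstance(node, dict):
        yield from node.items()
    elif isinstance(node, list):
        yield from enumerate(node)
    else:
        raise TypeError("scalar nodes cannot be iterated")

config = {
    "components": ["atm", "ocn", "ice"],          # a sequence node
    "run": {"start": "2004-01-01", "steps": 48},  # a map node
}

# Iterator style: sequential traversal of a sequence and of a map.
sequence_items = list(traverse(config["components"]))
map_items = dict(traverse(config["run"]))

# Direct style: access by index (sequence) or by key (map).
second_component = config["components"][1]
step_count = config["run"]["steps"]
```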

48.4 Restrictions and Future Work

The YAML Core schema, which is an extension of the JSON schema, is implemented and used to resolve non-specific tags under HConfig. There is currently no mechanism implemented to switch to a different schema for tag resolution.
Currently the only available removal method for HConfig map objects requires that keys be simple scalar strings.
There is currently no method implemented that allows setting of tags from the API.

48.5 Design and Implementation Notes

The ESMF HConfig class is implemented on top of YAML-CPP (https://github.com/jbeder/yaml-cpp). A copy of YAML-CPP is included in the ESMF source tree under ./src/prologue/yaml-cpp. It is used by a number of ESMF/NUOPC functions, including HConfig.

48.6 Class API

49 Log Class

49.1 Description

The Log class consists of a variety of methods for writing error, warning, and informational messages to files. A default Log is created at ESMF initialization. Other Logs can be created later in the code by the user. Most Log methods take a Log as an optional argument and apply to the default Log when another Log is not specified. A set of standard return codes and associated messages are provided for error handling.

Log provides capabilities to store message entries in a buffer, which is flushed to a file, either when the buffer is full, or when the user calls an ESMF_LogFlush() method. Currently, the default is for the Log to flush after every ten entries. This can easily be changed by using the ESMF_LogSet() method and setting the maxElements property to another value. The ESMF_LogFlush() method is automatically called when the program exits by any means (program completion, halt on error, or when the Log is closed).
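The buffering behavior described above can be modeled in a few lines of Python. This is a minimal sketch, not the Log implementation: the `BufferedLog` class is hypothetical, a list stands in for the log file, and `max_elements` plays the role of the maxElements property.

```python
class BufferedLog:
    """Hypothetical buffered log; not the ESMF API."""
    def __init__(self, max_elements=10):
        self.max_elements = max_elements
        self.buffer = []
        self.written = []      # stands in for the log file
        self.flush_count = 0

    def write(self, message):
        self.buffer.append(message)
        if len(self.buffer) >= self.max_elements:
            self.flush()       # buffer is full

    def flush(self):
        if self.buffer:
            self.written.extend(self.buffer)
            self.buffer.clear()
            self.flush_count += 1

log = BufferedLog(max_elements=10)
for i in range(25):
    log.write(f"entry {i}")
pending_before_exit = len(log.buffer)
log.flush()                    # what a flush at program exit would do
```

With 25 entries and a buffer of ten, two flushes happen automatically and five entries remain pending until the final flush, which is why a flush at exit matters.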

The user can abort the program on conditions such as an error or a warning by using the ESMF_LogSet() method with the logmsgAbort argument. For example, if the logmsgAbort array is set to (ESMF_LOGMSG_ERROR,ESMF_LOGMSG_WARNING), the program will stop on any warning or error. When the logmsgAbort argument is set to ESMF_LOGMSG_ERROR, the program will only abort on errors. Lastly, the user can choose to never abort by using ESMF_LOGMSG_NONE; this is the default.

Log will automatically put the PET number into the Log. Also, the user can either specify ESMF_LOGKIND_SINGLE which writes all the entries to a single Log or ESMF_LOGKIND_MULTI which writes entries to multiple Logs according to the PET number. To distinguish Logs from each other when using ESMF_LOGKIND_MULTI, the PET number (in the format PETx.) will be prepended to the file name where x is the PET number.

Opening multiple log files and writing log messages from all the processors may affect the application performance while running on a large number of processors. For that reason, ESMF_LOGKIND_NONE is provided to switch off the Log capability. All the Log methods have no effect in the ESMF_LOGKIND_NONE mode.

A tracing capability may be enabled by setting the trace flag by using the ESMF_LogSet() method. When tracing is enabled, calls to methods such as ESMF_LogFoundError, ESMF_LogFoundAllocError, and ESMF_LogFoundDeallocError are logged in the default log file. This can result in voluminous output. It is typically used only around areas of code which are being debugged.

Other options that are planned for Log are to adjust the verbosity of output, and to optionally write to stdout instead of file(s).

49.2 Constants

49.2.1 ESMF_LOGERR

The valid values are:

ESMF_LOGERR_PASSTHRU: A named character constant, with a predefined generic error message, that can be used for the msg argument in any ESMF_Log routine. The message indicated by this named constant is "Passing error in return code."

49.2.2 ESMF_LOGKIND

DESCRIPTION:
Specifies a single log file, multiple log files (one per PET), or no log files.

The type of this flag is:

type(ESMF_LogKind_Flag)

The valid values are:

ESMF_LOGKIND_SINGLE: Use a single log file, combining messages from all of the PETs. Not supported on some platforms.
ESMF_LOGKIND_MULTI: Use multiple log files — one per PET.
ESMF_LOGKIND_MULTI_ON_ERROR: Use multiple log files — one per PET. A log file is only opened when a message of type ESMF_LOGMSG_ERROR is encountered.
ESMF_LOGKIND_NONE: Do not issue messages to a log file.

49.2.3 ESMF_LOGMSG

DESCRIPTION:
Specifies a message level.

The type of this flag is:

type(ESMF_LogMsg_Flag)

The valid values are:

ESMF_LOGMSG_INFO: Informational messages
ESMF_LOGMSG_WARNING: Warning messages
ESMF_LOGMSG_ERROR: Error messages
ESMF_LOGMSG_TRACE: Trace messages
ESMF_LOGMSG_DEBUG: Debug messages
ESMF_LOGMSG_JSON: JSON format messages

Valid predefined named array constant values are:

ESMF_LOGMSG_ALL: All messages
ESMF_LOGMSG_NONE: No messages
ESMF_LOGMSG_NOTRACE: All messages EXCEPT trace messages

49.3 Use and Examples

By default ESMF_Initialize() opens a default Log in ESMF_LOGKIND_MULTI mode. ESMF handles the initialization and finalization of the default Log so the user can immediately start using it. If additional Log objects are desired, they must be explicitly created or opened using ESMF_LogOpen().

ESMF_LogOpen() requires a Log object and filename argument. Additionally, the user can specify single or multi Logs by setting the logkindflag property to ESMF_LOGKIND_SINGLE or ESMF_LOGKIND_MULTI. This is useful as the PET numbers are automatically added to the Log entries. A single Log will put all entries, regardless of PET number, into a single log while a multi Log will create multiple Logs with the PET number prepended to the filename and all entries will be written to their corresponding Log by their PET number.
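The file naming that results from the two modes can be sketched in Python. The function `log_file_names` and its string arguments are hypothetical; only the `PETx.` prefix convention is taken from the text, and the sketch does not open any files.

```python
def log_file_names(filename, pet_count, logkind):
    """Return the file each PET writes to (illustrative only)."""
    if logkind == "SINGLE":
        # All PETs share one file.
        return {pet: filename for pet in range(pet_count)}
    if logkind == "MULTI":
        # One file per PET, with the PET number prepended.
        return {pet: f"PET{pet}.{filename}" for pet in range(pet_count)}
    if logkind == "NONE":
        return {}
    raise ValueError(f"unknown log kind: {logkind}")

multi = log_file_names("ESMF_LogFile", 2, "MULTI")
single = log_file_names("ESMF_LogFile", 2, "SINGLE")
```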

By default, the Log file is not truncated at the start of a new run; it just gets appended each time. Future functionality may include an option to either truncate or append to the Log file.

In all cases where a Log is opened, a Fortran unit number is assigned to a specific Log. A Log is assigned an unused unit number using the algorithm described in the ESMF_IOUnitGet() method.

The user can then set or get options on how the Log should be used with the ESMF_LogSet() and ESMF_LogGet() methods. These are partially implemented at this time.

Depending on how the options are set, ESMF_LogWrite() either writes user messages directly to a Log file or writes to a buffer that can be flushed when full or by using the ESMF_LogFlush() method. The default is to flush after every ten entries because maxElements is initialized to ten (which means the buffer reaches its full state after every ten writes and then flushes).

A message filtering option may be set with ESMF_LogSet() so that only selected message types are actually written to the log. One key use of this feature is to allow placing informational log write requests into the code for debugging or tracing. Then, when the informational entries are not needed, the messages at that level may be turned off — leaving only warning and error messages in the logs.
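The effect of such a filter can be illustrated with a short Python sketch. It models only the selection step, under the assumption that a filter is a set of allowed message levels; the function `filter_messages` and the sample entries are hypothetical and no ESMF call is made.

```python
LEVELS = {"INFO", "WARNING", "ERROR", "TRACE", "DEBUG", "JSON"}

def filter_messages(entries, allowed):
    """Keep only entries whose level is in the allowed set."""
    unknown = set(allowed) - LEVELS
    if unknown:
        raise ValueError(f"unknown levels: {sorted(unknown)}")
    return [(level, text) for level, text in entries if level in allowed]

entries = [
    ("INFO", "entering solver"),
    ("WARNING", "time step reduced"),
    ("INFO", "leaving solver"),
    ("ERROR", "field not found"),
]

# Informational entries switched off: only warnings and errors remain.
production = filter_messages(entries, {"WARNING", "ERROR"})
```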

For every ESMF_LogWrite(), a time and date stamp is prepended to the Log entry. The time is given in microsecond precision. The user can call other methods to write to the Log. In every case, all methods eventually make a call implicitly to ESMF_LogWrite() even though the user may never explicitly call it.

When calling ESMF_LogWrite(), the user can supply an optional line, file and method. These arguments can be passed in explicitly or with the help of cpp macros. In the latter case, a define for an ESMF_FILENAME must be placed at the beginning of a file and a define for ESMF_METHOD must be placed at the beginning of each method. The user can then use the ESMF_CONTEXT cpp macro in place of line, file and method to insert the parameters into the method. The user does not have to specify line number as it is a value supplied by cpp.

An example of Log output is given below running with logkindflag property set to ESMF_LOGKIND_MULTI (default) using the default Log:

(Log file PET0.ESMF_LogFile)

20041105 163418.472210 INFO      PET0     Running with ESMF Version 2.2.1

(Log file PET1.ESMF_LogFile)

20041105 163419.186153 ERROR     PET1     ESMF_Field.F90             812  
ESMF_FieldGet No Grid or Bad Grid attached to Field

The first entry shows date and time stamp. The time is given in microsecond precision. The next item shown is the type of message (INFO in this case). Next, the PET number is added. Lastly, the content is written.

The second entry shows something slightly different. In this case, we have an ERROR. The file name (ESMF_Field.F90) and the line number (812) are automatically provided from the cpp macros, followed by the method name (ESMF_FieldGet). Then the content of the message is written.
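For readers who post-process log files, the leading fields of an entry can be split apart as in the following Python sketch. It assumes the whitespace-separated layout of the sample entries above (date, time, message type, PET label, remainder); the function `parse_entry` is hypothetical and the layout may differ between ESMF versions.

```python
from datetime import datetime

def parse_entry(line):
    """Split a log entry into timestamp, level, PET number and remainder."""
    date, time, level, pet, rest = line.split(None, 4)
    # The time carries microsecond precision after the decimal point.
    stamp = datetime.strptime(f"{date} {time}", "%Y%m%d %H%M%S.%f")
    return stamp, level, int(pet[3:]), rest

stamp, level, pet, rest = parse_entry(
    "20041105 163418.472210 INFO      PET0     Running with ESMF Version 2.2.1"
)
```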

When done writing messages, the default Log is closed by calling ESMF_LogFinalize(), and user-created Logs are closed by calling ESMF_LogClose(). Both methods release the assigned unit number.

49.4 Restrictions and Future Work

Line, file and method are only available when using the C preprocessor. Message writing methods are expanded using the ESMF macro ESMF_CONTEXT that adds the predefined symbolic constants __LINE__ and __FILE__ (or the ESMF constant ESMF_FILENAME if defined) and the ESMF constant ESMF_METHOD to the argument list. Using these constants, we can associate a file name, line number and method name with the message. If the C preprocessor is not used, this expansion will not be done and hence the ESMF macro ESMF_CONTEXT cannot be used, leaving the file name, line number and method out of the Log text.
Get and set methods are partially implemented. Currently, the ESMF_LogGet() and ESMF_LogSet() methods are partially implemented.
Log only appends entries. All writing to the Log is appended rather than overwriting the Log. Future enhancements include the option to either append to an existing Log or overwrite the existing Log.
Avoiding conflicts with the default Log. The private methods ESMF_LogInitialize() and ESMF_LogFinalize() are called during ESMF_Initialize() and ESMF_Finalize() respectively, so they do not need to be called if the default Log is used. If a new Log is required, ESMF_LogOpen() is used with a new Log object passed in so that there are no conflicts with the default Log.
ESMF_LOGKIND_SINGLE does not work properly. When the ESMF_LogKind_Flag is set to ESMF_LOGKIND_SINGLE, different systems may behave differently. The log messages from some processors may be lost or overwritten by other processors. Users are advised not to use this mode. MPI-based I/O will be implemented in a future release to fix this problem.

49.5 Design and Implementation Notes

The Log class was implemented in Fortran and uses the Fortran I/O libraries when the class methods are called from Fortran. The C/C++ Log methods use the Fortran I/O library by calling utility functions that are written in Fortran. These utility functions call the standard Fortran write, open and close functions. At initialization an ESMF_LOG is created. The ESMF_LOG stores information for a specific Log file. When working with more than one Log file, multiple ESMF_LOG objects are required (one ESMF_LOG for each Log file). For each Log, a handle is returned through the ESMF_LogInitialize method for the default log or ESMF_LogOpen for a user created log. The user can specify single or multi logs by setting the logkindflag property in the ESMF_LogInitialize or ESMF_LogOpen method to ESMF_LOGKIND_SINGLE or ESMF_LOGKIND_MULTI. Similarly, the user can set the logkindflag property for the default Log with the ESMF_Initialize method call. The logkindflag is useful as the PET numbers are automatically added to the log entries. A single log will put all entries, regardless of PET number, into a single log while a multi log will create multiple logs with the PET number prepended to the filename and all entries will be written to their corresponding log by their PET number.
The properties for a Log are set with the ESMF_LogSet() method and retrieved with the ESMF_LogGet() method.
Additionally, buffering is enabled. Buffering allows ESMF to manage output data streams in a desired way. Writing to the buffer is transparent to the user because all the Log entries are handled automatically by the ESMF_LogWrite() method. All the user has to do is specify the buffer size (the default is ten) by setting the maxElements property. Every time the ESMF_LogWrite() method is called, a LogEntry element is populated with the ESMF_LogWrite() information. When the buffer is full (i.e., when all the LogEntry elements are populated), the buffer will be flushed and all the contents will be written to file. If buffering is not needed, that is maxElements=1 or flushImmediately=ESMF_TRUE, the ESMF_LogWrite() method will immediately write to the Log file(s).

49.6 Object Model

The following is a simplified UML diagram showing the structure of the Log class. See Appendix A, A Brief Introduction to UML, for a translation table that lists the symbols in the diagram and their meaning.

[Figure: simplified UML diagram of the Log class (Log_obj)]

49.7 Class API

50 DELayout Class

50.1 Description

The DELayout class provides an additional layer of abstraction on top of the Virtual Machine (VM) layer. DELayout does this by introducing DEs (Decomposition Elements) as logical resource units. The DELayout object keeps track of the relationship between its DEs and the resources of the associated VM object.

The relationship between DEs and VM resources (PETs (Persistent Execution Threads) and VASs (Virtual Address Spaces)) contained in a DELayout object is defined during its creation and cannot be changed thereafter. There are, however, a number of hint and specification arguments that can be used to shape the DELayout during its creation.

Contrary to the number of PETs and VASs contained in a VM object, which are fixed by the available resources, the number of DEs contained in a DELayout can be chosen freely to best match the computational problem or other design criteria. Creating a DELayout with fewer DEs than there are PETs in the associated VM object can be used to share resources between decomposed objects within an ESMF component. Creating a DELayout with more DEs than there are PETs in the associated VM object can be used to evenly partition the computation over the available resources.

The simplest case, however, is where the DELayout contains the same number of DEs as there are PETs in the associated VM context. In this case the DELayout may be used to re-label the hardware and operating system resources held by the VM. For instance, it is possible to order the resources so that specific DEs have best available communication paths. The DELayout will map the DEs to the PETs of the VM according to the resource details provided by the VM instance.
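The three cases (as many DEs as PETs, more DEs than PETs, fewer DEs than PETs) can be illustrated with a small Python model. The cyclic policy used here is an assumption chosen for simplicity; the text above does not state which mapping a DELayout applies, and the function names are hypothetical.

```python
def cyclic_map(de_count, pet_count):
    """Assign DE i to PET i mod pet_count (one possible policy)."""
    return [de % pet_count for de in range(de_count)]

def local_des(de_to_pet, pet):
    """List the DEs held by one PET."""
    return [de for de, owner in enumerate(de_to_pet) if owner == pet]

one_to_one = cyclic_map(4, 4)   # same number of DEs as PETs
more_des = cyclic_map(8, 4)     # more DEs than PETs: two DEs per PET
fewer_des = cyclic_map(2, 4)    # fewer DEs than PETs: PETs 2 and 3 hold none
```

Storing the mapping as a list indexed by DE, plus a per-PET list of local DEs, mirrors the kind of bookkeeping a layout object needs, although the actual data structures are not described here.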

Furthermore, general DE to PET mapping can be used to offer computational resources with finer granularity than the VM does. The DELayout can be queried for computational and communication capacities of DEs and DE pairs, respectively. This information can be used to best utilize the DE resources when partitioning the computational problem. In combination with other ESMF classes, general DE to PET mapping can be used to realize cache blocking, communication hiding and dynamic load balancing.

Finally, the DELayout layer offers primitives that allow a work queue style dynamic load balancing between DEs.

50.2 Constants

50.2.1 ESMF_PIN

DESCRIPTION:
Specifies which VM resource DEs are pinned to (PETs, VASs, SSIs).

The type of this flag is:

type(ESMF_Pin_Flag)

The valid values are:

ESMF_PIN_DE_TO_PET: Pin DEs to PETs. Only the owning PET has access to a DE.
ESMF_PIN_DE_TO_VAS: Pin DEs to virtual address spaces (VAS). DEs are accessible from all PETs within the same VAS.
ESMF_PIN_DE_TO_SSI: Pin DEs to single system images (SSI) - typically shared memory nodes. DEs are accessible from all PETs within the same SSI. The memory allocation between different DEs is allowed to be non-contiguous.
ESMF_PIN_DE_TO_SSI_CONTIG: Same as ESMF_PIN_DE_TO_SSI, but the shared memory allocation across DEs located on the same SSI must be contiguous throughout.

50.2.2 ESMF_SERVICEREPLY

DESCRIPTION:
Reply when a PET offers to service a DE.

The type of this flag is:

type(ESMF_ServiceReply_Flag)

The valid values are:

ESMF_SERVICEREPLY_ACCEPT: The service offer has been accepted. The PET is expected to service the DE.
ESMF_SERVICEREPLY_DENY: The service offer has been denied. The PET is expected to not service the DE.
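How the two replies might be used in a work queue can be sketched in Python. The policy shown (the first PET to offer service for a DE is accepted, later offers for the same DE are denied) is an assumption made for illustration, and `SketchWorkQueue` is a hypothetical class, not the DELayout API.

```python
class SketchWorkQueue:
    """Hypothetical work queue: each DE is serviced by exactly one PET."""
    def __init__(self, de_count):
        self.serviced_by = {de: None for de in range(de_count)}

    def offer(self, de, pet):
        # Accept the first offer for a DE, deny all later ones.
        if self.serviced_by[de] is None:
            self.serviced_by[de] = pet
            return "ACCEPT"
        return "DENY"

queue = SketchWorkQueue(de_count=3)
offers = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 0)]   # (DE, offering PET)
replies = [queue.offer(de, pet) for de, pet in offers]
```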

50.3 Use and Examples

The following examples demonstrate how to create, use and destroy DELayout objects.

50.4 Restrictions and Future Work

50.5 Design and Implementation Notes

The DELayout class is a lightweight object. It stores the DE to PET and VAS mapping for all DEs within all PET instances and a list of local DEs for each PET instance. The DELayout does not store the computational and communication weights optionally provided as arguments to the create method. These hints are only used during create while they are available in user owned arrays.

50.6 Class API

51 VM Class

51.1 Description

In addition to resource description and management, the VM class offers the lowest level of ESMF communication methods. The VM communication calls are very similar to MPI. Data references in VM communication calls must be provided as raw, language-specific, one-dimensional, contiguous data arrays. The similarity between VM and MPI communication calls is striking and there are many equivalent point-to-point and collective communication calls. However, unlike MPI, the VM communication calls support communication between threaded PETs in a completely transparent fashion.

Many ESMF applications interact very little with the VM class directly. The resource management aspect is wrapped completely transparently into the ESMF Component concept. Often the only reason that user code queries a Component object for the associated VM object is to inquire about resource information, such as the localPet or the petCount. Further, for most applications the use of higher level communication APIs, such as those provided by Array and Field, is much more convenient than using the low level VM communication calls.

The basic elements of a VM are called PETs, which stands for Persistent Execution Threads. These are equivalent to OS threads with a lifetime of at least that of the associated component. All VM functionality is expressed in terms of PETs. In the simplest, and most common case, a PET is equivalent to an MPI process. However, ESMF also supports multi-threading, where multiple PETs run as Pthreads inside the same virtual address space (VAS).

The resource management functions of the VM class become visible when a component, or the driver code, creates sub-components. Section 16.4.3 discusses this aspect from the Superstructure perspective and provides links to the relevant Component examples in the documentation.

There are two parts to resource management, the parent and the child. When the parent component creates a child component, the parent VM object provides the resources on which the child is created with ESMF_GridCompCreate() or ESMF_CplCompCreate(). The optional petList argument to these calls limits the resources that the parent gives to a specific child. The child component may specify, during its optional ESMF_<Grid/Cpl>CompSetVM() method, how it wants to arrange the inherited resources in its own VM. After this, all standard ESMF methods of the Component, including ESMF_<Grid/Cpl>CompSetServices(), will execute in the child VM. Notice that the ESMF_<Grid/Cpl>CompSetVM() routine, although part of the child Component, must execute before the child VM has been started up. It runs in the parent VM context. The child VM is created and started up just before the user-written set services routine, specified as an argument to ESMF_<Grid/Cpl>CompSetServices(), is entered.

51.2 Constants

51.2.1 ESMF_VMEPOCH

DESCRIPTION:
Specifies the kind of VM Epoch being entered.

The type of this flag is:

type(ESMF_VMEpoch_Flag)

The valid values are:

ESMF_VMEPOCH_NONE: An epoch without special behavior.
ESMF_VMEPOCH_BUFFER: This option must only be used for parts of the code with distinct sending and receiving PETs, i.e. where no PETs are both sender and receiver. All non-blocking messages are being buffered. A single message is sent between unique pairs of src-dst PETs. This can significantly improve performance for cases with a large imbalance in the number of sending versus receiving PETs. The extra buffering also improves the overall asynchronous behavior between the sending and receiving side.
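The idea of sending a single message per source-destination pair can be illustrated with a short Python model. It only shows the aggregation step and the disjoint sender and receiver sets; the function `aggregate` and the message tuples are hypothetical, and no communication takes place.

```python
from collections import defaultdict

def aggregate(messages):
    """Combine messages into one payload list per (src, dst) pair."""
    buffers = defaultdict(list)
    for src, dst, payload in messages:
        buffers[(src, dst)].append(payload)
    return dict(buffers)

# Six small messages from sending PETs 0-2 to receiving PET 3.
messages = [(0, 3, "a"), (1, 3, "b"), (0, 3, "c"),
            (2, 3, "d"), (1, 3, "e"), (0, 3, "f")]
senders = {src for src, _, _ in messages}
receivers = {dst for _, dst, _ in messages}
combined = aggregate(messages)
```

In this example six individual messages collapse into three combined ones, which is where the benefit for a large imbalance between sending and receiving PETs comes from.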

51.3 Use and Examples

The concept of the ESMF Virtual Machine (VM) is so fundamental to the framework that every ESMF application uses it. However, for many user applications the VM class is transparently hidden behind the ESMF Component concept and higher data classes (e.g. Array, Field). The interaction between user code and VM is often only indirect. The following examples provide an overview of where the VM class can come into play in user code.

51.4 Restrictions and Future Work

Only array section syntax that leads to contiguous sub sections is supported. The source and destination arguments in VM communication calls must reference contiguous data arrays. Fortran array sections are not guaranteed to be contiguous in all cases.
Non-blocking Reduce() operations not implemented. None of the reduce communication calls have an implementation for the non-blocking feature. This affects:
- ESMF_VMAllFullReduce(),
- ESMF_VMAllReduce(),
- ESMF_VMReduce().
Limitations when using mpiuni mode. In mpiuni mode non-blocking communications are limited to one outstanding message per source-destination PET pair. Furthermore, in mpiuni mode the message length must be smaller than the internal ESMF buffer size.
Alternative communication paths not accessible. All user accessible VM communication calls are currently implemented using MPI-1.2. VM's implementation of alternative communication techniques, such as shared memory between threaded PETs and POSIX IPC between PETs located on the same single system image, are currently inaccessible to the user. (One exception to this is the mpiuni case for which the VM automatically utilizes a shared memory path.)
Data arrays in VM comm calls are assumed shape with rank=1. Currently all dummy arrays in VM comm calls are defined as assumed shape arrays of rank=1. The motivation for this choice is that the use of assumed shape dummy arrays guards against the Fortran copy in/out problem. However it may not be as flexible as desired from the user perspective. Alternatively all dummy arrays could be defined as assumed size arrays, as it is done in most MPI implementations, allowing arrays of various rank to be passed into the comm methods. Arrays of higher rank can be passed into the current interfaces using Fortran array syntax. This approach is explained in section .
Limitations when using VMEpoch. Using a blocking collective call (e.g. ESMF_VMBroadcast(), the MPI_Bcast() used by ESMF_InfoBroadcast(), etc.) within the region enclosed by ESMF_VMEpochEnter() and ESMF_VMEpochExit() will result in a deadlock.

51.5 Design and Implementation Notes

The VM class provides an additional layer of abstraction on top of the POSIX machine model, making it suitable for HPC applications. There are four key aspects the VM class deals with.

Encapsulation of hardware and operating system details within the concept of Persistent Execution Threads (PETs).
Resource management in terms of PETs with a guard against over-subscription.
Topological description of the underlying configuration of the compute resources in terms of PETs.
Transparent communication API for point-to-point and collective PET-based primitives, hiding the many different communication channels and offering best possible performance.

[Figure: VM design diagram]

Definition of terms used in the diagram

PE: A processing element (PE) is an alias for the smallest physical processing unit available on a particular hardware platform. In the language of today's microprocessor architecture a PE is identical to a core; however, if future microprocessor designs change the smallest physical processing unit, the mapping of the PE to actual hardware will change accordingly. Thus the PE layer separates the hardware-specific part of the VM from the hardware-independent part. Each PE is labeled with an id number which identifies it uniquely within all of the VM instances of an ESMF application.
Core: A Core is the smallest physical processing unit which typically comprises a register set, an integer arithmetic unit, a floating-point unit and various control units. Each Core is labeled with an id number which identifies it uniquely within all of the VM instances of an ESMF application.
CPU: The central processing unit (CPU) houses single or multiple cores, providing them with the interface to system memory, interconnects and I/O. Typically the CPU provides some level of caching for the instruction and data streams in and out of the Cores. Cores in a multi-core CPU typically share some caches. Each CPU is labeled with an id number which identifies it uniquely within all of the VM instances of an ESMF application.
SSI: A single system image (SSI) spans all the CPUs controlled by a single running instance of the operating system. SMP and NUMA are typical multi-CPU SSI architectures. Each SSI is labeled with an id number which identifies it uniquely within all of the VM instances of an ESMF application.
TOE: A thread of execution (TOE) executes an instruction sequence. TOEs come in two flavors: PET and TET.
PET: A persistent execution thread (PET) executes an instruction sequence on an associated set of data. The PET has a lifetime at least as long as the associated data set. In ESMF the PET is the central concept of abstraction provided by the VM class. The PETs of a VM object are labeled from 0 to N-1 where N is the total number of PETs in the VM object.
TET: A transient execution thread (TET) executes an instruction sequence on an associated set of data. A TET's lifetime might be shorter than that of the associated data set.
OS-Instance: The OS-Instance of a TOE describes how a particular TOE is instantiated on the OS level. Using POSIX terminology a TOE will run as a single thread within a single- or multi-threaded process.
Pthreads: Communication via the POSIX Thread interface.
MPI-1, MPI-2: Communication via MPI standards 1 and 2.
armci: Communication via the aggregate remote memory copy interface.
SHMEM: Communication via the SHMEM interface.
OS-IPC: Communication via the operating system's inter process communication interface. Either POSIX IPC or System V IPC.
InterCon-lib: Communication via the interconnect's library native interface. An example is the Elan library for Quadrics.

The POSIX machine abstraction, while a very powerful concept, needs augmentation when applied to HPC applications. Key elements of the POSIX abstraction are processes, which provide virtually unlimited resources (memory, I/O, sockets, ...) to possibly multiple threads of execution. Similarly POSIX threads create the illusion that there is virtually unlimited processing power available to each POSIX process. While the POSIX abstraction is very suitable for many multi-user/multi-tasking applications that need to share limited physical resources, it does not directly fit the HPC workload where over-subscription of resources is one of the most expensive modes of operation.

ESMF's virtual machine abstraction is based on the POSIX machine model but holds additional information about the available physical processing units in terms of Processing Elements (PEs). A PE is the smallest physical processing unit and encapsulates the hardware details (Cores, CPUs and SSIs).

There is exactly one physical machine layout for each application, and all VM instances have access to this information. The PE is the smallest processing unit which, in today's microprocessor technology, corresponds to a single Core. Cores are arranged in CPUs which in turn are arranged in SSIs. The setup of the physical machine layout is part of the ESMF initialization process.

On top of the PE concept, the key abstraction provided by the VM is the PET. All user code is executed by PETs while OS and hardware details are hidden. The VM class contains a number of methods which allow the user to prescribe how the PETs of a desired virtual machine should be instantiated on the OS level and how they should map onto the hardware. This prescription is kept in a private virtual machine plan object which is created at the same time as the associated component. Each time component code is entered through one of the component's registered top-level methods (Initialize/Run/Finalize), the virtual machine plan, along with a pointer to the respective user function, is used to instantiate the user code on the PETs of the associated VM in the form of single- or multi-threaded POSIX processes.

The process of starting, entering, exiting and shutting down a VM is transparent to the user: all spawning and joining of threads is handled by VM methods "behind the scenes". Furthermore, fundamental synchronization and communication primitives are provided on the PET level through a uniform API, hiding details related to the actual instantiation of the participating PETs.

Within a VM object each PE of the physical machine maps to 0 or 1 PETs. Allowing unassigned PEs provides a means to prevent over-subscription between multiple concurrently running virtual machines. Similarly a maximum of one PET per PE prevents over-subscription within a single VM instance. However, over-subscription is possible by subscribing PETs from different virtual machines to the same PE. This type of over-subscription can be desirable for PETs associated with I/O workloads expected to be used infrequently and to block often on I/O requests.

On the OS level each PET of a VM object is represented by a POSIX thread (Pthread) belonging to either a single- or multi-threaded process, and maps to at least 1 PE of the physical machine, ensuring its execution. Mapping a single PET to multiple PEs provides resources for user-level multi-threading. In that case the user code inquires how many PEs are associated with its PET and, if multiple PEs are available, can spawn an equal number of threads (e.g. OpenMP) without risking over-subscription. Typically these user-spawned threads are short-lived and used for fine-grained parallelization in the form of TETs. All PEs mapped against a single PET must belong to a single SSI in order to allow user-level multi-threading.

In addition to discovering the physical machine, the ESMF initialization process sets up the default global virtual machine. This VM object, which is the ultimate parent of all VMs created during the course of execution, contains as many PETs as there are PEs in the physical machine. All of its PETs are instantiated in the form of single-threaded MPI processes, and a 1:1 mapping of PETs to PEs is used for the default global VM.

The VM design and implementation is based on the POSIX process and thread model as well as the MPI-1.2 standard. As a consequence of the latter standard the number of processes is static during the course of execution and is determined at start-up. The VM implementation further requires that the user starts up the ESMF application with as many MPI processes as there are PEs in the available physical machine using the platform dependent mechanism to ensure proper process placement.

All MPI processes participating in a VM are grouped together by means of an MPI_Group object and their context is defined via an MPI_Comm object (MPI intra-communicator). The PET local process id within each virtual machine is equal to the MPI_Comm_rank in the local MPI_Comm context whereas the PET process id is equal to the MPI_Comm_rank in MPI_COMM_WORLD. The PET process id is used within the VM methods to determine the virtual memory space a PET is operating in.

In order to provide a migration path for legacy MPI-applications the VM offers accessor functions to its MPI_Comm object. Once obtained this object may be used in explicit user-code MPI calls within the same context.

51.6 Class API

52 Profiling and Tracing

52.1 Description

52.1.1 Profiling

ESMF's built-in profiling capability collects runtime statistics of an executing ESMF application through both automatic and manual code instrumentation. Timing information for all phases of all ESMF components executing in an application can be automatically collected using the ESMF_RUNTIME_PROFILE environment variable (see below for settings). Additionally, arbitrary user-defined code regions can be timed by manually instrumenting code with special API calls. Timing profiles of component phases and user-defined regions can be output in several different formats:

in text at the end of ESMF Log files
in separate text files, one per PET (if the ESMF Logs are turned off)
in a single summary text file that aggregates timings over multiple PETs
in a binary format for import into the esmf-profiler for profile visualization

The following table lists important environment variables that control aspects of ESMF profiling.

Environment Variable	Description	Example Values	Default
ESMF_RUNTIME_PROFILE	Enables/disables all profiling functions	ON or OFF	OFF
ESMF_RUNTIME_PROFILE_PETLIST	Limits profiling to an explicit list of PETs	“0-9 50 99”	profile all PETs
ESMF_RUNTIME_PROFILE_OUTPUT	Controls output format of profiles; multiple can be specified in a space separated list	TEXT, SUMMARY, BINARY	TEXT

52.1.2 Tracing

Whereas profiling collects summary information from an application, tracing records a more detailed set of events for later analysis. Trace analysis can be used to understand what happened during a program's execution and is often used for diagnosing problems, debugging, and performance analysis.

ESMF has a built-in tracing capability that records events into special binary log files. Unlike log files written by the ESMF_Log class, which are primarily for human consumption (see Section 49.1), the trace output files are recorded in a compact binary representation and are processed by tools to produce various analyses. ESMF event streams are recorded in the Common Trace Format (CTF). CTF traces include one or more event streams, as well as a metadata file describing the events in the streams.

Several tools are available for reading in the CTF traces output by ESMF. Of the tools listed below, the first one is designed specifically for analyzing ESMF applications and the second two are general purpose tools for working with all CTF traces.

esmf-profiler is a tool that ingests traces from an ESMF application and generates performance profile plots.
TraceCompass is a general purpose tool for reading, analyzing, and visualizing traces.
Babeltrace is a command-line tool and library for trace conversion that can read and write CTF traces. Python bindings are available to open CTF traces and iterate through events.

Events that can be captured by the ESMF tracer include the following. Events are recorded with a high-precision timestamp to allow timing analyses.

phase_enter: indicates entry into an initialize, run, or finalize ESMF component routine
phase_exit: indicates exit from an initialize, run, or finalize ESMF component routine
region_enter: indicates entry into a user-defined code region
region_exit: indicates exit from a user-defined code region

The following table lists important environment variables that control aspects of ESMF tracing.

Environment Variable	Description	Example Values	Default
ESMF_RUNTIME_TRACE	Enables/disables all tracing functions	ON or OFF	OFF
ESMF_RUNTIME_TRACE_CLOCK	Sets the type of clock for timestamping events (see Section 52.2.6).	REALTIME or MONOTONIC or MONOTONIC_SYNC	REALTIME
ESMF_RUNTIME_TRACE_PETLIST	Limits tracing to an explicit list of PETs	“0-9 50 99”	trace all PETs
ESMF_RUNTIME_TRACE_COMPONENT	Enables/disables tracing of Component phase_enter and phase_exit events	ON or OFF	ON
ESMF_RUNTIME_TRACE_FLUSH	Controls frequency of event stream flushing to file	DEFAULT or EAGER	DEFAULT

52.2 Use and Examples

52.2.1 Output a Timing Profile to Text

ESMF profiling is disabled by default. To profile an application, set the ESMF_RUNTIME_PROFILE variable to ON prior to executing the application. You do not need to recompile your code to enable profiling.

# csh shell
$ setenv ESMF_RUNTIME_PROFILE ON

# bash shell
$ export ESMF_RUNTIME_PROFILE=ON

# (from now on, only the csh shell version will be shown)

Then execute the application in the usual way. At the end of the run the profile information will be available at the end of each PET log (if ESMF Logs are turned on) or in a set of separate files, one per PET, with names ESMF_Profile.XXX where XXX is the PET number. Below is an example timing profile. Some regions are left out for brevity.

Region                           Count  Total (s)   Self (s)    Mean (s)    Min (s)     Max (s)
  [esm] Init 1                   1      4.0878      0.0341      4.0878      4.0878      4.0878
    [OCN-TO-ATM] IPDv05p6b       1      2.6007      2.6007      2.6007      2.6007      2.6007
    [ATM-TO-OCN] IPDv05p6b       1      1.4333      1.4333      1.4333      1.4333      1.4333
    [ATM] IPDv00p2               1      0.0055      0.0055      0.0055      0.0055      0.0055
    [OCN] IPDv00p2               1      0.0023      0.0023      0.0023      0.0023      0.0023
    [ATM] IPDv00p1               1      0.0011      0.0011      0.0011      0.0011      0.0011
    [OCN] IPDv00p1               1      0.0009      0.0009      0.0009      0.0009      0.0009
    [ATM-TO-OCN] IPDv05p3        1      0.0008      0.0008      0.0008      0.0008      0.0008
    [ATM-TO-OCN] IPDv05p1        1      0.0008      0.0008      0.0008      0.0008      0.0008
    [ATM-TO-OCN] IPDv05p2b       1      0.0007      0.0007      0.0007      0.0007      0.0007
    [ATM-TO-OCN] IPDv05p4        1      0.0007      0.0007      0.0007      0.0007      0.0007
    [ATM-TO-OCN] IPDv05p2a       1      0.0007      0.0007      0.0007      0.0007      0.0007
    [ATM-TO-OCN] IPDv05p5        1      0.0007      0.0007      0.0007      0.0007      0.0007
    [OCN-TO-ATM] IPDv05p3        1      0.0006      0.0006      0.0006      0.0006      0.0006
    [OCN-TO-ATM] IPDv05p4        1      0.0006      0.0006      0.0006      0.0006      0.0006
    [OCN-TO-ATM] IPDv05p2b       1      0.0006      0.0006      0.0006      0.0006      0.0006
    [OCN-TO-ATM] IPDv05p2a       1      0.0006      0.0006      0.0006      0.0006      0.0006
    [OCN-TO-ATM] IPDv05p5        1      0.0006      0.0006      0.0006      0.0006      0.0006
    [OCN-TO-ATM] IPDv05p1        1      0.0005      0.0005      0.0005      0.0005      0.0005
  [esm] RunPhase1                1      2.7423      0.9432      2.7423      2.7423      2.7423
    [OCN-TO-ATM] RunPhase1       864    0.6094      0.6094      0.0007      0.0006      0.0179
    [ATM] RunPhase1              864    0.5296      0.2274      0.0006      0.0005      0.0011
      ATM:ModelAdvance           864    0.3022      0.3022      0.0003      0.0003      0.0005
    [ATM-TO-OCN] RunPhase1       864    0.3345      0.3345      0.0004      0.0002      0.0299
    [OCN] RunPhase1              864    0.3256      0.3256      0.0004      0.0003      0.0010
  [esm] FinalizePhase1           1      0.0029      0.0020      0.0029      0.0029      0.0029
    [OCN-TO-ATM] FinalizePhase1  1      0.0006      0.0006      0.0006      0.0006      0.0006
    [ATM-TO-OCN] FinalizePhase1  1      0.0002      0.0002      0.0002      0.0002      0.0002
    [OCN] FinalizePhase1         1      0.0001      0.0001      0.0001      0.0001      0.0001
    [ATM] FinalizePhase1         1      0.0000      0.0000      0.0000      0.0000      0.0000

A timed region is either an ESMF component phase (e.g., initialize, run, or finalize) or a user-defined region of code surrounded by calls to ESMF_TraceRegionEnter() and ESMF_TraceRegionExit(). (See section for more information on instrumenting user-defined regions.) Regions are organized hierarchically with sub-regions nested. For example, in the profile above, the [OCN] RunPhase1 is a sub-region of [esm] RunPhase1 and is entirely contained inside that region. Regions with the same name may appear at multiple places in the hierarchy, and so would appear in multiple rows in the table. The statistics in each such row apply to that region at that location in the hierarchy. Component names appear in square brackets, e.g., [ATM], [OCN], and [ATM-TO-OCN]. By default, timings are based on elapsed wall clock time and are collected on a per-PET basis. Therefore, region timings may differ across PETs. Regions are sorted with the most expensive regions appearing at the top. The following describes the meaning of the statistics in each column:

Count	the number of times the region is executed
Total	the aggregate time spent in the region, inclusive of all sub-regions
Self	the aggregate time spent in the region, exclusive of all sub-regions
Mean	the average amount of time for one execution of the region
Min	time of the fastest execution of the region
Max	time of the slowest execution of the region
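The column definitions imply that a region's Self time equals its Total time minus the Total times of its direct sub-regions. The following sketch checks this against two rows of the example profile above. The helper name self_time is hypothetical and not part of ESMF; it only does the subtraction with awk.

```shell
# Hypothetical helper (not part of ESMF): Self = Total - sum of the
# Total times of the direct sub-regions.
# Usage: self_time TOTAL [SUBREGION_TOTAL ...]
self_time() {
    total=$1
    shift
    # The remaining arguments are the Total times of the direct sub-regions
    echo "$total $*" | LC_ALL=C awk '{
        s = $1
        for (i = 2; i <= NF; i++) s -= $i
        printf "%.4f\n", s
    }'
}

# [ATM] RunPhase1: Total 0.5296, one sub-region ATM:ModelAdvance (0.3022)
self_time 0.5296 0.3022                        # prints 0.2274

# [esm] RunPhase1: Total 2.7423, four direct sub-regions
self_time 2.7423 0.6094 0.5296 0.3345 0.3256   # prints 0.9432
```

Both results match the Self column of the corresponding rows. A large Self value on a parent region, as for [esm] RunPhase1, indicates time spent outside any instrumented sub-region.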

52.2.2 Summarize Timings across Multiple PETs

By default, separate timing profiles are generated for each PET in the application. The per-PET profiles can be aggregated together and output to a single file, ESMF_Profile.summary, by setting the ESMF_RUNTIME_PROFILE_OUTPUT environment variable as follows:

$ setenv ESMF_RUNTIME_PROFILE ON              # turn on profiling
$ setenv ESMF_RUNTIME_PROFILE_OUTPUT SUMMARY  # specify summary output

Note the ESMF_RUNTIME_PROFILE environment variable must also be set to ON since this controls all profiling capabilities. The ESMF_Profile.summary file will contain a tree of timed regions, but aggregated across all PETs. For example:

Region                           PETs   PEs    Count    Mean (s)    Min (s)     Min PET Max (s)     Max PET
  [esm] Init 1                   4      4      1        4.0880      4.0878      2       4.0883      1
    [OCN-TO-ATM] IPDv05p6b       4      4      1        2.6007      2.6007      2       2.6007      3
    [ATM-TO-OCN] IPDv05p6b       4      4      1        1.4335      1.4333      0       1.4337      3
    [ATM-TO-OCN] IPDv05p4        4      4      1        0.0037      0.0007      0       0.0060      1
    [ATM] IPDv00p2               4      4      1        0.0034      0.0020      1       0.0055      0
    [ATM-TO-OCN] IPDv05p1        4      4      1        0.0020      0.0007      2       0.0033      3
    [OCN] IPDv00p2               4      4      1        0.0019      0.0015      3       0.0024      2
    [ATM-TO-OCN] IPDv05p3        4      4      1        0.0010      0.0008      0       0.0013      1
    [ATM-TO-OCN] IPDv05p2a       4      4      1        0.0009      0.0007      0       0.0012      3
    [ATM] IPDv00p1               4      4      1        0.0009      0.0007      3       0.0011      0
    [ATM-TO-OCN] IPDv05p2b       4      4      1        0.0008      0.0007      0       0.0010      3
    [ATM-TO-OCN] IPDv05p5        4      4      1        0.0008      0.0007      0       0.0010      3
    [ATM-TO-OCN] IPDv05p6a       4      4      1        0.0008      0.0005      2       0.0012      3
    [OCN-TO-ATM] IPDv05p3        4      4      1        0.0008      0.0006      2       0.0010      3
    [OCN-TO-ATM] IPDv05p4        4      4      1        0.0008      0.0006      0       0.0009      3
    [OCN-TO-ATM] IPDv05p2b       4      4      1        0.0007      0.0006      2       0.0009      3
    [OCN] IPDv00p1               4      4      1        0.0007      0.0005      1       0.0009      2
    [OCN-TO-ATM] IPDv05p2a       4      4      1        0.0007      0.0006      2       0.0009      1
    [OCN-TO-ATM] IPDv05p5        4      4      1        0.0007      0.0006      0       0.0009      3
    [OCN-TO-ATM] IPDv05p1        4      4      1        0.0006      0.0005      0       0.0008      1
    [OCN-TO-ATM] IPDv05p6a       4      4      1        0.0006      0.0004      2       0.0007      1
  [esm] RunPhase1                4      4      1        2.7444      2.7423      0       2.7454      1
    [OCN-TO-ATM] RunPhase1       4      4      864      0.6123      0.6004      2       0.6244      1
    [ATM] RunPhase1              4      4      864      0.5386      0.5296      0       0.5530      1
      ATM:ModelAdvance           4      4      864      0.3038      0.3022      0       0.3065      1
    [OCN] RunPhase1              4      4      864      0.3471      0.3256      0       0.3824      1
    [ATM-TO-OCN] RunPhase1       4      4      864      0.2843      0.1956      1       0.3345      0
  [esm] FinalizePhase1           4      4      1        0.0029      0.0029      1       0.0030      2
    [OCN-TO-ATM] FinalizePhase1  4      4      1        0.0007      0.0006      0       0.0008      3
    [ATM-TO-OCN] FinalizePhase1  4      4      1        0.0002      0.0001      3       0.0002      1
    [OCN] FinalizePhase1         4      4      1        0.0001      0.0001      3       0.0001      0
    [ATM] FinalizePhase1         4      4      1        0.0001      0.0000      0       0.0001      2

The meaning of the statistics in each column is as follows:

PETs	the number of reporting PETs that executed the region
PEs	the number of PEs associated with the reporting PETs that executed the region
Count	the number of times each reporting PET executed the region or “MULTIPLE” if not all PETs executed the region the same number of times
Mean	the mean across all reporting PETs of the total time spent in the region
Min	the minimum across all reporting PETs of the total time spent in the region
Min PET	the PET that reported the minimum time
Max	the maximum across all reporting PETs of the total time spent in the region
Max PET	the PET that reported the maximum time
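As an illustration of how one summary row relates to the per-PET profiles, the sketch below aggregates the Total time of a single region across PETs. It is not how ESMF produces ESMF_Profile.summary. The function name summarize_region, the "PET seconds" input format, and the four input values are all made up for this example.

```shell
# Hypothetical helper (not part of ESMF): read "PET total_seconds" pairs
# for one region on standard input and print the summary statistics.
summarize_region() {
    LC_ALL=C awk '
        NR == 1  { min = $2; minpet = $1; max = $2; maxpet = $1 }
                 { sum += $2 }
        $2 < min { min = $2; minpet = $1 }
        $2 > max { max = $2; maxpet = $1 }
        END {
            printf "PETs=%d Mean=%.4f Min=%.4f MinPET=%d Max=%.4f MaxPET=%d\n",
                   NR, sum / NR, min, minpet, max, maxpet
        }
    '
}

# Made-up Total times for one region on PETs 0 through 3
printf '%s\n' "0 0.50" "1 0.20" "2 0.30" "3 0.60" | summarize_region
```

A large gap between Min and Max for a region is a first hint of load imbalance, and the Min PET and Max PET columns identify which per-PET profiles to inspect.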

Note that setting the ESMF_RUNTIME_PROFILE_PETLIST environment variable (described below) may reduce the number of reporting PETs. Only reporting PETs are included in the summary profile. To output both the per-PET and summary timing profiles, set the ESMF_RUNTIME_PROFILE_OUTPUT environment variable as follows:

$ setenv ESMF_RUNTIME_PROFILE_OUTPUT "TEXT SUMMARY"

52.2.3 Limit the Set of Profiled PETs

By default, all PETs in an application are profiled. It may be desirable to only profile a subset of PETs to reduce the amount of output. An explicit list of PETs can be specified by setting the ESMF_RUNTIME_PROFILE_PETLIST environment variable. The syntax of this environment variable is to list PET numbers separated by spaces. PET ranges are also supported using the “X-Y” syntax where X < Y. For example:

# only profile PETs 0, 20, and 35 through 39
$ setenv ESMF_RUNTIME_PROFILE_PETLIST "0 20 35-39"

When used in conjunction with the SUMMARY option above, the summarized profile will only aggregate over the specified set of PETs. The one exception is that PET 0 is always profiled if ESMF_RUNTIME_PROFILE=ON, regardless of the ESMF_RUNTIME_PROFILE_PETLIST setting.
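To see which PETs a given list selects before launching a large job, the list can be expanded by hand. The following is a minimal sketch in bash-compatible shell syntax. The function expand_petlist is a hypothetical helper, not an ESMF tool, and it only mirrors the space-separated and "X-Y" syntax described above.

```shell
# Hypothetical helper (not part of ESMF): expand a PET list such as
# "0 20 35-39" into the individual PET numbers it names.
expand_petlist() {
    out=""
    for item in $1; do
        case "$item" in
            *-*)
                # Range "X-Y": X is the text before the dash, Y the text after
                first=${item%-*}
                last=${item#*-}
                i=$first
                while [ "$i" -le "$last" ]; do
                    out="$out $i"
                    i=$((i + 1))
                done
                ;;
            *)
                out="$out $item"
                ;;
        esac
    done
    # Drop the leading space and print one space-separated line
    echo "${out# }"
}

PETLIST="0 20 35-39"
expand_petlist "$PETLIST"      # prints 0 20 35 36 37 38 39
export ESMF_RUNTIME_PROFILE_PETLIST="$PETLIST"
```

The same check applies to ESMF_RUNTIME_TRACE_PETLIST, which uses the same syntax.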

52.2.4 Include MPI Communication in the Profile

MPI functions can be included in the timing profile to indicate how much time is spent inside communication calls. This can also help to determine load imbalance in the system, since large times spent inside MPI may indicate that communication between PETs is not tightly synchronized. This option includes all MPI calls in the application, whether or not they originate from the ESMF library. Here is a partial example summary profile that contains MPI times:

Region                           PETs   Count    Mean (s)    Min (s)     Min PET Max (s)     Max PET
  [esm] RunPhase1                8      1        4.9307      4.6867      0       4.9656      1
    [OCN] RunPhase1              8      1824     0.8344      0.8164      0       0.8652      1
    [MED] RunPhase1              8      1824     0.8203      0.7900      5       0.8584      1
    [ATM] RunPhase1              8      1824     0.6387      0.6212      5       0.6610      1
    [ATM-TO-MED] RunPhase1       8      1824     0.5975      0.5317      0       0.6583      5
      MPI_Bcast                  8      1824     0.0443      0.0025      4       0.1231      5
      MPI_Wait                   8      MULTIPLE 0.0421      0.0032      0       0.0998      2
    [MED-TO-OCN] RunPhase1       8      1824     0.4879      0.4497      0       0.5362      4
      MPI_Wait                   8      MULTIPLE 0.0234      0.0030      0       0.0821      4
      MPI_Bcast                  8      1824     0.0111      0.0024      4       0.0273      5
    [OCN-TO-MED] RunPhase1       8      1824     0.4541      0.4075      0       0.4918      4
      MPI_Wait                   8      MULTIPLE 0.0339      0.0017      0       0.0824      4
      MPI_Bcast                  8      1824     0.0194      0.0026      4       0.0452      6
    [MED-TO-ATM] RunPhase1       8      1824     0.4487      0.4005      0       0.4911      5
      MPI_Bcast                  8      1824     0.0338      0.0026      4       0.0942      5
      MPI_Wait                   8      MULTIPLE 0.0241      0.0022      1       0.0817      2
  [esm] Init 1                   8      1        0.6287      0.6287      1       0.6287      4
    [ATM-TO-MED] IPDv05p6b       8      1        0.1501      0.1500      1       0.1501      2
      MPI_Barrier                8      242      0.0082      0.0006      3       0.0157      7
      MPI_Wait                   8      MULTIPLE 0.0034      0.0010      0       0.0053      7
      MPI_Allreduce              8      62       0.0030      0.0003      3       0.0063      7
      MPI_Alltoall               8      6        0.0015      0.0000      1       0.0022      5
      MPI_Allgather              8      21       0.0010      0.0002      1       0.0017      7
      MPI_Waitall                8      MULTIPLE 0.0006      0.0001      3       0.0015      7
      MPI_Send                   8      MULTIPLE 0.0004      0.0001      7       0.0008      6
      MPI_Allgatherv             8      6        0.0001      0.0001      4       0.0001      0
      MPI_Scatter                8      5        0.0000      0.0000      0       0.0000      7
      MPI_Reduce                 8      5        0.0000      0.0000      1       0.0000      0
      MPI_Recv                   8      MULTIPLE 0.0000      0.0000      0       0.0000      3
      MPI_Bcast                  8      1        0.0000      0.0000      0       0.0000      7

The procedure for including MPI functions in the timing profile depends on whether the application is dynamically or statically linked. Most applications are dynamically linked; however, on some systems (such as Cray), static linking may be used. Note that for either option, ESMF must be built with ESMF_TRACE_LIB_BUILD=ON, which is the default.

In dynamically linked applications, the LD_PRELOAD (Linux) or DYLD_INSERT_LIBRARIES (Darwin) environment variable must be used when executing the MPI application. This instructs the dynamic loader to interpose certain MPI symbols so they can be captured by the ESMF profiler. To simplify this process, a script is provided at $(ESMF_INSTALL_LIBDIR)/preload.sh that sets the appropriate variable.

For example, if you typically execute your application as follows:

$ mpirun -np 8 ./myApp

then you should add the preload.sh script in front of the executable when starting the application as follows:

# replace $(ESMF_INSTALL_LIBDIR) with absolute path
# ... to the ESMF installation lib directory
$ mpirun -np 8 $(ESMF_INSTALL_LIBDIR)/preload.sh ./myApp

An advantage of this approach is that your application does not need to be recompiled. The MPI timing information will be included in the per-PET profiles and/or the summary profile, depending on the setting of environment variable ESMF_RUNTIME_PROFILE_OUTPUT.

Notice that an additional step is required for dynamically linked applications on Darwin systems with System Integrity Protection (SIP) enabled! In addition to using the $(ESMF_INSTALL_LIBDIR)/preload.sh script during launching of the executable as shown above, the executable must also be linked against the dynamic ESMF trace preload library. This must happen during the link step of the executable. It is most easily accomplished by using variable $(ESMF_F90ESMFPRELOADLINKLIBS) instead of the typical $(ESMF_F90ESMFLINKLIBS) variable for the final link command. Both variables are defined in the esmf.mk file that should be imported by the application Makefile. For example:

# import esmf.mk
include $(ESMFMKFILE)

# other makefile targets here...

# example final link command, with $(ESMF_F90ESMFPRELOADLINKLIBS)
myApp: myApp.o driver.o model.o
        $(ESMF_F90LINKER) $(ESMF_F90LINKOPTS) $(ESMF_F90LINKPATHS) \
        $(ESMF_F90LINKRPATHS) -o $@ $^ $(ESMF_F90ESMFPRELOADLINKLIBS)

In statically linked applications, the application must be re-linked with specific options provided to the linker. These options instruct the linker to wrap the MPI symbols with the ESMF profiling functions. The linking flags that must be provided are included in the esmf.mk Makefile fragment that is part of the ESMF installation. These link flags should be imported into your application Makefile, and included in the final link command. To do this, first import the esmf.mk file into your application Makefile. The path to this file is typically stored in the ESMFMKFILE environment variable. Then, pass the variables $(ESMF_TRACE_STATICLINKOPTS) and $(ESMF_TRACE_STATICLINKLIBS) to the final linking command. For example:

# import esmf.mk
include $(ESMFMKFILE)

# other makefile targets here...

# example final link command, with $(ESMF_TRACE_STATICLINKOPTS)
# ... and $(ESMF_TRACE_STATICLINKLIBS) added
myApp: myApp.o driver.o model.o
        $(ESMF_F90LINKER) $(ESMF_F90LINKOPTS) $(ESMF_F90LINKPATHS) \
        $(ESMF_F90LINKRPATHS) -o $@ $^ $(ESMF_F90ESMFLINKLIBS) \
        $(ESMF_TRACE_STATICLINKOPTS) $(ESMF_TRACE_STATICLINKLIBS)

This option will statically wrap all of the MPI functions and include them in the profile output. Execute the application in the normal way with the environment variable ESMF_RUNTIME_PROFILE set to ON. You will see the MPI functions included in the timing profile.

52.2.5 Output a Detailed Trace for Analysis

ESMF tracing is disabled by default. To enable tracing, set the ESMF_RUNTIME_TRACE environment variable to ON. You do not need to recompile your code to enable tracing.

# csh shell
$ setenv ESMF_RUNTIME_TRACE ON

# bash shell
$ export ESMF_RUNTIME_TRACE=ON

When enabled, the default behavior is to trace all PETs of the ESMF application. Although the ESMF tracer is designed to write events in a compact form, tracing can produce an extremely large number of events depending on the total number of PETs and the length of the run. To reduce output, it is possible to restrict the PETs that produce trace output by setting the ESMF_RUNTIME_TRACE_PETLIST environment variable. For example, this setting:

$ setenv ESMF_RUNTIME_TRACE_PETLIST "0 101 192-196"

will instruct the tracer to trace only PETs 0, 101, and 192 through 196 (inclusive). The value is a list of PET numbers separated by spaces. PET ranges are also supported using the “X-Y” syntax, where X < Y. Setting this environment variable is recommended for PET counts greater than 100. Note that PET 0 is always traced, regardless of the ESMF_RUNTIME_TRACE_PETLIST setting.
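
To make the list syntax concrete, the following shell sketch expands a PET list into the individual PET numbers it selects. The function name expand_petlist is hypothetical and is not part of ESMF. It only mirrors the rules stated above (space-separated numbers, inclusive X-Y ranges, PET 0 always included) and assumes a well-formed, non-empty list.

```shell
# Hypothetical helper, not an ESMF tool: expand a PET list such as
# "0 101 192-196" into the individual PET numbers it selects.
expand_petlist() {
  out=""
  for item in $1; do
    case "$item" in
      *-*)
        # Inclusive range written as X-Y
        lo=${item%-*}
        hi=${item#*-}
        p=$lo
        while [ "$p" -le "$hi" ]; do
          out="$out $p"
          p=$((p + 1))
        done
        ;;
      *)
        # Single PET number
        out="$out $item"
        ;;
    esac
  done
  # PET 0 is always traced, so add it if the list omits it
  case " $out " in
    *" 0 "*) ;;
    *) out=" 0$out" ;;
  esac
  echo "${out# }"
}

expand_petlist "0 101 192-196"
```

Calling the helper with "3 5-7" would list PET 0 first, followed by 3, 5, 6, and 7, reflecting that PET 0 is traced even when it is not named.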

ESMF's profiling and tracing options can be used together. A typical use would be to set ESMF_RUNTIME_PROFILE=ON for all PETs to capture summary timings, and set ESMF_RUNTIME_TRACE=ON and ESMF_RUNTIME_TRACE_PETLIST to a subset of PETs, such as the root PET of each ESMF component. This helps to keep trace sizes small while still providing timing summaries over all PETs.
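
As a sketch of this combined setup in bash syntax, the settings below enable profiling on all PETs and restrict tracing to a few PETs. The PET numbers 16 and 32 are placeholders standing in for the root PETs of hypothetical components; substitute the values that match your own layout.

```shell
# Summary timings are collected on every PET
export ESMF_RUNTIME_PROFILE=ON

# Detailed tracing is limited to selected PETs (placeholder PET numbers)
export ESMF_RUNTIME_TRACE=ON
export ESMF_RUNTIME_TRACE_PETLIST="0 16 32"
```

The application is then launched in the normal way, with no recompilation needed.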

When tracing is enabled, phase_enter and phase_exit events will automatically be recorded for all initialize, run, and finalize phases of all Components in the application. To trace only user-instrumented regions (via the ESMF_TraceRegionEnter() and ESMF_TraceRegionExit() calls), Component-level tracing can be turned off by setting:

$ setenv ESMF_RUNTIME_TRACE_COMPONENT OFF

After running an ESMF application with tracing enabled, a directory called traceout will be created in the run directory and it will contain a metadata file and an event stream file esmf_stream_XXXX for each PET with tracing enabled. Together these files form a valid CTF trace which may be analyzed with any of the tools listed above.
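
A quick sanity check after a run is to confirm that the expected files are present. The sketch below is a hypothetical helper (count_trace_streams is not an ESMF tool) that counts the event stream files in a trace directory. It relies only on the file naming described above and defaults to the traceout directory.

```shell
# Hypothetical helper: count esmf_stream_* files in a trace directory.
count_trace_streams() {
  dir=${1:-traceout}
  n=0
  for f in "$dir"/esmf_stream_*; do
    # An unmatched pattern stays literal, so test that the file exists
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

Running count_trace_streams in the run directory should report one stream file per PET that had tracing enabled.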

Trace events are flushed to file at a regular interval. If the application crashes, some of the most recent events may not be flushed to file. To maximize the number of events appearing in the trace, an option is available to flush events to file more frequently. Because this option may have negative performance implications due to increased file I/O, it is not recommended unless needed. To turn on eager flushing use:

$ setenv ESMF_RUNTIME_TRACE_FLUSH EAGER

52.2.6 Set the Clock used for Profiling/Tracing

There are three options for the kind of clock to use to timestamp events when profiling/tracing an application. These options are controlled by setting the environment variable ESMF_RUNTIME_TRACE_CLOCK.

REALTIME	The REALTIME clock timestamps events with the current time on the system. This is the default clock if the above environment variable is not set. This setting can be useful when tracing PETs that span multiple physical computing nodes assuming that the system clocks on each node are adequately synchronized. On most HPC systems, system clocks are periodically updated to stay in sync. A disadvantage of this clock is that periodic adjustments mean the clock is not monotonically increasing so some timings may be inaccurate if the system clock jumps forward or backward significantly. Testing has shown that this is not typically an issue on most systems.
MONOTONIC	The MONOTONIC clock is guaranteed to be monotonically increasing and does not suffer from periodic adjustments. The timestamps represent an amount of time since some arbitrary point in the past. There is no guarantee that these timestamps will be synchronized across physical computing nodes, so this option should only be used for tracing a set of PETs running on a single physical machine.
MONOTONIC_SYNC	The MONOTONIC_SYNC clock is similar to the MONOTONIC clock in that it is guaranteed to be monotonically increasing. In addition, at application startup, all PET clocks are synchronized to a common time by determining a PET-local offset to be applied to timestamps. Therefore this option can be used to compare trace streams across physical nodes.
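
The clock is selected in the same way as the other run-time settings. The bash-syntax sketch below wraps the setting in a hypothetical helper, set_trace_clock (not part of ESMF), that accepts only the three values listed above and rejects anything else.

```shell
# Hypothetical helper: set the trace clock, allowing only documented values.
set_trace_clock() {
  case "$1" in
    REALTIME|MONOTONIC|MONOTONIC_SYNC)
      export ESMF_RUNTIME_TRACE_CLOCK="$1"
      ;;
    *)
      echo "unknown clock: $1" >&2
      return 1
      ;;
  esac
}

# Multi-node run where trace streams will be compared across nodes
set_trace_clock MONOTONIC_SYNC
```

Setting the variable directly with export (or setenv in csh) has the same effect; the helper only guards against misspelled values.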

52.3 Restrictions and Future Work

Limited types of trace events. Currently only a few trace event types are available. The tracer may be extended in the future to record additional types of events.
MPI call profiling not available for statically linked executables on Darwin. Currently the linker on Darwin systems does not support the wrapping of symbols during static linking. In order to access MPI call profiling on Darwin, executables should be linked dynamically in combination with the procedure described in section 52.2.4.

52.4 Class API

53 Fortran I/O and System Utilities

53.1 Description

The ESMF Fortran I/O and System utilities provide portable methods to access capabilities which are often implemented in different ways amongst different environments. These utility methods are divided into three groups: command line access, Fortran I/O, and sorting.

Command line arguments may be accessed using three methods. ESMF_UtilGetArg() returns a given command line argument, and ESMF_UtilGetArgC() returns a count of the number of command line arguments available. Finally, ESMF_UtilGetArgIndex() returns the index of a desired argument value, given its keyword name.

Two I/O methods are implemented: ESMF_UtilIOUnitGet(), to obtain an unopened Fortran unit number within the range of unit numbers that ESMF is allowed to use, and ESMF_UtilIOUnitFlush(), to flush the I/O buffer associated with a specific Fortran unit.

Finally, the ESMF_UtilSort() method sorts integer, floating point, and character string data types in either ascending or descending order.

53.2 Use and Examples

53.2.1 Fortran unit number management

The ESMF_UtilIOUnitGet() method is provided so that applications using ESMF can remain free of unit number conflicts, both with other third party code and with ESMF itself. This call is typically used just prior to an OPEN statement:

  call ESMF_UtilIOUnitGet (unit=grid_unit, rc=rc)
  open (unit=grid_unit, file='grid_data.dat', status='old', action='read')

By default, unit numbers between 50 and 99 are scanned to find an unopened unit number.

Internally, ESMF also uses ESMF_UtilIOUnitGet() when it needs to open Fortran unit numbers for file I/O. By using the same API for both user and ESMF code, unit number collisions can be avoided.

When integrating ESMF into an application where there are conflicts with other uses of the same unit number range, such as when hard-coded unit number values are used, an alternative unit number range can be specified. The ESMF_Initialize() optional arguments IOUnitLower and IOUnitUpper may be set as needed. Note that IOUnitUpper must be set to a value higher than IOUnitLower, and that both must be non-negative. Otherwise ESMF_Initialize() will return ESMF_FAILURE. ESMF itself does not typically need more than about five units for internal use.

  call ESMF_Initialize (..., IOUnitLower=120, IOUnitUpper=140)

All current Fortran environments have preconnected unit numbers, such as units 5 and 6 for standard input and output, in the single digit range. So it is recommended that the unit number range is chosen to begin at unit 10 or higher to avoid these preconnected units.

53.2.2 Flushing output

Fortran run-time libraries generally use buffering techniques to improve I/O performance. However, output buffering can be problematic when output is needed but is “trapped” in the buffer because the buffer is not yet full. This is a common occurrence when debugging a program by inserting WRITE statements to track down the bad area of code. If the program crashes before the output buffer has been flushed, the desired debugging output may never be seen, giving a misleading indication of where the problem occurred. It is therefore desirable to ensure that the output buffer is flushed at predictable points in the program. Likewise, in parallel code, predictable flushing of output buffers is a common requirement, often in conjunction with ESMF_VMBarrier() calls.

The ESMF_UtilIOUnitFlush() API is provided to flush a unit as desired. Here is an example of code which prints debug values, and serializes the output to a terminal in PET order:

  type(ESMF_VM) :: vm

  integer :: tty_unit
  integer :: me, npets
  integer :: i, rc

  call ESMF_Initialize (vm=vm, rc=rc)
  ! Query this PET's index and the total number of PETs
  call ESMF_VMGet (vm, localPet=me, petCount=npets)

  ! Obtain an unopened unit number and connect it to the terminal
  call ESMF_UtilIOUnitGet (unit=tty_unit)
  open (unit=tty_unit, file='/dev/tty', status='old', action='write')
  ...
  call ESMF_VMBarrier (vm=vm)
  ! Each PET writes and flushes in turn; the barrier keeps output in PET order
  do i=0, npets-1
    if (i == me) then
      write (tty_unit, *) 'PET: ', i, ', values are: ', a, b, c
      call ESMF_UtilIOUnitFlush (unit=tty_unit)
    end if
    call ESMF_VMBarrier (vm=vm)
  end do

53.3 Design and Implementation Notes

53.3.1 Fortran unit number management

When ESMF needs to open a Fortran I/O unit, it calls ESMF_UtilIOUnitGet() to find an unopened unit number. As delivered, the range of unit numbers that is searched runs from ESMF_LOG_FORTRAN_UNIT_NUMBER (normally set to 50) to ESMF_LOG_UPPER (normally set to 99). Unopened unit numbers are found by using the Fortran INQUIRE statement.

When integrating ESMF into an application where there are conflicts with other uses of the same unit number range, an alternative range can be specified in the ESMF_Initialize() call by setting the IOUnitLower and IOUnitUpper arguments as needed. ESMF_UtilIOUnitGet() will then search the alternate range of unit numbers. Note that IOUnitUpper must be set to a value higher than IOUnitLower, and that both must be non-negative. Otherwise ESMF_Initialize() will return ESMF_FAILURE.

Fortran unit numbers are not standardized in the Fortran 90 Standard. The standard only requires that they be non-negative integers. Beyond that, it is up to the compiler writers and application developers to provide and use units which work with the particular implementation. For example, units 5 and 6 are a de facto standard for “standard input” and “standard output”, even though this is not specified in the actual Fortran standard. The Fortran standard also does not specify which unit numbers can be used, nor does it specify how many can be open simultaneously.

Since all current compilers have preconnected unit numbers, and these are typically found on units lower than 10, it is recommended that applications use unit numbers 10 and higher.

53.3.2 Flushing output

When ESMF needs to flush a Fortran unit, the ESMF_UtilIOUnitFlush() API is used to centralize the file flushing capability, because Fortran has not historically had a standard mechanism for flushing output buffers. Most compilers' run-time libraries support various library extensions to provide this functionality, though, being non-standard, the spelling and number of arguments vary between implementations. Fortran 2003 also provides a FLUSH statement which is built into the language. When possible, ESMF_UtilIOUnitFlush() uses the F2003 FLUSH statement. With older compilers, the appropriate library call is made.

53.3.3 Sorting algorithms

The ESMF_UtilSort() algorithms are the same as those in the LAPACK sorting procedures SLASRT() and DLASRT(). Two algorithms are used. For small sorts, arrays with 20 or fewer elements, a simple Insertion sort is used. For larger sorts, a Quicksort algorithm is used.

Compared to the original LAPACK code, a full Fortran 90 style interface is supported for ease of use and enhanced compile time checking. Additional support is also provided for integer and character string data types.

53.4 Utility API
