4 Processes

The ESMF development environment has several defining characteristics. First, both the ESMF Core Team and the JST are distributed. This makes incorporating simple, efficient communication mechanisms into the development process essential. Second, the JST and Core Team work on a range of different platforms, at sites that don't have the time, resources, or inclination to install demanding packages. Collaboration tools that require no purchase or installation before use are essential. Finally, ESMF is committed to open development. As much as possible, the ESMF team tries to keep the workings of the project - metrics, support and bug lists, schedules, task lists, source code, you name it - visible to the broad community.

4.1 Software Process Model

The ESMF software development cycle is based on the staged delivery model [#!mcconnell96!#]. The steps in this software development model are:

  1. Software Concept Collect and itemize the high-level requirements of the system and identify the basic functions that the system must perform.

  2. Requirements Analysis Write and review a requirements document - a detailed statement of the scientific and computational requirements for the software.

  3. Architectural Design Define a high-level software architecture that outlines the functions, relationships, and interfaces for major components. Write and review an architecture document.

  4. Stage 1, 2, ..., n Repeat the following steps creating a potentially releasable product at the end of each stage. Each stage produces a more robust, complete version of the software.

  5. Code Distribution and Maintenance Official public release of the software, beginning of maintenance phase.

We have customized and extended this standard model to suit the ESMF project. At this stage of ESMF development, we are in the iterative design/implement/release cycle. Below are a few notes on earlier stages.

4.2 ESMF Process History

4.2.1 Software Concept

Participants in the ESMF project completed the Software Concept stage in the process of developing a unified set of proposals. A summary of the high-level requirements of ESMF - a statement of project scope and vision - is included in the General Requirements part of the ESMF Requirements Document [#!bib:ESMFreqdoc!#]. This was a successful effort in defining the scope of the project and agreeing to an overall design strategy.

4.2.2 Requirements Analysis

The ESMF Team spent about six months at the start of the project producing the ESMF Requirements Document. The first part outlined the major ESMF capabilities necessary to meet project milestones and achieve project goals. The second part of the document was a detailed requirements specification for each functionality class included in the framework. This document also included a discussion of the process that was used to initially collect requirements. The Requirements Document was a useful reference for the development team, especially for new developers coming in from outside of the Earth science domain. However, as the framework matured, support requests and the Change Review Board process took precedence in defining development tasks and setting priorities. The Requirements Document was bundled with the ESMF source distribution through version 2 and was removed in version 3.

4.2.3 Architectural Design

The project had difficulty with the Architecture Document. The comments received back on the completed work, informally and from a peer review body, indicated that the presentation of the document was ineffective at conveying how the ESMF worked. Although the document was full of detailed and complex diagrams, the terminology and diagrams were oriented to software engineers and were not especially scientist-friendly. The detailed diagrams also made the document difficult to maintain. This experience helped to guide the ESMF project towards more user-oriented documents, but it also left a gap in the documentation that has taken time to fill.

4.3 Ongoing Development

The following are processes the ESMF team is actively following. These guidelines apply to core team developers and outside contributors who will be checking code into the main ESMF repository.

All design and code reviews are telecons held with the JST. Telecons are scheduled with the Core Team Manager, put on the ESMF calendar on the home page of the ESMF website, and announced on the esmf_jst@ucar.edu list.

4.3.1 Telecon Etiquette

When you call in, it's nice to give your name at the first opportunity. Telecon hosts will make an effort to introduce people on the JST calls, especially first-timers. Please don't put the telecon on hold (we sometimes get telecon-stopping music or beeps this way).

Within a week or so after the telecon, the host (the developer if it's a design or code review) is expected to send out a summary to esmf_jst@ucar.edu with the date of the call, the participants, and notes or conclusions.

4.3.2 Design Reviews

  1. Introductory telecon(s). The point here is to scope out the problem at hand. These calls cover the following, as they apply.

    For these introductory discussions, any form of material is fine - diagrams, slides, plain text ramblings, lists of questions, ...

  2. Initial design review(s). The document presented should be in the format of the ESMF Reference Manual, either in plain text or in LaTeX/ProTeX. This is so the document can be incorporated into project documentation after implementation. The initial review document should include at least the following sections:

    This step is iterated until developers and customers converge.

  3. Full telecon review(s). The developer should prepare the API specification using LaTeX and ProTeX following the conventions in the Reference Manual. Most of the Reference Manual section(s) for the new or modified class(es), including Class Options and Restrictions and Future Work, should be available at the time of this review. Diagrams should be ready here too.

  4. Use test case telecon review. For each major piece of functionality, a use test case is prepared in collaboration with customers and executed before release. The use test case is performed on a realistic number of processors and with realistic input data sets.

    It doesn't have to work (and probably won't) before it's reviewed, but it needs to work before the functionality appears in a release. The developer checks it into the top-level use_test_case directory on SourceForge and prepares an HTML page outlining it for the Test & Validation page on the ESMF website. Unlike unit and system tests, use test cases aren't distributed with the ESMF source.

4.3.3 Implementation and Test Before Internal Release

Code should be written in accordance with the interface specifications agreed to during design reviews and the coding conventions described in Section [*].

There is an internal release checklist on the Test & Validation page of the ESMF website that contains an exhaustive listing of developer and tester responsibilities for releases. For additional discussion of test and validation procedures, see Section [*].

The developer is responsible for working with the tester(s) to make sure that the following are present before an internal release:

4.3.4 Implementation and Test Before Public Release

There is a public release checklist on the Test & Validation page of the ESMF website that contains an exhaustive listing of developer and tester responsibilities for releases. For additional discussion of test and validation procedures, see Section [*].

Same as for internal release, plus:

4.3.5 Code Check-In

Developers are encouraged to check their changes into the repository as they complete them, as frequently as possible without breaking the existing code base.

  1. Both core team developers and contributors should test on at least three compilers before committing.
  2. For core team developers, an email should be sent to esmf_core@ucar.edu before check-in for very large commits and for commits that will break the HEAD. For contributors, an email should be sent to esmf_core@ucar.edu before ANY commit.
  3. No code commits should be made between 0:00 and 4:00 Mountain Time. During this time the regression test scripts are checking out code, and any commits will lead to inconsistent test results that are hard to interpret.
  4. Core team developers can be set up to receive email from GitHub for every check-in by subscribing to esmf_commits@ucar.edu.

To accomplish the first item on the list, after a commit of source code an email can be sent to esmftest@cgd.ucar.edu with the exact subject "Run_ESMF_Test_Build". The mailbox is checked every quarter hour, on the quarter hour. This email initiates a test on pluto that builds and installs ESMF with four compilers (g95, gfortran, lahey, and nag), with ESMF_BOPT set to "g" and "O".

When the test is started, an email with the subject "ESMF_Test_Builds_Pluto_started" is sent to esmf_core@ucar.edu, with a time stamp in the body of the message. If a test is already running, an email with the subject "ESMF_Test_Builds_Pluto_not_started" is sent, with "Test not started, a test is already running." in the body. The test that is running will run to completion; a new test will NOT be queued. A new "Run_ESMF_Test_Build" email must be sent after the running test has completed.
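The request and reply convention described above can be sketched in Python as follows. This is only an illustration: the function names and the sender address are hypothetical, the body is left empty because the text places no requirement on it, and the message is built but not sent.

```python
from email.message import EmailMessage

# Address and subject strings come from the text above.
TEST_BUILD_ADDRESS = "esmftest@cgd.ucar.edu"
TEST_BUILD_SUBJECT = "Run_ESMF_Test_Build"


def make_test_build_request(sender: str) -> EmailMessage:
    """Build (but do not send) the message that requests a test build."""
    msg = EmailMessage()
    msg["From"] = sender  # hypothetical sender supplied by the caller
    msg["To"] = TEST_BUILD_ADDRESS
    msg["Subject"] = TEST_BUILD_SUBJECT  # must match exactly
    msg.set_content("")  # assumption: the body is not examined
    return msg


def classify_reply(subject: str) -> str:
    """Map the subject of the reply email to a short status string."""
    replies = {
        "ESMF_Test_Builds_Pluto_started": "started",
        "ESMF_Test_Builds_Pluto_not_started": "busy, resend later",
    }
    return replies.get(subject.strip(), "unrecognized")
```

Because a request that arrives while a test is running is not queued, a caller that receives "busy, resend later" has to send a fresh request once the running test has finished.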

4.3.6 Code Reviews

  1. All significant chunks of externally contributed code are reviewed by the JST. It's usual to do the code review after check-in. The code review should be scheduled with the Core Team Manager when the code is checked in, and the code review held before the next release.
  2. We also do code reviews with core team members, as desired/required by the JST.

4.3.7 Releases

The ESMF produces internal releases and public releases based on the schedule generated by the CRB. Every public release is preceded by an internal release three months prior, for the purpose of beta testing. During those three months, bugs may be fixed and documentation improved, but no new functionality may be added. Occasionally the Core Team releases an internal release that does not become a public release. This would happen, for example, when major changes are being made to ESMF and user input is needed for multiple preliminary versions of the software.

The Integrator tags new system versions with coherent changes prior to release. The tagging convention for public and internal releases is described in Section [*].

Prior to release all ESMF software is regression-tested on all platforms and interfaces. The Integrator is responsible for regression testing, identifying problems and notifying the appropriate developers, and collecting and sending out Release Notes and Known Bugs.

ESMF releases are announced on the esmf_jst@ucar.edu mailing list and are posted on the ESMF website. Source code is available for download from the ESMF website and from the main ESMF GitHub page.

4.3.8 Backups

The backup strategy for each entity of the ESMF project is as follows:

To conserve storage, only the backup files for the current year and the prior year are retained in full. For earlier years, only the six-month backup files are retained. For example, the backup files retained for 2010 through 2013 of the ESMF CVS repository are:

20100103.esmf-cvsroot.tar.gz
20100606.esmf-cvsroot.tar.gz
20110102.esmf-cvsroot.tar.gz
20110605.esmf-cvsroot.tar.gz
20120101.esmf-cvsroot.tar.gz
All of 2012 and 2013

Once a year, in January, the backup files of the year before the prior year are cleaned up. For example, in January 2014 all of the backup files of 2012 and 2013 would be in the archive, so the 2012 backup files are cleaned up and only the six-month backup files of 2012 are retained.
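The retention rule can be sketched in Python as below. This is a hedged illustration, not the project's cleanup script: the function names are hypothetical, and the sketch assumes that the "6 month" backups are the earliest January backup and the earliest June backup of each older year, which is inferred from the file names listed above.

```python
from datetime import date


def select_retained(backups: list[date], current_year: int) -> list[date]:
    """Return the backup dates to keep under the retention rule.

    Assumption: for years older than the prior year, keep only the
    earliest backup taken in January and the earliest taken in June.
    """
    keep = set()
    earliest = {}  # (year, month) -> earliest backup date seen so far
    for d in backups:
        if d.year >= current_year - 1:
            keep.add(d)  # current and prior year: keep everything
        elif d.month in (1, 6):
            key = (d.year, d.month)
            if key not in earliest or d < earliest[key]:
                earliest[key] = d
    keep.update(earliest.values())
    return sorted(keep)


def backup_name(d: date) -> str:
    """File name in the pattern shown above."""
    return d.strftime("%Y%m%d") + ".esmf-cvsroot.tar.gz"
```

Running the selection each January with the new current year thins out exactly one more year, which matches the yearly cleanup described above.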


4.4 Testing and Validation

ESMF software is subject to the following tests:

  1. Unit tests, which are simple per-class tests.
  2. Test Harness, which runs parameter-space-spanning tests similar to the unit tests.
  3. System tests, which generally involve inter-component interactions.
  4. Use test cases (UTCs), which are tests at realistic problem sizes (e.g., large data sets, processor counts, grids).
  5. Examples that range from simple to complex.
  6. Beta testing through preliminary releases.

Unit tests, system tests, and examples are distributed with the ESMF software. UTCs, because of their size, are stored and distributed separately. Tests are run nightly, following a weekly schedule, on a wide variety of platforms. Beta testing of ESMF software is done by providing an Internal Release to customers three months before public release.

The ESMF team keeps track of test coverage on a per-method basis. This information is on the Metrics page under the Development link on the navigation bar.

Testing information is stored on a Test and Validation web page, under the Development link on the ESMF web site. This web page includes:

The ESMF is designed to run on several target platforms, in different configurations, and is required to interoperate with many combinations of application software. Thus our test strategy includes the following.

4.4.1 Unit Tests

Each class in the framework is associated with a suite of unit tests. Typically the unit tests are stored in one file per class, and are located near the corresponding source code in a test directory. The framework make system will have an option to build and run unit tests. The user has the option of building either a "sanity check" type test or an exhaustive suite. The exhaustive tests include tests of many functionalities and a variety of valid and invalid input values. The sanity check tests are a minimum set of tests to indicate whether, for example, the software has been installed correctly. It is the responsibility of the software developer to write and execute the unit tests. Unit tests are distributed with the framework software.

To achieve adequate unit testing, developers shall attempt to meet the following goals.

4.4.1.1 Writing Unit Tests

Unit tests usually test a single argument of a method to make it easier to identify the bug when a unit test fails. There are several steps to writing a unit test. First, each unit test must be labeled with one of the following tags:

Note that when the NEX_UTest_Multi_Proc_Only or EX_UTest_Multi_Proc_Only tags are used, all the unit tests in the file must be labeled as such. You may not mix these tags with the other tags. In addition, verify that the makefile does not allow the unit tests with these tags to be run on a single processor.

Second, a string is specified describing the test, for example:

	write(name, *) "Grid Destroy Test"

Third, a string to be printed when the test fails is specified, for example:

	write(failMsg, *) "Did not return ESMF_SUCCESS"

Fourth, the ESMF_Test subroutine is called to determine the test results, for example:

	call ESMF_Test((rc.eq.ESMF_SUCCESS), name, failMsg, result, ESMF_SRCLINE)

The following two tests are good examples of how unit tests should be written. The first test verifies that getting the attribute count from a Field returns ESMF_SUCCESS, while the second verifies that the attribute count is correct. These two tests could be combined into one with a logical AND statement when calling ESMF_Test, but breaking the tests up allows you to identify the source of the bug immediately.

      !------------------------------------------------------------------------
      !EX_UTest
      ! Getting Attribute count from a Field
      call ESMF_FieldGetAttributeCount(f1, count, rc=rc)
      write(failMsg, *) "Did not return ESMF_SUCCESS"
      write(name, *) "Getting Attribute count from a Field "
      call ESMF_Test((rc.eq.ESMF_SUCCESS), name, failMsg, result, ESMF_SRCLINE)

      !------------------------------------------------------------------------
      !EX_UTest
      ! Verify Attribute Count Test
      write(failMsg, *) "Incorrect count"
      write(name, *) "Verify Attribute count from a Field "
      call ESMF_Test((count.eq.0), name, failMsg, result, ESMF_SRCLINE)

      !------------------------------------------------------------------------

Sometimes a unit test is written expecting a subset of the processors to fail the test. To handle this case, the unit test must verify results from each processor as in the unit test below:

    !------------------------------------------------------------------------
    !EX_UTest
    ! Verify that the rc is correct on all pets.
    write(failMsg, *) "Did not return FAILURE  on PET 1, SUCCESS otherwise"
    write(name, *) "Verify rc of a Gridded Component Test"
    if (localPet==1) then
      call ESMF_Test((rc.eq.ESMF_FAILURE), name, failMsg, result, ESMF_SRCLINE)
    else
      call ESMF_Test((rc.eq.ESMF_SUCCESS), name, failMsg, result, ESMF_SRCLINE)
    endif

    !------------------------------------------------------------------------

Some tests may require that a loop be written to verify multiple results. The following is an example of how a single tag, NEX_UTest, is used instead of a tag for each loop iteration.

 !-----------------------------------------------------------------------------
  !NEX_UTest
  write(name, *) "Verifying data in Array via Fortran array pointer access"
  write(failMsg, *) "Incorrect data detected"
  looptest = .true.
  do i = -12, -6
    j = i + 12 + lbound(fptr, 1)
    print *, fptr(j), fdata(i)
    if (fptr(j) /= fdata(i)) looptest = .false.
  enddo
  call ESMF_Test(looptest, name, failMsg, result, ESMF_SRCLINE)
  !-----------------------------------------------------------------------------

4.4.1.2 Analyzing unit test results

When unit tests are run, a Perl script prints out the test results as shown in Section "Running ESMF Unit Tests" in the ESMF User's Guide. To print out the test results, the Perl script must determine the number of unit tests in each test file and the number of processors executing the unit test. It determines the number of tests by counting the EX_UTest, NEX_UTest, EX_UTest_Multi_Proc_Only, or NEX_UTest_Multi_Proc_Only tags in the test source file, whichever is appropriate for the test being run. To determine the number of processors, it counts the number of "NUMBER_OF_PROCESSORS" strings in the unit test output Log file. The script then counts the number of PASS and FAIL strings in the test Log file. The Perl script first divides the number of PASS strings by the number of processors. If the quotient is not a whole number, the script concludes that the test crashed. If the quotient is a whole number, the script then divides the number of FAIL strings by the number of processors. The sum of the two quotients must equal the total number of tests; if it does not, the test is marked as crashed.
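The counting logic described above can be sketched in Python as follows. This is not the project's Perl script. The function names are hypothetical, the Log file is treated as plain text in which PASS and FAIL occur only in result lines, and a FAIL count that does not divide evenly is treated as a crash because the two quotients could not then sum to the whole number of tests.

```python
import re


def count_tests(source: str, tag: str) -> int:
    """Count occurrences of exactly `tag` in a test source file.

    The lookarounds keep EX_UTest from matching inside NEX_UTest or
    EX_UTest_Multi_Proc_Only.
    """
    pattern = rf"(?<![A-Za-z_]){re.escape(tag)}(?![A-Za-z_])"
    return len(re.findall(pattern, source))


def analyze(source: str, log: str, tag: str) -> dict:
    """Apply the PASS/FAIL bookkeeping to one test file and its Log."""
    n_tests = count_tests(source, tag)
    n_procs = log.count("NUMBER_OF_PROCESSORS")
    crashed = {"tests": n_tests, "passed": 0, "failed": 0, "crashed": True}
    if n_procs == 0:
        return crashed
    passed, pass_rem = divmod(log.count("PASS"), n_procs)
    if pass_rem != 0:  # quotient is not a whole number
        return crashed
    failed, fail_rem = divmod(log.count("FAIL"), n_procs)
    if fail_rem != 0 or passed + failed != n_tests:
        return crashed
    return {"tests": n_tests, "passed": passed, "failed": failed,
            "crashed": False}
```

For a file with two EX_UTest tags run on four processors, a Log with four PASS and four FAIL strings yields one passed and one failed test, while a Log with only three PASS strings is reported as crashed.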

4.4.1.3 Disabling unit tests

Sometimes in the software development process it becomes necessary to disable one or more unit tests. To disable a unit test, two lines need to be modified. First, the line calling "ESMF_Test" must be commented out. Second, the NEX_UTest, EX_UTest, NEX_UTest_Multi_Proc_Only and EX_UTest_Multi_Proc_Only tags must be modified so that they are not found by the Perl script that analyzes the test results. The recommended way to modify these tags is to replace the first underscore with "_disable_", thus NEX_UTest becomes NEX_disable_UTest.
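The recommended tag rewrite is a single string substitution, shown here as a small Python sketch. The function name is hypothetical, and the sketch covers only the tag, not commenting out the ESMF_Test call.

```python
def disable_tag(tag: str) -> str:
    """Replace the first underscore in a test tag with "_disable_"."""
    return tag.replace("_", "_disable_", 1)


# A disabled tag no longer contains the original tag text, so a script
# that searches for the original tag will not count it.
for tag in ("NEX_UTest", "EX_UTest_Multi_Proc_Only"):
    print(tag, "->", disable_tag(tag))
```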

4.4.1.4 Benchmarking Unit Tests

Benchmark testing is included in the ESMF regression tests to detect any unexpected change in the performance of the software. This capability is available to developers. Developers can run the unit tests and save their execution times to be used as a benchmark for future unit test runs.

The following section now appears in the output of "gmake info".

 
--------------------------------------------------------------
 * ESMF Benchmark directory and parameters *
ESMF_BENCHMARK_PREFIX:    ./DEFAULTBENCHMARKDIR
ESMF_BENCHMARK_TOLERANCE: 3%
ESMF_BENCHMARK_THRESHOLD_MSEC: 500
 
--------------------------------------------------------------

The steps for using the benchmarking test tool are as follows:

According to the default settings above, the benchmarking test will only analyze unit tests that run 500 msecs (ESMF_BENCHMARK_THRESHOLD_MSEC) or longer. If a unit test runs 3 percent (ESMF_BENCHMARK_TOLERANCE) or more beyond the benchmarked unit test, it will be flagged as failing the benchmark test. The developer may change these parameters as desired. The following is an example of the output of running "gmake run_unit_tests_benchmark":

 



The following unit tests with a threshold of 500 msecs. passed the 3% 
 tolerance benchmark test:

PASS: src/Infrastructure/DELayout/tests/ESMF_DELayoutWorkQueueUTest.F90
PASS: src/Infrastructure/Field/tests/ESMF_FieldCreateGetUTest.F90
PASS: src/Infrastructure/Field/tests/ESMF_FieldRegridCsrvUTest.F90
PASS: src/Infrastructure/Field/tests/ESMF_FieldRegridXGUTest.F90
PASS: src/Infrastructure/Field/tests/ESMF_FieldStressUTest.F90
PASS: src/Infrastructure/TimeMgr/tests/ESMF_CalRangeUTest.F90
PASS: src/Infrastructure/VM/tests/ESMF_VMBarrierUTest.F90
PASS: src/Infrastructure/VM/tests/ESMF_VMUTest.F90
PASS: src/Infrastructure/XGrid/tests/ESMF_XGridMaskingUTest.F90
PASS: src/Infrastructure/XGrid/tests/ESMF_XGridUTest.F90
PASS: src/Superstructure/Component/tests/ESMF_CompTunnelUTest.F90


The following unit tests with a threshold of 500 msecs. failed the 3% 
 tolerance benchmark test:

FAIL: src/Infrastructure/Field/tests/ESMF_FieldRegridUTest.F90
      Test elapsed time: 4331.446 msec.
      Benchmark elapsed time: 2958.47675 msec.
      Increase: 46.41%

FAIL: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleRegridUTest.F90
      Test elapsed time: 2051.05675 msec.
      Benchmark elapsed time: 1920.42125 msec.
      Increase: 6.8%

FAIL: src/Infrastructure/LogErr/tests/ESMF_LogErrUTest.F90
      Test elapsed time: 2986.40425 msec.
      Benchmark elapsed time: 2583.36775 msec.
      Increase: 15.6%



Found 167 exhaustive multi-processor unit tests files, of those with a 
 threshold of 500 msecs. 11 passed the 3% tolerance benchmark test, and 3 failed.

Benchmark install date: Thu Jun  4 13:26:55 MDT 2015

Note that only the unit tests that have an elapsed time of 500 msecs. or greater are listed. In addition, the date when the benchmark install was completed is displayed.

When a unit test run is benchmarked, the results are written to a directory such as "BENCHMARKDIR/test/testg/Darwin.gfortran.64.mpich2.default/". Therefore you can only compare unit test elapsed times between identical configurations.

To implement the benchmarking tool, the unit tests were modified to record the elapsed time of each PET. The stdout file of each unit test contains lines such as the following:

ESMF_GridItemUTest.stdout: PET 0 Test Elapsed Time 5.7840000000000007 msec.
ESMF_GridItemUTest.stdout: PET 1 Test Elapsed Time 5.7259999999999982 msec.
ESMF_GridItemUTest.stdout: PET 2 Test Elapsed Time 6.6200000000000010 msec.
ESMF_GridItemUTest.stdout: PET 3 Test Elapsed Time 5.7190000000000021 msec.

The benchmarking tool uses the average of the four elapsed times to determine the test results since the elapsed times of each PET can vary.
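The averaging and the threshold and tolerance checks can be sketched in Python as follows. This is an illustration under stated assumptions, not the benchmarking tool itself: the function names are hypothetical, the threshold is applied to the current test's elapsed time, and an increase of exactly the tolerance counts as a failure ("3 percent or more").

```python
import re
from statistics import mean


def mean_elapsed_msec(stdout_lines):
    """Average the per-PET 'Test Elapsed Time' values, in msec."""
    times = []
    for line in stdout_lines:
        m = re.search(r"PET\s+\d+\s+Test Elapsed Time\s+([0-9.]+)\s+msec", line)
        if m:
            times.append(float(m.group(1)))
    return mean(times)  # raises StatisticsError if no timing line is found


def benchmark_status(test_msec, bench_msec,
                     tolerance_pct=3.0, threshold_msec=500.0):
    """Return (status, percent increase) for one unit test.

    status is "SKIP" below the threshold, otherwise "PASS" or "FAIL".
    """
    if test_msec < threshold_msec:
        return "SKIP", None
    increase = (test_msec - bench_msec) / bench_msec * 100.0
    status = "FAIL" if increase >= tolerance_pct else "PASS"
    return status, increase
```

With the default settings, the ESMF_FieldRegridUTest.F90 figures from the sample output (4331.446 msec against a benchmark of 2958.47675 msec) give an increase of about 46.41 percent and a FAIL status, while the ESMF_GridItemUTest timings above average to roughly 5.96 msec and fall below the 500 msec threshold.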

4.4.2 Examples

The examples are written to help users understand a specific use of an ESMF capability. The examples appear as text in the ESMF Reference Manual; therefore care must be taken to ensure that the correct portions of the examples appear in the document. LaTeX tags have been created to designate which portions of the examples are visible in the document.

The BOE and EOE tags enclose text describing the example. The BOC and EOC tags enclose actual working code that appears in the Reference Manual. Below is an example of how the tags are used:

!-------------------------------- Example -----------------------------
!>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%
!BOE
!\subsubsection{Get Grid and Array and other information from a Field}
!\label{sec:field:usage:field_get_default}
!
!  A user can get the internal {\tt ESMF\_Grid} and {\tt ESMF\_Array}
!  from a {\tt ESMF\_Field}.  Note that the user should not issue any destroy command
!  on the retrieved grid or array object since they are referenced
!  from within the {\tt ESMF\_Field}. The retrieved objects should be used
!  in a read-only fashion to query additional information not directly
!  available through the {\tt ESMF\_FieldGet()} interface.
!
!EOE

!BOC
    call ESMF_FieldGet(field, grid=grid, array=array, &
        typekind=typekind, dimCount=dimCount, staggerloc=staggerloc, &
        gridToFieldMap=gridToFieldMap, &
        ungriddedLBound=ungriddedLBound, ungriddedUBound=ungriddedUBound, &
        totalLWidth=totalLWidth, totalUWidth=totalUWidth, &
        name=name, &
        rc=rc)
!EOC
    if(rc .ne. ESMF_SUCCESS) finalrc = ESMF_FAILURE
    print *, "Field Get Grid and Array example returned"

    call ESMF_FieldDestroy(field, rc=rc)
    if(rc .ne. ESMF_SUCCESS) finalrc = ESMF_FAILURE
!>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%>%

Note that any code or text that is not contained within the tag pairs does not appear in the Reference Manual.

Most examples can be run on multiple processors or a single processor. Those examples should have the tag, "ESMF_EXAMPLE" as a comment in the body of the example file. If the example can only run on multiple processors then use the tag, "ESMF_MULTI_PROC_EXAMPLE".

4.4.2.1 Disabling examples

When an example is removed from the makefile, the "ESMF_EXAMPLE" or "ESMF_MULTI_PROC_EXAMPLE" tags must be modified so that the example is not flagged as failed. The recommended way to modify these tags is to replace the first underscore with "_disable_", thus "ESMF_EXAMPLE" becomes "ESMF_disable_EXAMPLE".

4.4.3 System Tests

System tests are written to test functionality that spans several classes. The following areas should be addressed in system testing.

The system tester should issue a test log after each software release is tested, which is recorded on the Test and Validation web page. The test log shall include: a test ID number, a software release ID number, testing environment descriptions, a list of test cases executed, results, and any unexpected events.

4.4.3.1 Writing System Tests

System tests should contain the following sections:

Most system tests can be run on multiple processors or a single processor. Those system tests should have the tag, "ESMF_SYSTEM_TEST" as a comment in the body of the system test. If the system test can only run on multiple processors then use the tag, "ESMF_MULTI_PROC_SYSTEM_TEST".

At the end of the system test, it is recommended that the ESMF_TestGlobal subroutine be used to gather test results from all processors and print out a single PASS/FAIL message instead of individual PASS/FAIL messages from all the processors. After the test is written, it must be documented on the ESMF Test & Validation web page:

http://www.earthsystemmodeling.org/developers/test/system/

4.4.3.2 Disabling system tests

When a system test is removed from the makefile, the "ESMF_SYSTEM_TEST" or "ESMF_MULTI_PROC_SYSTEM_TEST" tags must be modified so that the system test is not counted as failed. The recommended way to modify these tags is to replace the first underscore with "_disable_", thus ESMF_SYSTEM_TEST becomes ESMF_disable_SYSTEM_TEST.

4.4.4 Test Harness

The Test Harness is a highly configurable test control system for conducting thorough testing of the Regridding and Redistribution processes. The Test Harness consists of a single shared executable and a collection of customizable resource files that define an ensemble of test configurations tailored to each ESMF class. The Test Harness is integrated into the Unit test framework, enabling the Test Harness to be built and run as part of the Unit tests. The test results are reported to a single standard-out file which is located with the unit test results.

See section [*] for a complete discussion of the test harness.

4.4.4.1 Analyzing Test Harness results

When the Test Harness completes a run, the results from the ensemble of tests are reported in two ways. The first is analogous to the unit test reporting: since the test harness is run as part of the unit tests, a summary of the results is recorded just as with the unit tests. In addition to the standard unit test reporting, the test harness is also able to produce a human-readable report. The report consists of a concise summary of the test configuration along with the test results. The test configuration is described in terms of the Field Taxonomy syntax and user-provided strings. The intent is not to provide an exhaustive description of the test, but rather to provide a useful description of the failed tests.

Consider another example similar to the previous one, in which two descriptor strings describe an ensemble of regridding tests. The first uses the patch method and the second uses bilinear interpolation.

[ B1 G1; B2 G2 ] =P=> [ B1 G1; B2 G2 ] 
[ B1 G1; B2 G2 ] =B=> [ B1 G1; B2 G2 ]

Suppose the associated specifier files indicate that the source grid is rectilinear and is 100 X 50 in size. The destination grid is also rectilinear and is 80 X 20 in size. Both grids are block distributed in two ways, 1 X NPETS and NPETS X 1. And suppose that the first dimension of both the source and destination grids are periodic. If the test succeeds for the bilinear regridding, but fails for one of the patch regridding configurations, the reported results could look something like

SUCCESS: [B1 G1; B2 G2 ] =B=> [B1 G1; B2 G2 ] 
FAILURE: [B1{1} G1{100}+P; B2{npets} G2{50} ] =P=> [B1{1} G1{80}+P; B2{npets} G2{20} ] 
     failure at line 101 of test.F90
SUCCESS: [ B1{npets} G1{100} +P; B2{1} G2{50} ] =P=> [ B1{npets} G1{80}+P; B2{1} G2{20} ]

The report indicates that all the test configurations for the bilinear regridding are successful. This is indicated by the key word SUCCESS which is followed by the successful problem descriptor string. Since all of the tests in the first case pass, there is no need to include any of the specifier information. For the second ensemble of tests, one configuration passed, while the other failed. In this case, since there is a mixture of successes and failures, the report includes specifier information for all configurations to help indicate the source of the test failure. The supplemental information, while not a complete problem description since it lacks items such as the physical coordinates of the grid and the nature of the test field, includes information crucial to isolating the failed test.

4.4.5 Use Test Cases (UTCs)

Use Test Cases are problems of realistic size created to test the ESMF software. They were initiated when the ESMF team and its users saw that often ESMF capabilities could pass simple system tests but would fail out in the field, for real customer problems. UTCs have realistic processor counts, data set sizes, and grid and data array sizes. UTCs are listed on the Test & Validation page of the ESMF website. They are not distributed with the ESMF software; instead they are stored in a separate module in the main repository called use_test_cases.

4.4.6 Beta Testing

ESMF software is released in a beta form, as an Internal Release, three months before it is publicly released. This gives users a chance to test the software and report back any problems to support.

4.4.7 Automated Regression Tests

The purpose of regression testing is to reveal faults caused by new or modified software (e.g. side effects, incompatibility between releases, and bad bug fixes). Regression tests regularly exercise all interfaces of the code on all target platforms. The regression test results for the last two weeks are posted on the regression test results web page. This web page provides a complete, color-coded, current view of the state of the trunk ESMF software; options for sorting by platform or compiler are provided. A similar test results web page for the branch is also available. Clicking on any of the cells displays the specific test report for that day. Hovering over a test name, e.g. Blues gfortran, reveals notes particular to that platform/compiler. Clicking on the test name takes you to the home page of the platform.

The platforms that run the regression tests email their results to a server that updates the test results web page. A script checks for new test reports every 15 minutes and updates the web page. The time of the last update appears on the web page.

4.4.8 Investigating Test Failures

The regression test results web page provides a color-coded view of the state of the software. When a developer finds that a test fails on a particular platform with a particular compiler, the bug is sometimes readily identified and fixed. At other times, however, the developer may want to know whether the test fails on other platforms and whether the failure is related to a compiler, an MPI configuration, or optimized/debug execution. Using the web page alone, the developer would need to click through the cells of every platform, searching for the results of that particular test.

A tool was created to allow developers to query the results of a specific test for a specific date, as long as the date is within two weeks of the current date. The developer sends a query message to the following email address: esmftest@cgd.ucar.edu. The subject of the email must be exactly "Test_Results_Query". The body of the email message must be "Test:" followed by the test name and "Date" followed by the desired date. The date must be written as a three-letter month followed by the day number. If the day has two digits, insert one space between the month and the day, e.g. Apr 25. If the day is a single digit, insert two spaces between the month and the day, e.g. Apr  4.

Test:ESMF_FieldBundleSMMUTest.F90
Date:Feb  8
   or
Date Feb 28
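The formatting rules above are easy to get wrong by hand, so the following is a minimal Python sketch of how a query message could be assembled. It assumes the "Date:" form with a colon, as in the first example above; the function names and the sender address are hypothetical, and the message is only built, not sent.

```python
from datetime import date
from email.message import EmailMessage

QUERY_ADDRESS = "esmftest@cgd.ucar.edu"  # query address given in the text
QUERY_SUBJECT = "Test_Results_Query"     # the subject must match exactly

# Fixed English abbreviations, so the result does not depend on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_query_date(day: date) -> str:
    """Three-letter month, one space, then the day right-aligned in two columns.

    A single-digit day therefore ends up preceded by two spaces in total.
    """
    return f"{MONTHS[day.month - 1]} {day.day:>2}"


def build_query(test_name: str, day: date, sender: str) -> EmailMessage:
    """Assemble (but do not send) a test results query message.

    Assumption: the body uses "Date:" with a colon, like the first example.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = QUERY_ADDRESS
    msg["Subject"] = QUERY_SUBJECT
    msg.set_content(f"Test:{test_name}\nDate:{format_query_date(day)}\n")
    return msg


if __name__ == "__main__":
    # The sender address is a placeholder; the year is ignored by the format.
    query = build_query("ESMF_FieldBundleSMMUTest.F90",
                        date(2024, 2, 8), "developer@example.org")
    print(query)
```

The month names are listed explicitly rather than taken from the system locale, since the query format expects English abbreviations.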

This mailbox is checked every quarter hour, on the quarter hour, and the results are emailed to esmf_test@cgd.ucar.edu. The subject of the results email for this example would be:

        ESMF_FieldBundleSMMUTest.F90 test results for Feb  8

The body of the email would be as follows:

	ESMF_Blues_PGI:PASS: mvapich2/g: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleSMMUTest.F90
	ESMF_Blues_PGI:PASS: mvapich2/O: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleSMMUTest.F90
	ESMF_Blues_PGI:CRASHED: mpich3/g: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleSMMUTest.F90
	ESMF_Blues_PGI:PASS: mpich3/O: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleSMMUTest.F90
	ESMF_Blues_PGI:PASS: openmpi/g: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleSMMUTest.F90
	ESMF_Blues_PGI:PASS: openmpi/O: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleSMMUTest.F90
	ESMF_Discover_g95:PASS: mvapich2/g: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleSMMUTest.F90
	ESMF_Discover_g95:PASS: mvapich2/O: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleSMMUTest.F90
	ESMF_Haumea_g95:PASS: mpich2/g: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleSMMUTest.F90
	ESMF_Haumea_g95:PASS: mpich2/O: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleSMMUTest.F90
	ESMF_Haumea_g95:PASS: mvapich2/g: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleSMMUTest.F90
	ESMF_Haumea_g95:PASS: mvapich2/O: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleSMMUTest.F90
	ESMF_Pluto_g95:FAIL: mpich2/g: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleSMMUTest.F90
	ESMF_Pluto_g95:FAIL: mpich2/O: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleSMMUTest.F90
	ESMF_Pluto_g95:FAIL: mvapich2/g: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleSMMUTest.F90
	ESMF_Pluto_g95:FAIL: mvapich2/O: src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleSMMUTest.F90

Note that if the query date is the current day, the developer should repeat the query periodically during the day, since the test results are updated as platforms report them. A test may be reported as crashed because another test hung and the test in question never ran.

This tool is also useful when a developer adds a new test: after the nightly tests run, the developer can run a query to quickly see the results.
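To see at a glance whether a failure follows a platform, an MPI configuration, or the debug/optimized setting, the body of a results email can be grouped programmatically. The sketch below is one possible approach in Python, not part of the ESMF tooling. It assumes each line has the form build:STATUS: mpi/bopt: path, as in the example above, and that the g and O suffixes denote debug and optimized builds; the type and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Result(NamedTuple):
    build: str   # e.g. "ESMF_Blues_PGI"
    status: str  # "PASS", "FAIL", or "CRASHED"
    mpi: str     # e.g. "mpich3"
    bopt: str    # "g" or "O" (assumed to mean debug / optimized)
    path: str    # source path of the test


def parse_results(body: str) -> List[Result]:
    """Parse lines of the assumed form 'build:STATUS: mpi/bopt: path'."""
    results = []
    for line in body.splitlines():
        parts = [p.strip() for p in line.split(":", 3)]
        if len(parts) != 4 or "/" not in parts[2]:
            continue  # skip blank or unrecognized lines
        mpi, bopt = parts[2].split("/", 1)
        results.append(Result(parts[0], parts[1], mpi, bopt, parts[3]))
    return results


def failures_by_build(results: List[Result]) -> Dict[str, List[str]]:
    """Group every non-PASS result by build name."""
    grouped = defaultdict(list)
    for r in results:
        if r.status != "PASS":
            grouped[r.build].append(f"{r.mpi}/{r.bopt} {r.status}")
    return dict(grouped)


if __name__ == "__main__":
    # Three lines taken from the example results email above.
    test_path = "src/Infrastructure/FieldBundle/tests/ESMF_FieldBundleSMMUTest.F90"
    body = "\n".join([
        f"\tESMF_Blues_PGI:PASS: mvapich2/g: {test_path}",
        f"\tESMF_Blues_PGI:CRASHED: mpich3/g: {test_path}",
        f"\tESMF_Pluto_g95:FAIL: mpich2/O: {test_path}",
    ])
    for build, items in failures_by_build(parse_results(body)).items():
        print(build, items)
```

In the example email, grouping this way would show that the Pluto g95 build fails under every MPI and build option, while the Blues PGI build crashes only for mpich3/g.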

4.4.9 Building the Documentation

As software development progresses, the documentation is updated, built and posted at http://earthsystemmodeling.org/docs/nightly/develop/dev_guide/

The documents are built daily in the early morning, and the results of the builds are posted at http://earthsystemmodeling.org/doc/

Developers can update these documents by checking them out from the repository and submitting the edited files. To have the new version of a document posted on the web, the developer must send a request to the following email address: esmftest@cgd.ucar.edu. The subject of the email indicates which document to build and post. The following is the list of subjects that have been implemented:

4.4.10 Testing for Releases

We provide two types of tar files, the ESMF source and the shared libraries of the supported platforms. Consequently, there are two test procedures followed before placing the tar files on the ESMF download website.

The Source Code Test Procedure is followed on all the supported platforms for the particular release.

  1. Verify that the source code builds in both BOPT=g and BOPT=O.
  2. Verify that the ESMF_COUPLED_FLOW demonstration executes successfully.
  3. Verify that the unit tests run successfully, and that there are no NON-EXHAUSTIVE unit test failures.
  4. Verify that all system tests run successfully.

The Shared Libraries Test Procedure is also followed on all supported platforms for a release.

  1. Change to the CoupledFlowEx directory and execute gmake. Verify that the demo runs successfully.
  2. Change to the CoupledFlowSrc directory and execute gmake then gmake run. Verify that the demo runs successfully.
  3. Change to the examples directory and execute gmake and gmake run. Verify that the example runs successfully.


4.5 User Support

4.5.1 Roles

The Advocate is the staff person assigned to a particular code, e.g. GEOS-5. The Handler is the staff person assigned to solve a support ticket. The Advocate and the Handler may be the same person or different people. See section 2.1.1 for complete definitions and lists of responsibilities for both roles.

4.5.2 Support Categories

New is a request that has not been replied to. Closed is a request that has been fixed to the user's satisfaction. Pending is a request that has been fixed to the Handler's satisfaction but has not yet been approved by the user.

4.5.3 Summary Work Flow

  1. Message received.
  2. The Integrator, or in the Integrator's absence the Support Lead, generates a GitHub issue.
  3. If the request contains more than one topic, the Integrator will open multiple tickets, one per topic. This can be done initially if it is obvious, or later if more research indicates it is necessary.

  4. Initial contact is made by:

  5. The Handler works to solve the ticket's issues. He or she will communicate periodically with the ticket's originator and will keep the rest of the Core team informed of the ticket's progress at the monthly ticket review meetings. Once the issue has been solved, the Handler marks the ticket pending.

  6. At this point, the Handler contacts the originator to gauge their satisfaction with the solution. If the originator is satisfied, the ticket may be closed, and the Support Lead moves the mail folder on the IMAP server from Open to Closed. If the originator does not respond, an attempt at contact will be made once a month for two months. If the originator still has not replied after this period, the pending ticket may be closed with a final notification to the originator.
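The follow-up rule for pending tickets in the last step can be summarized as a small decision function. This is only an illustrative Python sketch with hypothetical names, not a description of any tool the team uses. The case where the originator replies but is not satisfied is not covered by the text; returning the ticket to the Handler is an assumption.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    CLOSE = "close the ticket"
    FOLLOW_UP = "send a monthly follow-up"
    CLOSE_WITH_NOTICE = "close the ticket with a final notification"
    RESUME_WORK = "return the ticket to the Handler"  # assumption, not in the text


MAX_FOLLOW_UPS = 2  # "once a month for two months"


def next_pending_action(satisfied: Optional[bool], follow_ups_sent: int) -> Action:
    """Decide what to do with a pending ticket.

    satisfied is None while the originator has not replied.
    """
    if satisfied is True:
        return Action.CLOSE
    if satisfied is False:
        return Action.RESUME_WORK
    if follow_ups_sent < MAX_FOLLOW_UPS:
        return Action.FOLLOW_UP
    return Action.CLOSE_WITH_NOTICE


if __name__ == "__main__":
    # No reply yet: two monthly follow-ups, then closure with notification.
    for sent in range(3):
        print(sent, next_pending_action(None, sent).value)
```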

4.5.4 General Guidelines for Handling Tickets

4.5.5 esmf_support@ucar.edu Mail Archives

The Support Lead manages the archive of esmf_support@ucar.edu email traffic and is responsible for the creation of ticket folders, component folders, and the proper placement of mail messages. The archive is located on the main CISL IMAP server and can be accessed by any Core member. Contact the Support Lead if you wish your local mail client enabled to view the archive. The IMAP archive will have the following appearance:

The following rules apply to the above:


4.5.6 INFO:Code (subject) mail messages

Advocates need to share the information they have received from their codes with the rest of the Core team. This will be done by sending an email to esmf_support@ucar.edu with a subject line labeled INFO: Code e.g. INFO: CCSM, INFO: GEOS-5. These messages will be filed on the IMAP server (see above section) under the code referenced. All information about a code that is general and not related to a specific support request will be archived in this manner.

4.5.7 freeCRM

A client relationship management tool (freeCRM http://www.freecrm.com) is being used to archive codes, their affiliated contacts, degree of componentization, issues, and applicable funding information if known. The following is a list of roles and responsibilities associated with this software:

4.5.8 Annual Code Contact

Once a year, all codes in the freeCRM database will be contacted in order to gauge their development progress and to update our component metrics. This process will contain the following steps:

4.5.9 Dealing with Applications that use ESMF

More and more applications are being distributed with embedded ESMF interfaces. It may be difficult to determine whether a reported problem with one of these applications is related to an incorrect ESMF implementation, a true ESMF bug, or an issue within the parent model. The following are several definitions: The following are some guidelines for dealing with such applications that use ESMF:
