Skip to content

Ocean: Remove scratch allocates and calls to get_config#457

Merged
mark-petersen merged 36 commits intoMPAS-Dev:ocean/developfrom
mattdturner:ocean/remove_scratch_allocate
Aug 10, 2020
Merged

Ocean: Remove scratch allocates and calls to get_config#457
mark-petersen merged 36 commits intoMPAS-Dev:ocean/developfrom
mattdturner:ocean/remove_scratch_allocate

Conversation

@mattdturner
Copy link
Collaborator

@mattdturner mattdturner commented Feb 27, 2020

This PR replaces almost all calls to mpas_allocate_scratch_field in the ocean model with straight Fortran allocates. This is done as part of the MPAS framework rewrite, per @philipwjones .

This PR also replaces almost all calls to mpas_pool_get_config in subroutines with definitions in ocn_config via use ocn_config. This addresses #394 , and allows #406 to be closed without being merged.

There are still a few calls to mpas_allocate_scratch_field that cannot be replaced until more of the MPAS framework rewrite has been completed (namely removal of blocks).

This PR only changes the files in the shared subdirectory, not the mode_forward subdirectory.

@mattdturner
Copy link
Collaborator Author

Testing was done on Cori with the Intel compiler (ifort (IFORT) 19.0.3.199 20190206). Testing was done for both an optimized model and a debug model (with -O0 added to compilation flags). The Nightly test suite (with a few tests removed, such as the block tests which FAILed for the baseline) was run for both a baseline and the code in this PR:

Baseline - optimized

 ** Running case Global Ocean 240km - Init Test
      PASS
 ** Running case Global Ocean 240km - Performance Test
      PASS
 ** Running case Global Ocean 240km - Restart Test
      PASS
 ** Running case Global Ocean 240km - Analysis Test
      PASS
 ** Running case ZISO 20km - Smoke Test
      PASS
 ** Running case ZISO 20km - Smoke Test with frazil
      PASS
 ** Running case Baroclinic Channel 10km - Thread Test
      PASS
 ** Running case Baroclinic Channel 10km - Decomp Test
      PASS
 ** Running case Baroclinic Channel 10km - Restart Test
      PASS
 ** Running case sub-ice-shelf 2D - restart test
      PASS
TEST RUNTIMES:
11:31 Baroclinic_Channel_10km_-_Decomp_Test
11:10 Baroclinic_Channel_10km_-_Restart_Test
06:27 Baroclinic_Channel_10km_-_Thread_Test
16:35 Global_Ocean_240km_-_Analysis_Test
12:22 Global_Ocean_240km_-_Init_Test
05:23 Global_Ocean_240km_-_Performance_Test
16:18 Global_Ocean_240km_-_Restart_Test
13:33 ZISO_20km_-_Smoke_Test
05:18 ZISO_20km_-_Smoke_Test_with_frazil
24:44 sub-ice-shelf_2D_-_restart_test
Total runtime 123:21

Baseline - debug

 ** Running case Global Ocean 240km - Init Test
      PASS
 ** Running case Global Ocean 240km - Performance Test
      PASS
 ** Running case Global Ocean 240km - Restart Test
      PASS
 ** Running case Global Ocean 240km - Analysis Test
      PASS
 ** Running case ZISO 20km - Smoke Test
   ** FAIL (See case_outputs/ZISO_20km_-_Smoke_Test for more information)
 ** Running case ZISO 20km - Smoke Test with frazil
   ** FAIL (See case_outputs/ZISO_20km_-_Smoke_Test_with_frazil for more information)
 ** Running case Baroclinic Channel 10km - Thread Test
   ** FAIL (See case_outputs/Baroclinic_Channel_10km_-_Thread_Test for more information)
 ** Running case Baroclinic Channel 10km - Decomp Test
   ** FAIL (See case_outputs/Baroclinic_Channel_10km_-_Decomp_Test for more information)
 ** Running case Baroclinic Channel 10km - Restart Test
   ** FAIL (See case_outputs/Baroclinic_Channel_10km_-_Restart_Test for more information)
 ** Running case sub-ice-shelf 2D - restart test
   ** FAIL (See case_outputs/sub-ice-shelf_2D_-_restart_test for more information)
TEST RUNTIMES:
01:35 Baroclinic_Channel_10km_-_Decomp_Test
01:28 Baroclinic_Channel_10km_-_Restart_Test
01:40 Baroclinic_Channel_10km_-_Thread_Test
17:41 Global_Ocean_240km_-_Analysis_Test
13:00 Global_Ocean_240km_-_Init_Test
04:17 Global_Ocean_240km_-_Performance_Test
17:45 Global_Ocean_240km_-_Restart_Test
11:48 ZISO_20km_-_Smoke_Test
02:00 ZISO_20km_-_Smoke_Test_with_frazil
01:50 sub-ice-shelf_2D_-_restart_test
Total runtime 73:04

This PR - optimized

 ** Running case Global Ocean 240km - Init Test
      PASS
 ** Running case Global Ocean 240km - Performance Test
   ** FAIL (See case_outputs/Global_Ocean_240km_-_Performance_Test for more information)
 ** Running case Global Ocean 240km - Restart Test
   ** FAIL (See case_outputs/Global_Ocean_240km_-_Restart_Test for more information)
 ** Running case Global Ocean 240km - Analysis Test
   ** FAIL (See case_outputs/Global_Ocean_240km_-_Analysis_Test for more information)
 ** Running case ZISO 20km - Smoke Test
   ** FAIL (See case_outputs/ZISO_20km_-_Smoke_Test for more information)
 ** Running case ZISO 20km - Smoke Test with frazil
   ** FAIL (See case_outputs/ZISO_20km_-_Smoke_Test_with_frazil for more information)
 ** Running case Baroclinic Channel 10km - Thread Test
      PASS
 ** Running case Baroclinic Channel 10km - Decomp Test
      PASS
 ** Running case Baroclinic Channel 10km - Restart Test
      PASS
 ** Running case sub-ice-shelf 2D - restart test
   ** FAIL (See case_outputs/sub-ice-shelf_2D_-_restart_test for more information)
TEST RUNTIMES:
11:31 Baroclinic_Channel_10km_-_Decomp_Test
11:08 Baroclinic_Channel_10km_-_Restart_Test
06:31 Baroclinic_Channel_10km_-_Thread_Test
17:27 Global_Ocean_240km_-_Analysis_Test
12:08 Global_Ocean_240km_-_Init_Test
04:24 Global_Ocean_240km_-_Performance_Test
16:54 Global_Ocean_240km_-_Restart_Test
00:07 ZISO_20km_-_Smoke_Test
00:07 ZISO_20km_-_Smoke_Test_with_frazil
00:31 sub-ice-shelf_2D_-_restart_test
Total runtime 80:48

This PR - debug

 ** Running case Global Ocean 240km - Init Test
      PASS
 ** Running case Global Ocean 240km - Performance Test
      PASS
 ** Running case Global Ocean 240km - Restart Test
      PASS
 ** Running case Global Ocean 240km - Analysis Test
   ** FAIL (See case_outputs/Global_Ocean_240km_-_Analysis_Test for more information)
 ** Running case ZISO 20km - Smoke Test
   ** FAIL (See case_outputs/ZISO_20km_-_Smoke_Test for more information)
 ** Running case ZISO 20km - Smoke Test with frazil
   ** FAIL (See case_outputs/ZISO_20km_-_Smoke_Test_with_frazil for more information)
 ** Running case Baroclinic Channel 10km - Thread Test
   ** FAIL (See case_outputs/Baroclinic_Channel_10km_-_Thread_Test for more information)
 ** Running case Baroclinic Channel 10km - Decomp Test
   ** FAIL (See case_outputs/Baroclinic_Channel_10km_-_Decomp_Test for more information)
 ** Running case Baroclinic Channel 10km - Restart Test
   ** FAIL (See case_outputs/Baroclinic_Channel_10km_-_Restart_Test for more information)
 ** Running case sub-ice-shelf 2D - restart test
   ** FAIL (See case_outputs/sub-ice-shelf_2D_-_restart_test for more information)
TEST RUNTIMES:
01:37 Baroclinic_Channel_10km_-_Decomp_Test
01:36 Baroclinic_Channel_10km_-_Restart_Test
01:45 Baroclinic_Channel_10km_-_Thread_Test
19:58 Global_Ocean_240km_-_Analysis_Test
12:37 Global_Ocean_240km_-_Init_Test
04:26 Global_Ocean_240km_-_Performance_Test
18:23 Global_Ocean_240km_-_Restart_Test
00:20 ZISO_20km_-_Smoke_Test
00:19 ZISO_20km_-_Smoke_Test_with_frazil
00:08 sub-ice-shelf_2D_-_restart_test
Total runtime 61:09

@mattdturner
Copy link
Collaborator Author

The debug failures in both baseline and this PR for ZISO, Baroclinic Channel, and sub-ice-shelf all have the following error report:

forrtl: severe (408): fort: (7): Attempt to use pointer REDIKAPPA when it is not associated with a target

Image              PC                Routine            Line        Source
ocean_model        0000000003251CE6  Unknown               Unknown  Unknown
ocean_model        0000000001F47AB6  ocn_tendency_mp_o         869  mpas_ocn_tendency.F
ocean_model        0000000001BBB1BB  ocn_time_integrat        1504  mpas_ocn_time_integration_split.F
ocean_model        0000000001B84B98  ocn_time_integrat         110  mpas_ocn_time_integration.F
ocean_model        0000000001B846FD  ocn_forward_mode_         613  mpas_ocn_forward_mode.F
libiomp5.so        00002AAAADCDC9F3  __kmp_invoke_micr     Unknown  Unknown
libiomp5.so        00002AAAADC9D5B2  __kmp_fork_call       Unknown  Unknown
libiomp5.so        00002AAAADC5DD60  __kmpc_fork_call      Unknown  Unknown
ocean_model        0000000001B8310B  ocn_forward_mode_         611  mpas_ocn_forward_mode.F
ocean_model        0000000001B7AC73  ocn_core_mp_ocn_c         111  mpas_ocn_core.F
ocean_model        0000000000417C74  mpas_subdriver_mp         347  mpas_subdriver.F
ocean_model        0000000000412CA7  MAIN__                     16  mpas.F
ocean_model        0000000000412C12  Unknown               Unknown  Unknown
libc-2.26.so       00002AAAAE208F8A  __libc_start_main     Unknown  Unknown
ocean_model        0000000000412B2A  Unknown               Unknown  Unknown

The optimized failures for Global Ocean in this PR are a result of non-bfb results. However, the fact that the debug tests are bfb suggests that the non-bfb for the optimized tests are a result of additional optimization that can be done by the compiler as a result of the changes in this PR.

@mattdturner
Copy link
Collaborator Author

mattdturner commented Apr 1, 2020

I just rebased to resolve the merge conflicts (and fixed a bug in the new code)

@philipwjones
Copy link
Contributor

philipwjones commented Apr 1, 2020 via email

Copy link
Contributor

@philipwjones philipwjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed bit-for-bit in debug mode and apparently round-off level changes in optimized version. Using PGI on Summit in QU240 configuration. Also visual inspection and earlier testing by @mattdturner . Performance improvement in the noise - disappointing since it had a larger effect in tracer advection - but will aid further threading improvements later.

@mark-petersen
Copy link
Contributor

@mattdturner I'm rebasing, testing, and merging this week. Since this PR touches so many parts of the code, I plan to repackage this into three smaller PRs:

  • analysis_members
  • init_mode
  • remainder (may split further if there are problems)

I will leave this PR in tact, as the diffs will show the changes remaining against ocean/develop branch as I merge the smaller PRs.

Do you mind if I squash your commits on this PR, and then make separate commits for each of those items?

The code will remain identical, but it allows me to cherry-pick the new commits. There may just be a few lines to separate in Registry.xml, but otherwise it should be a clean separation. It's easier for me to do it, since I'm testing at the same time.

@mattdturner
Copy link
Collaborator Author

Do you mind if I squash your commits on this PR, and then make separate commits for each of those items?

That's not a problem. Let me know if there is anything I can do to help.

@mark-petersen mark-petersen force-pushed the ocean/remove_scratch_allocate branch from 83bbe1f to ab76eac Compare April 27, 2020 15:47
@mark-petersen mark-petersen self-assigned this May 19, 2020
mattdturner added a commit to mattdturner/MPAS-Model that referenced this pull request Jun 23, 2020
Merge the changes in PR#457 for mpas_ocn_diagnostics.F that remove
scratch allocates and the calls to mpas_pool_get_config.
mark-petersen added a commit that referenced this pull request Jul 14, 2020
…develop

Remove scratch allocates and config. Part 2: Init mode #553

Part 2 of #457.
mark-petersen added a commit that referenced this pull request Jul 14, 2020
… ocean/develop

Remove scratch allocates and config. Part 1: analysis members #539

Part 1 of #457.
@mattdturner mattdturner force-pushed the ocean/remove_scratch_allocate branch from ab76eac to 4de1b4d Compare July 15, 2020 00:29
@mark-petersen
Copy link
Contributor

mark-petersen commented Jul 15, 2020

@mattdturner thanks for the updates. I ran these through the MPAS-Ocean nightly regression suite on grizzly. For a straight test (no comparisons) I get:

  1. gnu debug: PASS on all tests
  2. gnu optimized: PASS on all tests
  3. intel 17 debug: PASS on all tests
  4. intel 17 optimized: PASS on all tests except Global_Ocean_240km_-_BGC_Ecosys_Test (dies during run, probably init mode creates a bad file)
  5. intel 19 debug: PASS on all tests
  6. intel 19 optimized: PASS on all tests

I then compared to commit 82eb734 on ocean/develop, before these changes (just before PR #539 and #553). I get:

  1. gnu optimized: PASS all comparisons, except Global_Ocean_240km_-_Analysis_Test, mismatch in variable tThreshMLD (update: this mismatch disappeared in latest testing. I'm not sure why.)
  2. intel 17 optimized: PASS all comparisons, except Global_Ocean_240km_-_BGC_Ecosys_Test fails to run
  3. intel 19 optimized: comparisons FAIL on most of the tests. (Update: these are all machine precision at 1e-13)

Since the comparisons vary by compiler, I'm guessing these are differences in optimizations in that intel 19 test. I'll compare debug next.

@mark-petersen
Copy link
Contributor

@mattdturner I've been doing more testing. First, I compared the repo between 82eb734 (before PR #539 and #553) to f0865d7 (current head, just after those two PRs). It passes the nightly regression suite and is bfb between before/after on all three compilers, optimized (gnu, intel17, intel19). So it looks like everything is OK with the current head of ocean/develop.

I then reran with the current head of this PR, 4de1b4d. On intel 19, the failed bfb comparison variables are all 1e-13, so that comparison is OK. I'll keep testing on those other couple of problems above.

@mark-petersen
Copy link
Contributor

With that last commit I can pass the Global_Ocean_240km_-_BGC_Ecosys_Test. The atmospheric pressure from init mode was not specified. Some compilers set it to zero, and some set it to random stuff. So I forced it to be zero.

@mark-petersen mark-petersen force-pushed the ocean/remove_scratch_allocate branch from 20ce297 to e0877d6 Compare August 4, 2020 23:10
@mark-petersen mark-petersen merged commit ceac09d into MPAS-Dev:ocean/develop Aug 10, 2020
jonbob added a commit to E3SM-Project/E3SM that referenced this pull request Aug 11, 2020
…(PR #3597)

Ocean: Reduce pointer retrievals in shared directory

This PR brings in a new mpas-source submodule with changes only to the ocean
core. These changes are intended to improve performance by reducing pointer
retrievals. This includes scratch allocates and config pointers. We are taking
a staged approach, so this PR only alters the src/core_ocean/shared directory.
See similar changes for analysis and init in previous PR, #3717. Here, we:
* replace almost all calls to mpas_allocate_scratch_field in the ocean
  model with straight Fortran allocates. This is done as part of the MPAS
  framework rewrite, per @philipwjones; and
* replace almost all calls to mpas_pool_get_config in subroutines with
  definitions in ocn_config via use ocn_config.
This PR is labelled as non-BFB, because there could be machine-precision changes
that accumulate. In MPAS-Ocean stand-alone testing, comparisons compiled in
intel debug, gnu debug, and gnu optimized are all BFB.  Comparisons with intel
optimized had small differences.

See MPAS-Dev/MPAS-Model#457 by @mattdturner

[non-BFB]
jonbob added a commit to E3SM-Project/E3SM that referenced this pull request Aug 12, 2020
Ocean: Reduce pointer retrievals in shared directory

This PR brings in a new mpas-source submodule with changes only to the ocean
core. These changes are intended to improve performance by reducing pointer
retrievals. This includes scratch allocates and config pointers. We are taking
a staged approach, so this PR only alters the src/core_ocean/shared directory.
See similar changes for analysis and init in previous PR, #3717. Here, we:
* replace almost all calls to mpas_allocate_scratch_field in the ocean
  model with straight Fortran allocates. This is done as part of the MPAS
  framework rewrite, per @philipwjones; and
* replace almost all calls to mpas_pool_get_config in subroutines with
  definitions in ocn_config via use ocn_config.
This PR is labelled as non-BFB, because there could be machine-precision changes
that accumulate. In MPAS-Ocean stand-alone testing, comparisons compiled in
intel debug, gnu debug, and gnu optimized are all BFB.  Comparisons with intel
optimized had small differences.

See MPAS-Dev/MPAS-Model#457 by @mattdturner

[non-BFB]
mattdturner added a commit that referenced this pull request Sep 1, 2020
…n/develop

This was removed in #447 but inadvertently got added back in in #457.

It should be the first step toward addressing E3SM-Project/E3SM#3797
mark-petersen added a commit to mark-petersen/MPAS-Model that referenced this pull request Jan 11, 2021
…rs' into ocean/develop

Remove scratch allocates and config. Part 1: analysis members MPAS-Dev#539

Part 1 of MPAS-Dev#457.
mark-petersen pushed a commit to mark-petersen/MPAS-Model that referenced this pull request Jan 11, 2021
…nto ocean/develop

This was removed in MPAS-Dev#447 but inadvertently got added back in in MPAS-Dev#457.

It should be the first step toward addressing E3SM-Project/E3SM#3797
@mattdturner mattdturner deleted the ocean/remove_scratch_allocate branch March 16, 2021 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants