
Conversation

@hyungyukang commented Jan 9, 2020

Adds a semi-implicit barotropic mode solver.

@hyungyukang changed the title from "PR for a semi implicit barotropic mode solver" to "A semi implicit barotropic mode solver codes" on Jan 9, 2020
@hyungyukang changed the title from "A semi implicit barotropic mode solver codes" to "Codes for a semi implicit barotropic mode solver" on Jan 9, 2020
@mark-petersen self-assigned this on Jan 9, 2020
@mark-petersen self-requested a review on January 9, 2020 15:30
@mark-petersen (Contributor)

@hyungyukang Thank you for your effort on this. Please also link or attach any design documents or testing results you already have. For example, you could link to the talk you gave at the last E3SM meeting.

@hyungyukang (Author)

> @hyungyukang Thank you for your effort on this. Please also link or attach any design documents or testing results you already have. For example, you could link to the talk you gave at the last E3SM meeting.

I'll do that soon. Thanks, @mark-petersen!

@hyungyukang (Author)

Uploading a slide file summarizing this work, slightly revised from my AGU talk last December.
Kang_A Semi-implicit Barotropic Mode Solver for the MPAS-Ocean.pdf

@mark-petersen (Contributor)

@hyungyukang I am able to work on this now. My apologies for the delay.

It looks like commit e89de64ad removed all the files in the repo, and 687f11ac3 added them back again. At the command line, git diff --stat and wc are handy for this:

git diff --stat 3979482ff e89de64ad | wc -l
1764

What were you trying to do with those two commits? We need to reset back to 3979482 and add just the changes you intended. One way to do that is to copy the files you want aside with unix cp or tar, then

git reset --hard 3979482ff

and copy them back into place, commit, and git push --force back to this branch.

@mark-petersen (Contributor) commented Jun 27, 2020

Rebased. For my own reference, I compiled lapack (https://github.com/Reference-LAPACK/lapack) on LANL IC as follows:

git clone git@github.com:Reference-LAPACK/lapack.git
cd lapack
mkdir -p build/lapack_gnu/lib
cd build
module load gcc/5.3.0 openmpi/1.10.5 cmake
cmake -DCMAKE_INSTALL_LIBDIR=/usr/projects/climate/mpeterse/repos/lapack/build/lapack_gnu/lib ..
cmake --build . -j --target install
ls lapack_gnu/lib
    cmake  libblas.a  liblapack.a  pkgconfig

Then to compile:

export LAPACK=/usr/projects/climate/mpeterse/repos/lapack/build/lapack_gnu
make $COMPILER CORE=ocean OPENMP=false DEBUG=true GEN_F90=true USE_LAPACK=true

Some test cases run using semi-implicit, but some result in a seg fault with an invalid memory reference. Looking into it now...

@mark-petersen (Contributor)

Fixed invalid memory reference. A flag name had changed. Not passing restart tests yet.

@mark-petersen (Contributor)

@hyungyukang, did you test the semi-implicit solver with restarts? Most of my tests work when config_do_restart = .false. but in restart tests with config_do_restart = .true., where the initial condition is read in from a restart file, it dies with

CRITICAL ERROR: Iteration number exceeds Max. #iteration: PROGRAM STOP

In this example, the full run goes 4 timesteps from time 0 to 8 hours (2hr/time step) and completes successfully. The restart run reads in the restart file at time step 2 (4 hours) and goes to 8 hours, and should be bfb identical to the full run.

Is there a variable that you need added to the restart file? Or some other condition that could be missing for a restart? Perhaps on restart init we have to populate a variable used for the initial guess for the implicit solver?

@hyungyukang (Author) commented Jun 29, 2020

@mark-petersen, thanks for reviewing the code. Actually, I had never tested the solver with config_do_restart = .true.; I just reproduced the same problem.
But when I run the model with config_time_integrator = 'split_explicit', the model also shows an error:
ERROR: Warning: Sea surface height is outside of acceptable physical range, i.e. abs(sum(h)-bottomDepth)>20m. iCell: 1, maxLevelCell(iCell): 55, bottomDepth(iCell): 4222.79159790825, sum(h): 0.00000000000000

I'm currently running these cases: QU240, EC60to80, and RRS30to10. Are these test cases working with config_do_restart = .true.? If not, can you please recommend some other test cases?

@mark-petersen (Contributor)

You need to run the code forward and write restart files, and then restart from there. I can send a test case. That SSH warning appears because the model is reading in all zeros, since no restart file is available.

@hyungyukang (Author)

> You need to run the code forward and write restart files, and then restart from there. I can send a test case. That SSH warning appears because the model is reading in all zeros, since no restart file is available.

@mark-petersen, got it. I'll run it today and let you know as soon as I find any problems.

@mark-petersen (Contributor)

@hyungyukang I just sent you a zip file of our nightly regression suite, with all the correct flags for your rebased PR. Unzip it, then

cd ocean/global_ocean/QU240/restart_test/

and point ocean_model to your executable in each subdirectory.

You can use run.py, or cd into full_run and then restart_run and run each of those separately. The run.py in the restart_test directory checks whether you get bit-for-bit matching solutions at the end of the full run and the restart run.

@hyungyukang (Author)

> @hyungyukang I just sent you a zip file of our nightly regression suite, with all the correct flags for your rebased PR. Unzip it, then
>
> cd ocean/global_ocean/QU240/restart_test/
>
> and point ocean_model to your executable in each subdirectory.
>
> You can use run.py, or cd into full_run and then restart_run and run each of those separately. The run.py in the restart_test directory checks whether you get bit-for-bit matching solutions at the end of the full run and the restart run.

@mark-petersen, just received it. Thanks!

@mark-petersen (Contributor)

I'll let you test it. When the restart run is not identical to the original run, the most likely cause is that you have a variable or setting in memory that is not saved for the restart. In that case, you need to add that variable to the restart stream in Registry.xml, i.e. to the list just below

1503         <stream name="restart"

Another possibility is that you are missing something in your init routine, like initializing variables for your implicit method. The original run (no restart) initializes normalVelocity to zero everywhere, so it can hide something that you might need to initialize on every simulation start, not just on restarts.
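On that second point, here is a minimal sketch of the idea, not the actual solver code; the routine and variable names (si_seed_initial_guess, sshGuess) are hypothetical placeholders. The point is to seed the iterative solver's first guess from the model state in a code path that both cold starts and restarts pass through:

! Hypothetical sketch: seed the barotropic solver's first guess from the
! current model state during init, so cold starts and restarts take the same path.
subroutine si_seed_initial_guess(nCells, ssh, sshGuess)
   implicit none
   integer, intent(in) :: nCells
   real (kind=8), dimension(nCells), intent(in)  :: ssh       ! SSH from state (zeros on a cold start, read from file on a restart)
   real (kind=8), dimension(nCells), intent(out) :: sshGuess  ! first guess handed to the iterative solver
   integer :: iCell

   ! On a cold start ssh is all zeros, which is why a missing initialization
   ! here can go unnoticed until the first restart run.
   do iCell = 1, nCells
      sshGuess(iCell) = ssh(iCell)
   end do
end subroutine si_seed_initial_guess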

@hyungyukang (Author)

@mark-petersen, I just fixed the bugs for the restart run. Can you please test it?

@hyungyukang (Author) commented Jul 1, 2020

@mark-petersen, I've made some changes to the SI code based on what I've tested in another branch.
One thing you have to change before running the model is namelist.ocean. I created a new section and changed some entries in Registry.xml.

Since the explicit-subcycling and semi-implicit schemes share these options (config_n_ts_iter, config_n_bcl_iter_beg, config_n_bcl_iter_mid, config_n_bcl_iter_end), I took them out of &split_explicit_ts and put them in &split_timestep_share, the new namelist section. Ideas on this would be welcome!

/
&split_timestep_share
    config_n_ts_iter = 2
    config_n_bcl_iter_beg = 1
    config_n_bcl_iter_mid = 2
    config_n_bcl_iter_end = 2
/
&split_explicit_ts
    config_btr_dt = '0000_00:00:15'
    config_n_btr_cor_iter = 2
    config_vel_correction = .true.
    config_btr_subcycle_loop_factor = 2
    config_btr_gam1_velWt1 = 0.5
    config_btr_gam2_SSHWt1 = 1.0
    config_btr_gam3_velWt2 = 1.0
    config_btr_solve_SSH2 = .false.
/
&semi_implicit_ts
    config_btr_si_preconditioner = 'ras'
    config_btr_si_tolerance = 1.0e-9
    config_n_btr_si_outer_iter = 2
/

And now the SI code passes the restart run test:

Beginning variable comparisons for all time levels of field 'temperature'. Note any time levels reported are 0-based.
    Pass thresholds are:
       L1: 0.00000000000000e+00
       L2: 0.00000000000000e+00
       L_Infinity: 0.00000000000000e+00
 ** PASS Comparison of temperature between full_run/output.nc and
    restart_run/output.nc
Beginning variable comparisons for all time levels of field 'salinity'. Note any time levels reported are 0-based.
    Pass thresholds are:
       L1: 0.00000000000000e+00
       L2: 0.00000000000000e+00
       L_Infinity: 0.00000000000000e+00
 ** PASS Comparison of salinity between full_run/output.nc and
    restart_run/output.nc
Beginning variable comparisons for all time levels of field 'layerThickness'. Note any time levels reported are 0-based.
    Pass thresholds are:
       L1: 0.00000000000000e+00
       L2: 0.00000000000000e+00
       L_Infinity: 0.00000000000000e+00
 ** PASS Comparison of layerThickness between full_run/output.nc and
    restart_run/output.nc
Beginning variable comparisons for all time levels of field 'normalVelocity'. Note any time levels reported are 0-based.
    Pass thresholds are:
       L1: 0.00000000000000e+00
       L2: 0.00000000000000e+00
       L_Infinity: 0.00000000000000e+00
 ** PASS Comparison of normalVelocity between full_run/output.nc and
    restart_run/output.nc

@mark-petersen (Contributor) commented Jul 1, 2020

@hyungyukang this is great progress. I just ran our nightly regression suite with gnu debug and optimized, but with openmp off. I compile with

make gfortran CORE=ocean OPENMP=false DEBUG=true

and get

./nightly_ocean_test_suite.py
 ** Running case Baroclinic Channel 10km - Default Test
      PASS
 ** Running case Global Ocean 240km - Init Test
      PASS
 ** Running case Global Ocean 240km - Performance Test
      PASS
 ** Running case Global Ocean 240km - Restart Test
      PASS
 ** Running case Global Ocean 240km - Thread Test
      PASS
 ** Running case Global Ocean 240km - Analysis Test
      PASS
 ** Running case Global Ocean 240km - BGC Ecosys Test
      PASS
 ** Running case ZISO 20km - Smoke Test
      PASS
 ** Running case ZISO 20km - Smoke Test with frazil
      PASS
 ** Running case Baroclinic Channel 10km - Thread Test
      PASS
 ** Running case Baroclinic Channel 10km - Decomp Test
   ** FAIL (See case_outputs/Baroclinic_Channel_10km_-_Decomp_Test for more information)
 ** Running case Baroclinic Channel 10km - Restart Test
      PASS
 ** Running case sub-ice-shelf 2D - restart test
   ** FAIL (See case_outputs/sub-ice-shelf_2D_-_restart_test for more information)
 ** Running case Periodic Planar 20km - LIGHT particle region reset test
      PASS
 ** Running case Periodic Planar 20km - LIGHT particle time reset test
      PASS

First, thank you for fixing the restart, it now works! That is great.
Let's not worry about the sub-ice-shelf test yet.
We do need to worry about the failure on the Baroclinic Channel 10km - Decomp Test. That means that if you run with 4 versus 8 cores (say), the two runs are not bfb with each other. There are two possible causes:

  1. There is a place where we need to run a calculation on more halo layers (nCells = nCellsArray( 2 ), nEdges = nEdgesArray( size(nEdgesArray) ), etc).
  2. A halo update needs to be added somewhere.

I will make a QU240 test that checks partitions as well and send it to you. I don't have time to debug it much this week, but you could try. The general method is: first confirm that you also see the non-bfb mismatch between 4 vs 8 cores (say) with semi-implicit, but a bfb match with split explicit.

Then in your semi-implicit code, set your nCells and nEdges variables so they are maxed out, i.e.
nCells = nCellsArray( size(nCellsArray) ), nEdges = nEdgesArray( size(nEdgesArray) ). Making them large is safer. Once we have a bfb match across partitions, we can make them smaller for efficiency. With everything maxed out you may get an error and need to reduce an index by one somewhere, if a loop references a bad cell at the very edge.

If that does not produce a bfb match, then look through and guess where you might need a halo update, and add it. Again, you can put in lots of extra halo updates and there is no harm done while debugging. Once you get a bfb match, we need to take out all the extra ones, because halo updates are communication and are very expensive.
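For reference, a minimal sketch of that debugging pattern, not the PR's actual solver code: the field name 'sshSubcycleNew' is only illustrative, and I am assuming the usual MPAS halo-exchange helper mpas_dmpar_field_halo_exch here.

! Debugging sketch inside the semi-implicit barotropic routine.
! 1. Loop over the widest halo while debugging, so no owned cell or edge
!    is updated from stale neighbor data.
nCells = nCellsArray( size(nCellsArray) )
nEdges = nEdgesArray( size(nEdgesArray) )

do iCell = 1, nCells
   ! ... barotropic update for cell iCell ...
end do

! 2. Exchange halos for any field written above and read again later.
!    Extra exchanges are harmless while debugging, but remove the unneeded
!    ones once the decomp test passes -- each one is MPI communication.
call mpas_dmpar_field_halo_exch(domain, 'sshSubcycleNew')

Once the 4-core and 8-core runs match bit for bit, the bounds can be tightened back (e.g. nCellsArray( 2 )) and the redundant exchanges removed, as described above.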

@hyungyukang (Author) commented Jul 1, 2020

Thanks @mark-petersen for the results. I'll wait for the partition test and look into the baroclinic channel test as well!

@mark-petersen (Contributor)

I just sent a tar file with a new global test case, comparing 4 vs 8 processors. To run it:

cd ocean/global_ocean/QU240/decomp_test/
./run.py
 * Running 4proc_run
  - Complete
 * Running 8proc_run
  - Complete
Beginning variable comparisons for all time levels of field 'temperature'. Note any time levels reported are 0-based.

and it shows a mismatch between the two. Run it with split explicit and it should match. Give it a try, and you can explore the halo strategies I described above.

mattdturner and others added 20 commits on January 13, 2021 at 09:39
Add a new preprocessor directive (USE_LAPACK) around the calls to LAPACK
routines (see the sketch after this commit list). Also add OpenMP parallel regions to reflect recent threading PRs.
Finally, remove calls to `mpas_ocn_get_config` and replace them with the `use ocn_config` module.
Update the log output when too many SI iterations occur. Previously,
not all messages would be written to the log, since MPAS_LOG_CRIT was
passed before all log messages had been written.
Modify time step size setting for SI code.
Update USE_LAPACK flag in Makefile to provide broader compatibility
In this mode, the Jacobi preconditioner is used.
Also, a simple single-precision allreduce is implemented to achieve a BFB match across partitions.
In a later version, this BFB allreduce could be replaced with a better algorithm.
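The USE_LAPACK guard mentioned in the commits above follows the usual preprocessor pattern; a minimal sketch (dgesv stands in for whichever LAPACK routines the solver actually calls, the error text is illustrative, and I am assuming the standard mpas_log_write interface):

#ifdef USE_LAPACK
      ! Solve the small dense system A x = b; on exit b holds the solution x.
      call dgesv(n, 1, A, n, ipiv, b, n, info)
      if (info /= 0) then
         call mpas_log_write('dgesv failed in the semi-implicit solver', MPAS_LOG_ERR)
      end if
#else
      call mpas_log_write('The semi-implicit solver requires compiling with USE_LAPACK=true', MPAS_LOG_CRIT)
#endif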
mark-petersen added a commit that referenced this pull request Jan 13, 2021
Codes for a semi implicit barotropic mode solver #422
@mark-petersen (Contributor)

Rebased. Testing...

@mark-petersen (Contributor)

Passes mpas-o nightly regression suite with gnu debug and both split explicit and semi-implicit. Note that for semi-implicit, compile on grizzly with

export LAPACK=/usr/projects/climate/mpeterse/repos/lapack/build/lapack_gnu
source /usr/projects/climate/SHARED_CLIMATE/anaconda_envs/load_latest_compass.sh
module load gcc/5.3.0 openmpi/1.10.5 netcdf/4.4.1 parallel-netcdf/1.5.0 pio/1.7.2
make gfortran CORE=ocean USE_LAPACK=true

@mark-petersen mark-petersen merged commit 3d13880 into MPAS-Dev:ocean/develop Jan 14, 2021
jonbob added a commit to E3SM-Project/E3SM that referenced this pull request Jan 14, 2021
Adds ocean semi-implicit barotropic mode solver

This brings in a new mpas-source submodule with changes only to the
ocean core. It adds a semi-implicit barotropic solver for ocean
time-stepping. Current default is split explicit and this new option is
off by default, so this PR should be BFB. But it does require changes to
the mpas-ocean Registry, so this PR also includes corresponding updates
to its e3sm bld scripts.

See MPAS-Dev/MPAS-Model#422

[NML]
[BFB]
jonbob added a commit to E3SM-Project/E3SM that referenced this pull request Jan 15, 2021
Adds ocean semi-implicit barotropic mode solver

mark-petersen added a commit to mark-petersen/MPAS-Model that referenced this pull request Mar 16, 2021
Codes for a semi implicit barotropic mode solver MPAS-Dev#422