-
Notifications
You must be signed in to change notification settings - Fork 389
Refactoring and GPU optimization of the ocean equation of state #69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring and GPU optimization of the ocean equation of state #69
Conversation
|
@philipwjones thanks for your work on this. I would like to see this test: Start with QU240, run one step, write out full density field. Run that test before/after these changes. Take a point-wise diff and report max and average difference. That would be very convincing, and you may have already done it. Also, could say a bit about running on GPUs? Can MPAS-Ocean currently run on GPUs? I thought not, because we need to add OpenACC directives for that to happen. If yes, which parts of the code runs on the GPUs? |
|
Will get those numbers today - most changes were actually b4b, but one of the loop changes caused a small difference from sqrt, so must have triggered a different version of the library call. Regarding GPUs, we have a couple of different versions of the transport that are GPU-enabled, but we are still evaluating which version to use/commit. I've gotten pretty far along on vmix, but that's not quite ready yet either. We have other bits here and there. We added an OPENACC build flag to turn the GPU portions on, but those Makefile changes were not part of this commit since no one else has access to Summit and these changes will not be effective on Titan due to the higher transfer overhead on that machine. |
|
On Cori after one timestep: |
|
@philipwjones, could you add a blank line for line 2 of your commit message? Also, limit first line to 50 characters to appear on a concise one-liner. If you use you will see that a lack of a blank line 2 makes the whole message appear. You can use a |
|
A few notes for new code additions: Doug wrote a nice tool to help us write uniform code. It is here: it produces a new file that notifies you of 'delint' clean-up items, that sometimes cause a problem on odd compilers. These can also be helpful for brand new code: though it is best to clean up just the new code you are submitting, and not include white space alterations to lots of other code in this PR. |
mark-petersen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested on wolf with gnu and intel, testing debug and optimized for both, using the MPAS-O nightly regression suite. Everything passed. Compared to head of ocean/develop, and max differences were order 1e-12, a few were 1e-9 after six time steps. This is acceptable.
|
For something non-bfb like this, we need to make an E3SM PR with this updated MPAS submodule and test it. I would say an EC60to30 G case for two years, compared to the current repo, would be fine. @jonbob, this would be a nice task for Kristin to work on. (As an aside, we also have new values for EC60to30 with dt=30 minutes and revised del2 & del4 from Mat that could be tested with that bunch). @philipwjones I am merging a bunch of bfb compass updates to ocean/develop this week. I'd like to merge this one after it is tested in E3SM. |
|
@mark-petersen - fixed the commit message. Also fixed an untyped constant that lint found and some trailing whitespace that lint also complained about. |
… routines. Changes include: - moving calculation of reference pressure to initialization - separated EOS calls to routines with and without expansion coefficients - merged some loops to reduce arrays to scalars - moved tracer modification to separate loop for optimization - changed loop limits to fixed nVertLevels rather than maxLevelCell - pushed extraction of pool variables up one call level - added OpenACC directives to enable computations on GPU - convert JM allowable T,S range to parameters - introduce enums for EOS types to reduce string manipulation - changed scalar pointers to scalar variables - corrections and updates to a number of comments - general code cleanup
|
@philipwjones I rebased and tested. Passes nightly regression suite on grizzly with gnu and intel, optimized and debug. Tests with nonlinear EOS do not match bit-for-bit with ocean/develop. Differ by max of 1e-9 after 6 time steps, which is expected. Can I merge and make PR into E3SM? |
|
@mark-petersen - fine by me |
Large refactoring and GPU optimization of the ocean equation of state routines. Minor performance improvements on KNL, traditional clusters, but enables significant GPU (>2x) improvement on Summit. Roundoff level differences with QU240 standalone tests on Cori and Summit. Did not check Jackett and McDougall reference values. Changes include: - moving calculation of reference pressure to initialization - separating EOS calls to routines with and without expansion coefficients - merged some loops to reduce arrays to scalars - moved tracer modification to separate loop for optimization - changed loop limits to fixed nVertLevels rather than maxLevelCell - pushed extraction of pool variables up one call level - added OpenACC directives to enable computations on GPU - convert JM allowable T,S range to parameters - introduce enums for EOS types to reduce string manipulation - changed scalar pointers to scalar variables - corrections and updates to a number of comments - general code cleanup
* ocean/develop: Revert "Merge PR #69 'ocean/EOSopt' into ocean/develop"
|
@philipwjones, this did not pass the tests on E3SM next branch, so we had to revert this merge in order to proceed with other pull requests. See comments here: It passed my G-case tests on edison, but it's failing these:
|
This reverts commit a1113be.
This is a re-merge of #69 plus a fix in #170. Large refactoring and GPU optimization of the ocean equation of state routines. Minor performance improvements on KNL, traditional clusters, but enables significant GPU (>2x) improvement on Summit. Roundoff level differences with QU240 standalone tests on Cori and Summit. Did not check Jackett and McDougall reference values. Changes include: moving calculation of reference pressure to initialization separating EOS calls to routines with and without expansion coefficients merged some loops to reduce arrays to scalars moved tracer modification to separate loop for optimization changed loop limits to fixed nVertLevels rather than maxLevelCell pushed extraction of pool variables up one call level added OpenACC directives to enable computations on GPU convert JM allowable T,S range to parameters introduce enums for EOS types to reduce string manipulation changed scalar pointers to scalar variables corrections and updates to a number of comments general code cleanup
Reduce threading barriers in ocean Update ocean BGC threading Update threading in ocean EOS allocates Revert "Revert "Merge PR #69 'ocean/EOSopt' into ocean/develop""
Reduce threading barriers in ocean Update ocean BGC threading Update threading in ocean EOS allocates Revert "Revert "Merge PR #69 'ocean/EOSopt' into ocean/develop""
* ocean/develop: Reduce threading barriers in ocean Update ocean BGC threading Update threading in ocean EOS allocates Revert "Revert "Merge PR #69 'ocean/EOSopt' into ocean/develop""
MPAS-Ocean threading and GPU update This updates OpenMP and OpenACC calls in MPAS-Ocean, including: * GPU optimization in EOS: MPAS-Dev/MPAS-Model#69 and MPAS-Dev/MPAS-Model#170 * Update ocean BGC threading: MPAS-Dev/MPAS-Model#152 * Reduce threading barriers: MPAS-Dev/MPAS-Model#194 It is not bfb due to change in order of operations. It should not be climate changing. Fixes #2860 [non-BFB]
MPAS-Ocean threading and GPU update This PR updates OpenMP and OpenACC calls in MPAS-Ocean, including: * GPU optimization in EOS: MPAS-Dev/MPAS-Model#69 and MPAS-Dev/MPAS-Model#170 * Update ocean BGC threading: MPAS-Dev/MPAS-Model#152 * Reduce threading barriers: MPAS-Dev/MPAS-Model#194 It is not bfb due to change in order of operations. It should not be climate changing. Fixes #2860 [non-BFB]
Large refactoring and GPU optimization of the ocean equation of state routines. Minor performance improvements on KNL, traditional clusters, but enables significant GPU (>2x) improvement on Summit. Roundoff level differences with QU240 standalone tests on Cori and Summit. Did not check Jackett and McDougall reference values. Changes include: - moving calculation of reference pressure to initialization - separating EOS calls to routines with and without expansion coefficients - merged some loops to reduce arrays to scalars - moved tracer modification to separate loop for optimization - changed loop limits to fixed nVertLevels rather than maxLevelCell - pushed extraction of pool variables up one call level - added OpenACC directives to enable computations on GPU - convert JM allowable T,S range to parameters - introduce enums for EOS types to reduce string manipulation - changed scalar pointers to scalar variables - corrections and updates to a number of comments - general code cleanup
This reverts commit a1113be.
This is a re-merge of MPAS-Dev#69 plus a fix in MPAS-Dev#170. Large refactoring and GPU optimization of the ocean equation of state routines. Minor performance improvements on KNL, traditional clusters, but enables significant GPU (>2x) improvement on Summit. Roundoff level differences with QU240 standalone tests on Cori and Summit. Did not check Jackett and McDougall reference values. Changes include: moving calculation of reference pressure to initialization separating EOS calls to routines with and without expansion coefficients merged some loops to reduce arrays to scalars moved tracer modification to separate loop for optimization changed loop limits to fixed nVertLevels rather than maxLevelCell pushed extraction of pool variables up one call level added OpenACC directives to enable computations on GPU convert JM allowable T,S range to parameters introduce enums for EOS types to reduce string manipulation changed scalar pointers to scalar variables corrections and updates to a number of comments general code cleanup
Reduce threading barriers in ocean Update ocean BGC threading Update threading in ocean EOS allocates Revert "Revert "Merge PR MPAS-Dev#69 'ocean/EOSopt' into ocean/develop""
This reverts commit a1113be.
Reduce threading barriers in ocean Update ocean BGC threading Update threading in ocean EOS allocates Revert "Revert "Merge PR MPAS-Dev#69 'ocean/EOSopt' into ocean/develop""
* Adds code to determine whether to use aerosols (nwfa and nifa) from RAP or the monthly climatology data. * Adds a config variable, config_tempo_rap, to the core_init_atmosphere Registry that must be set to true in the init_atmosphere namelist to use RAP aerosols. * Adds a logical function that check for config_tempo_rap = true and non-zero values of nwfa and nifa from RAP. This function will control whether to use RAP aerosols or the monthly climatology. * Modifies logic for creation of ICs/LBCs so that users can use aerosol information from the first guess field (e.g., RAP) instead of the monthly climatology. This aerosol information, nwfa and nifa, is needed for the aerosol-aware TEMPO microphysics. * Adds optional argument lbc_state to the use_rap_aerosols function Use_rap_aerosols will now check for existing aerosols in the initial conditions if lbc_state is not present. Otherwise, it will check for existing aerosols in lbc_state (lateral boundary condition data type) * Fix missing parentheses in if statement * Initialize tempo_aerosolaware package so that nwfa and nifa can be added to IC/LBC files. * Changes lbc code to use lbc-specific indexes when adding variables to lbc_state. * For qv, the lbc index is changed from index_qv to index_lbc_qv and state is changed to lbc_state * This is done for all LBC variables * Fixes a bug that caused an LBC variable (nwfa) to populate the incorrect variable name (lbc_nccn) since index_nwfa is NOT equal to index_lbc_nwfa * Fix typos
Large refactoring and GPU optimization of the ocean equation of state routines. Minor performance improvements on KNL, traditional clusters, but enables significant GPU (>2x) improvement on Summit. Roundoff level differences with QU240 standalone tests on Cori and Summit. Did not check Jackett and McDougall reference values.
Changes include: