Update ocean BGC threading #152
Conversation
Unnecessary/duplicate.
private can be omitted; firstprivate is the default.
@amametjanov could you explain all the changes that remove private(var1, var2, ...)?
I thought we always need to declare indices within an OpenMP loop to be private, as in the removed line. It looks like my understanding of this is incorrect. Doesn't each thread need its own copy of index variables, so the private clause is required?
Because of !$omp parallel default(firstprivate), all variables inside !$omp do regions are firstprivate by default, meaning each thread's copy/storage of a private variable inherits the master thread's value. There is no need to make these variables private here, because their values are defined/initialized prior to first use.
Overall, there are two possible scenarios in an omp-do region: def-before-use and use-before-def.
- If firstprivate variable var1 is defined before use: private(var1) is redundant; var1 is overwritten anyway (this is the reason for the removals above).
- If firstprivate var1 is used before def: private(var1) is a bug; the intent was to inherit the master thread's value.
Instead of enumerating all locally defined variables in private clauses, I'd like to simplify and remove private clauses where possible. Otherwise, when a new variable is added to an omp-do region, it also needs to be added to the private clause, and when a variable is removed from a region, it also needs to be removed from the private clause. This is hard to do/enforce.
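A minimal, compilable sketch of the two scenarios (the variable names and the shared clause are illustrative, not the MPAS code; build with an OpenMP flag such as -fopenmp):

program firstprivate_demo
   implicit none
   integer, parameter :: nCells = 100
   integer :: iCell
   real :: scale, tmp
   real :: area(nCells), work(nCells)

   area  = 1.0
   scale = 2.0   ! set on the master thread before the parallel region

   !$omp parallel default(firstprivate) shared(area, work)
   !$omp do
   do iCell = 1, nCells
      ! def-before-use: tmp is assigned before it is read, so private(tmp)
      ! would be redundant; its firstprivate copy is overwritten anyway
      tmp = area(iCell) * scale
      ! use-before-def: scale is never assigned inside the region, so
      ! private(scale) would be a bug; the firstprivate default lets each
      ! thread inherit the master thread's value of 2.0 instead
      work(iCell) = tmp
   end do
   !$omp end do
   !$omp end parallel

   print *, work(1), work(nCells)   ! prints 2.0 twice
end program firstprivate_demo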
Thanks for the explanation. Let me make sure I understand; is this another way to say it:
- In MPAS, threads are spawned upon initialization, and run throughout.
- Upon entry to this subroutine, each thread declares its own set of local variables such as cell1, cell2, k, normalThicknessFluxSum, thicknessSum, so the memory space is unique to each thread.
- At each loop, private is not needed because each thread already has its own copy.
But why do you say "each thread's copy of a private variable inherits master thread's value"? When does that occur - only on initialization?
Yes.
The copying occurs just before entering an omp-do loop.
Bug: omp master is needed here, because of the firstprivate default.
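For context, a hedged sketch of the pattern such a fix usually takes (the call shown is illustrative, not the actual MPAS code): inside the default(firstprivate) parallel region every thread executes each statement, so once-only work has to be guarded explicitly.

!$omp master
call write_diagnostics(timeLevel)   ! illustrative once-only work
!$omp end master
! Note that omp master has no implied barrier; if later code depends on
! the master thread having finished, an explicit barrier is also needed:
!$omp barrier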
Bug: the continuation & is missing on the second line; it should be:
!$omp do ... &
!$omp& ...
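Spelled out with illustrative clauses and body, the corrected continuation reads:

!$omp do schedule(runtime) &
!$omp&    private(k)
do iCell = 1, nCells
   do k = 1, nVertLevels
      work(k, iCell) = 0.0   ! illustrative body
   end do
end do
!$omp end do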
Bug
Implied barriers make this unnecessary.
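For reference, a sketch of the implied barrier in question (loop body illustrative):

!$omp do
do iCell = 1, nCells
   tend(iCell) = 0.0   ! illustrative body
end do
!$omp end do
! An "!$omp end do" without a nowait clause already implies a barrier,
! so an explicit "!$omp barrier" immediately after it is redundant.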
Bug. This will run ocn_tracer_*_compute single-threaded.
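A sketch of the likely shape of this bug, assuming the call was guarded by a single-thread construct (the routine name instantiates the ocn_tracer_*_compute pattern; the arguments are illustrative):

!$omp master
call ocn_tracer_ecosys_compute(domain, err)
!$omp end master
! Only the master thread makes the call, so the !$omp do loops inside the
! routine execute on a single thread. The fix is for the whole team to
! make the call, so the worksharing loops inside bind to all threads:
call ocn_tracer_ecosys_compute(domain, err)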
amametjanov left a comment
@maltrud, I added notes to bug fix updates.
What is the status of this PR? An E3SM PR is waiting on it.
It's next on my list, but complicated by having to come in through MPAS.
Here is an example: temp_mask was added to the region, but not declared in the private clause. So private(cell1, cell2, CoriolisTerm, i, eoe) is now outdated/incomplete.
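A sketch of how the clause goes stale (the directive is reconstructed from the comment; the loop body is illustrative):

!$omp do private(cell1, cell2, CoriolisTerm, i, eoe)
do iEdge = 1, nEdges
   cell1 = cellsOnEdge(1, iEdge)
   cell2 = cellsOnEdge(2, iEdge)
   temp_mask = edgeMask(iEdge)   ! added later, but never added to the
                                 ! private clause above; only the enclosing
                                 ! default(firstprivate) keeps this from
                                 ! being a data race
   ! ... remainder of the loop body ...
end do
!$omp end do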
FYI, I merged this locally and got conflicts with the previous local merge #170. I resolved the conflicts by using the version from #170 but removing all omp private statements from @philipwjones's EOS update.
@amametjanov could you rebase on the head of ocean/develop? After that, I will make a temporary MPAS branch with #152, #170, #194 and make an E3SM branch for people to test with all these changes. That way we have the opportunity to edit these PRs while testing in E3SM. Any other threading or performance changes that are ready could also be tested.
To make multi-threaded runs BFB with single-threaded runs in coupled cases.
Force-pushed 8189dd3 to d7476d5
Rebased
Update ocean BGC threading to make multi-threaded runs BFB with single-threaded runs in coupled E3SM cases. Checked with E3SM test SMS.ne30_oECv3.BGCEXP_CNTL_CNPRDCTC_1850.anvil_intel.clm-bgcexp on the latest maint-1.1 branch:
- pure-MPI run on 20 nodes with 18 tasks/node, all components stacked
- multi-threaded run on 20 nodes with 18x2 tasks x threads
Coupler history files cpl.hi.0001-01-02-00000.nc at the end of 1-day runs are BFB. [BFB]
Fixes E3SM-Project/E3SM#1931
mark-petersen left a comment
This passed all my tests for MPAS-O and E3SM. Just waiting in line for the PR to proceed on E3SM next now.
This fix needs to go to the maint-1.1 branch of E3SM, which is not using the head of MPAS-Dev. Is there a branch here tracking the version of MPAS used in E3SM maint-1.1?
@rljacob - yes, we do have branches for each E3SM maint-1.X.
MPAS-Ocean threading and GPU update
This updates OpenMP and OpenACC calls in MPAS-Ocean, including:
* GPU optimization in EOS: MPAS-Dev/MPAS-Model#69 and MPAS-Dev/MPAS-Model#170
* Update ocean BGC threading: MPAS-Dev/MPAS-Model#152
* Reduce threading barriers: MPAS-Dev/MPAS-Model#194
It is not BFB due to a change in the order of operations. It should not be climate changing. Fixes #2860 [non-BFB]