
Conversation

@amametjanov amametjanov commented Feb 7, 2019

Update ocean BGC threading to make multi-threaded runs BFB with single-threaded runs in coupled E3SM cases.

Checked with E3SM test SMS.ne30_oECv3.BGCEXP_CNTL_CNPRDCTC_1850.anvil_intel.clm-bgcexp on the latest maint-1.1 branch:

  • pure-MPI run on 20 nodes with 18 tasks/node, all components stacked
  • multi-threaded run on 20 nodes with 18 tasks/node x 2 threads/task

Coupler history files cpl.hi.0001-01-02-00000.nc at the end of 1-day runs are BFB.

[BFB]

Fixes E3SM-Project/E3SM#1931

Contributor Author

Unnecessary/duplicate.

Contributor Author

private can be omitted; firstprivate is the default.

Contributor

@amametjanov could you explain all the changes that remove private(var1, var2,...)?

I thought we always need to declare indices within an OpenMP loop to be private, as in the removed line. It looks like my understanding of this is incorrect. Doesn't each thread need its own copy of index variables, so that the private clause is required?

Contributor Author

@amametjanov amametjanov Apr 2, 2019

Because of !$omp parallel default(firstprivate), all variables inside !$omp do regions are firstprivate by default, meaning each thread's copy/storage of a private variable inherits the master thread's value. There is no need to make these variables private here, because their values are defined/initialized prior to first use.

Overall, there are two possible scenarios for a variable in an omp-do region: defined before use, or used before definition. If a firstprivate variable var1 is defined before use:

  • private(var1) is redundant; var1 is overwritten anyway (this is the reason for the removals above)

If a firstprivate var1 is used before it is defined:

  • private(var1) is a bug: the intent was to inherit the master thread's value

Instead of enumerating all locally defined variables in private clauses, I'd like to simplify and remove private clauses where possible. Otherwise, when a new variable is added to an omp-do region, it also needs to be added to the private clause; when a variable is removed from the region, it also needs to be removed from the clause. This is hard to do and to enforce.
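
To make the two cases concrete, here is a minimal standalone sketch (hypothetical variable names, not taken from MPAS) of def-before-use vs. use-before-def under the firstprivate default:

program firstprivate_demo
   implicit none
   integer, parameter :: n = 8
   real :: a(n), b(n)
   real :: scale0, tmp
   integer :: i

   a = 1.0
   scale0 = 2.0   ! defined on the master thread before the parallel region

   !$omp parallel default(firstprivate) shared(a, b)
   !$omp do
   do i = 1, n
      ! def-before-use: tmp is overwritten before it is read, so adding
      ! private(tmp) to the directive would be redundant.
      tmp = a(i) * scale0
      ! use-before-def: scale0 is only read here; private(scale0) would leave
      ! it undefined on each thread (a bug), while the firstprivate default
      ! lets every thread inherit the master thread's value of 2.0.
      b(i) = tmp
   end do
   !$omp end do
   !$omp end parallel

   print *, b
end program firstprivate_demo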

Contributor

Thanks for the explanation. Let me make sure I understand - is this another way to say it:

  • In MPAS, threads are spawned upon initialization, and run throughout.
  • Upon entry to this subroutine, each thread declares its own set of local variables such as cell1, cell2, k, normalThicknessFluxSum, thicknessSum, so the memory space is unique to each thread.
  • At each loop, private is not needed because each thread already has its own copy.

But why do you say "each thread's copy of a private variable inherits master thread's value"? When does that occur - only on initialization?

Contributor Author

Yes.

The copying occurs just before entering an omp-do loop.
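
A simplified schematic of that structure (illustrative names, not actual MPAS code): threads come up once near initialization, variables visible at the parallel construct pick up the master thread's values under default(firstprivate), and locals of routines called inside the region are per-thread automatically.

program driver
   implicit none
   integer, parameter :: n = 100
   real :: work(n)
   real :: dt

   dt = 30.0

   ! Threads are spawned once and persist; dt is firstprivate, so each
   ! thread holds its own copy of the value 30.0 set above.
   !$omp parallel default(firstprivate) shared(work)
   call compute_step(dt, work, n)
   !$omp end parallel

   print *, work(1), work(n)

contains

   subroutine compute_step(dt, work, n)
      integer, intent(in) :: n
      real, intent(in) :: dt
      real, intent(inout) :: work(n)
      integer :: k
      real :: tmp
      ! k and tmp are routine locals: each thread executing this call has its
      ! own stack copies, so no private(...) clause is needed for them.
      !$omp do
      do k = 1, n
         tmp = dt * real(k)
         work(k) = tmp
      end do
      !$omp end do
   end subroutine compute_step

end program driver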

Contributor Author

Bug: omp master is needed here, because of the firstprivate default.
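
A generic illustration of the pattern (hypothetical names; not the actual MPAS code being commented on), assuming the intent is that a single thread performs an update of a shared variable:

subroutine set_flag(errFlag)
   implicit none
   integer, intent(inout) :: errFlag

   !$omp parallel default(firstprivate) shared(errFlag)

   ! ... threaded work ...

   ! Restrict the assignment to the master thread; the explicit barrier is
   ! needed because 'end master' carries no implied barrier, so other threads
   ! must not read errFlag before the update lands.
   !$omp master
   errFlag = 0
   !$omp end master
   !$omp barrier

   !$omp end parallel
end subroutine set_flag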

Contributor Author

Bug: the continuation '&' is missing on the second line; it should read:

!$omp do ... &
!$omp& ...
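
For illustration only (the clause list is hypothetical, not the actual MPAS directive), a fully continued directive of this form, with the trailing '&' ending the first line and the '!$omp&' sentinel opening the second:

subroutine sum_field(a, n, totalSum)
   implicit none
   integer, intent(in) :: n
   real, intent(in) :: a(n)
   real, intent(out) :: totalSum
   integer :: i

   totalSum = 0.0
   !$omp parallel default(firstprivate) shared(a, totalSum)
   ! Written this way the two lines are parsed as a single directive; without
   ! the continuation, the clauses on the second line are not part of it.
   !$omp do schedule(runtime) &
   !$omp&   reduction(+:totalSum)
   do i = 1, n
      totalSum = totalSum + a(i)
   end do
   !$omp end do
   !$omp end parallel
end subroutine sum_field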

Contributor Author

Bug

Contributor Author

Implied barriers make this unnecessary.
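
A generic sketch (illustrative names), assuming as in MPAS that the routine is called from inside an enclosing parallel region:

subroutine scale_field(field, work, nCells)
   implicit none
   integer, intent(in) :: nCells
   real, intent(in) :: field(nCells)
   real, intent(out) :: work(nCells)
   integer :: iCell

   !$omp do
   do iCell = 1, nCells
      work(iCell) = 2.0 * field(iCell)
   end do
   !$omp end do      ! implied barrier: every thread waits here
   ! !$omp barrier   ! redundant after 'end do'; only needed with 'nowait'
end subroutine scale_field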

Contributor Author

Bug. This will run ocn_tracer_*_compute single-threaded.
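
For context, a generic sketch (routine and variable names are hypothetical) of why call placement matters: the compute routines contain orphaned !$omp do loops, which only split iterations across threads when the call is reached by the whole team inside the parallel region.

subroutine tracer_compute_sketch(tend, n)
   implicit none
   integer, intent(in) :: n
   real, intent(inout) :: tend(n)
   integer :: i

   ! Orphaned worksharing loop: iterations are divided among whatever team
   ! of threads encounters it. If the call is made outside the parallel
   ! region, the team has a single thread and the loop runs serially.
   !$omp do
   do i = 1, n
      tend(i) = tend(i) + 1.0
   end do
   !$omp end do
end subroutine tracer_compute_sketch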

Contributor Author

@amametjanov amametjanov left a comment

@maltrud, I added notes to the bug-fix updates.

rljacob commented Mar 20, 2019

What is the status of this PR? An E3SM PR is waiting on it.

Contributor

jonbob commented Mar 20, 2019

It's next on my list, but complicated by having to come in through MPAS.


Contributor Author

Here is an example: temp_mask was added to the region, but not declared in the private clause. So private(cell1, cell2, CoriolisTerm, i, eoe) is now outdated/incomplete.
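
A simplified sketch of that maintenance hazard (the loop body is illustrative, not the actual code, and the clause list is shortened):

subroutine coriolis_sketch(nEdges, cellsOnEdge, edgeMask, normalVelocity, tend)
   implicit none
   integer, intent(in) :: nEdges
   integer, intent(in) :: cellsOnEdge(2, nEdges)
   real, intent(in) :: edgeMask(nEdges), normalVelocity(nEdges)
   real, intent(inout) :: tend(nEdges)
   integer :: iEdge, cell1, cell2
   real :: temp_mask

   ! The enumerated list below predates temp_mask, which was later added to
   ! the loop body but never to the clause; hand-maintained lists drift out
   ! of sync like this, which is the argument for dropping them.
   !$omp do private(cell1, cell2)
   do iEdge = 1, nEdges
      cell1 = cellsOnEdge(1, iEdge)
      cell2 = cellsOnEdge(2, iEdge)
      temp_mask = edgeMask(iEdge)
      tend(iEdge) = tend(iEdge) + temp_mask * normalVelocity(iEdge) * real(cell1 + cell2)
   end do
   !$omp end do
end subroutine coriolis_sketch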

Contributor

FYI, I merged this locally and got conflicts with the previous local merge #170. I resolved the conflicts by using the version from #170 but removing all omp private statements from @philipwjones's EOS update.

@mark-petersen
Contributor

@amametjanov could you rebase on the head of ocean/develop?

After that, I will make a temporary MPAS branch with #152, #170, #194 and make an E3SM branch for people to test with all these changes. That way we have the opportunity to edit these PRs while testing in E3SM. Any other threading or performance changes that are ready could also be tested.

To make multi-threaded runs BFB with single-threaded runs in coupled cases.
@amametjanov amametjanov force-pushed the azamat/ocean/fix-bgc-threading branch from 8189dd3 to d7476d5 on April 11, 2019 14:01
@amametjanov
Contributor Author

Rebased

mark-petersen added a commit that referenced this pull request Apr 12, 2019
Update ocean BGC threading to make multi-threaded runs BFB with single-threaded runs in coupled E3SM cases.
@mark-petersen mark-petersen self-requested a review April 23, 2019 20:15
Contributor

@mark-petersen mark-petersen left a comment

This passed all my tests for MPAS-O and E3SM. Just waiting in line for the PR to proceed on E3SM next now.

rljacob commented May 12, 2019

This fix needs to go to the maint-1.1 branch of E3SM, which is not using the head of MPAS-Dev. Is there a branch here tracking the version of MPAS used in E3SM maint-1.1?

Contributor

jonbob commented May 15, 2019

@rljacob - yes, we do have branches for each E3SM maint-1.X

@mark-petersen mark-petersen merged commit d7476d5 into MPAS-Dev:ocean/develop May 27, 2019
jonbob added a commit to E3SM-Project/E3SM that referenced this pull request May 29, 2019
MPAS-Ocean threading and GPU update

This updates OpenMP and OpenACC calls in MPAS-Ocean, including:
* GPU optimization in EOS: MPAS-Dev/MPAS-Model#69 and MPAS-Dev/MPAS-Model#170
* Update ocean BGC threading: MPAS-Dev/MPAS-Model#152
* Reduce threading barriers: MPAS-Dev/MPAS-Model#194
It is not BFB due to a change in the order of operations. It should not be climate
changing.

Fixes #2860

[non-BFB]
jonbob added a commit to E3SM-Project/E3SM that referenced this pull request May 30, 2019
MPAS-Ocean threading and GPU update
ashwathsv pushed a commit to ashwathsv/MPAS-Model that referenced this pull request Jul 21, 2020
mark-petersen added a commit to mark-petersen/MPAS-Model that referenced this pull request Jan 11, 2021
dustinswales pushed a commit to dustinswales/MPAS-Model that referenced this pull request Jul 31, 2025
* Updating MYNN-EDMF and adding bl_mynn_ess namelist option.

* Adding config_tempo_cldfra option used to alter mynn namelists option when running with prog sgs clouds.

* updating version number