Skip to content

Remove private attribute for pointers in threaded framework for PGI#449

Merged
mark-petersen merged 1 commit intoMPAS-Dev:developfrom
amametjanov:azamat/framework/pgi-cpr-omp-workaround
Jun 11, 2020
Merged

Remove private attribute for pointers in threaded framework for PGI#449
mark-petersen merged 1 commit intoMPAS-Dev:developfrom
amametjanov:azamat/framework/pgi-cpr-omp-workaround

Conversation

@amametjanov
Copy link
Contributor

@amametjanov amametjanov commented Feb 11, 2020

Available PGI versions on Summit (18.7, 18.10, 19.9, 19.10, 20.1) either lead to build-time or run-time errors without this update. This adds CPRPGI preprocessor macro for PGI compiler build targets and the macro removes pointers from private clause of omp do workshare regions in src/framework/mpas_dmpar.F. With this update, all available PGI versions compile and run without any errors. Additional details of error messages are in comments of MPAS-Dev/MPAS-Model#449 PR.

Fixes E3SM-Project/E3SM#3236, E3SM-Project/E3SM#3396

[bit-for-bit]

@mgduda mgduda self-requested a review February 11, 2020 22:35
@mgduda
Copy link
Contributor

mgduda commented Feb 11, 2020

Quoting @philipwjones on E3SM PR 3396: "This is not likely to be an acceptable solution but can be a local mod if people need a workaround near-term."

I think a little more analysis is needed to justify the changes in this PR going onto the develop branch.

@mgduda
Copy link
Contributor

mgduda commented Feb 11, 2020

For what it's worth, there seems to be a similar issue in the flang compiler; see flang Issue 627. I think newer versions of the PGI compiler may be using LLVM, so this is perhaps not a coincidence.

@philipwjones
Copy link
Contributor

@mgduda thanks for the ref to flang . This error/bug also appeared in an earlier PGI release but was fixed for a while. Now appears to be back. Regarding my statement that it might not be an acceptable solution, that referred to the fact that in the ocean all variables are default private (first private), so the extra private declaration here is probably redundant. But I didn't know what other components used as default in parallel regions. If it is also the case there, this mod is probably b4b for others too. What is the convention in the atm? Any chance you could run a quick check? I can try and look at other components. Since we need to use PGI on Summit for all GPU work, this is currently inhibiting that work. We'll keep pressing on the compiler side too.

@mark-petersen mark-petersen self-assigned this May 26, 2020
@mark-petersen
Copy link
Contributor

@philipwjones and @amametjanov sorry I missed this. What is the status?

@philipwjones
Copy link
Contributor

@mark-petersen PGI has reproduced the problem and it had been sent to engineers, but they warned the fix might not happen soon. Meanwhile, it's still broken on Summit. Might be nice to still have the short-term workaround in.

@mark-petersen mark-petersen self-requested a review May 26, 2020 16:38
@mark-petersen
Copy link
Contributor

@mgduda could you also review this with the PR bunch today? The makefile flag does not affect others, and is a way for us to run with the PGI compiler and threading until PGI fixes the root problem.

The -DCPRPGI is added here to compiler sets pgi, pgi-llnl, pgi-nersc.

@mark-petersen
Copy link
Contributor

@mgduda can we merge this?

@mgduda
Copy link
Contributor

mgduda commented Jun 2, 2020

Sorry, all, for holding this up. I have no objections, and I'll leave an approving review; but first, I think it would be good if the merge commit message (which is usually taken to be the original description of the PR) contained a more detailed description of the problem that this PR is addressing, how the changes in the PR fix the problem, etc. So, I'd recommend describing:

  1. What is the error this PR fixes (including a compiler error message, if applicable)?
  2. Which versions of the PGI compiler are known to be broken (and which are known to work)?
  3. How the error is fixed (in this case, with the addition of a new preprocessing macro in all pgi build targets, which activates different code in such-and-such code module)?

@mgduda
Copy link
Contributor

mgduda commented Jun 2, 2020

@amametjanov The description of the PR right now states that it "Remove[s] private attribute for pointers", but it looks like the entire private clause has been removed. As a non-expert in OpenMP, isn't it problematic that non-pointer, non-loop variables like nAdded are now no longer private? Are additional modifications needed in other cores (besides the ocean core) to make all variables default private?

@mark-petersen
Copy link
Contributor

@amametjanov if you could edit the first comment on this PR to include more detail and address these comments, then we can merge it in. I always use that first comment as the merge message.

@philipwjones
Copy link
Contributor

PGI Update - was informed today (6/4) that the PGI fix for this has been implemented and should appear in the 20.7 (July) release. Think it's still useful to keep this fix in for when we may need to run with previous compiler versions.

@amametjanov amametjanov force-pushed the azamat/framework/pgi-cpr-omp-workaround branch from 8e4a132 to 21b4888 Compare June 4, 2020 23:47
@amametjanov
Copy link
Contributor Author

Keeping non-pgi modules constant at

module load DefApps python/3.5.2 subversion/1.9.3 git/2.13.0 cmake/3.13.4 essl/6.1.0-2
 netlib-lapack/3.8.0 netcdf/4.6.1 netcdf-fortran/4.4.4 spectrum-mpi/10.3.1.2-20200121
 parallel-netcdf/1.8.1 hdf5/1.10.4

and varying 5 available pgi modules on Summit, the errors are as follows.
For pgi/18.7 and pgi/18.10, there is a build error:

PGF90-F-0000-Internal compiler error. print_token(): missing token       0
  (/gpfs/alpine/cli115/proj-shared/azamat/e3sm_scratch/e3sm-tests/20200604-cprpgi-macro/SMS_P42x2.T62_oQU120.CMPASO-NYF.summit_pgi.20200604_pgi18-7/bld/cmake-bld/framework/mpas_dmpar.f90: 8236)

with line 8236 having:

8235       !$omp do private(commListPtr, listPosition, bufferOffset, nAdded, fieldCursor, exchListPtr, iExch, iBuffer) 
8236       do listItem = 1, commListSize
8237         commListPtr => exchangeGroup % sendList

For pgi/19.9 (D), pgi/19.10 and pgi/20.1, runs are aborted during initialization:

Program terminated with signal 5, Trace/breakpoint trap.
#0  mpas_dmpar::mpas_dmpar_exch_group_pack_buffer_field1d_real ()
    at /gpfs/alpine/cli115/proj-shared/azamat/e3sm_scratch/e3sm-tests/20200604-cprpgi-macro/SMS_P42x2.T62_oQU120.CMPASO-NYF.summit_pgi.20200604_pgi20-1/bld/cmake-bld/framework/mpas_dmpar.f90:8448
8448              exchListPtr => fieldCursor % sendList % halos(haloLayer) % exchList

Regarding

Are additional modifications needed in other cores (besides the ocean core) to make all variables default private?

I don't think so. This is PGI- and pointer-specific.

Updated PR description above for a merge commit message.

@mark-petersen mark-petersen self-requested a review June 8, 2020 15:06
Copy link
Contributor

@mark-petersen mark-petersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only affects PGI compilers

@mark-petersen
Copy link
Contributor

@mgduda, the PR description in the commit message is now improved. Can you approve?

Copy link
Contributor

@mgduda mgduda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for the improved PR description. This looks good to me.

@mark-petersen mark-petersen merged commit 5e12fea into MPAS-Dev:develop Jun 11, 2020
@amametjanov amametjanov deleted the azamat/framework/pgi-cpr-omp-workaround branch June 26, 2020 14:23
jonbob added a commit to E3SM-Project/E3SM that referenced this pull request Jul 27, 2020
#3737)

Update MPAS framework

This PR brings in a new mpas-source submodule with updates to the mpas framework
itself, plus changes to the cores supporting the framework update. Some changes
are made to the atmosphere core, even though it is not used by E3SM, in order to
maintain consistency with the framework. This update includes the following
individual branches and PRs, many of which are additions to the makefiles for
summit, or optional libraries:
* az/azamat/mpas-cmake/mpas-tool-dir  (MPAS-Dev/MPAS-Model#629)
* init_atmosphere/kd_tree_ties  (MPAS-Dev/MPAS-Model#630)
* mark-petersen/framework/add_FillValue  (MPAS-Dev/MPAS-Model#616)
* mark-petersen/framework/add_lapack_option  (MPAS-Dev/MPAS-Model#613)
* amametjanov/framework/nullify-field-pointers  (MPAS-Dev/MPAS-Model#578)
* framework/makefile_e3sm  (MPAS-Dev/MPAS-Model#603)
* xylar/framework/make_shell_bash  (MPAS-Dev/MPAS-Model#594)
* registry/missing_value  (MPAS-Dev/MPAS-Model#562)
* pwolfram/updates_make_intel_stack  (MPAS-Dev/MPAS-Model#592)
* azamat/framework/pgi-cpr-omp-workaround  (MPAS-Dev/MPAS-Model#449)
* az/framework/e3sm-cmake-qnosmp-typo  (MPAS-Dev/MPAS-Model#579)
* xylar/compass/add_prerequisite_tag
* atmosphere/fix_timekeeping_imports (MPAS-Dev/MPAS-Model#582)
* xylar/fix_docs_ci  (MPAS-Dev/MPAS-Model#575)
* xylar/add_compass_docs  (MPAS-Dev/MPAS-Model#472)
* philipwjones/framework/summitmake  (MPAS-Dev/MPAS-Model#536)
* atmosphere/atm_core_cleanup  (MPAS-Dev/MPAS-Model#548)
* init_atmosphere/parse_geoindex  (MPAS-Dev/MPAS-Model#459)
* xylar/compass/add_copy_file  (MPAS-Dev/MPAS-Model#467)
* init_atmosphere/kd_tree  (MPAS-Dev/MPAS-Model#438)
* init_atmosphere/mpas_stack  (MPAS-Dev/MPAS-Model#426)
* operators/add_mpas_in_cell  (MPAS-Dev/MPAS-Model#400)

Fixes #3396
Fixes #3236

[BFB]
jonbob added a commit to E3SM-Project/E3SM that referenced this pull request Jul 29, 2020
Update MPAS framework

This PR brings in a new mpas-source submodule with updates to the mpas framework
itself, plus changes to the cores supporting the framework update. Some changes
are made to the atmosphere core, even though it is not used by E3SM, in order to
maintain consistency with the framework. This update includes the following
individual branches and PRs, many of which are additions to the makefiles for
summit, or optional libraries:
* az/azamat/mpas-cmake/mpas-tool-dir  (MPAS-Dev/MPAS-Model#629)
* init_atmosphere/kd_tree_ties  (MPAS-Dev/MPAS-Model#630)
* mark-petersen/framework/add_FillValue  (MPAS-Dev/MPAS-Model#616)
* mark-petersen/framework/add_lapack_option  (MPAS-Dev/MPAS-Model#613)
* amametjanov/framework/nullify-field-pointers  (MPAS-Dev/MPAS-Model#578)
* framework/makefile_e3sm  (MPAS-Dev/MPAS-Model#603)
* xylar/framework/make_shell_bash  (MPAS-Dev/MPAS-Model#594)
* registry/missing_value  (MPAS-Dev/MPAS-Model#562)
* pwolfram/updates_make_intel_stack  (MPAS-Dev/MPAS-Model#592)
* azamat/framework/pgi-cpr-omp-workaround  (MPAS-Dev/MPAS-Model#449)
* az/framework/e3sm-cmake-qnosmp-typo  (MPAS-Dev/MPAS-Model#579)
* xylar/compass/add_prerequisite_tag
* atmosphere/fix_timekeeping_imports (MPAS-Dev/MPAS-Model#582)
* xylar/fix_docs_ci  (MPAS-Dev/MPAS-Model#575)
* xylar/add_compass_docs  (MPAS-Dev/MPAS-Model#472)
* philipwjones/framework/summitmake  (MPAS-Dev/MPAS-Model#536)
* atmosphere/atm_core_cleanup  (MPAS-Dev/MPAS-Model#548)
* init_atmosphere/parse_geoindex  (MPAS-Dev/MPAS-Model#459)
* xylar/compass/add_copy_file  (MPAS-Dev/MPAS-Model#467)
* init_atmosphere/kd_tree  (MPAS-Dev/MPAS-Model#438)
* init_atmosphere/mpas_stack  (MPAS-Dev/MPAS-Model#426)
* operators/add_mpas_in_cell  (MPAS-Dev/MPAS-Model#400)

Fixes #3396
Fixes #3236

[BFB]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants