Skip to content

GPU implementation of ocean baroclinic velocity tendencies#572

Closed
philipwjones wants to merge 1 commit intoMPAS-Dev:ocean/developfrom
philipwjones:ocean/GPUvel
Closed

GPU implementation of ocean baroclinic velocity tendencies#572
philipwjones wants to merge 1 commit intoMPAS-Dev:ocean/developfrom
philipwjones:ocean/GPUvel

Conversation

@philipwjones
Copy link
Contributor

This PR is a substantial modification to all the ocean baroclinic velocity tendencies and includes:

  • a complete GPU implementation in which all tendencies - except tidal for now - are computed on the accelerator (using OpenACC) and all data is transfered at the top level driver (ocn_tend_vel)
  • a number of CPU optimizations performed along the way
  • elimination of meshPool, configPool
  • misc cleanup of comments and stuff

This PR contains similar changes that are in #513 #536 #569 at least so will need to modify/rebase once those are merged.

Performance speedup for this part of the code was 2.8x using 2 GPUs on Summit compared with an 8-rank MPI only case. Details of performance will depend on configuration. CPU performance improved by ~20% in the same 8-rank QU240 test. More speedup is expected as we migrate more data to the device elsewhere. The computational part excluding data transfer showed a 10x speedup.

This is not quite b4b due to changes in order of operations in a couple of routines and not b4b using the accelerator (different chip architecture). But in both cases, the differences are at roundoff level. Tested most of the options (eg for hmix, pgrad), though since it was tested in standalone QU240, the forcing routines really didn't get much of a workout.

  This commit is a substantial modification to all the ocean
  baroclinic velocity tendencies and includes:
  - a complete GPU implementation in which all tendencies are
    computed on the accelerator (using OpenACC) and all data
    is transfered at the top level driver (ocn_tend_vel)
  - a number of CPU optimizations performed along the way
  - elimination of meshPool, configPool
@pwolfram
Copy link
Contributor

Quick questions @philipwjones:

  1. This still works with tidal tendencies, correct?
  2. Is there a plan to move over the tidal tendencies?

@pwolfram
Copy link
Contributor

cc @sbrus89 and Nairita

@mattdturner
Copy link
Collaborator

mattdturner commented May 26, 2020

This still works with tidal tendencies, correct?

Unless I'm mistaken, if running with GPUs a info message will be printed and then the tidal tendency code will be calculated in the CPU instead of the GPU. So the calculations are still run, its just not on the GPU. Notice how the only code in #ifdef MPAS_OPENACC is the call to mpas_log_write, and the rest of the code is run no matter what pre-compiler flags are used.

I was mistaken.

@philipwjones
Copy link
Contributor Author

@pwolfram If you are running on GPUs (OPENACC enabled) and with tidal forcing, this will exit with an error. It still works for CPU only runs. This is temporary - trying to get a lot of GPU code integrated before the end of the month and an ECP deliverable. But the way the tidal forcing modifies zMid and moves pointers back and forth interferes with the copies on the GPU and it was going to take some thought on how to manage that in an efficient way. Sorry - will get back to it once I finish integrating other stuff

@pwolfram
Copy link
Contributor

@philipwjones, we should have removed changes to zMid in the new version but it may not yet be on ocean/develop (cc @mark-petersen because we are in discussions about this merge). Can you please make an issue of potential concerns so we can avoid them or refractor here? Don't want this to not be included in the GPU-enabled version because of some programming issue on our side. Also know we all want to write better code moving forward too-- thanks!

@mark-petersen mark-petersen self-assigned this May 27, 2020
@philipwjones
Copy link
Contributor Author

Tried to rebase this yesterday, but there have been some reorganization of code since this was first submitted and the rebase got a little ugly and I don't have high confidence. So...will probably do a fresh checkout of a more recent version and re-implement the GPU mods, maybe incorporating some new ideas from Az first. Will update the branch when this is done.

@mattdturner
Copy link
Collaborator

So...will probably do a fresh checkout of a more recent version and re-implement the GPU mods, maybe incorporating some new ideas from Az first

That's probably a good idea. We just had an issue where the merge conflict resolution in one of my PRs reverted prior changes (bugfix in #672), so starting over with a fresh checkout in a few subroutines might be better.

@philipwjones
Copy link
Contributor Author

Replaced by #772 and another future PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants