Fixes to COMPASS to support conda MPI #480
mark-petersen merged 8 commits into MPAS-Dev:ocean/develop
This PR is based off #468 so it should be merged with (or after) that PR.
Testing
I successfully ran all steps of the
I got this working on my laptop and on Grizzly today. This required modifications to the approach, but these changes mean COMPASS users will not have to do anything other than load a compass conda environment, and So far, it is not necessary to specify the
This merge moves soma/4km/32to4km and soma/8km/32to8km test cases into a new subdirectory called "broken" since these test cases are not working and won't be fixed anytime soon. With this change, `./list_testcases.py` and `./setup_testcases.py` won't pick up these tests because their driver config files aren't at the expected directory level.
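Why moving a test case one directory deeper hides it can be illustrated with a small sketch. This is not the code in `./list_testcases.py`; the function name `find_driver_configs`, the file name `config_driver.xml`, the depth of 4 directories, and the example paths are all assumptions made for illustration.

```python
import os


def find_driver_configs(root, depth=4, name="config_driver.xml"):
    """Return driver files located exactly `depth` directories below `root`.

    Hypothetical helper: the depth and the file name are assumptions,
    not values taken from the COMPASS scripts.
    """
    root = os.path.abspath(root)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        level = 0 if rel == "." else len(rel.split(os.sep))
        # Only directories at the expected level count as test cases.
        if level == depth and name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)
```

With a layout such as `ocean/soma/4km/default/` (4 levels) next to `ocean/soma/broken/4km/32to4km/` (5 levels), only the first would be reported, because the second sits one level below where the search looks.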
The Maine, QU60 and SOQU60to15 test cases now have the links to the python script for defining their vertical grids that they need to be set up successfully.
We no longer define a path to metis in the config file, so the version from the conda environment needs to be used instead.
Add support for a "conda_mpi" attribute to "step" tags. If this attribute is set to "true" and MPI is present in the conda environment, that command will be called with `mpirun` from the conda environment. This is needed to support compass conda environments with mpich. Python scripts and modules that use the netcdf4 package with mpich support don't work properly on many compute nodes (e.g. Grizzly at LANL and Anvil at ANL) unless they are prefixed with `mpirun -np 1`.
The paraview extractor can now be called as a function rather than a script, and this is done during base-mesh generation and culling. SCRIP files can now also be created with a function, so a script call is replaced with a function here as well. With these changes, calls to python scripts that use NetCDF in the parallel conda environment will now work as long as they are called with `mpirun -np 1`.
Also rename the load script for convenience.
This will make sure the compatible version of MPI gets used.
Since we can't automatically detect that this is a python script (and that it needs to support compass MPI), we need to say so explicitly.
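Saying so explicitly could look like the `step` tag in the sketch below, which reads the tag with Python's standard `xml.etree.ElementTree`. The executable name `./build_mesh`, the `argument` child with its `flag` attribute, and the helper `parse_step` are illustrative assumptions; only the `step` tag and its `conda_mpi` attribute come from this PR.

```python
import xml.etree.ElementTree as ET

# Hypothetical step: the executable does not look like a python script,
# so the conda_mpi attribute is set explicitly.
STEP_XML = """
<step executable="./build_mesh" conda_mpi="true">
    <argument flag="--output">mesh.nc</argument>
</step>
"""


def parse_step(xml_text):
    """Return (command list, conda_mpi attribute or None) for one step tag."""
    step = ET.fromstring(xml_text)
    command = [step.attrib["executable"]]
    for arg in step.findall("argument"):
        flag = arg.attrib.get("flag", "")
        if flag:
            command.append(flag)
        if arg.text and arg.text.strip():
            command.append(arg.text.strip())
    # None means the attribute was not given at all.
    return command, step.attrib.get("conda_mpi")
```

Keeping "attribute absent" distinct from "attribute set to false" lets the caller fall back to a default rule only when nothing was specified.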
Testing of all ocean test cases on Grizzly:
Successful tests
Tests checked here were successful, those unchecked have not run yet:
Tests that fail
All these tests failed for reasons unrelated to the PR.
These tests need to copy
These tests are missing local links to a python script called
All the above test cases should be fixed in #514.
This test case is missing a local link to a file called
This test case crashes during forward run on 4 nodes (144 cores) with insufficient memory and
Tests that were skipped
Some have prerequisites that are broken, others are too big to test:
Ran only partly (because it takes too long):
@mark-petersen, this has been thoroughly tested on Grizzly and is now ready to test and merge. To test, please make sure you use the environment
Important: you need to use this environment both to set up the test cases and to run them. If you don't use this conda environment during setup, links to the wrong
…evelop Optionally add links to load_compass_env.sh in test cases #492 This is specified either in the config file or at the command line. Like #480, this involves changes to common COMPASS infrastructure and we should consider making a separate PR to develop instead of merging those changes to ocean/develop. closes #490
@mark-petersen, I'm not sure what is different in
So it seems like there's still something to sort out here and maybe the MPICH environment still isn't ready for general use.
@xylar,
Can you please be more specific about what "broken link" means? Also, what is the best way you propose to remediate the issue now that the file location is identified?
@pwolfram, I suggest you try setting up the test case with
@xylar I confirmed on LANL IC that mapping_analysis fails using error message
I can get details
This may be a clue: the mpirun command from the details
So we don't need to revert this commit, we could temporarily point
One side note, it seems to be hanging on something for a long time on IC. With all steps disabled, the
This merge converts several script calls to function calls, which seems to work more reliably with conda MPI:
With these changes, calls to python scripts that use NetCDF in the parallel conda environment will now work as long as they are called with `mpirun -np 1`.
This merge also adds support for a `conda_mpi` attribute to `step` tags in COMPASS XML files.
If this is set to `true` or `false`, the step will or will not, respectively, have `mpirun -np 1` prepended to the executable in conda environments with MPI support. If no `conda_mpi` attribute is specified, `mpirun -np 1` is prepended only to python scripts (calls starting with `python` or ending with `.py`).
This is needed to support compass conda environments with MPI. Python scripts and modules that use the `netcdf4` package with mpich support will work properly if they are called with `mpirun`.
Changes are made to `setup_testcase.py` in b1a2b3b, so this commit should be merged to `develop` in a separate PR.
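The prefixing rule for `mpirun -np 1` can be summarized in a short sketch. This is one reading of the description, not the code in `setup_testcase.py`: the function `build_command` and its parameters are hypothetical, and it assumes that an explicit `false` suppresses the prefix and that the "starting with `python` or ending with `.py`" test applies to the whole call.

```python
def build_command(command, conda_mpi=None, env_has_mpi=True):
    """Return `command` (a list of strings), prefixed with mpirun when needed.

    conda_mpi: "true", "false", or None when the attribute is absent.
    env_has_mpi: whether the conda environment provides MPI (assumed to be
    detected elsewhere).
    """
    if not env_has_mpi:
        return list(command)
    if conda_mpi is None:
        # Default rule: only python scripts get the prefix.
        call = " ".join(command)
        use_mpi = call.startswith("python") or call.endswith(".py")
    else:
        # An explicit attribute overrides the default rule.
        use_mpi = conda_mpi.strip().lower() == "true"
    prefix = ["mpirun", "-np", "1"] if use_mpi else []
    return prefix + list(command)
```

For example, `build_command(["./run.py"])` gains the prefix through the default rule, while a call such as `["gpmetis", "graph.info", "4"]` is left unchanged unless `conda_mpi="true"` is passed.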