
I have written a C++ program that uses both OpenMP and OpenMPI. I want to use (let's say) 3 nodes (so size_Of_Cluster should be 3) and use OpenMP within each node to parallelize the for loop (there are 24 cores per node). In essence, I want MPI ranks to be assigned to nodes, one rank per node. The Slurm script I have written is as follows. (I have tried many variations but could not come up with the "correct" one; I would be grateful for any help.)

#!/bin/bash
#SBATCH -N 3
#SBATCH -n 72
#SBATCH -p defq
#SBATCH -A akademik
#SBATCH -o %J.out
#SBATCH -e %J.err
#SBATCH --job-name=MIXED

module load slurm
module load shared
module load gcc
module load openmpi

export OMP_NUM_THREADS=24

mpirun -n 3 --bynode ./program

Using srun did not help.

2 Comments
  • And what is the problem/question? Commented Jan 7, 2023 at 12:57
  • On my local Mac with an M1 Max chip, using just 3 cores (no OpenMP), the pure OpenMPI algorithm takes about 30 mins to complete. When I used OpenMP + OpenMPI and ran it with the script in the question, I expected a quicker completion time, but it did not run quicker. In one try (I do not remember the exact script now), it took about 45 mins. So I suspect that the ranks are not distributed among the nodes, but I am not sure. In short, all I desire is to assign ranks to nodes, and I would appreciate any help in this regard. Commented Jan 7, 2023 at 14:29

1 Answer


The relevant lines are:

#SBATCH -N 3
#SBATCH -n 72

export OMP_NUM_THREADS=24

This means you have 72 MPI processes, and each creates 24 threads. For that to be efficient you would need 24 × 72 = 1728 cores, which you don't have. You should specify:

#SBATCH -n 3

Then you will have 3 processes, with 24 threads per process.
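
For reference, a minimal corrected batch script along those lines might look like the following. This is only a sketch: it reuses the partition, account, modules, and binary name from the question's script, and the exact srun/mpirun invocation may need adjusting for your Slurm and MPI versions.

#!/bin/bash
# 3 nodes, 3 MPI tasks (one per node), 24 cores per task for the OpenMP threads
#SBATCH -N 3
#SBATCH -n 3
#SBATCH --cpus-per-task=24
#SBATCH -p defq
#SBATCH -A akademik
#SBATCH -o %J.out
#SBATCH -e %J.err
#SBATCH --job-name=MIXED

module load slurm
module load shared
module load gcc
module load openmpi

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Depending on the Slurm version, srun may need --cpus-per-task passed
# explicitly; with mpirun, "--map-by node" is the newer spelling of --bynode.
srun ./program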

You don't have to worry about the placement of the ranks on the nodes: that is done by the runtime. You could, for instance, let each process print the name returned by MPI_Get_processor_name to confirm.
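
For example, a small hypothetical test program (say check_placement.cpp, compiled with something like mpicxx -fopenmp check_placement.cpp -o check_placement) could look like this:

#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Ask MPI which node (host) this rank was placed on.
    char name[MPI_MAX_PROCESSOR_NAME];
    int len = 0;
    MPI_Get_processor_name(name, &len);

    // Report the node name and the number of OpenMP threads this rank will use.
    std::printf("rank %d runs on %s with %d OpenMP threads\n",
                rank, name, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}

With 3 tasks and 24 cores per task, each of the three ranks should report a different node name and 24 OpenMP threads.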


2 Comments

Thank you very much for your kind answer. I tested the script with a simple program, and as you mentioned, the MPI_Get_processor_name function returned the desired output. However, when I ran my original program, it still took about 45 mins to complete. So it seems that the problem is not with the script; maybe I should optimize the OpenMP for loops. Best wishes.
Use -N 3 instead of -n 3 for sbatch. Avoid mpirun (its arguments depend on whether you use Open MPI, MPICH2, Intel MPI, et al.); try to make srun work. In one of my cloud cases, my binaries were compiled with Intel icc/icpc, and I had to do: I_MPI_PMI_LIBRARY=/opt/gridview/slurm/lib/libpmi2.so srun --mpi=pmi2
