
The following code example simply calls MPI_Barrier in a loop. On a cluster of two Intel machines, it runs correctly. When launched from an Intel machine together with an AMD machine, it completes the first 3 iterations without issue but consistently fails on the 4th.

#include <iostream>
#include <mpi.h>

int main()
{
    int argc = 0;
    MPI_Init(&argc, nullptr);

    const int count = 100;
    for (int i = 0; i < count; ++i)
    {
        std::cout << " Attempting Barrier " << i + 1 << std::endl;
        MPI_Barrier(MPI_COMM_WORLD);
        std::cout << " Completed Barrier " << i + 1 << std::endl;
    }

    MPI_Finalize();
}

command line: mpiexec -hosts 2 localhost amd_machine -wdir "\network\path" \path-to-exe

output:

[0] Attempting Barrier 1
[1] Attempting Barrier 1
[0] Completed Barrier 1
[0] Attempting Barrier 2
[1] Completed Barrier 1
[0] Completed Barrier 2
[1] Attempting Barrier 2
[0] Attempting Barrier 3
[0] Completed Barrier 3
[0] Attempting Barrier 4
[1] Completed Barrier 2
[1] Attempting Barrier 3
[1] Completed Barrier 3
[1] Attempting Barrier 4

job aborted:
[ranks] message

[0] terminated

[1] fatal error
Fatal error in MPI_Barrier: Other MPI error, error stack:
MPI_Barrier(MPI_COMM_WORLD) failed
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.  (errno 10060)
  • What if you run between two Intel machines? Commented Jan 9 at 22:58
  • Have you tried making -wdir a directory local to the machine in question? Just a thought. Commented Jan 9 at 23:26
  • I have no experience with MPI beyond running a couple of demo programs, but I do wonder whether the library versions match on the two systems. I could well believe this sort of thing is the result of a version conflict. Commented Jan 10 at 4:27
  • @GillesGouaillardet: Works between two Intel machines. Commented Jan 10 at 14:39
  • @catnip: wdir has to be visible by all nodes Commented Jan 10 at 14:40
