Issue with MPI spawn and merge


Problem description

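I am trying to get started with dynamic process creation in MPI. I have a parent code (main.c) that tries to spawn new worker/child processes (worker.c) and merge both into one intra-communicator. The parent code (main.c) is: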

#include<stdio.h>
#include "mpi.h"

MPI_Comm child_comm;
int rank, size;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

if(rank == 0 )
{
   int  num_processes_to_spawn = 2;
   MPI_Comm_spawn("worker", MPI_ARGV_NULL, num_processes_to_spawn, MPI_INFO_NULL, 0, MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE );

MPI_Comm intra_comm;
MPI_Intercomm_merge(child_comm,0, &intra_comm);
MPI_Barrier(child_comm);


int tmp_size;
MPI_Comm_size(intra_comm, &tmp_size);
printf("size of intra comm world = %d\n", tmp_size);

MPI_Comm_size(child_comm, &tmp_size);
printf("size of child comm world = %d\n", tmp_size);

MPI_Comm_size(MPI_COMM_WORLD, &tmp_size);
printf("size of parent comm world = %d\n", tmp_size);

}

MPI_Finalize();

The worker (child) code is:

    #include<stdio.h> 
    #include "mpi.h"
    int main( int argc, char *argv[] )
    {
    int numprocs, myrank;
    MPI_Comm parentcomm;
    MPI_Comm intra_comm;

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &numprocs );
    MPI_Comm_rank( MPI_COMM_WORLD, &myrank );

    MPI_Comm_get_parent( &parentcomm );

    MPI_Intercomm_merge(parentcomm, 1, &intra_comm);
    MPI_Barrier(parentcomm);

    if(myrank == 0)
    {
    int tmp_size;
    MPI_Comm_size(parentcomm, &tmp_size);
    printf("child size of parent comm world = %d\n", tmp_size);

    MPI_Comm_size(MPI_COMM_WORLD, &tmp_size);
    printf("child size of child comm world = %d\n", tmp_size);

    MPI_Comm_size(intra_comm, &tmp_size);
    printf("child size of intra comm world = %d\n", tmp_size);

    MPI_Finalize( );
    return 0;
  }
 } 

I run this code using:

mpirun -np 12 main.c

After splitting and merging, I expect the output to be:

size of intra comm world = 14
size of child comm world = 2
size of parent comm world = 12
child size of parent comm world = 12
child size of child comm world = 2
child size of intra comm world = 14

But I get the following incorrect output:

size of intra comm world = 3
size of child comm world = 1
size of parent comm world = 12
child size of parent comm world = 2
child size of child comm world = 2
child size of intra comm world = 3

I don't understand where the error is. Could someone please tell me what is wrong?

Thanks, Chris


Reference solution

Method 1:

Your code suffers from a few problems, which I'll try to list here:

  • In the master part, only process 0 calls MPI_Comm_spawn(). This isn't a mistake as such (especially since you use MPI_COMM_SELF as parent communicator), but it de facto excludes all other processes from the subsequent merging.
  • In both the master and worker parts, you use MPI_Comm_size() on the inter-communicator where you need MPI_Comm_remote_size(). On an inter-communicator, MPI_Comm_size() returns the size of the local group, not the size of the remote group.
  • In the master code, only process 0 calls MPI_Finalize() (not to mention that main() and MPI_Init() are missing).
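
The sizes printed in the question follow from the first two points. As a rough illustration only, here is a plain-C sketch with no MPI calls at all; the `intercomm_view` struct and the helper names are hypothetical, made up for this example. It models what each side of an inter-communicator would see, assuming the local/remote semantics described above:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model (not an MPI type): one side's view of an
 * inter-communicator created by MPI_Comm_spawn(). */
typedef struct {
    int local_size;  /* what MPI_Comm_size() would report        */
    int remote_size; /* what MPI_Comm_remote_size() would report */
} intercomm_view;

/* View from the spawning side: the local group is the set of processes
 * in the communicator passed to MPI_Comm_spawn(). */
intercomm_view parent_view(int spawning_procs, int spawned_procs) {
    assert(spawning_procs > 0 && spawned_procs > 0);
    intercomm_view v = { spawning_procs, spawned_procs };
    return v;
}

/* View from the spawned side: same two groups, roles swapped. */
intercomm_view child_view(int spawning_procs, int spawned_procs) {
    assert(spawning_procs > 0 && spawned_procs > 0);
    intercomm_view v = { spawned_procs, spawning_procs };
    return v;
}

/* MPI_Intercomm_merge() produces an intra-communicator that contains
 * both groups, so its size is the sum of the two. */
int merged_size(intercomm_view v) {
    return v.local_size + v.remote_size;
}

/* Print the modelled sizes for one view. */
void print_view(const char *label, intercomm_view v) {
    printf("%s: local = %d, remote = %d, merged = %d\n",
           label, v.local_size, v.remote_size, merged_size(v));
}
```

Under this model, `parent_view(1, 2)` (spawning over MPI_COMM_SELF from rank 0) has a local size of 1 and a merged size of 3, and `child_view(1, 2)` has a local size of 2, which matches the output in the question. With `parent_view(12, 2)` (spawning over MPI_COMM_WORLD with 12 processes) the remote size is 2 and the merged size is 14, which is the output the question expected.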

Here are fixed versions of your code:

master.c

#include <stdio.h>
#include <mpi.h>

int main( int argc, char *argv[] ) {

    MPI_Init( &argc, &argv );
    int rank;
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    MPI_Comm child_comm;
    int  num_processes_to_spawn = 2;
    MPI_Comm_spawn( "./worker", MPI_ARGV_NULL,
                    num_processes_to_spawn, MPI_INFO_NULL,
                    0, MPI_COMM_WORLD,
                    &child_comm, MPI_ERRCODES_IGNORE );

    MPI_Comm intra_comm;
    MPI_Intercomm_merge( child_comm, 0, &intra_comm );

    if ( rank == 0 ) {
        int tmp_size;
        MPI_Comm_size( intra_comm, &tmp_size );
        printf( "size of intra comm world = %d\n", tmp_size );

        MPI_Comm_remote_size( child_comm, &tmp_size );
        printf( "size of child comm world = %d\n", tmp_size );

        MPI_Comm_size( MPI_COMM_WORLD, &tmp_size );
        printf( "size of parent comm world = %d\n", tmp_size );
    }

    MPI_Finalize();

    return 0;
}

worker.c

#include <stdio.h> 
#include <mpi.h>

int main( int argc, char *argv[] ) {

    MPI_Init( &argc, &argv );

    int myrank;
    MPI_Comm_rank( MPI_COMM_WORLD, &myrank );

    MPI_Comm parentcomm;
    MPI_Comm_get_parent( &parentcomm );

    MPI_Comm intra_comm;
    MPI_Intercomm_merge( parentcomm, 1, &intra_comm );

    if ( myrank == 0 ) {
        int tmp_size;
        MPI_Comm_remote_size( parentcomm, &tmp_size );
        printf( "child size of parent comm world = %d\n", tmp_size );

        MPI_Comm_size( MPI_COMM_WORLD, &tmp_size );
        printf( "child size of child comm world = %d\n", tmp_size );

        MPI_Comm_size( intra_comm, &tmp_size );
        printf( "child size of intra comm world = %d\n", tmp_size );
    }

    MPI_Finalize();

    return 0;
}

Which gives on my laptop:

~> mpirun -n 12 ./master
child size of parent comm world = 12
child size of child comm world = 2
child size of intra comm world = 14
size of intra comm world = 14
size of child comm world = 2
size of parent comm world = 12

(by marcGilles)

References

  1. Issue with MPI spawn and merge (CC BY-SA 2.5/3.0/4.0)

#mpi #openmpi





