Hi, first I would like to thank the contributors for providing such an elegant and easy-to-use library for profiling MPI programs.
My problem:
I built an MPI cluster on a LAN with up to 8 devices (Linux Ubuntu 20.04) following the MPI tutorial.
I want to use Caliper to profile my applications across multiple devices. Before that, I wrote a simple hello world program to test whether it works.
The code is below:
#include <mpi.h>
#include <stdio.h>

#include <caliper/cali.h>
#include <caliper/cali-manager.h>

int main(int argc, char** argv) {
    // Set up Caliper profiling
    cali::ConfigManager mgr;
    mgr.add("runtime-report,event-trace(output=trace.cali)");

    // Initialize the MPI environment
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        // Error - MPI does not provide the needed threading level
        fprintf(stderr, "xxx MPI does not provide needed thread support!\n");
        return -1;
    }

    mgr.start();

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    // CALI_MARK_BEGIN("iemann_slice_precompute");
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    // CALI_MARK_END("iemann_slice_precompute");

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Flush Caliper output and finalize the MPI environment
    mgr.flush();
    mgr.stop();
    MPI_Finalize();

    return 0;
}
The program works perfectly with multiple processes on a single device:
sky@nx01:~/cloud$ mpirun -np 2 ./hello
Hello world from processor nx01, rank 0 out of 2 processors
Hello world from processor nx01, rank 1 out of 2 processors
Path                    Min time/rank  Max time/rank  Avg time/rank  Time %
MPI_Comm_dup                 0.000952       0.001182       0.001067  13.165525
MPI_Get_processor_name       0.000133       0.000193       0.000163   2.011228

Function                Count (min)  Count (max)  Time (min)  Time (max)  Time (avg)  Time %
                                  9           13    0.040653    0.040994    0.040823  92.516799
MPI_Comm_dup                      2            2    0.001527    0.002249    0.001888   4.278705
MPI_Recv                          4            4    0.000935    0.000935    0.000935   1.059478
MPI_Comm_free                     1            1    0.000170    0.000287    0.000228   0.517841
MPI_Get_processor_name            1            1    0.000170    0.000285    0.000228   0.515575
MPI_Send                          4            4    0.000421    0.000421    0.000421   0.477048
MPI_Finalize                      1            1    0.000069    0.000134    0.000102   0.230026
MPI_Probe                         2            2    0.000186    0.000186    0.000186   0.210762
MPI_Get_count                     2            2    0.000171    0.000171    0.000171   0.193766
When I run it across two devices (nodes), the program cannot return normally and gets stuck somewhere:
sky@nx01:~/cloud$ mpirun -np 2 --host nx01,nx02 ./hello
Hello world from processor nx02, rank 1 out of 2 processors
Hello world from processor nx01, rank 0 out of 2 processors
Path                    Min time/rank  Max time/rank  Avg time/rank  Time %
MPI_Comm_dup                 0.003007       0.003007       0.003007  29.905520
MPI_Get_processor_name       0.000132       0.000132       0.000132   1.312780
Has anybody encountered the same issue or figured out where the bug is?
Thanks a lot for answering.
This is unusual, Caliper shouldn't affect MPI progress when going from intra- to inter-node communication. Does this only happen when Caliper is enabled? It's possible the issue is in the underlying program. In particular, pay close attention to the order of communications between the processes and make sure you're not stuck in a blocking MPI_Send. It's possible that an MPI_Send finishes immediately for a target process on the same node but waits for a matching MPI_Recv to be called first when it goes over the network.
Hi @daboehme, thanks for your reply.
The MPI_Send entries in the profiling log remind me that Caliper may duplicate the MPI communicator, so the program may be stuck in an MPI_Send or MPI_Recv issued by Caliper itself.
In the multi-process, single-device profiling report we can see that both MPI_Send and MPI_Recv are called.
However, the hello world example only calls MPI_Comm_rank; it issues no MPI_Send or MPI_Recv of its own.
So I suspect the problem lies inside Caliper: some of its internal MPI_Send/MPI_Recv calls may finish too early on one process.