Optimal Sizes of data for sends and receives in MPI

Developer https://www.devze.com 2023-02-15 22:31 Source: web

I am writing a parallel application with MPI in which the master process has data roughly as large as the cache (4MB on the platform I am working on) to send to each process. Since 4MB might be too large for the master to send at once, it seems necessary to break the data into smaller chunks of a size suitable for sending and receiving.

My question is: is there any guidance on what the optimal size for sending and receiving each smaller chunk should be, given the size of the entire data?

Thanks.

4MB won't be any problem for any MPI implementation out there; I'm not sure what you mean by "too large" though.

A rule of thumb is that, if you can easily send the data all in one message, that is usually faster. The reason is that there is some fixed amount of time required to send and receive any one message (the latency), which comes from the function calls, calls to the transport layer, etc. On top of that, there is a usually close-to-constant amount of time needed to send each additional byte of data (one over the bandwidth). That's only a very crude approximation to the real complexity of sending messages (especially large messages) between processors, but it's a very useful one. Within that model, the fewer messages you send, the better, because you incur the latency overhead fewer times.

The above is almost always true if you are contemplating sending many little messages. However, if you're talking about sending (say) four 1MB messages vs. one 4MB message, even under that model the difference may be small, and it may be overwhelmed by other effects specific to your transport. If you want a more accurate assessment for your platform, there's really no substitute for empirical measurement of how long things actually take. The best way is just to try it in your code a few ways and see what is fastest; that's really the only definitive answer. A second method is to take a look at MPI "microbenchmarks":

The Intel MPI Benchmarks (IMB)
The Ohio State University MPI Benchmarks (OSU)

Both of the above include benchmarks of how long it takes to send and receive messages of various sizes. You compile them with your MPI and can simply read off how long it takes to send/receive (say) a 4MB message vs. four 1MB messages, which may give you some clues as to how to proceed.