mpi: add test that physical cores are not oversubscribed #17
Conversation
Force-pushed from a0327e0 to 259dd37
LGTM! Just a couple minor comments. Be sure to remove WIP before merging :-)
mpi/vcpu.c (outdated)

#include <sched.h>
#define MASTER 0

int sched_getcpu (void);
This prototype declaration should not be necessary if sched.h is included as documented in the sched_getcpu(3) man page:
Feature Test Macro Requirements for glibc (see feature_test_macros(7)):
sched_getcpu():
Since glibc 2.14:
_GNU_SOURCE
Do you mind if I leave this? Both Cray's compilers and gcc throw a warning if I pull it out:
(s=33,d=0) tioga11 ~/flux-test-collective/mpi (vcpu-test)$ mpicc -o vcpu vcpu.c
vcpu.c: In function 'main':
vcpu.c:65:11: warning: implicit declaration of function 'sched_getcpu'; did you mean 'sched_getparam'? [-Wimplicit-function-declaration]
65 | cpu = sched_getcpu ();
| ^~~~~~~~~~~~
| sched_getparam
You get this warning because you are not defining _GNU_SOURCE before including sched.h, as documented in the "Feature Test Macro Requirements" I referenced above.
It is slightly bad practice to declare a prototype for an external function when a valid header exists for it. If the hand-written prototype is incorrect, you may get a runtime crash that would have been easily avoided by using the correct prototype from the header.
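For illustration, a minimal sketch of the fix the man page excerpt describes (not code from this PR; the surrounding MPI logic is omitted): defining _GNU_SOURCE before the first include lets sched.h declare sched_getcpu(), so no hand-written prototype is needed.

#define _GNU_SOURCE            /* must appear before any #include to expose sched_getcpu() */
#include <sched.h>
#include <stdio.h>

int main (void)
{
    int cpu = sched_getcpu (); /* prototype now comes from sched.h */

    printf ("running on cpu %d\n", cpu);
    return 0;
}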
Thanks. I just recalled that @garlick had asked for a small blurb at the top of the test describing its pass/fail conditions. Added that too.
I'm fuzzy on what the actual failure mode is here and what it might have to do with pmix? Can you elaborate or is there an issue to point to?
A failure mode might have output that looks like this:
On corona, where there are 48 cores per node, it shouldn't be packing 4 MPI ranks on one core by default. (Note that I haven't actually ever seen this happen, I made that output up.) If I try to force this behavior, it ends up looking like this:
Re: what it might have to do with pmix...apparently, if pmix is configured improperly, it can have behavior similar to what I showed above -- packing a ton of MPI ranks on one core instead of spreading them out like it should. Again, I haven't ever seen it do this, but this was something @nhanford suggested we test for in case that behavior ever pops up. One thing worth mentioning is that we considered using this code instead of the mpi hello-world test from flux-core (originally, that's how it was written), but @grondo suggested keeping it as a separate test to give us a broader testsuite. Does that help? Sorry if it doesn't, I'm not an expert on this myself...
Hello @garlick, this is a quick way of categorically ruling out Open MPI failing over to singleton mode when PMIx is misconfigured. Also, TCE-32 on the internal LC Jira shows an example along similar lines where affinity was being set incorrectly. I don't know if we ever ruled out PMIx on that one, but it is still worthwhile to detect discrepancies between what Flux set for affinity and where processes actually ended up, IMHO.
Mmmm I think because of the
Maybe a simpler test to detect multiple singletons would be to just have rank 1 print its rank: if it's 1, pass, and if it's 0, fail? Edit: Duh, that wouldn't work - but maybe just have it fail if the size is 1 and always run with more than one task? Why the communicator split?
p.s. if @nhanford is on board here, this can be merged. I was just trying to understand.
I think we need to keep this test, since it's still looking at a behavior that could be problematic (core oversubscription when that shouldn't happen). However, I think we need an additional test to detect if multiple singletons are being created, because this test won't pick that up as it's currently designed.

One smoke test would be to have the rank 0 MPI process create a new file in /tmp/fluxci and call MPI_Abort if that file has already been created. Then, if there were multiple rank 0 instances the test would fail.

Thanks @garlick and @nhanford for the comments, you made me realize I had misunderstood what behavior we were looking for!
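For what it's worth, a sketch of that file-based smoke check (the /tmp/fluxci directory comes from the comment above; the sentinel file name and everything else are made up for illustration):

#include <mpi.h>
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

int main (int argc, char *argv[])
{
    int rank;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Assumes /tmp/fluxci already exists.  O_CREAT|O_EXCL fails with
         * EEXIST if the sentinel is already there, i.e. another "rank 0"
         * (a singleton) beat us to it.
         */
        int fd = open ("/tmp/fluxci/rank0-sentinel",
                       O_CREAT | O_EXCL | O_WRONLY, 0644);
        if (fd < 0 && errno == EEXIST) {
            fprintf (stderr, "FAIL: sentinel already exists, multiple rank 0s?\n");
            MPI_Abort (MPI_COMM_WORLD, 1);
        }
        if (fd >= 0)
            close (fd);
    }

    MPI_Finalize ();
    return 0;
}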
A file check would work, but it would be easier and equivalent to fail if the MPI size is 1. (Each singleton will think it is rank 0 of a parallel program of size 1.)
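A minimal sketch of that size check (illustrative only, not code from the PR):

#include <mpi.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    int size;

    MPI_Init (&argc, &argv);
    MPI_Comm_size (MPI_COMM_WORLD, &size);

    /* A job launched with more than one task should never see size == 1;
     * if it does, each task has probably come up as a singleton.
     */
    if (size == 1) {
        fprintf (stderr, "FAIL: MPI world size is 1 (singleton?)\n");
        MPI_Abort (MPI_COMM_WORLD, 1);
    }

    MPI_Finalize ();
    return 0;
}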
I'm going to explain what I thought we were trying to test for. I thought we were looking for output that looked like this:

where it had unnecessarily packed multiple MPI ranks onto one core instead of spreading them out. In order to detect this, I had each rank fetch its rank number and physical cpu number, then gather them into an array with MPI_Allgather. The rank 0 MPI task then iterated over that array to check that each one was unique. This worked well for detecting errors on one node, but failed when I tried to run it across two, because the same cpu numbers show up on every node. In order to solve that, I split the communicators by node, and had each node iterate over its own array to check that no two ranks on the node reported the same cpu.

When I said above that this was because of the communicator split, I was actually wrong; this test won't pick up singletons, but that has nothing to do with the communicator split. This test won't pick up singletons because it relies on the assumption that all the ranks initialized when the test starts will report their core number to each other, which isn't the case if they're singletons (silly Hobbs). (And even if they could all report their core number, they might still be unique.) So, in the case of n singletons being created, you'll have n arrays of size 1, each containing one cpu number, and the uniqueness check will trivially pass.
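For concreteness, a rough sketch of that approach (an illustration only, not the actual mpi/vcpu.c from this PR; the structure and names are made up):

#define _GNU_SOURCE            /* for sched_getcpu() */
#include <sched.h>
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
    int rank, cpu, nodesize, noderank, dup = 0, anydup = 0;
    int *cpus;
    MPI_Comm nodecomm;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    /* Split the world communicator per node, since cpu numbers are only
     * unique within a node.
     */
    MPI_Comm_split_type (MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, rank,
                         MPI_INFO_NULL, &nodecomm);
    MPI_Comm_size (nodecomm, &nodesize);
    MPI_Comm_rank (nodecomm, &noderank);

    /* Every rank reports the physical cpu it is running on. */
    cpu = sched_getcpu ();
    cpus = malloc (nodesize * sizeof (int));
    MPI_Allgather (&cpu, 1, MPI_INT, cpus, 1, MPI_INT, nodecomm);

    /* The first rank on each node checks that no two ranks on that node
     * landed on the same cpu.
     */
    if (noderank == 0) {
        for (int i = 0; i < nodesize; i++)
            for (int j = i + 1; j < nodesize; j++)
                if (cpus[i] == cpus[j])
                    dup = 1;
    }

    /* Fail the whole job if any node saw a duplicate. */
    MPI_Allreduce (&dup, &anydup, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
    if (rank == 0)
        printf (anydup ? "FAIL: cores are oversubscribed\n" : "PASS\n");

    free (cpus);
    MPI_Comm_free (&nodecomm);
    MPI_Finalize ();
    return anydup ? 1 : 0;
}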
Nice idea. I'll make this in a new PR. If we're good to keep this test, I'm removing WIP. |
Problem: A common failure in some MPI implementations is that all the MPI ranks end up packed on the same core. This causes lock contention, achieves no parallelization, and is bad, so we want to catch if that's happening. Add a test to our CI that collects the physical cpu number for each MPI rank, then checks that no two are the same. Note that this will fail if cores are deliberately oversubscribed (but they shouldn't be in CI). Co-authored-by: Nathan Hanford <[email protected]>
Add a clang-format file to standardize all C formatting for future tests.
LGTM!
(and fun too 🎉)
Problem: A common failure in some MPI implementations is that all the MPI ranks end up packed on the same core. This usually shows up if pmix is configured improperly. It causes lock contention, achieves no parallelization, and is bad, so we want to catch if that's happening.
Add a test to our CI that collects the physical cpu number for each MPI rank, then checks that no two are the same. Note that this will fail if cores are deliberately oversubscribed (but they shouldn't be in CI).
This is work-in-progress. A few things left to do:
Possibly use MPI_Allreduce (even though that's already tested in hello.c); currently MPI_Allgather is used (see the sketch below).
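For reference, one way the duplicate check could be expressed with MPI_Allreduce instead of MPI_Allgather (purely a sketch of the idea, not code from this PR): each rank contributes a vector with a 1 in its cpu's slot, the reduction sums the vectors, and any entry greater than 1 means a core is oversubscribed. The MAX_CPUS bound and all names are assumptions.

#define _GNU_SOURCE            /* for sched_getcpu() */
#include <sched.h>
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define MAX_CPUS 1024          /* assumed upper bound on cpu ids */

int main (int argc, char *argv[])
{
    int rank, cpu, oversubscribed = 0;
    int counts[MAX_CPUS], totals[MAX_CPUS];

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    /* Each rank marks the slot of the cpu it is running on. */
    memset (counts, 0, sizeof (counts));
    cpu = sched_getcpu ();
    counts[cpu % MAX_CPUS] = 1;

    /* Sum the per-cpu counts across all ranks.  (On a single node this is
     * sufficient; across nodes the reduction would have to run on a
     * per-node communicator, as in the Allgather version.)
     */
    MPI_Allreduce (counts, totals, MAX_CPUS, MPI_INT, MPI_SUM,
                   MPI_COMM_WORLD);

    for (int i = 0; i < MAX_CPUS; i++)
        if (totals[i] > 1)
            oversubscribed = 1;

    if (rank == 0)
        printf (oversubscribed ? "FAIL: core oversubscribed\n" : "PASS\n");

    MPI_Finalize ();
    return oversubscribed ? 1 : 0;
}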