Describe the bug
One of the HPC machines I work with has 4 GPUs per node. There are times when we want to run 4 independent MPI jobs on a node. The currently suggested way to do this, using a SimpleLauncher and invoking mpirun on the command line, does not work: all 4 jobs end up trying to run on the same GPU. On this particular machine we have to invoke jsrun rather than mpirun on the command line to get each job onto its own GPU. It would be nice to be able to use the JsrunLauncher to start the job, avoiding the need to invoke jsrun on the command line from within a SimpleLauncher-launched app.
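For context, here is a minimal sketch of the kind of configuration and app involved. The provider choice (LSFProvider), the walltime, and the binary name my_mpi_binary are illustrative assumptions, not the actual setup:

```python
import parsl
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.launchers import SimpleLauncher
from parsl.providers import LSFProvider

# Sketch: one block of one node, four workers (one per GPU / per MPI job).
# With SimpleLauncher, each app invokes MPI itself from its command line.
config = Config(
    executors=[
        HighThroughputExecutor(
            label="htex",
            max_workers=4,
            provider=LSFProvider(
                nodes_per_block=1,
                launcher=SimpleLauncher(),
                walltime="00:30:00",  # assumed walltime
            ),
        )
    ]
)
parsl.load(config)

@parsl.bash_app
def run_mpi_job(ranks: int = 4):
    # Hypothetical app: launching MPI from the app command line is the
    # currently suggested approach, but all four such jobs land on one GPU.
    return f"mpirun -n {ranks} ./my_mpi_binary"
```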
To Reproduce
Steps to reproduce the behavior, e.g.:
Set up Parsl 2023.09.25 with Python 3.11 on the cluster
Run the script to start 4 jobs on a single node via Parsl, using a SimpleLauncher and invoking mpirun from the app command line
Wait, and see that all 4 jobs end up on the same GPU
or
Set up Parsl 2023.09.25 with Python 3.11 on the cluster
Run the script to start 4 jobs on a single node via Parsl, using a JsrunLauncher (see the configuration sketch after this list)
Wait until time allocation expires (or hit ^C)
See error messages from MPI indicating that MPI_INIT was called multiple times
Also note that the first job to hit the node probably completed, but the other 3 exited with an error, yet Parsl did not return until the allocated time expired.
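For reference, the second scenario roughly corresponds to swapping the launcher in the sketch above. The overrides string is an assumption about how one might request one GPU per resource set; it is not the reporter's actual setting:

```python
from parsl.launchers import JsrunLauncher
from parsl.providers import LSFProvider

# Variant of the earlier sketch: Parsl itself starts workers via jsrun,
# instead of each app invoking mpirun/jsrun on its own command line.
provider = LSFProvider(
    nodes_per_block=1,
    launcher=JsrunLauncher(overrides="--gpu_per_rs 1"),  # assumed overrides
    walltime="00:30:00",  # assumed walltime
)
```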
Expected behavior
The expected behavior would be for each job to run to completion on its own GPU, or, if an error is encountered, for Parsl to exit rather than consume the entire time allocation.
Environment
OS: RHEL (customized at LLNL)
Python version: 3.11
Parsl version: 2023.09.25
Distributed Environment
Where are you running the Parsl script from? Login node
Where do you need the workers to run? Compute nodes
Specifically for your second test case, @astro-friedel, I have added the safe-exit tag -- that is the tag for issues related to shutdown not behaving correctly, especially in the case of errors.
I'd be interested if you could upload the runinfo/ directory of such a run, and whether you can see anything obviously breaking.
Cross-reference #2905 - without digging in further I'm unclear whether the prototype in PR #2905 addresses the core GPU scheduling problem here. But I feel like it is a very relevant use case, and it would be great if @astro-friedel and @yadudoc could figure this out against that PR.
@astro-friedel Just to clarify, were you able to get things working using SimpleLauncher + jsrun from the app command line? Could you paste in the launcher command you had to use?
My concern about this particular case is that MPI applications are generally used for multi-node runs, and the new MPI functionality assumes that each app owns N>1 nodes. However, the same application might be scaled down for test/dev runs, and if we do not support that it could hurt the user experience.