fixup! waiting-for-jobs: add new guide
chu11 committed Mar 28, 2023
1 parent 720ab1c commit 3360364
Showing 1 changed file with 13 additions and 11 deletions.
24 changes: 13 additions & 11 deletions jobs/waiting-for-jobs.rst
@@ -27,15 +27,15 @@ The most basic way to wait for a job to complete on a submitted job is the ``--w
The above command submits a job that sleeps for 30 seconds as a single task (``-n1``) and then runs ``/bin/false``. The :ref:`jobid <fluid>` is printed immediately, but the command won't return until the 30-second job has completed.

After the command has finished, we print the exit code from ``flux submit``, which is ``1``. With ``--wait``, ``flux submit`` exits with the job's exit code, and this job ran ``/bin/false``.
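Because ``flux submit --wait`` passes the job's exit code back to the caller, ordinary shell control flow can branch on it. The following is a minimal sketch that assumes bash. ``run_and_report`` is a hypothetical helper name and is not part of Flux.

.. code-block:: bash

   # Hypothetical helper (not part of Flux): run one job synchronously
   # and report how it ended.
   run_and_report() {
       local rc=0
       # --wait blocks until the job completes; flux submit prints the
       # jobid first, then exits with the job's exit code
       flux submit --wait -n1 "$@" || rc=$?
       if [ "$rc" -eq 0 ]; then
           echo "job succeeded"
       else
           echo "job failed with exit code $rc"
       fi
       return "$rc"
   }

For example, ``run_and_report bash -c "sleep 30; /bin/false"`` mirrors the job above and should report exit code ``1``.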

---------------
Flux Job Status
---------------

In most cases, you do not want to sit and wait for the current job submission to complete. You would like to do other things, such as submit more jobs, and then wait for specific jobs to complete.

The ``flux job status`` command is the most basic way to wait for a specific job, identified by its jobid, to complete. After submitting all the jobs you want, pass ``flux job status`` one or more jobids to wait on. ``flux job status`` returns after all of the jobs have completed and exits with the largest exit code among the jobids specified. If the job(s) have already completed, ``flux job status`` returns immediately. It can be run as many times as you like against the same jobid(s).
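In a script, one way to follow this pattern is to collect the jobids that ``flux submit`` prints into a bash array and pass all of them to ``flux job status`` at the end. This is a minimal sketch. ``submit_all_then_status`` is a hypothetical name, and the sketch assumes each argument is a shell command string to be run as a single task.

.. code-block:: bash

   # Hypothetical helper (not part of Flux): submit each argument as a
   # one-task job, then block until every one of them has finished.
   submit_all_then_status() {
       local cmd
       local -a jobids=()
       for cmd in "$@"; do
           # flux submit prints the new jobid and returns right away
           jobids+=("$(flux submit -n1 bash -c "$cmd")")
       done
       # Returns once all the jobs are done, exiting with the largest
       # exit code among them
       flux job status "${jobids[@]}"
   }

For instance, ``submit_all_then_status "sleep 30; /bin/true" "sleep 10; /bin/false"`` should return ``1``, the larger of the two jobs' exit codes.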

Here are several examples. In the first one, we submit a simple job that sleeps for 30 seconds and then runs ``/bin/true``. We then pass the jobid to ``flux job status``, which returns once the job has finished. Afterwards, we can see that the exit code from ``flux job status`` is ``0``, as we expect from ``/bin/true``.

@@ -115,6 +115,8 @@ Perhaps the biggest advantage of ``flux job wait`` is that apriori knowledge of
In the above example, we submit three jobs that sleep for 60, 45, and 30 seconds respectively before running ``/bin/true``. We then run ``flux job wait`` without any arguments. Notice that the jobids are returned in the order the jobs complete: first the ``sleep 30`` job, then the ``sleep 45`` job, then the ``sleep 60`` job. Finally, once no jobs with the ``waitable`` flag remain, ``flux job wait`` reports that there are no more waitable jobs.

Using ``flux job wait`` in this way is useful when you want to post-process jobs as they complete and don't care about the order in which they finish.
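For example, a script that knows how many waitable jobs it launched can call ``flux job wait`` once per job and handle each jobid as soon as it is printed. This is a minimal bash sketch. ``submit_then_reap`` is a hypothetical helper, and the sketch assumes you have no other outstanding waitable jobs, because ``flux job wait`` with no jobid returns whichever waitable job finishes next.

.. code-block:: bash

   # Hypothetical helper (not part of Flux): submit each command as a
   # waitable job, then handle the jobs in the order they finish.
   submit_then_reap() {
       local cmd i jobid
       for cmd in "$@"; do
           # --flags waitable lets flux job wait pick the job up later
           flux submit --flags waitable -n1 bash -c "$cmd" > /dev/null
       done
       # One flux job wait per submitted job; with no jobid argument it
       # blocks until any waitable job finishes and prints its jobid
       for ((i = 0; i < $#; i++)); do
           if jobid=$(flux job wait); then
               echo "done: ${jobid}"    # post-process the finished job here
           else
               # non-zero exit, e.g. a job failed or none were left to wait on
               echo "flux job wait reported an error" >&2
           fi
       done
   }

Calling ``submit_then_reap "sleep 60; /bin/true" "sleep 45; /bin/true" "sleep 30; /bin/true"`` mirrors the three jobs above and should print their jobids in the order the jobs finish.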

Alternatively, you can wait on all waitable jobs at once with the ``--all`` option to ``flux job wait``. Let's try that in the example below.

.. code-block:: console
@@ -134,25 +136,25 @@ Another option is that all jobs can be waited on via the ``--all`` option to ``f
This example is similar to the previous one, except that one of the jobs runs ``/bin/false`` instead of ``/bin/true``. When ``flux job wait --all`` is executed, it prints a message indicating that one job has failed (the one that ran ``/bin/false``). As with ``flux job status``, the exit code is ``1``, the highest exit code among all the jobs.
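The same flow can be scripted by marking each job waitable at submission time and finishing with a single ``flux job wait --all``. This is a sketch that assumes bash. ``submit_then_wait_all`` is a hypothetical helper name. Note that ``--all`` waits on all outstanding waitable jobs, not only the ones this function submitted.

.. code-block:: bash

   # Hypothetical helper (not part of Flux): submit waitable jobs, then
   # block until all waitable jobs have finished.
   submit_then_wait_all() {
       local cmd
       for cmd in "$@"; do
           flux submit --flags waitable -n1 bash -c "$cmd" > /dev/null
       done
       # Blocks until every waitable job is done; the exit code is
       # non-zero if any of them failed
       flux job wait --all
   }

Calling ``submit_then_wait_all "sleep 30; /bin/true" "sleep 30; /bin/false"`` should return ``1``, since one of the jobs fails.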

The biggest disadvantage of ``flux job wait`` compared to ``flux job status`` is that jobs can only be waited on once.

.. code-block:: console

   $ flux submit --flags waitable -n1 bash -c "sleep 30; /bin/true"
   ƒBbk3qrdro
   $ flux job wait
   ƒBbk3qrdro
   $ flux job wait
   flux-job: there are no more waitable jobs

Here we've submitted yet another sleep job. The first call to ``flux job wait`` waits for the job to complete and prints its jobid. If we run ``flux job wait`` again, we're told there are no more waitable jobs, because the job has already been waited on.

You might be wondering: if you want to wait for a set of known jobids, is it better to use ``flux job status`` or ``flux job wait``? Generally speaking, ``flux job wait`` is faster and more efficient than ``flux job status``. However, its primary advantages are the ``--all`` option and the fact that jobids do not need to be specified. If you routinely wait on specific jobids, or need to wait on the same job more than once, you may wish to stick with ``flux job status``.

To summarize, here are the pros and cons of using ``flux job wait`` instead of ``flux job status``.

Pros:

- ``flux job wait`` is more efficient, especially with the ``--all`` option
- Jobids do not need to be passed to ``flux job wait``
- Easy to wait for all of your jobs to finish with the ``--all`` option

Cons:
