Merge pull request #1092 from OpenFreeEnergy/parallel_runs_docs
Update docs for parallel repeat file structure
jthorton authored Jan 29, 2025
2 parents e45b34e + 7081d12 commit ccc28db
1 changed file, 66 additions and 0 deletions: docs/guide/execution/quickrun_execution.rst
@@ -44,6 +44,72 @@ The ``quickrun`` command can be integrated into as:

openfe quickrun transformation.json -o results.json

Parallel execution of repeats with Quickrun
===========================================

Serial execution of multiple repeats of a transformation can be inefficient when simulation times are long.
Higher throughput can be achieved by running the repeats in parallel, with one repeat per HPC job. Most protocols run
three repeats in serial by default, so to run one repeat per job the number of repeats must first be set to 1, either by:

1. Defining the protocol setting ``protocol_repeats`` - see the :ref:`protocol configuration guide <cookbook/choose_protocol.nblink>` for more details.
2. Using the ``openfe plan-rhfe-network`` (or ``plan-rbfe-network``) command line flag ``--n-protocol-repeats``.
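
For example, using the second option, a relative binding free energy network could be planned with a single repeat per transformation roughly as follows. This is a sketch: the ligand and protein file names are placeholders, the output directory ``network_setup`` matches the paths used in the submission script below, and ``DRY_RUN`` is a switch of this script only, not an openfe option. Check ``openfe plan-rbfe-network --help`` for the options available in your installed version; ``plan-rhfe-network`` does not need a protein.

.. code-block:: bash

   #!/usr/bin/env bash
   set -euo pipefail

   # Placeholder inputs: replace these with your own ligand and protein files.
   ligands="ligands.sdf"
   protein="protein.pdb"

   # Plan every transformation with a single repeat, so that each later
   # "openfe quickrun" call runs exactly one repeat.
   plan_cmd=(openfe plan-rbfe-network -M "${ligands}" -p "${protein}"
             -o network_setup --n-protocol-repeats 1)

   # DRY_RUN is a switch for this sketch only (not an openfe option): by default
   # the command is only printed; run with DRY_RUN=0 to execute it.
   if [ "${DRY_RUN:-1}" = "1" ]; then
     echo "${plan_cmd[*]}"
   else
     "${plan_cmd[@]}"
   fi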

Each transformation can then be executed multiple times with the ``openfe quickrun`` command to produce a set of repeats.
Each repeat must write to its own results file and working directory so that the repeats do not overwrite each other.
We recommend storing the repeats in folders named ``results_x``, where ``x`` runs from 0 to 2 for three repeats, because
the :ref:`openfe gather <cli_gather>` command supports this file structure.

The following simple script (for the simplest SLURM use case) creates and submits a separate job script (a ``*.job`` file)
for each repeat of every alchemical transformation in a network, so that the repeats run in parallel and write their
results to separate folders:

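.. code-block:: bash

   # Assumes the network was planned with a single protocol repeat,
   # so that each quickrun call below runs exactly one repeat.
   for file in network_setup/transformations/*.json; do
     relpath="${file#network_setup/transformations/}"  # strip off "network_setup/transformations/"
     dirpath="${relpath%.json}"                         # strip off the final ".json"
     for repeat in {0..2}; do
       # one job script per transformation and repeat, e.g. <name>_0.job
       jobpath="network_setup/transformations/${dirpath}_${repeat}.job"
       if [ -f "${jobpath}" ]; then
         echo "${jobpath} already exists"
         exit 1
       fi
       cmd="openfe quickrun ${file} -o results_${repeat}/${relpath} -d results_${repeat}/${dirpath}"
       printf '#!/usr/bin/env bash\n%s\n' "${cmd}" > "${jobpath}"
       sbatch "${jobpath}"
     done
   done

This should result in the following file structure after execution (shown for a single transformation, with the
``results_x`` folders collected in a ``results_parallel`` directory):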

::

   results_parallel/
   ├── results_0
   │   ├── easy_rbfe_lig_ejm_31_complex_lig_ejm_42_complex
   │   │   └── shared_RelativeHybridTopologyProtocolUnit-79c279f04ec84218b7935bc0447539a9_attempt_0
   │   │       ├── checkpoint.nc
   │   │       ├── simulation.nc
   │   └── easy_rbfe_lig_ejm_31_complex_lig_ejm_42_complex.json
   ├── results_1
   │   ├── easy_rbfe_lig_ejm_31_complex_lig_ejm_42_complex
   │   │   └── shared_RelativeHybridTopologyProtocolUnit-a3cef34132aa4e9cbb824fcbcd043b0e_attempt_0
   │   │       ├── checkpoint.nc
   │   │       ├── simulation.nc
   │   └── easy_rbfe_lig_ejm_31_complex_lig_ejm_42_complex.json
   └── results_2
       ├── easy_rbfe_lig_ejm_31_complex_lig_ejm_42_complex
       │   └── shared_RelativeHybridTopologyProtocolUnit-abb2b104151c45fc8b0993fa0a7ee0af_attempt_0
       │       ├── checkpoint.nc
       │       ├── simulation.nc
       └── easy_rbfe_lig_ejm_31_complex_lig_ejm_42_complex.json
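
If you prefer to check the job scripts before anything is queued, or want a number of repeats other than three, the writing and submitting steps of the script above can be separated. The sketch below defines a hypothetical helper, ``write_repeat_jobs`` (not part of openfe), that writes one ``<name>_<repeat>.job`` script per transformation and repeat, each running ``openfe quickrun`` with its own ``results_x`` output file and working directory, and prints each script's path without submitting it. It assumes the network was planned with a single protocol repeat; the job file naming is a choice of this sketch.

.. code-block:: bash

   # write_repeat_jobs: hypothetical helper (not part of openfe).
   #   $1: directory holding the transformation JSON files
   #   $2: number of repeats (default 3, giving results_0 .. results_2)
   # Writes one job script per transformation and repeat and prints its path.
   # Nothing is submitted; it refuses to overwrite an existing job script.
   write_repeat_jobs() {
     local setup_dir="$1" n_repeats="${2:-3}"
     local file relpath dirpath jobpath repeat
     for file in "${setup_dir}"/*.json; do
       [ -e "${file}" ] || continue           # the glob matched nothing
       relpath="${file##*/}"                  # file name, e.g. <name>.json
       dirpath="${relpath%.json}"             # file name without ".json"
       for ((repeat = 0; repeat < n_repeats; repeat++)); do
         jobpath="${setup_dir}/${dirpath}_${repeat}.job"
         if [ -e "${jobpath}" ]; then
           echo "${jobpath} already exists" >&2
           return 1
         fi
         printf '#!/usr/bin/env bash\nopenfe quickrun %s -o %s -d %s\n' \
           "${file}" "results_${repeat}/${relpath}" "results_${repeat}/${dirpath}" \
           > "${jobpath}"
         echo "${jobpath}"
       done
     done
   }

   # Example usage: write and review the scripts, then submit them.
   #   write_repeat_jobs network_setup/transformations 3
   #   for job in network_setup/transformations/*.job; do sbatch "${job}"; done

Keeping the two steps apart means a wrong path shows up in the generated scripts rather than as a batch of failed jobs.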

The results can then be gathered from the CLI with the ``openfe gather`` command. Point it at the root directory that
contains the repeat folders, and it will automatically collate the results:

::

   openfe gather results_parallel
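
Before gathering, it can be worth confirming that every repeat has written its results file. The following sketch defines a hypothetical helper, ``check_repeats`` (not part of openfe), which assumes the layout shown above: each ``results_x`` folder holds a ``<name>.json`` file for every transformation file in the setup directory. It reports missing files and returns a non-zero status if any are found. It only checks that the files exist, not that the simulations they describe completed successfully.

.. code-block:: bash

   # check_repeats: hypothetical helper (not part of openfe).
   #   $1: root directory holding results_0, results_1, ...
   #   $2: directory holding the transformation JSON files
   #   $3: number of repeats (default 3)
   # Prints every expected results file that is missing and returns 1 if any
   # are missing, 0 otherwise.
   check_repeats() {
     local root="$1" setup_dir="$2" n_repeats="${3:-3}"
     local file name repeat missing=0
     for file in "${setup_dir}"/*.json; do
       [ -e "${file}" ] || continue           # the glob matched nothing
       name="${file##*/}"                     # e.g. <name>.json
       for ((repeat = 0; repeat < n_repeats; repeat++)); do
         if [ ! -f "${root}/results_${repeat}/${name}" ]; then
           echo "missing: ${root}/results_${repeat}/${name}"
           missing=1
         fi
       done
     done
     return "${missing}"
   }

   # Example usage: only gather once every repeat has produced its file.
   #   check_repeats results_parallel network_setup/transformations 3 \
   #     && openfe gather results_parallel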

See Also
--------