Multi-experiment jobs #485

Open
3 tasks
pearce8 opened this issue Dec 9, 2024 · 0 comments
pearce8 commented Dec 9, 2024

Let's start with single-node experiments only for now. For CI, we need to launch multiple experiments as a single scheduler job.

  • Generate the scheduler command for a single node (without the launcher command)
  • Generate a launcher command for an individual experiment (without the scheduler command)
  • Use ramble -D . on --where (filtering) to get specific experiments into a single job (e.g., single node)
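
The split asked for in the first two items might look like the sketch below. It assumes Slurm (`sbatch` and `srun`); the function names `scheduler_cmd` and `launcher_cmd`, and the `./saxpy` executable path, are hypothetical and are not part of Ramble or Benchpark.

```shell
#!/bin/bash
# Hypothetical helpers (not Ramble/Benchpark API): build the two commands
# independently so one allocation can host several experiment launches.

# Scheduler command: requests the allocation only, with no launcher inside.
scheduler_cmd() {
  local n_nodes=$1 script=$2
  echo "sbatch -N ${n_nodes} ${script}"
}

# Launcher command: runs one experiment inside an existing allocation.
launcher_cmd() {
  local n_nodes=$1 n_ranks=$2 exe=$3
  echo "srun -N ${n_nodes} -n ${n_ranks} ${exe}"
}

scheduler_cmd 1 multi_job_submit.sh
launcher_cmd 1 4 ./saxpy
```

Keeping the two commands as separate strings lets a wrapper script emit the scheduler part once and the launcher part once per experiment.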

@scheibelp would it be possible to refactor the allocation modifier to generate the scheduler and the launcher command separately?

multi_job_submit.sh

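#!/bin/bash
# The node count is passed with "sbatch -N"; #SBATCH lines are not shell-expanded, so "$1" cannot be used there.

WORKSPACES="workspace1
workspace2
workspace3"

for WRKSPC in $WORKSPACES;
do
  # Double quotes let the shell expand $SLURM_JOB_NUM_NODES; {n_nodes} is left for ramble to expand.
  ramble -D "$WRKSPC" on --where "{n_nodes} == $SLURM_JOB_NUM_NODES" --executor='{execute_experiment}'
done

Usage:

sbatch -N 2 multi_job_submit.sh
sbatch -N 4 multi_job_submit.sh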
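
For CI, the submissions for several node counts could be generated in a loop. This is only a sketch: it assumes the node count is passed through `sbatch -N`, and `submit_all` and its `DRY_RUN` switch are hypothetical names introduced here for illustration.

```shell
#!/bin/bash
# Hypothetical helper: submit multi_job_submit.sh once per node count.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed.
submit_all() {
  local n
  for n in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "sbatch -N ${n} multi_job_submit.sh"
    else
      sbatch -N "${n}" multi_job_submit.sh
    fi
  done
}

submit_all 2 4
```

Setting `DRY_RUN=0` would perform the real submissions; the dry-run default makes it easy to check the generated commands before anything is queued.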

Current template:

#!/bin/bash
# Copyright 2023 Lawrence Livermore National Security, LLC and other
# Benchpark Project Developers. See the top-level COPYRIGHT file for details.
#
# SPDX-License-Identifier: Apache-2.0

{allocation_directives}

cd {experiment_run_dir}

{pre_exec}
{command}
{post_exec}

No-directive template:

#!/bin/bash
# Copyright 2023 Lawrence Livermore National Security, LLC and other
# Benchpark Project Developers. See the top-level COPYRIGHT file for details.
#
# SPDX-License-Identifier: Apache-2.0

cd {experiment_run_dir}

{pre_exec}
{command}
{post_exec}

Directive-only template:

#!/bin/bash
# Copyright 2023 Lawrence Livermore National Security, LLC and other
# Benchpark Project Developers. See the top-level COPYRIGHT file for details.
#
# SPDX-License-Identifier: Apache-2.0

{allocation_directives}
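
One way the two split templates could be combined is to render the directive-only template once as the job header and then call each experiment's no-directive script from that same job. A minimal sketch, assuming the rendered files already exist on disk; `build_wrapper` and the file names in the usage line are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print a wrapper script made of one rendered
# directive-only header followed by one call per experiment script.
build_wrapper() {
  local header=$1
  shift
  cat "$header"   # shebang, license header, and scheduler directives
  echo            # blank line between the header and the experiment calls
  local script
  for script in "$@"; do
    echo "bash ${script}"
  done
}
```

Possible usage: `build_wrapper directives.sh exp1/execute_experiment exp2/execute_experiment > wrapper.sh`, after which `wrapper.sh` would be handed to the scheduler as a single job.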

Ramble exec template:

#!/bin/bash
#SBATCH -N {n_nodes}

# Escaped braces survive template expansion; double quotes let the shell expand $SLURM_JOB_NUM_NODES.
ramble -D . on --where "\{n_nodes\} == $SLURM_JOB_NUM_NODES" --executor='\{execute_experiment\}'

Workspace configuration:

ramble:
  applications:
    hostname:
      workloads:
        parallel:
          experiments:
            wrapper_job_{n_nodes}:
              variables:
                n_nodes: [1, 2, 4, 8]
  ... other experiments ...
    saxpy: (1, 2, 4, 8, ... nodes)

Setup and submission:

ramble workspace setup
ramble on --where '"{application_name}" == "hostname"' --executor="sbatch {ramble_exec}"