Feature/single node check #500

Draft
wants to merge 21 commits into
base: develop

Conversation

rfhaque
Collaborator

@rfhaque rfhaque commented Dec 14, 2024

Description

This PR adds a check for single-node experiments.
- The single_node variant [default=True] asserts ONLY that the given experiment runs on 1 node.
- The check is enforced in the determine_allocation method of the allocation modifier. If single_node is enabled and the experiment requires or requests more than 1 node, ramble setup throws an error.
- Strong/weak scaling modes are decoupled from the single-node mode (it is possible to scale on a single node).
- An experiment that runs on more than one node must explicitly specify ~single_node.
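
For illustration, the check described in the list above could look roughly like the sketch below. This is not the code from this PR: `check_single_node` and `AllocationError` are hypothetical names, and the sketch only assumes that the check receives the requested node count and the value of the single_node variant.

```python
class AllocationError(Exception):
    """Hypothetical error type; the PR only states that ramble setup throws an error."""


def check_single_node(n_nodes: int, single_node: bool = True) -> None:
    """Raise if single-node mode is enabled and more than one node is requested."""
    if single_node and n_nodes > 1:
        raise AllocationError(
            f"experiment requests {n_nodes} nodes but single_node is enabled; "
            "specify ~single_node to run on multiple nodes"
        )


# A single-node request passes with the default variant value.
check_single_node(1)

# A multi-node request passes only when single_node is explicitly disabled.
check_single_node(4, single_node=False)

# With the default (single_node=True), a multi-node request is rejected.
try:
    check_single_node(4)
except AllocationError as err:
    print(f"setup error: {err}")
```

Note that the check looks only at the node count, so a scaling study that stays on one node is not affected by it.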

Take 2:

  • We add an integer variant max_node_limit to the system specs (lib/benchpark/system.py): the maximum number of nodes allowed for an experiment on that system instantiation. The default value is 1; a value of 0 means no limit.
  • In the experiment setup (lib/benchpark/experiment.py), we add a filter clause '{n_nodes} > 0 and {n_nodes} <= {max_node_limit}'.
  • During ramble setup, this clause filters out all experiments that require more than max_node_limit nodes. Currently, it is not an error if zero experiments satisfy the max_node_limit.
./bin/benchpark system init --dest=tioga llnl-elcapitan rocm=5.5.1 compiler=cce max_node_limit=1
./bin/benchpark experiment init --dest ./saxpy saxpy +openmp~cuda~rocm caliper=mpi,time
./bin/benchpark setup ./saxpy ./tioga workspace/
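
As a rough sketch of the filtering semantics described in "Take 2" (not the actual implementation in lib/benchpark/experiment.py), the clause can be read as a predicate over the requested node count. The function name `within_node_limit` is hypothetical. The special case for 0 follows the statement that 0 means no limit; how the real clause handles that value is an assumption here.

```python
def within_node_limit(n_nodes: int, max_node_limit: int = 1) -> bool:
    """Return True if an experiment requesting n_nodes should be kept."""
    if n_nodes <= 0:
        return False  # mirrors the '{n_nodes} > 0' half of the clause
    if max_node_limit == 0:
        return True  # assumption: 0 disables the upper bound, per the description
    return n_nodes <= max_node_limit  # mirrors '{n_nodes} <= {max_node_limit}'


# Hypothetical node counts requested by a set of generated experiments.
requested = [1, 2, 4, 8]

kept_default = [n for n in requested if within_node_limit(n)]
kept_limit_4 = [n for n in requested if within_node_limit(n, max_node_limit=4)]
kept_no_limit = [n for n in requested if within_node_limit(n, max_node_limit=0)]

print(kept_default)   # [1]
print(kept_limit_4)   # [1, 2, 4]
print(kept_no_limit)  # [1, 2, 4, 8]
```

If no requested node count passes the predicate, the resulting list is simply empty, which matches the note that having zero experiments satisfy the limit is currently not an error.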

Dependencies: FIXME: Add a list of any dependencies.

(Potentially) Fixes #492

Type of Change

  • { } Adding a system, benchmark, or experiment
  • { } Modifying an existing system, benchmark, or experiment
  • { } Documentation update
  • { } Build/CI update
  • { } Benchpark core functionality

Checklist:

If adding/modifying a system:

  • { } Create a new directory for the system and a new system.py file
  • { } Add a new dry run unit test in .github/workflows
  • { } System appears in System Specifications table in docs catalogue section

If adding/modifying a benchmark:

  • { } Add a new application.py and (maybe) package.py under a new directory
    for this benchmark
  • { } Configure an experiment
  • { } Benchmark appears in Benchmarks and Experiments table in docs catalogue
    section

If adding/modifying an experiment:

  • { } Extend experiment.py under existing directory for specific benchmark
  • { } Define single-node and multi-node experiments

If adding/modifying core functionality:

  • { } Update docs
  • { } Update .github/workflows and .gitlab/ci unit tests (if needed)

@rfhaque rfhaque requested review from scheibelp and pearce8 December 14, 2024 17:37
@github-actions github-actions bot added the experiment New or modified experiment label Dec 14, 2024
@github-actions github-actions bot added the ci Involving Project CI & Unit Tests label Dec 14, 2024
@pearce8
Collaborator

pearce8 commented Jan 6, 2025

@rfhaque what is the status of this PR? How does it currently work? What docs would you need to add?

When you want to run strong or weak scaling, do you need to explicitly disable the single-node test because only one variant is possible at a time? If that is the case, we need to document it well.

Successfully merging this pull request may close these issues.

Single node experiment
2 participants