Feature/single node check #500

rfhaque · 2024-12-14T17:37:49Z

Description

~~This PR adds a check for the single node experiment~~
~~- The single_node variant [default=True] ONLY asserts that the given experiment must run on 1 node~~
- This check is enforced in the determine_allocation method of the allocation modifier. If single node is enabled and the experiment requires/requests more than 1 node, the ramble setup throws an error
~~- strong/weak scaling modes are decoupled from the single node mode (it's possible to scale on a single node)~~
~~- An experiment trying to run on more than one node must explicitly specify ~single_node~~

Take 2:

We add an integer variant, max_node_limit, to the system specs (lib/benchpark/system.py). It sets the maximum number of nodes an experiment may use on that system instantiation. The default value is 1; a value of 0 means no limit.
In the experiment setup (lib/benchpark/experiment.py), we add a filter clause '{n_nodes} > 0 and {n_nodes} <= {max_node_limit}'.
During ramble setup, the exclude clause filters out all experiments that require more than max_node_limit nodes. Currently, it is not an error if zero experiments satisfy max_node_limit.

./bin/benchpark system init --dest=tioga llnl-elcapitan rocm=5.5.1 compiler=cce max_node_limit=1
./bin/benchpark experiment init --dest ./saxpy saxpy +openmp~cuda~rocm caliper=mpi,time
./bin/benchpark setup ./saxpy ./tioga workspace/
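
As a rough illustration of the intended Take 2 semantics (not the actual benchpark or ramble code), the filtering can be sketched in plain Python. The function names `passes_node_filter` and `filter_experiments` are hypothetical, and treating max_node_limit=0 as "no limit" through a separate branch is an assumption, since the clause as written would otherwise reject every experiment when the limit is 0.

```python
def passes_node_filter(n_nodes: int, max_node_limit: int) -> bool:
    """Return True if an experiment requesting n_nodes should be kept.

    Hypothetical helper mirroring the clause
    '{n_nodes} > 0 and {n_nodes} <= {max_node_limit}'.
    """
    if n_nodes <= 0:
        # A non-positive node count never satisfies the clause.
        return False
    if max_node_limit == 0:
        # Assumption: 0 disables the limit, as stated in the description.
        return True
    return n_nodes <= max_node_limit


def filter_experiments(node_counts, max_node_limit):
    """Keep only the node counts allowed on this system instantiation."""
    return [n for n in node_counts if passes_node_filter(n, max_node_limit)]


# With the default max_node_limit=1, only the single-node experiment survives.
print(filter_experiments([1, 2, 4], 1))
# With max_node_limit=0 (no limit), every experiment is kept.
print(filter_experiments([1, 2, 4], 0))
# An empty result is allowed; as noted above, it is currently not an error.
print(filter_experiments([2, 4], 1))
```

Under this reading, a scaling study on a system created with max_node_limit=1 silently loses its multi-node experiments rather than failing during setup.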

Dependencies: FIXME:Add a list of any dependencies.

(Potentially) Fixes #492

Type of Change

{ } Adding a system, benchmark, or experiment
{ } Modifying an existing system, benchmark, or experiment
{ } Documentation update
{ } Build/CI update
{ } Benchpark core functionality

Checklist:

If adding/modifying a system:

{ } Create a new directory for the system and a new system.py file
{ } Add a new dry run unit test in .github/workflows
{ } System appears in System Specifications table in docs catalogue section

If adding/modifying a benchmark:

{ } Add a new application.py and (maybe) package.py under a new directory for this benchmark
{ } Configure an experiment
{ } Benchmark appears in Benchmarks and Experiments table in docs catalogue section

If adding/modifying an experiment:

{ } Extend experiment.py under existing directory for specific benchmark
{ } Define single-node and multi-node experiments

If adding/modifying core functionality:

{ } Update docs
{ } Update .github/workflows and .gitlab/ci unit tests (if needed)

modifiers/allocation/modifier.py

pearce8 · 2025-01-06T21:14:19Z

@rfhaque what is the status of this PR? How does it currently work? What docs would you need to add?

When you want to run strong or weak scaling, do you need to explicitly disable the single-node test because only one variant is possible at a time? If that is the case, we need to document it well.

rfhaque added 2 commits December 14, 2024 09:25

Add check for single node experiment

4a07c3b

merge with develop

8083b24

rfhaque requested review from scheibelp and pearce8 December 14, 2024 17:37

github-actions bot added the experiment New or modified experiment label Dec 14, 2024

lint

040bdfb

rfhaque added 3 commits December 14, 2024 09:43

lint

b0129bf

Remove scaling mode checks in experiments

4f61c94

workflows

2a22bbd

github-actions bot added the ci Involving Project CI & Unit Tests label Dec 14, 2024

rfhaque and others added 15 commits December 14, 2024 10:24

Update run.yml

f167afc

Fix kripke single node case

3be9c78

single node mode

bbecd3b

workflows

b59bba8

workflows

a4d8404

workflows

eb68570

workflows

28351d7

Merge remote-tracking branch 'origin/develop' into feature/single_node_check

5c8bba3

Add max_node_limit to system specs

4af1c3a

Fix fugaku system id in workflow

fe01f03

lint

d34513a

lint

a1ddec4

Merge remote-tracking branch 'origin/develop' into feature/single_node_check

3252038

Remove SingleNode class

9828525

Merge branch 'develop' into feature/single_node_check

9d17de6
