
Commit

feat: finishing up group in group example
Signed-off-by: vsoch <[email protected]>
vsoch committed Apr 29, 2024
1 parent 3ff140f commit 677b9ca
Showing 10 changed files with 139 additions and 42 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/main.yaml
@@ -18,7 +18,7 @@ jobs:
- name: Check Spelling
uses: crate-ci/typos@7ad296c72fa8265059cc03d1eda562fbdfcd6df2 # v1.9.0
with:
files: ./README.md ./spec.md
files: ./README.md ./spec-1.md

- name: Lint and format Python code
run: |
3 changes: 2 additions & 1 deletion .github/workflows/test.yaml
@@ -34,5 +34,6 @@ jobs:
run: |
which jobspec
flux start jobspec run ./examples/hello-world-jobspec.yaml
flux start jobspec run ./examples/hello-world-batch.yaml
flux start jobspec run ./examples/group-with-group.yaml
flux start jobspec run ./examples/task-with-group.yaml
flux start python3 ./examples/flux/receive-job.py
84 changes: 80 additions & 4 deletions README.md
@@ -11,8 +11,9 @@ It is a transformational layer, or a simple language that converts steps needed
for a specific cluster's scheduler. We are currently prototyping off of the Flux JobSpec, and intend
to derive some variant between that and something more. It is JobSpec... the next generation! 🚀️

⭐️ [Read the specification](spec.md) ⭐️
⭐️ [Read the specification](spec-1.md) ⭐️

Some drafts are included in [docs/drafts](docs/drafts).

## Usage

@@ -59,18 +60,21 @@ it there for the time being, mostly because it looks nicer. I'm sure someone wil
jobspec run ./examples/hello-world-jobspec.yaml
```
```console
=> flux workload
=> flux submit ƒDjkLvNF9 OK
=> flux submit ƒDjzAyfhh OK
```

Add `--debug` to see the commands as they are submitted:

```bash
jobspec --debug run ./examples/hello-world-jobspec.yaml
```
```console
=> flux submit ƒJYffwYkP OK
=> flux workload
=> flux submit ƒ2i6n8XHSP OK
flux submit --job-name task-1 -N 1 bash -c echo Starting task 1; sleep 3; echo Finishing task 1
=> flux submit ƒJYw8t4F1 OK
=> flux submit ƒ2i6qafcUw OK
flux submit --job-name task-2 -N 1 bash -c echo Starting task 2; sleep 3; echo Finishing task 2
```

@@ -81,7 +85,7 @@ jobspec run -t flux ./examples/hello-world-jobspec.yaml
jobspec run --transformer flux ./examples/hello-world-jobspec.yaml
```

#### 3. Python Examples
#### 3. Nested Examples

Try running some advanced examples. Here is a group within a task.

```bash
jobspec --debug run ./examples/task-with-group.yaml
```
```console
=> flux workload
=> flux submit ƒ2iiMFBqxT OK
flux submit --job-name task-1 -N 1 bash -c echo Starting task 1; sleep 3; echo Finishing task 1
=> flux batch ƒ2iiQpk7Qj OK
#!/bin/bash
flux submit --job-name task-2-task-0 --flags=waitable bash -c echo Starting task 2; sleep 3; echo Finishing task 2
flux job wait --all
flux job submit /tmp/jobspec-.bvu1v7vk/jobspec-5y9n9u0y
```

That's fairly intuitive: we see a flux submit first, followed by a flux batch that runs a single task. The last line, "flux job submit", submits the batch script that was just shown.
What about a group within a group?

```bash
$ jobspec --debug run ./examples/group-with-group.yaml
```
```console
=> flux workload
=> flux batch ƒ2jEE7NPXM OK
#!/bin/bash
flux submit --job-name group-1-task-0 --flags=waitable bash -c echo Starting task 1 in group 1; sleep 3; echo Finishing task 1 in group 1
flux job submit --flags=waitable /tmp/jobspec-.ljjiywaa/jobspec-kb5y5lsl
# rm -rf /tmp/jobspec-.ljjiywaa/jobspec-kb5y5lsl
flux job wait --all
flux job submit /tmp/jobspec-.45jezez5/jobspec-8dr1udhx
```

The UI here needs some work, but here is an annotated walkthrough of what we see above.

```console
# This is the start of the workload - the entire next gen jobspec always produces one workload
=> flux workload

# This is the top level group that has the other group within - it's the top level "flux batch" that we submit
=> flux batch ƒ2e7Ay6jvo OK

# This is showing the first script that is written
#!/bin/bash

# Here is the first job submit, namespaced to group-1 (since the user didn't give the task an explicit name)
flux submit --job-name group-1-task-0 --flags=waitable bash -c echo Starting task 1 in group 1; sleep 3; echo Finishing task 1 in group 1

# This is submitting group-2 - the jobspec is written in advance
flux job submit --flags=waitable /tmp/jobspec-.ljjiywaa/jobspec-kb5y5lsl

# And this is how we will clean it up as we go - always after it's submitted. I'm commenting it out for now because rm -rf makes me nervous!
# rm -rf /tmp/jobspec-.ljjiywaa/jobspec-kb5y5lsl

# This is the actual end of the batch script
flux job wait --all

# This submits the batch script shown above - a bit confusing because it looks like it's inside the script (it's not, just a bad UI for now)
flux job submit /tmp/jobspec-.45jezez5/jobspec-8dr1udhx
```

And because I didn't clean it up, here are the contents of the nested batch script for group-2:

```bash
#!/bin/bash
flux submit --job-name group-2-task-0 --flags=waitable bash -c echo Starting task 1 in group 2; sleep 3; echo Finishing task 1 in group 2
flux job wait --all
```

#### 4. Python Examples

It could also be the case that you want something running inside a lead broker instance to receive Jobspecs incrementally and then
run them. This Python example can help with that by showing how to accomplish the same, but from within Python.
@@ -90,6 +165,7 @@
python3 ./examples/flux/receive-job.py
```
```console
=> flux workload
=> flux submit ƒKCJG2ESB OK
=> flux submit ƒKCa5iZsd OK
```
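
If you want to drive the transformer from your own Python code rather than the `jobspec` CLI, a rough sketch might look like the following. This is not the contents of `examples/flux/receive-job.py`; it assumes `FluxWorkload` can be instantiated directly and that the package layout mirrors the file paths in this diff.

```python
# Hypothetical sketch: drive the flux transformer from Python instead of the CLI.
# FluxWorkload and TransformerBase.run(filename) are shown in this commit;
# whether extra constructor arguments are needed is an assumption.
from jobspec.transformer.flux.workload import FluxWorkload

transformer = FluxWorkload()

# run() parses the jobspec into steps, calls announce() to print the
# "flux workload" banner, and then submits each step (flux submit / flux batch).
transformer.run("./examples/hello-world-jobspec.yaml")
```
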
File renamed without changes.
27 changes: 27 additions & 0 deletions examples/group-with-group.yaml
@@ -0,0 +1,27 @@
version: 1

# This is an example of a group with a nested group,
# which means we would have a flux batch within a flux batch!
resources:
  common:
    count: 1
    type: node

groups:
  - name: group-1
    resources: common
    tasks:
      - command:
          - bash
          - -c
          - "echo Starting task 1 in group 1; sleep 3; echo Finishing task 1 in group 1"
      - group: group-2

  - name: group-2
    depends_on: ["group-1"]
    resources: common
    tasks:
      - command:
          - bash
          - -c
          - "echo Starting task 1 in group 2; sleep 3; echo Finishing task 1 in group 2"
28 changes: 0 additions & 28 deletions examples/hello-world-batch.yaml

This file was deleted.

4 changes: 4 additions & 0 deletions jobspec/runner.py
@@ -64,6 +64,9 @@ def parse(self, jobspec):
"""
raise NotImplementedError

def announce(self):
pass

def run(self, filename):
"""
Run the transformer
@@ -74,6 +77,7 @@ def run(self, filename):
# Get validated transformation steps
# These will depend on the transformer logic
steps = self.parse(jobspec)
self.announce()

# Run each step to submit the job, and that's it.
for step in steps:
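
For context, this change turns `run()` into a small template method: parse the jobspec into steps, call the new `announce()` hook, then execute each step. A minimal sketch of a transformer using the hook follows; the `EchoTransformer` name and its class attributes are illustrative assumptions modeled on the flux transformer further down, not part of this commit.

```python
# Illustrative only: a do-nothing transformer that uses the new announce() hook.
from jobspec.runner import TransformerBase

class EchoTransformer(TransformerBase):
    # name/description mirror how the flux transformer declares itself (assumed)
    name = "echo"
    description = "print a banner and run no steps"

    def announce(self):
        # called by run() after parse(), before any steps execute
        print("=> echo workload")

    def parse(self, jobspec):
        # return an ordered list of steps for run() to execute
        return []
```
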
13 changes: 8 additions & 5 deletions jobspec/transformer/flux/steps.py
@@ -1,3 +1,4 @@
import copy
import json
import os
import uuid
@@ -160,18 +161,20 @@ def write_tasks_script(self):
"""
Generate the batch script.
"""
data = script_prefix
data = copy.deepcopy(script_prefix)
for task in self.tasks:
if task.name == "batch":
cmd, _ = self.generate_command(waitable=True)
cmd, _ = task.generate_command(waitable=True)
data.append(" ".join(cmd))
# This is the jobspec
data.append("# rm -rf {cmd[-1]}")
data.append(" ".join(task.generate_command(waitable=True)))
data.append(f"# rm -rf {cmd[-1]}")
else:
data.append(" ".join(task.generate_command(waitable=True)))

# Ensure all jobs are waited on
data.append("flux job wait --all")
return {"mode": 33216, "data": "\n".join(data), "encoding": "utf-8"}
script = "\n".join(data)
return {"mode": 33216, "data": script, "encoding": "utf-8"}

def generate_command(self, waitable=False):
"""
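
The `copy.deepcopy(script_prefix)` change is worth a note: if `script_prefix` is a shared module-level list (as its use here suggests), assigning it directly means every generated batch script appends to the same object, so later scripts inherit lines from earlier ones. A small standalone illustration of that aliasing problem, with illustrative names rather than the real implementation:

```python
# Standalone illustration of the aliasing bug addressed by copying script_prefix.
script_prefix = ["#!/bin/bash"]

def build_script(lines, copy_prefix):
    # copy_prefix=False mimics the old code: data aliases the shared prefix list
    data = list(script_prefix) if copy_prefix else script_prefix
    data.extend(lines)
    return "\n".join(data)

print(build_script(["flux submit hostname"], copy_prefix=False))
print(build_script(["flux submit date"], copy_prefix=False))
# The second script also contains "flux submit hostname", because both calls
# appended to the same list. Copying the prefix per call avoids the leak.
```
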
15 changes: 14 additions & 1 deletion jobspec/transformer/flux/workload.py
@@ -2,6 +2,7 @@

import jobspec.core as js
import jobspec.core.resources as rcore
from jobspec.logger import LogColors
from jobspec.runner import TransformerBase

from .steps import batch, stage, submit
@@ -20,6 +21,13 @@ class FluxWorkload(TransformerBase):
name = "flux"
description = "Flux Framework workload"

def announce(self):
"""
Announce prints an additional prefix during run
"""
prefix = "flux workload".ljust(15)
print(f"=> {LogColors.OKCYAN}{prefix}{LogColors.ENDC}")

def parse(self, jobspec):
"""
Parse the jobspec into tasks for flux.
@@ -47,6 +55,11 @@ def parse(self, jobspec):
# We copy because otherwise the dict changes size when the task parser removes
groups = copy.deepcopy(self.group_lookup)
for name, group in groups.items():
# Check if the group was already removed by another group task,
# and don't run if it was!
if name not in self.group_lookup:
continue

self.tasks.insert(
0, self.parse_group(group, name, self.resources, requires=self.requires)
)
@@ -161,7 +174,7 @@ def parse_tasks(self, tasks, resources=None, attributes=None, requires=None, nam
steps = []
for i, task in enumerate(tasks):
# Create a name based on the index or the task name
name = task.get("name") or f"task-{i}"
name = task.get("name") or f"{name_prefix}task-{i}"

# Case 1: We found a group! Parse here and add to steps
group_name = task.get("group")
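
Two details above are easy to miss: task names now pick up the group prefix when the user doesn't supply one (which is why the debug output shows names like `group-1-task-0`), and the new `group_lookup` guard is needed because parsing a group that references another group consumes the referenced group while the loop iterates over a copy. A toy sketch of that second situation, not the real parser:

```python
# Toy sketch of why the "if name not in self.group_lookup: continue" guard helps.
import copy

group_lookup = {
    "group-1": {"tasks": [{"group": "group-2"}]},  # group-1 nests group-2
    "group-2": {"tasks": []},
}

def parse_group(name):
    group = group_lookup.pop(name)  # consuming a group removes it from the lookup
    for task in group["tasks"]:
        nested = task.get("group")
        if nested:
            parse_group(nested)  # parsing group-1 also consumes group-2

# Iterate a copy so the dict does not change size mid-loop...
for name in copy.deepcopy(group_lookup):
    # ...but skip groups that a parent group already consumed.
    if name not in group_lookup:
        continue
    parse_group(name)
```
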
5 changes: 3 additions & 2 deletions spec-1.md
@@ -132,7 +132,7 @@ For this example, let's look at the "Mini Mummi" workflow, which requires:
4. Within the top level batch, submitting other jobs to run training
5. And doing the same for testing.

While we could imagine another level of nexting (for example, the machine learning tasks each being a group with a subset of tasks) let's start with this design for now. It might look like this. Note that the requires (and other) sections are removed for brevity:
While we could imagine another level of nesting (for example, the machine learning tasks each being a group with a subset of tasks) let's start with this design for now. It might look like this. Note that the requires (and other) sections are removed for brevity:

```yaml
version: 1
@@ -383,7 +383,7 @@ groups:
- command: ["pennant", "/opt/pennant/test/params.pnt"]
depends_on: ["build"]
attributes:
envirionment:
environment:
LD_LIBRARY_PATH: /usr/local/cuda/lib
```

@@ -624,6 +624,7 @@ Additional properties and attributes we might want to consider

- Next step - PR to add support for dependency.name to flux-core. I see where this needs to be added (and the logic) and would learn a lot from a hackathon (where I can lead and write the code)
- Would like to get it supported in flux core first, but if not possible need to consider using jobids here (not ideal)
- Environment still needs to be added to the implementation, along with using requires and attributes meaningfully.
- For groups in groups (batches), along with submit, we need logic to wait for completion before the instance cleans up.
- user: variables specific to the user
- parameters: task parameters or state that inform resource selection
