Commit 0f45c4b

Merge branch 'develop' into restart_gdb

rstyd committed Apr 16, 2024
2 parents 62dd333 + 46d2e40
Showing 17 changed files with 268 additions and 70 deletions.
48 changes: 11 additions & 37 deletions .github/workflows/docs.yml
@@ -1,44 +1,18 @@
 # Based on https://github.com/actions/starter-workflows/blob/main/pages/static.yml
-name: Publish docs
+name: Build Docs

 on:
+  workflow_dispatch: {}
   push:
-    branches: [main]
-
-# Needed for publishing to Github Pages
-permissions:
-  contents: read
-  pages: write
-  id-token: write
-
-concurrency:
-  group: "pages"
-  cancel-in-progress: true
+    branches: [main, develop]
+  pull_request:
+    types: [opened, synchronize, edited]
+    branches: [main, develop]

 jobs:
-  publish:
-    environment:
-      name: github-pages
-      url: ${{ steps.deployment.outputs.page_url }}
+  docs:
+    name: Build Docs
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v3
-      - name: BEE Install
-        run: |
-          sudo apt-get update
-          sudo apt-get install python3 python3-venv curl build-essential \
-            zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libsqlite3-dev \
-            libreadline-dev libffi-dev libbz2-dev libyaml-dev
-          curl -sSL https://install.python-poetry.org | python3 -
-          poetry update
-          poetry install
-      - name: Build Docs
-        run: |
-          poetry run make -C docs/sphinx html
-      - name: Upload
-        uses: actions/upload-pages-artifact@v1
-        with:
-          path: docs/sphinx/_build/html
-      - name: Publish
-        id: deployment
-        uses: actions/deploy-pages@v1
+      - uses: actions/checkout@v4
+      - name: Install BEE and Build Docs
+        run: ./ci/docs.sh
2 changes: 1 addition & 1 deletion .github/workflows/integration.yml
@@ -25,7 +25,7 @@ jobs:
     # available on 20.04
     runs-on: ubuntu-22.04
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
       - name: Install and Configure
         run: |
           . ./ci/env.sh
35 changes: 35 additions & 0 deletions .github/workflows/publish-docs.yml
@@ -0,0 +1,35 @@
+# Based on https://github.com/actions/starter-workflows/blob/main/pages/static.yml
+name: Publish docs
+
+on:
+  push:
+    branches: [main]
+
+# Needed for publishing to Github Pages
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+concurrency:
+  group: "pages"
+  cancel-in-progress: true
+
+jobs:
+  publish:
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - name: BEE Install and Build Docs
+        run: |
+          ./ci/docs.sh
+      - name: Upload
+        uses: actions/upload-pages-artifact@v1
+        with:
+          path: docs/sphinx/_build/html
+      - name: Publish
+        id: deployment
+        uses: actions/deploy-pages@v1
2 changes: 1 addition & 1 deletion .github/workflows/pylama.yml
@@ -15,7 +15,7 @@ jobs:
     name: PyLama Lint
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
       - name: Lint
         run: |
           pip install pylama==8.4.1 pyflakes==3.0.1 pylint==2.15.9 pydocstyle==6.1.1 2>&1 >/dev/null
2 changes: 1 addition & 1 deletion .github/workflows/unit-tests.yml
@@ -23,7 +23,7 @@ jobs:
     # available on 20.04
     runs-on: ubuntu-22.04
    steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
       - name: Install and Configure
         run: |
           . ./ci/env.sh
11 changes: 11 additions & 0 deletions HISTORY.md
@@ -55,3 +55,14 @@ Major features: adds the capability to include post- and pre-processing scripts
 - Fix Checkpoint/Restart capability
 - Add testing for Checkpoint/Restart
 - Adds capability to reset the beeflow files (deletes all artifacts), especially useful for developers.
+
+0.1.8
+
+Features: Fixes sphinx version to enable publishing documentation; now includes
+CI for testing documentation builds
+
+- Update sphinx version, update actions and release docs (#812)
+- Add separate action for testing docs
+- Fix beeflow config new error
21 changes: 17 additions & 4 deletions RELEASE.rst
@@ -18,17 +18,30 @@ Verify all current changes in develop run correctly on nightly tests.
 5. Once merged, on github web interface create a release and tag based on main branch
    that matches the version in pyproject.toml
 6. Follow step 2 but uncheck Allow specified actors to bypass and don't forget save
-7. Finally, on the main branch, first run a ``poetry build`` and then a
-   ``poetry publish``. The second command will ask for a username and password (You may need to add the --username --password options to ``poetry build``)
-   for PyPI.
+7. Log into your PyPI account and get a token for hpc-beeflow via:
+
+   > Your projects > hpc-beeflow > Manage > Settings > Create a token
+
+8. Finally, on the command line: check out the main branch and make sure you pull the latest version
+
+   Then publish by:
+   ``poetry build``
+
+   ``poetry publish -u __token__ -p pypi-<long-token>``
+

 Check the documentation at: `https://lanl.github.io/BEE/ <https://lanl.github.io/BEE/>`_

+Also upgrade the pip version in your python or anaconda environment and check the version:
+
+``pip install --upgrade pip``
+
+``pip install --upgrade hpc-beeflow``
+
 **WARNING**: Once a version is pushed to PyPI, it cannot be undone. You can
 'delete' the version from the package settings, but you can no longer publish
 an update to that same version.

 8. After the version is published change the version in develop to a pre-release of the next version
-   (example new version will be 0.1.x edit pyproject.toml version to be 0.1.xrc1
+   (example: new version will be 0.1.x, edit pyproject.toml version to be 0.1.Xdev)
3 changes: 1 addition & 2 deletions beeflow/client/core.py
@@ -30,8 +30,6 @@
 from beeflow.common.db.bdb import connect_db
 from beeflow.wf_manager.common import dep_manager

-db_path = wf_utils.get_db_path()
-

 class ComponentManager:
     """Component manager class."""
@@ -467,6 +465,7 @@ def stop(query='yes'):

 def kill_active_workflows(active_states, workflow_list):
     """Kill workflows with active states."""
+    db_path = wf_utils.get_db_path()
     db = connect_db(wfm_db, db_path)
     success = True
     for name, wf_id, state in workflow_list:
7 changes: 6 additions & 1 deletion beeflow/common/integration/utils.py
@@ -96,6 +96,11 @@ def task_states(self):
         """Get the task states of the workflow."""
         return bee_client.query(self.wf_id)[1]

+    def get_task_state_by_name(self, name):
+        """Get the state of a task by name."""
+        task_states = self.task_states
+        return [task_state for _, task_name, task_state in task_states if task_name == name][0]
+
     def cleanup(self):
         """Clean up any leftover workflow data."""
         # Remove the generated tarball
@@ -243,5 +248,5 @@ def check_completed(workflow):

 def check_workflow_failed(workflow):
     """Ensure that the workflow completed in a Failed state."""
-    ci_assert(workflow.status == 'Failed',
+    ci_assert(workflow.status == 'Archived/Failed',
               f'workflow did not fail as expected (final status: {workflow.status})')
21 changes: 21 additions & 0 deletions beeflow/common/integration_test.py
@@ -210,6 +210,27 @@ def build_failure(outer_workdir):
               f'task was not in state BUILD_FAIL as expected: {task_state}')


+@TEST_RUNNER.add()
+def dependent_tasks_fail(outer_workdir):
+    """Test that dependent tasks don't run after a failure."""
+    workdir = os.path.join(outer_workdir, uuid.uuid4().hex)
+    os.makedirs(workdir)
+    workflow = utils.Workflow('failure-dependent-tasks',
+                              'ci/test_workflows/failure-dependent-tasks',
+                              main_cwl='workflow.cwl', job_file='input.yml',
+                              workdir=workdir, containers=[])
+    yield [workflow]
+    utils.check_workflow_failed(workflow)
+    # Check each task state
+    fail_state = workflow.get_task_state_by_name('fail')
+    utils.ci_assert(fail_state == 'FAILED',
+                    f'task fail did not fail as expected: {fail_state}')
+    for task in ['dependent0', 'dependent1', 'dependent2']:
+        task_state = workflow.get_task_state_by_name(task)
+        utils.ci_assert(task_state == 'DEP_FAIL',
+                        f'task {task} did not get state DEP_FAIL as expected: {task_state}')
+
+
 @TEST_RUNNER.add(ignore=True)
 def checkpoint_restart(outer_workdir):
     """Test the clamr-ffmpeg checkpoint restart workflow."""
5 changes: 4 additions & 1 deletion beeflow/common/wf_data.py
@@ -287,7 +287,10 @@ def command(self):
         nonpositional_inputs = []
         for input_ in self.inputs:
             if input_.value is None:
-                raise ValueError("trying to construct command for task with missing input value")
+                raise ValueError(
+                    ("trying to construct command for task with missing input value "
+                     f"(id: {input_.id})")
+                )

             if input_.position is not None:
                 positional_inputs.append(input_)
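For illustration only (not part of the commit): a minimal standalone sketch of how the rewrapped error message renders, using Python's implicit concatenation of adjacent string literals. The SimpleNamespace stand-in and the input id 'fname' are hypothetical, not beeflow's real input class.

from types import SimpleNamespace

# Hypothetical stand-in for a task input record; the real class has more fields.
input_ = SimpleNamespace(id='fname', value=None)
if input_.value is None:
    # Adjacent string literals inside the parentheses join into one message.
    message = ("trying to construct command for task with missing input value "
               f"(id: {input_.id})")
    print(message)
# Prints: trying to construct command for task with missing input value (id: fname)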
12 changes: 6 additions & 6 deletions beeflow/enhanced_client/package-lock.json

Some generated files are not rendered by default.

46 changes: 32 additions & 14 deletions beeflow/wf_manager/resources/wf_update.py
@@ -21,15 +21,16 @@
 db_path = wf_utils.get_db_path()


-def archive_workflow(db, wf_id):
+def archive_workflow(db, wf_id, final_state=None):
     """Archive a workflow after completion."""
     # Archive Config
     workflow_dir = wf_utils.get_workflow_dir(wf_id)
     shutil.copyfile(os.path.expanduser("~") + '/.config/beeflow/bee.conf',
                     workflow_dir + '/' + 'bee.conf')

-    db.workflows.update_workflow_state(wf_id, 'Archived')
-    wf_utils.update_wf_status(wf_id, 'Archived')
+    wf_state = f'Archived/{final_state}' if final_state is not None else 'Archived'
+    db.workflows.update_workflow_state(wf_id, wf_state)
+    wf_utils.update_wf_status(wf_id, wf_state)

     bee_workdir = wf_utils.get_bee_workdir()
     archive_dir = os.path.join(bee_workdir, 'archives')
@@ -40,6 +41,18 @@ def archive_workflow(db, wf_id):
     subprocess.call(['tar', '-czf', archive_path, wf_id], cwd=workflows_dir)


+def set_dependent_tasks_dep_fail(db, wfi, wf_id, task):
+    """Recursively set all dependent task states of this task to DEP_FAIL."""
+    # List of tasks whose states have already been updated
+    set_tasks = [task]
+    while len(set_tasks) > 0:
+        dep_tasks = wfi.get_dependent_tasks(set_tasks.pop())
+        for dep_task in dep_tasks:
+            wfi.set_task_state(dep_task, 'DEP_FAIL')
+            db.workflows.update_task_state(dep_task.id, wf_id, 'DEP_FAIL')
+        set_tasks.extend(dep_tasks)
+
+
 class WFUpdate(Resource):
     """Class to interact with an existing workflow."""

@@ -109,13 +122,14 @@ def put(self):
             wf_utils.schedule_submit_tasks(wf_id, tasks)
             return make_response(jsonify(status='Task {task_id} restarted'))

-        if job_state in ('COMPLETED', 'FAILED'):
+        if job_state == 'COMPLETED':
             for output in task.outputs:
                 if output.glob is not None:
                     wfi.set_task_output(task, output.id, output.glob)
                 else:
                     wfi.set_task_output(task, output.id, "temp")
             tasks = wfi.finalize_task(task)
+            log.info(f'next tasks to run: {tasks}')
             wf_state = wfi.get_workflow_state()
             if tasks and wf_state != 'PAUSED':
                 wf_utils.schedule_submit_tasks(wf_id, tasks)
@@ -126,19 +140,23 @@ def put(self):
             archive_workflow(db, wf_id)
             pid = db.workflows.get_gdb_pid(wf_id)
             dep_manager.kill_gdb(pid)
-        if wf_state == 'FAILED':
-            log.info("Workflow failed")
-            log.info("Shutting down GDB")
-            wf_id = wfi.workflow_id
-            archive_workflow(db, wf_id)
-            pid = db.workflows.get_gdb_pid(wf_id)
-            dep_manager.kill_gdb(pid)

+        # If the job failed and it doesn't include a checkpoint-restart hint,
+        # then fail the entire workflow
+        if job_state == 'FAILED':
+            set_dependent_tasks_dep_fail(db, wfi, wf_id, task)
+            log.info("Workflow failed")
+            log.info("Shutting down GDB")
+            wf_id = wfi.workflow_id
+            archive_workflow(db, wf_id, final_state='Failed')
+            pid = db.workflows.get_gdb_pid(wf_id)
+            dep_manager.kill_gdb(pid)

         if job_state == 'BUILD_FAIL':
             log.error(f'Workflow failed due to failed container build for task {task.name}')
             wfi.set_workflow_state('Failed')
             wf_utils.update_wf_status(wf_id, 'Failed')
-            db.workflows.update_workflow_state(wf_id, 'Failed')
+            archive_workflow(db, wf_id, final_state='Failed')
             pid = db.workflows.get_gdb_pid(wf_id)
             dep_manager.kill_gdb(pid)

         resp = make_response(jsonify(status=(f'Task {task_id} belonging to WF {wf_id} set to'
                                              f'{job_state}')), 200)
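To make the new failure-propagation change concrete, here is a self-contained sketch of the worklist technique that set_dependent_tasks_dep_fail applies; the propagate_dep_fail name, the dict-based dependency graph, and the task names are hypothetical stand-ins for beeflow's workflow interface and database.

def propagate_dep_fail(dependents, failed_task):
    """Mark every task downstream of failed_task as DEP_FAIL."""
    states = {}
    worklist = [failed_task]          # tasks whose dependents still need updating
    while worklist:
        for dep in dependents.get(worklist.pop(), []):
            states[dep] = 'DEP_FAIL'  # this task can never run
            worklist.append(dep)      # and neither can anything downstream of it
    return states

# Toy chain mirroring the failure-dependent-tasks CI workflow added above.
graph = {'fail': ['dependent0'], 'dependent0': ['dependent1'], 'dependent1': ['dependent2']}
print(propagate_dep_fail(graph, 'fail'))
# {'dependent0': 'DEP_FAIL', 'dependent1': 'DEP_FAIL', 'dependent2': 'DEP_FAIL'}

Like the committed helper, this walks the graph iteratively rather than recursively; in a DAG where two branches share a dependent, a task may be visited more than once, which is harmless here because the state written is always DEP_FAIL.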
10 changes: 10 additions & 0 deletions ci/docs.sh
@@ -0,0 +1,10 @@
+#!/bin/sh
+# Install BEE and build the docs in CI.
+sudo apt-get update
+sudo apt-get install python3 python3-venv curl build-essential \
+    zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libsqlite3-dev \
+    libreadline-dev libffi-dev libbz2-dev libyaml-dev
+curl -sSL https://install.python-poetry.org | python3 -
+poetry update
+poetry install
+poetry run make -C docs/sphinx html
2 changes: 2 additions & 0 deletions ci/test_workflows/failure-dependent-tasks/input.yml
@@ -0,0 +1,2 @@
+fname: some_file_that_doesnt_exist
+cat_argument: -n