Commit

adding first shot at tests with minikube (#13)
* adding first shot at tests with minikube
Signed-off-by: vsoch <[email protected]>
vsoch authored Jan 14, 2023
1 parent 5040f49 commit 116774f
Showing 16 changed files with 415 additions and 18 deletions.
39 changes: 38 additions & 1 deletion .github/workflows/main.yml
@@ -5,7 +5,6 @@ on:

jobs:
formatting:
name: Formatting
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
@@ -24,3 +23,41 @@ jobs:
source activate black
pip install -r .github/dev-requirements.txt
pre-commit run --all-files
test-runs:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
test: ["lammps"]

steps:
- name: Clone the code
uses: actions/checkout@v3

- name: Setup Go
uses: actions/setup-go@v3
with:
go-version: ^1.18

- name: Install flux-cloud
run: |
conda create --quiet --name fc jinja2
export PATH="/usr/share/miniconda/bin:$PATH"
source activate fc
pip install .[all]
- name: Start minikube
uses: medyagh/setup-minikube@697f2b7aaed5f70bf2a94ee21a4ec3dde7b12f92 # v0.0.9

- name: Test ${{ matrix.test }}
env:
name: ${{ matrix.test }}
run: |
export PATH="/usr/share/miniconda/bin:$PATH"
source activate fc
export SHELL=/bin/bash
eval $(minikube -p minikube docker-env)
# We need to delete the minikube cluster to bring it up again
minikube delete
/bin/bash ./tests/test.sh ${name}
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -1,4 +1,4 @@
-exclude: ".all-contributorsrc"
+exclude: ".all-contributorsrc|tests"
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.3.0
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -14,6 +14,7 @@ and **Merged pull requests**. Critical items to know are:
The versions coincide with releases on pip. Only major versions will be released as tags on Github.

## [0.0.x](https://github.com/converged-computing/flux-cloud/tree/main) (0.0.x)
- support for adding a job size to a job (to only run on that minicluster size) (0.1.1)
- bug with config edit, and adding support for setting availability zones (0.1.0)
- refactor of experiment design to handle separate minicluster size
- add support for running experiments with local (MiniKube)
52 changes: 51 additions & 1 deletion fluxcloud/main/client.py
@@ -45,6 +45,20 @@ def run_timed(self, name, cmd, cleanup_func=None):
if res.returncode != 0:
raise ValueError("nonzero exit code, exiting.")

def run_command(self, cmd, cleanup_func=None):
"""
Run a command (untimed), and handle nonzero exit codes.
"""
logger.debug("\nRunning Command: " + " ".join(cmd))
res = utils.run_command(cmd)

# An optional cleanup function (also can run if not successful)
if cleanup_func is not None:
cleanup_func()

if res.returncode != 0:
raise ValueError("nonzero exit code, exiting.")

def __str__(self):
return "[flux-cloud-client]"
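
The `run_command` method added in this hunk follows the same pattern as `run_timed` but without recording a time. Below is a minimal standalone sketch of that pattern. It assumes `subprocess.run` as a stand-in for `fluxcloud.utils.run_command`, whose implementation is not part of this diff, and it returns the result object purely for illustration (the method in the diff returns nothing).

```python
import subprocess
import sys


def run_command(cmd, cleanup_func=None):
    """Run cmd, call the optional cleanup, then fail on a nonzero exit."""
    # Assumption: subprocess.run stands in for fluxcloud.utils.run_command.
    res = subprocess.run(cmd)

    # The cleanup runs whether or not the command succeeded.
    if cleanup_func is not None:
        cleanup_func()

    if res.returncode != 0:
        raise ValueError("nonzero exit code, exiting.")

    # Returning the result is an addition for illustration only.
    return res


# Usage: a command that exits with status 3 still triggers the cleanup.
cleaned = []
failure = None
try:
    run_command(
        [sys.executable, "-c", "raise SystemExit(3)"],
        cleanup_func=lambda: cleaned.append(True),
    )
except ValueError as err:
    failure = str(err)
```

Note that, as in the diff, the cleanup is skipped if the command cannot be started at all (for example, a missing executable raises before the cleanup call). Wrapping the call in `try`/`finally` would be one way to cover that case.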

@@ -91,7 +105,14 @@ def experiment_is_run(self, setup, experiment):
continue

# Jobname is used for output
-for jobname, _ in jobs.items():
+for jobname, job in jobs.items():

# Do we want to run this job for this size and machine?
if not self.check_job_run(job, size, experiment):
logger.debug(
f"Skipping job {jobname} as it does not match inclusion criteria."
)
continue

# Add the size
jobname = f"{jobname}-minicluster-size-{size}"
@@ -131,6 +152,28 @@ def down(self, *args, **kwargs):
"""
raise NotImplementedError

def check_job_run(self, job, size, experiment):
"""
Determine if a job should run for a given MiniCluster size and machine.
"""
if "sizes" in job and size not in job["sizes"]:
return False
if "size" in job and job["size"] != size:
return False
if (
"machine" in job
and "machine" in experiment
and job["machine"] != experiment["machine"]
):
return False
if (
"machines" in job
and "machine" in experiment
and experiment["machine"] not in job["machines"]
):
return False
return True

@save_meta
def apply(self, setup, experiment):
"""
@@ -170,6 +213,13 @@ def apply(self, setup, experiment):
# Jobname is used for output
for jobname, job in jobs.items():

# Do we want to run this job for this size and machine?
if not self.check_job_run(job, size, experiment):
logger.debug(
f"Skipping job {jobname} as it does not match inclusion criteria."
)
continue

# Add the size
jobname = f"{jobname}-minicluster-size-{size}"
job_output = os.path.join(experiment_dir, jobname)
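
To make the inclusion rules in `check_job_run` concrete, here is a standalone sketch that reproduces the checks from this diff as a plain function and applies them to a few jobs. The job names, commands, and machine type are hypothetical values chosen for illustration; only the keys `size`, `sizes`, `machine`, and `machines` come from the diff.

```python
def check_job_run(job, size, experiment):
    """Return True if the job should run for this MiniCluster size and machine."""
    # Size filters: a list of allowed sizes, or one exact size.
    if "sizes" in job and size not in job["sizes"]:
        return False
    if "size" in job and job["size"] != size:
        return False

    # Machine filters only apply when the experiment declares a machine.
    if "machine" in experiment:
        if "machine" in job and job["machine"] != experiment["machine"]:
            return False
        if "machines" in job and experiment["machine"] not in job["machines"]:
            return False
    return True


# Hypothetical jobs and experiment, for illustration only.
jobs = {
    "small": {"command": "echo small", "size": 2},
    "multi": {"command": "echo multi", "sizes": [2, 4], "machines": ["machine-a"]},
    "any": {"command": "echo any"},
}
experiment = {"machine": "machine-a"}

# Which jobs would run at each MiniCluster size?
selected = {
    size: [name for name, job in jobs.items() if check_job_run(job, size, experiment)]
    for size in (2, 4, 8)
}
```

A job with none of the four keys runs at every size on every machine, so existing experiment files keep their previous behavior.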
15 changes: 9 additions & 6 deletions fluxcloud/main/clouds/local/minikube.py
@@ -3,6 +3,7 @@
#
# SPDX-License-Identifier: Apache-2.0

import fluxcloud.utils as utils
from fluxcloud.logger import logger
from fluxcloud.main.client import ExperimentClient
from fluxcloud.main.decorator import save_meta
@@ -45,14 +46,16 @@ def pre_apply(self, experiment, jobname, job):
logger.warning('"image" not found in job, cannot pre-pull for MiniKube')
return

cmd = ["minikube", "ssh", "docker", "pull", job["image"]]
pull_id = "pull-minikube-image-" + job["image"].replace("/", "-")
# Does minikube already have the image pulled?
existing = utils.run_capture(["minikube", "image", "ls"], True)
if job["image"] in existing["message"]:
return

# cmd = ["minikube", "ssh", "docker", "pull", job["image"]]
cmd = ["minikube", "image", "load", job["image"]]

# Don't pull again if we've done it once
if pull_id in self.times:
logger.warning("Image already marked as pulled in run metadata.")
return
return self.run_timed(pull_id, cmd)
return self.run_command(cmd)

@save_meta
def down(self, setup, experiment=None):
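
The revised `pre_apply` first asks MiniKube which images it already has and only loads the image when it is missing. A minimal sketch of that flow follows. It assumes `subprocess.run` in place of the project's `utils.run_capture` and `run_command` helpers, and the `run` parameter is a hypothetical injection point added here so the logic can be exercised without a MiniKube install.

```python
import subprocess


def pre_pull(image, run=subprocess.run):
    """Load image into MiniKube unless it is already listed. Return True if loaded."""
    # Ask MiniKube for the images it already holds.
    listed = run(["minikube", "image", "ls"], capture_output=True, text=True)

    # Same substring test as the diff: skip the load if the name appears.
    if image in listed.stdout:
        return False

    loaded = run(["minikube", "image", "load", image])
    if loaded.returncode != 0:
        raise ValueError("nonzero exit code, exiting.")
    return True


# Usage with a fake runner (hypothetical image names), so nothing is executed.
calls = []


def fake_run(cmd, **kwargs):
    calls.append(cmd)
    return subprocess.CompletedProcess(
        cmd, 0, stdout="ghcr.io/example/app:latest\n", stderr=""
    )


already_present = pre_pull("ghcr.io/example/app:latest", run=fake_run)
newly_loaded = pre_pull("ghcr.io/example/other:1", run=fake_run)
```

The substring test is kept as in the diff. It can report a false match when one image name is a prefix of another, so comparing against whole lines of the listing may be a safer variant.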
17 changes: 10 additions & 7 deletions fluxcloud/main/clouds/local/scripts/cluster-create-minikube
@@ -12,6 +12,14 @@ ROOT=$(dirname $(dirname ${HERE}))
# Source shared helper scripts
. $ROOT/shared/scripts/helpers.sh

function install_operator() {
tmpfile=$(mktemp /tmp/flux-operator.XXXXXX.yaml)
rm -rf $tmpfile
run_echo wget -O $tmpfile https://raw.githubusercontent.com/${REPOSITORY}/${BRANCH}/examples/dist/flux-operator.yaml
kubectl apply -f $tmpfile
rm -rf $tmpfile
}

# Defaults
CLUSTER_NAME="flux-cluster"
CLUSTER_VERSION="1.23"
@@ -86,6 +94,7 @@ minikube status
retval=$?
if [[ "${retval}" == "0" ]]; then
print_blue "A MiniKube cluster already exists."
install_operator
echo
exit 0
fi
@@ -96,16 +105,10 @@ fi

# Create the cluster
run_echo minikube start --nodes=${SIZE}
install_operator

# Show nodes
run_echo kubectl get nodes

# Deploy the operator TODO should be variables here
tmpfile=$(mktemp /tmp/flux-operator.XXXXXX.yaml)
rm -rf $tmpfile
run_echo wget -O $tmpfile https://raw.githubusercontent.com/${REPOSITORY}/${BRANCH}/examples/dist/flux-operator.yaml
kubectl apply -f $tmpfile
rm -rf $tmpfile

run_echo kubectl get namespace
run_echo kubectl describe namespace operator-system
4 changes: 4 additions & 0 deletions fluxcloud/main/schemas.py
@@ -26,6 +26,10 @@
"repeats": {"type": "number"},
"workdir": {"type": "string"},
"image": {"type": "string"},
"machine": {"type": "string"},
"machines": {"type": "array", "items": {"type": "string"}},
"size": {"type": "number"},
"sizes": {"type": "array", "items": {"type": "number"}},
},
"required": ["command"],
}
1 change: 1 addition & 0 deletions fluxcloud/utils/__init__.py
@@ -25,5 +25,6 @@
confirm_uninstall,
ensure_no_extra,
get_installdir,
run_capture,
run_command,
)
2 changes: 1 addition & 1 deletion fluxcloud/utils/terminal.py
@@ -34,7 +34,7 @@ def check_install(software, quiet=True, command="--version"):
"""
cmd = [software, command]
try:
-version = run_command(cmd, software)
+version = run_command(cmd)
except Exception:
return False
if version:
2 changes: 1 addition & 1 deletion fluxcloud/version.py
@@ -1,7 +1,7 @@
# Copyright 2022-2023 Lawrence Livermore National Security, LLC
# SPDX-License-Identifier: Apache-2.0

-__version__ = "0.1.0"
+__version__ = "0.1.1"
AUTHOR = "Vanessa Sochat"
EMAIL = "[email protected]"
NAME = "flux-cloud"
@@ -0,0 +1,80 @@
LAMMPS (29 Sep 2021 - Update 2)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
using 1 OpenMP thread(s) per MPI task
Reading data file ...
triclinic box = (0.0000000 0.0000000 0.0000000) to (22.326000 11.141200 13.778966) with tilt (0.0000000 -5.0260300 0.0000000)
1 by 1 by 1 MPI processor grid
reading atoms ...
304 atoms
reading velocities ...
304 velocities
read_data CPU = 0.003 seconds
Replicating atoms ...
triclinic box = (0.0000000 0.0000000 0.0000000) to (44.652000 22.282400 27.557932) with tilt (0.0000000 -10.052060 0.0000000)
1 by 1 by 1 MPI processor grid
bounding box image = (0 -1 -1) to (0 1 1)
bounding box extra memory = 0.03 MB
average # of replicas added to proc = 8.00 out of 8 (100.00%)
2432 atoms
replicate CPU = 0.000 seconds
Neighbor list info ...
update every 20 steps, delay 0 steps, check no
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 11
ghost atom cutoff = 11
binsize = 5.5, bins = 10 5 6
2 neighbor lists, perpetual/occasional/extra = 2 0 0
(1) pair reax/c, perpetual
attributes: half, newton off, ghost
pair build: half/bin/newtoff/ghost
stencil: full/ghost/bin/3d
bin: standard
(2) fix qeq/reax, perpetual, copy from (1)
attributes: half, newton off, ghost
pair build: copy
stencil: none
bin: none
Setting up Verlet run ...
Unit style : real
Current step : 0
Time step : 0.1
Per MPI rank memory allocation (min/avg/max) = 215.0 | 215.0 | 215.0 Mbytes
Step Temp PotEng Press E_vdwl E_coul Volume
0 300 -113.27833 437.52122 -111.57687 -1.7014647 27418.867
10 299.38517 -113.27631 1439.2857 -111.57492 -1.7013813 27418.867
20 300.27107 -113.27884 3764.3739 -111.57762 -1.7012246 27418.867
30 302.21063 -113.28428 7007.6914 -111.58335 -1.7009363 27418.867
40 303.52265 -113.28799 9844.84 -111.58747 -1.7005186 27418.867
50 301.87059 -113.28324 9663.0443 -111.58318 -1.7000524 27418.867
60 296.67807 -113.26777 7273.7928 -111.56815 -1.6996137 27418.867
70 292.19999 -113.25435 5533.6428 -111.55514 -1.6992157 27418.867
80 293.58677 -113.25831 5993.4151 -111.55946 -1.6988533 27418.867
90 300.62636 -113.27925 7202.8651 -111.58069 -1.6985591 27418.867
100 305.38276 -113.29357 10085.748 -111.59518 -1.6983875 27418.867
Loop time of 29.9065 on 1 procs for 100 steps with 2432 atoms

Performance: 0.029 ns/day, 830.737 hours/ns, 3.344 timesteps/s
99.9% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 22.01 | 22.01 | 22.01 | 0.0 | 73.60
Neigh | 0.58707 | 0.58707 | 0.58707 | 0.0 | 1.96
Comm | 0.0092962 | 0.0092962 | 0.0092962 | 0.0 | 0.03
Output | 0.00033026 | 0.00033026 | 0.00033026 | 0.0 | 0.00
Modify | 7.2985 | 7.2985 | 7.2985 | 0.0 | 24.40
Other | | 0.001511 | | | 0.01

Nlocal: 2432.00 ave 2432 max 2432 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 10685.0 ave 10685 max 10685 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 823958.0 ave 823958 max 823958 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 823958
Ave neighs/atom = 338.79852
Neighbor list builds = 5
Dangerous builds not checked
Total wall time: 0:00:30