Initial commit.

geerlingguy · Nov 9, 2022 · 67b04a9
commit 67b04a9

Showing 13 changed files with 634 additions and 0 deletions.
diff --git a/.github/FUNDING.yml b/.github/FUNDING.yml
@@ -0,0 +1,4 @@
+# These are supported funding model platforms
+---
+github: geerlingguy
+patreon: geerlingguy
diff --git a/.github/stale.yml b/.github/stale.yml
@@ -0,0 +1,56 @@
+# Configuration for probot-stale - https://github.com/probot/stale
+
+# Number of days of inactivity before an Issue or Pull Request becomes stale
+daysUntilStale: 90
+
+# Number of days of inactivity before an Issue or Pull Request with the stale label is closed.
+# Set to false to disable. If disabled, issues still need to be closed manually, but will remain marked as stale.
+daysUntilClose: 30
+
+# Only issues or pull requests with all of these labels are checked for staleness. Defaults to `[]` (disabled)
+onlyLabels: []
+
+# Issues or Pull Requests with these labels will never be considered stale. Set to `[]` to disable
+exemptLabels:
+  - pinned
+  - security
+  - planned
+
+# Set to true to ignore issues in a project (defaults to false)
+exemptProjects: false
+
+# Set to true to ignore issues in a milestone (defaults to false)
+exemptMilestones: false
+
+# Set to true to ignore issues with an assignee (defaults to false)
+exemptAssignees: false
+
+# Label to use when marking as stale
+staleLabel: stale
+
+# Limit the number of actions per hour, from 1-30. Default is 30
+limitPerRun: 30
+
+pulls:
+  markComment: |-
+    This pull request has been marked 'stale' due to lack of recent activity. If there is no further activity, the PR will be closed in another 30 days. Thank you for your contribution!
+
+    Please read [this blog post](https://www.jeffgeerling.com/blog/2020/enabling-stale-issue-bot-on-my-github-repositories) to see the reasons why I mark pull requests as stale.
+
+  unmarkComment: >-
+    This pull request is no longer marked for closure.
+
+  closeComment: >-
+    This pull request has been closed due to inactivity. If you feel this is in error, please reopen the pull request or file a new PR with the relevant details.
+
+issues:
+  markComment: |-
+    This issue has been marked 'stale' due to lack of recent activity. If there is no further activity, the issue will be closed in another 30 days. Thank you for your contribution!
+
+    Please read [this blog post](https://www.jeffgeerling.com/blog/2020/enabling-stale-issue-bot-on-my-github-repositories) to see the reasons why I mark issues as stale.
+
+  unmarkComment: >-
+    This issue is no longer marked for closure.
+
+  closeComment: >-
+    This issue has been closed due to inactivity. If you feel this is in error, please reopen the issue or file a new issue with the relevant details.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -0,0 +1,31 @@
+---
+name: CI
+'on':
+  pull_request:
+  push:
+    branches:
+      - master
+
+jobs:
+
+  lint:
+    name: Lint
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Check out the codebase.
+        uses: actions/checkout@v2
+
+      - name: Set up Python 3.
+        uses: actions/setup-python@v2
+        with:
+          python-version: '3.x'
+
+      - name: Install test dependencies.
+        run: pip3 install yamllint ansible
+
+      - name: Lint all the YAMLs.
+        run: yamllint .
+
+      - name: Run the HPL benchmark playbook without networking.
+        run: ansible-playbook main.yml --tags "setup,benchmark"
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1 @@
+hosts.ini
diff --git a/.yamllint b/.yamllint
@@ -0,0 +1,11 @@
+---
+extends: default
+rules:
+  line-length:
+    max: 140
+    level: warning
+  truthy: false
+
+ignore: |
+  **/.github/workflows/ci.yml
+  **/stale.yml
diff --git a/README.md b/README.md
@@ -0,0 +1,70 @@
+# Top500 Benchmark - HPL Linpack
+
+[![CI](https://github.com/geerlingguy/top500-benchmark/workflows/CI/badge.svg?branch=master&event=push)](https://github.com/geerlingguy/top500-benchmark/actions?query=workflow%3ACI)
+
+A common generic benchmark for clusters (or extremely powerful single node workstations) is Linpack, or HPL (High Performance Linpack), which is famous for its use in rankings in the [Top500 supercomputer list](https://top500.org) over the past few decades.
+
+I wanted to see where my various clusters and workstations would rank, historically ([you can compare to past lists here](https://hpl-calculator.sourceforge.net/hpl-calculations.php)), so I built this Ansible playbook which installs all the necessary tooling for HPL to run, connects all the nodes together via SSH, then runs the benchmark and outputs the result.
+
+## Why not PTS?
+
+Phoronix Test Suite includes [HPL Linpack](https://openbenchmarking.org/test/pts/hpl) and [HPCC](https://openbenchmarking.org/test/pts/hpcc) test suites. I may see how they compare in the future.
+
+When I initially started down this journey, the PTS versions didn't play nicely with the Pi, especially when clustered. And the PTS versions don't seem to support clustered usage at all!
+
+## Benchmarking - Cluster
+
+Make sure you have Ansible installed (`pip3 install ansible`), then set up a `hosts.ini` file in this directory based on the `example.hosts.ini` file.
+
+Each host should be reachable via SSH using the username set in `ansible_user`. Other Ansible options can be set under `[cluster:vars]` to connect in more exotic clustering scenarios (e.g. via bastion/jump-host).
+
+Tweak any settings inside `config.yml` as desired (the most important being `hpl_root`—this is where the compiled MPI, ATLAS, and HPL benchmarking code will live).
+
+Then run the benchmarking playbook inside this directory:
+
+```
+ansible-playbook main.yml
+```
+
+This will run three separate plays:
+
+  1. Setup: downloads and compiles all the code required to run HPL. (This play takes a long time—up to many hours on a slower Raspberry Pi!)
+  2. SSH: configures the nodes to be able to communicate with each other.
+  3. Benchmark: creates an `HPL.dat` file and runs the benchmark, outputting the results in your console.
+
+After the entire playbook is complete, you can also log directly into any of the nodes (though I generally do things on node 1), and run the following commands to kick off a benchmarking run:
+
+```
+cd ~/tmp/hpl-2.3/bin/rpi
+mpirun -f cluster-hosts ./xhpl
+```
+
+> The configuration here was tested on smaller 1, 4, and 6-node clusters with 6-64 GB of RAM. Some settings in the `config.yml` file that affect the generated `HPL.dat` file may need different tuning for different cluster layouts!
+
+### Benchmarking a Single Node
+
+To run locally on a single node, clone or download this repository to the node where you want to run HPL. Make sure the `hosts.ini` is set up with the default options (with just one node, `127.0.0.1`).
+
+Then, run the following command so the cluster networking portion of the playbook is not run:
+
+```
+ansible-playbook main.yml --tags "setup,benchmark"
+```
+
+> For testing, you can start an Ubuntu Docker container:
+> 
+> ```
+> docker run -it --rm -v $PWD:/code geerlingguy/docker-ubuntu2204-ansible:latest bash
+> ```
+>
+> Then go into the code directory (`cd /code`) and run the playbook using the command above.
+
+## Results
+
+In my testing on Raspberry Pi OS Bullseye, in November 2021, I got the following results:
+
+| Benchmark | Configuration | Result | Wattage | Gflops/W |
+| --- | --- | --- | --- | --- |
+| HPL (1.5 GHz default clock) | Turing Pi 2 (4x CM4) | 44.942 Gflops | 24.5W | 1.83 Gflops/W |
+| HPL (2.0 GHz overclock) | Turing Pi 2 (4x CM4) | 51.327 Gflops | 33W | 1.56 Gflops/W |
+| HPL (1.5 GHz default clock) | DeskPi Super6c (6x CM4) | TODO Gflops | TODOW | TODO Gflops/W |
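As a cross-check on the Gflops/W column in the README results table, the following is a minimal Python sketch of the efficiency arithmetic (measured Gflops divided by measured wattage). It is not part of the commit; the function name `gflops_per_watt` is hypothetical, and rounding to two decimal places is an assumption based on how the table is printed.

```python
def gflops_per_watt(gflops: float, watts: float) -> float:
    """Return benchmark efficiency in Gflops per watt, rounded to two decimals."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return round(gflops / watts, 2)


# First row of the README table: Turing Pi 2 at the default 1.5 GHz clock.
default_clock = gflops_per_watt(44.942, 24.5)
print(default_clock)
```

The same helper can be used to fill in the DeskPi Super6c row once its Gflops and wattage figures have been measured.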
diff --git a/ansible.cfg b/ansible.cfg
@@ -0,0 +1,5 @@
+[defaults]
+nocows = true
+inventory = hosts.ini
+interpreter_python = /usr/bin/python3
+stdout_callback = yaml
diff --git a/config.yml b/config.yml
@@ -0,0 +1,15 @@
+---
+# Working directory where HPL and associated applications will be compiled.
+hpl_root: /home/pi
+
+# HPL.dat configuration options.
+# See: https://www.advancedclustering.com/act_kb/tune-hpl-dat-file/
+# See also: https://hpl-calculator.sourceforge.net/HPL-HowTo.pdf
+nodecount: "{{ ansible_play_hosts | length | int }}"
+ram_in_gb: "{{ ( ansible_memtotal_mb / 1024 * 0.75 ) | int | abs }}"
+hpl_dat_opts:
+  # sqrt((Memory in GB * 1024 * 1024 * 1024 * Node count) / 8) * 0.9
+  Ns: "{{ (((((ram_in_gb | int) * 1024 * 1024 * 1024 * (nodecount|int)) / 8) | root) * 0.90) | int }}"
+  NBs: 192
+  Ps: 2
+  Qs: 2
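To make the `Ns` expression in `config.yml` easier to check by hand, here is a minimal Python sketch of the same arithmetic. It is not part of the commit: the function name `hpl_problem_size` and its parameter names are hypothetical, and it assumes every node reports the same `ansible_memtotal_mb` value. The division by 8 reflects the 8 bytes occupied by each double-precision matrix element.

```python
import math


def hpl_problem_size(memtotal_mb: float, node_count: int,
                     ram_fraction: float = 0.75, fill: float = 0.90) -> int:
    """Approximate the Ns value that config.yml derives with Jinja2 filters."""
    # Mirrors `ram_in_gb`: 75% of the node's RAM, truncated to whole GB.
    ram_in_gb = abs(int(memtotal_mb / 1024 * ram_fraction))
    # Total bytes available for the matrix across all nodes.
    total_bytes = ram_in_gb * 1024 ** 3 * node_count
    # An N x N matrix of 8-byte doubles fits when N = sqrt(bytes / 8);
    # the result is then scaled by 0.90, as in the config.yml expression.
    return int(math.sqrt(total_bytes / 8) * fill)


# Hypothetical example: four nodes, each reporting exactly 8192 MB.
print(hpl_problem_size(8192, 4))  # prints 51080
```

Real nodes usually report somewhat less than their nominal RAM, so the truncation to whole GB in `ram_in_gb` can lower the result noticeably compared with this idealized input.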
diff --git a/example.hosts.ini b/example.hosts.ini
@@ -0,0 +1,14 @@
+# For single node benchmarking (default), use this:
+[cluster]
+127.0.0.1 ansible_connection=local
+
+# For cluster benchmarking, delete everything above this line and uncomment:
+# [cluster]
+# node-01.local
+# node-02.local
+# node-03.local
+# node-04.local
+# node-05.local
+#
+# [cluster:vars]
+# ansible_user=username
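For larger clusters, typing the inventory by hand gets tedious. The sketch below renders a `hosts.ini` in the same layout as the commented cluster section of `example.hosts.ini`. It is an illustration rather than part of the commit: the function name `render_inventory` and the sample hostnames are hypothetical.

```python
def render_inventory(hosts: list[str], ansible_user: str) -> str:
    """Build the text of a hosts.ini with a [cluster] group and its vars."""
    if not hosts:
        raise ValueError("at least one host is required")
    lines = [
        "[cluster]",
        *hosts,
        "",
        "[cluster:vars]",
        f"ansible_user={ansible_user}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical four-node cluster using the naming scheme from the example file.
nodes = [f"node-{i:02d}.local" for i in range(1, 5)]
print(render_inventory(nodes, "username"), end="")
```

Writing the returned string to `hosts.ini` in the repository directory gives the playbook its inventory, since `ansible.cfg` points `inventory` at that file.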