This program runs BSP (bulk synchronous parallel) intervals with configurable workloads and collects data on their lengths, both on individual nodes and collectively, for modeling and analyzing the performance of HPC systems and system software. It generates data in JSON format for analysis by, for example, Python programs and IPython notebooks. This code is derived from code originally written by Oscar Mondragon in 2016 at the University of New Mexico. It was heavily modified in 2018 and 2019 by Patrick Bridges, Keira Haskins, Sahba Tashakkori, and Christopher Leap of UNM and Patrick Widener of Sandia National Labs to run more flexible and configurable workloads, to measure time on individual nodes more accurately, and to write more informative output in JSON format, by default only at the root node of the computation. These changes are intended to make the overall program a more flexible, scalable, and powerful system analysis tool.
The code relies on several external libraries to compile:
- MPI - This program relies extensively on MPI for basic communication and system measurement. It is also possible to recast the basic approach into other BSP-like formulations, which is potentially useful for evaluating the scalability of those programming systems. Using MPI directly, however, gives fine-grained, low-level control over exactly what communication and data movement is done and when, which is important for evaluating the underlying system accurately.
- SPRNG - the Scalable Parallel Random Number Generators library from Florida State University, for accurate generation of parallel random numbers. Because almost all analysis of the generated data is based on statistics with strong independence assumptions, we are careful with random numbers to make sure the random number generator doesn't introduce any bias into our results.
- GSL - the GNU Scientific Library, used to handle and sample distributions based on the random numbers generated by SPRNG.
- gsl-spring.h - a simple header file that ties SPRNG version 5 into GSL so we can use GSL as our uniform interface for generating random numbers from a variety of distributions.
- BLAS - The matrix multiply workload is generated by the system linear algebra libraries, which we rely on being appropriately tuned to the system under test
- STREAM - Jeff Hammond's stream benchmark used for a pure memory system workload, converted into a simple library
- FBENCH - John Walker's trigonometry benchmark, used to exercise pure floating-point performance. It is compiled against the (optimized) system math libraries and, like STREAM, converted into a simple library.
- XXX - Some I/O workload.