# HPC Applications and Benchmarks

## Benchmark Suites

- Transaction Processing Performance Council (TPC)
- OSU Micro-Benchmarks (OMB)
- Intel MPI Benchmarks (IMB)
- DOE mini-applications
- CORAL Benchmark Suite
- ECP Suite
- LLNL Proxy Apps (Lawrence Livermore National Laboratory)
- NCCL/RCCL Tests
- linux-rdma perftest

## Generic HPC Benchmarks

- HPL - High Performance Linpack
  - Factors and solves a large dense system of linear equations
  - Dominant calculation is matrix-matrix multiplication (mostly done on GPUs today)

- HPCG - High Performance Conjugate Gradient
  - Complements HPL and targets a broader set of HPC applications governed by differential equations, which tend to have much stronger needs for high bandwidth and low latency
  - Tends to access data using irregular patterns
  - Iterative and makes heavy use of neighborhood collectives

- HPCC
  - Consists of 7 tests (HPL is one of them)
  - Each test focuses on a different aspect, e.g. floating point, memory access, communication, etc.
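The iterative pattern that HPCG stresses can be sketched with a plain conjugate gradient solver. This is a toy under stated assumptions, not HPCG itself: the matrix is a 1D Poisson operator with stencil `[-1, 2, -1]` applied matrix-free, there is no preconditioner, and the function names (`apply_laplacian_1d`, `conjugate_gradient`) are hypothetical. In a distributed run, the stencil application would need values from neighboring ranks and each dot product would become a global reduction.

```python
def apply_laplacian_1d(x):
    """Matrix-free product y = A x for the tridiagonal stencil [-1, 2, -1]."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0        # neighbor value (halo data when distributed)
        right = x[i + 1] if i < n - 1 else 0.0
        y[i] = 2.0 * x[i] - left - right
    return y


def dot(u, v):
    """Dot product; a global reduction in a distributed setting."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive definite operator."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x with x = 0
    p = list(r)                 # initial search direction
    rs_old = dot(r, r)
    if rs_old == 0.0:
        return x, 0
    for iteration in range(1, max_iter + 1):
        ap = apply_a(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 < tol:
            return x, iteration
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


if __name__ == "__main__":
    x_true = [1.0] * 50
    rhs = apply_laplacian_1d(x_true)
    solution, iterations = conjugate_gradient(apply_laplacian_1d, rhs)
    print("iterations:", iterations)
    print("max error:", max(abs(s - t) for s, t in zip(solution, x_true)))
```

Each iteration performs one sparse operator application and a few vector updates, so the work per byte moved is low. This is consistent with the bandwidth and latency sensitivity noted for HPCG.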

## Some HPC Application-Level Benchmarks

- WRF - Weather Research and Forecasting
  - Numerical weather prediction system

- GROMACS
  - Molecular dynamics
  - Primarily designed for biochemical molecules such as proteins, lipids, and nucleic acids
  - Differential equations, linear algebra, 3D stencil, 3D FFT
  - Uses OpenMP

- LAMMPS - Large-scale Atomic/Molecular Massively Parallel Simulator
  - Molecular dynamics
  - Focuses on materials modeling, solid state, and soft matter
  - Conjugate gradient, DFT
  - Multiple benchmarks (Lennard-Jones, polymer chain, EAM, etc.)

- OpenFOAM
  - Computational fluid dynamics
  - Covers chemical reactions, turbulence and heat transfer, acoustics, solid mechanics and electromagnetics, and aerodynamics

- NAMD
  - Molecular dynamics for large biomolecular systems
  - Based on Charm++

- LS-DYNA
  - Structural analysis
  - Car crashes, explosions, deformation, jet engine blade containment, bird strikes
  - Stencils, systems of PDEs

- Fluent
  - Fluids, acoustics, optics, avionics, etc.
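To make the Lennard-Jones benchmark listed under LAMMPS more concrete, here is a minimal sketch of the pair potential such a benchmark evaluates, V(r) = 4ε[(σ/r)^12 − (σ/r)^6], together with a naive all-pairs energy sum. The function names, the reduced units (ε = σ = 1), and the cutoff of 2.5σ are assumptions chosen for illustration. Production codes use neighbor lists and domain decomposition rather than the O(N²) loop shown here.

```python
import math


def lj_potential(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Magnitude of -dV/dr; positive values are repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def total_energy(positions, epsilon=1.0, sigma=1.0, cutoff=2.5):
    """Naive O(N^2) sum over all pairs closer than cutoff * sigma."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = math.dist(positions[i], positions[j])
            if r < cutoff * sigma:
                energy += lj_potential(r, epsilon, sigma)
    return energy


if __name__ == "__main__":
    r_min = 2.0 ** (1.0 / 6.0)  # separation at the potential minimum
    print("V(r_min) =", lj_potential(r_min))
    print("F(r_min) =", lj_force(r_min))
    print("E(pair)  =", total_energy([(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]))
```

The potential crosses zero at r = σ and reaches its minimum of −ε at r = 2^(1/6)σ, where the force vanishes.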

## ML Benchmarks

- DeepBench from Baidu
  - Uses neural network libraries to benchmark the performance of basic operations
  - Dense matrix multiplication, convolutions, and communication

- PARAM from Meta
  - Repository of communication and compute micro-benchmarks as well as full workloads
  - Stand-alone compute and communication benchmarks using the cuDNN, MKL, NCCL, and MPI libraries
  - Application benchmarks: DLRM at this point
  - ML framework: PyTorch
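The kind of measurement a dense matrix multiplication micro-benchmark reports can be sketched as follows. This is not how DeepBench or PARAM are implemented: they call vendor libraries, whereas this sketch uses a pure-Python triple loop as a stand-in so that it runs anywhere. The function names and the best-of-`repeats` timing policy are assumptions. The one standard ingredient is the operation count of 2·m·n·k floating-point operations for an (m×k) by (k×n) product.

```python
import random
import time


def gemm_flops(m, n, k):
    """Floating-point operations in C = A @ B (one multiply and one add per term)."""
    return 2 * m * n * k


def matmul(a, b):
    """Naive dense matrix product on lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        row_c = c[i]
        for p in range(k):
            a_ip = a[i][p]
            row_b = b[p]
            for j in range(n):
                row_c[j] += a_ip * row_b[j]
    return c


def benchmark_gemm(m, n, k, repeats=3):
    """Return the best observed rate in GFLOP/s over `repeats` runs."""
    a = [[random.random() for _ in range(k)] for _ in range(m)]
    b = [[random.random() for _ in range(n)] for _ in range(k)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, b)
        best = min(best, time.perf_counter() - start)
    return gemm_flops(m, n, k) / max(best, 1e-12) / 1e9


if __name__ == "__main__":
    print(f"{benchmark_gemm(64, 64, 64):.4f} GFLOP/s (pure Python, illustrative only)")
```

Taking the best of several runs reduces the influence of one-off interference. Reporting the median instead is an equally reasonable choice.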

## Other Micro Benchmarks and Proxy Apps

Adios: ADIOS is developed as part of the United States Department of Energy’s Exascale Computing Project. It is a framework for scientific data I/O to publish and subscribe to data when and where required.

ExaMiniMD: ExaMiniMD is a proxy application and research vehicle for particle codes, in particular Molecular Dynamics (MD). Compared to previous MD proxy apps (MiniMD, COMD), its design is significantly more modular in order to allow independent investigation of different aspects.

MACSio: MACSio is being developed to fill a long existing void in co-design proxy applications that allow for I/O performance testing and evaluation of tradeoffs in data models, I/O library interfaces and parallel I/O paradigms for multi-physics, HPC applications.

mcb: The Monte Carlo Benchmark (MCB) is intended for use in exploring the computational performance of Monte Carlo algorithms on parallel architectures.

OpenMD: OpenMD is an open source molecular dynamics engine which is capable of efficiently simulating liquids, proteins, nanoparticles, interfaces, and other complex systems using atom types with orientational degrees of freedom (e.g. “sticky” atoms, point dipoles, and coarse-grained assemblies).

SAMRAI: SAMRAI (Structured Adaptive Mesh Refinement Application Infrastructure) is an object-oriented C++ software library that enables exploration of numerical, algorithmic, parallel computing, and software issues associated with applying structured adaptive mesh refinement (SAMR) technology in large-scale parallel application development. SAMRAI provides software tools for developing SAMR applications that involve coupled physics models, sophisticated numerical solution methods, and which require high-performance parallel computing hardware. SAMRAI enables integration of SAMR technology into existing codes and simplifies the exploration of SAMR methods in new application domains.

Siesta: SIESTA is both a method and its computer program implementation, to perform efficient electronic structure calculations and ab initio molecular dynamics simulations of molecules and solids. SIESTA’s efficiency stems from the use of strictly localized basis sets and from the implementation of linear-scaling algorithms which can be applied to suitable systems.

SimpleMOC: The purpose of this mini-app is to demonstrate the performance characteristics and viability of the Method of Characteristics (MOC) for 3D neutron transport calculations in the context of full-scale light water reactor simulation.

souffle: Soufflé is a logic programming language inspired by Datalog. It overcomes some of the limitations of classical Datalog. For example, programmers are not restricted to finite domains, and the usage of functors (intrinsic, user-defined, records/constructors, etc.) is permitted. Soufflé has a component model so that large logic projects can be expressed.

sphynx: SPHYNX is an SPH hydrocode focused on astrophysical applications. SPHYNX includes state-of-the-art methods that allow it to address subsonic hydrodynamical instabilities and strong shocks, which are ubiquitous in astrophysical scenarios. SPHYNX is of Newtonian type and grounded on the Euler-Lagrange formulation of the smoothed-particle hydrodynamics technique.

splatt: SPLATT is a library and C API for sparse tensor factorization. SPLATT supports shared-memory parallelism with OpenMP and distributed-memory parallelism with MPI.

sw4lite-RAJA: sw4lite is a bare-bones version of SW4 intended for testing performance optimizations in a few important numerical kernels of SW4.

thornado mini: Thornado mini solves the equation of radiative transfer in the multi-group two-moment approximation. The Discontinuous Galerkin (DG) method is used for spatial discretization, and an implicit-explicit (IMEX) method is used to integrate the moment equations in time. The hyperbolic (streaming) part is treated explicitly, while the collision term is treated implicitly.

Trilinos: The Trilinos Project is an effort to develop algorithms and enabling technologies within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems. A unique design feature of Trilinos is its focus on packages.

tycho2: A mini-app for neutral-particle, discrete-ordinates (SN) transport on parallel-decomposed meshes of tetrahedra.

vlasiator: In Vlasiator, ions are represented as velocity distribution functions, while electrons are a magnetohydrodynamic fluid, enabling a self-consistent global plasma simulation that can describe multi-temperature plasmas to resolve non-MHD processes that currently cannot be self-consistently described by the existing global space weather simulations. The novelty is that, by modelling ions as velocity distribution functions, the outcome will be numerically noiseless.

vmd: VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting.

WRF: WRF is a state-of-the-art atmospheric modeling system designed for both meteorological research and numerical weather prediction. It offers a host of options for atmospheric processes and can run on a variety of computing platforms.

yambo: YAMBO implements Many-Body Perturbation Theory (MBPT) methods (such as GW and BSE) and Time-Dependent Density Functional Theory (TDDFT), which allow for accurate prediction of fundamental properties such as band gaps of semiconductors, band alignments, defect quasi-particle energies, optics, and out-of-equilibrium properties of materials.

arbor-0.3: Arbor is a high-performance library for computational neuroscience simulations with multi-compartment, morphologically detailed cells, from single cell models to very large networks. Arbor is written from the ground up with many-CPU and GPU architectures in mind, to help neuroscientists effectively use contemporary and future HPC systems to meet their simulation needs.

Caffe-MPI: Caffe-MPI is designed for high-density GPU clusters. The new version supports InfiniBand (IB) high-speed network connections and shared storage systems that can be backed by a distributed file system such as NFS or GlusterFS. The training dataset is read in parallel by each MPI process. Hierarchical communication mechanisms were developed to minimize the bandwidth requirements between computing nodes.

CFDEMcoupling: CFDEM®coupling provides an open source parallel coupled CFD-DEM framework combining the strengths of the LIGGGHTS® DEM code and the open source CFD package OpenFOAM®. The CFDEM®coupling toolbox allows standard CFD solvers of OpenFOAM® to be extended with a coupling to the DEM code LIGGGHTS®.

Elemental: Elemental is a modern C++ library for distributed-memory dense and sparse-direct linear algebra, conic optimization, and lattice reduction. The library was initially released in "Elemental: A new framework for distributed memory dense linear algebra" and absorbed, then greatly expanded upon, the functionality from the sparse-direct solver Clique, which was originally released during a project on Parallel Sweeping Preconditioners.

Gadget: GADGET-4 is a massively parallel code for N-body/hydrodynamical cosmological simulations. It is a flexible code that can be applied to a variety of different types of simulations, offering a number of sophisticated simulation algorithms.

hemelb: HemeLB uses the lattice Boltzmann method to simulate fluid flow in complex geometries, such as a blood vessel network.

horovod: Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make distributed deep learning fast and easy to use.

meshkit: MeshKit is an open-source library of mesh generation functionality. MeshKit has general mesh manipulation and generation functions such as Copy, Move, Rotate, and Extrude mesh. In addition, new quad mesh and embedded boundary Cartesian mesh (EBMesh) algorithms have been developed. Interfaces to several public-domain tetrahedral meshing algorithms (Gmsh, netgen) are also offered.

metag partitioning: Parallel metagenomic assembler designed to handle very large datasets. The program identifies the disconnected subgraphs in the de Bruijn graph, partitions the input dataset, and runs the popular assembler Velvet independently on the partitions. This software is a high-performance version of the khmer library for assembly.
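The partitioning idea described above, finding disconnected subgraphs and splitting the reads accordingly, can be sketched with a union-find structure. This is an illustrative toy, not the tool's actual algorithm: the helper names (`kmers`, `partition_reads`) are hypothetical, graph nodes are taken to be k-mers linked when they are adjacent in a read, and reverse complements and sequencing errors are ignored.

```python
def kmers(read, k):
    """All length-k substrings of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def partition_reads(reads, k):
    """Group reads whose k-mers fall in the same connected component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Consecutive k-mers in a read overlap by k-1 characters, so they are linked.
    for read in reads:
        ks = kmers(read, k)
        for a, b in zip(ks, ks[1:]):
            union(a, b)

    # Every k-mer of a read lies in one component, so the first one identifies it.
    groups = {}
    for read in reads:
        ks = kmers(read, k)
        if ks:  # reads shorter than k are skipped in this sketch
            groups.setdefault(find(ks[0]), []).append(read)
    return sorted(groups.values())


if __name__ == "__main__":
    for part in partition_reads(["ACGTAC", "GTACGG", "TTTTTT"], k=4):
        print(part)
```

Each resulting group could then be handed to an assembler independently, which is what makes the approach parallel.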

MITgcm: MITgcm can be used to study both atmospheric and oceanic phenomena; one hydrodynamical kernel is used to drive forward both atmospheric and oceanic models. It has a non-hydrostatic capability and so can be used to study both small-scale and large-scale processes.

MLSL-IntelMLSL: Intel(R) Machine Learning Scaling Library (Intel(R) MLSL) is a library providing an efficient implementation of communication patterns used in deep learning.

mxx: mxx is a C++/C++11 template library for MPI. The main goal of this library is to provide two things: First, simplified, efficient, and type-safe C++11 bindings to common MPI operations. Second, a collection of scalable, high-performance standard algorithms for parallel distributed memory architectures, such as sorting.

Nek5000: High-order methods have the potential to overcome the current limitations of standard CFD solvers. Nek5000 features state-of-the-art, scalable algorithms that are fast and efficient on platforms ranging from laptops to the world’s fastest computers. Applications span a wide range of fields, including fluid flow, thermal convection, combustion, and magnetohydrodynamics.

phyml: PhyML is a software package that uses modern statistical approaches to analyse alignments of nucleotide or amino acid sequences in a phylogenetic framework. The main tool in this package builds phylogenies under the maximum likelihood criterion. It implements a large number of substitution models coupled to efficient options to search the space of phylogenetic tree topologies.

PrincetonCBEMDMPI: CBEMD: Parallel Molecular Dynamics Under Various Thermodynamic Ensembles.

Lulesh: LULESH is a highly simplified application, hard-coded to only solve a simple Sedov blast problem with analytic answers, but it represents the numerical algorithms, data motion, and programming style typical in scientific C or C++ based applications.

miniVite: miniVite is a proxy app that implements a single phase of Louvain.
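For orientation, the quantity that Louvain optimizes is modularity, and a local-moving pass greedily reassigns vertices to neighboring communities while modularity increases. The sketch below assumes that this local-moving step is the phase in question, works on an unweighted, undirected graph without self-loops, and recomputes modularity from scratch for every trial move. That is deliberately naive: practical implementations compute the gain incrementally and run distributed. All function names are hypothetical.

```python
def modularity(edges, community):
    """Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2."""
    m = len(edges)
    internal = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


def local_moving_pass(edges, community):
    """Move vertices to neighboring communities until no move raises modularity."""
    community = dict(community)  # work on a copy
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    improved = True
    while improved:
        improved = False
        for node in sorted(neighbors):
            best_c = community[node]
            best_q = modularity(edges, community)
            for c in sorted({community[n] for n in neighbors[node]}):
                trial = dict(community)
                trial[node] = c
                q = modularity(edges, trial)
                if q > best_q + 1e-12:  # accept only strict improvements
                    best_q, best_c = q, c
            if best_c != community[node]:
                community[node] = best_c
                improved = True
    return community


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    start = {n: n for n in range(6)}
    result = local_moving_pass(edges, start)
    print("Q before:", modularity(edges, start))
    print("Q after: ", modularity(edges, result))
```

Because every accepted move strictly increases modularity, which is bounded above, the loop terminates.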

ntchemini: NTChem is a high-performance, general-purpose software package for molecular electronic structure calculations on the K computer.

## Mantevo

miniAMR: miniAMR applies a stencil calculation on a unit cube computational domain, which is divided into blocks. The blocks all have the same number of cells in each direction and communicate ghost values with neighboring blocks.
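A single block of the kind described above can be sketched as a 3D array with one layer of ghost cells around the interior. The sketch assumes a 7-point averaging stencil (each cell and its six face neighbors) and uses hypothetical helper names. It does not perform refinement or ghost exchange: the ghost cells simply hold whatever values a neighboring block would have sent.

```python
def make_block(n, fill=0.0):
    """Create an (n+2)^3 block: n interior cells per direction plus ghost layers."""
    size = n + 2
    return [[[fill] * size for _ in range(size)] for _ in range(size)]


def stencil_7pt(block):
    """Return a new block whose interior cells are 7-point averages.

    Ghost cells (the outermost layer) are copied unchanged.
    """
    size = len(block)
    new = [[row[:] for row in plane] for plane in block]
    for i in range(1, size - 1):
        for j in range(1, size - 1):
            for k in range(1, size - 1):
                new[i][j][k] = (
                    block[i][j][k]
                    + block[i - 1][j][k] + block[i + 1][j][k]
                    + block[i][j - 1][k] + block[i][j + 1][k]
                    + block[i][j][k - 1] + block[i][j][k + 1]
                ) / 7.0
    return new


if __name__ == "__main__":
    block = make_block(4, fill=1.0)
    block[2][2][2] = 8.0  # perturb one interior cell
    updated = stencil_7pt(block)
    print("center after one sweep:", updated[2][2][2])
```

Writing into a separate output block keeps the sweep independent of traversal order, at the cost of a second copy of the data.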

miniMD: miniMD is a parallel molecular dynamics (MD) simulation package written in C++ and intended for use on parallel supercomputers and new architectures for testing purposes. The software package is meant to be simple, lightweight, and easily adaptable to new hardware.

miniFE: MiniFE is a proxy application for unstructured implicit finite element codes. It is similar to HPCCG and pHPCCG but provides a much more complete vertical covering of the steps in this class of applications.

miniSMAC: Solves the finite-differenced 2D incompressible Navier-Stokes equations with the Spalart-Allmaras one-equation turbulence model on a structured body-conforming grid. The grid is partitioned into subgrids that are load balanced for the number of MPI ranks requested by the user.

miniTri: miniTri is a proxy for a class of triangle based data analytics (Mantevo). This simple code is a self-contained piece of C++ software that uses triangle enumeration with a calculation of specific vertex and edge properties.

miniAero: MiniAero is a mini-application for the evaluation of programming models and hardware for next generation platforms. MiniAero is an explicit (using RK4) unstructured finite volume code that solves the compressible Navier-Stokes equations.
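The explicit RK4 time integration mentioned for miniAero can be illustrated on a scalar ODE. The sketch below shows the classical four-stage Runge-Kutta step applied to dy/dt = -y rather than to the Navier-Stokes equations; `rk4_step` and `integrate` are hypothetical names, and the test problem is chosen only because its exact solution is known.

```python
import math


def rk4_step(f, t, y, h):
    """Advance y' = f(t, y) by one classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + h / 2.0, y + h / 2.0 * k1)
    k3 = f(t + h / 2.0, y + h / 2.0 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def integrate(f, t0, y0, h, steps):
    """Take a fixed number of RK4 steps and return the final value."""
    t, y = t0, y0
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


if __name__ == "__main__":
    approx = integrate(lambda t, y: -y, 0.0, 1.0, 0.1, 10)
    print("RK4:  ", approx)
    print("exact:", math.exp(-1.0))
```

Being explicit, each step needs only evaluations of the right-hand side and no linear solve, but the step size is limited by stability.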

miniXyce: At this time, miniXyce is a simple linear circuit simulator with a basic parser that performs transient analysis on any circuit with resistors (R), inductors (L), capacitors (C), and voltage/current sources. The parser incorporated into this version of miniXyce is a single pass parser, where the netlist is expected to be flat (no hierarchy via subcircuits is allowed). Simulating the system of DAEs generates a nonsymmetric linear problem, which is solved using un-preconditioned GMRES. The time integration method used in miniXyce is backward Euler with a constant time-step. The simulator outputs all the solution variables at each time step in a ‘prn’ file.
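Backward Euler with a constant time step, as named above, can be sketched on the smallest possible circuit: a capacitor C charged through a resistor R from a source V_s, so that C dv/dt = (V_s − v)/R. This is an illustration under stated assumptions, not miniXyce's code. There is no netlist parser, and because there is a single unknown, the implicit equation is solved in closed form instead of with GMRES. The function name `simulate_rc` and the component values are hypothetical.

```python
def simulate_rc(r, c, v_source, h, steps, v0=0.0):
    """Integrate C dv/dt = (v_source - v) / R with backward Euler.

    The implicit update v_new = v_old + h * (v_source - v_new) / (R * C)
    is solved for v_new analytically.
    Returns the capacitor voltage at every time step, including t = 0.
    """
    ratio = h / (r * c)
    v = v0
    trace = [v]
    for _ in range(steps):
        v = (v + ratio * v_source) / (1.0 + ratio)
        trace.append(v)
    return trace


if __name__ == "__main__":
    # 1 kOhm and 1 uF give a time constant of 1 ms; the step is 0.1 ms.
    voltages = simulate_rc(r=1e3, c=1e-6, v_source=5.0, h=1e-4, steps=50)
    for step in (0, 10, 50):
        print(f"t = {step * 1e-4:.4f} s  v = {voltages[step]:.4f} V")
```

Each update is a weighted average of the previous voltage and the source voltage, so the result stays bounded for any positive step size. That robustness is a common reason to prefer an implicit method for circuit equations.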

References:

- Broadcom presentation in MUG’22
- Nderim Shatri, MSc thesis