# Application Packaging Tutorial
Target audience

The target audience for this tutorial is CSCS staff who want to provide an application or programming environment uenv for users on Alps. We assume that you are familiar with Spack and with building uenvs using Stackinator: the focus here is on best practices.

This tutorial walks through configuring, maintaining, and deploying a uenv recipe for a representative HPC application. The uenv has to support some common use cases, and the tutorial should cover most of the aspects of deploying your own uenv.
Arbor is a scientific software library for neuroscience simulation, with features including:
- A C++ library with a Python interface
- Distributed execution through MPI
- Multicore simulation
- Support for both NVIDIA and AMD GPUs
## Requirements
Before starting, we gather requirements for the use cases of the uenv on the system, in order to understand:
- which packages the uenv will provide;
- which interfaces the uenv will provide to those packages.
### Supported workflows
For Arbor we wish to support two workflows:
- Application: provide optimised builds of Arbor for users to use directly
- BYO/Developer: Arbor is under active development, and some users require the ability to build the latest bleeding edge version, or build with a customised configuration. This workflow also supports Arbor developers, who need to implement new features, fix bugs and test on Alps.
### Supported systems

Arbor is well-optimised for both CPU and GPU execution, and it is used on systems both with and without accelerators, so we will provide it for the following platforms:

- multicore: `zen2`/`zen3`
- gpu: `gh200`
supported platforms

Supported targets on Alps are currently `zen2`, `zen3`, `a100`, `mi200`, and `gh200`. For more information, see the internal CSCS Confluence. For information about which targets are available on which vClusters, see the `config.yaml` in the alps-uenv repository.
### Compilers
Arbor is a C++17 library that officially supports GCC and Clang, with a Python front end.
For this we choose the following compiler versions:
| target | compiler | cuda | python |
|---|---|---|---|
| `zen2`/`zen3` | `gcc@13.2` | - | `python@3.11` |
| `gh200` | `gcc@13.2` | `cuda@12.4` | `python@3.11` |
### Packages

The first step when building an application, use-case, or workflow uenv is to determine which specs to add to the list. If the aim were only to provide Arbor with CUDA and Python support enabled, an `environments.yaml` file that provides a single spec `arbor@0.9 +python` could be sufficient, e.g.:
```yaml
arbor:
  compiler:
  - toolchain: gcc
    spec: gcc
  mpi:
    spec: cray-mpich
  unify: true
  specs:
  - arbor@0.9 +python
  variants:
  - +mpi
  - +cuda
  - cuda_arch=90
  views:
    arbor:
      link: roots
```
This environment definition will build Arbor, with all of its dependencies implicitly concretised by Spack. Such a simple recipe is sometimes sufficient; however, one will often need to provide a more detailed set of specs. Reasons for more detailed specs include (each is illustrated in the sketch after this list):

- to pin the version of a specific dependency, e.g.:
    - to ensure that the version of a package is not dependent on which version of Spack is used;
    - to use a version that is well supported and tested on the target system;
    - to use a version that patches a bug on the target system.
- to specialise the spec of a specific dependency, e.g.:
    - with non-default variants that support all features on the target system;
    - with non-default variants that give the best performance on the target system;
    - to use a specific compiler when more than one compiler toolchain is used to build packages in an environment.
- to explicitly list all of the dependencies to provide to users in an environment view.
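As an illustration, the following sketch shows each kind of refinement in a `specs` list; the versions and variants here are hypothetical examples, not tested recommendations:

```yaml
specs:
# pin a dependency version, so the build does not depend on the Spack release used
- python@3.11
# specialise a dependency with a non-default variant (variant name illustrative)
- fmt@10.2 +shared
# select a compiler explicitly when the environment builds with multiple toolchains
- arbor@0.9 +python %gcc
```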
The objective for the Arbor uenv is to provide both Arbor and all of the tools and libraries to "build your own" Arbor. This requires providing all of the libraries and tools required to download the Arbor source code, run CMake, and build in a file system view.
As a starting point, we use the Spack package for Arbor. From this we derive a list of dependencies:

- direct dependencies like `pugixml`, `fmt` and `pybind11`, needed to build Arbor;
- compilers and languages like `python` and `cuda`.
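To enumerate these dependencies, Spack itself is the most reliable source, for example:

```bash
# list the variants and dependencies declared by the arbor package
spack info arbor
# show the full concretised dependency tree for the spec we plan to build
spack spec arbor@0.9 +python
```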
## The Recipe
With requirements in hand, it is now time to write the recipe.
### Config

There are a few simple choices to make when writing the `config.yaml` file:

`name`
:   Keep it simple: we choose `arbor`.

Tip

Use the same name on all hardware targets, i.e. use a name like `arbor` or `gromacs` instead of `arbor-gpu` or `gromacs-x86`. By doing this, users can more easily find your uenv on all vClusters: if they are on a system with an x86 CPU, they can assume that the `arbor` uenv has been built appropriately. The uenv CLI tool also allows users to disambiguate which micro-architecture they require, if on a system that provides versions of a uenv built for multiple uarch.
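For example, assuming the uenv CLI's `name/version%uarch` label syntax, a user could request a specific micro-architecture explicitly (a hypothetical invocation, for illustration only):

```bash
# request the zen2 build of arbor on a system that provides multiple uarch
uenv image pull arbor/v0.9%zen2
```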
`spack`
:   By default, use the most recent version of Spack supported by Stackinator. At the time of writing, the most recent version of Spack is `v0.21`, for which it is recommended to use the `releases/v0.21` branch, which receives backported bug fixes without changing the API or recipe definitions.

Warning

The `develop` branch should be avoided for uenv deployed on CSCS clusters unless it is absolutely necessary.

`mount`
:   Normally application and development uenvs go in `/user-environment`, and tools that you might want to use alongside a development or application uenv go in `/user-tools` (e.g. a debugger). For Arbor, we choose the default `/user-environment` path.

`description`
:   Keep it simple: fit it on one line.
### Compilers

Based on our requirements above, defining compilers is straightforward.
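A minimal `compilers.yaml` covering the table above might look like the following sketch; it assumes Stackinator's standard `bootstrap`/`gcc` layout, and the bootstrap version shown is illustrative:

```yaml
# compilers.yaml (sketch): toolchain version from the requirements table
bootstrap:
  # compiler used to build the gcc toolchain below (version illustrative)
  spec: gcc@11
gcc:
  specs:
  - gcc@13.2
```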
### Environments

The environment definitions include the specs that we want to provide to end users, with the relevant `cuda` and `python` versions for each target.
For the `zen2`/`zen3` (multicore) targets:

```yaml
arbor:
  compiler:
  - toolchain: gcc
    spec: gcc
  mpi:
    spec: cray-mpich@8.1.29
  unify: true
  specs:
  # arbor
  - arbor@0.9 +python
  # build tools
  - cmake
  - googletest
  - ninja
  - python@3.11
  # C++ dependencies
  - fmt
  - pugixml
  - nlohmann-json
  - random123
  # python packages
  - py-mpi4py
  - py-numpy
  - py-pip
  - py-pybind11
  # etc
  - osu-micro-benchmarks
  variants:
  - +mpi
  packages:
  - diffutils
  - gettext
  - gmake
  - libxml2
  - perl
  views:
    arbor:
      link: roots
    develop:
      link: roots
      exclude: [arbor]
```
And for the `gh200` target:

```yaml
arbor:
  compiler:
  - toolchain: gcc
    spec: gcc
  mpi:
    spec: cray-mpich@8.1.29
    gpu: cuda
  unify: true
  specs:
  # arbor
  - arbor@0.9 +python
  # build tools
  - cuda@12.4
  - cmake
  - googletest
  - ninja
  - python@3.11
  # C++ dependencies
  - fmt
  - pugixml
  - nlohmann-json
  - random123
  # python packages
  - py-mpi4py
  - py-numpy
  - py-pip
  - py-pybind11
  # etc
  - osu-micro-benchmarks
  variants:
  - +mpi
  - +cuda
  - cuda_arch=90
  packages:
  - diffutils
  - gettext
  - gmake
  - libxml2
  - perl
  views:
    arbor:
      link: roots
    develop:
      link: roots
      exclude: [arbor]
```
variants

Environments on GH200 will typically have the following variants set:

- `+cuda` sets that variant for all packages that support it, and is required for NVIDIA GPU builds.
- `cuda_arch=90` is required for `gh200` (use `cuda_arch=80` for the `a100` nodes).
views and roots

Always use `view:link:roots` if possible, to filter which packages are added to views. The default `all` setting, and also the `run` setting, can add many packages that were not explicitly in the list of your uenv's specs. Packages in the view can lead to conflicts, which can be avoided by only including packages that are strictly required. For example, if a view provides a common dependency like `libssl` in its `/lib` path, and `LD_LIBRARY_PATH` is set, system CLI tools like `git` can crash because they link against the `libssl` in the uenv at runtime.
### Modules

We add a `modules.yaml` file, which controls which modules are provided by the uenv. We do this because some users might want modules, and it doesn't hurt to provide them (this is a weak reason, and we accept that we will be on the hook for supporting them for users who incorporate them into their workflows).
Info

If you don't need to provide modules, set `modules: False` in `config.yaml`.
```yaml
modules:
  # Paths to check when creating modules for all module sets
  prefix_inspections:
    bin:
    - PATH
    lib:
    - LD_LIBRARY_PATH
    lib64:
    - LD_LIBRARY_PATH
  default:
    arch_folder: false
    # Where to install modules
    roots:
      tcl: /user-environment/modules
    tcl:
      all:
        autoload: none
      hash_length: 0
      exclude_implicits: true
      exclude: ['%gcc@7.5.0', 'gcc %gcc@7.5.0']
      projections:
        all: '{name}/{version}'
```
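Once the uenv is mounted, users can make the generated modules visible with the standard module commands; a minimal sketch, assuming a TCL-compatible module system on the vCluster:

```bash
# expose the modules generated under roots.tcl above
module use /user-environment/modules
# list what the uenv provides
module avail
```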
## Testing

Failure

TODO: write ReFrame tests.
## Deployment

### Configuring the pipeline
The target systems for deploying the Arbor uenv to users are Eiger (`zen2`) and Santis (`gh200`).

To enable the CI/CD pipeline to build and deploy the uenv on these systems, update the `config.yaml` file in the alps-uenv repository.
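The entry for Arbor might look like the following sketch; the recipe paths and the exact layout are illustrative of the repository's `uenvs` mapping, not copied from it:

```yaml
uenvs:
  arbor:
    v0.9:
      recipes:
        # map each target uarch to the directory containing its recipe
        zen2: v0.9/zen2
        gh200: v0.9/gh200
      deploy:
        # which vClusters receive which builds
        eiger: [zen2]
        santis: [gh200]
```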
Tip

To test that the pipeline YAML is correctly configured before pushing the changes and making a PR, you can run a basic test with the new uenv:

```bash
system=santis uarch=gh200 uenv=arbor:v0.9 ./ci/configure-pipeline
system=eiger uarch=zen2 uenv=arbor:v0.9 ./ci/configure-pipeline
```

If there are no obvious error messages, you are good to go!
### Running the pipeline
To run the pipeline that will automatically build and test your uenv, first create a PR:

- Push your changes to a branch (preferably in a fork of the main alps-uenv repository).
- Open a PR with your changes.

Once the PR is created, the pipeline has to be triggered for each individual combination of uenv/version/uarch/vCluster by using a specially formatted `cscs-ci run` command:

```
cscs-ci run alps;system=eiger;uarch=zen2;uenv=arbor:v0.9
cscs-ci run alps;system=santis;uarch=gh200;uenv=arbor:v0.9
```
### Checking the build

Log onto the target system, e.g. `santis`, and use the `uenv image find --build` command to search for the build.
```
> uenv image find --build arbor
uenv/version:tag      uarch date       id               size
arbor/v0.9:1250847801 gh200 2024-04-12 89c9a36f21b496a2 3.6GB
arbor/v0.9:1249535229 gh200 2024-04-11 0a2d82448ecaafd7 3.6GB
```
Info

The `--build` flag is required with the `find` and `pull` commands to interact with images that have been built by the pipeline, but not yet deployed.
Pick the version to pull (if it isn't clear which version to pull, inspect the logs of the CI/CD job that built the image).
```bash
# pull the image using its id
uenv image pull --build 89c9a36f21b496a2
# then start the image to test it
uenv image start 89c9a36f21b496a2
```
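Once inside the started uenv, a quick smoke test of the Python front end can confirm that the build works; this sketch assumes the `arbor` view has been activated so that `python3` and the `arbor` package are on the path:

```bash
# verify the arbor Python package is importable and report its version
python3 -c 'import arbor; print(arbor.__version__)'
# report how arbor was configured (mpi, gpu, ...), assuming arbor.config() is available
python3 -c 'import arbor; print(arbor.config())'
```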
## Docs

Failure

TODO: write about how to document.