Machine learning applications and frameworks

CSCS supports a wide range of machine learning (ML) applications and frameworks on its systems. Most ML workloads are containerized to ensure portability, reproducibility, and ease of use across environments.

Users can choose between running containers, using provided uenv software stacks, or building custom Python environments tailored to their needs.

Running machine learning applications with containers

Containerization is the recommended approach for ML workloads on Alps, as it simplifies software management and maximizes compatibility with other systems.

Users are encouraged to build their own containers, starting from popular sources such as the NVIDIA NGC Catalog, which offers pre-built images optimized for HPC and ML workloads. Examples include:
- PyTorch NGC container
- TensorFlow NGC container

For dependencies that change frequently, consider creating a virtual environment (venv) mounted into the container, so that packages can be updated without rebuilding the image.
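As a minimal sketch of the venv approach, the snippet below uses Python's standard `venv` module to create an environment that still sees the packages already installed in the base interpreter (for example, the PyTorch build shipped in the container image). The function name `create_overlay_venv` and the target path are hypothetical; the directory is assumed to live on a filesystem that is mounted into the container.

```python
import venv
from pathlib import Path


def create_overlay_venv(target: Path, with_pip: bool = True) -> Path:
    """Create a venv layered on top of the base interpreter's site-packages.

    Returns the path to the generated pyvenv.cfg file.
    """
    builder = venv.EnvBuilder(
        # Keep the frameworks installed in the base environment importable.
        system_site_packages=True,
        # Install pip into the venv so extra packages can be added later.
        with_pip=with_pip,
    )
    builder.create(target)
    return target / "pyvenv.cfg"


# Example call (hypothetical path; run with the container's Python):
#   create_overlay_venv(Path("/path/to/mounted/dir/my-venv"))
```

Packages installed into this venv take precedence over the ones in the base environment, which remain visible as a fallback. Whether this layering suits a given image depends on how the image installs its Python packages.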

Helpful references:

- Running containers on Alps: Container Engine Guide
- Building custom container images: Container Build Guide

Using provided uenv software stacks

Alternatively, CSCS provides pre-configured software stacks (uenvs) that can serve as a starting point for machine learning projects. These environments provide optimized compilers, libraries, and selected ML frameworks.

Available ML-related uenvs:

- PyTorch: available on Clariden and Daint

To extend these environments with additional Python packages, we recommend creating a Python virtual environment (venv) on top of the uenv. See this PyTorch venv example for details.

Note

While many Python packages provide pre-built binaries (wheels) for common architectures, some must be built from source when no wheel matches the target architecture.
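To illustrate the note, the following sketch checks whether a wheel filename targets the architecture of the current machine. The helper names are hypothetical, and the check is deliberately partial: it looks only at the architecture suffix of the platform tag and ignores the Python version, ABI, and glibc level, all of which an installer such as pip also considers.

```python
import platform
from typing import Optional


def wheel_platform_tag(filename: str) -> str:
    """Return the platform tag, i.e. the last dash-separated field of a wheel name."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    return filename[: -len(".whl")].split("-")[-1]


def wheel_matches_machine(filename: str, machine: Optional[str] = None) -> bool:
    """Rough check: is the wheel pure Python or built for the given architecture?"""
    machine = machine or platform.machine()  # e.g. "x86_64" or "aarch64"
    tag = wheel_platform_tag(filename)
    if tag == "any":
        # Pure-Python wheels are architecture independent.
        return True
    # A wheel may list several platform tags separated by dots.
    return any(part.endswith("_" + machine) for part in tag.split("."))
```

Running `platform.machine()` on a compute node shows which architecture applies there. If no published wheel matches it, the package has to be compiled, which is where the compilers and libraries of the base environment come in.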

Building custom Python environments

Users may also choose to build entirely custom software stacks using Python package managers such as uv or conda. Most ML libraries are available via the Python Package Index (PyPI).

To ensure optimal performance on CSCS systems, we recommend starting from an environment that already includes:

- CUDA, cuDNN
- MPI, NCCL
- C/C++ compilers

This can be achieved by either:

- building a custom container image based on a suitable ML-ready base image, or
- starting from a provided uenv (e.g., PrgEnv GNU or PyTorch uenv),

and then extending it with a virtual environment.
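Before layering a custom environment on top, it can help to confirm that the base actually exposes the components listed above. The sketch below is one way to do that, assuming PyTorch is the framework in use; the function name `check_ml_environment` and the choice of executables to look for (`gcc`, `g++`, `mpirun`, and so on) are illustrative assumptions, not a CSCS-provided tool.

```python
import shutil


def check_ml_environment() -> dict:
    """Collect a small report on compilers, MPI, CUDA, cuDNN, and NCCL."""
    report = {
        # Look for common compiler and launcher names on PATH (assumed names).
        "c_compiler": shutil.which("gcc") or shutil.which("cc"),
        "cxx_compiler": shutil.which("g++") or shutil.which("c++"),
        "mpi_launcher": shutil.which("mpirun") or shutil.which("mpiexec"),
    }
    try:
        import torch
    except ImportError:
        # Without PyTorch, only the PATH-based checks are reported.
        report["torch"] = None
        return report

    import torch.distributed as dist

    report["torch"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    report["cuda_version"] = torch.version.cuda  # None for CPU-only builds
    report["cudnn_version"] = (
        torch.backends.cudnn.version() if torch.backends.cudnn.is_available() else None
    )
    # The distributed package itself may be compiled out of some builds.
    report["nccl_available"] = dist.is_available() and dist.is_nccl_available()
    report["mpi_backend_available"] = dist.is_available() and dist.is_mpi_available()
    return report


if __name__ == "__main__":
    for key, value in check_ml_environment().items():
        print(f"{key}: {value}")
```

Note that `cuda_available` reflects the node the script runs on, so the check is most informative when run on a GPU compute node rather than a login node.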