Libfabric
Libfabric, or Open Fabrics Interfaces (OFI), is a low-level networking library that abstracts away various networking backends. It is used by Cray MPICH, and can be used together with OpenMPI, NCCL, and RCCL to make use of the Slingshot network on Alps.
Using libfabric
If you are using a uenv provided by CSCS, such as prgenv-gnu, Cray MPICH is already linked against libfabric and uses the high-speed network. No application changes are required.
If you are using containers, the system libfabric can be loaded into your container using the CXI hook provided by the container engine. Using the hook is essential to make full use of the Alps network.
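To confirm that libfabric can see the Slingshot network from inside a uenv or container, one option is to query the available providers. The following is a minimal sketch, not an official CSCS tool. It assumes that the `fi_info` utility distributed with libfabric is on the `PATH` in your environment, which may not be the case in every container image. The function names `parse_providers` and `visible_providers` are hypothetical helpers introduced for this example.

```python
import shutil
import subprocess


def parse_providers(fi_info_output):
    """Return the set of provider names found in `fi_info` output."""
    providers = set()
    for line in fi_info_output.splitlines():
        # fi_info prints one "provider: <name>" line per discovered interface.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "provider":
            providers.add(value.strip())
    return providers


def visible_providers():
    """Run `fi_info` and return provider names, or None if it is not installed."""
    exe = shutil.which("fi_info")  # assumption: fi_info is available on PATH
    if exe is None:
        return None
    result = subprocess.run([exe], capture_output=True, text=True, check=False)
    return parse_providers(result.stdout)


if __name__ == "__main__":
    providers = visible_providers()
    if providers is None:
        print("fi_info not found; cannot query libfabric providers")
    elif "cxi" in providers:
        print("cxi provider is visible")
    else:
        print("cxi provider not found; visible providers:", sorted(providers))
```

If the script reports that the `cxi` provider is missing inside a container, the CXI hook is a likely place to start looking.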
Tuning libfabric
Tuning libfabric (particularly together with Cray MPICH, OpenMPI, NCCL, and RCCL) depends on many factors, including the application, workload, and system.
For a comprehensive overview of the libfabric options for the CXI provider (the provider for the Slingshot network), see the fi_cxi man pages.
Note that the exact version deployed on Alps may differ, and not all options may be applicable on Alps.
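Libfabric runtime parameters, including those of the CXI provider, are set through environment variables prefixed with `FI_`. When comparing tuning experiments it can help to record which of these variables were actually set for a job. The following is a minimal sketch of such a helper. The function name `libfabric_settings` is hypothetical, and the script only reports what is in the environment: it makes no claim about which options are valid or beneficial on Alps.

```python
import os


def libfabric_settings(environ=None):
    """Return the FI_* variables from a mapping, sorted by name."""
    if environ is None:
        environ = os.environ
    return {k: environ[k] for k in sorted(environ) if k.startswith("FI_")}


if __name__ == "__main__":
    settings = libfabric_settings()
    if not settings:
        print("no FI_* variables are set in the environment")
    for name, value in settings.items():
        print(f"{name}={value}")
```

Running this at the start of a job script, in the same environment as the application, gives a record of the libfabric settings that can be stored alongside the benchmark results.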
See the Cray MPICH known issues page for issues when using Cray MPICH together with libfabric.
Todo
More options?