Storage¶
Sharing files and data¶
Newly created user folders are not accessible by other groups or users on CSCS systems. Linux Access Control Lists (ACLs) let you grant access to one or more groups or users.
In traditional POSIX, access permissions are granted to user/group/other in mode read/write/execute.
The permissions can be checked with the -l option of the command ls.
For instance, if user1 owns the folder test, the output would be the following:
$ ls -lahd test/
drwxr-xr-x 2 user1 csstaff 4.0K Feb 23 13:46 test/
ACLs are an extension of these permissions to give one or more users or groups access to your data.
The ACLs of the same test folder of user1 can be shown with the command getfacl:
$ getfacl test
# file: test
# owner: user1
# group: csstaff
user::rwx
group::r-x
other::r-x
The command setfacl is used to change ACLs for a file or directory.
To grant users or groups read/write/execute permissions on a selected file or folder, use the -M,--modify-file or -m,--modify flags.
give user2 read+write access to test
Where test is owned by user1.
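A minimal sketch of this grant, assuming the standard user:NAME:PERMS entry syntax of setfacl. The helper name grant_rw is hypothetical and only wraps the call so that it can be reused:

```shell
# grant_rw USER PATH: give USER read+write access to PATH.
# (grant_rw is a hypothetical helper name, not part of setfacl.)
grant_rw() {
    # -m adds or updates a single ACL entry of the form user:NAME:PERMS
    setfacl -m "user:$1:rw" "$2"
}

# Example, run by user1 (the owner of test):
#   grant_rw user2 test/
# The new entry user:user2:rw- then shows up in the output of getfacl test.
```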
The -X,--remove-file and -x,--remove options will remove ACL entries.
remove user2 access to test
This reverts the access that was granted in the previous example.
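A minimal sketch of the removal, assuming the same user:NAME entry syntax. The helper name revoke_access is hypothetical:

```shell
# revoke_access USER PATH: remove the ACL entry of USER from PATH.
# (revoke_access is a hypothetical helper name, not part of setfacl.)
revoke_access() {
    # -x takes an entry without the permissions field
    setfacl -x "user:$1" "$2"
}

# Example, run by user1 (the owner of test):
#   revoke_access user2 test/
```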
Access rights can also be granted recursively to a folder and its children (if they exist) using the option -R,--recursive.
Note
This applies only to existing files - files added after this call won't inherit the permissions.
recursively grant user2 access to test and its contents
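A minimal sketch of the recursive grant, assuming read+write access as in the earlier example. The helper name grant_rw_recursive is hypothetical:

```shell
# grant_rw_recursive USER DIR: give USER read+write access to DIR
# and to everything that currently exists below it.
# (grant_rw_recursive is a hypothetical helper name.)
grant_rw_recursive() {
    # -R walks the directory tree, -m adds or updates the entry on each path
    setfacl -R -m "user:$1:rw" "$2"
}

# Example, run by user1 (the owner of test):
#   grant_rw_recursive user2 test
```

Entering a directory also requires execute permission. setfacl accepts a capital X in the permissions field (for example rwX), which applies execute only to directories and to files that are already executable.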
To set up a default so that all newly created files and directories inside your desired path inherit the permissions, use the -d,--default option.
set a default so that user2 has access to new files created inside test
user2 will have access to files created inside test after this call:
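A minimal sketch of setting such a default, assuming read+write access as in the earlier examples. The helper name grant_rw_default is hypothetical:

```shell
# grant_rw_default USER DIR: make new files and directories created
# inside DIR inherit read+write access for USER.
# (grant_rw_default is a hypothetical helper name.)
grant_rw_default() {
    # -d applies the modification to the default ACL of the directory
    setfacl -d -m "user:$1:rw" "$2"
}

# Example, run by user1 (the owner of test):
#   grant_rw_default user2 test/
```

Default entries are listed by getfacl with a default: prefix. They do not change the access to files that already exist, so this call is typically combined with a recursive grant.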
Info
For more information read the setfacl man page: man setfacl.
Many small files vs. HPC File Systems¶
Workloads that read or create many small files are not well-suited to parallel file systems, which are designed for parallel and distributed I/O.
Workloads that do not play nicely with Lustre include:
- Configuring and compiling applications.
- Using Python virtual environments.
At first it can seem strange that a "high-performance" file system is significantly slower than a laptop drive for a "simple" task like compilation or loading Python modules. However, Lustre is designed for high-bandwidth parallel file access from many nodes at the same time, with the attendant trade-offs this implies.
Metadata lookups on Lustre are expensive compared to your laptop, where the local file system is able to aggressively cache metadata.
Python virtual environments with uenv¶
Python virtual environments can be very slow on Lustre. For example, a simple import numpy command run on Lustre might take seconds, compared to milliseconds on your laptop.
The main reasons for this include:
- Python virtual environments contain many small files, on which Python performs stat(), open() and read() calls when loading a module.
- Python pre-compiles .pyc files for each .py file in a project.
- All of these operations create a lot of metadata lookups.
As a result, using virtual environments can be slow, and these problems are only exacerbated when the virtual environment is loaded simultaneously by many ranks in an MPI job.
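To see how long an import takes on a given file system, the interpreter's -X importtime option (available since Python 3.7) can be used. The following is a small sketch rather than a benchmark, and the helper name import_cost is hypothetical:

```shell
# import_cost MODULE: print the import-time summary line for MODULE.
# (import_cost is a hypothetical helper name.)
import_cost() {
    # -X importtime writes one line per imported module to stderr, with
    # self and cumulative times in microseconds; the requested module
    # normally finishes last, so its line is the final one
    python3 -X importtime -c "import $1" 2>&1 | tail -n 1
}

# Example, inside an activated virtual environment:
#   import_cost numpy
```

Running the same call with the virtual environment on Lustre and on a local or in-memory file system gives a rough idea of the metadata overhead.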
One solution is to use the tool mksquashfs to compress the contents of a directory - files, inodes and sub-directories - into a single file.
This file can be mounted as a read-only Squashfs file system, which is much faster because a single file is accessed instead of the many small files that were in the original environment.
Step 1: create the virtual environment¶
The first step is to create the virtual environment using the usual workflow.
The recommended way to create a new virtual environment is to use the uv tool, which supports relocatable virtual environments and asynchronous package downloads. The main benefit of a relocatable virtual environment is that it does not need to be created in the final path from where it will be used. This allows the use of shared memory to speed up the creation and initialization of the virtual environment and, since the virtual environment can be used from any location, the resulting squashfs image can be safely shared across projects.
# start the uenv
# in this case the "default" view of prgenv-gnu provides python, cray-mpich,
# and other useful tools
uenv start prgenv-gnu/24.11:v1 --view=default
# create and activate a new relocatable venv using uv
# in this case we explicitly select python 3.12
uv venv -p 3.12 --relocatable --link-mode=copy /dev/shm/sqfs-demo/.venv
cd /dev/shm/sqfs-demo
source .venv/bin/activate
# install software in the virtual environment using uv
# in this case we install pytorch
uv pip install --link-mode=copy torch torchvision torchaudio \
--index-url https://download.pytorch.org/whl/cu126
# optionally, to reduce the import times, precompile all
# python modules to bytecode before creating the squashfs image
python -m compileall -j 8 -o 0 -o 1 -o 2 .venv/lib/python3.12/site-packages
A new virtual environment can also be created using the standard venv module. However, virtual environments created by venv are not relocatable, and thus they need to be created and initialized in the path from where they will be used. This implies that the installation process cannot be optimized for file system performance and will still be slow on Lustre file systems.
# start the uenv
# in this case the "default" view of prgenv-gnu provides python, cray-mpich,
# and other useful tools
uenv start prgenv-gnu/24.11:v1 --view=default
# for the example create a working path on SCRATCH
mkdir $SCRATCH/sqfs-demo
cd $SCRATCH/sqfs-demo
# create and activate the empty venv
python -m venv ./.venv
source ./.venv/bin/activate
# install software in the virtual environment
# in this case we install pytorch
pip install torch torchvision torchaudio \
--index-url https://download.pytorch.org/whl/cu126
how many files did that create?
An inode is created for every file, directory and symlink on a file system. In order to optimise performance, we want to reduce the number of inodes (i.e. the number of files and directories).
The number of inodes can be counted with a short pipeline: find is used to list every path and file, stat is called on each of these to get the inode, and then sort and wc are used to count the number of unique inodes.
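The pipeline described above can be sketched as follows, assuming GNU find and stat. The helper name count_inodes is hypothetical:

```shell
# count_inodes DIR: print the number of unique inodes at and below DIR.
# (count_inodes is a hypothetical helper name.)
count_inodes() {
    # find lists every path, stat prints the inode number of each one,
    # sort -u drops repeated inodes (hard links), wc -l counts the rest
    find "$1" -exec stat --format='%i' {} + | sort -u | wc -l
}

# Example:
#   count_inodes "$SCRATCH/sqfs-demo/.venv"
```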
In our "simple" pytorch example, I counted 22806 inodes!
Step 2: make a squashfs image of the virtual environment¶
The next step is to create a single squashfs file that contains the whole virtual environment folder (i.e. /dev/shm/sqfs-demo/.venv or $SCRATCH/sqfs-demo/.venv).
This is performed using the mksquashfs command, which is installed on all Alps clusters.
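A minimal sketch of this step, assuming the image is named py_venv.squashfs (the name used in step 3) and that the default gzip compressor is used. The helper name make_venv_image is hypothetical:

```shell
# make_venv_image VENV_DIR IMAGE: pack VENV_DIR into the squashfs file IMAGE.
# (make_venv_image is a hypothetical helper name.)
make_venv_image() {
    # -no-recovery: do not generate a recovery file
    # -noappend: do not append to an existing image with the same name
    # -Xcompression-level 3: compression level passed to the compressor
    mksquashfs "$1" "$2" -no-recovery -noappend -Xcompression-level 3
}

# Example for a venv created on SCRATCH:
#   make_venv_image "$SCRATCH/sqfs-demo/.venv" py_venv.squashfs
```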
Hint
The -Xcompression-level flag sets the compression level to a value between 1 and 9, with 9 being the most compressed.
We find that level 3 provides a good trade-off between the size of the compressed image and performance: both uenv and the container engine use level 3.
I am seeing errors of the form Unrecognised xattr prefix...
You can safely ignore the (possibly many) warning messages of this form.
Tip
The default installed version of mksquashfs on Alps does not support the best zstd compression method.
Every uenv contains a better version of mksquashfs, which is used by the uenv to compress itself when it is built.
The exact location inside the uenv depends on the target architecture and version.
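One way to locate it is to search the directory where the uenv is mounted. The sketch below assumes that mount point is /user-environment, which is not stated above and may differ on your system. The helper name find_mksquashfs is hypothetical:

```shell
# find_mksquashfs [ROOT]: list executable files named mksquashfs below ROOT.
# ROOT defaults to /user-environment, an assumed uenv mount point.
# (find_mksquashfs is a hypothetical helper name.)
find_mksquashfs() {
    # errors from unreadable directories are discarded
    find "${1:-/user-environment}" -type f -name mksquashfs -perm -u+x 2>/dev/null
}

# Example, after starting the uenv:
#   find_mksquashfs
```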
Use this version for the best results, though it is also perfectly fine to use the system version.
Step 3: use the squashfs¶
To use the optimised virtual environment, mount the squashfs image at the location of the original virtual environment when starting the uenv.
cd $SCRATCH/sqfs-demo
uenv start --view=default \
prgenv-gnu/24.11:v1,$PWD/py_venv.squashfs:$SCRATCH/sqfs-demo/.venv
source .venv/bin/activate
Remember that virtual environments created by uv are relocatable only if the --relocatable option is passed to the uv venv command, as mentioned in step 1. In that case, the generated environment can be mounted in multiple locations without problems.
cd $SCRATCH/sqfs-demo
uenv start --view=default \
prgenv-gnu/24.11:v1,$PWD/py_venv.squashfs:$SCRATCH/sqfs-demo/.venv
source .venv/bin/activate
Note that the original virtual environment is still installed in $SCRATCH/sqfs-demo/.venv. However, the squashfs image has been mounted on top of it, so the single squashfs file is being accessed instead of the many files in the original version.
A benefit of this approach is that the squashfs file can be copied to a location that is not subject to the Scratch cleaning policy.
Warning
Virtual environments created by venv are not relocatable as they contain symlinks to absolute locations inside the virtual environment. This means that the squashfs file must be mounted in the exact same location where the virtual environment was created.
Step 4: (optional) regenerate the virtual environment¶
The squashfs file is immutable - it is not possible to modify the contents of .venv while it is mounted.
This means that it is not possible to pip install more packages in the virtual environment.
If you need to modify the virtual environment, run the original uenv without the squashfs file mounted, make changes to the virtual environment, and run step 2 again to generate a new image.
Hint
If you save the updated copy in a different file, you can now "roll back" to the old version of the environment by mounting the old image.