Skip to content

Clariden

Todo

Introduction

This page is a cut and paste of some of Todi's old documentation, which we can turn into a template.

Cluster Details

Todo

a standardised table with information about

  • number and type of nodes

and any special notes

Logging into Clariden

Todo

how to log in, i.e. ssh clariden.cscs.ch via ela.cscs.ch

provide the snippet to add to your ~/.ssh/config, and link to where we document this (docs not currently available)

Software and services

Todo

information about CSCS services/tools available

  • container engine
  • uenv
  • CPE
  • ... etc

Running Jobs on Clariden

Clariden uses SLURM as the workload manager, which is used to launch and monitor distributed workloads, such as training runs.

See detailed instructions on how to run jobs on the Grace-Hopper nodes.

Storage

Todo

describe the file systems that are attached, and where.

This is where $SCRATCH, $PROJECT etc are defined for this cluster.

Refer to the specific file systems that these map onto (capstor, iopstor, waldur), and link to the storage docs for these.

Also discuss any specific storage policies. You might want to discuss storage policies for MLp one level up, in the MLp docs.

  • attached storage and policies

Calendar and key events

The system is updated every Tuesday, between 9 am and 12 pm. ...

Todo

notifications

a calendar widget would be useful, particularly if we can have a central calendar, and a way to filter events for specific instances

Change log

special text boxes for updates

they can be opened and closed.

2024-10-15 reservation daint available again

The reservation daint is available again exclusively for Daint users that need to run their benchmarks for submitting their proposals, additionally to the debug partition and free nodes. Please add the Slurm option --reservation=daint to your batch script if you want to use it

2024-10-07 New compute node image deployed

New compute node image deployed to fix the issue with GPU-aware MPI.

Max job time limit is decreased from 12 hours to 6 hours

2024-09-18 Daint users

In order to complete the preparatory work necessary to deliver Alps in production, as of September 18 2024 the vCluster Daint on Alps will no longer be accessible until further notice: the early access will still be granted on Tödi using the Slurm reservation option --reservation=daint

Known issues

TODO list of know issues - include links to known issues page