Job management with Slurm
We use Slurm as our workload manager. It allocates jobs to the available resources on the cluster. In order to make use of the cluster's compute nodes, you need to work with Slurm commands. In the sections below we go through the basic Slurm commands. For advanced sbatch script usage, please check out this [dedicated page].
Job submission
In this example we submit a job script to Slurm with sbatch. Slurm will look for a compute node with available resources to run this script. See the [dedicated page] on how to configure sbatch scripts.
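A minimal submission could look like the following sketch (the script name my_job.sh is a placeholder for your own sbatch script):

```bash
# Submit the job script to Slurm, requesting 4 CPU cores with -c
sbatch -c 4 my_job.sh
```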
Here, we allocate 4 CPU cores to our job with the "-c" argument. So in this case, our script will be executed as a job with 4 CPU cores.
Note
With the current cluster configuration, every core you allocate automatically gives you 4 GB of memory. For example, if you request 4 CPU cores you will get 16 GB of memory.
GPU submission
GPU nodes can be accessed by requesting the gpu partition with "-p":
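A sketch of a GPU submission, assuming one GPU is requested via the standard --gres option (the GPU count and script name are placeholders):

```bash
# Submit to the gpu partition and request one GPU
sbatch -p gpu --gres=gpu:1 my_gpu_job.sh
```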
Cancel your job
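Assuming the standard Slurm scancel command, a queued or running job can be cancelled by its job ID:

```bash
# Cancel a job by its job ID (replace <jobid> with the ID reported by sbatch or squeue)
scancel <jobid>
```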
Interactive
If you would like to access your job in a terminal:
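For example, using srun's standard --pty option to start an interactive shell (the 4-core request mirrors the sbatch example above):

```bash
# Start an interactive bash session on a compute node with 4 CPU cores
srun -c 4 --pty bash -i
```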
Or on a GPU node:
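As a sketch, combining the gpu partition with a GPU request as in the sbatch example above:

```bash
# Start an interactive session on a GPU node with one GPU
srun -p gpu --gres=gpu:1 --pty bash -i
```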
Note
Exit the interactive session with Ctrl-D.
Slurm system information
To see the available partitions, nodes, and their status:
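```bash
# Show partitions, their nodes, and the node states
sinfo
```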
Checking job status
You should be able to find your own jobs in the queue:
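For example, filtering the queue to your own user (the --me shorthand is available on recent Slurm versions):

```bash
# Show only your own jobs in the queue
squeue -u $USER
# or, on recent Slurm versions, equivalently:
squeue --me
```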
These jobs might be running, or waiting for available resources or for other dependent jobs. If they are stuck, you should see the reason at the end of the row, e.g. (ReqNodeNotAvail).
You can check the status of a job with:
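For example, with Slurm's scontrol (replace <jobid> with your job's ID):

```bash
# Show detailed information for a specific job
scontrol show job <jobid>
```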
Job efficiency
It is recommended to monitor the efficiency of your jobs. Over-allocating resources can lead to longer queue times and an increased likelihood of job suspension due to system policies. Under-allocating resources increases the risk of job failure due to insufficient memory or CPU.
Check CPU Efficiency and Memory Efficiency and adjust your jobs if necessary.
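One common way to check this, assuming the seff utility shipped with Slurm's contribs is installed on the cluster, is to run it on a completed job:

```bash
# Report CPU Efficiency and Memory Efficiency for a finished job
seff <jobid>
```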