Job management with Slurm
We use Slurm as our workload manager. It allocates jobs to the available resources on the cluster. In order to make use of the cluster's compute nodes, you need to work with Slurm commands. In the sections below we go through the basic Slurm commands. For advanced sbatch script usage, please check out this [dedicated page].
Job submission
In this example we submit a job script to Slurm with sbatch. Slurm will look for a compute node with available resources to run this script. See the [dedicated page] on how to configure sbatch scripts.
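A minimal submission could look like the following sketch (the script name my_job.sh is a placeholder for your own sbatch script):

```bash
# Submit the job script to Slurm, requesting 4 CPU cores with -c
sbatch -c 4 my_job.sh
```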
Here, we allocate 4 CPU cores to our job with the "-c" argument. So in this case, our script will be executed as a job with 4 CPU cores.
Note
With the current cluster configuration, every core you allocate automatically gives you 4 GB of memory. For example, if you request 4 CPU cores you will get 16 GB of memory.
GPU submission
GPU nodes can be accessed by requesting the gpu partition with "-p":
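A sketch of a GPU submission, assuming one GPU is requested via the standard --gres option (the GPU count and script name are placeholders):

```bash
# Submit to the gpu partition and request one GPU
sbatch -p gpu --gres=gpu:1 my_gpu_job.sh
```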
Cancel your job
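Assuming the standard Slurm scancel command, a queued or running job can be cancelled by its job ID:

```bash
# Cancel a job by its job ID (replace <jobid> with the ID reported by sbatch or squeue)
scancel <jobid>
```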
Interactive
If you would like to access your job in a terminal:
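For example, using srun's standard --pty option to start an interactive shell (the 4-core request mirrors the sbatch example above):

```bash
# Start an interactive bash session on a compute node with 4 CPU cores
srun -c 4 --pty bash -i
```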
Or on a GPU node:
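As a sketch, combining the gpu partition with a GPU request as in the sbatch example above:

```bash
# Start an interactive session on a GPU node with one GPU
srun -p gpu --gres=gpu:1 --pty bash -i
```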
Note
Exit the interactive session with Ctrl-D.
Slurm system information
To see the available partitions, nodes, and their status:
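```bash
# Show partitions, their nodes, and the node states
sinfo
```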
Checking job status
You should be able to find your own jobs in the queue:
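For example, filtering the queue to your own user (the --me shorthand is available on recent Slurm versions):

```bash
# Show only your own jobs in the queue
squeue -u $USER
# or, on recent Slurm versions, equivalently:
squeue --me
```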
These jobs might be running, or waiting for available resources or for other dependent jobs. If they are stuck, you should see the reason at the end of the row, e.g. (ReqNodeNotAvail).
You can check the status of a job with:
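For example, with Slurm's scontrol (replace <jobid> with your job's ID):

```bash
# Show detailed information for a specific job
scontrol show job <jobid>
```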
Job efficiency
It is recommended to monitor the efficiency of your jobs. Over-allocating resources can lead to longer queue times and an increased likelihood of job suspension due to system policies. Under-allocating resources increases the risk of job failure due to insufficient memory or CPU.
Check CPU Efficiency and Memory Efficiency and adjust your jobs if necessary.
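One common way to check this, assuming the seff utility shipped with Slurm's contribs is installed on the cluster, is to run it on a completed job:

```bash
# Report CPU Efficiency and Memory Efficiency for a finished job
seff <jobid>
```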