Introduction to Slurm
SLURM: A job scheduler
A simple job submission script for Slurm (job_serial.sh):
#!/bin/bash
#SBATCH -t 0-00:10     # time limit (D-HH:MM)
#SBATCH -A user        # account to charge
#SBATCH --mem=4G       # total memory for the job
./serial_hello
sleep 10m
- Then you can submit the script with:
sbatch job_serial.sh
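If the submission succeeds, sbatch prints the assigned job ID and, by default, writes the job's output to slurm-<jobid>.out in the submission directory:
# Typical response from sbatch:
#   Submitted batch job <jobid>
# Inspect the output once the job has run:
cat slurm-<jobid>.out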
- With the following commands, users can view the job queue, cancel specific jobs, and inspect or control individual jobs:
# Show what's waiting in the queue
squeue -u user_name
# show expected start time of jobs
squeue --start
# cancel a specific job
scancel job_id
# cancel all jobs by a specific user
scancel -u user_name
# Cancel all the pending jobs for the user
scancel -t PENDING -u <username>
# Cancel one or more jobs by name
scancel --name=<jobname>
# This one provides more details about a specified job.
scontrol show job <jobid>
# Hold a job, prevent it from starting
scontrol hold <jobid>
# Release a job hold, allowing the job to continue
scontrol release <jobid>
# Requeue a running, suspended or finished job
scontrol requeue <jobid>
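As a further illustration, squeue's -o/--format option lets you choose which columns are shown; the sketch below uses a handful of common format codes (job ID, partition, name, state, elapsed time, and node list/reason):
# Custom queue view for your own jobs
squeue -u $USER -o "%.10i %.12P %.25j %.8T %.10M %R"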
The sacct command is a flexible way to explore the properties of jobs that are queued, running, or complete (in any state). The following call to sacct provides formatted information about the jobs for the specified user that were in any state during a specified interval:
sacct -aX -u guest171 -S2020-06-14T18:50 -o jobid,jobname%36,submit,start,state
For a list of output format fields available from sacct, call it with the --helpformat flag as follows:
sacct --helpformat
# Output:
Account AdminComment AllocCPUS AllocGRES
AllocNodes AllocTRES AssocID AveCPU
AveCPUFreq AveDiskRead AveDiskWrite AvePages
AveRSS AveVMSize BlockID Cluster
Comment ConsumedEnergy ConsumedEnergyRaw CPUTime
CPUTimeRAW DerivedExitCode Elapsed ElapsedRaw
Eligible End ExitCode GID
Group JobID JobIDRaw JobName
Layout MaxDiskRead MaxDiskReadNode MaxDiskReadTask
MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask MaxPages
MaxPagesNode MaxPagesTask MaxRSS MaxRSSNode
MaxRSSTask MaxVMSize MaxVMSizeNode MaxVMSizeTask
McsLabel MinCPU MinCPUNode MinCPUTask
NCPUS NNodes NodeList NTasks
Priority Partition QOS QOSRAW
ReqCPUFreq ReqCPUFreqMin ReqCPUFreqMax ReqCPUFreqGov
ReqCPUS ReqGRES ReqMem ReqNodes
ReqTRES Reservation ReservationId Reserved
ResvCPU ResvCPURAW Start State
Submit Suspended SystemCPU Timelimit
TotalCPU UID User UserCPU
WCKey WCKeyID WorkDir
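For example, to summarize a single job using a few of the fields listed above:
# Summary of one job (fields taken from the list above)
sacct -j <jobid> -o JobID,JobName%30,Elapsed,TotalCPU,MaxRSS,State,ExitCode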
- Working with modules (adding frequently used modules to your ~/.bashrc is also good practice):
# See what modules are loaded
module list
# See available modules, in general or for a specific module
module avail
module avail gcc
# See available modules with more info.
module spider gcc
module spider gcc/8.3.0
# Load module
module load gcc/9.1.0
# Reset modules
module reset
# Save the currently loaded modules as a named collection
module save list_name
# View the modules saved in a collection
module describe list_name
# Restore the modules in a module list
module restore list_name
# I mostly use these modules (add to bashrc):
module load cuda/10.2.89
module load gcc/8.4.0
module load ibm-java
module load openmpi/4.0.3
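One way to combine the two ideas above is to save the frequently used set as a named collection and restore it from ~/.bashrc; the collection name my_default below is just an example:
# Load the modules once, then save them as a collection
module save my_default
# In ~/.bashrc (or at the top of a job script):
module restore my_default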
- To run a parallel job for 1 hour, create this file (job.sh):
#!/bin/bash
#SBATCH -A <account>        # account to charge
#SBATCH -t 0-01:00          # time limit (D-HH:MM)
#SBATCH -n 64               # number of MPI tasks
#SBATCH --mem-per-cpu=1G    # memory per core
srun ./mpi_application
- Then submit it with this command:
sbatch job.sh
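While the job is running, the sstat command can report live resource usage for its steps; the sketch below assumes the srun step above is step 0:
# Live statistics for a running job step
sstat --format=JobID,AveCPU,MaxRSS,MaxVMSize -j <jobid>.0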
Other Slurm script directives
# | Command | Description |
---|---|---|
1 | #SBATCH --ntasks=1 or #SBATCH -n 1 | Requests 1 task; by default, 1 CPU per task |
2 | #SBATCH --time=0-05:00 | Sets a maximum runtime of 5 hours for the job |
3 | #SBATCH --mail-user=<email> | Sets the email address for notifications about the job state |
4 | #SBATCH --mail-type=BEGIN | Sends an email when the job enters the given state |
5 | #SBATCH --job-name=<jobname> | Sets the job name |
6 | #SBATCH --ntasks=X | Requests X tasks. When cpus-per-task=1, this requests X cores |
7 | #SBATCH --nodes=X | Requests a minimum of X nodes |
8 | #SBATCH --nodes=X-Y | Requests a minimum of X nodes and a maximum of Y nodes |
9 | #SBATCH --cpus-per-task=X | Requests X CPUs per task for this job |
10 | #SBATCH --ntasks-per-node=X | Requests that X tasks be allocated per node |
11 | #SBATCH --mem=4000 | Requests 4000 MB of memory per node |
12 | #SBATCH --mem-per-cpu=4000 | Requests 4000 MB of memory per CPU |
13 | #SBATCH --gres=gpu:4 | Requests 4 GPUs per node (the GRES syntax may differ between clusters) |
14 | #SBATCH --exclusive | Requests that the job run only on nodes with no other running jobs |
15 | #SBATCH --dependency=after:<job1_id> | Requests that the job start after job1 has started |
16 | #SBATCH --dependency=afterany:<job1_id>:<job2_id> | Requests that the job start after job1 and job2 have finished (in any state) |
17 | #SBATCH --dependency=afterok:<job1_id> | Requests that the job start after job1 has finished successfully |
18 | #SBATCH --account=<account> | Submits the job to a specific accounting group or role |
19 | #SBATCH --tmp=200G | Asks for 200 GB of temporary (local) disk space |
20 | #SBATCH -o <my-output-file.out> | Saves the job output to a specific file |
- For number 4, we can set other states for job notifications:
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-type=ALL
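Putting several of these directives together, a batch script might look like the following sketch; account, email, job name, output file, and the application are placeholders:
#!/bin/bash
#SBATCH --job-name=<jobname>
#SBATCH --account=<account>
#SBATCH --time=0-05:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --mem-per-cpu=4000
#SBATCH --mail-user=<email>
#SBATCH --mail-type=ALL
#SBATCH -o <my-output-file.out>
srun ./mpi_application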
- Slurm job environment variables
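Some commonly available variables inside a job script include the following (a non-exhaustive sketch; variables tied to an option are only set when that option is requested):
echo $SLURM_JOB_ID          # numeric job ID
echo $SLURM_JOB_NAME        # job name
echo $SLURM_NTASKS          # number of tasks (-n / --ntasks)
echo $SLURM_CPUS_PER_TASK   # CPUs per task (-c / --cpus-per-task)
echo $SLURM_JOB_NODELIST    # nodes allocated to the job
echo $SLURM_SUBMIT_DIR      # directory sbatch was called from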
- Get account information:
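A sketch of commands commonly used for this; exact availability depends on the cluster's accounting setup:
# Accounts/associations you belong to
sacctmgr show associations user=$USER format=Account,User,Partition,QOS
# Fair-share usage summary for your accounts
sshare -U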
- Get system information:
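For example (a minimal sketch):
# Partitions, their limits, and node states
sinfo
# Per-node view with CPU/memory details
sinfo -N -l
# Full details of one node
scontrol show node <nodename>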
Interactive jobs
- Using the salloc command instead of sbatch, there is no need to create a batch file; you can submit your job directly from the command line.
# Options: 10 minutes, 1 cpu core,
# 4 GB of total memory, and X11 forwarding enabled (for GUI applications):
salloc --x11 --time=0-00:10 -n 1 -A user --mem=4G
# After the previous command, you can run your application
./application
# For an MPI app. Options: 1 hour, 4 cpu cores,
# 4 GB of memory per core, and X11 forwarding enabled
salloc --x11 --time=0-01:00 -n 4 -A user --mem-per-cpu=4G
mpirun -n 4 ./application
# For an OpenMP app:
# 4 hardware threads, 4 GB of memory in total
salloc --x11 --time=0-01:00 -c 4 -A user --mem=4G
export OMP_NUM_THREADS=4
./openmp_app
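# (Optional) rather than hard-coding the thread count, a common pattern
# is to reuse the value requested with -c / --cpus-per-task:
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK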
# For GPU allocation:
# 1 node and 1 GPU
# On the Graham cluster of Compute Canada the default GPU type is the P100
# For more info see [here](https://docs.computecanada.ca/wiki/Graham) and [here](https://docs.computecanada.ca/wiki/Using_GPUs_with_Slurm)
salloc --x11 --time=0-01:00 -n 1 -A user --mem=4G --gres=gpu:1
salloc --x11 --time=0-01:00 -n 1 -A user --mem=4G --gres=gpu:v100:1
salloc --x11 --time=0-01:00 -n 1 -A user --mem=4G --gres=gpu:t4:1
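Once the allocation is granted, a quick way to confirm that a GPU is actually visible in the session (assuming an NVIDIA GPU, as on the clusters above):
nvidia-smi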
How to check the GPU-hours used by a specific account:
sreport cluster accountutilizationbyuser cluster=mist account=accountName start=2020-04-01 end=2020-06-18 -t Hour --tres=GRES/gpu
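The same report can be produced for CPU usage by switching the tracked resource; the cluster and account names below are the same placeholders used above:
# CPU-hours for the same account and period
sreport cluster accountutilizationbyuser cluster=mist account=accountName start=2020-04-01 end=2020-06-18 -t Hour --tres=cpu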