HPC cluster: select the number of CPUs and threads in SLURM sbatch
--ntasks=#: Number of tasks (use with distributed parallelism).
--ntasks-per-node=#: Number of tasks per node (use with distributed parallelism).
--cpus-per-task=#: Number of CPUs allocated to each task (use with shared memory parallelism).
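A quick way to see which of these options a job actually received is to print the environment variables Slurm exports for them. Slurm sets SLURM_NTASKS_PER_NODE only when --ntasks-per-node is requested and SLURM_CPUS_PER_TASK only when --cpus-per-task is requested. This is a minimal sketch: the resource values (4 tasks, 2 per node, 6 CPUs per task) are arbitrary illustrations, and the "unset" fallback is just a convention so the script can also be run outside a job.

```shell
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=6

# Slurm exports these variables only for the options that were requested;
# the ":-unset" fallback keeps the script runnable outside a Slurm job.
ntasks="${SLURM_NTASKS:-unset}"
ntasks_per_node="${SLURM_NTASKS_PER_NODE:-unset}"
cpus_per_task="${SLURM_CPUS_PER_TASK:-unset}"

echo "ntasks=$ntasks ntasks-per-node=$ntasks_per_node cpus-per-task=$cpus_per_task"
```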
From this question: if every node has 24 cores, is there any difference between these commands?
sbatch --ntasks 24 [...]
sbatch --ntasks 1 --cpus-per-task 24 [...]
Answer: (by Matthew Mjelde)
Yes, there is a difference between those two submissions. You are correct that --cpus-per-task is usually used for multithreading, but let's look at your commands:
For your first example, sbatch --ntasks 24 [...] will allocate a job with 24 tasks. Each task in this case gets only 1 CPU (the default), but the tasks may be split across multiple nodes. So you get a total of 24 CPUs, possibly spread across multiple nodes.
For your second example, sbatch --ntasks 1 --cpus-per-task 24 [...] will allocate a job with 1 task and 24 CPUs for that task. Thus you will get a total of 24 CPUs on a single node.
In other words, a task cannot be split across multiple nodes. Therefore, using --cpus-per-task ensures that all the CPUs are allocated on the same node, while using --ntasks can (and may) spread them across multiple nodes.
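To see how Slurm actually placed the tasks of the first submission, one can launch one hostname per task and count how many tasks landed on each node. A minimal sketch, assuming a standard Slurm installation where srun is available inside the allocation; the fallback branch only exists so the script can be dry-run elsewhere.

```shell
#!/bin/bash
#SBATCH --ntasks=24

# With --ntasks alone, the 24 one-CPU tasks may land on one node or on several.
# Launch one `hostname` per task and count tasks per node to see the placement.
if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
    placement="$(srun hostname | sort | uniq -c)"
else
    # Fallback so the script can be dry-run outside a Slurm allocation.
    placement="not running inside a Slurm job"
fi
echo "$placement"
```

Inside a real job, each output line is a task count followed by a node name; a single line means all 24 tasks happened to fit on one node.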
Another good Q&A from CÉCI's support website: suppose you need 16 cores. Here are some use cases:
- you use MPI and do not care where those cores are distributed: --ntasks=16
- you want to launch 16 independent processes (no communication): --ntasks=16
- you want those cores to spread across distinct nodes: --ntasks=16 and --ntasks-per-node=1, or --ntasks=16 and --nodes=16
- you want those cores to spread across distinct nodes and no interference from other jobs: --ntasks=16 --nodes=16 --exclusive
- you want 16 processes to spread across 8 nodes, with two processes per node: --ntasks=16 --ntasks-per-node=2
- you want 16 processes to stay on the same node: --ntasks=16 --ntasks-per-node=16
- you want one process that can use 16 cores for multithreading: --ntasks=1 --cpus-per-task=16
- you want 4 processes that can use 4 cores each for multithreading: --ntasks=4 --cpus-per-task=4
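The last use case (4 processes with 4 cores each) is a hybrid layout: several tasks, each running several threads. A minimal batch-script sketch, assuming the program uses OpenMP threads (so OMP_NUM_THREADS controls the thread count) and that ./my_hybrid_program is a hypothetical executable you replace with your own:

```shell
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=4

# Inside a job Slurm sets these from the directives above; the defaults
# mirror the directives so the arithmetic can be checked outside Slurm too.
tasks="${SLURM_NTASKS:-4}"
threads="${SLURM_CPUS_PER_TASK:-4}"

# Each task's CPUs sit on one node, so each task can run that many threads.
export OMP_NUM_THREADS="$threads"
total=$(( tasks * threads ))
echo "$tasks tasks x $threads threads = $total cores"

if [ -n "${SLURM_JOB_ID:-}" ]; then
    # Hypothetical executable; replace with your own MPI + OpenMP program.
    srun --cpus-per-task="$threads" ./my_hybrid_program
fi
```

Passing --cpus-per-task to srun explicitly avoids relying on srun inheriting the value from sbatch, a behavior that has changed in some Slurm releases.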