Question: total memory usage (--mem) exceeded the RealMemory in pcluster with SLURM


I have set RealMemory=58000 (although the instances have 64 GB of memory) in the file slurm_parallelcluster_queue_partition.conf.
When I run the following commands:
sbatch -N 1 -n 1 --mem=40000 --wrap="srun first_task"
sbatch -N 1 -n 1 --mem=40000 --wrap="srun second_task"
(and more jobs of the same form)

They are all assigned to a single instance, which leads to 'Cannot allocate memory' errors in Java. I assumed they would be assigned to two instances, since the total requested memory already exceeds RealMemory.
Am I misunderstanding something? Or do you have any suggestions about this unexpected behavior?
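
To make the arithmetic behind that expectation concrete, here is a minimal sketch. The function fits_on_one_node is a hypothetical helper written only for illustration, not part of Slurm or ParallelCluster; it simply sums the --mem requests (in MB) and compares the total with RealMemory.

```shell
#!/bin/sh
# Hypothetical helper (not a Slurm command): succeeds if the summed
# --mem requests (MB) fit within a node's RealMemory (MB).
fits_on_one_node() {
    real_memory=$1
    shift
    total=0
    for req in "$@"; do
        total=$((total + req))
    done
    [ "$total" -le "$real_memory" ]
}

# Values from this post: RealMemory=58000, two jobs with --mem=40000 each.
if fits_on_one_node 58000 40000 40000; then
    echo "fits on one node"
else
    echo "needs more than one node"   # 80000 MB > 58000 MB
fi
```

As far as I understand, Slurm only applies this kind of per-node accounting when memory is configured as a consumable resource (for example SelectTypeParameters=CR_CPU_Memory in slurm.conf). Whether that is the case on this cluster is an assumption here; the output of scontrol show config should show the SelectType settings actually in effect.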

Also, these tasks are never assigned to a 32G instance, so RealMemory and --mem do appear to be taken into account to some extent.

Thanks very much.
