Batch script to run many serial jobs in parallel on an HPC cluster with Slurm
I want to run a large number of independent serial jobs in parallel using Slurm, but the cluster caps each user at 100 submitted jobs, so my script below can never keep more than 100 jobs in the queue at a time.
Is there a better way to submit the complete simulation as one big job? (A sketch of the kind of single submission I have in mind follows my current script below.)
#!/bin/bash
max_jobs=100
# Set the directory where the simulation folders are located
dir="/work/parameter_study/"
# Loop over the parameter cases
for param_case in {0001..0216}_sim; do
    cd "$dir/$param_case" || exit 1
    # Loop over the Monte Carlo simulations
    for mcs_case in {0001..1500}_MCS; do
        cd "$dir/$param_case/$mcs_case" || exit 1
        #sed -i -e 's/\r$//' a.out
        # The executable bit is enough; chmod 777 is needlessly permissive
        chmod +x a.out
        # Throttle: block while this user's pending/running job count
        # would exceed max_jobs (-u "$USER" counts only my own jobs)
        while [ "$(squeue -h -u "$USER" -t PD,R | wc -l)" -ge "$max_jobs" ]; do
            # Poll gently; querying squeue every 0.5 s hammers the scheduler
            sleep 5
        done
        # Submit a job for each simulation using the a.out file;
        # --parsable makes sbatch print only the job ID
        jobID=$(sbatch --parsable -p single -J "${param_case}_${mcs_case}" --wrap="./a.out")
        echo "${jobID} ${param_case} ${mcs_case} - $(date '+%H:%M:%S')"
    done
done
# 'wait' only waits for local background processes and would return
# immediately here, so poll squeue until all submitted jobs have finished
while [ "$(squeue -h -u "$USER" -t PD,R | wc -l)" -gt 0 ]; do
    sleep 30
done
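For reference, this is the kind of single submission I have in mind: a Slurm job array that maps one flat task index onto each (parameter case, MCS run) pair and throttles itself to 100 concurrently running tasks. It is only a rough sketch; the file name array_job.sh is made up, and it assumes the cluster's MaxArraySize admits 324000 array tasks (many sites cap this far lower, in which case the range would have to be submitted in chunks) and that array tasks are not each counted against the 100-job submission limit.

#!/bin/bash
#SBATCH -p single
#SBATCH -J parameter_study
# 216 parameter cases x 1500 MCS runs = 324000 tasks, at most 100 running at once
#SBATCH --array=0-323999%100

dir="/work/parameter_study"
# Map the flat array index back to a (param_case, mcs_case) pair
param_case=$(printf '%04d_sim' $(( SLURM_ARRAY_TASK_ID / 1500 + 1 )))
mcs_case=$(printf '%04d_MCS' $(( SLURM_ARRAY_TASK_ID % 1500 + 1 )))

cd "$dir/$param_case/$mcs_case" || exit 1
./a.out

Submitted once with sbatch array_job.sh, this would replace the whole submission loop, and squeue shows the waiting tasks collapsed into a single array entry.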