hpc – Slurm – Execute a large number of serial jobs in parallel

Batch script to run many serial jobs in parallel on an HPC cluster with Slurm

I want to run a large number of independent serial jobs in parallel using Slurm. However, the cluster limits each user to 100 submitted jobs, so my script throttles itself and never has more than 100 jobs queued or running at the same time.

Is there a better way, so that I could submit the complete simulation as one big job? My current submission script looks like this:

#!/bin/bash

max_jobs=100

# Set the directory where the simulation folders are located
dir="/work/parameter_study/"

# Loop over the parameter cases
for param_case in {0001..0216}_sim; do
    cd $dir/$param_case
    
    # Loop over the Monte Carlo simulations
    for mcs_case in {0001..1500}_MCS; do
        cd $dir/$param_case/$mcs_case
        
        #sed -i -e 's/\r$//' a.out
        chmod 777 a.out
        
        # Wait until this user has fewer than max_jobs jobs pending or running
        while true
        do
            # Count this user's pending and running jobs (squeue -h suppresses the header)
            job_count=$(squeue -h -u "$USER" -t PD,R | wc -l)

            if [ "$job_count" -lt "$max_jobs" ]
            then
                break
            fi

            sleep 0.5
        done


        # Submit one job per simulation, running the a.out in this directory
        jobID=$(sbatch -p single -J "${param_case}_${mcs_case}" --wrap ./a.out)
        echo "${jobID} ${param_case} ${mcs_case} - $(date '+%H:%M:%S')"
        
    done
done

# Wait until all of this user's jobs have left the queue
# (the shell builtin "wait" returns immediately here, because sbatch jobs are not child processes)
while [ "$(squeue -h -u "$USER" | wc -l)" -gt 0 ]; do
    sleep 10
done
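
For reference, Slurm's built-in mechanism for bundling many independent runs into one submission is a job array. The following is only a minimal sketch, assuming that each task just needs to run ./a.out in its simulation directory and that the cluster's MaxArraySize (visible via scontrol show config) allows 216 * 1500 = 324000 array tasks; if not, the index range would have to be split into chunks. The %100 throttle lets at most 100 array tasks run at once.

#!/bin/bash
#SBATCH -p single
#SBATCH -J parameter_study
#SBATCH --array=0-323999%100   # 216 * 1500 tasks, at most 100 running at once

dir="/work/parameter_study"

# Map the flat array index back to a (parameter case, MCS case) pair
param_idx=$(( SLURM_ARRAY_TASK_ID / 1500 + 1 ))
mcs_idx=$((  SLURM_ARRAY_TASK_ID % 1500 + 1 ))

param_case=$(printf '%04d_sim' "$param_idx")
mcs_case=$(printf '%04d_MCS' "$mcs_idx")

cd "$dir/$param_case/$mcs_case" || exit 1
./a.out

This would be submitted once with sbatch. Whether the 100-job limit counts the array as a single job or counts every array task individually depends on the site's configuration, so that is worth checking with the administrators before relying on it.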
