python – Parallelize or qsub a bash script

Background

Just for everyone else: SGE = Sun Grid Engine, the batch system behind the qsub mentioned in the title of the question.

qsub:

If by parallelising you mean multithreading a single job across many cores, the first thing to find out is how many cores each node has.

Usually it is around 10 (but it could be much more). You need to know that exact number, because if you request more cores than exist on a single node the job will never run: at best qsub will refuse the job, at worst it will sit in the queue forever. For multi-processing on qsub it's …

qsub -pe omp 8 myscript.sh

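Here it's requesting 8 cores, and myscript.sh contains the shell command that runs the Python script. The parallel environment name (omp here) is site-specific; qconf -spl lists the ones your cluster defines. If there's a 10-core node with 2 cores in use, the scheduler can place the job there, filling that node to its maximum capacity.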
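On the Python side, the script that myscript.sh launches should not start more worker processes than the slots it was granted. A minimal sketch, assuming the job was submitted with -pe as above: SGE sets the NSLOTS environment variable to the number of granted slots, and the script sizes its multiprocessing pool from that rather than from os.cpu_count(), which reports every core on the node, including ones given to other jobs. The functions slots_granted and work are hypothetical placeholders for your own code.

```python
import os
from multiprocessing import Pool


def slots_granted(default: int = 1) -> int:
    """Number of slots SGE granted this job (NSLOTS), or `default` outside SGE."""
    value = os.environ.get("NSLOTS")
    if value is None or not value.isdigit() or int(value) < 1:
        return default
    return int(value)


def work(x: int) -> int:
    # Hypothetical stand-in for the real per-item computation.
    return x * x


def main() -> None:
    n_workers = slots_granted()
    # One worker process per granted slot: starting more would oversubscribe
    # cores the scheduler may have handed to someone else's job.
    with Pool(processes=n_workers) as pool:
        results = pool.map(work, range(20))
    print(f"{n_workers} worker(s), total = {sum(results)}")


if __name__ == "__main__":
    main()
```

Run outside the scheduler, it falls back to a single worker; under qsub -pe omp 8 it starts 8.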

So a request like this does not give you coarse-grained parallelisation across nodes; that is what MPI is for. It will only parallelise across the cores in a single node, which is without question a limitation for cluster computing.
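Because the request cannot span nodes, the largest node sets the upper limit on how many cores you can ask for. One way to find it is the NCPU column of qhost. A minimal sketch, assuming the default qhost layout (a header line, a dashed separator, a "global" pseudo-host, then one line per execution host with NCPU in the third column and "-" for hosts that are down); max_cores_per_node and cluster_max_cores are hypothetical helper names.

```python
import subprocess


def max_cores_per_node(qhost_text: str) -> int:
    """Largest NCPU value in `qhost` output (0 if no host reports one)."""
    best = 0
    for line in qhost_text.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator, blank lines and the
        # 'global' pseudo-host; only real execution hosts remain.
        if len(fields) < 3 or fields[0] in ("HOSTNAME", "global"):
            continue
        ncpu = fields[2]
        if ncpu.isdigit():  # hosts that are down report '-'
            best = max(best, int(ncpu))
    return best


def cluster_max_cores() -> int:
    """Run qhost (needs SGE on PATH) and return the biggest node's core count."""
    out = subprocess.run(["qhost"], capture_output=True, text=True, check=True).stdout
    return max_cores_per_node(out)
```

A -pe omp N request with N above cluster_max_cores() can never be scheduled. NCPU is what the host reports; your site's parallel environment and queue slot settings may allow fewer.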

Using qsub is more than just submitting to a queue: you need to monitor the queuing system before and after submission. Beforehand, to see how many jobs already exist across the system; afterwards, to see what is happening to your own jobs.

qstat is the way to see what's there:

qstat -f                  # what's what on the cluster
qstat -g c                # lists all the cluster queues with a usage summary
qstat -u username         # looks at what a given user is doing
qstat -f -q long-queue    # targets availability on a specific queue (long-queue is an example name)

A complete list of qstat options is here. Also qusage -l is useful.
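To watch the queue from a script rather than by eye, qstat can emit XML with -xml. A minimal sketch, assuming the layout SGE's qstat -xml usually produces (running jobs under queue_info, pending ones under an inner job_info, each as a job_list element with a state child); job_states and user_job_states are hypothetical helper names.

```python
import subprocess
import xml.etree.ElementTree as ET
from collections import Counter


def job_states(qstat_xml: str) -> Counter:
    """Count jobs by their SGE state code, e.g. 'r', 'qw' or 'Eqw'.

    Running and pending jobs both appear as <job_list> elements with a
    <state> child, so iterating over every <job_list> covers both.
    """
    root = ET.fromstring(qstat_xml)
    return Counter(job.findtext("state", default="?") for job in root.iter("job_list"))


def user_job_states(username: str) -> Counter:
    """Summarise one user's jobs via `qstat -u USER -xml` (needs SGE on PATH)."""
    out = subprocess.run(
        ["qstat", "-u", username, "-xml"],
        capture_output=True, text=True, check=True,
    ).stdout
    return job_states(out)
```

A wrapper could, for example, poll user_job_states("username") and hold back further submissions while many jobs are still in qw, or flag any job stuck in Eqw (error) state.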


“Parallelising” loops

If you are submitting lots of jobs that all run independently of each other, it is better to use a different strategy on qsub: submit each job to the queue separately via a submission loop. There would then be two scripts. The first is the script in the question; the second is a submission loop that calls qsub once per job, submitting them one after another. In other words, the loop over your array of inputs belongs in the qsub submission loop rather than inside the job.
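The submission loop can be as simple as a shell for-loop around qsub. Below is a Python sketch of the same idea, assuming the job script (myscript.sh here) reads its input from its first argument; qsub_commands, submit_all, the job_N naming and the input file names are all hypothetical.

```python
import shlex
import subprocess
from typing import Iterable, List


def qsub_commands(job_script: str, inputs: Iterable[str], name_prefix: str = "job") -> List[List[str]]:
    """One single-core qsub command per independent input.

    qsub passes anything after the script name to the script as
    arguments, so each job sees its input as $1.
    """
    return [
        ["qsub", "-N", f"{name_prefix}_{i}", job_script, item]
        for i, item in enumerate(inputs)
    ]


def submit_all(job_script: str, inputs: Iterable[str], dry_run: bool = True) -> List[str]:
    """Submit each job in turn, or with dry_run just print the qsub lines."""
    submitted = []
    for cmd in qsub_commands(job_script, inputs):
        line = shlex.join(cmd)
        print(line)
        if not dry_run:
            subprocess.run(cmd, check=True)  # one qsub call per job
        submitted.append(line)
    return submitted


if __name__ == "__main__":
    # Dry run by default: prints the qsub lines without submitting anything.
    submit_all("myscript.sh", ["input_1.csv", "input_2.csv", "input_3.csv"])
```

Each job requests a single core, which is the point of this strategy. SGE can also submit the same independent work as one array job with qsub -t 1-N, where each task reads its index from the SGE_TASK_ID environment variable.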

Monitoring the queue takes place as per normal via qstat.

Rationale

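The reason for this is the way the queue works: qsub effectively prioritises single jobs over multi-threaded parallelisation. A one-core job can start on whichever core frees up, whereas an 8-core job has to wait until 8 cores are free on the same node, so you usually get your results much quicker.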
