Parallel downloads from SRA with SRA Toolkit, or other ways to speed up downloads

Is there a way to parallelize downloads from NCBI using SRA Toolkit on an HPC cluster? I tried using GNU parallel, but I cannot tell whether the downloads are actually making progress:

    parallel -j 4 fasterq-dump --threads 4 --progress {} < /home/ptellier/scratch/phillip/data/escc_data/SRA_accessions.txt

Unlike a plain "fasterq-dump --progress" run, I can't see any progress output when I parallelize it.
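
On the missing progress output: by default GNU parallel holds back each job's output until that job has finished, so a long fasterq-dump run looks silent. Below is a minimal sketch, assuming GNU parallel is installed. The names `run_downloads`, `DRY_RUN` and `fasterq_jobs.log` are made up for this example. `--line-buffer` prints output line by line as it arrives, `--tag` prefixes each line with the accession, and `--joblog` records the exit status of every job. Since the fasterq-dump progress bar is drawn within a line, it may still appear only as each line completes; `--ungroup` prints everything immediately, at the cost of interleaved output.

```shell
#!/usr/bin/env bash
# Sketch only: run_downloads, DRY_RUN and fasterq_jobs.log are hypothetical
# names chosen for this example, not part of SRA Toolkit or GNU parallel.

run_downloads() {
    local accession_file=$1   # text file with one accession per line
    local jobs=${2:-4}        # number of simultaneous fasterq-dump jobs

    # Build the command as an array so every argument stays intact.
    # '::::' tells GNU parallel to read its arguments from a file.
    local cmd=(parallel -j "$jobs" --line-buffer --tag
               --joblog fasterq_jobs.log
               fasterq-dump --threads 4 --progress '{}'
               :::: "$accession_file")

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: print the command instead of executing it.
        printf '%s ' "${cmd[@]}"
        echo
    else
        "${cmd[@]}"
    fi
}

# Run only when the script is called with arguments, e.g.
#   ./run_downloads.sh SRA_accessions.txt 4
if [ "$#" -gt 0 ]; then
    run_downloads "$@"
fi
```

Running `DRY_RUN=1 ./run_downloads.sh SRA_accessions.txt 4` prints the command without starting any download. Note that 4 jobs with 4 threads each can occupy up to 16 cores, so the job count should match what the cluster allocation allows.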

So far, when I run the downloads one at a time in a for loop, the download speed is only a few MB/s, and there is around 800 GB to download onto the cluster:

    # SRA_DOWNLOADS holds a whitespace-separated list of accessions
    for d in $SRA_DOWNLOADS
    do
       echo "downloading $d from sequence read archive"
       fasterq-dump --threads 4 --progress "$d"
    done

Is there anything else I can do to speed up these large downloads?
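
One option that may help is to separate the network transfer from the conversion: `prefetch` (also part of SRA Toolkit) downloads the .sra file first, and `fasterq-dump` then converts the local copy instead of streaming it over the network. A minimal sketch under those assumptions follows. `fetch_then_convert` and `DRY_RUN` are hypothetical names for this example. Be aware that `prefetch` has a default size limit controlled by `--max-size`, so very large runs may need that raised.

```shell
#!/usr/bin/env bash
# Sketch only: fetch_then_convert and DRY_RUN are hypothetical names
# chosen for this example.

fetch_then_convert() {
    local accession_file=$1   # text file with one accession per line
    local workdir=${2:-.}     # existing directory for .sra and FASTQ files
    local run=()              # empty prefix: commands are executed
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run=(echo)            # dry run: commands are only printed
    fi

    local acc
    # Read the list on file descriptor 3 so the tools cannot consume it.
    while IFS= read -r acc <&3 || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue   # skip blank lines

        # Step 1: network transfer of the .sra file into workdir.
        "${run[@]}" prefetch --output-directory "$workdir" "$acc" || continue

        # Step 2: local conversion to FASTQ, run from inside workdir so
        # fasterq-dump finds the prefetched copy.
        ( cd "$workdir" && "${run[@]}" fasterq-dump --threads 4 --progress "$acc" )
    done 3< "$accession_file"
}

# Run only when the script is called with arguments, e.g.
#   ./fetch_then_convert.sh SRA_accessions.txt /path/to/scratch
if [ "$#" -gt 0 ]; then
    fetch_then_convert "$@"
fi
```

With `DRY_RUN=1` the function only prints the `prefetch` and `fasterq-dump` commands it would run, which is a cheap way to check the accession list before committing to an 800 GB transfer.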

This is the data from SRA run selector that I was trying to access:
www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA672851&o=acc_s%3Aa


