NCBI NR protein db nr.gz FASTA inflate error?
Hello all, I’m trying to download the nr.gz FASTA file from NCBI and build a DIAMOND database from it. I originally downloaded the file with
wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz
and the download seemed to complete. But when I run
diamond makedb --in nr.gz -d nr
I get the following error:
#CPU threads: 64
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Database input file: /global/scratch/users/*****/*****/nr.gz
Opening the database file... [0.028s]
Loading sequences... [1.93s]
Error: Inflate error.
I then tried
fixgz nr.gz nr.fixed.gz
and ran diamond makedb again on nr.fixed.gz, and got the same error:
#CPU threads: 64
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Database input file: /global/scratch/users/*****/*****/nr.fixed.gz
Opening the database file... [0.031s]
Loading sequences... [0.118s]
Error: Inflate error.
I’ve also tried to gunzip both nr.gz and nr.fixed.gz, and I get
gzip: nr.fixed.gz: invalid compressed data--format violated
How do I successfully download the nr.gz file? It’s huge, and it sounds like FTP transfers are often unstable, so is the file getting corrupted in transit? I’ve tried downloading it multiple times with the same result. Is there an older version of nr.gz I could use?
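For reference, here is a minimal sketch of how the download could be checked before handing it to diamond makedb. The helper names check_gz and check_md5 are hypothetical, md5sum assumes GNU coreutils, and the nr.gz.md5 checksum file mentioned in the comments is an assumption about what sits next to nr.gz on the server. It is a sanity check under those assumptions, not a guaranteed fix.

```shell
#!/bin/sh
# Sketch: verify a gzip download before running diamond makedb on it.
# check_gz and check_md5 are hypothetical helper names, not part of
# DIAMOND or any NCBI tooling.

# Returns 0 if the file passes gzip's built-in integrity test.
check_gz() {
    gzip -t "$1" 2>/dev/null
}

# Returns 0 if the file's MD5 digest equals the expected hex string.
# Assumes md5sum from GNU coreutils is available.
check_md5() {
    actual=$(md5sum "$1" | cut -d' ' -f1)
    [ "$actual" = "$2" ]
}

# Example usage (commented out so nothing is downloaded when sourcing this;
# assumes a checksum file nr.gz.md5 exists next to nr.gz on the server):
#   wget -c ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz
#   wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz.md5
#   check_md5 nr.gz "$(cut -d' ' -f1 nr.gz.md5)" && check_gz nr.gz \
#       && diamond makedb --in nr.gz -d nr
```

The -c option lets wget resume a partial transfer instead of starting over, which matters for a file this size. If the checksum does not match, the file on disk differs from what the server published, and the download should be resumed or repeated rather than passed on to makedb.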