NCBI NR protein db nr.gz FASTA inflate error?


Hello all, I'm trying to download the nr.gz FASTA file from NCBI and build a DIAMOND database from it. I originally downloaded the file with:

wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz

The download seemed to complete, but when I run:

diamond makedb --in nr.gz -d nr

I get the following error:

#CPU threads: 64
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Database input file: /global/scratch/users/*****/*****/nr.gz
Opening the database file...  [0.028s]
Loading sequences...  [1.93s]
Error: Inflate error.

I then ran fixgz nr.gz nr.fixed.gz and tried diamond makedb again on the fixed file, but got the same error:

#CPU threads: 64
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Database input file: /global/scratch/users/*****/*****/nr.fixed.gz
Opening the database file...  [0.031s]
Loading sequences...  [0.118s]
Error: Inflate error.

I've also tried to gunzip both nr.gz and nr.fixed.gz, and gzip fails with:

gzip: nr.fixed.gz: invalid compressed data--format violated
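A quicker way to confirm the corruption without writing the decompressed output is gzip -t, which reads the whole stream and reports whether it is valid. The sketch below wraps that in a small helper; check_gz is a hypothetical name used only for illustration, not part of gzip or DIAMOND.

```shell
#!/usr/bin/env bash
# check_gz is a hypothetical helper (not part of gzip or DIAMOND).
# It runs `gzip -t`, which tests the compressed stream without
# writing any decompressed data, and reports the result.
check_gz() {
    local file="$1"
    if gzip -t "$file" 2>/dev/null; then
        echo "OK: $file"
    else
        echo "CORRUPT: $file"
        return 1
    fi
}

# Example usage (commented out; nr.gz is large, so the test takes a while):
# check_gz nr.gz && diamond makedb --in nr.gz -d nr
```

If this reports the file as corrupt, the problem lies in the download itself rather than in DIAMOND.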

How can I download nr.gz without it getting corrupted? The file is huge, and it sounds like FTP transfers are often unstable, so I suspect it is being damaged in transit. I've tried downloading it several times with the same result. Is there an older version of nr.gz I could use instead?
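One approach worth trying is a resumable download followed by a checksum comparison. The sketch below is only a starting point and rests on assumptions not stated above: that the same path is served over HTTPS, that an nr.gz.md5 file sits next to nr.gz on the server, and that GNU md5sum is installed. BASE_URL, fetch_resumable and verify_md5 are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Assumption: the FTP path is also reachable over HTTPS, and a
# matching <name>.md5 file is published in the same directory.
BASE_URL="https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA"

# Hypothetical helper: download <name> and its .md5 file.
# `wget -c` continues a partial download instead of starting over.
fetch_resumable() {
    local name="$1"
    wget -c "$BASE_URL/$name" || return 1
    wget -O "$name.md5" "$BASE_URL/$name.md5"
}

# Hypothetical helper: compare <name> against <name>.md5.
# `md5sum -c` reads "<hash>  <filename>" lines and re-hashes the
# named file, so run this from the directory holding both files.
verify_md5() {
    local name="$1"
    md5sum -c "$name.md5"
}

# Example usage (commented out because it starts a very large download):
# fetch_resumable nr.gz && verify_md5 nr.gz && diamond makedb --in nr.gz -d nr
```

Building the database only after the checksum matches means a bad transfer is caught before DIAMOND starts loading sequences.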


nr.gz • alignment • Megan6 • db • DIAMOND • NCBI
