Number of contigs vs. N-gap percentage in a WGS assembly


Hi all,

I have two FASTA files of a bacterial whole-genome assembly (same genome):

  1. Generated by SPAdes.

  2. Generated by SPAdes, followed by an extra scaffolding step with Multi-CSAR using 5 closely related reference genomes.

QUAST statistics:

Genome 1 (SPAdes only): N50 = 2133845, N75 = 1046574, # contigs = 12, N's per 100 kbp = 0.0

Genome 2 (SPAdes + Multi-CSAR): N50 = 4052556, N75 = 4052556, # contigs = 2, N's per 100 kbp = 51.80
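For reference, the metrics above can be reproduced roughly with the minimal Python sketch below. It is not QUAST's own code: `read_fasta`, `nx` and `ns_per_100kbp` are hypothetical helper names, and the sketch counts every sequence, whereas QUAST by default ignores contigs shorter than 500 bp (its `--min-contig` option), so small differences are possible.

```python
def read_fasta(path):
    """Minimal FASTA reader: returns {record_id: sequence}."""
    records = {}
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                name, chunks = line[1:].split()[0], []
            else:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def nx(lengths, percent):
    """Nx: the length L such that sequences of length >= L together
    cover at least `percent`% of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding issues.
        if running * 100 >= percent * total:
            return length
    return 0


def ns_per_100kbp(sequences):
    """Number of N bases per 100,000 bases of assembly."""
    seqs = list(sequences)
    total = sum(len(s) for s in seqs)
    n_count = sum(s.upper().count("N") for s in seqs)
    return 100_000 * n_count / total if total else 0.0
```

For a real file, something like `nx([len(s) for s in read_fasta("scaffolds.fasta").values()], 50)` would give the N50 (the filename is a placeholder). By this definition, N50 = N75 for Genome 2 simply means that the largest scaffold alone covers at least 75% of the total assembly length.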

The scaffolder clearly reduced the number of contigs by joining them with runs of N (gap bridges).
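To see what the scaffolder did, the N bridges can be located and, if needed, the scaffolds split back into their component contigs. The Python sketch below is illustrative only: the function names and the default 100-N gap length in `join_contigs` are assumptions, not Multi-CSAR's actual behavior.

```python
import re

# One or more consecutive N/n characters = one gap bridge.
GAP_RE = re.compile(r"[Nn]+")


def join_contigs(contigs, gap_len=100):
    """Illustrative scaffolding: join ordered contigs with fixed-length N runs.
    The gap length of 100 is a hypothetical default."""
    return ("N" * gap_len).join(contigs)


def gap_runs(scaffold, min_gap=1):
    """0-based, half-open (start, end) coordinates of N runs >= min_gap long."""
    return [
        (m.start(), m.end())
        for m in GAP_RE.finditer(scaffold)
        if m.end() - m.start() >= min_gap
    ]


def split_at_gaps(scaffold, min_gap=1):
    """Split a scaffold into the sequences between N runs, keeping their order."""
    pieces, prev = [], 0
    for start, end in gap_runs(scaffold, min_gap):
        if start > prev:
            pieces.append(scaffold[prev:start])
        prev = end
    if prev < len(scaffold):
        pieces.append(scaffold[prev:])
    return pieces
```

Assuming the scaffolder only inserted N runs without trimming or editing contig ends, splitting at gaps recovers the original contig sequences, and the list order preserves the contig order the scaffolder inferred.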

Runs of N are gaps and reduce the accuracy of the assembly. However, they help establish the order of the contigs, so the scaffolded version may be preferable for some tasks, such as circular genome visualization.

The chosen FASTA file will be used in the next step (annotation) and in downstream analyses such as antiSMASH secondary-metabolite prediction, RNA-seq, etc.

Which file would you recommend keeping and using for the downstream analyses?

