Merge contigs in fasta file

Merge contigs in fasta file


Hello All,

I am running variant calling on some species whose reference genomes have a very high number of contigs (sometimes >400,000). The variant caller I am using splits the job by the number of chromosomes, and is overwhelmed when this number is too high. Therefore I would like to concatenate the contigs for a given species reference fasta file into ~30 contigs.

I believe I could use code such as below to merge all the contigs into one:

> grep -v "^>" test.fasta | awk 'BEGIN { ORS=""; print
> ">Sequence_namen" } { print }' > new​.fasta

However I would like to merge them into ~30 contigs so the process can still be parallelised. I would also like to insert 1000 ‘N’ characters between each of the merged contigs within these merged contigs, to avoid mapping issues that could be caused by merging contig sequences from different parts of the genome.

Does anyone have any advice for how to do this or know of any application that could do something similar?

Thanks in advance for your help.





updated 1 hour ago by


written 2.3 years ago by



Read more here: Source link