Aligning “genomes” that come in contigs

Beginner problems: Aligning “genomes” that come in contigs


tl;dr: How do you align genomes when each genome is made up of multiple contigs?

Hi everyone,
I’m very new to genomic analysis and I am not even sure how to ask this question. Google is giving me little to go on. I appreciate any help.

I am working with ~20Mb protistan genomes. I want to compare them to each other via whole genome alignment. The genomes are either downloaded directly from NCBI or were generated/assembled by my collaborators.

Each “whole genome” is divided up into ~20 contigs. When I align two or three genomes with Mauve via Geneious, there are no issues: the contigs of each genome are concatenated and the genomes are aligned to each other. However, I will eventually need to align dozens of these genomes. My little laptop does not have enough RAM to toss all the genomes into Mauve, and I am currently battling my way through Mauve on the command line using our institution’s supercomputer.

I tried submitting the sequences to the online version of MAFFT. I end up with each contig aligned to other contigs as if it were a separate sequence, rather than “genome” aligned to “genome”.

I’m wondering if I can just take out the header lines (the ones starting with >) from each genome’s multi-contig FASTA file and join the contigs with runs of Ns instead?
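To make the idea concrete, here is a minimal sketch of what I have in mind, not something I know to be the right approach. It assumes plain multi-FASTA input, keeps the contigs in whatever order they appear in the file (which may not be their true order), and uses an arbitrary spacer of 100 Ns. The function names `read_fasta` and `concatenate_contigs` and the `spacer_len` parameter are made up for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def concatenate_contigs(lines, name, spacer_len=100):
    """Join all contigs into one FASTA record, separated by runs of N.

    Assumption: contigs are joined in file order; spacer_len is arbitrary.
    """
    spacer = "N" * spacer_len
    seqs = [seq for _, seq in read_fasta(lines)]
    # The sequence is written on a single line (no line wrapping).
    return f">{name}\n{spacer.join(seqs)}\n"


if __name__ == "__main__":
    # Toy input standing in for one genome split into two contigs.
    example = [">contig_1", "ACGT", "ACGT", ">contig_2", "GGCC"]
    print(concatenate_contigs(example, "genome_A", spacer_len=5), end="")
```

For a real file, the same function could be called with an open file handle, for example `concatenate_contigs(open("genome_A.fasta"), "genome_A")`, where the filename is a placeholder. Whether an aligner treats the N runs sensibly is exactly what I am unsure about.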

Concatenating via Geneious requires me to know the order of the contigs relative to each other, and I don’t know what that order is.

Please also help me with the vocabulary for this problem. I’m still learning.






