Aligning multiple overlapping DNA sequence reads to predicted sequence not working
I work on a large protein (5 subunits, 5850 bp) that I have to sequence using multiple Sanger sequencing reactions (usually ~10). I want to align the sequencing results to my protein sequence so that I can see the whole sequencing result and identify any gaps where I don’t have coverage. However, while I can align each sequencing result individually to the template with no problem, once I align multiple reads to the template at once (>5), some no longer align properly and end up spread over the whole sequence length with lots of gaps.
My question is: what alignment strategy or algorithm should I use to avoid this? At the moment I am using mafft --globalpair from within the AliView sequence viewer app. I was using MUSCLE, but this gave even worse results (it started spreading sequencing reads over the whole sequence length with fewer sequences present than mafft).
Some ideas I’ve had: I assume that as more sequencing reactions are introduced, the algorithm starts aligning the reads to each other in some way rather than to my protein sequence. Can I make the alignment program give most weight to my protein sequence during the alignment? Alternatively, am I better off performing pairwise alignments first and then combining these into a global alignment later? If so, what is the best program to do the combining?
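To make the pairwise idea concrete, here is a rough sketch in plain Python of what I have in mind. Each read is placed on the template independently with a simple "fitting" alignment (the whole read against the best-matching stretch of the template), and the resulting intervals are then merged to list uncovered regions. The function names fit_read and uncovered and the scoring values are just placeholders I made up. It does not handle reverse-complement reads or trimming of low-quality ends, and the quadratic loop would be slow on real reads, so treat it as an illustration rather than a working pipeline.

```python
def fit_read(template, read, match=1, mismatch=-1, gap=-2):
    """Place one read on the template, ignoring all other reads.

    Fitting alignment: the whole read must be aligned, but template
    bases before and after the read cost nothing. Returns
    (score, start, end) in 0-based template coordinates, end exclusive.
    Scoring values are arbitrary placeholders.
    """
    n = len(template)
    # prev[j] = (best score, template start) for the read prefix handled
    # so far, ending at template position j. An empty prefix may start
    # anywhere on the template for free.
    prev = [(0, j) for j in range(n + 1)]
    for i in range(1, len(read) + 1):
        cur = [(prev[0][0] + gap, 0)]  # read base hanging off the template start
        for j in range(1, n + 1):
            s = match if read[i - 1] == template[j - 1] else mismatch
            diag = (prev[j - 1][0] + s, prev[j - 1][1])   # base vs base
            up = (prev[j][0] + gap, prev[j][1])           # extra base in read
            left = (cur[j - 1][0] + gap, cur[j - 1][1])   # base missing from read
            cur.append(max(diag, up, left, key=lambda t: t[0]))
        prev = cur
    end = max(range(n + 1), key=lambda j: prev[j][0])
    score, start = prev[end]
    return score, start, end


def uncovered(template_len, intervals):
    """Return template regions (start, end) not covered by any interval."""
    gaps, pos = [], 0
    for start, end in sorted(intervals):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < template_len:
        gaps.append((pos, template_len))
    return gaps


# Toy data: two overlapping "reads" cut from a made-up template.
template = "ATGGCTAGCTAGGATCCGTACGATCGTTAGCCGATAA"
reads = [template[0:12], template[8:20]]
placements = [fit_read(template, r) for r in reads]
intervals = [(start, end) for _, start, end in placements]
print(placements)
print(uncovered(len(template), intervals))
```

Because every read is scored only against the template, adding more reads cannot pull an earlier read out of position, which is the behaviour I am hoping to get from a proper tool.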
Thanks in advance! Any help appreciated.