Alignment with insertions: keeping the indexing of the reference sequence intact

Parts of the sequences are given below:

Reference sequence (pre-alignment):

ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGT

Reference sequence (post-alignment), with the sample sequence (post-alignment) below it:

--------------------------------------------------------------------------------------------------attaaa---------ggtt------------------tataccttc---------ccaggtaacaaa-------------ccaacc-----aactttcgatctcttgtagatctgttctctaaacgaactttaaaatctgtgt
--------------------------------------------------------------------------------------------------------------------------------------------------------------taacaaa-------------ccaacc-----aactttcgatctcttgtagatctgttctctaaacgaactttaaaatctgtgt

As suggested in a comment on this post, here is a simpler example.
Say the reference sequence is aattaaatttgggggtttt and the sample sequence is ttaaggggttaaatttgggggtt. After alignment, they look like this:

--aa----ttaaatttgggggtttt 
ttaaggggttaaatttgggggt--t
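To make the structure of this pair explicit, here is a minimal sketch that labels each alignment column relative to the reference. The helper name `classify_columns` and its one-letter codes (M = match, X = mismatch, I = insertion in the sample, D = deletion in the sample) are hypothetical choices for illustration, not a standard API.

```python
def classify_columns(ref_aln: str, sample_aln: str) -> list[str]:
    """Label each alignment column relative to the reference (hypothetical helper).

    'M' = same base in both rows, 'X' = different bases,
    'I' = gap in the reference (insertion in the sample),
    'D' = gap in the sample (deletion in the sample),
    '-' = gap in both rows (possible in multiple alignments).
    """
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must have equal length")
    labels = []
    for r, s in zip(ref_aln.upper(), sample_aln.upper()):
        if r == "-" and s == "-":
            labels.append("-")
        elif r == "-":
            labels.append("I")
        elif s == "-":
            labels.append("D")
        else:
            labels.append("M" if r == s else "X")
    return labels


ref_aln = "--aa----ttaaatttgggggtttt"
sample_aln = "ttaaggggttaaatttgggggt--t"
print("".join(classify_columns(ref_aln, sample_aln)))
# Removing the gap characters recovers the ungapped reference.
print(ref_aln.replace("-", ""))
```

For this pair the labels show six insertion columns (two leading, four after "aa") and two deletion columns near the end; stripping the gaps from the aligned reference gives back aattaaatttgggggtttt.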

Since my reference sequence is indexed like this (0-based):

0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18
a    a    t    t    a    a    a    t    t    t    g    g    g    g    g    t    t    t    t

I want the same indexing to be preserved after alignment, with the gap columns of the reference left unnumbered:

           0    1                        2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18
 -    -    a    a    -    -    -    -    t    t    a    a    a    t    t    t    g    g    g    g    g    t    t    t    t
 t    t    a    a    g    g    g    g    t    t    a    a    a    t    t    t    g    g    g    g    g    t    -    -    t 
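One way to get exactly this numbering is a single pass over the gapped reference row: every non-gap column receives the next reference index, and gap columns receive none. The sketch below assumes gaps are written as '-'; the helper names `ref_index_map` and `project_to_ref` are hypothetical. Projecting a sample through `ref_to_col` yields a string of the same length as the ungapped reference, so index i in it always means reference position i (sample insertions are dropped by this projection).

```python
def ref_index_map(ref_aln: str) -> tuple[list, list[int]]:
    """Map alignment columns to 0-based reference positions (hypothetical helper).

    col_to_ref[c] is the reference position at column c, or None where the
    reference has a gap; ref_to_col[i] is the column holding reference base i.
    """
    col_to_ref = []
    ref_to_col = []
    for col, base in enumerate(ref_aln):
        if base == "-":
            col_to_ref.append(None)  # insertion in the sample: no reference index
        else:
            col_to_ref.append(len(ref_to_col))  # next reference position
            ref_to_col.append(col)
    return col_to_ref, ref_to_col


def project_to_ref(ref_aln: str, sample_aln: str) -> str:
    """Return the sample's base at every reference position, dropping insertions."""
    _, ref_to_col = ref_index_map(ref_aln)
    return "".join(sample_aln[c] for c in ref_to_col)


ref_aln = "--aa----ttaaatttgggggtttt"
sample_aln = "ttaaggggttaaatttgggggt--t"
col_to_ref, ref_to_col = ref_index_map(ref_aln)
print(col_to_ref[:10])  # [None, None, 0, 1, None, None, None, None, 2, 3]
print(project_to_ref(ref_aln, sample_aln))  # aattaaatttgggggt--t
```

Both lists are built in one linear pass, and `ref_to_col` gives constant-time lookup from a reference position to its alignment column.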

I want to keep the indexing of the reference sequence intact after alignment (i.e. reference position 0 is still the first reference base, position 1 the second, and so on, with gap columns skipped), as standard software does. I then want to run some quick analyses on it, for example checking whether a mono- or dinucleotide is conserved at certain positions. If anybody has insight into the most efficient way to do this, in terms of both memory and time, that would be great. I use Python for my work.
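For the conservation check, a possible approach (sketched here under assumptions, not as the definitive method) is to project every sample onto reference coordinates, stack the results into a NumPy `uint8` array (one byte per base, rows = samples, columns = reference positions), and compare k-base windows against the reference with vectorised operations. NumPy is a swapped-in choice; the helpers `project_to_ref`, `to_matrix` and `kmer_conservation` are hypothetical, and the second sample below is invented purely for illustration. The sketch assumes each sample comes with its own gapped reference row (as in pairwise alignments) and that sample insertions can be ignored for this analysis.

```python
import numpy as np


def project_to_ref(ref_aln: str, sample_aln: str) -> str:
    """Keep sample columns where the reference has a base (drops insertions)."""
    return "".join(s for r, s in zip(ref_aln, sample_aln) if r != "-")


def to_matrix(rows: list[str]) -> np.ndarray:
    """Stack equal-length strings into an (n_rows, length) uint8 array."""
    data = "".join(rows).upper().encode("ascii")
    return np.frombuffer(data, dtype=np.uint8).reshape(len(rows), -1)


def kmer_conservation(ref: str, samples: np.ndarray, positions, k: int = 1) -> dict:
    """Fraction of samples whose k bases starting at each position match the reference."""
    ref_arr = np.frombuffer(ref.upper().encode("ascii"), dtype=np.uint8)
    result = {}
    for pos in positions:
        if pos < 0 or pos + k > ref_arr.size:
            raise ValueError(f"window {pos}..{pos + k - 1} is outside the reference")
        window = samples[:, pos:pos + k]  # shape (n_samples, k)
        matches = np.all(window == ref_arr[pos:pos + k], axis=1)  # gaps never match
        result[pos] = float(matches.mean())
    return result


ref = "aattaaatttgggggtttt"
pairs = [
    ("--aa----ttaaatttgggggtttt", "ttaaggggttaaatttgggggt--t"),  # from the question
    ("aattaaatttgggggtttt", "aattaaatttgcgggtttt"),  # hypothetical second sample
]
samples = to_matrix([project_to_ref(r, s) for r, s in pairs])
print(kmer_conservation(ref, samples, [0, 11, 16], k=1))  # {0: 1.0, 11: 0.5, 16: 0.5}
print(kmer_conservation(ref, samples, [10, 15], k=2))  # {10: 0.5, 15: 0.5}
```

Because every row shares reference coordinates, a mononucleotide check is k=1 and a dinucleotide check is k=2 at the same position index. If insertions relative to the reference need to be analysed too, they would have to be stored separately, since this projection discards them.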
