# Alignment with inserts and keeping the indexing of ref seq intact

Parts of sequences are given below-

Reference sequence (pre-alignment):

``````ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGT
``````

Reference sequence (post-alignment) and below it is Sample sequence (post-alignment):

``````--------------------------------------------------------------------------------------------------attaaa---------ggtt------------------tataccttc---------ccaggtaacaaa-------------ccaacc-----aactttcgatctcttgtagatctgttctctaaacgaactttaaaatctgtgt
--------------------------------------------------------------------------------------------------------------------------------------------------------------taacaaa-------------ccaacc-----aactttcgatctcttgtagatctgttctctaaacgaactttaaaatctgtgt
``````

I’m adding a simpler to interpret example as per a comment on this post.
Say the ref seq is `aattaaatttgggggtttt` and the sample seq is `ttaaggggttaaatttgggggt--t`. Then post-alignment, they will be like-

``````--aa----ttaaatttgggggtttt
ttaaggggttaaatttgggggt--t
``````

Now, since my ref seq was

``````0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18
a    a    t    t    a    a    a    t    t    t    g    g    g    g    g    t    t    t    t
``````

I want that post-alignment also, the indexing should be conserved-

``````           0    1                        2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18
-    -    a    a    -    -    -    -    t    t    a    a    a    t    t    t    g    g    g    g    g    t    t    t    t
t    t    a    a    g    g    g    g    t    t    a    a    a    t    t    t    g    g    g    g    g    t    -    -    t
``````

I want to keep the indexing of the reference sequence conserved(i.e. the first base in ref seq post-alignment is `a`, second is `t`, third is `t`, etc.), like they do in standard softwares, and then I want to run some quick analysis on it, say to check for conservation of a (mono/di)-nucleotide at some positions. If anybody has some insight on how to do it the most efficient way(memory-wise and time-wise), then that’d be great. I use Python for my work.