# How to automat the calculation of alignment scores (BLOSUMxx matrix based) between one short peptide sequence and 10000 other peptide sequences (using biopython)

I am running into an issue recently when trying to calculate the scores of a high number of short peptide alignments (10000).

I have to calculate all the alignement scores (calculation based on the BLOSUM62 matrix, but I could also want to use other BLOSUM matrix) of a set of 10000 alignments.

Those alignments are between one “reference” sequence (that does not vary) and 10000 other (almost random) sequences. Those sequences are all very short (9 to 20aa — my example bellow is with 9aa).

First, I already generated the list of 10000 (almost random) peptide sequences in an .xls (or .cvs) table. No issue for that.

Then, I found this biopython script to calculate the score of one alignement (here i put one example with 2 of my peptides): (the script come from this forum)

``````from Bio.SubsMat.MatrixInfo import blosum62 as blosum

blosum.update(((b,a),val) for (a,b),val in list(blosum.items()))

def score_pairwise(seq1, seq2, matrix, gap_s, gap_e, gap = True):
for A,B in zip(seq1, seq2):
diag = ('-'==A) or ('-'==B)
yield (gap_e if gap else gap_s) if diag else matrix[(A,B)]
gap = diag

seq1 = 'ALNSSGTLT'
seq2 = 'AVNDSG-LT'

print(sum(score_pairwise(seq1, seq2, blosum, -3, -1)))
``````

—> this give me the correct score of 27. All perfect.

The issue I am facing now is that although this script is perfect for me to calculate the score of the 2 sequences aligned (seq1 and seq2), I am struggling to find a neat way to apply this Python script on 10000 alignements.

As I said, seq1 does not vary so nothing to change here. Now, i wanted python to retrieve the 10000 other seq2 sequences from my xls/cvs sheet into a list so that the variable seq2 could take the various string values from that list. For that I did:

``````import pandas as pd
all_seq2_list.values.T.tolist()
``````

Which return me the list [‘randompeptide1’, ‘randompeptide2’,…, ‘randompeptide10000’]

So far so good.

I have a vague idea that I should find a way to make a loop where the seq2 variable changes each time, assigning it the new string value (peptide sequence) corresponding to the next sequence in my all_seq2_list but I don’t have enough technical knowledge in python to understand how to do that.

Bottom line is:
I am struggling with the construction of the loop and how to (for each iteration) give the next value from my all_seq2_list to my seq2 variable; and then calculate the 10000 resulting scores
I think it is probably not very difficult for someone who use python every day but as a newbie I am stuck.

Can anyone help me ?
Thank you!