Python calculation of conservation scores for a protein multiple sequence alignment


Dear all 🙂

I am trying to compute a conservation score for each position of a protein multiple sequence alignment.
I have already used Shannon entropy, but I am not satisfied with it because it is based on identity only, not on similarity.
So I thought it might be a good idea to use a substitution matrix. I tried to implement two methods:

  1. Protein–Protein Interfaces: Analysis of Amino Acid Conservation in Homodimers (doi.org/10.1002/1097-0134(20010101)42:1%3C108::AID-PROT110%3E3.0.CO;2-O)
  2. The “sum-of-pairs” method from AL2CO (doi.org/10.1093/bioinformatics/17.8.700)
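
For reference, the core idea behind a matrix-based pairwise score like the first method can be sketched as a weighted average of substitution-matrix scores over all sequence pairs in a column, with the matrix first min-max rescaled to [0,1]. This is a minimal sketch under stated assumptions, not a reproduction of either paper: the matrix is passed as a plain dict keyed by (residue, residue) pairs, pairs involving a gap contribute 0, and `rescale_matrix`, `pair_score`, `pairwise_conservation` and the `TOY` values are hypothetical names and numbers chosen for illustration.

```python
from itertools import combinations
from typing import Dict, Optional, Sequence, Tuple

# A substitution matrix as a dict keyed by (residue, residue) pairs (assumed format).
Matrix = Dict[Tuple[str, str], float]

# Hypothetical toy values for three residues, for illustration only.
TOY: Matrix = {
    ("I", "I"): 4, ("L", "L"): 4, ("W", "W"): 11,
    ("I", "L"): 2, ("I", "W"): -3, ("L", "W"): -2,
}


def rescale_matrix(matrix: Matrix) -> Matrix:
    """Min-max rescale every matrix entry into [0, 1]."""
    lo, hi = min(matrix.values()), max(matrix.values())
    if hi == lo:
        raise ValueError("matrix entries are all equal; cannot rescale")
    return {pair: (score - lo) / (hi - lo) for pair, score in matrix.items()}


def pair_score(scaled: Matrix, a: str, b: str) -> float:
    """Look up a pair in either orientation (symmetric matrices are often stored once)."""
    return scaled[(a, b)] if (a, b) in scaled else scaled[(b, a)]


def pairwise_conservation(column: str, scaled: Matrix,
                          weights: Optional[Sequence[float]] = None,
                          gap: str = "-") -> float:
    """Weighted mean of rescaled pair scores over all sequence pairs in one column.

    `scaled` must already be rescaled to [0, 1]; pairs involving a gap score 0
    (an assumption), so the result also lies in [0, 1].
    """
    n = len(column)
    if n < 2:
        raise ValueError("a column needs at least two sequences")
    if weights is None:
        weights = [1.0] * n  # unweighted: every sequence counts equally
    num = den = 0.0
    for i, j in combinations(range(n), 2):
        w = weights[i] * weights[j]
        den += w
        if column[i] != gap and column[j] != gap:
            num += w * pair_score(scaled, column[i], column[j])
    return num / den if den > 0 else 0.0


scaled = rescale_matrix(TOY)
print(pairwise_conservation("IIII", scaled))  # 0.5
print(pairwise_conservation("IILL", scaled))  # conservative substitution, higher
print(pairwise_conservation("IIWW", scaled))  # 0.25, radical substitution
```

One property worth knowing: with a single global rescale, a column of identical residues scores the rescaled self-score of that residue (0.5 for `IIII` with the toy values), not 1, so the absolute values depend strongly on which matrix is used. A full BLOSUM62 dict could be built from, for example, Biopython's `Bio.Align.substitution_matrices.load("BLOSUM62")`.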

The first method gives me incorrect results (perhaps because I used BLOSUM62 instead of the PET91 matrix used in the article…).
The second method (AL2CO) doesn’t give me satisfying results either.
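
When judging whether a matrix-based score is "wrong", it can help to compare it with the identity-only baseline on a few hand-made columns. Below is a minimal normalized Shannon entropy score; normalizing by ln 20 and ignoring gaps are assumptions, and `entropy_conservation` is a hypothetical name. It gives exactly the same value for a conservative substitution (I/L) and a radical one (I/W), which is the limitation a substitution-matrix score is meant to remove: if a matrix-based score also ranks such columns equally, its implementation may be off.

```python
import math
from collections import Counter


def entropy_conservation(column: str, gap: str = "-", alphabet_size: int = 20) -> float:
    """Identity-only conservation: 1 - H / ln(alphabet_size), gaps ignored (assumption).

    1.0 means a single residue type; values near 0 mean maximal diversity.
    """
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0  # an all-gap column is treated as unconserved (assumption)
    n = len(residues)
    h = -sum((c / n) * math.log(c / n) for c in Counter(residues).values())
    return 1.0 - h / math.log(alphabet_size)


print(entropy_conservation("IIII"))  # 1.0
# Conservative (I/L) and radical (I/W) substitutions get exactly the same score:
print(entropy_conservation("IILL") == entropy_conservation("IIWW"))  # True
```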

In practice, I would like a score in [0,1] that accounts for sequence redundancy to some extent.
I have a Python workflow that processes my alignment and calculates its properties, so I try to avoid external tools as much as possible…
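
For the redundancy requirement, one standard option is position-based sequence weighting (Henikoff & Henikoff, 1994): in each column, every distinct residue type receives an equal share of weight, split evenly among the sequences that carry it, so near-duplicate sequences end up sharing weight instead of each counting fully. The sketch below is pure Python; treating gaps as an extra residue type and normalizing the weights to sum to 1 are assumptions, and `henikoff_weights` is a hypothetical helper name. The resulting weights could then feed any per-column score that accepts sequence weights, such as weighted residue frequencies for entropy or pair weights w_i * w_j for a sum-of-pairs score.

```python
from collections import Counter
from typing import List, Sequence


def henikoff_weights(alignment: Sequence[str]) -> List[float]:
    """Position-based sequence weights (Henikoff & Henikoff, 1994), normalized to sum to 1.

    Gap characters are counted as one more residue type (an assumption; other
    implementations skip gapped positions).
    """
    if not alignment:
        raise ValueError("empty alignment")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    weights = [0.0] * len(alignment)
    for pos in range(length):
        column = [seq[pos] for seq in alignment]
        counts = Counter(column)
        n_types = len(counts)  # number of distinct residue types in this column
        for k, residue in enumerate(column):
            # each type gets 1/n_types, split evenly among the sequences that carry it
            weights[k] += 1.0 / (n_types * counts[residue])
    total = sum(weights)
    if total == 0:
        raise ValueError("alignment has no columns")
    return [w / total for w in weights]


print(henikoff_weights(["AAAA", "AAAA", "CCCC"]))  # [0.25, 0.25, 0.5]
```

In the example, the two identical sequences receive 0.25 each and the unique one 0.5, so the duplicated pair together counts as much as the single distinct sequence.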

Do you have any advice, or maybe a hidden magic package that I haven’t found :-)?

Best,
Thibault.

