Python calculation of per-position conservation scores for a protein multiple sequence alignment
Dear all 🙂
I am trying to compute the conservation score of each position of a protein multiple sequence alignment.
I have already tried the Shannon entropy, but I am not satisfied with it because it only accounts for residue identity, not similarity.
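To make the limitation concrete, here is a minimal sketch of a per-column Shannon entropy score, assuming the alignment is a list of equal-length strings, gaps are written as `-`, and the score is normalised by the log of a 20-letter alphabet. The function name `shannon_conservation` and the toy alignment are made up for illustration.

```python
import math
from collections import Counter


def shannon_conservation(column, alphabet_size=20, gap="-"):
    """Return 1 - H / log(alphabet_size) for one column, ignoring gaps.

    An all-gap column is given a score of 0.0 (an arbitrary choice).
    """
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(alphabet_size)


# Toy alignment (hypothetical data): one string per sequence.
alignment = ["MKVLA", "MKILA", "MRVLS", "MKL-A"]
columns = list(zip(*alignment))  # transpose rows into columns
scores = [shannon_conservation(col) for col in columns]
```

With this score, a column of `IIVV` (two similar hydrophobic residues) and a column of `DDKK` (oppositely charged residues) get exactly the same value, because only the residue counts enter the formula.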
So I thought that maybe it could be a good idea to use a substitution matrix. I tried to implement two methods:
- the method from “Protein–Protein Interfaces: Analysis of Amino Acid Conservation in Homodimers” (doi.org/10.1002/1097-0134(20010101)42:1%3C108::AID-PROT110%3E3.0.CO;2-O)
- the “sum-of-pairs” method from AL2CO (doi.org/10.1093/bioinformatics/17.8.700)
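For readers unfamiliar with the idea, a sum-of-pairs style score can be sketched as the frequency-weighted average of substitution scores over all residue pairs in a column. This is only an unweighted sketch under my own assumptions, not necessarily identical to what AL2CO or either paper computes. `TOY_MATRIX` is a hand-typed four-residue excerpt with values meant to resemble BLOSUM62; a real run would use the full matrix, for example via Biopython's `Bio.Align.substitution_matrices.load("BLOSUM62")`. The helper names are hypothetical.

```python
from collections import Counter

# Hypothetical toy matrix: BLOSUM62-like values for four residues only.
TOY_MATRIX = {
    ("I", "I"): 4, ("V", "V"): 4, ("D", "D"): 6, ("K", "K"): 5,
    ("I", "V"): 3, ("D", "K"): -1, ("I", "D"): -3, ("I", "K"): -3,
    ("D", "V"): -3, ("K", "V"): -2,
}


def sub_score(a, b, matrix=TOY_MATRIX):
    """Look up a substitution score, treating the matrix as symmetric."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def sum_of_pairs(column, matrix=TOY_MATRIX, gap="-"):
    """Return sum over residue types a, b of f_a * f_b * S(a, b), ignoring gaps."""
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0
    n = len(residues)
    freqs = {a: c / n for a, c in Counter(residues).items()}
    return sum(
        fa * fb * sub_score(a, b, matrix)
        for a, fa in freqs.items()
        for b, fb in freqs.items()
    )


similar = sum_of_pairs("IIVV")     # conservative I/V mix
dissimilar = sum_of_pairs("IIDD")  # hydrophobic/charged mix
```

Unlike the entropy score, this distinguishes the two columns: the I/V column scores higher than the I/D column even though both have the same residue counts. The raw value is on the scale of the matrix, not in [0,1].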
The first method gives me wrong results (maybe because I used BLOSUM62 instead of the PET91 matrix used in the article…).
The second method (AL2CO) doesn’t give me satisfying results.
In practice, I would like a score in [0,1] with some sensitivity to sequence redundancy.
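One possible way to get both properties, sketched under my own assumptions: down-weight redundant sequences with Henikoff-style position-based weights, compute a weighted sum-of-pairs score, then rescale it with the minimum and maximum entries of the substitution matrix. The rescaling works because the weighted score is an average of matrix entries, so it always lies between those two extremes. All names below are hypothetical, and `TOY_MATRIX` is a hand-typed excerpt with BLOSUM62-like values, not the real matrix.

```python
from collections import Counter

# Hypothetical toy matrix: BLOSUM62-like values for four residues only.
TOY_MATRIX = {
    ("I", "I"): 4, ("V", "V"): 4, ("D", "D"): 6, ("K", "K"): 5,
    ("I", "V"): 3, ("D", "K"): -1, ("I", "D"): -3, ("I", "K"): -3,
    ("D", "V"): -3, ("K", "V"): -2,
}


def sub_score(a, b, matrix):
    """Look up a substitution score, treating the matrix as symmetric."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def henikoff_weights(alignment, gap="-"):
    """Position-based sequence weights, normalised to sum to 1.

    In each column a residue contributes 1 / (k * n), where k is the number
    of distinct residue types in the column and n is the count of that
    residue. Gaps contribute nothing.
    """
    weights = [0.0] * len(alignment)
    for column in zip(*alignment):
        counts = Counter(r for r in column if r != gap)
        k = len(counts)
        for i, r in enumerate(column):
            if r != gap:
                weights[i] += 1.0 / (k * counts[r])
    total = sum(weights)
    return [w / total for w in weights]


def weighted_conservation(column, weights, matrix=TOY_MATRIX, gap="-"):
    """Weighted sum-of-pairs score, min-max rescaled to [0, 1]."""
    freqs = {}
    for r, w in zip(column, weights):
        if r != gap:
            freqs[r] = freqs.get(r, 0.0) + w
    total = sum(freqs.values())
    if total == 0:
        return 0.0  # all-gap column: arbitrary choice
    raw = sum(
        (fa / total) * (fb / total) * sub_score(a, b, matrix)
        for a, fa in freqs.items()
        for b, fb in freqs.items()
    )
    lo, hi = min(matrix.values()), max(matrix.values())
    return (raw - lo) / (hi - lo)


# Toy alignment (hypothetical data): three identical sequences plus one distinct.
alignment = ["IKD", "IKD", "IKD", "VDK"]
weights = henikoff_weights(alignment)
scores = [weighted_conservation(col, weights) for col in zip(*alignment)]
```

In this toy case the three identical sequences share half of the total weight and the single distinct sequence gets the other half, so redundancy no longer dominates the column frequencies. One caveat of min-max rescaling is that a fully conserved column only reaches 1 if its residue has the largest diagonal entry in the matrix; normalising each pair score by the diagonal entries would be an alternative.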
I have a workflow in Python that processes my alignment and calculates properties, so I try to avoid external tools as much as possible…
Do you have any advice, or maybe a hidden magic package that I haven’t found :-)?