It’s not trivial. You could use the sequences from the trRosetta Pfam model set, which are representative of the family (download link).
We have a method for getting representative sequences in our paper if you are comfortable with using hmmsearch:
A representative target sequence was found for each family using hmmsearch to search the UniRef90 database with the Pfam HMM and taking the closest subsequence match by E-value.
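As a rough sketch of that last step, assuming you ran something like `hmmsearch --domtblout hits.domtbl model.hmm uniref90.fasta` (the file names are placeholders), the following picks the per-domain hit with the lowest E-value from the table. Using the independent E-value (i-Evalue) column and the envelope coordinates is my assumption about what "closest subsequence match" means; the paper may have used a different criterion.

```python
from typing import Iterable, NamedTuple, Optional


class DomainHit(NamedTuple):
    target: str      # name of the matched sequence
    i_evalue: float  # independent E-value of this domain hit
    env_from: int    # envelope start on the target (1-based)
    env_to: int      # envelope end on the target (1-based, inclusive)


def best_domain_hit(lines: Iterable[str]) -> Optional[DomainHit]:
    """Return the domain hit with the lowest i-Evalue from a domtblout table.

    Assumes the standard HMMER3 --domtblout column layout:
    column 1 = target name, 13 = i-Evalue, 20/21 = env from/to.
    """
    best = None
    for line in lines:
        # Skip comment/header lines and blank lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 22:
            continue  # not a complete domtblout row
        hit = DomainHit(fields[0], float(fields[12]),
                        int(fields[19]), int(fields[20]))
        if best is None or hit.i_evalue < best.i_evalue:
            best = hit
    return best


# Hypothetical usage:
# with open("hits.domtbl") as fh:
#     hit = best_domain_hit(fh)
```

The returned coordinates can then be used to cut the matching subsequence out of the target sequence.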
hmmemit from the HMMER package will extract a consensus sequence for each HMM from Pfam:
hmmemit -c -o model.fasta model.hmm
Not only is it fast (the whole of Pfam can be processed in under 2 minutes), but it is also objective, because the sequence is derived directly from the model by a simple majority rule. Keep in mind that consensus sequences generated this way may not exist in nature, although there will always be some real sequences that are very similar.
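To illustrate what taking the consensus directly from the model means, here is a toy sketch that picks the most probable residue at each match state. The input format (one dict of residue probabilities per match state) is made up for the example, and this is a simplification rather than HMMER's actual code.

```python
from typing import Dict, List


def consensus_from_emissions(match_emissions: List[Dict[str, float]]) -> str:
    """Build a consensus by taking the highest-probability residue
    at each match state (ties go to the first residue listed)."""
    return "".join(max(probs, key=probs.get) for probs in match_emissions)


# Three hypothetical match states with made-up emission probabilities.
toy_model = [
    {"A": 0.7, "G": 0.3},
    {"L": 0.4, "V": 0.35, "I": 0.25},
    {"K": 0.9, "R": 0.1},
]
print(consensus_from_emissions(toy_model))
```

Note that the second position is only weakly in favour of one residue, which is why a consensus built this way need not match any single natural sequence.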