Pfam: get only one representative FASTA sequence per family




Can you help me get only one representative FASTA sequence per family?
Is there a simple way to do that?







It’s not trivial. You could use the sequences from the trRosetta Pfam model set, which are representative of the family (download link).

We have a method for getting representative sequences in our paper if you are comfortable with using hmmsearch:

A representative target sequence was found for each family using hmmsearch to search the UniRef90 database with the Pfam HMM and taking the closest subsequence match by E-value.
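To make the hmmsearch route a bit more concrete, here is a rough sketch of how one might pick the best-scoring subsequence for a single family. It is not the exact pipeline from the paper: the function names, the intermediate file `hits.domtbl`, and the choice of the independent E-value (column 13 of `--domtblout`) with envelope coordinates (columns 20 and 21) are my assumptions. It also assumes the database has been indexed once with `esl-sfetch --index`.

```shell
#!/bin/sh
# Print "target env_from env_to i-Evalue" for the domain hit with the lowest
# independent E-value in an hmmsearch --domtblout file.
# Columns used: 1 = target name, 13 = i-Evalue, 20/21 = envelope start/end.
best_domain_hit() {
    awk '!/^#/ && NF >= 22 {
             if (best == "" || $13 + 0 < min) {
                 min = $13 + 0
                 best = $1 " " $20 " " $21 " " $13
             }
         }
         END { if (best != "") print best }' "$1"
}

# Hypothetical wrapper (not called here): search a database with one family
# HMM, then fetch the best-matching subsequence with Easel's esl-sfetch.
# Assumes the database was indexed beforehand: esl-sfetch --index "$db"
find_representative() {
    hmm=$1
    db=$2
    hmmsearch --domtblout hits.domtbl -o /dev/null "$hmm" "$db" || return 1
    # Split the winning line into: target, env_from, env_to, i-Evalue
    set -- $(best_domain_hit hits.domtbl)
    [ $# -eq 4 ] || return 1
    esl-sfetch -c "$2..$3" "$db" "$1"
}
```

Usage would be something like `find_representative model.hmm uniref90.fasta > representative.fasta`, where both file names are placeholders. Searching UniRef90 for every Pfam family takes far longer than the consensus approach, so this is mainly worthwhile if you need a sequence that actually exists.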

hmmemit from the HMMER package will extract a consensus sequence for each HMM from Pfam:

hmmemit -c -o model.fasta model.hmm

Not only is it fast (the whole of Pfam can be done in under 2 minutes), but it is also objective, because the sequence comes directly from the model based on a simple majority rule. Keep in mind that consensus sequences generated this way may not exist in nature, although there will always be some real sequences that are very similar.
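For the whole of Pfam, the same command can be pointed at the full (decompressed) HMM file. The sketch below adds a sanity check that one sequence was written per model. The function names and the file names in the usage line are placeholders of mine, and the check assumes the HMMER3 flat-file format, where each model has exactly one `NAME` line.

```shell
#!/bin/sh
# Count models in an HMMER3 flat file (one NAME line per model).
count_models() {
    grep -c '^NAME ' "$1"
}

# Count records in a FASTA file (one '>' header per record).
count_fasta() {
    grep -c '^>' "$1"
}

# Hypothetical wrapper (not called here): emit one consensus sequence per
# model, then verify the number of sequences matches the number of models.
emit_pfam_consensus() {
    hmmemit -c -o "$2" "$1" || return 1
    [ "$(count_models "$1")" -eq "$(count_fasta "$2")" ]
}
```

You would call it as `emit_pfam_consensus Pfam-A.hmm pfam_consensus.fasta`. A non-zero exit status means either hmmemit failed or the counts disagree.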
