phylogenetics – Remove variable sequence component within a tree text file

I have a gene tree file of 436 orthologous genes from 6 species. I want to remove the unwanted extensions from the tip labels, as the tree looks messy after visualization. My file looks like:

(TRINITY_Clupea_DN5452_c0_g1_i1.p1:0.0824467436,TRINITY_Engraulis_DN43599_c0_g1_i1.p1:0.1634781085)100:0.0876433106,TRINITY_Sardina_DN15766_c0_g1_i2.p1:0.0164132018)………………

What I need:

(Clupea_DN5452:0.0824467436,Engraulis_DN43599:0.1634781085)100:0.0876433106,Sardina_DN15766:0.0164132018)………………

Since “TRINITY” is identical in every label, I can remove it using sed. But the parts after the species name are not identical, and I only need the 2nd and 3rd parts of the identifier (the species name and the DN number).

Any suggestions would be helpful. Thanks.
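
One possible approach, sketched in Python with a regular expression. It assumes every tip label has the shape TRINITY_<Species>_<DNid>_c<N>_g<N>_i<N>.p<N>, as in the three labels shown above; labels that deviate from that shape would be left unchanged. The function name shorten_labels is made up for illustration.

```python
import re

# Assumed label shape (inferred from the example labels only):
#   TRINITY_<Species>_<DNid>_c<N>_g<N>_i<N>.p<N>
# The capture group keeps the 2nd and 3rd underscore-separated fields.
LABEL = re.compile(r"TRINITY_([A-Za-z]+_DN\d+)_c\d+_g\d+_i\d+\.p\d+")


def shorten_labels(newick: str) -> str:
    """Replace each matching tip label with <Species>_<DNid>."""
    return LABEL.sub(r"\1", newick)


# Fragment taken from the example in the question.
fragment = (
    "(TRINITY_Clupea_DN5452_c0_g1_i1.p1:0.0824467436,"
    "TRINITY_Engraulis_DN43599_c0_g1_i1.p1:0.1634781085)100:0.0876433106,"
    "TRINITY_Sardina_DN15766_c0_g1_i2.p1:0.0164132018)"
)
print(shorten_labels(fragment))
```

Branch lengths and support values are untouched because the pattern only matches text that starts with TRINITY_. To process a whole file, the same function can be applied to the text read from the tree file and the result written to a new file, so the original stays intact.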
