How to shorten header of multiple fasta sequences

Hello everyone

I want to trim (shorten) the headers of multiple FASTA sequences, going from this:

>PH01000278G0580 AAPIP1;1 PH_genemodel_v1 PH01000278..503019..506969 . + . ID=PH01000278G0580;Name=cytochrome P450, putative, expressed
MVLLVAIGVVVGVLVVSSLVLRWNEVRYSRKQGLPPGTMGWPLFGETTEFLKHGP
>PH01003036G0080 AANIP2;1 PH_genemodel_v1 PH01003036..45987..47350 . + . ID=PH01003036G0080;Name=chlorophyll A-B binding protein, putative, expressed
MAMASSSGLRSCSAVGVPSLLAPSSRSGRSGLPFCAYATTSGRVTMSAEWFPGQ


to


>PH01000278G0580 AAPIP1;1
MVLLVAIGVVVGVLVVSSLVLRWNEVRYSRKQGLPPGTMGWPLFGETTEFLKHGP
>PH01003036G0080 AANIP2;1
MAMASSSGLRSCSAVGVPSLLAPSSRSGRSGLPFCAYATTSGRVTMSAEWFPGQ

I found some commands like these:

awk 'BEGIN{RS=">";}NR>1{ split($1,a," "); print ">"a[0]"n"$2; }' in.fasta > out.fasta
awk -F 'locus_tag=|]' 'NR %2 == 1 {print ">"$2 }; NR % 2 == 0 {print}'

But neither works for me, even after playing with them multiple times.
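
For what it's worth, here is a minimal sketch of one approach that might work, assuming every header line starts with ">" and that the two parts to keep are always the first two whitespace-separated fields. The function name shorten_fasta_headers is made up for illustration. Two likely reasons the commands above fail: awk's split() fills the array starting at index 1, so a[0] is empty, and the second command splits on "locus_tag=", which does not occur in these headers.

```shell
# Hypothetical helper: keep only the first two whitespace-separated
# fields of each header line; sequence lines are printed unchanged.
# Reads the files given as arguments, or standard input if none are given.
shorten_fasta_headers() {
    awk '/^>/ { print $1, $2; next } { print }' "$@"
}

# Usage (in.fasta and out.fasta are placeholder file names):
# shorten_fasta_headers in.fasta > out.fasta
```

Because only lines beginning with ">" are touched, this should also cope with sequences wrapped over several lines, unlike approaches that assume headers and sequences strictly alternate.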

