Collapse multifasta file by specific chromosome names

I have a multifasta file with unique identifiers (‘SUBJECT.1’, ‘SUBJECT.2’, etc.) like this:

>SUBJECT.1:1203-2742(+)
AAATTT
>SUBJECT.1:354-700(+)
CCCGGG
>SUBJECT.2:789-2000(+)
GGGCCC
>SUBJECT.2:2012-2742(+)
TTTAAA

How would I extract every sequence associated with each unique identifier and concatenate them into an output file that looks like this:

>SUBJECT.1
AAATTTCCCGGG
>SUBJECT.2
GGGCCCTTTAAA
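
One possible approach, sketched in Python with only the standard library: read the file record by record, take the leading `NAME.number` part of each header as the grouping key, and join the sequences that share a key. The function names `collapse_fasta` and `format_fasta` are made up for this illustration, and the key pattern is an assumption based on the headers shown above (anything after the first `NAME.number` is ignored). Sequences are joined in the order they appear in the file, not sorted by coordinate.

```python
import re

# Assumption: the grouping key is the leading "NAME.number" part of the header,
# e.g. ">SUBJECT.2:789-2000(+)" -> "SUBJECT.2".
KEY_PATTERN = re.compile(r">([^.\s:]+\.\d+)")


def collapse_fasta(lines):
    """Return a dict mapping each key to its concatenated sequence."""
    groups = {}  # dicts keep insertion order, so keys stay in file order
    key = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            match = KEY_PATTERN.match(line)
            # Fall back to everything before the first ":" if the pattern fails
            key = match.group(1) if match else line[1:].split(":", 1)[0]
            groups.setdefault(key, [])
        elif key is not None:
            groups[key].append(line)  # also handles multi-line sequences
    return {k: "".join(parts) for k, parts in groups.items()}


def format_fasta(collapsed):
    """Render the collapsed records as FASTA text."""
    return "".join(f">{key}\n{seq}\n" for key, seq in collapsed.items())


example = [
    ">SUBJECT.1:1203-2742(+)", "AAATTT",
    ">SUBJECT.1:354-700(+)", "CCCGGG",
    ">SUBJECT.2:789-2000(+)", "GGGCCC",
    ">SUBJECT.2:2012-2742(+)", "TTTAAA",
]
print(format_fasta(collapse_fasta(example)), end="")
# >SUBJECT.1
# AAATTTCCCGGG
# >SUBJECT.2
# GGGCCCTTTAAA
```

To run this on a real file, pass an open file handle instead of the list (a file object iterates over its lines) and write the result of `format_fasta` to the output file. If the fragments should be joined in coordinate order rather than file order, the start position would need to be parsed from the header and used to sort each group first.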

