How to make fasta manipulation more efficient


I have a Multi-Sequence FASTA file of the form –

>SDF123.1 blah blah

ATCTCTGGAAACTCGGTGAAAGAGAGTAT

AGTGATGAGGATGAGTGAG…

>SBF123.1 blah blah

ATCTCTGGAAACTCGGTGAAAGAGAGTAT

AGTGATGAGGATGAGTGAG….

I want to split it so that each sequence is written to its own FASTA file (like here).

I wrote the following AWK code, but it runs much more slowly than it did before I added the close call: it generates only about a dozen files per minute. I had to add close because, without it, awk failed with a "too many open files" error.

Here is the code –

cat big_multi_sequence_file.fasta | awk -F ' ' '{
        if (substr($0, 1, 1)==">") {filename=(substr($1,2) ".fa")}
        print $0 >> filename; close (filename)
}'

How can I make this code more time efficient? I am new to awk.

Thank you!
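
A likely cause of the slowdown is that the code above reopens and closes the output file for every input line. A minimal sketch of one possible fix is to close a file only when the next header line starts, so each output file is opened once per record and only one file is open at a time. The function name split_fasta is hypothetical, and the sketch assumes the first word of each header is a valid, distinct file name. It keeps the >> append from the original, so files left over from an earlier run are appended to rather than overwritten.

```shell
# split_fasta is a hypothetical helper name; $1 is the multi-sequence FASTA file.
split_fasta() {
    awk '
        /^>/ {
            # A new record starts: close the previous output file, if any.
            if (filename != "") close(filename)
            # File name = first word of the header without ">", plus ".fa".
            filename = substr($1, 2) ".fa"
        }
        # Write the current line; lines before the first header are skipped.
        filename != "" { print >> filename }
    ' "$1"
}

# Usage:
# split_fasta big_multi_sequence_file.fasta
```

Reading the file directly with awk also avoids the extra cat process, and -F ' ' can be dropped because whitespace is already awk's default field separator.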

