How to make FASTA manipulation more efficient
I have a multi-sequence FASTA file of the form:
>SDF123.1 blah blah
ATCTCTGGAAACTCGGTGAAAGAGAGTAT
AGTGATGAGGATGAGTGAG…
>SBF123.1 blah blah
ATCTCTGGAAACTCGGTGAAAGAGAGTAT
AGTGATGAGGATGAGTGAG….
I want to extract the individual sequences into individual files, one file per sequence (like here).
I wrote the following AWK code, but it runs much more slowly than it did without the close command in it: it generates only about a dozen files a minute. I had to add the close command because without it awk failed with the error "too many open files".
Here is the code:
cat big_multi_sequence_file.fasta | awk -F ' ' '{
if (substr($0, 1, 1)==">") {filename=(substr($1,2) ".fa")}
print $0 >> filename; close (filename)
}'
How can I make this code faster? I am new to awk.
Thank you!
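One possible direction, as a minimal sketch rather than a tested fix for the real data: the code above reopens and closes the output file for every input line, so closing it only when a new header line starts should keep a single file open at a time while doing far fewer open/close calls. The sketch assumes the slowdown really comes from the per-line close, and that the first whitespace-delimited word of each header is a usable file name. The file demo.fasta and its contents are hypothetical stand-ins for big_multi_sequence_file.fasta.

```shell
# Work in a scratch directory so the demo does not touch existing files.
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical two-record input, standing in for the real FASTA file.
printf '>SDF123.1 blah blah\nATCT\nAGTG\n>SBF123.1 blah blah\nATCT\n' > demo.fasta

awk '
/^>/ {
    # A new record starts: close the previous output file, if any.
    if (filename != "") close(filename)
    # File name = first word of the header without the leading ">".
    filename = substr($1, 2) ".fa"
}
# Append every line (header and sequence) to the current record file.
# Lines before the first header, if any, are skipped.
filename != "" { print >> filename }
' demo.fasta
```

The append redirection (>>) is kept from the original code, so a sequence ID that occurs twice still ends up in one file, and rerunning the command in the same directory appends to files that already exist. Passing the input file to awk directly also avoids the extra cat process.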