Is it possible to use a loop to run EMBOSS merger on multiple FASTA files?

Hello all,

I previously posted a question in a similar vein (see here), but now, two weeks later, I think I am nearly there! I plan to update that post and explain what I’ve done once I’ve tackled this final bit. (TL;DR of my other question: I used the hit table, not the FASTA headers, which I should’ve realised ages ago.)

The problem:

I have a multi-FASTA file containing all the sequences I have identified as overlapping. The records are grouped by GenBank accession number and nucleotide position:

>AK310930|1:38-236_Homo_sapiens    
ATGAAGGCTCTCATTGTTCTGGGG

>AK310930|1:231-384_Homo_sapiens    
CTGCAGTGCTTTGCTGCAAG

>XM_010841625|1:145-445_PREDICTED:_Bison_bison    
ATGAA

>XM_010841625|1:444-512_PREDICTED:_Bison_bison    
TGGGT

I have separated these entries into their own files (thanks Pierre!), which are simply called _1.fasta, _2.fasta, etc.
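The splitting step itself is not shown, so here is one possible way to do it as a hedged sketch, not Pierre's actual method. It assumes the accession is everything before the first | in each header and that records for the same accession sit next to each other, as in the example above. The function name split_by_accession is hypothetical.

```shell
#!/bin/sh
# Sketch only: split a multi-FASTA file into _1.fasta, _2.fasta, ...
# with one output file per accession (text before the first "|" in a header).
# Assumes records that share an accession are adjacent in the input.
split_by_accession() {
    awk '
        { sub(/[ \t\r]+$/, "") }               # trim trailing spaces and CRs
        /^>/ {
            acc = substr($0, 2)                # header without the leading ">"
            sub(/[|].*/, "", acc)              # keep only the accession part
            if (acc != prev) {                 # new accession: start the next file
                if (out != "") close(out)
                out = "_" (++n) ".fasta"
                prev = acc
            }
        }
        NF && out != "" { print > out }        # skip blank lines and any preamble
    ' "$1"
}
```

Usage, with a hypothetical input name: `split_by_accession all_overlaps.fasta`. Closing each file when the accession changes keeps only one output file open at a time, which matters when there are 1000+ groups.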

Running EMBOSS merger on these files does work, and I am delighted to have found something that does the job I’m after. The catch is that entering each pair of sequences by hand takes time, and I could be looking at upwards of 1000 files that need to go through merger.

How could I write a loop, suitable for macOS, that runs merger on each file? Is that even possible? It took a noticeable amount of time to stitch two of these sequences together, and I am worried about accidentally frying my MacBook (which is technically the uni’s!).

Someone used Perl to automate a different EMBOSS program, and it looks like that approach might be feasible, but I have no knowledge of Perl and have never used it!

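Would something like this do the job? Since merger takes two single sequences (-asequence and -bsequence), I assume each _N.fasta first has to be split into its two records:

for file in _*.fasta; do awk -v a="${file%.fasta}_a.fa" -v b="${file%.fasta}_b.fa" '/^>/{n++} NF{print > (n<=1 ? a : b)}' "$file"
  merger -asequence "${file%.fasta}_a.fa" -bsequence "${file%.fasta}_b.fa" -sreverse2 -outfile "${file%.fasta}.merger" -outseq "${file%.fasta}_merged.fa" -auto; done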
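A more defensive sketch of the loop above, written in plain POSIX shell so it runs from the macOS Terminal. Assumptions not stated in the post: each _N.fasta holds exactly two records, results go to a merged/ folder, and the helper names split_pair and run_all are hypothetical. The qualifiers -asequence, -bsequence, -outfile, -outseq and the general -auto flag are standard EMBOSS options, but it is worth confirming them with `merger -help` on your install. -sreverse2 is left out because the example fragments (e.g. 1:38-236 and 1:231-384) appear to come from the same strand; add it back only if the second fragment really needs reverse-complementing.

```shell
#!/bin/sh
# Sketch only: loop EMBOSS merger over every _N.fasta in the current directory.
# Assumes each _N.fasta holds exactly two FASTA records.

# split_pair IN A B (hypothetical helper): write record 1 of IN to A and
# record 2 to B, dropping blank lines; fails unless IN has exactly two records.
split_pair() {
    awk -v a="$2" -v b="$3" '
        /^>/ { n++ }
        NF && n == 1 { print > a }
        NF && n == 2 { print > b }
        END { exit (n == 2 ? 0 : 1) }
    ' "$1"
}

# run_all (hypothetical helper): merge each pair, one file at a time,
# writing results to merged/ and skipping files that are already done.
run_all() {
    if ! command -v merger >/dev/null 2>&1; then
        echo "merger not found on PATH (is EMBOSS installed?)" >&2
        return 1
    fi
    mkdir -p merged
    tmp=$(mktemp -d) || return 1
    for file in _*.fasta; do
        [ -e "$file" ] || continue                 # no matches: glob stays literal
        base=${file%.fasta}
        [ -s "merged/$base.fasta" ] && continue    # already merged: lets a stopped run resume
        if ! split_pair "$file" "$tmp/a.fasta" "$tmp/b.fasta"; then
            echo "skipping $file: expected exactly two records" >&2
            continue
        fi
        echo "merging $file"
        # Add -sreverse2 here only if the second fragment is on the opposite strand.
        merger -asequence "$tmp/a.fasta" -bsequence "$tmp/b.fasta" \
               -outfile "merged/$base.merger" -outseq "merged/$base.fasta" -auto ||
            { echo "merger failed on $file" >&2; rm -f "merged/$base.fasta"; }
    done
    rm -rf "$tmp"
}
```

Save it as, say, run_merger.sh (a hypothetical name), then from the folder holding the _N.fasta files run `. ./run_merger.sh && run_all`. The files are processed one after another, so only one merger process is busy at any moment: the same load as running it by hand, just unattended. If the run is interrupted, starting it again skips the pairs already in merged/.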

Thank you kindly in advance; I’m trying to understand whether this is feasible and whether I’m on the right path!
