I need some help with grep or any other command that will help do the job. I am very new to the command line. Any help is appreciated, thank you.
I recently did amplicon sequencing of multiplexed PCR reactions. I multiplexed nearly 90 primer pairs in a single PCR reaction to generate amplicons, made sequencing libraries from these amplicons, and sequenced the libraries on a MiSeq instrument. Four such reactions, differing in some of the primer pairs, were sequenced. I now have the fastq files. I want to see how well each primer product is represented in the fastq files, to decide which primer pool I should proceed with for my actual experiments. The MiSeq run was single-end, so I want to look for the forward primer sequence in the resulting fastq files.
I have been using grep to get answers, but I only know how to do it for one primer at a time:
grep -c '^AAAGTGTGTGGGGATGATATGG' ./*.fastq
-c counts the matching lines
^ anchors the search string to the beginning of the line (the start of the read sequence)
The output I get from this is:
./myfastq1.fastq:number
./myfastq2.fastq:number
./myfastq3.fastq:number
./myfastq4.fastq:number
Then I copy each number into an Excel file by hand. I know, terrible!
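For a single primer, the copy-paste step can be avoided by letting the shell write a tab-separated file that Excel opens directly. This is a minimal sketch, assuming the file names contain no `:` characters; `primer_counts_tsv` and `counts.tsv` are hypothetical names. It relies on `grep -H`, which GNU and BSD/macOS grep support, so the file name is printed even when only one file is given.

```shell
# Hypothetical helper: count reads starting with one primer in each FASTQ file
# and print "file<TAB>count" lines instead of grep's "file:count".
# Assumes the file names contain no ':' characters.
primer_counts_tsv() {
    local primer=$1
    shift
    # -c: count matching lines; -H: always prefix the file name;
    # '--' stops option parsing in case the pattern ever starts with '-'.
    grep -cH -- "^$primer" "$@" | tr ':' '\t'
}
```

Usage: `primer_counts_tsv AAAGTGTGTGGGGATGATATGG ./*.fastq > counts.tsv`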
I have been searching for help with problems similar to mine, but with no success.
My request here is:
I have a tab-delimited file, forwardprimers.txt, with one line for each of the 90 primers: column 1 is the primer name and column 2 is the primer sequence.
I have 4 fastq files in which to search for these primer sequences.
Is there a way to search each fastq file for every sequence in the primer file and write the count for each primer name to a new output file? Thank you.
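One way to get the whole table in one step is to let awk read the primer file first and then every fastq file, comparing the start of each read sequence against every primer. This is a minimal sketch, assuming uncompressed FASTQ with exactly four lines per record (so the read sequence is line 2 of each record) and a primer file with no header line; `count_primers` and `primer_counts.tsv` are hypothetical names. Primers are matched as literal prefixes rather than regular expressions, and a trailing carriage return is stripped from each primer line in case forwardprimers.txt was saved from Excel on Windows.

```shell
# Hypothetical helper: for each primer in a two-column, tab-delimited file
# (name<TAB>sequence), count the reads in each FASTQ file that start with it.
# Assumes uncompressed FASTQ with exactly 4 lines per record.
count_primers() {
    local primers=$1
    shift
    awk -F'\t' '
        # First file: the primer table.
        FNR == NR {
            sub(/\r$/, "")          # drop a Windows carriage return, if any
            if (NF < 2) next        # skip blank or malformed lines
            n++
            name[n] = $1
            seq[n] = toupper($2)
            len[n] = length($2)
            next
        }
        # First line of each FASTQ file: remember its name for the header.
        FNR == 1 { nf++; fname[nf] = FILENAME }
        # Line 2 of every 4-line record is the read sequence.
        FNR % 4 == 2 {
            for (i = 1; i <= n; i++)
                if (substr($0, 1, len[i]) == seq[i])
                    count[i, nf]++
        }
        # Print one row per primer and one column per FASTQ file.
        END {
            printf "primer"
            for (f = 1; f <= nf; f++) printf "\t%s", fname[f]
            printf "\n"
            for (i = 1; i <= n; i++) {
                printf "%s", name[i]
                for (f = 1; f <= nf; f++) printf "\t%d", count[i, f]
                printf "\n"
            }
        }
    ' "$primers" "$@"
}
```

Usage: `count_primers forwardprimers.txt ./*.fastq > primer_counts.tsv`. Each fastq file is read once, instead of once per primer as a grep loop would need. If one primer sequence is a prefix of another, a read starting with the longer one is counted for both, and an empty fastq file gets no column.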