Extract fastq reads by lists of sequences

Hello,

I have lists of sequences, and I would like to find the fastq reads that contain them.

Is there a tool, or a way to script this, that extracts fastq reads matching a given list of sequences?

My lists of sequences look like the following:

GATAAAAAAAAAAAAAAAC
GATAAAAAAAAAAAAAACC
GATAAAAAAAAAAAAAATC
GATAAAAAAAAAAAAAAGC
GATAAAAAAAAAAAAACAC
GATAAAAAAAAAAAAACCC
GATAAAAAAAAAAAAACTC
GATAAAAAAAAAAAAATAC
GATAAAAAAAAAAAAATCC
GATAAAAAAAAAAAAATGC
GATAAAAAAAAAAAAAGAC
GATAAAAAAAAAAAAAGCC
GATAAAAAAAAAAAAAGGC
GATAAAAAAAAAAAACAAC
GATAAAAAAAAAAAACACC
GATAAAAAAAAAAAACCAC
GATAAAAAAAAAAAACCCC
GATAAAAAAAAAAAACCTC
GATAAAAAAAAAAAATAAC
GATAAAAAAAAAAAATCAC
GATAAAAAAAAAAAATTAC
GATAAAAAAAAAAAAGAAC
GATAAAAAAAAAAAAGACC
GATAAAAAAAAAAACAAAC
GATAAAAAAAAAAACCCCC
GATAAAAAAAAAAATAAAC
GATAAAAAAAAAAAGAAAC
GATAAAAAAAAAACAAAAC
.
.
.
.

I have used grep to do this one sequence at a time, but it is taking too long (I have 40k 19-mers).

grep -A 2 -B 1 "CTCAAAAAAAAACAAAGGA" input.fastq | grep -v "^--$" > output.fastq

Also, there is a problem with overlapping reads.
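
One possible way to avoid 40k separate grep runs is to load all the query sequences into a set and scan the fastq file once, testing every window of each read against that set. The sketch below is only an illustration under some assumptions: the function names (`load_kmers`, `read_fastq`, `extract_reads`) are made up for this example, each fastq record is assumed to span exactly four lines (no wrapped sequences), and only the forward strand is checked (no reverse complement). If the "overlapping reads" issue means that the same read is extracted more than once, a single pass like this writes each matching read only once, even when it contains several of the sequences.

```python
from itertools import islice


def load_kmers(handle):
    """Read one query sequence per line into a set, skipping blank lines."""
    return {line.strip().upper() for line in handle if line.strip()}


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from 4-line fastq records.

    Assumption: sequences are not wrapped over several lines.
    """
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return  # end of file (or a truncated final record)
        yield tuple(record)


def extract_reads(fastq_handle, kmers, out_handle):
    """Write each read containing at least one query sequence; return the count."""
    # Usually a single length (19), but handle mixed lengths as well.
    lengths = sorted({len(k) for k in kmers})
    written = 0
    for header, seq, plus, qual in read_fastq(fastq_handle):
        upper = seq.upper()
        # Slide a window of each query length along the read and look it up
        # in the set; any() stops at the first hit.
        hit = any(
            upper[i:i + k] in kmers
            for k in lengths
            for i in range(len(upper) - k + 1)
        )
        if hit:
            out_handle.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
            written += 1  # each read is written once, however many queries it contains
    return written
```

To use it, open a file with one sequence per line (for example a hypothetical `kmers.txt`), pass it to `load_kmers`, then call `extract_reads` with open handles for `input.fastq` and `output.fastq`. Each window lookup takes roughly constant time, so the run time depends mostly on the total number of bases in the fastq file and hardly at all on how many query sequences there are.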


