Remove reads from FASTQ file based on missing fixed base
Hello everybody, I have a question regarding processing raw FASTQ files based on a specific UMI approach.
Basically, we used a strategy in our paired-end sequencing experiment where the library contains 6 nt UMIs. The UMI sequence is followed by a fixed base that is the same for every R1 and the same for every R2 (but different between R1 and R2, of course).
This results in most reads having the same base at position 7 (confirmed with FastQC).
My question now is: how do I remove all reads that do not meet the condition of this “fixed 7th base”? Is there a good way to do this in Linux (maybe with grep, awk…)? I am not yet very well versed in Linux and its built-in tools for manipulating and processing files, hence the question.
Or is there a specialized tool that can do this and that I am not aware of?
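To make the goal concrete, here is a minimal Python sketch of the filter I have in mind. It is only an illustration under several assumptions: the function names are made up, the FASTQ files use plain 4-line records (no wrapped sequences), R1 and R2 list the reads in the same order, and the fixed bases "T" (R1) and "A" (R2) are placeholders rather than the real bases from our library. A pair is kept only if both mates carry their expected base at position 7, so the two output files stay in sync.

```python
import gzip
from itertools import islice


def open_fastq(path, mode="rt"):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def read_records(handle):
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped records)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def has_fixed_base(record, base, pos=7):
    """True if the read has `base` at 1-based position `pos`."""
    seq = record[1].rstrip("\n")
    return len(seq) >= pos and seq[pos - 1] == base


def filter_pairs(r1_in, r2_in, r1_out, r2_out, r1_base, r2_base, pos=7):
    """Write only pairs where both mates have the expected fixed base.

    Returns (kept_pairs, total_pairs).
    """
    kept = total = 0
    with open_fastq(r1_in) as f1, open_fastq(r2_in) as f2, \
            open_fastq(r1_out, "wt") as o1, open_fastq(r2_out, "wt") as o2:
        for rec1, rec2 in zip(read_records(f1), read_records(f2)):
            total += 1
            if has_fixed_base(rec1, r1_base, pos) and has_fixed_base(rec2, r2_base, pos):
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
    return kept, total


# Example call (file names and bases are placeholders):
# kept, total = filter_pairs("R1.fastq.gz", "R2.fastq.gz",
#                            "R1.filtered.fastq.gz", "R2.filtered.fastq.gz",
#                            r1_base="T", r2_base="A")
```

The sketch drops the whole pair if either mate fails the check; whether that is the right policy, or whether a mate should be kept on its own, is part of what I am unsure about.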
I am very grateful for any help!