Abnormal GC content in Whole Exome Sequencing Sample after alignment
I have 87 whole-exome sequenced samples (Agilent SureSelect Exome v7 library, NovaSeq sequencer). The Illumina adapters and short reads (<30 bp) were removed with Cutadapt.
For the FASTQ QC, my only problem is with %GC. Here is the MultiQC result after running FastQC:
Even though 8 bad samples are flagged in red, most of the samples are approximately centered around 50% GC. (I assume that both bumps are due to errors during library preparation or sequencing?)
However, my main concern is after alignment with BWA. I obtained this figure for the GC content:
I have one peak at 70% and another one around 90%, which is really problematic.
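To make the two peaks concrete, here is a minimal sketch of how per-read GC content could be computed and how reads above a cutoff could be set aside for inspection. It is only an illustration, not the method used to produce the figure: the function names (`gc_fraction`, `gc_histogram`, `split_by_gc`), the `(name, sequence)` record format, and the 0.65 threshold are all assumptions chosen for the example.

```python
from collections import Counter


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T); N and others are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_histogram(seqs, bin_width=10):
    """Count reads per GC% bin; keys are bin lower bounds, 100% goes into the last bin."""
    counts = Counter()
    for s in seqs:
        pct = gc_fraction(s) * 100
        lower = min(int(pct // bin_width) * bin_width, 100 - bin_width)
        counts[lower] += 1
    return dict(sorted(counts.items()))


def split_by_gc(records, threshold=0.65):
    """Split (name, sequence) records into (kept, flagged) by GC fraction.

    The threshold is a hypothetical cutoff, not a recommended value.
    """
    kept, flagged = [], []
    for name, seq in records:
        (flagged if gc_fraction(seq) >= threshold else kept).append((name, seq))
    return kept, flagged


if __name__ == "__main__":
    # Toy reads, purely illustrative.
    reads = [("r1", "GGGCCCGGGC"), ("r2", "ATATGCATAT"), ("r3", "GCGCGCATGC")]
    print(gc_histogram(seq for _, seq in reads))
    kept, flagged = split_by_gc(reads)
    print("flagged:", [name for name, _ in flagged])
```

In practice the sequences would come from the BAM file through a dedicated parser, and the names of the flagged reads could then be used to look at where they map before deciding whether to drop them.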
The HSMetrics output showed that approximately 85% of bases aligned on baits (so 15% of bases are off-bait).
When I tried to locate these GC-rich reads, I usually ended up in intronic or intergenic regions. However, they sometimes fall at the end of exons, as in this example:
Do you have any idea how to remove these reads?
Thank you for your help.