Abnormal GC content in Whole Exome Sequencing Sample after alignment


I have 87 whole exome sequencing samples (Agilent SureSelect Exome v7 library, NovaSeq sequencer). The Illumina adapters and short reads (<30 bp) were removed with Cutadapt.

In the FASTQ QC, my only problem is with the GC content. Here is the MultiQC report of the FastQC results:

[Image: MultiQC plot of FastQC per-sequence GC content]

Although 8 samples are flagged as bad (in red), most samples are centered around 50% GC. (I assume that both bumps are due to errors during library preparation or sequencing?)

However, my main concern is what I see after alignment with BWA. I obtained this figure for the GC content:

[Image: GC content distribution after alignment]

I have one peak at 70% and another one around 90%, which is really problematic.

The HsMetrics output showed that approximately 85% of bases aligned on baits (so about 15% of bases are off-bait).

When I try to locate these GC-rich reads, they usually fall in intronic or intergenic regions. Sometimes, however, they fall at the end of exons, as in this example:

[Image: example of GC-rich reads at the end of an exon]
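
To make "GC-rich reads" concrete, here is a minimal sketch of how such reads could be flagged from alignment records. It is not necessarily the method used above. It assumes plain SAM text lines (for example, the output of `samtools view`), and the function names `gc_fraction` and `gc_rich_reads` as well as the 0.7 cutoff are hypothetical choices for illustration, with the cutoff picked only to match the 70% peak mentioned earlier.

```python
def gc_fraction(seq):
    """Return the fraction of G/C among called bases (A, C, G, T); N is ignored."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def gc_rich_reads(sam_lines, threshold=0.7):
    """Yield (name, chrom, pos, gc) for mapped reads whose GC fraction >= threshold.

    threshold=0.7 is an assumed cutoff, not a recommended value.
    """
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # SAM has 11 mandatory columns
            continue
        name, flag, chrom, pos, seq = (
            fields[0], int(fields[1]), fields[2], int(fields[3]), fields[9]
        )
        if flag & 0x4 or seq == "*":      # unmapped read or sequence not stored
            continue
        gc = gc_fraction(seq)
        if gc >= threshold:
            yield name, chrom, pos, gc


# Tiny in-memory example with made-up records (not real data).
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tGGGCCCGGGC\tIIIIIIIIII",
    "r2\t0\tchr1\t200\t60\t10M\t*\t0\t0\tATATATATGC\tIIIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGGGGGGG\tIIIIIIIIII",
]
hits = list(gc_rich_reads(example))
print(hits)
```

In this toy input only `r1` is reported: `r2` is below the cutoff and `r3` is unmapped. The reported coordinates could then be intersected with the bait or exon intervals to see where the GC-rich reads accumulate.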

Do you have any idea how to remove these reads?

Thank you for your help.


library, WES, GC, Sequencing, FASTQC
