I have two FASTQ files of paired-end reads that I want to use for SNV calling. Quality checking with FastQC showed a bad Per base sequence content result and a couple of warnings, for Per sequence GC content and Sequence Length Distribution, as you can see in the pictures below.
My idea was to cut off the first 6 bases and around 10 at the end. I used Trimmomatic with the following command:
TrimmomaticPE -threads 32 -phred33 R1.fastq R2.fastq Trimmed/FP.fastq Trimmed/FUN.fastq Trimmed/RP.fastq Trimmed/RUN.fastq ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 HEADCROP:6 SLIDINGWINDOW:4:30 CROP:90
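For reference, Trimmomatic applies the trimming steps in the order they appear on the command line, so here HEADCROP:6 runs before SLIDINGWINDOW:4:30, and CROP:90 runs last on whatever is left. The sketch below is a simplified Python model of those three steps on a single read, not Trimmomatic's actual implementation: the function names and the toy quality values are made up for illustration, ILLUMINACLIP is left out, and the window rule simply cuts at the start of the first window whose mean quality falls below the threshold.

```python
def headcrop(seq, quals, n):
    """Drop the first n bases (models HEADCROP:n)."""
    return seq[n:], quals[n:]


def sliding_window(seq, quals, window, min_mean):
    """Simplified model of SLIDINGWINDOW:window:min_mean.

    Scans from the 5' end and cuts the read at the start of the first
    window whose mean quality is below min_mean. The real tool differs
    in exactly where it places the cut.
    """
    for start in range(0, len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean:
            return seq[:start], quals[:start]
    return seq, quals


def crop(seq, quals, length):
    """Keep at most `length` bases from the start (models CROP:length)."""
    return seq[:length], quals[:length]


def trim_read(seq, quals):
    """Apply the steps in the same order as the command in the post."""
    seq, quals = headcrop(seq, quals, 6)
    seq, quals = sliding_window(seq, quals, 4, 30)
    seq, quals = crop(seq, quals, 90)
    return seq, quals


# Two hypothetical 100 bp reads: one uniformly Q35, one dropping to Q20 after base 60.
good_read = ("A" * 100, [35] * 100)
mixed_read = ("A" * 100, [35] * 60 + [20] * 40)

print(len(trim_read(*good_read)[0]))
print(len(trim_read(*mixed_read)[0]))
```

In this model the uniformly good read comes out at 90 bases and the read with the quality drop comes out at 52, so a mix of read lengths after a quality-based step like SLIDINGWINDOW is expected rather than a sign that the command failed.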
After this, the GC content looks pretty strange and appears to be worse than it was before trimming, and the Sequence Length Distribution still has a warning.
The basic statistics before and after trimming are the following:
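One way to check the GC shift and the read lengths independently of the FastQC plots is to tabulate them directly from the FASTQ files. The following is a minimal Python sketch, assuming standard four-line FASTQ records (plain or gzip-compressed); the function names are hypothetical, and it only reports overall GC fraction, per-read GC percentages and read lengths, nothing that FastQC does beyond that.

```python
import gzip
from collections import Counter


def read_sequences(path):
    """Yield the sequence line of each record, assuming 4 lines per record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of every record is the sequence
                yield line.strip()


def summarize(seqs):
    """Return read count, overall GC fraction, and per-read GC% and length counts."""
    lengths = Counter()
    gc_percent = Counter()
    total_bases = 0
    gc_bases = 0
    reads = 0
    for seq in seqs:
        if not seq:
            continue  # skip empty reads to avoid dividing by zero
        seq = seq.upper()
        gc = seq.count("G") + seq.count("C")
        reads += 1
        total_bases += len(seq)
        gc_bases += gc
        lengths[len(seq)] += 1
        gc_percent[round(100 * gc / len(seq))] += 1
    overall = gc_bases / total_bases if total_bases else 0.0
    return {"reads": reads, "overall_gc": overall,
            "lengths": lengths, "gc_percent": gc_percent}


def summarize_file(path):
    """Convenience wrapper: summarize one FASTQ file."""
    return summarize(read_sequences(path))
```

Calling `summarize_file("R1.fastq")` and `summarize_file("Trimmed/FP.fastq")` (the input and paired output names from the command above) and comparing the `gc_percent` and `lengths` counters shows whether the per-read GC distribution really moved, or whether the change comes mostly from the reads now having many different lengths.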
Does anyone have any idea why this happened, and what to do to improve the quality of data? Any help is appreciated!