I am using FastUniq to deduplicate Illumina MiSeq paired-end data, and FastQC to compare quality control (QC) reports before and after deduplication. I figured out how to run FastUniq, but for some reason it only seems to be effective on the first read of each pair (R1), and not nearly as much on the second (R2). This is odd, given that the same number of reads is filtered out of the forward and reverse files.
My FastUniq command:
fastuniq -i list.txt -o SAMPLE_R1_dedup_1.fastq -p SAMPLE_R2_dedup_2.fastq
Where list.txt is a file containing the forward and reverse files, as FastUniq requires:
SAMPLE_R1.fastq
SAMPLE_R2.fastq
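To double-check that the same number of reads really is removed from the forward and reverse files, a minimal Python sketch like the one below could be used. It assumes uncompressed FASTQ files with exactly four lines per record; the function names `count_fastq_records` and `removed_reads` are hypothetical helpers written for this illustration, not part of FastUniq or FastQC.

```python
from pathlib import Path


def count_fastq_records(path):
    """Count records in an uncompressed FASTQ file (assumes 4 lines per record)."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def removed_reads(before, after):
    """Number of records present in `before` but no longer in `after`."""
    return count_fastq_records(before) - count_fastq_records(after)


if __name__ == "__main__":
    # File names taken from the FastUniq command in this post.
    pairs = [
        ("SAMPLE_R1.fastq", "SAMPLE_R1_dedup_1.fastq"),
        ("SAMPLE_R2.fastq", "SAMPLE_R2_dedup_2.fastq"),
    ]
    for before, after in pairs:
        if Path(before).exists() and Path(after).exists():
            print(f"{before}: {removed_reads(before, after)} reads removed")
```

If both lines report the same number, the two output files are still in sync and reads were dropped pairwise.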
Then, when I compare the FastQC QC report, I see the following:
SAMPLE_R1 with duplicates:
SAMPLE_R1 deduplicated:
As you can see, this gives some really good results. However, the reverse QC reports look like this:
SAMPLE_R2 with duplicates:
SAMPLE_R2 deduplicated:
This is significantly worse than the forward (R1) read, and this exact same thing happens for all my samples.
My question is: what causes these results? Is something going wrong with FastUniq, since it is quite an outdated tool? Is FastQC giving false reports for the reverse (R2) read? Or is this output to be expected?
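One way to narrow this down would be to measure duplication per file and per pair on the deduplicated output. The sketch below is a rough diagnostic, not a reimplementation of either tool. It assumes uncompressed four-line FASTQ records with R1 and R2 in the same order, and it assumes that FastUniq only treats a pair as a duplicate when both mates match another pair. FastQC analyses each file on its own and uses its own estimation method, so these fractions will not match its plots exactly. All function names here are hypothetical.

```python
from collections import Counter
from itertools import islice


def read_sequences(path):
    """Yield the sequence line of each record in an uncompressed FASTQ file."""
    with open(path) as handle:
        # Sequence lines are the 2nd line of every 4-line record.
        for seq in islice(handle, 1, None, 4):
            yield seq.strip()


def duplicate_fraction(items):
    """Fraction of items that repeat an identical item seen elsewhere."""
    counts = Counter(items)
    total = sum(counts.values())
    return 0.0 if total == 0 else 1 - len(counts) / total


def duplication_summary(r1_path, r2_path):
    """Compare duplication within R1, within R2, and across (R1, R2) pairs."""
    # Loads all sequences into memory; fine for MiSeq-sized files.
    r1 = list(read_sequences(r1_path))
    r2 = list(read_sequences(r2_path))
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 contain different numbers of reads")
    return {
        "r1": duplicate_fraction(r1),
        "r2": duplicate_fraction(r2),
        "pairs": duplicate_fraction(zip(r1, r2)),
    }


if __name__ == "__main__":
    import os

    r1_file, r2_file = "SAMPLE_R1_dedup_1.fastq", "SAMPLE_R2_dedup_2.fastq"
    if os.path.exists(r1_file) and os.path.exists(r2_file):
        print(duplication_summary(r1_file, r2_file))
```

If "pairs" comes out near zero while "r2" stays high, the pairs are unique as pairs even though many R2 sequences are identical to each other, which would point to the per-file report rather than to a FastUniq failure.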