FastQC not showing adapters: cutadapt sanity check

Hello, newbie here.
I am reanalyzing the data from a published article (GSE83931) for training purposes. I have two questions.

1- I ran FastQC on the sequences, followed by MultiQC. The individual FastQC reports don't show any adapter sequence (please see pic1), even though the authors reported that they used Trimmomatic to remove adapters. The MultiQC report, however, does show adapter content (pic2). Both pictures belong to the same run.

How can we explain the discrepancy here?

2- The authors reported that the TruSeq3-SE.fa adapter sequences were removed with Trimmomatic. I used cutadapt instead. The adapter sequence I found online (based on the FastQC report) is: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA

I used the following command line:

cutadapt -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA -m 50 -j 4 -o SRR3734812_trim50.fastq.gz --length-tag 'length=' SRR3734812.fastq.gz

Output:

This is cutadapt 1.18 with Python 3.7.6
Command line parameters: -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA -m 50 -j 4 -o SRR3734812_trim50.fastq.gz --length-tag length= SRR3734812.fastq.gz
Processing reads on 4 cores in single-end mode ...
Finished in 709.18 s (28 us/read; 2.16 M reads/minute).

=== Summary ===

Total reads processed:              25,562,072
Reads with adapters:                   783,598 (3.1%)
Reads that were too short:                   0 (0.0%)
Reads written (passing filters):    25,562,072 (100.0%)

Total basepairs processed: 2,556,207,200 bp
Total written (filtered):  2,553,044,075 bp (99.9%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA; Type: regular 3'; Length: 34; Trimmed: 783598 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-34 bp: 3

Bases preceding removed adapters:
  A: 24.0%
  C: 31.0%
  G: 29.6%
  T: 15.5%
  none/other: 0.0%

Overview of removed sequences
length  count   expect    max.err  error counts
3       529182  399407.4  0        529182
4       116588  99851.8   0        116588
5       39583   24963.0   0        39583
6       16724   6240.7    0        16724
7       14190   1560.2    0        14190
8       12594   390.0     0        12594
9       11809   97.5      0        11202 607
10      10917   24.4      1        10045 872
11      9490    6.1       1        9007 483
12      8432    1.5       1        8112 320
13      7396    0.4       1        7214 182
14      6684    0.1       1        2 6682
15      8       0.0       1        0 8
17      1       0.0       1        0 1
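As a sanity check on the "expect" column of the report, here is a minimal Python sketch. It assumes a simple random-match model in which a read end matches the first k adapter bases by chance with probability 0.25^k (equal base frequencies). That model and the function name `expected_random_matches` are assumptions made for illustration, not taken from the cutadapt source; the read total and the observed counts are copied from the report.

```python
# Sketch: compare observed short adapter matches with what chance alone predicts.
# Assumption: a random read end matches the first k adapter bases with
# probability 0.25**k (equal base frequencies).

TOTAL_READS = 25_562_072  # "Total reads processed" from the report

# removed length -> observed count, copied from "Overview of removed sequences"
OBSERVED = {3: 529182, 4: 116588, 5: 39583, 6: 16724, 7: 14190, 8: 12594}


def expected_random_matches(total_reads: int, length: int) -> float:
    """Expected number of reads whose last `length` bases match by chance."""
    return total_reads * 0.25 ** length


for length, count in OBSERVED.items():
    expect = expected_random_matches(TOTAL_READS, length)
    print(
        f"{length:>2} bp: observed {count:>7}, "
        f"expected by chance {expect:>9.1f}, ratio {count / expect:.1f}"
    )
```

Under this model, the 3 bp and 4 bp removals are close to what chance alone would give, while the longer removals are far more frequent than chance, so they are more likely to reflect real adapter sequence.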

After trimming, I ran FastQC again on the trimmed file. Trimming apparently did something, as the sequence lengths now range from 83 to 100 (pic3). However, when I compare the first 3-4 reads before and after trimming, they look the same. How can I validate the trimming step?
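One way to check this beyond the first few reads is to walk through both files in parallel and tally what was removed. The following is a minimal Python sketch, not a definitive method. It assumes that both files contain the same reads in the same order (no reads were discarded in this run) and that the file names are the ones from the command above. The helper names `read_fastq` and `compare_trimming` are hypothetical, and the adapter check is an exact prefix comparison, whereas cutadapt also tolerates a few errors in longer matches.

```python
import gzip
import os
from collections import Counter
from itertools import islice

ADAPTER = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA"
RAW = "SRR3734812.fastq.gz"             # input file from the command above
TRIMMED = "SRR3734812_trim50.fastq.gz"  # output file from the command above


def read_fastq(handle):
    """Yield (name, sequence) from a FASTQ text handle (4 lines per record)."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        # Keep only the first header token, so a changed length= tag is ignored.
        yield record[0].split()[0], record[1].strip()


def compare_trimming(raw_handle, trimmed_handle, adapter=ADAPTER, limit=None):
    """Tally removed 3' lengths and exact adapter-prefix matches.

    Assumes both handles list the same reads in the same order.
    """
    removed_lengths = Counter()
    adapter_like = 0
    pairs = zip(read_fastq(raw_handle), read_fastq(trimmed_handle))
    for (raw_name, raw_seq), (trim_name, trim_seq) in islice(pairs, limit):
        if raw_name != trim_name:
            raise ValueError(f"read order differs: {raw_name} vs {trim_name}")
        removed = raw_seq[len(trim_seq):]  # bases cut from the 3' end
        if removed:
            removed_lengths[len(removed)] += 1
            k = min(len(removed), len(adapter))
            if removed[:k] == adapter[:k]:  # exact match only
                adapter_like += 1
    return removed_lengths, adapter_like


# Only runs when both files are present in the working directory.
if os.path.exists(RAW) and os.path.exists(TRIMMED):
    with gzip.open(RAW, "rt") as raw, gzip.open(TRIMMED, "rt") as trimmed:
        lengths, adapter_like = compare_trimming(raw, trimmed, limit=1_000_000)
    print("reads shortened:", sum(lengths.values()))
    print("removed suffix exactly matches adapter start:", adapter_like)
    print("removed lengths:", sorted(lengths.items()))
```

Since only about 3% of reads were trimmed, the first 3-4 reads are likely to be unchanged, which is why a tally over many reads is more informative than looking at the top of the file.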

A naïve question: should all reads contain an adapter, or only some of them? (The report says that about 3% of the reads contain an adapter.) Also, although it is not mentioned in the article, could the authors have uploaded already-trimmed sequences to GEO?

Thank you for your time!

