Trimming Illumina universal adapters using cutadapt proving insufficient

TL;DR: My paired-end RNA-seq reads have high Illumina universal adapter content. Trimming with the universal adapter sequence and then with its reverse complement did not completely remove the adapter content, and was only effective for the R2 reads.

I am trying to trim adapter sequences from my paired-end RNA-seq data using cutadapt and I am not having a lot of success. When I ran my raw .fastq files through FastQC, it revealed that my sequences had upwards of 35% adapter content in the latter portions of the 150 bp reads.

untrimmed R1 adapter content

untrimmed R2 adapter content

Since the data showed significant universal adapter presence, I decided to use cutadapt to trim these sequences with the universal adapter sequence that Illumina provides. I used the following lines of code in bash:

for i in /blue/nicholson/jleehan/20200921_SMMasn_RNAseq/rawdata/*R1_001.fastq
do
  # Strip the R1 suffix to get the path prefix shared by both mates
  SAMPLE=$(echo "${i}" | sed "s/R1_001\.fastq$//")
#  echo ${SAMPLE}R1.fastq ${SAMPLE}R2.fastq
  # -a/-A: adapter for R1/R2; -q 30: quality-trim 3' ends; -m 60: discard reads shorter than 60 bp
  cutadapt -a AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT -A AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT -q 30 -m 60 -o "${SAMPLE}R1_trm.fastq" -p "${SAMPLE}R2_trm.fastq" "${SAMPLE}R1_001.fastq" "${SAMPLE}R2_001.fastq"
  echo "${SAMPLE} trimmed"
done

According to the cutadapt summary, this did something, but not much.

cutadapt trim1 summary

In the FastQC adapter content graphs, I could not see any visible difference:

trim1 R1

trim1 R2
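A quick way to cross-check the FastQC graphs is to count how many reads still contain a stretch of the adapter. The sketch below is only a rough proxy and not FastQC's own method: it counts exact matches anywhere in the read. The function name `adapter_fraction` and the choice of probe sequence are assumptions made for illustration.

```shell
# Report how many reads in a FASTQ file contain an exact copy of a probe sequence.
# Hypothetical helper: $1 = FASTQ file (4 lines per record), $2 = probe sequence.
adapter_fraction() {
  awk -v probe="$2" '
    NR % 4 == 2 {                       # second line of each record is the sequence
      total++
      if (index($0, probe) > 0) { hits++ }
    }
    END {
      if (total > 0) {
        printf "%d/%d reads (%.1f%%)\n", hits, total, 100 * hits / total
      } else {
        print "no reads found"
      }
    }
  ' "$1"
}

# Example (file name follows the loop above; the 13-base probe is an assumption):
# adapter_fraction "${SAMPLE}R1_trm.fastq" AGATCGGAAGAGC
```

Running it on the raw and the trimmed file for the same sample gives two numbers that can be compared directly, without waiting for a full FastQC report.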

Seeing how little this helped, I thought that trimming the reverse complement instead might work better. So I ran the same loop, modified to trim the reverse complement of the universal adapter sequence. Here is that code as well:

for i in /blue/nicholson/jleehan/20200921_SMMasn_RNAseq/rawdata/*R1_001.fastq
do
  # Strip the R1 suffix to get the path prefix shared by both mates
  SAMPLE=$(echo "${i}" | sed "s/R1_001\.fastq$//")
#  echo ${SAMPLE}R1.fastq ${SAMPLE}R2.fastq
  # Same options as before, but with the reverse complement of the universal adapter
  cutadapt -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -q 30 -m 60 -o "${SAMPLE}R1_trm2.fastq" -p "${SAMPLE}R2_trm2.fastq" "${SAMPLE}R1_001.fastq" "${SAMPLE}R2_001.fastq"
  echo "${SAMPLE} trimmed"
done
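The reverse complement used in this second run can be derived and double-checked in the shell rather than by hand. This is a minimal sketch, assuming the common `rev` and `tr` utilities are available; the helper name `revcomp` is hypothetical.

```shell
# Print the reverse complement of a DNA sequence given as $1.
# Hypothetical helper; characters other than A, C, G, T (e.g. N) are left unchanged.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Universal adapter sequence from the first cutadapt run
universal=AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
revcomp "$universal"
```

The printed sequence should be the one passed to `-a` and `-A` in the second loop, which confirms that the two runs differ only in the orientation of the adapter.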

This time, it appeared to do something noticeable, but the effect was significantly more pronounced in the R2 reads than the R1 reads.

trim2 summary

trim2 R1

trim2 R2

It was nice to see some progress, but to be honest, I have no idea how to proceed from here. I still have >30% adapter content in my R1 reads and >10% in my R2 reads, which from my past experience is definitely not what I would consider sufficiently trimmed. I believe I saw elsewhere in the forum that it may be more effective to trim smaller portions of the adapter sequence, but before I went and wasted the 14 hours it takes me to run this, I thought I would ask the forum, since y’all actually know what you’re doing.
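One way to try the shorter-adapter idea without committing to a 14-hour run is to trim a small subset of read pairs first. The sketch below is an illustration under stated assumptions, not a recommended pipeline: the helper names, the 100000-read subset size, and the use of the first 13 bases of the reverse-complemented adapter are all assumptions. It also assumes the R1 and R2 files list their reads in the same order, and the first reads of a file may not be representative of the whole run.

```shell
# Copy the first N reads of a FASTQ file (4 lines per record) to a new file.
# Hypothetical helper: $1 = input FASTQ, $2 = number of reads, $3 = output FASTQ.
subsample_fastq() {
  head -n "$(( $2 * 4 ))" "$1" > "$3"
}

# Trial-trim one subsampled pair with a short adapter prefix.
# Hypothetical helper: $1 = sample prefix as in the loops above, $2 = adapter prefix.
trial_trim() {
  subsample_fastq "${1}R1_001.fastq" 100000 "${1}R1_sub.fastq"
  subsample_fastq "${1}R2_001.fastq" 100000 "${1}R2_sub.fastq"
  cutadapt -a "$2" -A "$2" -q 30 -m 60 \
    -o "${1}R1_sub_trm.fastq" -p "${1}R2_sub_trm.fastq" \
    "${1}R1_sub.fastq" "${1}R2_sub.fastq"
}

# Example: first 13 bases of the reverse-complemented adapter (length is an assumption)
# trial_trim "${SAMPLE}" AGATCGGAAGAGC
```

Running FastQC on the `_sub_trm` files then shows within minutes whether the shorter sequence changes the adapter content graphs, before the full data set is reprocessed.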
