Illumina index adapter trimming of FASTQ files using Cutadapt/TrimGalore

Dear all,

I am new to the field. I am trying to analyze single-end 100 bp FASTQ files with ~70 million reads per sample. I want to determine whether adapter sequences are present and, if so, how to deal with them. I ran FastQC on the files, and the reports show that each file has an “overrepresented sequence” identified as an “Illumina index adapter”.

[Image: FastQC report for sample1]

I have the following questions:

  1. Does sample1 look like an already-trimmed file, or does it require adapter trimming?

  2. If further trimming is recommended, what would be the best sequence/adapter option to use with cutadapt/TrimGalore? [See below for my thoughts so far]

  3. Based on the FastQC report, do I need to worry about the presence of any other adapter sequences besides the index adapter?

My thoughts on question 2:
The format of the Illumina index adapter sequence appears to be:

GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG

These are the adapter sequences found in my FastQC report for sample 1:

GATCGGAAGAGCACACGTCTGAACTCCAGTCACCATGGCATCTCGTATGC 
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCATGGCATCTCGTATG

I am thinking of using the options below with cutadapt/TrimGalore to remove the adapter(s):

trim_galore sample1.fastq.gz -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG -q 20 --length 20 --fastqc

However, it seems that Trimmomatic, for instance, only uses the initial part of the index adapter (the sequence up to the Ns, and nothing after them):
github.com/timflutre/trimmomatic/blob/master/adapters/TruSeq3-SE.fa

Many thanks in advance for your time and replies.


Tags: cutadapt, TrimGalore, trimming, RNA-Seq, index adapter


0.5%? You really don’t have to worry about that if you don’t want to.

The N’s are a placeholder for the variable index region. You know what your index is; you can see it in the FastQC report. Why would you put N’s in?
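To make that concrete, here is a minimal Python sketch that reads the index off the two FastQC hits quoted in the question and substitutes it for the N run in the adapter template. The function names `extract_index` and `fill_index` are made up for illustration and are not part of any trimming tool.

```python
# Adapter template and FastQC hits copied from the question above.
TEMPLATE = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG"
FASTQC_HITS = [
    "GATCGGAAGAGCACACGTCTGAACTCCAGTCACCATGGCATCTCGTATGC",
    "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCATGGCATCTCGTATG",
]


def n_run(template):
    """Return (start, end) of the run of N placeholders in the template."""
    return template.index("N"), template.rindex("N") + 1


def extract_index(template, observed):
    """Return the bases in `observed` that line up with the N run, or None.

    Hypothetical helper: exact matching only, no mismatches tolerated.
    """
    start, end = n_run(template)
    prefix, suffix = template[:start], template[end:]
    pos = observed.find(prefix)
    if pos < 0:
        return None
    idx_start = pos + len(prefix)
    index = observed[idx_start:idx_start + (end - start)]
    if len(index) < end - start:
        return None  # observed sequence stops inside the index
    # Whatever follows the index must agree with the start of the template suffix.
    if not suffix.startswith(observed[idx_start + len(index):]):
        return None
    return index


def fill_index(template, index):
    """Replace the N run in the template with a concrete index sequence."""
    start, end = n_run(template)
    if len(index) != end - start:
        raise ValueError("index length does not match the N run")
    return template[:start] + index + template[end:]


indexes = {extract_index(TEMPLATE, hit) for hit in FASTQC_HITS}
print(indexes)

adapter = fill_index(TEMPLATE, "CATGGC")
print(adapter)
```

Both hits give the same index, so the filled-in sequence is a concrete adapter you could pass to cutadapt's `-a` option instead of the N version.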

The sequence to trim if you have Universal Adapter contamination is AGATCGGAAGAGC.
In your case (0.5%) I would not even bother, and would align the files directly without any manipulation.
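A short prefix like this is enough because a 3' adapter trimmer removes the adapter start and everything after it, so the index and the rest of the adapter never need to be matched. Below is a rough Python sketch of that idea, together with a way to estimate the adapter fraction yourself. It uses exact matching only, which is an assumption made for brevity: real trimmers such as cutadapt tolerate mismatches. The function names and the `min_overlap` default are illustrative.

```python
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_start(read, adapter=ADAPTER_PREFIX, min_overlap=3):
    """Return the position where the adapter begins in the read, or None.

    Looks for a full exact match first, then for a partial adapter
    hanging off the 3' end of the read (at least `min_overlap` bases).
    """
    pos = read.find(adapter)
    if pos >= 0:
        return pos
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return len(read) - overlap
    return None


def trim_adapter(read, adapter=ADAPTER_PREFIX, min_overlap=3):
    """Cut the read at the adapter start; leave it unchanged if none is found."""
    start = adapter_start(read, adapter, min_overlap)
    return read if start is None else read[:start]


def adapter_fraction(reads, adapter=ADAPTER_PREFIX):
    """Fraction of reads containing the full adapter prefix."""
    total = hits = 0
    for read in reads:
        total += 1
        hits += adapter in read
    return hits / total if total else 0.0


# Toy reads, invented for illustration only.
reads = [
    "ACGTACGTAGATCGGAAGAGCACACGTCTGAACTCC",  # insert followed by adapter and more
    "ACGTACGTAGATC",                         # adapter cut off by the read end
    "ACGTACGTACGT",                          # no adapter
    "TTTTGGGGCCCC",                          # no adapter
]
print([trim_adapter(r) for r in reads])
print(adapter_fraction(reads))
```

Run over a real FASTQ file (every fourth line starting from the second is a sequence), `adapter_fraction` gives a number comparable to the 0.5% discussed here.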

In your case, you do not need to worry about 0.5% adapter content. However, if you want optimal results, that 0.5% can still be removed: trimming algorithms have improved to the point where they remove adapters while leaving non-adapter sequence intact. Also, trimming is not only about adapters. Removing Ns and low-quality tails is also important.
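For a feel of what low-quality tail trimming does, here is a simplified Python sketch of the running-sum (BWA-style) approach that cutadapt documents for its `-q` option, followed by removal of trailing Ns. It is not the implementation of any particular tool; the function names, the default cutoff of 20 and the Phred+33 offset are assumptions for the example.

```python
def quality_trim_index(qualities, cutoff=20):
    """Return the length to keep after 3' quality trimming.

    Subtract the cutoff from each score, sum from the 3' end, and cut
    where the running sum is lowest. Stop once the sum becomes positive.
    """
    running = 0
    lowest = 0
    cut = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running > 0:
            break
        if running < lowest:
            lowest = running
            cut = i
    return cut


def trim_read(seq, qual, cutoff=20, phred_offset=33):
    """Quality-trim the 3' end, then drop trailing Ns.

    Sequence and quality string are always cut to the same length.
    """
    scores = [ord(c) - phred_offset for c in qual]
    cut = quality_trim_index(scores, cutoff)
    seq, qual = seq[:cut], qual[:cut]
    kept = len(seq.rstrip("N"))
    return seq[:kept], qual[:kept]


# Scores 42,40,26,27 are good; the tail 8,7,11,4,2,3 is poor.
print(quality_trim_index([42, 40, 26, 27, 8, 7, 11, 4, 2, 3], cutoff=10))

# 'I' is Phred 40 and '#' is Phred 2 under the assumed Phred+33 encoding.
print(trim_read("ACGTNN", "IIII##"))
print(trim_read("ACGNN", "IIIII"))
```

Note that a single decent base inside a bad tail (the 11 in the first example) does not rescue the tail, which is the point of summing rather than cutting at the first low score.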

I’d recommend Atria to detect and trim the adapter sequences. It is a recently published, cutting-edge trimmer with exceptional precision and speed. If you do not know which adapter sequence to use, Atria can detect adapters when the adapter content is higher than 0.04%. (Below 0.04%, there is no need for adapter trimming.)

E.g., to find adapters:

atria --detect-adapter -r reads.fastq [...]

Do N trimming and low-quality trimming:

atria --no-adapter-trim -r read1.fastq [-R read2.fastq]

