Illumina index adapter trimming of FASTQ files using Cutadapt/TrimGalore
Dear all,
I am new to the field. I am trying to analyze single-end 100 bp FASTQ files with ~70 million reads per sample. I want to determine whether adapter sequences are present and, if so, how to deal with them. I ran FastQC on the files, and the reports show that each has an “overrepresented sequence” flagged as an “Illumina index adapter”.
I have the following questions:
1. Does sample1 look like an already-trimmed file, or does it require adapter trimming?
2. If further trimming is recommended, what would be the best sequence/adapter option to use with cutadapt/Trim Galore? [See below for my thoughts so far]
3. Based on the FastQC report, do I need to worry about the presence of any other adapter sequences besides the index adapter?
My thoughts on question 2:
The Illumina index adapter format appears to be (the Ns mark the 6-base index):
GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
These are the adapter sequences found in my FastQC report for sample 1:
GATCGGAAGAGCACACGTCTGAACTCCAGTCACCATGGCATCTCGTATGC
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCATGGCATCTCGTATG
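As a quick sanity check, the two FastQC sequences can be compared against the adapter format above, treating each N as "any base". The following is only a minimal sketch: the function name `find_index` is made up for illustration, and it assumes exact matching outside the N positions.

```python
# Adapter format from the post; N stands for any base of the 6-nt index.
TEMPLATE = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG"

# Overrepresented sequences reported by FastQC for sample 1 (from the post).
OBSERVED = [
    "GATCGGAAGAGCACACGTCTGAACTCCAGTCACCATGGCATCTCGTATGC",
    "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCATGGCATCTCGTATG",
]

def find_index(observed, template=TEMPLATE):
    """Return (offset, index) if `observed`, after skipping `offset` leading
    bases, matches the start of the template; otherwise return None.
    (Hypothetical helper, exact matching only.)"""
    n_start = template.index("N")   # position of the first index base
    n_len = template.count("N")     # index length
    for offset in range(len(observed)):
        candidate = observed[offset:]
        if len(candidate) < n_start + n_len:
            break  # too short to cover the whole index
        # zip stops at the shorter string, so only the overlap is compared.
        if all(t == "N" or t == c for t, c in zip(template, candidate)):
            return offset, candidate[n_start:n_start + n_len]
    return None

results = [find_index(seq) for seq in OBSERVED]
for seq, result in zip(OBSERVED, results):
    print(seq, result)
```

Both sequences fit the format with the index CATGGC; the second one is the same adapter shifted by one leading A.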
I am thinking of using the options below with cutadapt/Trim Galore to remove the adapter(s):
trim_galore sample1.fastq.gz -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG -q 20 --length 20 --fastqc
However, it seems that Trimmomatic, for instance, only handles the initial part of the index adapter (up to the Ns, and nothing after them):
github.com/timflutre/trimmomatic/blob/master/adapters/TruSeq3-SE.fa
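One way to see why the part before the Ns can be sufficient: with a 3' adapter, everything from the start of the adapter to the end of the read is discarded, so the index and whatever follows it go away too. Here is a rough sketch of that idea. It is not how cutadapt, Trim Galore or Trimmomatic are implemented (real trimmers tolerate mismatches and handle base qualities); `trim_3prime` and `min_overlap` are hypothetical names, and matching is exact.

```python
# Part of the adapter before the index Ns (taken from the post).
ADAPTER_PREFIX = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

def trim_3prime(read, adapter=ADAPTER_PREFIX, min_overlap=3):
    """Remove a 3' adapter using exact matching only (illustrative sketch)."""
    # Case 1: the full adapter prefix occurs inside the read.
    # Cut there; the index and any bases after it are dropped as well.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with only the first k bases of the adapter.
    # Try the longest possible overlap first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    # Case 3: no adapter found; return the read unchanged.
    return read

# Usage: a made-up 10-base insert followed by adapter, index and trailing bases.
insert = "ACGTACGTAC"
print(trim_3prime(insert + ADAPTER_PREFIX + "CATGGC" + "ATCTCG"))
print(trim_3prime(insert + "GATCG"))  # partial adapter at the 3' end
```

In both usage lines only the insert should survive, without the index ever being specified.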
Many thanks in advance for your time and reply.
In your case, you do not need to worry about 0.5% adapter content. However, if you want optimal results, that 0.5% can be removed while leaving non-adapter sequences intact, since trimming algorithms have improved. Also, trimming is not only about adapters: removing Ns and low-quality tails is important too.
I’d recommend Atria to determine and trim the adapter sequences. It is a newly published, cutting-edge trimmer with exceptional precision and speed. If you do not know which adapter sequence to use, Atria can detect adapters when the adapter content is higher than 0.04%. (Below 0.04%, there is no need for adapter trimming.)
E.g., finding adapters:
atria --detect-adapter -r reads.fastq [...]
Do N trimming and low-quality trimming:
atria --no-adapter-trim -r read1.fastq [-R read2.fastq]
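For a rough, tool-independent feel of the adapter content figures mentioned above, one could count the reads that contain the adapter prefix. This is just a sketch under a few assumptions: standard 4-line FASTQ records, exact matching (so it will underestimate), and hypothetical function names. It is not how Atria or FastQC estimate adapter content.

```python
import gzip

def adapter_fraction(fastq_lines, adapter):
    """Fraction of reads whose sequence line contains `adapter` exactly.
    Assumes 4-line FASTQ records (no wrapped sequence lines)."""
    total = hits = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each record is the sequence
            total += 1
            if adapter in line:
                hits += 1
    return hits / total if total else 0.0

def adapter_fraction_in_file(path, adapter):
    """Same as above, reading a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return adapter_fraction(handle, adapter)

# Example call (file name taken from the question):
# frac = adapter_fraction_in_file("sample1.fastq.gz",
#                                 "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC")
# print(f"{frac:.4%} of reads contain the adapter prefix")
```

The result can then be compared with the 0.04% (0.0004) threshold quoted above.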