Illumina index adapter trimming of FASTQ files using Cutadapt/TrimGalore
I am new to the field. I am trying to analyze single-end 100 bp FASTQ files with ~70 million reads per sample. I want to determine whether adapter sequences are present and, if so, how to deal with them. I ran FastQC on the files, and the reports show that each has an “overrepresented sequence” flagged as an “Illumina index adapter”.
I have the following questions:
Does sample1 look like a trimmed file, or does it require adapter trimming?
If further trimming is recommended, what would be the best sequence/adapter option to use with Cutadapt/TrimGalore? [See below for my thoughts so far]
Based on the FastQC report, do I need to worry about the presence of any other adapter sequences besides the index?
My thoughts on question 2:
The sequences for the Illumina index adapter format appear to be:
0.5%? You really don’t have to worry about that if you don’t want to.
The N’s are for the variable index region. You know what the index is; you can see it in the FastQC report. Why would you put N’s in?
If you have Illumina Universal Adapter contamination, the sequence to trim is AGATCGGAAGAGC.
In your case (0.5%), I would not even bother and would align the files directly without any manipulation.
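To make the "is it worth trimming?" decision concrete, here is a minimal Python sketch that estimates what fraction of reads contain the AGATCGGAAGAGC sequence quoted above, and that cuts a read at the first exact occurrence. The function names and the `max_reads` sampling limit are my own choices for illustration, not part of any tool. It only finds exact, full-length matches, so it will undercount partial adapters at the very end of a read and adapters with sequencing errors; Cutadapt's error-tolerant matching handles those cases.

```python
import gzip
from typing import Iterator, Tuple

ADAPTER = "AGATCGGAAGAGC"  # sequence quoted in the answer above


def read_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().rstrip("\n")
            yield header, seq, qual


def trim_exact_adapter(seq: str, qual: str,
                       adapter: str = ADAPTER) -> Tuple[str, str]:
    """Cut the read at the first exact adapter occurrence (3' trimming)."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual  # no adapter found, read is unchanged
    return seq[:pos], qual[:pos]


def adapter_fraction(path: str, adapter: str = ADAPTER,
                     max_reads: int = 1_000_000) -> float:
    """Fraction of the first max_reads reads containing the exact adapter."""
    total = hits = 0
    for _, seq, _ in read_fastq(path):
        total += 1
        hits += adapter in seq
        if total >= max_reads:
            break
    return hits / total if total else 0.0
```

Calling `adapter_fraction("sample1.fastq.gz")` (a hypothetical file name) gives a number you can compare against the percentage FastQC reports. For real trimming, the usual route is still Cutadapt itself, e.g. `cutadapt -a AGATCGGAAGAGC -o trimmed.fastq.gz sample1.fastq.gz`.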
In your case, you do not need to worry about 0.5% adapter content. However, if you want optimal results, that 0.5% can be removed
while leaving non-adapter sequence intact, as trimming algorithms improve. Also, trimming is not only about adapters: removing Ns and low-quality tails is also important.
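As an illustration of what "removing low-quality tails" means in practice, here is a small Python sketch of a running-sum 3' quality trimmer, similar in spirit to the BWA-style approach described in the Cutadapt documentation. It is not the code of any particular trimmer. The function names, the default cutoff of 20, and the Phred+33 encoding are assumptions made for this example.

```python
from typing import List, Tuple


def quality_trim_index(quals: List[int], cutoff: int) -> int:
    """Return the position at which to cut the 3' end of a read.

    Walk from the 3' end, summing (quality - cutoff). The read is cut
    where that running sum reaches its minimum; the walk stops as soon
    as the sum becomes positive.
    """
    running = 0
    lowest = 0
    stop = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += quals[i] - cutoff
        if running > 0:
            break
        if running < lowest:
            lowest = running
            stop = i
    return stop


def quality_trim(seq: str, qual: str, cutoff: int = 20,
                 offset: int = 33) -> Tuple[str, str]:
    """Trim a low-quality 3' tail, assuming Phred+33 quality encoding."""
    scores = [ord(c) - offset for c in qual]
    stop = quality_trim_index(scores, cutoff)
    return seq[:stop], qual[:stop]


def strip_trailing_n(seq: str, qual: str) -> Tuple[str, str]:
    """Drop N bases from the 3' end, keeping qualities in step."""
    kept = len(seq.rstrip("N"))
    return seq[:kept], qual[:kept]
```

For example, with qualities 42, 40, 26, 27, 8, 7, 11, 4, 2, 3 and a cutoff of 10, `quality_trim_index` returns 4, so only the first four bases are kept. The single base with quality 11 does not rescue the tail, because the bases around it pull the running sum back down.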
I’d recommend Atria for detecting and trimming the adapter sequences. It is a newly published, cutting-edge trimmer with exceptional precision and speed. If you do not know which adapter sequence to use, Atria can detect adapters when the adapter content is higher than 0.04%. (Below 0.04%, there is no need for adapter trimming.)