Why does Cutadapt output much larger files than I am inputting?
I am using usegalaxy.org to work with paired-end RNA-seq data. I am using Cutadapt to trim adapter sequences, and the Cutadapt output files are larger than the files I am inputting. For example, for my first sample, SRR6467550, the forward-read input (fastqsanger.gz) is 2.1 GB. After running Cutadapt, the output (fastqsanger.gz) is 8.1 GB. This fills my disk quota much faster and makes it difficult to work with the amount of data I have (226 samples; I will have to work in batches as it is). Is this avoidable in any way? Is there a way to obtain a smaller output?
My full input for reference:
Paired-end collection: My Data
Read 1 (3′): AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
Read 2 (3′): AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
Minimum length (R1): 20
Quality cutoff: 20
Outputs Selector: Report: Cutadapt’s per-adapter statistics. You can use this file with MultiQC.
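For anyone comparing against a local run, the settings above correspond roughly to the Cutadapt command line assembled below. This is only a sketch: the file names are hypothetical, the Galaxy wrapper may pass further options, and a plain `-m 20` / `-q 20` may be applied to both reads rather than R1 only, depending on the Cutadapt version. On the command line, Cutadapt picks the output compression from the output file extension, so the `.gz` suffix on the hypothetical output names is what requests gzip-compressed output here.

```python
import shlex

# Adapter sequences copied from the tool settings listed above.
ADAPTER_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
ADAPTER_R2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"


def build_cutadapt_command(r1_in, r2_in, r1_out, r2_out):
    """Return the argument list for a paired-end Cutadapt run.

    All four paths are hypothetical placeholders supplied by the caller.
    """
    return [
        "cutadapt",
        "-a", ADAPTER_R1,   # 3' adapter for read 1
        "-A", ADAPTER_R2,   # 3' adapter for read 2
        "-m", "20",         # minimum length after trimming
        "-q", "20",         # quality cutoff (3' end)
        "-o", r1_out,       # trimmed read 1; ".gz" suffix requests gzip output
        "-p", r2_out,       # trimmed read 2
        r1_in,
        r2_in,
    ]


# Hypothetical file names, for illustration only.
cmd = build_cutadapt_command(
    "SRR6467550_1.fastq.gz",
    "SRR6467550_2.fastq.gz",
    "SRR6467550_1.trimmed.fastq.gz",
    "SRR6467550_2.trimmed.fastq.gz",
)
print(shlex.join(cmd))
```

The script only prints the command; it does not run Cutadapt.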