Hey all,
So I currently dealing with deep-sequencing data of 16S amplicons from multiple variable regions on the 16S gene.
I used different sets of primers to amplify different regions and then sequenced them all together,
I am trying to de-multiplex based on those primers using cutadapt and the following code:
cutadapt
--pair-adapters
--no-indels
-g file:primers_16S_V1-9_qiagen_for.fasta
-G file:primers_16S_V1-9_qiagen_rev.fasta
-o {name1}_{name2}.1.fastq.gz -p {name1}_{name2}.2.fastq.gz
1_S1_L001_R1_001.fastq.gz 1_S1_L001_R2_001.fastq.gz
where my primer files look like that:
primers_16S_V1-9_qiagen_for.fasta:
>V1V2_f
AGRGTTTGATYMTGGCTC
>V2V3_f
GGCGNACGGGTGAGTAA
>V3V4_f
CCTACGGGNGGCWGCAG
>V4V5_f
GTGYCAGCMGCCGCGGTAA
>V5V7_f
GGATTAGATACCCBRGTAGTC
>V7V9_f
YAACGAGCGMRACCC
primers_16S_V1-9_qiagen_rev.fasta:
>V1V2_r
CTGCTGCCTYCCGTA
>V2V3_r
WTTACCGCGGCTGCTGG
>V3V4_r
GACTACHVGGGTATCTAATCC
>V4V5_r
CCGYCAATTYMTTTRAGTTT
>V5V7_r
ACGTCRTCCCCDCCTTCCTC
>V7V9_r
GCTGCGTTCTTCATCGATGC
My primers are not necessarily anchored so they can appear at any place in the reads.
When I demultiplex I almost don’t get any matches to the V3V4f and V3V4r combination which is weird since I know it is a major part of my reads.
So I tried running the following code on the unknown reads only with the V3V4f and V3V4r combination:
cutadapt
--pair-adapters
--no-indels
--action=none
-g CCTACGGGNGGCWGCAG
-G GACTACHVGGGTATCTAATCC
-o {name1}_{name2}.1.fastq.gz -p {name1}_{name2}.2.fastq.gz
unknown-unknown.1.fastq.gz unknown-unknown.2.fastq.gz
and lo and behold I get tens of thousands of reads matching the primer combination.
Does anybody have any idea what is happening here? I am sure I am missing something.
Why are those matches not found in my first run of cutadapt ?
Sorry for the long post and thank u for any suggestions
Cheers
Read more here: Source link