Cutadapt de-multiplexing does not recognize some barcodes

Hey all,

So I currently dealing with deep-sequencing data of 16S amplicons from multiple variable regions on the 16S gene.
I used different sets of primers to amplify different regions and then sequenced them all together,
I am trying to de-multiplex based on those primers using cutadapt and the following code:

cutadapt 
    --pair-adapters 
    --no-indels 
    -g file:primers_16S_V1-9_qiagen_for.fasta 
    -G file:primers_16S_V1-9_qiagen_rev.fasta 
    -o {name1}_{name2}.1.fastq.gz -p {name1}_{name2}.2.fastq.gz 
    1_S1_L001_R1_001.fastq.gz 1_S1_L001_R2_001.fastq.gz

where my primer files look like that:

primers_16S_V1-9_qiagen_for.fasta:

>V1V2_f
AGRGTTTGATYMTGGCTC
>V2V3_f
GGCGNACGGGTGAGTAA
>V3V4_f
CCTACGGGNGGCWGCAG
>V4V5_f
GTGYCAGCMGCCGCGGTAA
>V5V7_f
GGATTAGATACCCBRGTAGTC
>V7V9_f
YAACGAGCGMRACCC

primers_16S_V1-9_qiagen_rev.fasta:

>V1V2_r
CTGCTGCCTYCCGTA
>V2V3_r
WTTACCGCGGCTGCTGG
>V3V4_r
GACTACHVGGGTATCTAATCC
>V4V5_r
CCGYCAATTYMTTTRAGTTT
>V5V7_r
ACGTCRTCCCCDCCTTCCTC
>V7V9_r
GCTGCGTTCTTCATCGATGC

My primers are not necessarily anchored so they can appear at any place in the reads.
When I demultiplex I almost don’t get any matches to the V3V4f and V3V4r combination which is weird since I know it is a major part of my reads.

So I tried running the following code on the unknown reads only with the V3V4f and V3V4r combination:

cutadapt 
    --pair-adapters 
    --no-indels 
    --action=none 
    -g CCTACGGGNGGCWGCAG 
    -G GACTACHVGGGTATCTAATCC 
    -o {name1}_{name2}.1.fastq.gz -p {name1}_{name2}.2.fastq.gz 
    unknown-unknown.1.fastq.gz unknown-unknown.2.fastq.gz

and lo and behold I get tens of thousands of reads matching the primer combination.

Does anybody have any idea what is happening here? I am sure I am missing something.
Why are those matches not found in my first run of cutadapt ?

Sorry for the long post and thank u for any suggestions

Cheers

Read more here: Source link