Split fastq according to barcodes

Hello, everyone:

I’m currently analyzing my scRNA-seq data. The first step is to split the FASTQ files according to my barcode file, which looks like this:

sc1 AACGTGAT
sc2 AAACATCG
sc3 ATGCCTAA
sc4 AGTGGTCA
sc5 ACCACTGT
sc6 ACATTGGC
sc7 CAGATCTG
sc8 CATCAAGT
sc9 CGCTGATC
sc10 ACAAGCTA
sc11 CTGTAGCC
sc12 AACGCTTA

My data are paired-end, and R1 and R2 look like this (I truncated the reads):
R2:

@ST-E00493:75:H33JKALXX:1:1101:10987:2206 2:N:0:ATACACAT    
AACGCTTAAGGGTAATTTTTTGTGTTATGTATTTTTTTTTTAGGGGAAAAGGCATTTTTGGT...
+
AAFFFFJJ<A7JF<JF----AA--A--7----AAFJ-F<-FF-<<F-<-AFFA-7A7A-A-<...

R1:

@ST-E00493:75:H33JKALXX:1:1101:10987:2206 1:N:0:ATACACAT
GTTGTGAAGGGGAGGCTGGAGAGGCTTCGTCTGCTAAGAGCATTGGCCGTTCTTCCACTGTT...
+
AAAFFFJ-<JJJJJJJJFJJJF7JFFJJJJJJJJJJJJJJFFJJJJJJJJJJJJJJJJFFJJJ...

The barcode is in the first 8 bp of R2 (here, AACGCTTA), so I want to split the FASTQ files by barcode and pair each read 1 with its read 2 using the header. I have tried several programs and scripts, but I can’t find a suitable solution:
fastq-multx

fastq-multx -B barcode_sequence -b -m 0 R2.fastq.gz R1.fastq.gz -o %_R1.fq -o %_R2.fq
The result is definitely not what I want: only 7 lines start with their barcode.

fastx_barcode_splitter.pl
It doesn’t seem to support PE reads.

BBmap

I also wrote a Python script, but it runs very slowly. Does anybody have a good suggestion? Thanks in advance!
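
To make the goal concrete, here is a minimal Python sketch of the split I am after. It is not the slow script mentioned above, only an illustration under some assumptions: R1 and R2 list the reads in the same order, the barcode must match the first 8 bases of R2 exactly (no mismatches), the barcode is left in the read rather than trimmed, and pairs without a known barcode go to an "unmatched" file. The function names (load_barcodes, read_fastq, split_pairs) and the output naming are made up for this example.

```python
import gzip
from itertools import islice


def load_barcodes(path):
    """Read 'name<whitespace>sequence' lines into a {sequence: name} dict."""
    barcodes = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                barcodes[fields[1]] = fields[0]
    return barcodes


def read_fastq(handle):
    """Yield one FASTQ record at a time as a list of four lines."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        yield record


def split_pairs(r1_path, r2_path, barcode_path, out_prefix="", bc_len=8):
    """Write each read pair to <out_prefix><sample>_R1.fq and _R2.fq.

    The sample is chosen by an exact match on the first bc_len bases of R2.
    Pairs with no match go to 'unmatched'. Returns pair counts per sample.
    """
    barcodes = load_barcodes(barcode_path)
    outputs = {}
    counts = {}
    try:
        with gzip.open(r1_path, "rt") as r1, gzip.open(r2_path, "rt") as r2:
            for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
                # Assumption: both files are in the same order, so the read
                # IDs (header text before the first space) must agree.
                if rec1[0].split()[0] != rec2[0].split()[0]:
                    raise ValueError("R1/R2 out of sync at " + rec1[0].strip())
                sample = barcodes.get(rec2[1][:bc_len], "unmatched")
                if sample not in outputs:
                    outputs[sample] = (
                        open(f"{out_prefix}{sample}_R1.fq", "w"),
                        open(f"{out_prefix}{sample}_R2.fq", "w"),
                    )
                outputs[sample][0].writelines(rec1)
                outputs[sample][1].writelines(rec2)
                counts[sample] = counts.get(sample, 0) + 1
    finally:
        # Close every output file, even if an error interrupted the loop.
        for out1, out2 in outputs.values():
            out1.close()
            out2.close()
    return counts
```

With my files this would be called as split_pairs("R1.fastq.gz", "R2.fastq.gz", "barcode_sequence"), which should produce sc1_R1.fq, sc1_R2.fq, and so on, one pair of files per barcode.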
