Isolate adapter-contaminated reads from a FASTQ file using Python

Hi everyone,
I want to extract adapter-contaminated reads from a FASTQ file using Python, but I haven't been able to get it to work.

The adapter sequence is: "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA"
The file contains this data:
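The reads below are 36 nt long, the same length as this adapter, so a read would only contain the complete adapter if it were entirely adapter. Adapter read-through usually appears instead as the start of the adapter at the 3' end of a read. A minimal sketch of that check follows; it assumes exact matching (no sequencing errors in the adapter part) and a minimum overlap of 8 bases, which is an arbitrary illustrative threshold. The function name `adapter_overlap` is hypothetical.

```python
ADAPTER = "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA"


def adapter_overlap(seq: str, adapter: str = ADAPTER, min_overlap: int = 8) -> int:
    """Return how many adapter bases a read carries at its 3' end.

    Returns len(adapter) if the full adapter occurs anywhere in `seq`,
    otherwise the length of the longest suffix of `seq` that equals a
    prefix of `adapter`, or 0 if that overlap is shorter than `min_overlap`.
    Exact matching only: a sequencing error inside the adapter is not tolerated.
    """
    if adapter in seq:
        return len(adapter)
    min_overlap = max(min_overlap, 1)  # an overlap of 0 bases is meaningless
    longest = min(len(seq), len(adapter))
    # Try the longest overlap first, so the first hit is the best one
    for k in range(longest, min_overlap - 1, -1):
        if seq[-k:] == adapter[:k]:
            return k
    return 0


# A read that runs 12 bases into the adapter:
print(adapter_overlap("ACGTACGTAC" + ADAPTER[:12]))  # 12
```

Short overlaps (a few bases) match adapter prefixes by chance fairly often, which is why a minimum overlap is used; the right value depends on the data.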

@HWUSI-EAS570R_0003:2:50:5038:17424#0/1
CAGCTTCTGTTGATGCTGATTTAATTCCTGCAACTA
+HWUSI-EAS570R_0003:2:50:5038:17424#0/1
hhhhhhhhhhhgghhhhhahhhhhhhhhhhhgfhh[
@HWUSI-EAS570R_0003:2:50:5175:17417#0/1
CACCTTGCTTTATGGGAAAGCGTAACATAACTACAG
+HWUSI-EAS570R_0003:2:50:5175:17417#0/1
hhhhhhhhhhhfhhhhfaehhhhgahehhcghhfch
@HWUSI-EAS570R_0003:2:50:5442:17417#0/1
AGTTCGCCGACGTTTACGCCGCCTCGGTCCTCGGCA
+HWUSI-EAS570R_0003:2:50:5442:17417#0/1
ghhhhhhhhhhhhhhfhhhhhhhfhhgfhhgfgffc
@HWUSI-EAS570R_0003:2:50:5552:17421#0/1
AAGACATCAAACTACGAAACTACTACAAGAAAACAT
+HWUSI-EAS570R_0003:2:50:5552:17421#0/1
hghghhhhhhhhhghhhhhhghhhhhehhhhheg`h
@HWUSI-EAS570R_0003:2:50:5658:17415#0/1
GTTCAAGTGATTCTCCTGCCTCAGCCTCCTGAGTAG
+HWUSI-EAS570R_0003:2:50:5658:17415#0/1
hhhhhfhghdhhhhhhhhhhhgghhfheffhdfcbf
@HWUSI-EAS570R_0003:2:50:5712:17421#0/1
TTTCTTTTACCCCTAATCCTATCAGCTTTTTCTCCC
+HWUSI-EAS570R_0003:2:50:5712:17421#0/1
hhhghhhhhhhhhhhhhhghhhghhhhhghhhghhh
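Each read above is a four-line record: an `@` header, the sequence, a `+` line, and a quality string. Scanning line by line mixes these up: in Phred+64 quality strings like these, characters such as `A`, `C`, `G`, `T` and even `@` are valid quality symbols, so a quality line can look like a header or contain sequence letters. A minimal record parser, assuming the common unwrapped four-line layout, could look like this (`FastqRecord` and `read_fastq` are hypothetical names):

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    header: str  # header line without the leading '@'
    seq: str     # sequence line
    plus: str    # '+' line (may repeat the header)
    qual: str    # quality line, same length as seq


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from FASTQ lines (e.g. an open file).

    Assumes four lines per record with no line-wrapping of sequence or quality.
    """
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:  # tolerate blank lines between or after records
            continue
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated record: {header!r}")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield FastqRecord(header[1:], seq, plus, qual)
```

Because records are taken by position, a quality line that starts with `@` is still read as quality, not as the next header. With a file, usage would be `for rec in read_fastq(open("last_mock.fastq")): ...`, testing only `rec.seq`.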

This is the code I tried:

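import re

# Open the FASTQ file and test every line (headers, sequences, '+' lines, qualities)
with open('last_mock.fastq', 'r') as rf:
    for line in rf:
        # re.match only tries to match at the very start of the line
        x = re.match(r"(GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA)", line)
        if x:
            print(x)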
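The attempt above has two problems: `re.match` only matches at position 0 of each line, and every line is tested, including headers and quality strings. A minimal sketch of a direct fix, assuming plain four-line records with no blank lines, groups the lines into records, runs `re.search` on the sequence line only, and writes out the whole record. The function names and the command-line handling are illustrative, not from the original post.

```python
import re
import sys
from typing import Iterable, Iterator, TextIO, Tuple

ADAPTER = "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA"
ADAPTER_RE = re.compile(ADAPTER)  # a plain `ADAPTER in seq` would work equally well


def contaminated_records(lines: Iterable[str]) -> Iterator[Tuple[str, ...]]:
    """Yield 4-line FASTQ records whose sequence line contains the adapter."""
    it = iter(lines)
    # zip over the same iterator four times groups consecutive lines into
    # records; an incomplete trailing record is silently dropped
    for record in zip(it, it, it, it):
        seq = record[1].strip()
        # re.search scans the whole string; re.match would only look at position 0
        if ADAPTER_RE.search(seq):
            yield record


def main(path: str, out: TextIO = sys.stdout) -> None:
    with open(path) as rf:
        for record in contaminated_records(rf):
            out.writelines(record)  # lines still carry their newlines


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "last_mock.fastq")
```

Saved as a script (the name is up to you), it can be run as `python find_adapters.py last_mock.fastq > contaminated.fastq`. Note that with 36-nt reads and a 36-nt adapter, this full-length search only flags reads that consist entirely of adapter; a partial adapter at the 3' end of a read is not caught by an exact full-string search.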


Tags: regex, bioinformatics, python, genomics
