Isolate adapter-contaminated reads from a FASTQ file using Python
Hi everyone,
I want to extract adapter-contaminated reads from a FASTQ file using Python, but I have not been able to get it working.
The adapter sequence is: "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA"
The file contains this data:
@HWUSI-EAS570R_0003:2:50:5038:17424#0/1
CAGCTTCTGTTGATGCTGATTTAATTCCTGCAACTA
+HWUSI-EAS570R_0003:2:50:5038:17424#0/1
hhhhhhhhhhhgghhhhhahhhhhhhhhhhhgfhh[
@HWUSI-EAS570R_0003:2:50:5175:17417#0/1
CACCTTGCTTTATGGGAAAGCGTAACATAACTACAG
+HWUSI-EAS570R_0003:2:50:5175:17417#0/1
hhhhhhhhhhhfhhhhfaehhhhgahehhcghhfch
@HWUSI-EAS570R_0003:2:50:5442:17417#0/1
AGTTCGCCGACGTTTACGCCGCCTCGGTCCTCGGCA
+HWUSI-EAS570R_0003:2:50:5442:17417#0/1
ghhhhhhhhhhhhhhfhhhhhhhfhhgfhhgfgffc
@HWUSI-EAS570R_0003:2:50:5552:17421#0/1
AAGACATCAAACTACGAAACTACTACAAGAAAACAT
+HWUSI-EAS570R_0003:2:50:5552:17421#0/1
hghghhhhhhhhhghhhhhhghhhhhehhhhheg`h
@HWUSI-EAS570R_0003:2:50:5658:17415#0/1
GTTCAAGTGATTCTCCTGCCTCAGCCTCCTGAGTAG
+HWUSI-EAS570R_0003:2:50:5658:17415#0/1
hhhhhfhghdhhhhhhhhhhhgghhfheffhdfcbf
@HWUSI-EAS570R_0003:2:50:5712:17421#0/1
TTTCTTTTACCCCTAATCCTATCAGCTTTTTCTCCC
+HWUSI-EAS570R_0003:2:50:5712:17421#0/1
hhhghhhhhhhhhhhhhhghhhghhhhhghhhghhh
This is the code I tried:
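import re

with open('last_mock.fastq', 'r') as rf:
    for line in rf:
        # re.search scans the whole line; re.match only tests the start of it
        x = re.search(r"(GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA)", line)
        if x:
            # print the matching line itself rather than the match object
            print(line.rstrip())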
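
A minimal sketch of one possible approach, assuming the file is a plain four-lines-per-record FASTQ and that only exact matches count. It reads records in blocks of four lines, because a quality line can itself start with "@". It then checks each sequence for the full adapter anywhere in the read, or for an adapter prefix at the 3' end of the read. The second check matters here because the adapter (36 bases) is as long as the reads, so most contaminated reads would only carry the beginning of it. The function names and the `min_overlap=8` threshold are illustrative assumptions, not part of any library.

```python
from itertools import islice

ADAPTER = "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA"


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples, four lines per record."""
    while True:
        record = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return  # end of file (or a truncated final record)
        yield tuple(record)


def adapter_start(seq, adapter=ADAPTER, min_overlap=8):
    """Return the index in seq where the adapter begins, or -1 if none is found.

    min_overlap is an assumed threshold: shorter 3' overlaps are ignored
    because they occur too often by chance.
    """
    # Case 1: the whole adapter is present somewhere in the read.
    pos = seq.find(adapter)
    if pos != -1:
        return pos
    # Case 2: the read ends partway through the adapter, so the end of
    # the read equals the beginning of the adapter. Try longest first.
    max_len = min(len(seq), len(adapter) - 1)
    for k in range(max_len, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return -1


def contaminated_reads(path, adapter=ADAPTER, min_overlap=8):
    """Yield the FASTQ records from path whose sequence contains the adapter."""
    with open(path) as handle:
        for record in read_fastq(handle):
            if adapter_start(record[1], adapter, min_overlap) != -1:
                yield record
```

To write the hits out, loop over `contaminated_reads("last_mock.fastq")` and print the four fields of each record on separate lines. This sketch does not tolerate sequencing errors inside the adapter; a dedicated trimming tool such as cutadapt would be the usual choice when mismatches need to be allowed.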