Removing gaps from fasta sequence file
I am using python (3.6)/biopython(1.72) to read sequence files. I have an aligned sequence file in fasta format.
>Human
----------------------------MRLRVRLLKRTWPLEVPETEPTL-RSHLRQSLLCT-IPSSTDSEHSSLQN-NEQPSL
>Chimpanzee
----------------------------MRLRVRLLKRTWPLEVPETEPTL-RSRLRQSLLCT-IPSSTDSEHSSLQN-NEQPSL
>Dog
----------------------------MKLRVRLQKRTWPLDLPDAEPTL-RAHLSQALLPS-LPSSTDSEHSSLQN-NDPPSL
>Mouse
----------------------------MKLRVRLQKRTQPLEVPESEPTL-RAHLSQVLLPT-LPSSTDTEHSSLQD-NDQPSL
I need to remove the gaps '-'
from the file and have the result file like this:
>Human
MRLRVRLLKRTWPLEVPETEPTLRSHLRQSLLCTIPSSTDSEHSSLQNNEQPSL
>Chimpanzee
MRLRVRLLKRTWPLEVPETEPTLRSRLRQSLLCTIPSSTDSEHSSLQNNEQPSL
>Dog
MKLRVRLQKRTWPLDLPDAEPTLRAHLSQALLPSLPSSTDSEHSSLQNNDPPSL
>Mouse
MKLRVRLQKRTQPLEVPESEPTLRAHLSQVLLPTLPSSTDTEHSSLQDNDQPSL
I have been trying this using python:
file_var = input ("Enter your file name: ")
sequences = []
for seq_record in SeqIO.parse(file_var, "fasta"):
sequences.append(seq_record.seq)
print (sequences)
list2 = [] # list for extracting "-"
list3 = [] # list for sequence without "-"
for seq_record in alignment:
if "-" in alignment:
list2.append(seq_record)
else:
list3.append(seq_record)
But this outputs me the error:
raise NotImplementedError(_NO_SEQRECORD_COMPARISON)
NotImplementedError: SeqRecord comparison is deliberately not implemented. Explicitly compare the attributes of interest.
Can I have any suggestions??
(P.S: I have been working with sequence file using windows OS, not linux)
• 3.4k views
Read more here: Source link