Issue reading fasta file with Biopython

Hello everyone,

This should be very easy and I know it, but I am stuck with it and I cannot pinpoint my mistake.

I wanted a boolean Python function to check whether a given file is in FASTA format, without having to check the extension myself (.fa, .fasta, etc.). I found this solution, which suited me. When looking for the files it needs, my Python script now uses this “is_fasta” function.

My problem is that it works for some files but not for others. When it fails, I get an error like one of these while trying to read the FASTA file:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 551: invalid continuation byte
#or
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 23: invalid start byte
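
For reference, both messages can be reproduced with plain Python and no Biopython at all. This is only a minimal sketch: the helper name decode_error_reason and the sample byte strings are made up for illustration and are not taken from my files.

```python
def decode_error_reason(raw: bytes) -> str:
    """Return the UnicodeDecodeError reason for raw, or "" if it decodes as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.reason
    return ""


# 0xf3 opens a 4-byte UTF-8 sequence, so the next byte must be a continuation byte.
print(decode_error_reason(b"AC\xf3GT"))  # invalid continuation byte

# 0x87 is itself a continuation byte, so it cannot begin a character.
print(decode_error_reason(b"AC\x87GT"))  # invalid start byte
```

So in both cases the file seems to contain at least one byte above 0x7f that is not part of a valid UTF-8 sequence.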

So I understand there might be something wrong with the encoding of the file. I usually check it with the file command, but I get “ASCII text” both for the files that work and for the files that do not, and when I ask for more information with file -i, it just prints “regular file”. So I don’t see anything about UTF-8 or the like, and my understanding of file formats more or less stops here.
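
One way to look past what file reports would be to read the file in binary mode and list every byte that is not plain ASCII. The sketch below is an assumption about how to do that, not something from the solution I linked; the function name find_non_ascii_bytes and the limit parameter are hypothetical.

```python
def find_non_ascii_bytes(path, limit=10):
    """Return up to `limit` (offset, byte value) pairs for bytes >= 0x80 in the file."""
    hits = []
    offset = 0
    # Binary mode: no decoding happens, so this cannot raise UnicodeDecodeError.
    with open(path, "rb") as handle:
        while len(hits) < limit:
            chunk = handle.read(65536)
            if not chunk:
                break
            for i, value in enumerate(chunk):
                if value >= 0x80:  # anything above 0x7f is not ASCII
                    hits.append((offset + i, value))
                    if len(hits) == limit:
                        break
            offset += len(chunk)
    return hits
```

Calling find_non_ascii_bytes("some_file.fa") (a placeholder path) returns an empty list for a pure ASCII file; otherwise the offsets show where the unexpected bytes sit.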

I am working in a conda environment that I set up with several tools; the Python version inside it is 3.6.10. I added Biopython with the regular conda install command from the conda-forge channel.

Does anyone have any advice about this issue? Or should I go back to my original idea and just check the file extension?

Thank you and have a nice day,
