This should be very easy and I know it, but I am stuck with it and I cannot pinpoint my mistake.
I wanted a boolean Python function to check whether a given file is in FASTA format, without having to check the extension (.fa, .fasta, etc.) myself. I found this solution, which suited me. When looking for the files it needs, my Python script now uses this “is_fasta” function.
My problem is that it works for some files but not for others. When it doesn't, I get an error like one of these when trying to read the FASTA file:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 551: invalid continuation byte
# or
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 23: invalid start byte
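For what it's worth, both messages can be reproduced on made-up data. This is only a minimal sketch: the sample bytes and the helper name describe_decode_failure are hypothetical and not taken from my real files, and it assumes the offending bytes are stray single-byte characters inside otherwise plain ASCII text.

```python
# Hypothetical FASTA content: each header contains one byte >= 0x80,
# which is not valid on its own in UTF-8.
sample_a = b">seq1 regi\xf3n\nACGT\n"   # 0xf3 starts a multi-byte sequence that never continues
sample_b = b">seq2 \x87\nACGT\n"        # 0x87 can only be a continuation byte, never a start


def describe_decode_failure(data, encoding="utf-8"):
    """Return (offset, byte value, reason) for the first undecodable byte, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        return err.start, data[err.start], err.reason
    return None


for sample in (sample_a, sample_b):
    print(describe_decode_failure(sample))

# Latin-1 maps every possible byte to a character, so this decode never raises.
# Whether the resulting characters are the intended ones is a separate question.
print(sample_a.decode("latin-1"))
```

The offset returned here corresponds to the "position" in the error message, so it should point at the exact byte to inspect in a failing file.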
So I understand there might be something wrong with the encoding of the file. I usually check it with the file command, but for both the files that work and the files that do not, I get “ASCII text”, and when asking for more information with file -i, it just prints “regular file”. So I don't see anything about UTF-8 or the like, and my understanding of file formats more or less stops here.
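To double-check what the file command reports, the raw bytes can be scanned directly. The following is a rough sketch, not part of my actual script: the function name non_ascii_offsets, the chunk size, and the demo file contents are all assumptions made for illustration. It reads the file in binary mode, so no decoding happens and no UnicodeDecodeError can be raised.

```python
import os
import tempfile


def non_ascii_offsets(path, limit=10):
    """Return up to `limit` (offset, byte value) pairs for bytes >= 0x80 in the file."""
    found = []
    offset = 0
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large FASTA files are not loaded at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            for i, byte in enumerate(chunk):
                if byte >= 0x80:
                    found.append((offset + i, byte))
                    if len(found) >= limit:
                        return found
            offset += len(chunk)
    return found


# Demo on a hypothetical file that contains two non-ASCII bytes.
with tempfile.NamedTemporaryFile(suffix=".fa", delete=False) as tmp:
    tmp.write(b">s\xf3\nAC\x87GT\n")

for position, value in non_ascii_offsets(tmp.name):
    print("offset", position, "byte", hex(value))

os.remove(tmp.name)
```

An empty result would mean the file really is pure ASCII; any hit gives an offset that can be compared with the position in the traceback.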
I am working in a conda environment that I set up with several tools; the Python version inside is 3.6.10. I added Biopython with the regular conda command from the conda-forge channel.
Does anyone have any advice about this issue? Or should I just go back to my original idea of simply checking the file extension?
Thank you and have a nice day,