Difference in alignment length between FASTA and HitTable

Hello all,

I’ve a horrible feeling this has a stupidly obvious answer, but I’ve had no luck finding a similar question on the forum or in the BLAST manual.

I ran BLAST on some sequences, then downloaded the hit table and the aligned FASTA files.

Looking at the hit table, I decided to filter my results at 40% of the query sequence length before aligning them. I filtered the table in Excel and used bioawk to remove the sequences shorter than 40% of my query sequence. I then checked with grep and wc how many FASTA sequences I had, and noticed that the number differs from the number of rows in my filtered hit table.
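
To make the filtering step concrete, here is a minimal Python sketch of a 40% length filter on a hit table. It assumes the table is comma-separated in the standard 12-column tabular order, with alignment length in the fourth column, and that the threshold is applied to that column; the sample rows, the query length of 100 and the name filter_hits are all made up for illustration and are not the Excel/bioawk steps actually used.

```python
import csv
import io

# Hypothetical hit table in the standard 12-column tabular order:
# query, subject, % identity, alignment length, mismatches, gap opens,
# q. start, q. end, s. start, s. end, evalue, bit score.
# All values below are invented for illustration.
HIT_TABLE = """\
q1,subjA,98.0,100,2,0,1,100,5,104,1e-40,180
q1,subjB,90.0,30,3,0,10,39,1,30,1e-05,50
q1,subjC,95.0,80,2,2,1,79,11,88,1e-30,140
"""


def filter_hits(table_text, query_len, min_frac=0.4):
    """Keep rows whose alignment length is at least min_frac * query_len."""
    kept = []
    for row in csv.reader(io.StringIO(table_text)):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        alignment_length = int(row[3])
        if alignment_length >= min_frac * query_len:
            kept.append(row)
    return kept


kept = filter_hits(HIT_TABLE, query_len=100)
print([row[1] for row in kept])  # ['subjA', 'subjC']
```

Note that this filters on the alignment length column, whereas a filter applied to the FASTA file works on the sequence lengths themselves. If those two lengths differ, the two filters can keep different numbers of entries.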

Comparing the alignment length column in the hit table with the lengths of my FASTA sequences, I can see a difference of 20 or more bases per sequence. The FASTA sequences contain fewer nucleotides than the alignment length listed in the table. This number for each sequence does not correspond to any other value in the hit table.
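
The comparison described above can be sketched in Python as follows. In tabular BLAST output the alignment length counts alignment columns, gap positions included, so the sketch also prints the subject span (s. end minus s. start plus one) next to the FASTA length. Whether that accounts for the difference seen here is not established; the sample data and the names fasta_lengths and compare are hypothetical.

```python
# Invented sample data: (subject id, alignment length, s. start, s. end).
HITS = [
    ("subjA", 12, 1, 10),
    ("subjB", 8, 20, 13),  # s. start > s. end: hit on the minus strand
]

FASTA = """\
>subjA example record
ACGTACGTAC
>subjB example record
ACGT
ACGT
"""


def fasta_lengths(fasta_text):
    """Return {record id: number of residues}, ignoring any '-' gap characters."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the id.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line.replace("-", ""))
    return lengths


def compare(hits, lengths):
    """Return (id, alignment length, subject span, FASTA length, difference) rows."""
    report = []
    for subject, aln_len, s_start, s_end in hits:
        span = abs(s_end - s_start) + 1
        seq_len = lengths.get(subject)
        diff = None if seq_len is None else aln_len - seq_len
        report.append((subject, aln_len, span, seq_len, diff))
    return report


for row in compare(HITS, fasta_lengths(FASTA)):
    print(row)
```

A subject with several rows in the hit table would need its rows handled separately; this sketch assumes one row per FASTA record.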

What might the reason for this be? Is it to do with my parameters or am I missing something? Or have I misunderstood something?

Happy to provide any extra information – thank you kindly for any help and I am sincerely sorry if this is a daft question. I have honestly tried to find out why this might be but I’m having no luck at all.



