Hi,
I’ve been handed some BAM files from an RNA-seq experiment that I need to analyze. They were generated as pseudoalignments with kallisto (pachterlab.github.io/kallisto/about).
When I run htseq-count, one specific BAM file crashes on a specific record:
Error occured when processing input (record #149548485 in file 103805-016-011.kallisto.pseudoalignments.bam):
Expected str, got NoneType [Exception type: TypeError, raised in _HTSeq.pyx:60]
After digging deeper, I noticed that the next record (or that same record, if the numbering is 0-based) is formatted very strangely, at least to me:
[149548484] A00379:576:H7WK3DSX3:4:1101:5421:1016 77 * 0 0 * * 0 0 GNTCTTTTAAAAAGAGATTAAACCGAAGGTGATTAAAAGACCTTGAAATCCATGACGCAGGGAGAATTGCGTCATTTAAAGCCTAGTTAACGCATTTACTAAACGCAGACGAAAATGGAAAGATTAATTGGGAGTGGTAGGATGAAACAAT F#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
[149548485] A00379:576:H7WK3DSX3:4:1101:5421:1016 141 * 0 0 * * 0 0 CTTCCTACTTTTCAGGTTTAAATTTATCTTTTTTCTTCTAAAAGTATGTTTTTATCTTCTAATTTCCCTATCTTCTCTATTCTTTTCTTCGCCTTCCCGTACTTCTGTCTTCCAGTTTTACACTTCAAACTTCTATCTTCTCCAAATTGTT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,:FFFFFFFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF,FFFFFFF,FFFFF:FFF:FF,FFF:FF,F,FFFFF,:FF
[149548486] A00379:576:H7WK3DSX3:4:1101:13702:1016 67 * 0 0 151M * 0 0 ANAAATCTAGGCTCCATCAACACTGAATTGCAAGATGTGCAGAGGATCATGGTGGCCAATATTGAAGAAGTGTTACAACGAGGAGAAGCACTCTCAGCATTGGATTCAAAGGCTAACAATTTGTCCAGTCTGTCCAAGAAATACCGCCAGG F#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF:F:FFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF ZW:f:0
[149548487] A00379:576:H7WK3DSX3:4:1101:13702:1016 131 * 0 0 151M * 0 0 TATTTTCAGGAAACTGAGCTCACAGAGATGTGTATTAGAATCCAAGTGGAACTTCTGCCTCTAAAGACCTTGCAAGAAAAGAGATGCCCTGAAAATGAAAGGTTGCACCTCATTTAATGAAGCTTAACCCTATGTAGAAAGTCTCTTTCGG F:FFFFFFFFF:FFFFF,FFFFFFFFFFFFFFFFF:FFFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFF:FFFFFFFFFFFFFFFF,FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF ZW:f:0
[149548488] A00379:576:H7WK3DSX3:4:1101:14913:1016 77 * 0 0 * * 0 0 ANTATAACAAACCCTGAGAACCAAAATGAACGAAAATCTGTTCGCTTCATTCATTGCCCCCACAATCCTAGGCCTACCCGCCGCAGTACTGATCATTCTATTTCCCCCTCTATTGATCCCCACCTCCAAATATCTCATCAACAACCGACTA F#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,::,F::,FF:F:FFFFF,FF:,,::FF:F,F:,,:FFFFFF:F:F::F:FFFFFF,
[149548489] A00379:576:H7WK3DSX3:4:1101:14913:1016 141 * 0 0 * * 0 0 GTTTATAGATAGTTGGGTGGTTGGTGTAAATGAGTGAGGCAGGAGTCCGAGGAGGAGGTTAGTTGTGGCAATAAAAATGATTAAGGATACTAGTATAAGAGATCAGGTTCGTCCTTTAGTGTTGTGTATGGTTATCATTTGTTTTGAGGTT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:F:FFFFFFFFFFFFFFFFFFF:FF:FFFFFFFFFFFFFFFFFFF::FFFF:F:FFFFFFFFFFFF::FFF,FFFFFFFFFFFFFFFFFFFF:FFFFFFFF,:F
The read pair in records 149548486-149548487 is flagged as mapped (flags 67 and 131, so the unmapped bit 0x4 is not set, and the CIGAR is 151M), but it isn’t mapped to any reference sequence (RNAME is *).
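To make that inconsistency concrete, here is a minimal Python sketch that flags records whose unmapped bit (0x4) is not set while RNAME is *. It assumes standard tab-separated SAM text (the records above are shown space-separated with an index prefix, which is only how they were printed), and the function names `is_inconsistent` and `filter_sam` are made up for illustration. Whether dropping such records is the right fix for htseq-count is an assumption, not something confirmed here.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def is_inconsistent(sam_line):
    """Return True if a record claims to be mapped (0x4 unset) but RNAME is '*'."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname = int(fields[1]), fields[2]
    return not (flag & UNMAPPED) and rname == "*"


def filter_sam(lines):
    """Yield header lines and consistent records; skip inconsistent records."""
    for line in lines:
        # Header lines start with '@' and are always passed through.
        if line.startswith("@") or not is_inconsistent(line):
            yield line
```

One possible way to use it would be to stream SAM text through `filter_sam`, for example by piping the output of `samtools view -h` into a small script that calls `sys.stdout.writelines(filter_sam(sys.stdin))`. In the records shown above both mates of the affected pair (flags 67 and 131) would be dropped together; if only one mate of a pair were inconsistent, the remaining mate would be left without its partner, so that case would need separate handling.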
How can I prevent this error?