Hello!
I am trying to produce bam files to load to igv after kallisto quant with –genobam option.
After producing and loading the pseudoalignment bam to the igv, it is empty.
This is my initial command:
kallisto quant -i Homo_sapiens.GRCh38.cdna.all.release-100.idx -o pseudo -t 10 --genomebam -g Homo_sapiens.GRCh38.100.gtf -c hg38.chrom.sizes R1.fastq.gz.trim_1.fq.gz R1.fastq.gz.trim_2.fq.gz
But I think the problem is the pseudobam file produced doesn’t have chromosome information and everything is aligned to “*”:
VH00268:2:AAAH2JYHV:1:1101:51046:1057 131 * 0 0 50M * 0 0 GTCTTCCCTGGACATCACTGCCTCTCCAGGGCATTCTCAGGCCCGGGGGT CCCCCCCCCCC;CCCCCCCCCCCCCC;CCCCCCCCCCCCCCCC;CCCCCC ZW:f:0
VH00268:2:AAAH2JYHV:1:1101:56689:1057 67 * 0 0 51M * 0 0 TTAAAAAAGTATTTTCTTAACTTTTTACCTTCATTTTTCAAAACAATTAGG CC;CCC-CC-CCCCCCCCCCCC;CCCCCCCCCC;CCCCC;CC;CCCCCCCC ZW:f:0
VH00268:2:AAAH2JYHV:1:1101:56689:1057 131 * 0 0 51M * 0 0 GGAGGTCGATGCTGCATTGAGCCGAGATTGTGCCACTGCACTCCAGCGTGG CCCCCCCCCCCCCC;;CCCCC;CCCCCCCCCCCCCCCCCC-CCCCCCCCCC ZW:f:0
Initial fasta for indexing and gtf file were downloaded both from ensemble release 100
Any idea on what could be wrong in the initial pseudoreads produced?
This is the start of my gtf file:
#!genome-build GRCh38.p13
#!genome-version GRCh38
#!genome-date 2013-12
#!genome-build-accession NCBI:GCA_000001405.28
#!genebuild-last-updated 2019-06
1 havana gene 11869 14409 . + . gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";
1 havana transcript 11869 14409 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; tag "basic"; transcript_support_level "1";
1 havana exon 11869 12227 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00002234944"; exon_version "1"; tag "basic"; transcript_support_level "1";
1 havana exon 12613 12721 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00003582793"; exon_version "1"; tag "basic"; transcript_support_level "1";
1 havana exon 13221 14409 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "3"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00002312635"; exon_version "1"; tag "basic"; transcript_support_level "1";
and the chromosome size file is tab delimited containing “chr” for chromosome names
I also have quite low mapping percentage:
samtools flagstat pseudoalignments.bam
126464878 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
33881421 + 0 mapped (26.79% : N/A)
126464878 + 0 paired in sequencing
63232439 + 0 read1
63232439 + 0 read2
27267150 + 0 properly paired (21.56% : N/A)
27267150 + 0 with itself and mate mapped
6614271 + 0 singletons (5.23% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Many thanks!
Read more here: Source link