Using featureCounts in the Rsubread package I am getting 0 annotations.
I started from raw sequencing data and the Refseq genome and Refseq Genomic GTF files downloaded from here:
www.ncbi.nlm.nih.gov/assembly/GCF_000001635.27/
through the download assembly button on the side. I had the top option to RefSeq for both downloads and chose the Genomic Fasta (fna) for the genome file and Genomic GTF (.gtf) for the annotation file.
I then build the index using:
buildindex(basename="GRCm39", reference="GCF_000001635.27_GRCm39_genomic.fna")
This generated the following files.
GRCm39.00.b.array
GRCm39.00.b.tab
GRCm39.files
GRCm39.log
GRCm39.reads
I then aligned my sequencing to the index using:
align(index="GRCm39", readfile1=allRead1files, readfile2 = allRead2files)
This ran fine and generated 3 files per paired end sequencing eg. for the paired end files named 19143_S24_R1_153.fastq.gz and 19143_S24_R2_153.fastq.gz
it output
19143_S24_R1_153.fastq.gz.subread.BAM
19143_S24_R1_153.fastq.gz.subread.BAM.indel.vcf
19143_S24_R1_153.fastq.gz.subread.BAM.summary
Total fragments : 6,925,412 ||
|| Mapped : 6,901,334 (99.7%) ||
|| Uniquely mapped : 6,277,058 ||
|| Multi-mapping : 624,276 ||
|| ||
|| Unmapped : 24,078 ||
|| ||
|| Properly paired : 5,346,711 ||
|| Not properly paired : 1,554,623 ||
|| Singleton : 87,673 ||
|| Chimeric : 71,303 ||
|| Unexpected strandness : 9,749 ||
|| Unexpected fragment length : 1,370,658 ||
|| Unexpected read order : 15,240 ||
|| ||
|| Indels : 27,554 ||
|| ||
|| Running time : 12.3 minutes
with ~76% of read being properly paired. This was consistent for all all samples.
But when I run:
fcs<-featureCounts(files=bam.files, annot.ext="GCF_000001635.27_GRCm39_genomic.gtf", isGTFAnnotationFile = T, GTF.featureType="exon,gene", GTF.attrType="gene_id", isPairedEnd = T, requireBothEndsMapped=T, countChimericFragments=F)
I get:
//========================== featureCounts setting ===========================\
|| ||
|| Input files : 2 BAM files ||
|| ||
|| 19120_S1_R1_107.fastq.gz.subread.BAM ||
|| 19120_S1_R1_108.fastq.gz.subread.BAM ||
|| ||
|| Paired-end : yes ||
|| Count read pairs : yes ||
|| Annotation : GCF_000001635.27_GRCm39_genomic.gtf (GTF) ||
|| Dir for temp files : . ||
|| Threads : 1 ||
|| Level : meta-feature level ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
//================================= Running ==================================\
|| ||
|| Load annotation file GCF_000001635.27_GRCm39_genomic.gtf ... ||
|| Features : 1571285 ||
|| Meta-features : 50563 ||
|| Chromosomes/contigs : 45 ||
|| ||
|| Process BAM file 19120_S1_R1_107.fastq.gz.subread.BAM... ||
|| Paired-end reads are included. ||
|| Total alignments : 6925412 ||
|| Successfully assigned alignments : 0 (0.0%) ||
|| Running time : 0.25 minutes ||
|| ||
|| Process BAM file 19120_S1_R1_108.fastq.gz.subread.BAM... ||
|| Paired-end reads are included. ||
|| Total alignments : 6790284 ||
|| Successfully assigned alignments : 0 (0.0%) ||
|| Running time : 0.24 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
Sorry for the long post but I am totally lost.
Based on other posts it might be that the annotation and BAM files have different chromosome names? I don’t know why that would happen when they are downloaded from the same source at the same time. How would I check this?
Thanks in advance.
Read more here: Source link