The .sam file was generated with the following commands:
samtools sort -n Untreated-3/accepted_hits.bam > Untreated-3_sn.bam
samtools view -o Untreated-3_sn.sam Untreated-3_sn.bam
samtools sort Untreated-3/accepted_hits.bam > Untreated-3_s.bam
samtools index Untreated-3_s.bam
The .gtf file was downloaded with:
wget ftp://ftp.ensembl.org/pub/release-70/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP5.70.gtf.gz
gunzip Drosophila_melanogaster.BDGP5.70.gtf.gz
When I run htseq-count:
htseq-count -s no -a 10 Untreated-3_sn.sam Drosophila_melanogaster.BDGP5.70.gtf > Untreated-3.count
an error occurred:
file has no sequences defined (mode="r") - is it SAM/BAM format? Consider opening with check_sq=False
[Exception type: ValueError, raised in libcalignmentfile.pyx:990]
I can view the sorted .bam file with samtools view, but not the .sam file generated from it. That gives this error:
[E::sam_parse1] missing SAM header
[W::sam_read1] Parse error at line 1
[main_samview] truncated file.
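Both messages are consistent with a .sam file that has no header: samtools view writes only the alignment records unless -h is given, and the command above does not use it. A minimal sketch for checking this, assuming a POSIX shell with grep (count_sq_lines is a hypothetical helper name, not part of samtools or HTSeq):

```shell
# Count the @SQ (reference sequence) header lines in a SAM file.
# A result of 0 means no sequences are defined in the header.
count_sq_lines() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status clean.
    grep -c '^@SQ' "$1" || true
}

# Hypothetical usage with the file from this post:
#   count_sq_lines Untreated-3_sn.sam
# If that prints 0, writing the .sam again with the header included
# would look like:
#   samtools view -h -o Untreated-3_sn.sam Untreated-3_sn.bam
```

If the count is 0, that would probably explain why both htseq-count and samtools view reject the file.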
I found that I can use the .bam file directly to get the count file:
htseq-count -s no -a 10 Untreated-3_sn.bam Drosophila_melanogaster.BDGP5.70.gtf > Untreated-3.count
This works and prints the following:
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
358027 GFF lines processed.
100000 alignment record pairs processed.
200000 alignment record pairs processed.
……
9700000 alignment record pairs processed.
9800000 alignment record pairs processed.
9900000 alignment record pairs processed.
10000000 alignment record pairs processed.
10028400 alignment pairs processed.
I’m totally new to bioinformatics. Is it OK to generate the count file from the .bam file rather than the .sam file? Thanks.