Is using a .bam file the same as using a .sam file with htseq-count?

The .sam file was generated with the following commands:

samtools sort -n Untreated-3/accepted_hits.bam > Untreated-3_sn.bam 
samtools view -o Untreated-3_sn.sam Untreated-3_sn.bam 
samtools sort Untreated-3/accepted_hits.bam > Untreated-3_s.bam
samtools index Untreated-3_s.bam

The .gtf file was downloaded with:

wget ftp://ftp.ensembl.org/pub/release-70/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP5.70.gtf.gz

gunzip Drosophila_melanogaster.BDGP5.70.gtf.gz

When I ran htseq-count on the .sam file:

htseq-count -s no -a 10 Untreated-3_sn.sam Drosophila_melanogaster.BDGP5.70.gtf > Untreated-3.count

an error occurred:

 file has no sequences defined (mode="r") - is it SAM/BAM format? Consider opening with check_sq=False
  [Exception type: ValueError, raised in libcalignmentfile.pyx:990]

I can view the sorted .bam file with samtools view, but not the .sam file generated from it. For the .sam file I get this error:

[E::sam_parse1] missing SAM header
[W::sam_read1] Parse error at line 1
[main_samview] truncated file.
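
A likely cause of both errors: samtools view writes only the alignment records unless -h is given, so Untreated-3_sn.sam was written without a header (no @SQ lines). That matches the "no sequences defined" and "missing SAM header" messages. Below is a minimal sketch, assuming the file names used above; has_sq_header is a hypothetical helper written for this example, not part of samtools.

```shell
# Hypothetical helper: succeed if a SAM file has at least one @SQ header line.
has_sq_header() {
    grep -q '^@SQ' "$1"
}

# Re-create the SAM file with its header included (-h).
# Guarded so the snippet does nothing if samtools or the BAM file is missing.
if command -v samtools >/dev/null 2>&1 && [ -f Untreated-3_sn.bam ]; then
    samtools view -h -o Untreated-3_sn.sam Untreated-3_sn.bam
    if has_sq_header Untreated-3_sn.sam; then
        echo "header present"
    else
        echo "header still missing"
    fi
fi
```

With the header in place, the .sam file should be readable by both samtools view and htseq-count.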

I found that I can use the name-sorted .bam file directly to get the count file:

 htseq-count -s no -a 10 Untreated-3_sn.bam Drosophila_melanogaster.BDGP5.70.gtf > Untreated-3.count

It runs without errors and prints:

100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
358027 GFF lines processed.
100000 alignment record pairs processed.
200000 alignment record pairs processed.
……
9700000 alignment record pairs processed.
9800000 alignment record pairs processed.
9900000 alignment record pairs processed.
10000000 alignment record pairs processed.
10028400 alignment pairs processed.
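
BAM is the compressed binary form of SAM, so counting from either file should give the same result as long as the SAM file includes its header. One way to sanity-check the output is to total the per-gene counts and the special counters. This is a rough sketch: summarise_counts is a hypothetical helper, and it assumes the count file is tab-separated with the special counters (such as __no_feature and __ambiguous) prefixed by two underscores, as in recent htseq-count versions.

```shell
# Hypothetical helper: total the reads assigned to genes and the reads
# that fell into the special counters (names starting with "__").
summarise_counts() {
    awk -F '\t' '
        $1 ~ /^__/ { special += $2; next }   # e.g. __no_feature, __ambiguous
                   { assigned += $2 }        # ordinary gene rows
        END { printf "assigned\t%d\nspecial\t%d\n", assigned, special }
    ' "$1"
}

# Only run on the real count file if it exists.
if [ -f Untreated-3.count ]; then
    summarise_counts Untreated-3.count
fi
```

Running the same check on counts produced from the .sam file (once it has a header) and from the .bam file is a quick way to confirm the two inputs agree.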

I’m totally new to bioinformatics. Is it OK to generate the count file from the .bam file rather than the .sam file? Thanks.
