How to get the total genic and intergenic length of a chromosome?

It looks like you have a .gtf file. That means you can extract the exon lines from the .gtf file and count and sum up the exonic intervals.

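You can generate a sorted .bed file of exon coordinates as follows. GTF coordinates are 1-based and inclusive, while BED is 0-based with an exclusive end, so subtract 1 from the start:

grep -P '\texon\t' your.gtf | awk -F'\t' -v OFS='\t' '{ print $1, $4 - 1, $5 }' | sort -k1,1 -k2,2n > exons.bed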

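You can merge this exons.bed using bedtools, so that overlapping exons (for example, from different transcripts of the same gene) are collapsed and no base is counted twice: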

bedtools merge -i exons.bed > exons.merged.bed

You can count/sum the intervals in the merged bed file:

awk -F'\t' 'BEGIN{SUM=0}{ SUM+=$3-$2 }END{print SUM}' exons.merged.bed
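
Since the question is about a single chromosome, you may want the total broken down per chromosome rather than one genome-wide number. Here is a minimal sketch, assuming exons.merged.bed holds valid BED intervals (0-based start, exclusive end). The function name per_chrom_genic is just a placeholder, not something bedtools provides:

```shell
# per_chrom_genic: hypothetical helper, not part of bedtools.
# Input:  a merged BED file (chrom, 0-based start, exclusive end).
# Output: one line per chromosome: name <TAB> summed interval length.
per_chrom_genic() {
    awk -F'\t' '
        { sum[$1] += $3 - $2 }                          # add this interval length to its chromosome
        END { for (c in sum) print c "\t" sum[c] }      # awk array order is arbitrary, so sort after
    ' "$1" | sort -k1,1
}

# Usage: per_chrom_genic exons.merged.bed
```

Pick the line for the chromosome you care about, or pipe the output through grep.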

This should give you the number of genic bases, if you are defining genic as exons only. To get the intergenic length, sum up your chromosome lengths and subtract the genic number.
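
The subtraction can be done per chromosome as well. This is only a sketch, and it assumes you have a tab-separated file of chromosome lengths (name in column 1, length in column 2; the first two columns of a samtools faidx .fai index have this layout). The file name chrom.sizes and the function name intergenic_per_chrom are my own placeholders:

```shell
# intergenic_per_chrom: hypothetical helper.
# $1: chromosome sizes file (assumed: name <TAB> length, extra columns ignored)
# $2: merged exon BED file (chrom, 0-based start, exclusive end)
# Output per chromosome: name <TAB> genic bases <TAB> intergenic bases
intergenic_per_chrom() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR { len[$1] = $2; next }    # first file: remember each chromosome length
        { genic[$1] += $3 - $2 }            # second file: sum merged interval lengths
        END {
            for (c in len)                  # chromosomes with no exons get genic = 0
                print c, genic[c] + 0, len[c] - genic[c]
        }
    ' "$1" "$2" | sort -k1,1
}

# Usage: intergenic_per_chrom chrom.sizes exons.merged.bed
```

Chromosomes that appear in the BED file but not in the sizes file are silently skipped here, so make sure the chromosome names match between the two files.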

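You can also do all of this in one line (the start is shifted by 1 because GTF is 1-based and BED is 0-based):

grep -P '\texon\t' your.gtf | awk -F'\t' -v OFS='\t' '{ print $1, $4 - 1, $5 }' | sort -k1,1 -k2,2n | bedtools merge -i stdin | awk -F'\t' 'BEGIN{SUM=0}{ SUM+=$3-$2 }END{print SUM}'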
