My solution, based on Ian’s answer:
zcat ../../../data/annotations/gencode.v24.annotation.gtf.gz | awk 'BEGIN{OFS="\t"} $3=="gene" {print $1,$4-1,$5,$10,".",$7}' | tr -d '";' | head
chr1 11868 14408 ENSG00000223972.5 . +
chr1 14403 29569 ENSG00000227232.5 . -
chr1 17368 17435 ENSG00000278267.1 . -
chr1 29553 31108 ENSG00000243485.3 . +
chr1 30365 30502 ENSG00000274890.1 . +
chr1 34553 36080 ENSG00000237613.2 . -
chr1 52472 53311 ENSG00000268020.3 . +
chr1 62947 63886 ENSG00000240361.1 . +
chr1 69090 70007 ENSG00000186092.4 . +
chr1 89294 133722 ENSG00000238009.6 . -
This gives you all the genes in BED format, with the Ensembl gene ID in the name field and a placeholder (.) in the score field.
You can also use the score field to store other information you are interested in, such as the common gene name:
zcat ../../../data/annotations/gencode.v24.annotation.gtf.gz | awk 'BEGIN{OFS="\t"} $3=="gene" {print $1,$4-1,$5,$10,$16,$7}' | tr -d '";' | head
chr1 11868 14408 ENSG00000223972.5 DDX11L1 +
chr1 14403 29569 ENSG00000227232.5 WASH7P -
chr1 17368 17435 ENSG00000278267.1 MIR6859-1 -
chr1 29553 31108 ENSG00000243485.3 RP11-34P13.3 +
chr1 30365 30502 ENSG00000274890.1 MIR1302-2 +
chr1 34553 36080 ENSG00000237613.2 FAM138A -
chr1 52472 53311 ENSG00000268020.3 OR4G4P +
chr1 62947 63886 ENSG00000240361.1 OR4G11P +
chr1 69090 70007 ENSG00000186092.4 OR4F5 +
chr1 89294 133722 ENSG00000238009.6 RP11-34P13.7 -
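One caveat: $10 and $16 are positions in the whitespace-split line, so they only point at gene_id and gene_name when the attributes appear in this exact order. Other releases or other GTF sources may order them differently. A minimal sketch that looks the attributes up by key instead, assuming a standard tab-separated GTF with `key "value";` pairs in column 9 (the function name gtf_genes_to_bed is just a hypothetical helper, not part of any tool):

```shell
# Hypothetical helper: read GTF on stdin, write one BED6 line per "gene" record.
gtf_genes_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
       $3 == "gene" {
         id = "."; name = "."
         # Column 9 holds the attributes as: key "value"; key "value"; ...
         n = split($9, attrs, /; */)
         for (i = 1; i <= n; i++) {
           if (attrs[i] ~ /^gene_id /)   { id = attrs[i];   gsub(/^gene_id |"/, "", id) }
           if (attrs[i] ~ /^gene_name /) { name = attrs[i]; gsub(/^gene_name |"/, "", name) }
         }
         # GTF is 1-based inclusive, BED is 0-based half-open, so only the start shifts.
         print $1, $4 - 1, $5, id, name, $7
       }'
}
```

It would be used the same way as the one-liners above, e.g. zcat ../../../data/annotations/gencode.v24.annotation.gtf.gz | gtf_genes_to_bed | head. Genes without a gene_name attribute get "." in that column.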