How To Convert GENCODE GTF Into BED Format?

My solution, based on Ian’s answer:

zcat ../../../data/annotations/gencode.v24.annotation.gtf.gz | awk 'BEGIN{OFS="\t"} $3=="gene" {print $1,$4-1,$5,$10,".",$7}' | tr -d '";' | head
chr1    11868   14408   ENSG00000223972.5       .       +
chr1    14403   29569   ENSG00000227232.5       .       -
chr1    17368   17435   ENSG00000278267.1       .       -
chr1    29553   31108   ENSG00000243485.3       .       +
chr1    30365   30502   ENSG00000274890.1       .       +
chr1    34553   36080   ENSG00000237613.2       .       -
chr1    52472   53311   ENSG00000268020.3       .       +
chr1    62947   63886   ENSG00000240361.1       .       +
chr1    69090   70007   ENSG00000186092.4       .       +
chr1    89294   133722  ENSG00000238009.6       .       -

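This gives you all the genes, with their Ensembl gene IDs in the name column, in BED format. The $4-1 converts the 1-based GTF start into the 0-based start that BED expects.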

You can use the score column to store other information you are interested in, such as the common gene name (field $16 in this file):

zcat ../../../data/annotations/gencode.v24.annotation.gtf.gz | awk 'BEGIN{OFS="\t"} $3=="gene" {print $1,$4-1,$5,$10,$16,$7}' | tr -d '";' | head
chr1    11868   14408   ENSG00000223972.5       DDX11L1 +
chr1    14403   29569   ENSG00000227232.5       WASH7P  -
chr1    17368   17435   ENSG00000278267.1       MIR6859-1       -
chr1    29553   31108   ENSG00000243485.3       RP11-34P13.3    +
chr1    30365   30502   ENSG00000274890.1       MIR1302-2       +
chr1    34553   36080   ENSG00000237613.2       FAM138A -
chr1    52472   53311   ENSG00000268020.3       OR4G4P  +
chr1    62947   63886   ENSG00000240361.1       OR4G11P +
chr1    69090   70007   ENSG00000186092.4       OR4F5   +
chr1    89294   133722  ENSG00000238009.6       RP11-34P13.7    -
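
The commands above pick the gene ID and gene name by field position ($10 and $16), which only works while the attribute column keeps the same keys in the same order. As a rough sketch of a less position-dependent variant, the function below looks the attributes up by key instead. The name gtf_genes_to_bed is made up for this example, and the code assumes that attribute values contain no semicolons, which I have not checked for every release.

```shell
# Sketch: read GTF on stdin, write 6-column BED for "gene" records.
# gtf_genes_to_bed is a hypothetical helper name, not part of any tool.
gtf_genes_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
       $3 == "gene" {
         id = "."; name = "."
         # Column 9 holds the attributes as: key "value"; key "value"; ...
         # Assumption: no value contains a semicolon.
         n = split($9, attrs, ";")
         for (i = 1; i <= n; i++) {
           a = attrs[i]
           sub(/^ +/, "", a)              # trim leading spaces
           if (a ~ /^gene_id /) {
             id = a; sub(/^gene_id /, "", id); gsub(/"/, "", id)
           }
           if (a ~ /^gene_name /) {
             name = a; sub(/^gene_name /, "", name); gsub(/"/, "", name)
           }
         }
         # GTF starts are 1-based, BED starts are 0-based.
         print $1, $4 - 1, $5, id, name, $7
       }'
}

# Usage (same file as above):
# zcat ../../../data/annotations/gencode.v24.annotation.gtf.gz | gtf_genes_to_bed | head
```

If an attribute is missing, the corresponding column falls back to "." rather than silently picking up a neighbouring field.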
