How to fix GTF files by adding specific strings into empty gene_id “”


Hi,

I want to repair a GTF file by inserting a unique string (such as the product name) into each empty gene_id "" attribute. I would really appreciate it if anyone could suggest a solution.

For example:

grep -m1 'gene_id ""' mygtf.gtf

NC_001717.1 RefSeq  exon    1004    1071    .   +   .   gene_id ""; transcript_id "unknown_transcript_1"; anticodon "(pos:1034..1036)"; gbkey "tRNA"; note "putative"; product "tRNA-Phe"; exon_number "1";

I want to add the product name between the double quotes right after the gene_id like:

NC_001717.1 RefSeq  exon    1004    1071    .   +   .   gene_id "tRNA-Phe"; transcript_id "unknown_transcript_1"; anticodon "(pos:1034..1036)"; gbkey "tRNA"; note "putative"; product "tRNA-Phe"; exon_number "1";
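One possible way to do this substitution is sketched below in Python. It assumes every record with an empty gene_id also carries a product "..." attribute on the same line, as in the example above. The function names fill_empty_gene_id and fix_gtf and the output filename are hypothetical. Lines without a product attribute are left unchanged rather than guessed at.

```python
import re

# Matches the product attribute and captures its value, e.g. product "tRNA-Phe"
PRODUCT = re.compile(r'product "([^"]+)"')


def fill_empty_gene_id(line: str) -> str:
    """Return the GTF line with an empty gene_id filled from its product attribute."""
    # Skip comment/header lines and records whose gene_id is already set.
    if line.startswith("#") or 'gene_id ""' not in line:
        return line
    match = PRODUCT.search(line)
    if match is None:
        # Assumption violated: no product to borrow, so leave the line untouched.
        return line
    # Replace only the first (and normally only) empty gene_id on the line.
    return line.replace('gene_id ""', f'gene_id "{match.group(1)}"', 1)


def fix_gtf(in_path: str, out_path: str) -> None:
    """Copy in_path to out_path, filling empty gene_id values line by line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(fill_empty_gene_id(line))
```

Calling fix_gtf("mygtf.gtf", "mygtf.fixed.gtf") would write a repaired copy and leave the original file alone. If two different genes share the same product name, they would end up with the same gene_id, so it may be worth checking the 24 affected records by eye or appending the transcript_id to keep the identifiers unique.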

I have 24 records with an empty gene_id and need to fix all of them. I obtained this file from NCBI RefSeq. Unfortunately, this species is not available in the Ensembl database.

The reason I want to fix the GTF file is that I need to filter it with cellranger mkgtf. I am getting the error below, so I need to modify the GTF file first.

cellranger.reference.GtfParseError: Error while parsing GTF file /~/genome/mygtf.gtf Property 'gene_id' is empty in GTF line 1809658: NC_001717.1 RefSeq exon 1004 1071 . + gene_id ""; transcript_id "unknown_transcript_1"; anticodon "(pos:1034..1036)"; gbkey "tRNA"; note "putative"; product "tRNA-Phe"; exon_number "1";
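To list every offending record before re-running cellranger, a small check like the following could help. It is only a sketch: find_empty_gene_ids is a hypothetical helper, not part of cellranger, and it counts every line of the file (comments included), which is an assumption about how the line number in the error message is computed.

```python
from typing import Iterable, Iterator, Tuple


def find_empty_gene_ids(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for GTF records whose gene_id attribute is empty."""
    for number, line in enumerate(lines, start=1):
        # Comment/header lines start with '#' and carry no attributes.
        if not line.startswith("#") and 'gene_id ""' in line:
            yield number, line.rstrip("\n")
```

Passing an open file handle, for example list(find_empty_gene_ids(open("mygtf.gtf"))), should return one entry per empty gene_id; an empty list after the fix suggests this particular error will not be raised again.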

Thank you!


cellranger


GTF


Annotation


UNIX
