Determining LOC coordinate from GFF3 start column
Hi all, total noob question:
I have a GFF3 file of a pepper (C. annuum) plant genome that looks like this:
seqid source type start end score strand phase attributes
chr01 PROTEIN gene 29119 37617 . - . ID=CA.PGAv.1.6.scaffold567.122
chr01 PROTEIN mRNA 29119 37617 . - . ID=TC.CA.PGAv.1.6.scaffold567.122;Parent=CA.PGAv.1.6.scaffold567.122
chr01 PROTEIN exon 29119 29457 . - 0 Parent=TC.CA.PGAv.1.6.scaffold567.122
...
chr02 ABINITI gene 157637 159805 0.22 - . ID=CA.PGAv.1.6.scaffold1545.2
...
chr04 ISGAP gene 11689 14256 1096 + . ID=CA.PGAv.1.6.scaffold638.93
...
I am trying to cross-reference the features in the GFF3 with the genes from this paper, which identifies them by numbers such as “LOC107867643” and “LOC107868281”. I’m assuming these are absolute coordinates in their aligned sequence.
I’m also assuming the “start” column is relative to the beginning of its seqid, both because chr04, for example, has a start smaller than chr02’s and because that is how I read the spec.
My question is: how do I translate, for example, the chr02 start of 157637 into an absolute coordinate that I can match against the LOC numbers published in the paper?
For example, if the last feature for chr01 has an “end” of 309042759 and the first feature for chr02 has a “start” of 157637, can I just do 309042759 + 157637 = 309200396 to get the whole-genome coordinate for that feature?
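In code, what I have in mind is roughly the sketch below. It only illustrates the arithmetic I am proposing and does not confirm that it is the right way to match the LOC numbers. The function names are my own placeholders, the file is assumed to be tab-separated as GFF3 normally is, and the largest “end” seen for a seqid is only a stand-in for the true chromosome length, which could be longer. The second sample line is made up so that chr01 ends at 309042759.

```python
def seqid_extents(gff_lines):
    """Return {seqid: largest 'end' seen}, in order of first appearance."""
    extents = {}
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines and GFF3 comments/directives.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue
        seqid, end = fields[0], int(fields[4])
        extents[seqid] = max(extents.get(seqid, 0), end)
    return extents


def cumulative_offsets(extents):
    """Offset of each seqid = sum of the extents of all seqids before it."""
    offsets = {}
    running = 0
    for seqid, extent in extents.items():
        offsets[seqid] = running
        running += extent
    return offsets


def to_genome_coordinate(seqid, pos, offsets):
    """Shift a per-seqid position by the offset of its seqid."""
    return offsets[seqid] + pos


sample = [
    "chr01\tPROTEIN\tgene\t29119\t37617\t.\t-\t.\tID=CA.PGAv.1.6.scaffold567.122",
    # Hypothetical last chr01 feature; only its end (309042759) comes from my file.
    "chr01\tPROTEIN\tgene\t309040000\t309042759\t.\t+\t.\tID=hypothetical_last_gene",
    "chr02\tABINITI\tgene\t157637\t159805\t0.22\t-\t.\tID=CA.PGAv.1.6.scaffold1545.2",
]

offsets = cumulative_offsets(seqid_extents(sample))
print(to_genome_coordinate("chr02", 157637, offsets))  # 309200396
```

This relies on the seqids appearing in the file in chromosome order, which is true for my file but is an assumption in general.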
I found this Biostars question, which noted that if the chromosome itself were listed as a feature in the file, it would start at 1, but I do not have any such entries in this file.
Any help would be great, thanks.