Dear Maciek
Hopefully you were able to solve these problems already.
I cannot comment on the main set of issues you reported. However, I also encountered the error: `Error in START[!REV, 3] : incorrect number of dimensions` following certain instances of `set.synnonsyn` which I also noticed occurred for genes which have only a single CDS listed in the gtf file.
There is a ‘dirty’ way to fix this which is to just ‘copy and paste’ any lone CDS in the gtf file, thereby adding a dummy CDS. As each dummy is a duplicate and therefore covers the same coordinates, it should not affect the number of coding SNPs detected.
Of course, this is not an elegant solution, but it enabled `set.synnonsyn` proceed without error for single-CDS genes.
First I read in the gtf and make a list of all ENSEMBL IDs of genes which have only one CDS:
library(ape)
library(dplyr)
gtf <- paste(“stickleback_v5_ensembl_genes_reformatted.gtf”)
allgenes <- read.gff(gtf, na.strings = c(“.”, “?”), GFF3 = FALSE)
allgenes<-subset(allgenes,feature==”CDS”)[9]
allgenes$attributes<-gsub(“ENS_gene=”,””,allgenes$attributes)
allgenes$attributes<-gsub(“;.*”,””,allgenes$attributes)
colnames(allgenes)[1]<-“ensembl_gene_id”
allgenes <- add_count(allgenes,ensembl_gene_id,name=”n_CDS”)
singles <- subset(allgenes,n_CDS==1)[1]
Then I get a subset of the gtf containing only these single CDS genes:
allgenes <- read.gff(gtf, na.strings = c(“.”, “?”), GFF …