I’d like to do something in R that seems rather simple: draw a nice plot of a human gene, with exons and introns marked (maybe even using a different color for CDS vs. UTRs, but I can live without that for now), with the position of a bunch of mutations overlaid. I’ve tried a few packages for this, the best of which (so far, has been ggbio
. Here’s some code for reproducibility:
library(Homo.sapiens)
library(ggbio)
library(tidyverse)
# first get gene structure information from Homo.sapiens annotation package
genes_all <- AnnotationDbi::keys(Homo.sapiens, keytype="GENEID")
generanges <- AnnotationDbi::select(Homo.sapiens,
keys=genes_all, keytype="GENEID",
columns=c('GENEID', 'SYMBOL', 'TXCHROM', 'TXSTART', 'TXEND'))
generanges <- drop_na(generanges) # need to get rid of these in order to indexm later
# convert to GRanges object
generanges <- makeGRangesFromDataFrame(generanges, keep.extra.columns=T)
# we'll use RSPO1 gene as an example
generanges[generanges$SYMBOL=='RSPO1']
# simulate 200 SNPs randomly scattered through RSPO1
snps <- sample(start(generanges[generanges$SYMBOL=='RSPO1']):end(generanges[generanges$SYMBOL=='RSPO1']),
200, replace=F)
# plot gene structures with autoplot, add SNP positions with geom_point
autoplot(Homo.sapiens, which=generanges[generanges$SYMBOL=='RSPO1'],
gap.geom='chevron', col="blue", fill="blue") +
geom_point(data=tibble(pos=snps), aes(x=pos, y=1), col="red", alpha=0.25) +
theme_bw()
The resulting plot looks like this:
This is not a bad start, but I would really like to show only a single transcript, not all four splice forms – e.g. the “canonical” Refseq transcript. This is just for ease of visualization/presentation. I think what I need to do is to subset the annotation object Homo.sapiens
, so that it contains only Refseq transcripts, and then pass that object to the autoplot
command. I’m not sure how to do this, though – or if there is a better way to achieve this goal with a different package. (Note that I tried dandelion and lollipop plots using the trackViewer
package, but the SNP density is way too high to visualize that way. Also, I really like the fact that ggbio
can be combined with traditional ggplot commands, as I’ve done here.)
Read more here: Source link