trying to make nice-looking plot of gene structure with mutations

I’d like to do something in R that seems rather simple: draw a nice plot of a human gene, with exons and introns marked (maybe even using a different color for CDS vs. UTRs, but I can live without that for now), with the position of a bunch of mutations overlaid. I’ve tried a few packages for this, the best of which (so far, has been ggbio. Here’s some code for reproducibility:

library(Homo.sapiens)
library(ggbio)
library(tidyverse)

# first get gene structure information from Homo.sapiens annotation package
genes_all <- AnnotationDbi::keys(Homo.sapiens, keytype="GENEID")
generanges <- AnnotationDbi::select(Homo.sapiens, 
                      keys=genes_all, keytype="GENEID",
                      columns=c('GENEID', 'SYMBOL', 'TXCHROM', 'TXSTART', 'TXEND'))
generanges <- drop_na(generanges)  # need to get rid of these in order to indexm later
# convert to GRanges object
generanges <- makeGRangesFromDataFrame(generanges, keep.extra.columns=T)
# we'll use RSPO1 gene as an example
generanges[generanges$SYMBOL=='RSPO1']
# simulate 200 SNPs randomly scattered through RSPO1
snps <- sample(start(generanges[generanges$SYMBOL=='RSPO1']):end(generanges[generanges$SYMBOL=='RSPO1']),
               200, replace=F)

# plot gene structures with autoplot, add SNP positions with geom_point
autoplot(Homo.sapiens, which=generanges[generanges$SYMBOL=='RSPO1'], 
         gap.geom='chevron', col="blue", fill="blue") + 
  geom_point(data=tibble(pos=snps), aes(x=pos, y=1), col="red", alpha=0.25) +
  theme_bw()

The resulting plot looks like this: RSPO1 gene plot with 4 transcripts, overlaid with SNPs in red

This is not a bad start, but I would really like to show only a single transcript, not all four splice forms – e.g. the “canonical” Refseq transcript. This is just for ease of visualization/presentation. I think what I need to do is to subset the annotation object Homo.sapiens, so that it contains only Refseq transcripts, and then pass that object to the autoplot command. I’m not sure how to do this, though – or if there is a better way to achieve this goal with a different package. (Note that I tried dandelion and lollipop plots using the trackViewer package, but the SNP density is way too high to visualize that way. Also, I really like the fact that ggbio can be combined with traditional ggplot commands, as I’ve done here.)

Read more here: Source link

Tagged