I have the same problem and I am trying to solve it entirely in R, as follows:
- Use biomaRt to get the positions of all genes.
- Use GenomicRanges to find the overlap between my dataset, called “probes”, and the biomaRt output. I have not yet figured out why my overlap is not working; the problem is that the interval matching returns more than one match per probe.
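The biomaRt query itself is not shown in the post. A minimal sketch of what that first step could look like is below, assuming human genes from the default Ensembl mart (the dataset name and the chromosome filter are assumptions, not taken from the post). The attribute names are the ones the overlap code expects in `genes`.

```r
library(biomaRt)

# Assumption: human genes from the current Ensembl release. The genome build
# must match the build the probe coordinates come from, otherwise the
# intervals will not line up.
mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# One row per gene: symbol, chromosome, start and end position.
genes <- getBM(
  attributes = c("hgnc_symbol", "chromosome_name",
                 "start_position", "end_position"),
  mart = mart
)

# Optional (assumption): keep only the primary chromosomes and drop
# patches and alternate haplotypes.
genes <- genes[genes$chromosome_name %in% c(1:22, "X", "Y", "MT"), ]
```

Ensembl reports chromosomes as "2" rather than "chr2", so the probe table has to use the same naming for any overlap to be found.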
Here is the code I am trying, in case anybody has suggestions on how to improve it. Hope it helps.
    library(GenomicRanges)

    probes <- data.frame(gene = c(8503721, 352341, 251113),
                         chrom = c(2, 2, 11),
                         probe_start = c(213865547, 1636127, 131062588),
                         probe_end = c(213865606, 1636176, 131062647))

    # Probe intervals
    probes_int <- GRanges(seqnames = Rle(probes$chrom),
                          ranges = IRanges(start = probes$probe_start, end = probes$probe_end),
                          names = probes$gene)

    # Gene intervals; `genes` is the biomaRt output from the first step
    genes_int <- GRanges(seqnames = Rle(genes$chromosome_name),
                         ranges = IRanges(start = genes$start_position, end = genes$end_position),
                         names = genes$hgnc_symbol)

    # Hits where a probe lies entirely within a gene
    overlaps <- findOverlaps(probes_int, genes_int, type = "within")
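More than one match per probe is expected whenever genes overlap each other on the genome, because `findOverlaps` returns every (probe, gene) pair by default. One possible way to get back to one row per probe is sketched below. The `genes` table here is a made-up stand-in for the biomaRt output (the `GENE_A` to `GENE_C` names and their coordinates are hypothetical), and the column names `probe_id` and `symbol` are chosen only for illustration.

```r
library(GenomicRanges)

probes <- data.frame(
  gene        = c(8503721, 352341, 251113),
  chrom       = c(2, 2, 11),
  probe_start = c(213865547, 1636127, 131062588),
  probe_end   = c(213865606, 1636176, 131062647)
)

# Hypothetical stand-in for the biomaRt table (made-up names and intervals).
genes <- data.frame(
  hgnc_symbol     = c("GENE_A", "GENE_B", "GENE_C"),
  chromosome_name = c("2", "2", "11"),
  start_position  = c(213860000, 213865000, 131000000),
  end_position    = c(213870000, 213866000, 131100000)
)

# Chromosome names are stored as character so both objects use the same labels.
probes_gr <- GRanges(seqnames = as.character(probes$chrom),
                     ranges   = IRanges(probes$probe_start, probes$probe_end),
                     probe_id = probes$gene)
genes_gr  <- GRanges(seqnames = genes$chromosome_name,
                     ranges   = IRanges(genes$start_position, genes$end_position),
                     symbol   = genes$hgnc_symbol)

# All (probe, gene) pairs where the probe lies entirely inside the gene.
hits <- findOverlaps(probes_gr, genes_gr, type = "within")

# Long format: one row per pair, so a probe can appear several times.
pairs <- data.frame(probe_id = probes_gr$probe_id[queryHits(hits)],
                    symbol   = genes_gr$symbol[subjectHits(hits)])

# Option 1: collapse to one row per probe, joining all gene symbols.
per_probe <- aggregate(symbol ~ probe_id, data = pairs,
                       FUN = function(x) paste(sort(x), collapse = ";"))

# Option 2: keep a single gene per probe; probes with no hit get NA.
first_hit <- findOverlaps(probes_gr, genes_gr, type = "within", select = "first")
probes$symbol <- genes_gr$symbol[first_hit]
```

Option 1 keeps every matching gene but drops probes with no match, while option 2 keeps every probe but silently discards additional genes, so which one fits depends on what the annotation is used for. If no hits come back at all, the usual suspects are mismatched chromosome names ("2" versus "chr2"), a different genome build, or `type = "within"` being too strict for probes that extend past a gene boundary.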