How to search dbSNP using a list of SNPs and retrieve gene names (HGNC symbol if one exists, otherwise whatever name is available)
I have a list of 500,000 SNPs for which I want to obtain the gene names. I tried searching with biomaRt:
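library(data.table)
library(biomaRt)
# Read the list of rsIDs
rs <- fread("SNPs.txt")
# Ensembl archive to query
ensembl_version <- "https://dec2016.archive.ensembl.org"
# Connect to the human variation mart on that archive (host was previously defined but never used)
ensembl <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp", host = ensembl_version)
# Pass the rsIDs as a plain vector (first column of rs) rather than the whole table
getBM(attributes = c("refsnp_id", "associated_gene"), filters = "snp_filter", values = rs[[1]], mart = ensembl, uniqueRows = TRUE)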
However, many of the SNPs return NA or simply nothing, as shown here:
refsnp_id associated_gene
1 rs425277 PRKCZ
2 rs1571149
3 rs1240707
4 rs1240708
5 rs873927
6 rs880051 SSU72
7 rs904589
8 rs908742
9 rs909823
10 rs925905
11 rs7290
12 rs7407
13 rs1878745
14 rs2296716 SSU72
15 rs2298217
16 rs2459994
When I look up on dbSNP some of the rsIDs that did not return a gene name, they are in fact associated with a gene in that database. My question is: how can I connect biomaRt to dbSNP and retrieve the correct gene names for all the SNPs in the list ‘SNPs.txt’?
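One direction that might be worth trying, sketched here under assumptions: in the Ensembl variation mart, `associated_gene` appears to hold genes reported in phenotype associations rather than the gene a SNP lies in, which would explain the many blanks. The sketch below instead requests `ensembl_gene_stable_id` from the SNP mart and maps those IDs to `hgnc_symbol` in the gene mart, falling back to `external_gene_name`. The helper names `chunk_ids` and `annotate_snps` and the batch size of 500 are hypothetical choices for illustration, not part of biomaRt, and this queries Ensembl rather than dbSNP directly, so the two may not always agree.

```r
# Hypothetical helper: split a vector of rsIDs into batches of at most `size`,
# since one request with 500,000 values is likely to time out.
chunk_ids <- function(ids, size = 500) {
  split(ids, ceiling(seq_along(ids) / size))
}

# Hypothetical helper: rsID -> overlapping Ensembl gene -> gene symbol.
annotate_snps <- function(rs_ids, snp_mart, gene_mart, size = 500) {
  # Step 1: Ensembl gene IDs overlapping each SNP, queried batch by batch.
  snp_gene <- do.call(rbind, lapply(chunk_ids(unique(rs_ids), size), function(ids) {
    biomaRt::getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id"),
                   filters = "snp_filter", values = ids, mart = snp_mart)
  }))
  # Step 2: map the non-empty gene IDs to symbols in the gene mart.
  gene_ids <- setdiff(unique(snp_gene$ensembl_gene_stable_id), c("", NA))
  genes <- biomaRt::getBM(attributes = c("ensembl_gene_id", "hgnc_symbol",
                                         "external_gene_name"),
                          filters = "ensembl_gene_id", values = gene_ids,
                          mart = gene_mart)
  # Keep every SNP, including those that overlap no gene.
  out <- merge(snp_gene, genes, by.x = "ensembl_gene_stable_id",
               by.y = "ensembl_gene_id", all.x = TRUE)
  # Prefer the HGNC symbol; otherwise use whatever name Ensembl has.
  no_hgnc <- is.na(out$hgnc_symbol) | out$hgnc_symbol == ""
  out$gene_name <- ifelse(no_hgnc, out$external_gene_name, out$hgnc_symbol)
  out[, c("refsnp_id", "gene_name")]
}

# Possible usage (needs network access; assumes the rsIDs are in column 1 of rs):
# snp_mart  <- biomaRt::useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
# gene_mart <- biomaRt::useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
# res <- annotate_snps(rs[[1]], snp_mart, gene_mart)
```

SNPs that fall outside any annotated gene would still come back without a name, and a SNP overlapping several genes would produce one row per gene.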