How to search dbSNP with a list of SNPs and retrieve gene names (HGNC symbol if available, otherwise whatever name is recorded)
I have a list of 500,000 SNPs for which I want to obtain the gene names. I tried querying them with biomaRt:
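    library(data.table)
    library(biomaRt)

    # Read the rsIDs; fread() returns a data.table, not a plain vector
    rs <- fread("SNPs.txt")

    # Archive URL is defined here but never passed to useMart(),
    # so the default Ensembl host is the one actually queried
    ensembl_version <- "https://dec2016.archive.ensembl.org"
    ensembl <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")

    # Pass the first column of rs as the values for the rsID filter
    getBM(attributes = c("refsnp_id", "associated_gene"),
          filters = "snp_filter",
          values = rs[[1]],
          mart = ensembl,
          uniqueRows = TRUE)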
However, many of the SNPs return NA or an empty gene field, as shown here:
       refsnp_id associated_gene
    1   rs425277           PRKCZ
    2  rs1571149
    3  rs1240707
    4  rs1240708
    5   rs873927
    6   rs880051           SSU72
    7   rs904589
    8   rs908742
    9   rs909823
    10  rs925905
    11    rs7290
    12    rs7407
    13 rs1878745
    14 rs2296716           SSU72
    15 rs2298217
    16 rs2459994
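To make the gap concrete, the rsIDs that came back without a gene name can be pulled out of the result. This is only a minimal sketch: the data frame `res` is a hand-copied excerpt of the output above rather than a live query, and the names `res`, `missing_gene` and `unannotated` are introduced here purely for illustration.

```r
# Hand-copied excerpt of the getBM() result shown above (not a live query)
res <- data.frame(
  refsnp_id = c("rs425277", "rs1571149", "rs880051", "rs7290"),
  associated_gene = c("PRKCZ", "", "SSU72", ""),
  stringsAsFactors = FALSE
)

# Treat both NA and empty strings as "no gene name returned"
missing_gene <- is.na(res$associated_gene) | res$associated_gene == ""

# rsIDs that still need a gene name from another source
unannotated <- res$refsnp_id[missing_gene]
print(unannotated)
```

On the full result, `unannotated` would be the subset of rsIDs to check against dbSNP.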
When I look up on dbSNP some of the rsIDs that returned no gene name, they are in fact associated with a gene there. My question is then: how can I connect biomaRt to dbSNP and retrieve the correct gene names for all the SNPs in the list ‘SNPs.txt’?