Tag: ENSG

assign in pandas pipeline – Stackify

You can use pipe:

    tmp_df = (
        df.drop("Gene type", axis=1)
        .rename(columns={
            "Gene stable ID": "ENSG",
            "Gene name": "gene_name",
            "miRBase accession": "MI",
            "miRBase ID": "mirna_name",
        })
        .pipe(lambda x: x.assign(species=x.mirna_name.str[:3]))
    )
    tmp_df

    Out[365]:
                  ENSG gene_name         MI    mirna_name species
    0  ENSG00000274494   MIR6832  MI0022677  hsa-mir-6832     hsa
    1  ENSG00000283386  MIR4659B  MI0017291  hsa-mir-4659b…
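The chained call in this excerpt can be tried on a small frame. What follows is a minimal sketch, not the original poster's data: the first row reuses values visible in the excerpt, while the second row and every "Gene type" value are hypothetical placeholders added for illustration.

```python
import pandas as pd

# First row: values visible in the excerpt. Second row and the "Gene type"
# column values are hypothetical placeholders, not real identifiers.
df = pd.DataFrame({
    "Gene stable ID": ["ENSG00000274494", "ENSG_PLACEHOLDER"],
    "Gene name": ["MIR6832", "Mir0000"],
    "miRBase accession": ["MI0022677", "MI_PLACEHOLDER"],
    "miRBase ID": ["hsa-mir-6832", "mmu-mir-0000"],
    "Gene type": ["miRNA", "miRNA"],
})

renames = {
    "Gene stable ID": "ENSG",
    "Gene name": "gene_name",
    "miRBase accession": "MI",
    "miRBase ID": "mirna_name",
}

# Variant from the excerpt: pipe hands the renamed frame to a lambda,
# so the new column can refer to the renamed "mirna_name" column.
tmp_df = (
    df.drop("Gene type", axis=1)
    .rename(columns=renames)
    .pipe(lambda x: x.assign(species=x.mirna_name.str[:3]))
)

# Alternative: assign accepts a callable that is evaluated on the
# intermediate frame, so the pipe wrapper is optional.
same_df = (
    df.drop("Gene type", axis=1)
    .rename(columns=renames)
    .assign(species=lambda x: x["mirna_name"].str[:3])
)

print(tmp_df)
```

Both variants derive the species prefix from the first three characters of the renamed `mirna_name` column and produce the same frame.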

Help needed for Ensembl Gene ID conversion for RNA-seq data

Hello all, I am new to the RNA-seq world and especially new to the bioinformatics side. We recently completed an RNA-seq experiment (total RNA) on human samples, and we used Illumina's DRAGEN RNA pipeline, which generated Salmon gene count (.sf) output files. In the files, the gene ID is in…

bedtools getfasta concatenating sequences

Hi, I have a BED file containing the exons of genes. The name field holds the name of the gene (ENSG***). When I run bedtools getfasta, I get the sequence of each exon separately. Is there a standard way to concatenate…
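One way to approach the question is to post-process the per-exon FASTA output. The sketch below is an illustration under stated assumptions, not an answer taken from the thread: it assumes getfasta was run with a name option so that each header begins with the gene name (optionally followed by `::` and coordinates), that exons appear in the file in the order they should be joined, and that strand has already been handled. The function name `concat_exons` and the gene IDs in the example are hypothetical.

```python
def concat_exons(fasta_lines):
    """Join per-exon FASTA records that share a gene name.

    Assumes each header starts with the gene name, optionally followed by
    '::' and coordinates, and that exons are already in the desired order.
    """
    genes = {}      # gene name -> list of sequence chunks, in file order
    current = None  # gene name of the record being read
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the part of the header before any '::' suffix.
            current = line[1:].split("::", 1)[0]
            genes.setdefault(current, [])
        elif current is not None:
            genes[current].append(line)
    return {gene: "".join(chunks) for gene, chunks in genes.items()}


# Hypothetical records: two exons for ENSG_A and one for ENSG_B.
example = [
    ">ENSG_A::chr1:0-4", "ACGT",
    ">ENSG_B::chr2:10-13", "TTT",
    ">ENSG_A::chr1:10-12", "GG",
]
merged = concat_exons(example)
for gene, seq in merged.items():
    print(f">{gene}\n{seq}")
```

For genes on the minus strand, the exon order would likely need reversing before joining; this sketch does not attempt that.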

Get gene names from rs SNP ids

Gene to rs id

    library(biomaRt)
    ## It might take a long time to process if there are many genes (>50) in the list.
    ## hgnc_gene_symbols.txt is the file that has the list of gene symbols, one per line.
    genes <- read.table("~/hgnc_gene_symbols.txt")
    ensembl = useMart("ensembl", dataset="hsapiens_gene_ensembl")
    dbsnp = useMart("snp", dataset = "hsapiens_snp")
    getHGNC2ENSG =…

Finding the significance of the overlap between 2 or more gene sets using simulation in R.

TLDR: Example R function to calculate significance of overlap of 2 or more gene sets. genes_all is a vector that contains all genes, and gene_sets takes a list of vectors for each gene set. I encourage people to read the full tutorial and attempt to reproduce the code themselves (especially…
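The general idea of a simulation-based overlap test can be sketched as follows. This is not the tutorial's R function, only a Python illustration of one common approach: draw random gene sets of the same sizes from the full gene list and count how often their overlap is at least as large as the observed one. The parameter names `n_sim` and `seed`, the function name `overlap_pvalue`, and the example gene labels are assumptions made for this sketch; `genes_all` and `gene_sets` follow the description above.

```python
import random


def overlap_pvalue(genes_all, gene_sets, n_sim=2000, seed=0):
    """Estimate how surprising the overlap of several gene sets is.

    genes_all: iterable of all genes (the sampling universe).
    gene_sets: list of iterables, one per gene set, each drawn from genes_all.
    Returns (observed_overlap, empirical_p_value).
    """
    rng = random.Random(seed)
    universe = list(genes_all)
    sets = [set(s) for s in gene_sets]
    observed = len(set.intersection(*sets))
    sizes = [len(s) for s in sets]

    hits = 0
    for _ in range(n_sim):
        # Draw one random set per input set, matching its size.
        sims = [set(rng.sample(universe, k)) for k in sizes]
        if len(set.intersection(*sims)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_sim + 1)


# Hypothetical universe of 1000 genes and two sets of 50 sharing 10 genes.
genes_all = [f"g{i}" for i in range(1000)]
set_a = genes_all[:50]
set_b = genes_all[40:90]
observed, p_value = overlap_pvalue(genes_all, [set_a, set_b])
print(observed, p_value)
```

With more simulations the estimate becomes finer-grained, at the cost of run time; the smallest p-value this scheme can report is 1 / (n_sim + 1).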

Rshiny – ggplot – density plot

Hi, I'm trying to make a density plot in Rshiny, and I find this error confusing:

    Error: StatBin requires a continuous x variable: the x variable is discrete. Perhaps you want stat="count"?

My data looks like:

    t_data_gene
         ENSG
    S45   545
    S43  4588
    S36   454
    S33  4685
    …

I used: ggplot(t_data_gene,…

ICA – Reconstruction Errors

Hello people! I am completely new to ML and omics, and to the bioinformatics field in general. To gather some knowledge I started to work through a book I found on the internet, and it contains the following task: “Produce a 10-component…

Answer: AnnotationHub::mapIds() cannot find existing ENSG (GEO supplemental data cross-r

Hi, a quick check on NCBI Gene reveals that the official symbol for this is *PRXL2C*, not *AAED1*. In this way, I would not have expected `org.Hs.eg.db` (using ‘recent’ annotation) to have it. However, I can see that `EnsDb.Hsapiens.v86` (older version) does [have it]. So, there must have been an…

AnnotationHub::mapIds() cannot find existing ENSG (GEO supplemental data cross-referenced with ensembl.org)

Anyone know why I'm not getting ENSG IDs for some of these symbols? The example below retrieves `NA` for multiple symbols, including AAED1 [whose ENSG is ENSG00000158122][1]. ``` > library(AnnotationHub) > library(org.Hs.eg.db) > library(GEOquery) > temp download.file(getGEO("GSM4430459")@header$supplementary_file_1,temp) > genes unlink(temp) > ensids = mapIds(org.Hs.eg.db, keys=genes, column="ENSEMBL", keytype="SYMBOL", multiVals="first") > ensids["AAED1"]…
