Tag: ENSG
Filtering Genes in RNA-Seq causes more significant FDR Adjusted P-Val DEGs?
Hello, I am performing RNA-Seq analysis on 6 samples: 3 from an infected group and 3 from control. I have performed alignment and quantification and removed ribosomal RNA reads. I use edgeR to filter and then TMM-normalize my read counts. I typically remove all genes not expressed over 1 CPM in…
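The CPM filter described in this question can be sketched in plain Python. This is only an illustration and not edgeR's implementation: the function name `filter_by_cpm`, the input layout (gene ID mapped to a list of per-sample counts), and the `min_samples` threshold are assumptions, since the post is cut off before it states how many samples must pass.

```python
def filter_by_cpm(counts, min_cpm=1.0, min_samples=2):
    """Keep genes whose CPM exceeds min_cpm in at least min_samples samples.

    counts: dict mapping gene ID -> list of raw counts, one per sample
            (hypothetical layout chosen for this sketch).
    """
    n_samples = len(next(iter(counts.values())))
    # Library size of each sample = sum of counts over all genes.
    lib_sizes = [sum(gene[i] for gene in counts.values()) for i in range(n_samples)]

    kept = []
    for gene_id, gene_counts in counts.items():
        # Counts per million for this gene in every sample.
        cpms = [c / size * 1e6 if size > 0 else 0.0
                for c, size in zip(gene_counts, lib_sizes)]
        if sum(v > min_cpm for v in cpms) >= min_samples:
            kept.append(gene_id)
    return kept


example = {
    "g1": [10, 20],
    "g2": [0, 1],
    "g3": [999990, 999979],
}
kept_genes = filter_by_cpm(example, min_cpm=1.0, min_samples=2)
```

Here both library sizes are one million reads, so `g2` (CPM 0 and 1) is dropped while `g1` and `g3` are kept.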
ENSG-like CATG Identifier?
Hello all, I’ve been given some expression data which has rather perplexing identifiers. Along with Ensembl ENSG identifiers, it also has similar CATG identifiers (e.g. CATG00000000004) that I can’t seem to find anything about. I know the answer is probably obvious to the initiated, but I’m…
Get TPM from RNA counts and gene length?
Hello, I am working with an RNA-seq featureCounts output file that supplies the counts for a given ENSG gene ID, as well as the gene length (according to the documentation this is in base pairs, not kilobases). Is there a way to obtain…
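For reference, the usual TPM calculation from raw counts and gene lengths in base pairs looks roughly like the sketch below. The function name `counts_to_tpm` and the list-based inputs are assumptions made for illustration; the sketch also uses the full gene length as reported, not an effective length.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw counts to TPM given gene lengths in base pairs."""
    # Step 1: reads per kilobase (length converted from bp to kb).
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    # Step 2: per-sample scaling factor so the values sum to one million.
    total = sum(rpk)
    if total == 0:
        return [0.0 for _ in rpk]
    return [r / total * 1e6 for r in rpk]


# Three genes whose counts are proportional to their lengths
# end up with identical TPM values.
tpm = counts_to_tpm([100, 200, 300], [1000, 2000, 3000])
```

TPM is computed per sample, so with several samples the function would be applied to each count column separately.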
DE Analysis on cells from a patient derived mouse xenograft with high levels of mouse count “contamination”
I am performing a differential expression analysis for collaborators. The overall biological design from my collaborators is as follows: 1) Received patient sample. 2) Amplified patient sample using patient derived xenograft (PDX) in a mouse host. 3) Extracted cells from mouse and enriched for human cells by positive selection using…
Annotating snps with gene information
Hi, I am trying to annotate a list of SNPs with the ENSG gene number using biomaRt. I need to use Ensembl version 91. I have built the following query: snps = c("rs201327123", "rs141149254", "rs114420996", "rs62637817") ensembl.snp = useEnsembl(biomart="snps", version=91) mart.snp <- useMart(biomart =…
assign in pandas pipeline – DevDreamz
You can use pipe: tmp_df = df.\ drop("Gene type", axis=1).\ rename(columns = { "Gene stable ID": "ENSG", "Gene name": "gene_name", "miRBase accession": "MI", "miRBase ID": "mirna_name" }).\ pipe(lambda x: x.assign(species = x.mirna_name.str[:3])) tmp_df Out[365]: ENSG gene_name MI mirna_name species 0 ENSG00000274494 MIR6832 MI0022677 hsa-mir-6832 hsa 1 ENSG00000283386 MIR4659B MI0017291 hsa-mir-4659b…
Help needed for Ensembl Gene ID conversion for RNA-seq data
Hello all, I am new to the RNA-seq world and especially new to the bioinformatics side. We recently completed an RNA-seq experiment (total RNA) on human samples, and we used Illumina’s DRAGEN RNA pipeline, which generated Salmon gene count (.sf) output files. In the files, the gene ID is in…
bedtools getfasta concatenating sequences
Hi, I have a BED file containing the exons of genes. The name field holds the name of the gene (e.g. ENSG***). When I run bedtools getfasta, I get the sequence of each exon separately. Is there a standard way to concatenate…
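One way to join the per-exon records afterwards is a small post-processing step over the FASTA output, sketched below. It assumes that getfasta was run with a name option so each header starts with the gene ID (optionally followed by `::` and coordinates), and that the records already appear in the order in which the exons should be joined. The function name `concat_by_gene` is hypothetical and not part of bedtools.

```python
def concat_by_gene(fasta_lines):
    """Concatenate FASTA records that share the same gene ID in the header."""
    pieces = {}      # gene ID -> list of sequence chunks, in input order
    current = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the part before "::" (assumed to be the gene ID).
            current = line[1:].split("::")[0]
            pieces.setdefault(current, [])
        elif current is not None:
            pieces[current].append(line)
    return {gene: "".join(chunks) for gene, chunks in pieces.items()}


records = [
    ">ENSG1::chr1:0-3", "ACG",
    ">ENSG2::chr2:5-7", "TT",
    ">ENSG1::chr1:10-12", "GA",
]
merged = concat_by_gene(records)
```

For minus-strand genes the exon order and strand handling would need checking before trusting the joined sequence.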
Get gene names from rs SNP ids
Gene to rs id library(biomaRt) ## It might take a long time to process if there are many genes (>50) in the list. ## hgnc_gene_symbols.txt is the file that has the list of gene symbols, one per line. genes <- read.table("~/hgnc_gene_symbols.txt") ensembl = useMart("ensembl", dataset="hsapiens_gene_ensembl") dbsnp = useMart("snp", dataset = "hsapiens_snp") getHGNC2ENSG =…
Finding the significance of the overlap between 2 or more gene sets using simulation in R.
TLDR: Example R function to calculate significance of overlap of 2 or more gene sets. genes_all is a vector that contains all genes, and gene_sets takes a list of vectors for each gene set. I encourage people to read the full tutorial and attempt to reproduce the code themselves (especially…
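The general idea of a simulation-based overlap test can be sketched as follows. This is not the tutorial's R function: it is a Python illustration in which only the names `genes_all` and `gene_sets` are borrowed from the post, and the function name `overlap_pvalue`, the `n_sim` and `seed` parameters, and the add-one p-value correction are assumptions.

```python
import random


def overlap_pvalue(genes_all, gene_sets, n_sim=1000, seed=0):
    """Estimate how often random gene sets overlap at least as much as observed.

    genes_all: iterable of all genes (the sampling universe).
    gene_sets: list of iterables, one per gene set (two or more).
    Returns (observed_overlap, empirical_p_value).
    """
    rng = random.Random(seed)
    universe = list(genes_all)
    real_sets = [set(s) for s in gene_sets]
    observed = len(set.intersection(*real_sets))
    sizes = [len(s) for s in real_sets]

    hits = 0
    for _ in range(n_sim):
        # Draw random sets with the same sizes as the real ones.
        simulated = [set(rng.sample(universe, k)) for k in sizes]
        if len(set.intersection(*simulated)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_sim + 1)


genes = ["g%d" % i for i in range(100)]
obs_same, p_same = overlap_pvalue(genes, [genes[:10], genes[:10]], n_sim=200)
obs_none, p_none = overlap_pvalue(genes, [genes[:10], genes[10:20]], n_sim=200)
```

Two identical sets of 10 genes give an overlap that random draws essentially never reach, while two disjoint sets give an overlap of 0 and therefore a p-value of 1.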
Rshiny – ggplot – density plot
Hi, I’m trying to make a density plot in R Shiny, and I am confused by this error: Error: StatBin requires a continuous x variable: the x variable is discrete. Perhaps you want stat="count"? My data looks like: t_data_gene ENSG S45 545 S43 4588 S36 454 S33 4685 … I used: ggplot(t_data_gene,…
ICA – Reconstruction Errors
Hello people! I am completely new to ML and omics, and to the bioinformatics field in general. To gather some knowledge I started to work through a book I found on the internet, and it contains the following task: “Produce a 10-component…
Answer: AnnotationHub::mapIds() cannot find existing ENSG (GEO supplemental data cross-r
Hi, a quick check on NCBI Gene reveals that the official symbol for this is *PRXL2C*, not *AAED1*. In this way, I would not have expected `org.Hs.eg.db` (using ‘recent’ annotation) to have it. However, I can see that `EnsDb.Hsapiens.v86` (older version) does [have it]. So, there must have been an…
AnnotationHub::mapIds() cannot find existing ENSG (GEO supplemental data cross-referenced with ensembl.org)
Does anyone know why I’m not getting ENSG IDs for some of these symbols? The example below retrieves `NA` for multiple symbols, including AAED1 [whose ENSG is ENSG00000158122][1]. ``` > library(AnnotationHub) > library(org.Hs.eg.db) > library(GEOquery) > temp download.file(getGEO("GSM4430459")@header$supplementary_file_1,temp) > genes unlink(temp) > ensids = mapIds(org.Hs.eg.db, keys=genes, column="ENSEMBL", keytype="SYMBOL", multiVals="first") > ensids["AAED1"]…