Tag: HGNC

NM_000546.6(TP53):c.743G>A (p.Arg248Gln) AND Lymphoma – ClinVar

NM_000546.6(TP53):c.743G>A (p.Arg248Gln) AND Lymphoma Based on: 1 submission [Details] Record status: current Accession: RCV000790860.7 Allele description NM_000546.6(TP53):c.743G>A (p.Arg248Gln) Gene: TP53:tumor protein p53 [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 17p13.1 Genomic location: Preferred name: NM_000546.6(TP53):c.743G>A (p.Arg248Gln) Other names: p.R248Q:CGG>CAG HGVS: NC_000017.11:g.7674220C>T NG_017013.2:g.18331G>A NM_000546.6:c.743G>A (MANE Select) NM_001126112.3:c.743G>A…

how to merge human reference genome and GTF file with a custom sequence.

Hello Biostars, I am looking for some guidance on how to merge some files for my bulk RNA sequencing analysis. Let me start by describing the problem: I received an mRNA sequence of 4775 characters which I would like to merge with the human reference genome that I downloaded from NCBI…

Solved Currently, ENSEMBL GENE IDs with their versions

Currently, ENSEMBL GENE IDs with their versions represent each unique gene in each row. As an example, for ENSG00000000003.15, ENSG00000000003 represents the unique Ensembl gene ID and 15 represents the version. In biology, we are more familiar with the gene symbol, known as the HGNC SYMBOL. ENSG00000000003.15 Ensembl ID corresponds…
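
The ID/version split described in this excerpt can be sketched in Python. This is a minimal illustration, not the original poster's code: the helper name `split_ensembl_id` is hypothetical, and it assumes the version suffix is a plain integer after a single dot.

```python
def split_ensembl_id(versioned_id):
    """Split 'ENSG00000000003.15' into ('ENSG00000000003', 15)."""
    stable_id, sep, version = versioned_id.partition(".")
    # IDs without a version suffix get None as their version
    return stable_id, (int(version) if sep else None)


print(split_ensembl_id("ENSG00000000003.15"))
```

Stripping the version this way is usually the first step before mapping to an HGNC symbol, since most mapping tables are keyed by the unversioned ID.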

Human hg38 chr6:31,165,200-31,165,800 UCSC Genome Browser v457

Custom Tracks: ac4C-RIP-seq peaks, hESC CTL-1; ac4C-RIP-seq peaks, hESC CTL-2; ac4C-RIP-seq peaks, hESC NAT10-KD-1; ac4C-RIP-seq peaks, hESC NAT10-KD-2. Mapping and Sequencing: Base Position; p14 Fix Patches; p14 Alt Haplotypes; Assembly; Centromeres; Chromosome Band; Clone Ends; Exome Probesets; FISH Clones; Gap; GC Percent; GRC Contigs; GRC Incident; Hg19…

NM_000391.4(TPP1):c.379C>T (p.Arg127Ter) AND Angelman syndrome – ClinVar

NM_000391.4(TPP1):c.379C>T (p.Arg127Ter) AND Angelman syndrome Based on: 1 submission [Details] Record status: current Accession: RCV001804926.2 Allele description [Variation Report for NM_000391.4(TPP1):c.379C>T (p.Arg127Ter)] NM_000391.4(TPP1):c.379C>T (p.Arg127Ter) Gene: TPP1:tripeptidyl peptidase 1 [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 11p15.4 Genomic location: Preferred name: NM_000391.4(TPP1):c.379C>T (p.Arg127Ter) Other names: p.R127*:CGA>TGA…

NA values in conumee detail

Hi Biostars! I am currently performing a CNA analysis on Illumina Infinium EPIC data. As I have genes of interest discovered through a separate RNA-seq analysis, I created g-ranges for these to study their CNA profile using conumee. However, after successfully running CNV.create_anno (i.e. creating…

Extract of no. shared gene names in disease-disease association through API from DisGeNET Database

Hello everyone, I am trying to extract the genes shared between two diseases (disease-disease association) from the DisGeNET database. However, I received only a number, not the names or symbols of the genes between the…

vcf – VEP annotation INFO field Ensembl IDs and locations

I have a VCF file that I annotated with VEP, for human data. I ran VEP with some additional parameters (as shown below in the ##VEP-command-line). However, my output is rather strange (mainly the INFO column). ##VEP="v108" time="2023-04-27 15:13:08" cache="workflow/resources/variants/cache_vep/homo_sapiens/108_GRCh38" ensembl-funcgen=108.56bb136 ensembl-variation=108.a885ada ensembl-io=108.58d13c1 ensembl=108.d8a9c80 1000genomes="phase3"…

Conversion of Gene Name to Ensembl ID

Using the Ensembl REST API: rest.ensembl.org/lookup/symbol/homo_sapiens/A1CF

assembly_name: GRCh38
biotype: protein_coding
db_type: core
description: APOBEC1 complementation factor [Source:HGNC Symbol;Acc:HGNC:24086]
display_name: A1CF
end: 50885675
id: ENSG00000148584
logic_name: ensembl_havana_gene_homo_sapiens
object_type: Gene
seq_region_name: 10
source: ensembl_havana
species: homo_sapiens
start: 50799409
strand: -1
version: 15

rest.ensembl.org/lookup/symbol/homo_sapiens/A1CF?content-type=application/json

{"strand":-1,"assembly_name":"GRCh38","version":15,"species":"homo_sapiens","end":50885675,"description":"APOBEC1 complementation factor [Source:HGNC Symbol;Acc:HGNC:24086]","source":"ensembl_havana","db_type":"core","object_type":"Gene","id":"ENSG00000148584","seq_region_name":"10","display_name":"A1CF","start":50799409,"logic_name":"ensembl_havana_gene_homo_sapiens","biotype":"protein_coding"}

Look up multiple symbols at…
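
One possible way to script the same lookup is sketched below in Python, using only the standard library. The URL pattern and JSON keys are taken from the excerpt above. The function names (`lookup_url`, `parse_lookup`, `fetch_gene`) are hypothetical, and the network call is defined but not executed here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://rest.ensembl.org/lookup/symbol"


def lookup_url(symbol, species="homo_sapiens"):
    # Same URL pattern as in the excerpt above
    return f"{BASE}/{species}/{quote(symbol)}?content-type=application/json"


def parse_lookup(body):
    """Return (gene ID, version, display name) from a lookup response body."""
    record = json.loads(body)
    return record["id"], record["version"], record["display_name"]


def fetch_gene(symbol):
    # Performs the HTTP request; defined for completeness, not called below
    with urlopen(lookup_url(symbol)) as response:
        return parse_lookup(response.read().decode("utf-8"))


# Abbreviated copy of the JSON response quoted in the excerpt
sample = ('{"strand":-1,"assembly_name":"GRCh38","version":15,'
          '"id":"ENSG00000148584","seq_region_name":"10","display_name":"A1CF"}')
print(parse_lookup(sample))
```

Keeping the parsing separate from the request makes the mapping step easy to check against a saved response before running it over a long list of symbols.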

hclust with similar data gives different

I have RNA-seq data with expression indexed by Ensembl ID, which I converted to gene symbols for further analysis. I performed hierarchical clustering and then cut the tree using dynamicTreeCut with a gene set of 20010 genes and got 27 different gene clusters. Now after…

GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership | Genome Biology

Models for single-cell ATAC-seq data In single-cell ATAC-seq data, \(x_{ij}\) is the number of unique reads mapping to peak or region j in cell i. Although \(x_{ij}\) can take non-negative integer values, it is common to “binarize” the accessibility data (e.g., [19, 74, 133,134,135]), meaning that \(x_{ij} = 1\) when…

Hugo_Symbol to Entrez ID

Hello, I have a Myeloid-Acute Myeloid Leukemia (AML) RNA-seq data file, data_mrna_seq_rpkm.csv. This file has Hugo_Symbols for all 22,844 genes but not their Entrez IDs. I was able to use two methods in R, 1) the mapIds method from library(org.Hs.eg.db) and 2) the biomaRt method, to get the Entrez IDs of only 16,569 genes…

KidneyGPS: a user-friendly web application to help prioritize kidney function genes and variants based on evidence from genome-wide association studies | BMC Bioinformatics

User interface The user interface of KidneyGPS is organized into five tabs: Three tabs enable the specific search for genes, variants and regions (underlying data structure shown in Additional file 1: Fig. S4): (1) “gene search” tab: search for genes using their gene names (synonyms automatically mapped to their official HGNC…

NM_005546.4(ITK):c.1741C>T (p.Arg581Trp) AND Autoinflammatory syndrome – ClinVar

NM_005546.4(ITK):c.1741C>T (p.Arg581Trp) AND Autoinflammatory syndrome Based on: 1 submission [Details] Record status: current Accession: RCV002263636.3 Allele description [Variation Report for NM_005546.4(ITK):c.1741C>T (p.Arg581Trp)] NM_005546.4(ITK):c.1741C>T (p.Arg581Trp) Gene: ITK:IL2 inducible T cell kinase [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 5q33.3 Genomic location: Preferred name: NM_005546.4(ITK):c.1741C>T (p.Arg581Trp) HGVS:…

Comparative single-cell transcriptomic analysis of primate brains highlights human-specific regulatory evolution

Consensus MTG taxonomy across primates The BRAIN Initiative Cell Census Network [26] generated high-resolution transcriptomic maps of the MTG in human, chimpanzee, gorilla, macaque and marmoset by applying single-nucleus transcriptomic (snRNA-seq) assays to samples isolated from between three and seven donor brains in each species (plate-based SMART-seq v4 (SSv4) for great…

NM_000379.4(XDH):c.1510A>G (p.Met504Val) AND Hereditary xanthinuria type 1 – ClinVar

NM_000379.4(XDH):c.1510A>G (p.Met504Val) AND Hereditary xanthinuria type 1 Based on: 1 submission [Details] Record status: current Accession: RCV002479628.1 Allele description [Variation Report for NM_000379.4(XDH):c.1510A>G (p.Met504Val)] NM_000379.4(XDH):c.1510A>G (p.Met504Val) Gene: XDH:xanthine dehydrogenase [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 2p23.1 Genomic location: Preferred name: NM_000379.4(XDH):c.1510A>G (p.Met504Val) HGVS: NC_000002.12:g.31375472T>C…

The non-classical major histocompatibility complex II protein SLA-DM is crucial for African swine fever virus replication

Cell lines and viruses Cell lines were received from the cell culture collection for veterinary medicine (CCVM) of the Friedrich-Loeffler-Institut (FLI). The highly passaged wild boar lung cell line (WSL-R-HP, #1346; abbreviated as WSL) was maintained in Ham’s F12 cell culture medium (Ham’s F-12, 5.32 g/L; IMDM, 8.80 g/L; NaHCO3, 2.45 g/L; pH 7.2)…

TCGA gene expression quantitation batch information

I added samples to the cart at the GDC data portal, then downloaded them. I merged them with the R code below. Hope this helps.

library(data.table)
library(tidyverse)
## You can generate gdc_sample_sheet.tsv at the GDC data portal
index=read.table("gdc_sample_sheet.tsv",sep="\t",header=TRUE)
index=index[order(index$Sample.ID),]
## read files
setwd("where_you_download_your_data")
expr_file=index$File.Name
mat=do.call(cbind,lapply(as.character(expr_file),function(x){fread(x,header=T,sep="\t")[,c(4)]}))
exp_mat=read.table(as.character(index$File.Name[1]),sep="\t",header=T)
mat=data.frame(exp_mat$gene_id,exp_mat$gene_name,exp_mat$gene_type,mat)
mat=mat[5:nrow(mat),]
colnames(mat)=c("ensembl_gene_id","hgnc_symbol","gene_biotype",index$new_id)
ensg_id=unlist(strsplit(as.character(mat$ensembl_gene_id),split="[.]"))
ensg_id=ensg_id[grep("ENSG*",ensg_id)]
mat$ensembl_gene_id=ensg_id
write.table(mat,"TCGA.tsv",row.names =…

A genome-wide association study of blood cell morphology identifies cellular proteins implicated in disease aetiology

INTERVAL study The INTERVAL study was a randomised trial of approximately 45,000 blood donors aged eighteen years or older who were recruited at 25 NHSBT (National Health Service Blood and Transplant) static donor centres across England [21]. The study was approved by the Cambridge East Research Ethics Committee and we complied…

Retrieving Chromosome Accession for Specific Gene Using Ensembl REST API

Hello! I have a list of gene IDs obtained from multiple sources via the OMA API. My goal is to download the gene sequences along with their genomic context. To achieve this, I need to acquire the accession numbers and coordinates of the assemblies in which they are annotated. I…

Unsupervised clustering on gene expression data

Clustering is a data mining method to identify unknown possible groups of items solely based on intrinsic features and no external variables. Basically, clustering includes four steps: 1) data preparation and feature selection, 2) dissimilarity matrix calculation, 3) applying clustering algorithms, 4) assessing cluster assignment. I use an RNA-seq dataset…

converting Gene ID to GeneSymbols

Hello, I am converting gene IDs into gene symbols using the map function. The conversion runs without any error, but when I check on the NCBI website it shows a different symbol for that gene ID. The following is the code…

SERPINA2 Vertebrate HGNC

Vertebrate Homology. Alliance Homology Information: MGI loads orthology data based on the ‘stringent’ set from the Alliance of Genome Resources. The Alliance sets are based on a scoring system developed by the Alliance in collaboration with DIOPT. MGI includes orthology for the following vertebrate species from the…

jannovar download problem

I am trying to convert some HGVS to chrom:pos:ref:alt format. I was thinking of using jannovar. As per the documentation I run: jannovar download -d hg19/refseq which gives me this: Options JannovarDownloadOptions [downloadDir=data, getDataSourceFiles()=[bundle:///default_sources.ini], isReportProgress()=true, getHttpProxy()=null, getHttpsProxy()=null, getFtpProxy()=null, geneIdentifiers=[], outputFile=] Downloading/parsing for data source "hg19/refseq" INFO…

Retrieve only protein coding Ensembl gene IDs and gene symbols

I tried, without success, different ways to retrieve the current list of Ensembl gene IDs, including the gene symbol, for only protein-coding genes using the R library biomaRt. Here is the code: library(biomaRt) ensembl = useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")…

reference annotation for the human and mouse genomes in 2023

Nucleic Acids Research, 2023, Vol. 51, Database issue, D942–D949. Published online 24 November 2022. doi.org/10.1093/nar/gkac1071. GENCODE: reference annotation for the human and mouse genomes in 2023. Adam Frankish, Sílvia Carbonell-Sala, Mark Diekhans, Irwin Jungreis, Jane E. Loveland, Jonathan M. Mudge,…

Spatially resolved multiomics of human cardiac niches

Research ethics for donor tissues All heart tissue samples were obtained from transplant donors after Research Ethics Committee approval and written informed consent from donor families as previously described [2]. The following ethics approvals for donors of additional heart tissue were obtained: D8 and A61 (REC reference 15/EE/0152, East of England…

Convert Human to Mouse Symbols

I’m trying to create a working function that takes a column of human gene symbols as input and outputs a vector of mouse gene symbols that is the same length. (I’m trying to use the function to replace the human genes in a dataframe with mouse genes) I have tried…

NM_024496.4(IRF2BPL):c.1157C>T (p.Thr386Met) AND Neurodevelopmental disorder with regression, abnormal movements, loss of speech, and seizures – ClinVar

NM_024496.4(IRF2BPL):c.1157C>T (p.Thr386Met) AND Neurodevelopmental disorder with regression, abnormal movements, loss of speech, and seizures Based on: 1 submission [Details] Record status: current Accession: RCV002475523.1 Allele description [Variation Report for NM_024496.4(IRF2BPL):c.1157C>T (p.Thr386Met)] NM_024496.4(IRF2BPL):c.1157C>T (p.Thr386Met) Gene: IRF2BPL:interferon regulatory factor 2 binding protein like [Gene – OMIM – HGNC] Variant type: single nucleotide…

Phenotype and organism model references for a large list of genes

Hi all. I have to generate a table for about 3500 genes with the following columns: 1) references to each gene; 2) pathological features; 3) reference to the organism model, e.g. MGI number for mouse. Are there any databases/software which…

Splitting of VCF file of CSQ field in the INFO column to tabular format.

A VCF file has seven fixed columns (chromosome, position, ID, ref, alt, qual, filter) followed by the INFO column. The INFO column holds the variant-related information. Within the INFO column, the CSQ field contains multiple sub-fields (82 fixed fields) separated by the delimiter “|”…
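
A minimal Python sketch of the splitting described here, assuming the usual VEP convention that the sub-field names appear after "Format: " in the `##INFO=<ID=CSQ,...>` header line and that several annotations for one variant are comma-separated. The function names are hypothetical, and the header and INFO strings below are toy values with four sub-fields rather than the 82 mentioned in the post.

```python
def csq_field_names(header_line):
    """Extract CSQ sub-field names from the ##INFO=<ID=CSQ,...> header line."""
    fmt = header_line.strip().split("Format: ", 1)[1]
    return fmt.rstrip('">').split("|")


def split_csq(info, names):
    """Return one dict per annotation found in the CSQ entry of an INFO string."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            return [dict(zip(names, annotation.split("|")))
                    for annotation in entry[len("CSQ="):].split(",")]
    return []  # no CSQ entry on this record


# Toy header and INFO value, for illustration only
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: Allele|Consequence|SYMBOL|Gene">')
info = "DP=20;CSQ=T|missense_variant|GENE1|ID1,T|intron_variant|GENE2|ID2"

names = csq_field_names(header)
for row in split_csq(info, names):
    print("\t".join(row[name] for name in names))
```

Each returned dict corresponds to one row of the tabular output, so a variant with several transcript annotations expands into several rows.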

Comparing gene expression with copy number variation in TCGA

Hello, I want to compare (with a PCA) gene expression against copy number variation at gene level in a TCGA project. When I retrieve the gene expression, every value is mapped by sample and gene. But for the copy number variation, I get only chromosomal locations. To do the PCA, I want…

Seurat scRNA convert Ensembl ID to gene symbol

Hi, I downloaded some datasets from the GEO database (www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE155960). I found the names are in Ensembl nomenclature and I need to convert them into gene symbols in order to do the QC metrics in the Seurat pipeline. I’m using this code to convert the Ensembl IDs to gene symbols: library(Seurat) library(patchwork) library…

COL15A1 Vertebrate HGNC

Vertebrate Homology. Alliance Homology Information: MGI loads orthology data based on the ‘stringent’ set from the Alliance of Genome Resources. The Alliance sets are based on a scoring system developed by the Alliance in collaboration with DIOPT. MGI includes orthology for the following vertebrate species from the…

Resurrecting the alternative splicing landscape of archaic hominins using machine learning

Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222 (2012). Prüfer, K. et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505, 43–49 (2014). Prüfer, K. et…

STAGEs: A web-based tool that integrates data visualization and pathway enrichment analysis for gene expression studies

STAGEs is an interactive web app built using Streamlit (www.streamlit.io), and the running instance of the online app can be accessed via the website (kuanrongchan-stages-stages-vpgh46.streamlitapp.com/). The app can also run locally using the instructions detailed in GitHub (github.com/kuanrongchan/STAGES). Users can directly upload data from Excel spreadsheets, csv or txt files…

REACTOME_ACTIVATED_POINT_MUTANTS_OF_FGFR2

Standard name: REACTOME_ACTIVATED_POINT_MUTANTS_OF_FGFR2
Systematic name: M647
Brief description: Genes involved in Activated point mutants of FGFR2
Full description or abstract:
Collection: ARCHIVED: Archived Founder gene sets that are referenced by current Hallmarks; C2_CP: ARCHIVED Canonical Pathways; C2_CP:REACTOME: ARCHIVED Reactome Pathways
Source publication:
Exact source: R-HSA-2033519
Related gene sets:
External…

trouble getting gene names from biomaRt

I have an Excel file, which contains the columns chrom, pos, id, ref and alt. I want to add a new column, which will have the name of the genes for the corresponding rows. For that I am using the getBM() function in biomaRt, but it takes too much time to finish…

CALM2 Vertebrate HGNC

Vertebrate Homology. Alliance Homology Information: MGI loads orthology data based on the ‘stringent’ set from the Alliance of Genome Resources. The Alliance sets are based on a scoring system developed by the Alliance in collaboration with DIOPT. MGI includes orthology for the following vertebrate species from the…

NM_018026.4(PACS1):c.1199+5G>A AND Schuurs-Hoeijmakers syndrome – ClinVar

NM_018026.4(PACS1):c.1199+5G>A AND Schuurs-Hoeijmakers syndrome Based on: 1 submission [Details] Record status: current Accession: RCV002632445.1 Allele description [Variation Report for NM_018026.4(PACS1):c.1199+5G>A] NM_018026.4(PACS1):c.1199+5G>A Gene: PACS1:phosphofurin acidic cluster sorting protein 1 [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 11q13.2 Genomic location: Preferred name: NM_018026.4(PACS1):c.1199+5G>A HGVS: NC_000011.10:g.66220796G>A NG_033900.1:g.155444G>A NM_018026.4:c.1199+5G>A MANE…

Human hg38 chr10:21,513,475-21,525,682 UCSC Genome Browser v446

Seq2science ChIP-seq hub: ChIP-seq. Mapping and Sequencing: Base Position; p14 Fix Patches (updated); p14 Alt Haplotypes (updated); Assembly; Centromeres; Chromosome Band; Clone Ends; Exome Probesets; FISH Clones; Gap; GC Percent; GRC Contigs; GRC Incident; Hg19 Diff; INSDC; LiftOver & ReMap; LRG Regions; Mappability; Problematic Regions; (new) Recomb…

H1-4 Vertebrate HGNC

Vertebrate Homology. Alliance Homology Information: MGI loads orthology data based on the ‘stringent’ set from the Alliance of Genome Resources. The Alliance sets are based on a scoring system developed by the Alliance in collaboration with DIOPT. MGI includes orthology for the following vertebrate species from the…

NM_000018.4(ACADVL):c.204G>A (p.Ala68=) AND Very long chain acyl-CoA dehydrogenase deficiency – ClinVar

NM_000018.4(ACADVL):c.204G>A (p.Ala68=) AND Very long chain acyl-CoA dehydrogenase deficiency Based on: 1 submission [Details] Record status: current Accession: RCV001989056.2 Allele description [Variation Report for NM_000018.4(ACADVL):c.204G>A (p.Ala68=)] NM_000018.4(ACADVL):c.204G>A (p.Ala68=) Gene: ACADVL:acyl-CoA dehydrogenase very long chain [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 17p13.1 Genomic location: Preferred…

PUREE: accurate pan-cancer tumor purity estimation from gene expression data

Genomics-based consensus tumor purity estimates For TCGA samples, genomic-based consensus tumor purities were computed as a mean of predictions from ABSOLUTE [17], AbsCNSeq [18], ASCAT [15], and PurBayes [16], following the approach reported in Ghoshdastider et al. [41]. AbsCNSeq and PurBayes estimates are based on mutation variant allele frequency data, and ASCAT and ABSOLUTE…

NM_017654.4(SAMD9):c.3698C>T (p.Ser1233Leu) AND Inborn genetic diseases – ClinVar

NM_017654.4(SAMD9):c.3698C>T (p.Ser1233Leu) AND Inborn genetic diseases Based on: 1 submission [Details] Record status: current Accession: RCV002542603.1 Allele description [Variation Report for NM_017654.4(SAMD9):c.3698C>T (p.Ser1233Leu)] NM_017654.4(SAMD9):c.3698C>T (p.Ser1233Leu) Gene: SAMD9:sterile alpha motif domain containing 9 [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 7q21.2 Genomic location: Preferred name: NM_017654.4(SAMD9):c.3698C>T…

Getting chromosome of unusual chromosome names e.g. ‘CHR_HSCHR8_8_CTG1’

I made a biomaRt query: library(biomaRt) mart = useMart('ensembl', dataset="hsapiens_gene_ensembl") genes = getBM(attributes = c("chromosome_name","start_position", "hgnc_symbol", "uniprot_gn_symbol", "uniprot_gn_id"), mart = mart, values = list("protein_coding",c(1:22))) Most of the chromosome_name values are regular numbers 1 to 22. However, some are unusual, such as…

Retrieve hgnc_symbol from XM_ refseqs using BiomaRt

I have some predicted RefSeqs, such as XM_005261067 or XM_005255527, and would like to retrieve the hgnc_symbol using biomaRt like this: refseqids_XM = c("XM_005261067","XM_005255527") gene_XM <- getBM(attributes=c("refseq_mrna_predicted","hgnc_symbol"), filters="refseq_mrna_predicted", values=refseqids_XM, mart=ensembl) gene_XM However, this gives me no result: [1] refseq_mrna_predicted refseq_mrna <0 rows>…

TxDB.Hsapiens.UCSC.hg38.knownGene with locateVariants() identifying SNPs from various chromosome being part of the same gene

I am trying to annotate a list of SNPs using the hg38 genome (knownGene) and locateVariants(). The program is able to successfully run and provide “GeneIDs” for several of the loci. However, some GeneIDs are applied to SNPs in completely different regions and on completely different chromosomes. When I cross…

EPCAM Vertebrate HGNC

Vertebrate Homology. Alliance Homology Information: MGI loads orthology data based on the ‘stringent’ set from the Alliance of Genome Resources. The Alliance sets are based on a scoring system developed by the Alliance in collaboration with DIOPT. MGI includes orthology for the following vertebrate species from the…

How does CIBERSORTx deal with multiple HGNC symbol rows

After obtaining my counts matrix from salmon and assigning HGNC symbols to my ENSG IDs, I end up with multiple rows with duplicate HGNC symbols due to the nature of converting ENSG to HGNC as expected. However, I’m wondering how…

snpeff annotation query

Hi, I am looking in-depth into some snpeff annotation of structural variation and was checking a 3 kb insertion event from a multisample VCF. As usual, snpeff gave multiple annotations, even though I only wanted to have the canonical transcript. I think I have a small…

Battery gene sets for CAMERA limma

Hi everyone, I’m confused by the results of my CAMERA analysis. For building indexes, I used the battery of gene sets from MSigDB. I transformed the gmt files to lists and built indexes. The initial count…

CHOPCHOP command line

Hey everyone, is anyone using the CLI version of CHOPCHOP for sgRNA design? I am trying to get it to work, but I have some questions about it. I need a .gene_table file that I can extract from the UCSC. However, the tracks RefSeq Genes or…

biomaRt crashes R studio

Hey everyone, I just wanted to execute a script that worked before. However, every time I try to run it now, RStudio becomes unresponsive. I didn’t change anything. Does anyone else experience this? This is an extract from my script: library(biomaRt) … mart <- useMart("ENSEMBL_MART_ENSEMBL") mart…

NM_001184.4(ATR):c.4820G>A (p.Ser1607Asn) AND not specified – ClinVar

NM_001184.4(ATR):c.4820G>A (p.Ser1607Asn) AND not specified Based on: 1 submission [Details] Record status: current Accession: RCV001821037.3 Allele description [Variation Report for NM_001184.4(ATR):c.4820G>A (p.Ser1607Asn)] NM_001184.4(ATR):c.4820G>A (p.Ser1607Asn) Gene: ATR:ATR serine/threonine kinase [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 3q23 Genomic location: Preferred name: NM_001184.4(ATR):c.4820G>A (p.Ser1607Asn) HGVS: NC_000003.12:g.142512292C>T NG_008951.1:g.71535G>A…

error in duplicate identification

# duplicated genes and number of duplicates
duplicated_genes <- names(table(df$hgnc_symbol)[table(df$hgnc_symbol) > 1])
gene_counts <- table(df$hgnc_symbol)[duplicated_genes]
# zero expression of each gene
zero_counts <- sapply(unique(duplicated_genes), function(gene) { sum(rowSums(df[df$hgnc_symbol == gene, -ncol(df)]) == 0) })

This is the code I’m running. I want to identify duplicate genes from…
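
For comparison, the same bookkeeping can be sketched in plain Python. This is a rough analogue of the R snippet above under stated assumptions, not a fix for the poster's error: the function name `duplicate_summary` and the toy rows are hypothetical, and expression values are assumed to be non-negative so that a row sum of zero means every value is zero.

```python
from collections import Counter


def duplicate_summary(rows):
    """rows: list of (symbol, [expression values]) pairs.

    Returns {symbol: (number of rows, number of all-zero rows)} for every
    symbol that occurs more than once.
    """
    counts = Counter(symbol for symbol, _ in rows)
    summary = {}
    for symbol, n in counts.items():
        if n > 1:
            zero_rows = sum(1 for s, values in rows
                            if s == symbol and sum(values) == 0)
            summary[symbol] = (n, zero_rows)
    return summary


# Toy data, for illustration only
rows = [("GENE1", [0, 0]), ("GENE1", [3, 1]), ("GENE2", [5, 2]),
        ("GENE3", [0, 0]), ("GENE3", [0, 0])]
print(duplicate_summary(rows))
```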

error in duplicate identification

duplicated_genes <- names(table(df$hgnc_symbol)[table(df$hgnc_symbol) > 1])
gene_counts <- table(df$hgnc_symbol)[duplicated_genes]
zero_counts <- sapply(unique(duplicated_genes), function(gene) { sum(rowSums(df[df$hgnc_symbol == gene, -ncol(df)]) == 0) })

This is the code I am running. I want to identify duplicate genes in my data frame and their frequency, and in a third column I…

Extract Interactions using paxtoolsr from Pathway Commons

I am looking for interactions between a protein (ACTB) and calcium and I want to use the paxtoolsr package. Is this possible? What code is necessary?

Unknown naming convention for genes

Hello, I have a list of genes beginning with the prefixes ACXXXXXX (for example: AC000032.1, AC000035.), ALXXXXXX (ex. AL008582.1, AL008628.1), and APXXXXXX (ex. AP000223.1, AP000238.1), but I am unfamiliar with their naming convention. I cannot find any information about these genes from HGNC symbol checker…

Continue Reading Unknown naming convention for genes

Which AF column do I use from TCGA data in maftools

I am currently trying to read a MAF file from the TCGA and need to change the headers in order to run it, but I am not sure which column in the TCGA file corresponds to the ‘i_TumorVAF_WU’ header I need. Below…

Continue Reading Which AF column do I use from TCGA data in maftools

org.Hs.eg.db gives more than one ENTREZID for a gene symbol

There is no way to specify the source of gene symbols for an OrgDb. For TEC, one comes from HGNC, and the other comes from OMIM. When we generate the OrgDb packages, we don’t distinguish between sources, as they are all (as far as NCBI is concerned) ‘real’ gene symbols….
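One-to-many symbol mappings like the TEC case can at least be listed before they cause trouble downstream. A minimal Python sketch, assuming the mapping is available as (symbol, ID) pairs; the IDs below are made-up placeholders, not real Entrez Gene IDs, and `ambiguous_symbols` is a hypothetical helper:

```python
from collections import defaultdict

# Placeholder pairs for illustration only; the IDs are not real Entrez IDs.
pairs = [
    ("TEC", "id_A"),
    ("TEC", "id_B"),
    ("ACTB", "id_C"),
    ("ACTB", "id_C"),  # a repeated pair is not an ambiguity
]

def ambiguous_symbols(pairs):
    """Return {symbol: sorted IDs} for symbols that map to more than one ID."""
    by_symbol = defaultdict(set)
    for symbol, gene_id in pairs:
        by_symbol[symbol].add(gene_id)
    return {s: sorted(ids) for s, ids in by_symbol.items() if len(ids) > 1}

print(ambiguous_symbols(pairs))
```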

Continue Reading org.Hs.eg.db gives more than one ENTREZID for a gene symbol

Human hg38 chr19:11,216,461-11,670,150 UCSC Genome Browser v442

Custom Tracks: 1806, 468, BT20, BT474, MCF7, T47D
Mapping and Sequencing: Base Position, p13 Fix Patches, p13 Alt Haplotypes, Assembly, Centromeres, Chromosome Band, Clone Ends, Exome Probesets, FISH Clones, Gap, GC Percent, GRC Contigs, GRC Incident, Hg19 Diff, INSDC, LiftOver & ReMap, LRG Regions, Mappability, RefSeq Acc…

Continue Reading Human hg38 chr19:11,216,461-11,670,150 UCSC Genome Browser v442

NM_004429.5(EFNB1):c.128+5G>A AND Craniofrontonasal syndrome – ClinVar

NM_004429.5(EFNB1):c.128+5G>A AND Craniofrontonasal syndrome Based on: 1 submission [Details] Record status: current Accession: RCV001263203.1 Allele description [Variation Report for NM_004429.5(EFNB1):c.128+5G>A] NM_004429.5(EFNB1):c.128+5G>A Gene: EFNB1:ephrin B1 [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: Xq13.1 Genomic location: Preferred name: NM_004429.5(EFNB1):c.128+5G>A HGVS: NC_000023.11:g.68829909G>A NG_008887.1:g.5913G>A NM_004429.5:c.128+5G>A (MANE SELECT) NC_000023.10:g.68049752G>A NM_004429.4:c.128+5G>A This…

Continue Reading NM_004429.5(EFNB1):c.128+5G>A AND Craniofrontonasal syndrome – ClinVar

NM_130797.4(DPP6):c.358+30C>T AND not provided – ClinVar

NM_130797.4(DPP6):c.358+30C>T AND not provided Based on: 1 submission [Details] Record status: current Accession: RCV001707448.1 Allele description [Variation Report for NM_130797.4(DPP6):c.358+30C>T] NM_130797.4(DPP6):c.358+30C>T Gene: DPP6:dipeptidyl peptidase like 6 [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 7q36.2 Genomic location: Preferred name: NM_130797.4(DPP6):c.358+30C>T HGVS: NC_000007.14:g.154446358C>T NG_033878.2:g.703373C>T NM_001039350.3:c.166+30C>T NM_001290252.2:c.172+30C>T NM_001290253.2:c.358+30C>T…

Continue Reading NM_130797.4(DPP6):c.358+30C>T AND not provided – ClinVar

Availability of information on genes in Gnomad VCF data

Hi, I’m new to gnomAD and genetics in general, and I was wondering whether the gnomAD genome data downloaded in VCF format contains information on the nearest gene for each variant, and is the genomic…

Continue Reading Availability of information on genes in Gnomad VCF data

NM_005445.4(SMC3):c.181C>T (p.Arg61Trp) AND Cornelia de Lange syndrome 3 – ClinVar

NM_005445.4(SMC3):c.181C>T (p.Arg61Trp) AND Cornelia de Lange syndrome 3 Based on: 1 submission [Details] Record status: current Accession: RCV000760292.1 Allele description [Variation Report for NM_005445.4(SMC3):c.181C>T (p.Arg61Trp)] NM_005445.4(SMC3):c.181C>T (p.Arg61Trp) Gene: SMC3:structural maintenance of chromosomes 3 [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 10q25.2 Genomic location: Preferred name:…

Continue Reading NM_005445.4(SMC3):c.181C>T (p.Arg61Trp) AND Cornelia de Lange syndrome 3 – ClinVar

NM_005359.6(SMAD4):c.1630C>A (p.Pro544Thr) AND Hereditary cancer-predisposing syndrome – ClinVar

NM_005359.6(SMAD4):c.1630C>A (p.Pro544Thr) AND Hereditary cancer-predisposing syndrome Based on: 1 submission [Details] Record status: current Accession: RCV001012497.1 Allele description [Variation Report for NM_005359.6(SMAD4):c.1630C>A (p.Pro544Thr)] NM_005359.6(SMAD4):c.1630C>A (p.Pro544Thr) Gene: SMAD4:SMAD family member 4 [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 18q21.2 Genomic location: Preferred name: NM_005359.6(SMAD4):c.1630C>A (p.Pro544Thr) HGVS:…

Continue Reading NM_005359.6(SMAD4):c.1630C>A (p.Pro544Thr) AND Hereditary cancer-predisposing syndrome – ClinVar

All transcript variants in gene MAD1L1 – BIPMed SNP Array

Legend Please note that a short description of a certain column can be displayed when you move your mouse cursor over the column’s header and hold it still. Below, a more detailed description is shown per column. Effect: The variant’s effect on the protein’s function, in the format ‘R/C’ where…

Continue Reading All transcript variants in gene MAD1L1 – BIPMed SNP Array

Gene symbol report | HUGO Gene Nomenclature Committee

Continue Reading Gene symbol report | HUGO Gene Nomenclature Committee

Uncharted genetic territory offers insight into human-specific proteins

When researchers working on the Human Genome Project completely mapped the genetic blueprint of humans in 2001, they were surprised to find only around 20,000 genes that produce proteins. Could it be that humans have only about twice as many genes as a common fly? Scientists had expected considerably more….

Continue Reading Uncharted genetic territory offers insight into human-specific proteins

Efficient way of mapping UniProt IDs to representative UniRef90 IDs?

You can do this directly on UniProt: www.uniprot.org/uploadlists/. Just paste or upload your list of UniProt IDs, and select “UniProtKB AC/ID” in the “From” field and “UniParc” in the “To” field. I’ve also written a script, pasted below, that can do this with some useful options: $ uniprot_map.pl -h uniprot_map.pl…

Continue Reading Efficient way of mapping UniProt IDs to representative UniRef90 IDs?

gene ID RNAseq

Hi friends, how can I get the gene numeric ID and HUGO ID with an R script? What script should I use? I have this, but it does not give the numeric ID and HUGO ID:
    library(biomaRt)
    library(dplyr)
    library(tibble)
    attributeNames <- c("ensembl_gene_id", "external_gene_name", "HGNC_ID", "chromosome_name", "description")
    filterValues <- rownames(res)
    Annotations <- getBM(attributes=attributeNames, filters =…

Continue Reading gene ID RNAseq

Ensembl VEP gnomAD annotated allele frequencies different from gnomAD browser

I’ve annotated some variants using VEP, and was looking at the minor allele frequencies. Some of the variants had very different MAFs in the annotation than I expected (I expected MAF < 1%, whereas some annotated MAFs were >50%). I looked up the same variants on the gnomAD v3 browser,…

Continue Reading Ensembl VEP gnomAD annotated allele frequencies different from gnomAD browser

How to convert transcript-relative coordinates to genomic coordinates?

I have queried using Entrez Utilities (efetch: www.ncbi.nlm.nih.gov/books/NBK25499/) and obtained annotations for transcripts like the following:
    >Feature ref|NM_152486.3|
    1 2557 gene
        gene SAMD11
        gene_syn MRS
        gene_desc sterile alpha motif domain containing 11
        db_xref GeneID:148398
        db_xref HGNC:HGNC:28706
        db_xref MIM:616765
How/what database should…
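Once the exon coordinates of a transcript are known, the conversion itself is simple arithmetic. The following Python sketch assumes 1-based inclusive coordinates and a transcript position counted over spliced exons only; the exon intervals are hypothetical, not those of NM_152486.3, and `transcript_to_genomic` is an illustrative helper rather than part of any library.

```python
def transcript_to_genomic(pos, exons, strand):
    """Map a 1-based transcript position to a 1-based genomic position.

    exons: list of (start, end) genomic intervals, 1-based inclusive.
    strand: "+" or "-". On the minus strand, transcript position 1 is the
    highest genomic coordinate of the last exon in genomic order.
    """
    if pos < 1:
        raise ValueError("transcript positions are 1-based")
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = pos
    for start, end in ordered:
        length = end - start + 1
        if remaining <= length:
            return start + remaining - 1 if strand == "+" else end - remaining + 1
        remaining -= length  # skip this exon and keep walking
    raise ValueError("position lies beyond the end of the transcript")

# Hypothetical exon structure (three exons, 25 transcribed bases in total).
exons = [(100, 109), (200, 204), (300, 309)]
print(transcript_to_genomic(11, exons, "+"))
print(transcript_to_genomic(11, exons, "-"))
```

The exon list would have to come from an annotation source such as a GTF/GFF file for the same transcript version; that lookup is outside this sketch.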

Continue Reading How to convert transcript-relative coordinates to genomic coordinates?

FREM1 Vertebrate HGNC

Vertebrate Homology. Alliance Homology Information: MGI loads orthology data based on the ‘stringent’ set from the Alliance of Genome Resources. The Alliance sets are based on a scoring system developed by the Alliance in collaboration with DIOPT. MGI includes orthology for the following vertebrate species from the…

Continue Reading FREM1 Vertebrate HGNC

KINNEY_DNMT1_METHYLATION_TARGETS

Standard name: KINNEY_DNMT1_METHYLATION_TARGETS
Systematic name: M2508
Brief description: Hypomethylated genes in prostate tissue from mice carrying hypomorphic alleles of DNMT1 [GeneID=1786].
Full description or abstract: Previous studies have shown that tumor progression in the transgenic adenocarcinoma of mouse prostate (TRAMP) model is characterized by global DNA hypomethylation initiated during early-stage…

Continue Reading KINNEY_DNMT1_METHYLATION_TARGETS

Help needed for Ensembl Gene ID conversion for RNA-seq data

Hello All, I am new to the RNA-seq world and especially new to the bioinformatics side. We recently completed an RNA-seq experiment (total RNA) on human samples and we used Illumina’s DRAGEN RNA pipeline, which generated salmon gene count (.sf) output files. In the files, the gene ID is in…

Continue Reading Help needed for Ensembl Gene ID conversion for RNA-seq data

Cellosaurus cell line HEK293T-CAF40-null (CVCL_A5EE)

Cell line name: HEK293T-CAF40-null
Synonyms: CAF40-null HEK293T
Accession: CVCL_A5EE
Resource Identification Initiative: To cite this cell line use: HEK293T-CAF40-null (RRID:CVCL_A5EE)
Comments: Doubling time: ~30-40 hours (DSMZ). Knockout cell: Method=CRISPR/Cas9; HGNC; 10445; CNOT9. Transfected with: UniProtKB; P00552; Transposon Tn5 neo. Transformant: NCBI_TaxID; 28285; Adenovirus 5. Transformant: NCBI_TaxID; 1891767; Simian virus 40 (SV40) [tsA]. Derived from sampling…

Continue Reading Cellosaurus cell line HEK293T-CAF40-null (CVCL_A5EE)

Change separator just between specific columns

I am trying to change the separator only between columns 1 and 9. After that, I would like to keep the original separator. These are the first lines of my file, both when reading it directly and when running od -c file: #description: evidence-based annotation of the human genome (GRCh38),…
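One way to touch only the first eight separators is to split each line a limited number of times. The Python sketch below assumes the fields are currently separated by runs of whitespace and should become tab-separated up to column 9; the excerpt does not show the actual separators, so both assumptions, the sample line, and the helper `retab_first_columns` are hypothetical.

```python
def retab_first_columns(line, n_cols=9, new_sep="\t"):
    """Join the first n_cols whitespace-separated fields with new_sep.

    Everything after the (n_cols - 1)-th separator is kept as one field,
    so spacing inside the last column is left untouched.
    """
    text = line.rstrip("\n")
    if text.startswith("#"):
        return text  # header/comment lines pass through unchanged
    return new_sep.join(text.split(None, n_cols - 1))

# Hypothetical GTF-like line with single spaces between all fields.
example = 'chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000223972"; gene_name "DDX11L1";'
converted = retab_first_columns(example)
print(converted.split("\t"))
```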

Continue Reading Change separator just between specific columns

Gene Id Conversion Tool

MyGene.info is a web service that provides up-to-date annotations in several fields and is great for gene ID conversion. All species from NCBI and Ensembl are supported, and annotations are updated weekly to ensure the latest annotations are available. Both Python and R/Bioconductor clients are easy to use….

Continue Reading Gene Id Conversion Tool

NM_000018.4(ACADVL):c.879-8T>A AND Very long chain acyl-CoA dehydrogenase deficiency – ClinVar

NM_000018.4(ACADVL):c.879-8T>A AND Very long chain acyl-CoA dehydrogenase deficiency Based on: 1 submission [Details] Record status: current Accession: RCV001200781.1 Allele description [Variation Report for NM_000018.4(ACADVL):c.879-8T>A] NM_000018.4(ACADVL):c.879-8T>A Gene: ACADVL:acyl-CoA dehydrogenase very long chain [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 17p13.1 Genomic location: Preferred name: NM_000018.4(ACADVL):c.879-8T>A HGVS:…

Continue Reading NM_000018.4(ACADVL):c.879-8T>A AND Very long chain acyl-CoA dehydrogenase deficiency – ClinVar

NM_005359.6(SMAD4):c.1473T>C (p.Gly491=) AND not specified – ClinVar

NM_005359.6(SMAD4):c.1473T>C (p.Gly491=) AND not specified Based on: 1 submission [Details] Record status: current Accession: RCV000780718.2 Allele description [Variation Report for NM_005359.6(SMAD4):c.1473T>C (p.Gly491=)] NM_005359.6(SMAD4):c.1473T>C (p.Gly491=) Gene: SMAD4:SMAD family member 4 [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 18q21.2 Genomic location: Preferred name: NM_005359.6(SMAD4):c.1473T>C (p.Gly491=) HGVS: NC_000018.10:g.51078281T>C…

Continue Reading NM_005359.6(SMAD4):c.1473T>C (p.Gly491=) AND not specified – ClinVar

featureCounts in the subread

Hello Everyone, I am quantifying read counts in the BAM using featureCounts on the command line, but I am getting the error below. ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file. The specified gene identifier attribute is ‘gene_id’ An…
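Before re-running the counting step, it can help to check which annotation lines actually lack the attribute named in the error. The Python sketch below is a diagnostic under stated assumptions, not a statement of what caused this particular error: it assumes a tab-separated file with attributes in the 9th column, and the sample lines and the helper `missing_attribute_lines` are hypothetical.

```python
import re

def missing_attribute_lines(gtf_lines, attribute="gene_id"):
    """Return 1-based line numbers of feature lines whose 9th column lacks the attribute."""
    # The attribute must start the column or follow a ';', and be followed
    # by a space (GTF style) or '=' so that e.g. 'old_gene_id' does not match.
    pattern = re.compile(r"(?:^|;)\s*" + re.escape(attribute) + r"[ =]")
    missing = []
    for number, line in enumerate(gtf_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or not pattern.search(fields[8]):
            missing.append(number)
    return missing

# Hypothetical annotation lines; the third uses GFF3-style attributes.
sample = [
    "#description: example header",
    'chr1\tHAVANA\texon\t11869\t12227\t.\t+\t.\tgene_id "ENSG00000223972"; transcript_id "ENST00000456328";',
    "chr1\tcustom\texon\t1\t4775\t.\t+\t.\tID=exon1;Parent=tx1",
    'chr1\tcustom\texon\t1\t4775\t.\t+\t.\ttranscript_id "tx1"; gene_id "custom_gene";',
]
print(missing_attribute_lines(sample))
```

Lines flagged this way are either in a different attribute style or not tab-separated into nine columns; both would need fixing in the annotation file itself.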

Continue Reading featureCounts in the subread

NM_005359.6(SMAD4):c.-20A>C AND not specified – ClinVar

NM_005359.6(SMAD4):c.-20A>C AND not specified Based on: 1 submission [Details] Record status: current Accession: RCV000444837.1 Allele description [Variation Report for NM_005359.6(SMAD4):c.-20A>C] NM_005359.6(SMAD4):c.-20A>C Gene: SMAD4:SMAD family member 4 [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 18q21.2 Genomic location: Preferred name: NM_005359.6(SMAD4):c.-20A>C HGVS: NC_000018.10:g.51047027A>C NG_013013.2:g.83988A>C NM_005359.6:c.-20A>C (MANE SELECT) LRG_318t1:c.-20A>C…

Continue Reading NM_005359.6(SMAD4):c.-20A>C AND not specified – ClinVar

Using VEP to get gnomAD frequencies

Hi all, I am using Ensembl VEP (command line) to annotate a VCF I have. I am specifically looking for gnomAD allele frequencies, which is fairly straightforward to do, technically speaking. However, the data looks off in some cases. For example, when I pass in: 10 69408929 COSM3751912 A…

Continue Reading Using VEP to get gnomAD frequencies

When I convert the Ensembl IDs to gene symbols, why lots of genes are duplicated?

Hi all, I have raw counts of samples in a data frame. The row names are Ensembl IDs and I want to convert them to gene symbols, so I ran the code below.
    query <- GDCquery(project = "TCGA-COAD", data.category = "Transcriptome Profiling", data.type = "Gene Expression Quantification", workflow.type…
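Duplicated symbols arise when several Ensembl IDs share one symbol, or when IDs have no symbol at all. The Python sketch below shows one possible way to handle this, assuming that summing the counts of IDs sharing a symbol is acceptable for the analysis (keeping the highest-expressed ID is another common choice). Apart from ENSG00000000003, the IDs and symbols are placeholders, and `strip_version` and `collapse_by_symbol` are hypothetical helpers.

```python
def strip_version(ensembl_id):
    """Drop the version suffix, e.g. 'ENSG00000000003.15' -> 'ENSG00000000003'."""
    return ensembl_id.split(".", 1)[0]

def collapse_by_symbol(counts, id_to_symbol):
    """Sum count vectors of IDs that share a symbol; IDs without a symbol keep their ID."""
    collapsed = {}
    for ensembl_id, values in counts.items():
        base = strip_version(ensembl_id)
        key = id_to_symbol.get(base) or base  # fall back to the ID itself
        if key in collapsed:
            collapsed[key] = [a + b for a, b in zip(collapsed[key], values)]
        else:
            collapsed[key] = list(values)
    return collapsed

# Placeholder data: two IDs share the symbol GENE1, one ID has no symbol.
counts = {
    "ENSG00000000003.15": [10, 4],
    "ENSG_DUP_A.1": [1, 2],
    "ENSG_DUP_B.3": [3, 0],
    "ENSG_NOSYMBOL.2": [7, 7],
}
id_to_symbol = {
    "ENSG00000000003": "TSPAN6",
    "ENSG_DUP_A": "GENE1",
    "ENSG_DUP_B": "GENE1",
    "ENSG_NOSYMBOL": "",
}
collapsed = collapse_by_symbol(counts, id_to_symbol)
print(collapsed)
```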

Continue Reading When I convert the Ensembl IDs to gene symbols, why lots of genes are duplicated?

GENCODE – Ribo-seq ORFs

In recent years, Ribosome Profiling (Ribo-seq) has been used to detect thousands of non-canonical – i.e. unannotated – translated open reading frames (ORFs) in the human genome. GENCODE have now embarked on a long-term community-driven project to incorporate these features into reference gene annotation. This pioneering work is being done…

Continue Reading GENCODE – Ribo-seq ORFs

NM_000138.5(FBN1):c.1215C>A (p.Pro405=) AND Familial thoracic aortic aneurysm and aortic dissection – ClinVar

NM_000138.5(FBN1):c.1215C>A (p.Pro405=) AND Familial thoracic aortic aneurysm and aortic dissection Based on: 1 submission [Details] Record status: current Accession: RCV001192329.1 Allele description [Variation Report for NM_000138.5(FBN1):c.1215C>A (p.Pro405=)] NM_000138.5(FBN1):c.1215C>A (p.Pro405=) Gene: FBN1:fibrillin 1 [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 15q21.1 Genomic location: Preferred name: NM_000138.5(FBN1):c.1215C>A…

Continue Reading NM_000138.5(FBN1):c.1215C>A (p.Pro405=) AND Familial thoracic aortic aneurysm and aortic dissection – ClinVar

Gene expression (RNA-seq) clustering

Unsupervised class discovery is a data mining method to identify unknown possible groups (clusters) of items solely based on intrinsic features and no external variables. Basically, clustering includes four steps:
1. Data preparation and feature selection
2. Dissimilarity matrix calculation
3. Applying clustering algorithms
4. Assessing cluster assignment
I use…

Continue Reading Gene expression (RNA-seq) clustering

Converting Ensembl gene id to Gene symbol

Hi all, As mentioned earlier in this post, I tried to convert the Ensembl gene IDs to gene symbols. The code below ran without any error, but the nrow of ens_to_symbol_biomart is 55605 and the length of ens is…

Continue Reading Converting Ensembl gene id to Gene symbol

“Given ref” field is empty when a ref. allele was in VCF input

Hi there, I’m running VEP using the following command:
    ref="GRCh38.primary_assembly.genome.fa"
    vep="/opt/vep_ensembl/ensembl-vep/vep"
    for ea in *Somatic.hc.vcf
    do
      $vep -i $ea -o vep/"$(echo $ea | sed s/.vcf//)"_VEP.txt --cache --dir_cache "/home/shared/vep_cache/" --assembly GRCh38 --merged --fasta $ref --hgvs --hgvsg…

Continue Reading “Given ref” field is empty when a ref. allele was in VCF input

NM_182961.4(SYNE1):c.21155_21156delinsTT (p.Gly7052Val) AND not provided – ClinVar

NM_182961.4(SYNE1):c.21155_21156delinsTT (p.Gly7052Val) AND not provided Based on: 1 submission [Details] Record status: current Accession: RCV000597527.1 Allele description [Variation Report for NM_182961.4(SYNE1):c.21155_21156delinsTT (p.Gly7052Val)] NM_182961.4(SYNE1):c.21155_21156delinsTT (p.Gly7052Val) Gene: SYNE1:spectrin repeat containing nuclear envelope protein 1 [Gene – OMIM – HGNC] Variant type: Indel Cytogenetic location: 6q25.2 Genomic location: Preferred name: NM_182961.4(SYNE1):c.21155_21156delinsTT (p.Gly7052Val) HGVS:…

Continue Reading NM_182961.4(SYNE1):c.21155_21156delinsTT (p.Gly7052Val) AND not provided – ClinVar

Sort a sub column within a column while keeping the feature (LINUX)

I have a VCF file with these column headers:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT BS_25YES2E3 BS_G5B6AD28 BS_QCGPE1ZX
A sample feature within that VCF file:
chr1 10450 . T C 27.94 VQSRTrancheSNP99.90to100.00+ AC=1;AF=0.167;AN=6;BaseQRankSum=-1.676e+00;ClippingRankSum=0.789;DP=102;ExcessHet=4.7712;FS=4.868;MLEAC=1;MLEAF=0.167;MQ=34.67;MQRankSum=-1.084e+00;PG=0,0,0;QD=1.55;ReadPosRankSum=-2.169e+00;SOR=0.707;VQSLOD=-1.050e+01;culprit=MQ;ANN=C|upstream_gene_variant|MODIFIER|**DDX11L1**|ENSG00000223972|Transcript|ENST00000450305|transcribed_unprocessed_pseudogene|||||||||||1560|1||SNV|HGNC|HGNC:37102||||chr1:g.10450T>C,C|upstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000456328|processed_transcript|||||||||||1419|1||SNV|HGNC|HGNC:37102|YES|||chr1:g.10450T>C,C|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000488147|unprocessed_pseudogene|||||||||||3954|-1||SNV|HGNC|HGNC:38034|YES|||chr1:g.10450T>C GT:AD:DP:FT:GQ:JL:JP:PL:PP 0/0:28,0:28:lowGQ:0:1:1:0,0,663:0,0,666 0/1:13,5:18:PASS:35:1:1:34,0,342:35,0,345 0/0:44,0:44:lowGQ:0:1:1:0,0,802:0,0,805
The portion in bold is what I want (DDX11L1). I…
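Pulling the gene symbol out of a record like the one above comes down to two nested splits: annotations are comma-separated inside ANN=, and fields within an annotation are pipe-separated. The Python sketch below assumes the symbol is the fourth pipe-delimited field, as it is in the sample record; in a real file the field order is declared in the VCF header and should be checked there. The shortened INFO string and the helper `gene_symbols_from_info` are illustrative only.

```python
def gene_symbols_from_info(info, symbol_index=3):
    """Return unique gene symbols, in first-seen order, from the ANN= entry of an INFO string."""
    annotations = []
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            annotations = entry[len("ANN="):].split(",")
            break
    symbols = []
    for annotation in annotations:
        fields = annotation.split("|")
        if len(fields) > symbol_index:
            symbol = fields[symbol_index].strip("*")  # drop bold markers, if any
            if symbol and symbol not in symbols:
                symbols.append(symbol)
    return symbols

# Shortened version of the INFO column from the record above.
info = (
    "AC=1;AF=0.167;ANN="
    "C|upstream_gene_variant|MODIFIER|**DDX11L1**|ENSG00000223972|Transcript|ENST00000450305,"
    "C|upstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000456328,"
    "C|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000488147"
)
print(gene_symbols_from_info(info))
```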

Continue Reading Sort a sub column within a column while keeping the feature (LINUX)

Genes associated with GO term

I’ve read in an older thread that to retrieve all of the gene names associated with a GO ID you use the biomaRt package, e.g.:
    library(biomaRt)
    ensembl = useMart("ensembl", dataset="hsapiens_gene_ensembl")
    gene.data <- getBM(attributes=c('hgnc_symbol', 'ensembl_transcript_id', 'go_id'), filters="go_id", values="GO:0072599", mart = ensembl)
However, I’m not sure this…

Continue Reading Genes associated with GO term

Extract variant consequence count from gnomad and patient VCF file

Hello, I have two types of VEP-annotated VCF files: a regular VCF and a gnomAD genome file. I would like to extract counts of missense, synonymous, upstream and intron variants for each gene in each file. Output should be something similar to this: MHTFR: missense 23, intron 100, synonymous…
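The per-gene tally described above can be sketched as a counter keyed by (gene, consequence). This Python example makes several assumptions that should be checked against the real files: the annotation string has consequence as the second and symbol as the fourth pipe-delimited field, multiple consequences are joined with "&", and a variant is counted once per gene and consequence even if several transcripts report it. The annotation strings, gene names, and `count_consequences` are hypothetical.

```python
from collections import Counter

WANTED = {"missense_variant", "synonymous_variant", "upstream_gene_variant", "intron_variant"}

def count_consequences(ann_values, wanted=WANTED):
    """Count variants per (gene, consequence); one annotation string per variant."""
    counts = Counter()
    for ann in ann_values:
        seen = set()  # de-duplicates transcripts of the same variant
        for annotation in ann.split(","):
            fields = annotation.split("|")
            if len(fields) < 4 or not fields[3]:
                continue
            for consequence in fields[1].split("&"):
                if consequence in wanted:
                    seen.add((fields[3], consequence))
        counts.update(seen)
    return counts

# Hypothetical annotation strings, one per variant.
ann_values = [
    "A|missense_variant|MODERATE|GENE1,A|intron_variant|MODIFIER|GENE1",
    "T|missense_variant|MODERATE|GENE1,T|missense_variant|MODERATE|GENE1",
    "G|synonymous_variant|LOW|GENE2",
    "C|splice_region_variant&intron_variant|LOW|GENE2",
]
tally = count_consequences(ann_values)
for (gene, consequence), n in sorted(tally.items()):
    print(gene, consequence, n)
```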

Continue Reading Extract variant consequence count from gnomad and patient VCF file

Comparative cellular analysis of motor cortex in human, marmoset and mouse

Statistics and reproducibility For multiplex fluorescent in situ hybridization (FISH) and immunofluorescence staining experiments, each ISH probe combination was repeated with similar results on at least two separate individuals per species, and on at least two sections per individual. The experiments were not randomized and the investigators were not blinded…

Continue Reading Comparative cellular analysis of motor cortex in human, marmoset and mouse

Reading microarray data from the Gene Expression Omnibus

Hi Caitlin, For this study, the uploaded data is normalised via GC-RMA, but is not log [base 2] (log2) transformed. To retrieve it, you need to do:
    library(GEOquery)
    gset <- getGEO('GSE12657', GSEMatrix = TRUE, getGPL = FALSE)
    if (length(gset) > 1) idx <- grep('GPL8300', attr(gset, 'names')) else idx <- 1
    gset…

Continue Reading Reading microarray data from the Gene Expression Omnibus

How Find Genes On Specific Positions

I have the same problem and I am trying to solve this all in R by doing this:
1. Use biomaRt to get the positions of all genes: genes <- getBM(c("hgnc_symbol", "ensembl_gene_id", "chromosome_name", "start_position", "end_position"), mart=mart)
2. Use GenomicRanges to find the overlap between my dataset, called "probes", and the output of biomaRt.
I still have not figured out…
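The overlap step reduces to a simple interval test: two closed intervals on the same chromosome overlap when each starts no later than the other ends. A minimal Python sketch of that test follows; the gene table, its coordinates, and the helper `genes_overlapping` are made up for illustration, and a linear scan like this is only reasonable for small inputs.

```python
# Hypothetical gene table with 1-based inclusive coordinates.
genes = [
    {"symbol": "GENE_A", "chrom": "1", "start": 1000, "end": 2000},
    {"symbol": "GENE_B", "chrom": "1", "start": 1800, "end": 2600},
    {"symbol": "GENE_C", "chrom": "2", "start": 1000, "end": 2000},
]

def genes_overlapping(chrom, start, end, genes):
    """Return symbols of genes on chrom whose interval overlaps [start, end]."""
    return [
        g["symbol"]
        for g in genes
        if g["chrom"] == chrom and g["start"] <= end and start <= g["end"]
    ]

print(genes_overlapping("1", 1900, 1950, genes))  # a probe inside two genes
print(genes_overlapping("1", 2601, 2700, genes))  # a probe just past GENE_B
```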

Continue Reading How Find Genes On Specific Positions

Ttc30a affects tubulin modifications in a model for ciliary chondrodysplasia with polycystic kidney disease

Significance Cilia are tubulin-based cellular appendages, and their dysfunction has been linked to a variety of genetic diseases. Ciliary chondrodysplasia is one such condition that can co-occur with cystic kidney disease and other organ manifestations. We modeled skeletal ciliopathies by mutating two established disease genes in Xenopus tropicalis frogs. Bioinformatic…

Continue Reading Ttc30a affects tubulin modifications in a model for ciliary chondrodysplasia with polycystic kidney disease

Fetch HGNC id from gene symbol in python

Hi, I have thousands of gene symbols and I want to get their HGNC IDs. I tried downloading the HGNC RDF file from BioPortal and used SPARQL to fetch most of the gene IDs. However, the HGNC RDF from BioPortal was…
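An alternative to SPARQL is a plain lookup table built from a tab-delimited symbol list. The Python sketch below is illustrative only: it assumes a file with `hgnc_id` and `symbol` columns (check the header of the file you actually download), uses an inline sample in place of a real download, and `load_symbol_to_hgnc` is a hypothetical helper. The three symbol and ID pairs are ones that appear elsewhere on this page.

```python
import csv
import io

# Inline stand-in for a tab-delimited download; column names are assumed.
SAMPLE_TSV = (
    "hgnc_id\tsymbol\n"
    "HGNC:28706\tSAMD11\n"
    "HGNC:37102\tDDX11L1\n"
    "HGNC:38034\tWASH7P\n"
)

def load_symbol_to_hgnc(handle):
    """Build a {symbol: hgnc_id} dictionary from a tab-delimited file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["symbol"]: row["hgnc_id"] for row in reader}

symbol_to_hgnc = load_symbol_to_hgnc(io.StringIO(SAMPLE_TSV))

# Symbols that are missing (e.g. outdated aliases) come back as None.
queries = ["SAMD11", "WASH7P", "NOT_A_SYMBOL"]
results = {symbol: symbol_to_hgnc.get(symbol) for symbol in queries}
print(results)
```

For a real file, `open(path, newline="")` would replace the `io.StringIO` wrapper; symbols that return None are candidates for a previous-symbol or alias lookup.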

Continue Reading Fetch HGNC id from gene symbol in python