Tag: HGNC

Efficient way of mapping UniProt IDs to representative UniRef90 IDs?

You can do this directly on UniProt (www.uniprot.org/uploadlists/): just paste or upload your list of UniProt IDs, then select “UniProtKB AC/ID” in the “From” field and “UniParc” in the “To” field. I’ve also written a script, pasted below, that can do this with some useful options: $ uniprot_map.pl -h uniprot_map.pl…

gene ID RNAseq

Hi friends, how can I get the numeric gene ID and the HUGO ID with an R script? Which script should I use? I have this, but it does not give the numeric ID or the HUGO ID: library(biomaRt) library(dplyr) library(tibble) attributeNames <- c("ensembl_gene_id", "external_gene_name", "HGNC_ID", "chromosome_name", "description") filterValues <- rownames(res) Annotations <- getBM(attributes = attributeNames, filters =…

Ensembl VEP gnomAD annotated allele frequencies different from gnomAD browser

I’ve annotated some variants using VEP, and was looking at the minor allele frequencies. Some of the variants had very different MAFs in the annotation than I expected (I expected MAF < 1%, whereas some annotated MAFs were >50%). I looked up the same variants on the gnomAD v3 browser,…

How to convert transcript-relative coordinates to genomic coordinates?

I have queried using Entrez Utilities (efetch: www.ncbi.nlm.nih.gov/books/NBK25499/) and obtained annotations for transcripts like the following: >Feature ref|NM_152486.3| 1 2557 gene gene SAMD11 gene_syn MRS gene_desc sterile alpha motif domain containing 11 db_xref GeneID:148398 db_xref HGNC:HGNC:28706 db_xref MIM:616765 How/what database should…

FREM1 Vertebrate HGNC

Vertebrate Homology. Alliance Homology Information: MGI loads orthology data based on the ‘stringent’ set from the Alliance of Genome Resources. The Alliance sets are based on a scoring system developed by the Alliance in collaboration with DIOPT. MGI includes orthology for the following vertebrate species from the…

KINNEY_DNMT1_METHYLATION_TARGETS

Standard name: KINNEY_DNMT1_METHYLATION_TARGETS. Systematic name: M2508. Brief description: Hypomethylated genes in prostate tissue from mice carrying hypomorphic alleles of DNMT1 [GeneID=1786]. Full description or abstract: Previous studies have shown that tumor progression in the transgenic adenocarcinoma of mouse prostate (TRAMP) model is characterized by global DNA hypomethylation initiated during early-stage…

Help needed for Ensembl Gene ID conversion for RNA-seq data

Hello All, I am new to the RNA-seq world and especially new to the bioinformatics side. We recently completed an RNA-seq experiment (total RNA) on human samples, and we used Illumina’s DRAGEN RNA pipeline, which generated Salmon gene count (.sf) output files. In the files, the gene ID is in…

Cellosaurus cell line HEK293T-CAF40-null (CVCL_A5EE)

Cell line name: HEK293T-CAF40-null. Synonyms: CAF40-null HEK293T. Accession: CVCL_A5EE. Resource Identification Initiative: To cite this cell line use: HEK293T-CAF40-null (RRID:CVCL_A5EE). Comments: Doubling time: ~30-40 hours (DSMZ). Knockout cell: Method=CRISPR/Cas9; HGNC; 10445; CNOT9. Transfected with: UniProtKB; P00552; Transposon Tn5 neo. Transformant: NCBI_TaxID; 28285; Adenovirus 5. Transformant: NCBI_TaxID; 1891767; Simian virus 40 (SV40) [tsA]. Derived from sampling…

Change separator just between specific columns

I am trying to change the separator just between columns 1 and 9. After that, I would like to maintain the original separator. These are the first lines of my file, both when reading it directly and when od -c file is executed: #description: evidence-based annotation of the human genome (GRCh38),…

Gene Id Conversion Tool

MyGene.info is a web service that provides up-to-date annotations in several fields and is great for gene ID conversion. All species from NCBI and Ensembl are supported, and annotations are updated weekly to ensure the latest annotations are available. Both Python and R/Bioconductor clients are easy to use…

NM_000018.4(ACADVL):c.879-8T>A AND Very long chain acyl-CoA dehydrogenase deficiency – ClinVar

NM_000018.4(ACADVL):c.879-8T>A AND Very long chain acyl-CoA dehydrogenase deficiency Based on: 1 submission [Details] Record status: current Accession: RCV001200781.1 Allele description [Variation Report for NM_000018.4(ACADVL):c.879-8T>A] NM_000018.4(ACADVL):c.879-8T>A Gene: ACADVL:acyl-CoA dehydrogenase very long chain [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 17p13.1 Genomic location: Preferred name: NM_000018.4(ACADVL):c.879-8T>A HGVS:…

NM_005359.6(SMAD4):c.1473T>C (p.Gly491=) AND not specified – ClinVar

NM_005359.6(SMAD4):c.1473T>C (p.Gly491=) AND not specified Based on: 1 submission [Details] Record status: current Accession: RCV000780718.2 Allele description [Variation Report for NM_005359.6(SMAD4):c.1473T>C (p.Gly491=)] NM_005359.6(SMAD4):c.1473T>C (p.Gly491=) Gene: SMAD4:SMAD family member 4 [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 18q21.2 Genomic location: Preferred name: NM_005359.6(SMAD4):c.1473T>C (p.Gly491=) HGVS: NC_000018.10:g.51078281T>C…

feutureCount in the subread

Hello Everyone, I am quantifying read counts in a BAM file using featureCounts on the command line, but I am getting the error below: ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file. The specified gene identifier attribute is 'gene_id' An…

NM_005359.6(SMAD4):c.-20A>C AND not specified – ClinVar

NM_005359.6(SMAD4):c.-20A>C AND not specified Based on: 1 submission [Details] Record status: current Accession: RCV000444837.1 Allele description [Variation Report for NM_005359.6(SMAD4):c.-20A>C] NM_005359.6(SMAD4):c.-20A>C Gene: SMAD4:SMAD family member 4 [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 18q21.2 Genomic location: Preferred name: NM_005359.6(SMAD4):c.-20A>C HGVS: NC_000018.10:g.51047027A>C NG_013013.2:g.83988A>C NM_005359.6:c.-20A>C MANE SELECT LRG_318t1:c.-20A>C…

Using VEP to get gnomAD frequencies

Hi all, I am using Ensembl VEP (command line) to annotate a VCF I have. I am specifically looking for gnomAD allele frequencies, which is fairly straightforward to do, technically speaking. However, the data looks off in some cases. For example, when I pass in: 10 69408929 COSM3751912 A…

When I convert the Ensembl IDs to gene symbols, why lots of genes are duplicated?

Hi all, I have raw counts of samples in a data frame. The row names are Ensembl IDs and I want to convert them to gene symbols, so I’ve run the code below. query <- GDCquery(project = "TCGA-COAD", data.category = "Transcriptome Profiling", data.type = "Gene Expression Quantification", workflow.type…

GENCODE – Ribo-seq ORFs

In recent years, Ribosome Profiling (Ribo-seq) has been used to detect thousands of non-canonical – i.e. unannotated – translated open reading frames (ORFs) in the human genome. GENCODE have now embarked on a long-term community-driven project to incorporate these features into reference gene annotation. This pioneering work is being done…

NM_000138.5(FBN1):c.1215C>A (p.Pro405=) AND Familial thoracic aortic aneurysm and aortic dissection – ClinVar

NM_000138.5(FBN1):c.1215C>A (p.Pro405=) AND Familial thoracic aortic aneurysm and aortic dissection Based on: 1 submission [Details] Record status: current Accession: RCV001192329.1 Allele description [Variation Report for NM_000138.5(FBN1):c.1215C>A (p.Pro405=)] NM_000138.5(FBN1):c.1215C>A (p.Pro405=) Gene: FBN1:fibrillin 1 [Gene – OMIM – HGNC] Variant type: single nucleotide variant Cytogenetic location: 15q21.1 Genomic location: Preferred name: NM_000138.5(FBN1):c.1215C>A…

Gene expression (RNA-seq) clustering

Unsupervised class discovery is a data mining method to identify unknown possible groups (clusters) of items solely based on intrinsic features and no external variables. Basically, clustering includes four steps: (1) data preparation and feature selection, (2) dissimilarity matrix calculation, (3) applying clustering algorithms, and (4) assessing cluster assignment. I use…
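
The four steps can be sketched in plain Python. This is only an illustration under stated assumptions, not necessarily the workflow the post goes on to describe: the function names, the toy expression values, the choice of Pearson distance, average linkage and the silhouette score are all assumptions made for the example.

```python
from statistics import mean, pvariance

def pearson_distance(a, b):
    """Return 1 - Pearson correlation (1.0 if either profile is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return 1.0 - cov / norm if norm else 1.0

def cluster_samples(expr, n_genes, k):
    """expr maps gene -> list of values (one per sample); returns (clusters, dist)."""
    # Step 1: feature selection, keeping the n_genes most variable genes.
    top = sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)[:n_genes]
    n_samples = len(expr[top[0]])
    profiles = [[expr[g][s] for g in top] for s in range(n_samples)]
    # Step 2: sample-by-sample dissimilarity matrix.
    dist = [[pearson_distance(p, q) for q in profiles] for p in profiles]
    # Step 3: average-linkage agglomerative clustering down to k clusters.
    clusters = [[s] for s in range(n_samples)]
    while len(clusters) > k:
        pairs = [(x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))]
        a, b = min(pairs, key=lambda p: mean(
            dist[i][j] for i in clusters[p[0]] for j in clusters[p[1]]))
        clusters[a].extend(clusters.pop(b))
    return clusters, dist

def mean_silhouette(clusters, dist):
    """Step 4: assess the assignment (requires at least two clusters)."""
    scores = []
    for c in clusters:
        for i in c:
            if len(c) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            a = mean(dist[i][j] for j in c if j != i)
            b = min(mean(dist[i][j] for j in other) for other in clusters if other is not c)
            scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return mean(scores)

# Hypothetical toy data: four samples, four genes (g4 is uninformative).
expr = {
    "g1": [10, 11, 1, 2],
    "g2": [1, 2, 10, 12],
    "g3": [8, 9, 2, 1],
    "g4": [5, 5, 5, 5],
}
clusters, dist = cluster_samples(expr, n_genes=3, k=2)
print(clusters, round(mean_silhouette(clusters, dist), 3))
```

For real RNA-seq data the counts would normally be normalised and transformed before step 1, and a dedicated clustering library would replace the quadratic loops used here.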

Converting Ensembl gene id to Gene symbol

Hi all, As mentioned earlier in this post, I tried to convert the Ensembl gene IDs to gene symbols. I didn’t receive any error from the code below, but the nrow of ens_to_symbol_biomart is 55605 and the length of ens is…

“Given ref” field is empty when a ref. allele was in VCF input

Hi there, I’m running VEP using the following command: ref="GRCh38.primary_assembly.genome.fa" vep="/opt/vep_ensembl/ensembl-vep/vep" for ea in *Somatic.hc.vcf do $vep -i $ea -o vep/"$(echo $ea | sed s/.vcf//)"_VEP.txt --cache --dir_cache "/home/shared/vep_cache/" --assembly GRCh38 --merged --fasta $ref --hgvs --hgvsg…

NM_182961.4(SYNE1):c.21155_21156delinsTT (p.Gly7052Val) AND not provided – ClinVar

NM_182961.4(SYNE1):c.21155_21156delinsTT (p.Gly7052Val) AND not provided Based on: 1 submission [Details] Record status: current Accession: RCV000597527.1 Allele description [Variation Report for NM_182961.4(SYNE1):c.21155_21156delinsTT (p.Gly7052Val)] NM_182961.4(SYNE1):c.21155_21156delinsTT (p.Gly7052Val) Gene: SYNE1:spectrin repeat containing nuclear envelope protein 1 [Gene – OMIM – HGNC] Variant type: Indel Cytogenetic location: 6q25.2 Genomic location: Preferred name: NM_182961.4(SYNE1):c.21155_21156delinsTT (p.Gly7052Val) HGVS:…

Sort a sub column within a column while keeping the feature (LINUX)

I have a vcf file with these column headers: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT BS_25YES2E3 BS_G5B6AD28 BS_QCGPE1ZX A sample feature within that vcf file chr1 10450 . T C 27.94 VQSRTrancheSNP99.90to100.00+ AC=1;AF=0.167;AN=6;BaseQRankSum=-1.676e+00;ClippingRankSum=0.789;DP=102;ExcessHet=4.7712;FS=4.868;MLEAC=1;MLEAF=0.167;MQ=34.67;MQRankSum=-1.084e+00;PG=0,0,0;QD=1.55;ReadPosRankSum=-2.169e+00;SOR=0.707;VQSLOD=-1.050e+01;culprit=MQ;ANN=C|upstream_gene_variant|MODIFIER|**DDX11L1**|ENSG00000223972|Transcript|ENST00000450305|transcribed_unprocessed_pseudogene|||||||||||1560|1||SNV|HGNC|HGNC:37102||||chr1:g.10450T>C,C|upstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000456328|processed_transcript|||||||||||1419|1||SNV|HGNC|HGNC:37102|YES|||chr1:g.10450T>C,C|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000488147|unprocessed_pseudogene|||||||||||3954|-1||SNV|HGNC|HGNC:38034|YES|||chr1:g.10450T>C GT:AD:DP:FT:GQ:JL:JP:PL:PP 0/0:28,0:28:lowGQ:0:1:1:0,0,663:0,0,666 0/1:13,5:18:PASS:35:1:1:34,0,342:35,0,345 0/0:44,0:44:lowGQ:0:1:1:0,0,802:0,0,805 The portion in bold is what I want (DDX11L1). I…
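
Pulling the gene symbol out of the ANN entry can be sketched in Python as below. This is a minimal sketch: it assumes the symbol is the fourth pipe-separated field of each annotation, as in the sample record above, and the function name and the abbreviated INFO string are made up for the example. What should be sorted once the symbols are extracted is not visible in the truncated question, so the sketch stops at extraction.

```python
def gene_symbols_from_info(info):
    """Return the distinct gene symbols in a VEP-style ANN entry, in order of appearance."""
    symbols = []
    for entry in info.split(";"):
        if not entry.startswith("ANN="):
            continue
        # One comma-separated block per annotated transcript; fields are pipe-separated.
        for annotation in entry[len("ANN="):].split(","):
            fields = annotation.split("|")
            # Assumption: the gene symbol is the 4th field, as in the sample record.
            if len(fields) > 3 and fields[3] and fields[3] not in symbols:
                symbols.append(fields[3])
    return symbols

# Abbreviated INFO column modelled on the sample record (most fields dropped).
info = (
    "AC=1;AF=0.167;ANN="
    "C|upstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000450305,"
    "C|upstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000456328,"
    "C|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000488147"
)
print(gene_symbols_from_info(info))
```

In a real file the field order should be read from the ANN description in the VCF header rather than hard-coded.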

Genes associated with GO term

I’ve read in an older thread that to retrieve all of the gene names associated with a GO ID you use the biomaRt package, e.g.: library(biomaRt) ensembl = useMart("ensembl", dataset="hsapiens_gene_ensembl") gene.data <- getBM(attributes=c('hgnc_symbol', 'ensembl_transcript_id', 'go_id'), filters="go_id", values="GO:0072599", mart = ensembl) However, I’m not sure this…

Extract variant consequence count from gnomad and patient VCF file

Hello, I have two types of VEP-annotated VCF files: a regular VCF and a gnomAD genome file. I would like to extract counts of missense, synonymous, upstream and intron variants for each gene in each file. The output should be something similar to this: MHTFR: missense 23, intron 100, synonymous…
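
One way to approach such per-gene counts is sketched below in Python. It is an illustration only: it assumes the annotations sit in a `CSQ` INFO entry whose fields start with Allele|Consequence|IMPACT|SYMBOL (the real order is declared in the VCF header and should be checked), and the function name, gene names and records are hypothetical.

```python
from collections import Counter, defaultdict

WANTED = ("missense_variant", "synonymous_variant",
          "upstream_gene_variant", "intron_variant")

def count_consequences(vcf_lines, tag="CSQ", csq_idx=1, symbol_idx=3):
    """Count variants per gene and consequence; each variant counts once per pair."""
    counts = defaultdict(Counter)
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue
        seen = set()  # (gene, consequence) pairs already counted for this variant
        for entry in columns[7].split(";"):
            if not entry.startswith(tag + "="):
                continue
            for annotation in entry[len(tag) + 1:].split(","):
                fields = annotation.split("|")
                if len(fields) <= max(csq_idx, symbol_idx):
                    continue
                gene = fields[symbol_idx]
                # VEP joins multiple consequences of one transcript with "&".
                for consequence in fields[csq_idx].split("&"):
                    key = (gene, consequence)
                    if gene and consequence in WANTED and key not in seen:
                        seen.add(key)
                        counts[gene][consequence] += 1
    return counts

# Hypothetical records; GENE_A and GENE_B are placeholder symbols.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "1\t100\t.\tA\tG\t.\tPASS\tCSQ=G|missense_variant|MODERATE|GENE_A,G|missense_variant|MODERATE|GENE_A",
    "1\t200\t.\tC\tT\t.\tPASS\tCSQ=T|intron_variant|MODIFIER|GENE_A",
    "1\t300\t.\tG\tA\t.\tPASS\tAC=2;CSQ=A|synonymous_variant|LOW|GENE_B",
]
for gene, per_gene in count_consequences(vcf_lines).items():
    print(gene, dict(per_gene))
```

Counting each variant once per gene and consequence, rather than once per transcript, is a design choice; drop the `seen` set if per-transcript counts are wanted.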

Comparative cellular analysis of motor cortex in human, marmoset and mouse

Statistics and reproducibility For multiplex fluorescent in situ hybridization (FISH) and immunofluorescence staining experiments, each ISH probe combination was repeated with similar results on at least two separate individuals per species, and on at least two sections per individual. The experiments were not randomized and the investigators were not blinded…

Reading microarray data from the Gene Expression Omnibus

Hi Caitlin, For this study, the uploaded data is normalised via GC-RMA, but is not log [base 2] (log2) transformed. To retrieve it, you need to do: library(GEOquery) gset <- getGEO('GSE12657', GSEMatrix = TRUE, getGPL = FALSE) if (length(gset) > 1) idx <- grep('GPL8300', attr(gset, 'names')) else idx <- 1 gset…

How Find Genes On Specific Positions

I have the same problem and I am trying to solve it all in R by doing this: Use biomaRt to get the positions of all genes: genes <- getBM(c("hgnc_symbol", "ensembl_gene_id", "chromosome_name", "start_position", "end_position"), mart=mart) Use GenomicRanges to find the overlap between my dataset, called “probes”, and the output of biomaRt. I still have not figured out…
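
The overlap step itself is simple interval logic, sketched here in Python rather than R to show what is being computed. The function name, tuple layout and coordinates are hypothetical; in R, `findOverlaps` from GenomicRanges performs the equivalent operation far more efficiently.

```python
def find_overlaps(probes, genes):
    """Return (probe_id, gene_symbol) for every probe overlapping a gene.

    probes: iterable of (probe_id, chromosome, start, end)
    genes:  iterable of (symbol, chromosome, start, end)
    Coordinates are treated as 1-based closed intervals.
    """
    by_chrom = {}
    for symbol, chrom, start, end in genes:
        by_chrom.setdefault(chrom, []).append((start, end, symbol))
    hits = []
    for probe_id, chrom, start, end in probes:
        for g_start, g_end, symbol in by_chrom.get(chrom, []):
            # Two closed intervals overlap when each starts before the other ends.
            if start <= g_end and g_start <= end:
                hits.append((probe_id, symbol))
    return hits

# Hypothetical gene table (as might come back from getBM) and probe positions.
genes = [("GENE_A", "1", 100, 500), ("GENE_B", "1", 450, 900), ("GENE_C", "2", 100, 200)]
probes = [("p1", "1", 480, 480), ("p2", "1", 950, 960), ("p3", "2", 150, 250)]
print(find_overlaps(probes, genes))
```

This scans every gene on the probe's chromosome, which is fine for small inputs; for genome-scale data a sorted or tree-based index would be the better choice.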

Ttc30a affects tubulin modifications in a model for ciliary chondrodysplasia with polycystic kidney disease

Significance Cilia are tubulin-based cellular appendages, and their dysfunction has been linked to a variety of genetic diseases. Ciliary chondrodysplasia is one such condition that can co-occur with cystic kidney disease and other organ manifestations. We modeled skeletal ciliopathies by mutating two established disease genes in Xenopus tropicalis frogs. Bioinformatic…

Fetch HGNC id from gene symbol in python

Hi, I have thousands of gene symbols and I want to get their HGNC IDs. I tried downloading the HGNC RDF file from BioPortal and used SPARQL to fetch most of the gene IDs. However, the HGNC RDF from BioPortal was…

Editing header of a fasta file

Hello everybody, I’ve been using sed, but only for simple steps, and now I can’t do this. I have this header: >ENSP00000451042.1 pep chromosome:GRCh38:14:22438547:22438554:1 gene:ENSG00000223997.1 transcript:ENST00000415118.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRDD1 description:T cell receptor delta diversity 1 [Source:HGNC Symbol;Acc:HGNC:12254] and I would like to obtain this:…

Get gene names from rs SNP ids

Gene to rs ID: library(biomaRt) ## It might take a long time to process if there are many genes (>50) in the list. ## hgnc_gene_symbols.txt is the file that has the list of gene symbols, one per line. genes <- read.table("~/hgnc_gene_symbols.txt") ensembl = useMart("ensembl", dataset="hsapiens_gene_ensembl") dbsnp = useMart("snp", dataset = "hsapiens_snp") getHGNC2ENSG =…

Running htseq-count to “grab” long non coding gene_id names

Hi all, I am new to bioinformatics, so bear with me. I am trying to find long non-coding RNAs from RNA-seq data. When I checked the human GTF file, there were 2 different types of long non-coding RNA, “lnc_RNA” and “lncRNA”,…

where do I find transcript_biotype

Hi newbie_r, I am unsure; however, via biomaRt in R, one can generate a master table that has biotypes for Ensembl and RefSeq ‘transcripts’. require(biomaRt) ensembl <- useMart('ensembl', dataset="hsapiens_gene_ensembl") annot <- getBM( attributes = c( 'hgnc_symbol', 'ensembl_gene_id', 'ensembl_transcript_id', 'entrezgene_id', 'refseq_mrna', 'gene_biotype'), mart = ensembl)…

Download TCGA and GTEX data from Xena toilHub for (full genome but for 1 cancer/tissue type)

Dear All, I would like to download TCGA and GTEx gene expression data for ovarian cancer and ovary, respectively, from the Xena toilHub platform (all genes; RSEM expected counts). However, I only found…

How to map old Ensembl Gene IDs to HGNC symbols and Entrez IDs

I have a list of Ensembl version 74 gene IDs that I need to convert to HGNC symbols. I was wondering what the best way to go about this would be? Originally I used biomaRt with…

Find overlaping sequences with pyranges from overlap

I am trying to replicate the mergeByOverlaps function from R/Bioconductor in Python using the pyranges package. In R the code would be: gr.snp <- with(gr.snp, GRanges(chr, IRanges(start, end), rsid=gr.snp$rsid)) snp.annotated <- data.frame(mergeByOverlaps(gr.snp, gencode, maxgap=2000, type="start")) which returns: nrow(snp.annotated) [1] 34 colnames(snp.annotated) [1] "gr.snp.seqnames" "gr.snp.start" [3] "gr.snp.end" "gr.snp.width" [5] "gr.snp.strand" "gr.snp.rsid"…

Converting mouse Gene IDs to Human while keeping genes that don’t convert

Hi there, I am using biomaRt to convert some gene IDs from mouse to human for some data I generated through RNA-seq. I am currently mapping using the following function: convertMouseGeneList <- function(x){ require("biomaRt") human = useMart("ensembl", dataset = "hsapiens_gene_ensembl") mouse = useMart("ensembl", dataset = "mmusculus_gene_ensembl") genesV2 = getLDS(attributes =…

How to search dbSNP using a list of SNPs and retrieve Gene name (hgnc symbol if existing, otherwise just whatever is in there)

I have a list of 500,000 SNPs for which I want to obtain the gene names. I tried to search with biomaRt: library(data.table) library(biomaRt) rs <-…
