Tag: ensembl
Finding Entrez IDs for RefSeq IDs
Hi all, I have a list of bacterial RefSeq IDs corresponding to protein sequences (e.g., WP_007430823.1, WP_019686959.1, etc.). I need to retrieve the corresponding Entrez IDs for these RefSeq IDs in order to continue the downstream RNA-seq analysis (GO enrichment analysis). Here’s…
Error in CIBERSORTx
Hello, I am trying to use CIBERSORT to deconvolute the immune cells in pancreatic cancer after my treatments. I have 3 biological replicates of Control and Treatments A, B, C. Using edgeR, I created the CPM matrix, which is not log-transformed, and converted it to the required format as follows: # Load…
No gene can be mapped
Hi, when I run Gene Set Enrichment Analysis on my data with clusterProfiler, using Mohammed Khalfan’s code from the website, the following code gives an error message: gse <- gseGO(geneList=gene_list, ont = "ALL", keyType = "ENSEMBL", nPerm = 10000, minGSSize = 3, maxGSSize = 800, pvalueCutoff =…
Confusion about “Top Features table” in cellranger web summary output for CITE-Seq libraries
I am reviewing the Web Summary (count) documentation on the 10x Genomics website and became confused as to what is being displayed in the Top Features by Cluster table under the Antibody tab (see screenshot below)…
Genomic hypomethylation in cell-free DNA predicts responses to checkpoint blockade in lung and breast cancer
Lung cancer ICB cohort Advanced non-small cell lung carcinoma patients who were treated with anti-PD-1/PD-L1 monotherapy at Samsung Medical Center, Seoul, Republic of Korea were enrolled for this study. The present study has been reviewed and approved by the Institutional Review Board (IRB) of the Samsung Medical Center (IRB no….
An FGFR2 mutation as the potential cause of a new phenotype including early-onset osteoporosis and bone fractures: a case report | BMC Medical Genomics
Anamnesis vitae A 13-year-old male was born as a result of the VII pregnancy to unrelated parents. Other pregnancies resulted in: I-II silent miscarriage in the second trimester; III – female, born in 2003 (III-3 Fig. 1) who has the following phenotypic features: genu valgum, hip dysplasia, combined thoracolumbar scoliosis,…
Annotate variants with ensembl rest api
I have a variant file (.vcf.gz), and I want to annotate it using the Ensembl REST API, particularly the VEP REST API. I am new to variant annotation; however, I have seen a couple of code examples on the Ensembl page on…
A high-resolution transcriptomic and spatial atlas of cell types in the whole mouse brain
Mouse breeding and husbandry All experimental procedures related to the use of mice were approved by the Institutional Animal Care and Use Committee of the AIBS, in accordance with NIH guidelines. Mice were housed in a room with temperature (21–22 °C) and humidity (40–51%) control within the vivarium of the AIBS…
Conserved and divergent gene regulatory programs of the mammalian neocortex
Nucleus preparation from frozen brain tissue for Chromium single-cell multiome ATAC and gene expression analysis M1 tissue was obtained from three human donors (male, aged 42, 29 and 58 years), three macaque donors (male, aged 6 (Macaca mulatta), 6 (M. mulatta) and 14 (Macaca fascicularis) years), three marmoset (Callithrix jacchus)…
Beyond the exome: utility of long-read whole genome sequencing in exome-negative autosomal recessive diseases | Genome Medicine
Our cohort comprises 34 families in which a presumably autosomal recessive disease defied molecular diagnosis by clinical exome sequencing (short-read sequencing-based) and reanalysis performed on the index individual for each family (Fig. 1). The index patient in each family was subjected to an average of 10 × depth lrWGS except for Family F8602…
ORA with clusterProfiler
Hello everyone, I am trying to do an enrichment analysis of Arabidopsis data, but I am unsure how to build the background (universe) or what to use for it. Could you guide me? I am working with this example: diff_genes <- read_delim(file = "differential_genes.tsv", delim = "\t") biomartr::organismBM(organism =…
Solved Currently, ENSEMBL GENE IDs with their versions
Currently, ENSEMBL GENE IDs with their versions represent each unique gene in each row. As an example, for ENSG00000000003.15, ENSG00000000003 represents the unique Ensembl gene ID and 15 represents the version. In biology, we are more familiar with the gene symbol, known as the HGNC SYMBOL. ENSG00000000003.15 Ensembl ID corresponds…
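As a minimal sketch of the ID layout described above, a versioned Ensembl ID can be split at the dot in Python. The helper name `split_ensembl_id` is hypothetical and not from the original post, and the sketch assumes the suffix after the dot is a plain integer.

```python
from typing import Optional, Tuple


def split_ensembl_id(versioned_id: str) -> Tuple[str, Optional[int]]:
    """Split 'ENSG00000000003.15' into the stable ID and its version.

    Assumption: the part after the first dot, if present, is a plain integer.
    """
    stable_id, dot, version = versioned_id.partition(".")
    if not dot:
        # No version suffix present, e.g. 'ENSG00000000003'
        return stable_id, None
    return stable_id, int(version)


stable, version = split_ensembl_id("ENSG00000000003.15")
print(stable, version)
```

Keeping the version as a separate value, rather than discarding it, makes it possible to join on the unversioned ID while still recording which annotation release a row came from.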
A CNN based m5c RNA methylation predictor
Hammad, M. et al. A novel end-to-end deep learning approach for cancer detection based on microscopic medical images. Biocybern. Biomed. Eng. 42(3), 737–748 (2022). Hammad, M. et al. Efficient multimodal deep-learning-based covid-19 diagnostic system for noisy and corrupted images. J. King Saud Univ.-Sci. 34(3), 101898 (2022). …
How to query 1000 genomes project VCF files for specific regions without downloading whole chromosomes first?
Hi, I am trying to find a way to extract an arbitrary region of the human genome from the 1000 Genomes Project’s VCF files without having to download the genome or individual chromosome files…
RNA secondary structure for GRCh38 transcriptome
Hi, I am wondering if I could download dot-bracket notation RNA secondary structure data for the whole human transcriptome from somewhere. I know there are tools, but they take too long to run even on a modest number of RNA sequences. Thus I…
Annotation GTF/GFF Arabidopsis thaliana
Hello, this is my first time working with Arabidopsis and I am quantifying with featureCounts as follows: featureCounts -p --countReadPairs -t exon -g gene_id -a ../genome_arabidopsis/Arabidopsis_thaliana.TAIR10.57.gtf -o SRR14059988.txt ../alignment_hisat2/SRR14059988_sorted.bam However, my results include counts associated with long non-coding, ribosomal, mitochondrial and…
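One possible way to restrict counting to a single class of genes is to filter the GTF before quantification. The sketch below is not from the original post: it assumes an Ensembl-style GTF whose ninth column carries a `gene_biotype` attribute, the function name is hypothetical, and the example records are illustrative rather than copied from the annotation file.

```python
import re
from typing import Iterable, Iterator

# Assumption: Ensembl-style attribute, e.g. gene_biotype "protein_coding";
_BIOTYPE = re.compile(r'gene_biotype "([^"]+)"')


def filter_gtf_by_biotype(lines: Iterable[str], keep: str = "protein_coding") -> Iterator[str]:
    """Yield header lines plus records whose gene_biotype equals `keep`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines untouched
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed records
        match = _BIOTYPE.search(fields[8])
        if match and match.group(1) == keep:
            yield line


# Illustrative records only
example = [
    "#!genome-build TAIR10\n",
    '1\texample\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; gene_biotype "protein_coding";\n',
    '1\texample\texon\t500\t600\t.\t+\t.\tgene_id "geneB"; gene_biotype "lncRNA";\n',
]
kept = list(filter_gtf_by_biotype(example))
print(len(kept))
```

Filtering the annotation rather than the count table keeps the choice of gene classes explicit and reproducible, at the cost of maintaining a second GTF file.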
kallisto index build difference according to version
Hi all, I’m trying to use kallisto on a dataset of single-end RNA-seq data, and naturally started by building an index (the files were downloaded from Ensembl): Homo_sapiens.GRCh37.ncrna.fa.gz Homo_sapiens.GRCh37.cdna.all.fa.gz using the command kallisto index -i index.idx Homo_sapiens.GRCh37.ncrna.fa.gz Homo_sapiens.GRCh37.cdna.all.fa.gz And although this wasn’t…
Fastest way to convert BED to GTF/GFF with gene_ids?
This is probably a duplicated question from: How To Convert Bed Format To Gtf? How to convert original BED file to a GTF ? Converting different annotation file formats (GTF/GFF/BED) to each other How to change scaffold.fasta file or scaffold.bed file to GTF file? Convert bed12 to GFF convert bed12…
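As a rough illustration of what such a conversion involves, the sketch below turns one BED6 line into one GTF line. It is a simplification under stated assumptions, not a replacement for a dedicated converter: the BED name column is reused as both `gene_id` and `transcript_id`, the `source` and `feature` defaults are arbitrary, and BED12 block structure is not handled. The one fixed rule is the coordinate shift: BED starts are 0-based and ends are exclusive, while GTF uses 1-based inclusive coordinates.

```python
def bed6_to_gtf(bed_line: str, source: str = "bed2gtf", feature: str = "exon") -> str:
    """Convert a single tab-separated BED6 line into a GTF line.

    Assumption: the BED name column serves as both gene_id and transcript_id.
    """
    chrom, start, end, name, score, strand = bed_line.rstrip("\n").split("\t")[:6]
    gtf_start = int(start) + 1  # 0-based BED start -> 1-based GTF start
    # The BED end is exclusive, so it equals the inclusive 1-based GTF end.
    attributes = f'gene_id "{name}"; transcript_id "{name}";'
    return "\t".join(
        [chrom, source, feature, str(gtf_start), end, score, strand, ".", attributes]
    )


print(bed6_to_gtf("chr1\t99\t200\tgeneA\t0\t+"))
```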
Music compensates for altered gene expression in age-related cognitive disorders
Global impact of music on the human transcriptome We first aimed at quantifying the global effect of music on the transcriptomes of the two groups of donors separately. ACD patients exposed to music showed 2.3 times more DEGs (n = 2605) than controls (n = 1148); Table 2. Moreover, while the proportion up-regulated/down-regulated DEGs…
Does gnomAD use all LOFTEE LoF filters?
Hi all, I have some LoF variants and I want to know if they have already been detected in gnomAD (I am essentially curious whether my LoF variants are novel). In order to make this comparison to gnomAD I have run my…
Identification of constrained sequence elements across 239 primate genomes
De novo assembly and repeat-masking To maximize the species diversity of primates in our analyses, we newly sequenced and assembled the genomes of 187 different primate species, initially presented in refs. 11,23, for which no other reference genome assembly was available. In brief, each individual was sequenced with 150 bp paired…
How to do KEGG pathway analysis when I have a gene with multiple Entrez IDs?
Hello, I ran DESeq2 on my samples and I have a list of DEGs that I would like to use for KEGG pathway analysis. For DESeq2 I used biomart and tximport to assign external…
Where do these SnpEff annotations come from?
I am annotating a VCF with SnpEff, and I eventually want to parse the result for predicted loss-of-function variants. I want to understand the annotations better and document how they are produced. I run this command: snpEff "hg38"…
DNA methylation change in blood cells of FB and CFS patients
Introduction Fibromyalgia (FM) and Chronic Fatigue Syndrome (CFS) are characterized by chronic pain, fatigue, and weakness. Patients with these symptoms also suffer from sleep abnormalities and report affected cognitive processes such as memory. The diagnosis of these two syndromes is challenging and is based on questionnaires that make the diagnosis…
EMBL’s European Bioinformatics Institute (EMBL-EBI) in 2023 | Nucleic Acids Research
Abstract The European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) is one of the world’s leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe’s only intergovernmental life sciences organisation. This…
low rate of ‘Successfully assigned alignments’
Hello everybody. I’m a newbie in RNA-seq Analysis, and I have this situation that I don’t really understand. While working with featureCounts for RNA-seq read quantification, I came across an intriguing issue. The rate of successfully assigned alignments turned out to be unexpectedly low, totalling just 15463270 (7.6%). This was…
HTseq reports missing attribute name
Hello, I am running this htseq command htseq-count -r pos -t gene -i gene -s yes -f bam \ /Volumes/cachannel/ZebraFinchBrain/CB-4a_genomemapping/sorted_alignmentcb4a.bam \ /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/Taeniopygia_guttata.bTaeGut1_v1.p.110.chr.gff3 > \ /Volumes/cachannel/ZebraFinchBrain/HTSEQ_withautomate/output_counts.txt However I get this error: Error processing GFF file (line 75 of file /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/Taeniopygia_guttata.bTaeGut1_v1.p.110.chr.gff3): Feature gene:ENSTGUG00000013637 does not contain…
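In htseq-count, `-i` names the GFF attribute used as the feature ID, so one way to investigate an error like this is to list which attribute keys a record actually carries. The sketch below parses a GFF3 ninth column into a dictionary. The function name is hypothetical, and the example attribute string is illustrative: only the `gene:ENSTGUG00000013637` ID comes from the post, and the other keys are assumptions about what an Ensembl GFF3 gene record may contain.

```python
from typing import Dict


def parse_gff3_attributes(column9: str) -> Dict[str, str]:
    """Parse a GFF3 attribute column ('key=value;key=value') into a dict."""
    attributes: Dict[str, str] = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        attributes[key] = value
    return attributes


# Illustrative attribute string; keys other than ID are assumed
example = "ID=gene:ENSTGUG00000013637;biotype=protein_coding;gene_id=ENSTGUG00000013637"
attrs = parse_gff3_attributes(example)
print(sorted(attrs))
```

Comparing the printed keys with the value passed to `-i` shows whether the requested attribute exists on the feature type selected with `-t`.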
Estimating the proportion of nonsense variants undergoing the newly described phenomenon of manufactured splice rescue
Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17:405–24. Abou…
CRISPR-Cas9 Gene Editing Is On The Cusp Of Something Big
Gene editing, also known as genome editing, is a method where the DNA of an organism is modified using biotechnological techniques. It allows scientists to add, remove, or alter genetic material at particular locations in the genome. This 2-part series will cover the basics of CRISPR-Cas9 (see below) in…
Issue in RNA-seq analysis
Hello all. I am working on RNA-seq analysis and would like to know the following: first, I downloaded the genome FASTA file for non-coding RNA from Ensembl and got the GTF file for hg38 from there as well. I ran HISAT2 and got 17% alignment for…
Evaluating 17 methods incorporating biological function with GWAS summary statistics to accelerate discovery demonstrates a tradeoff between high sensitivity and high positive predictive value
Method selection We reviewed the published literature through February 2020 to identify methods that met the following criteria: i. Descriptively categorized as (a) annotation-based; (b) pleiotropy-based; or (c) eQTL-based. ii. Utilized GWAS summary statistics, as opposed to individual-level genotype data. iii. Implemented using freely-available software or packages. iv. Provided either…
Functional filter for whole-genome sequencing data identifies HHT and stress-associated non-coding SMAD4 polyadenylation site variants >5 kb from coding DNA
Summary Despite whole-genome sequencing (WGS), many cases of single-gene disorders remain unsolved, impeding diagnosis and preventative care for people whose disease-causing variants escape detection. Since early WGS data analytic steps prioritize protein-coding sequences, to simultaneously prioritize variants in non-coding regions rich in transcribed and critical regulatory sequences, we developed GROFFFY,…
Chromatin priming elements direct tissue-specific gene activity before hematopoietic specification
Introduction The development of multicellular organisms requires the activation of different gene batteries which specify the identity of each individual cell type. Such shifts in cellular identity are driven by shifts in the gene regulatory network (GRN) consisting of transcription factors (TFs) binding to the enhancers and promoters of their…
Is there a way to query Ensembl to get all 3’UTRs from all species?
I am trying to obtain stats on how many 3’UTRs are annotated in Ensembl. I would really like to download as many annotated 3’UTRs as possible from as many species as possible and find…
use bioservices.UniProt (python) to map uniprot accession to [multiple] ensembl ids
I am just trying to get a (one-to-many) mapping from UniProt accession -> Ensembl IDs. In the spirit of exploration, I have the following code that uses bioservices.UniProt to pull all possible columns for the selection of…
Application of CNV-seq technology | IJWH
Introduction Ultrasound soft markers refer to small nonspecific variations in foetal structure found in prenatal ultrasound that are often associated with abnormal chromosome number or pathogenic copy number variations (CNVs).1,2 Common ultrasound soft markers include nuchal translucency (NT) thickness, nuchal fold (NF) thickness, nasal bone dysplasia, choroid plexus cyst, intracardiac…
LncRNA INHEG promotes glioma stem cell maintenance and tumorigenicity through regulating rRNA 2’-O-methylation
Ethics statement All mice procedures in this study were performed under an animal protocol approved by the Institutional Animal Care and Use Committee guidelines of Westlake University. The procedures and protocols for glioma patients were approved by the institutional review board of Beijing Tiantan Hospital. Informed consent was obtained from…
Single-cell CRISPR screens in vivo map T cell fate regulomes in cancer
Mice The research conducted in this study complied with all of the relevant ethical regulations. The animal protocols were approved by and performed in accordance with the Institutional Animal Care and Use Committee of St. Jude Children’s Research Hospital. C57BL/6, OT-I50, pmel51 and Rosa26-Cas9 knock-in52 mice were purchased from The…
The Imageable Genome | Nature Communications
For the Imageable Genome project, we developed a data pipeline that identifies texts containing radiotracers, recognizes and extracts names of radiotracers from texts, filters for clinically relevant radiotracers and their associated targets, and translates protein names, i.e. of radiotracer targets, to names of the coding genes. We then downloaded the…
A Cre-dependent massively parallel reporter assay allows for cell-type specific assessment of the functional effects of non-coding elements in vivo
Animal models All procedures involving animals were approved by the Institutional Animal Care and Use Committee (IACUC) at Washington University in St. Louis, MO. Veterinary care and housing was provided by the veterinarians and veterinary technicians of Washington University School of Medicine under Dougherty lab’s approved IACUC protocol. All protocols…
featureCounts error???
# get gtf # wget #https://ftp.ensembl.org/pub/release-110/gtf/mus_musculus/Mus_musculus.GRCm39.110.gtf.gz input.dir <- "/home/laudy/data/featurecounts/" setwd(input.dir) featureCounts -p -O -T -a /input.dir/Mus_musculus.GRCm39.110.gtf -o /input.dir/quants.txt /input.dir/PMN_CTRAligned.sortedByCoord.RD.RG.RC.out.bam Please can someone tell me what’s wrong? I tried all the options and it gives me the same error: Error: object ‘p’ not found or Error: unexpected symbol…
Whole genome sequencing in high-grade cervical intraepitheli… : Medicine
1. Introduction Cervical cancer (CC) is the third most common cancer in women worldwide and has a high mortality rate among women. In 2008, CC was responsible for 275,000 deaths, thereby being the fourth leading cause of cancer death in females worldwide.[1,2] In China, CC is the second most…
Bioconductor – AnnotationHub
DOI: 10.18129/B9.bioc.AnnotationHub Client to access AnnotationHub resources Bioconductor version: Release (3.6) This package provides a client for the Bioconductor AnnotationHub web resource. The AnnotationHub web resource provides a central location where genomic files (e.g., VCF, bed, wig) and other resources from standard locations (e.g., UCSC, Ensembl) can be…
Why am I getting different results with kallisto?
Hello, I created a reference for my species of interest using the cDNA file from Ensembl and then aligned my data to it using kallisto. Then, I was asked to add the ncRNA and an EGFP sequence to the reference. I…
PIGx ChIP-seq pipeline error
Hi Lisa, You also need to modify the gtf annotation file using: sed '/^#/d' annotation_file.gtf > annotation_file_no_header.gtf Best, Alex > On 12. Oct 2022, at 15:07, Bora Uyar <borauy…@gmail.com> wrote: > > You would need to check how your fasta headers look and how the chromosomes are represented in…
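The `sed '/^#/d'` command above deletes every line that begins with `#`, which removes the comment header of the GTF file. For readers who prefer to do the same step in Python, a minimal equivalent might look like this (the function name is hypothetical, and the example lines are illustrative):

```python
from typing import Iterable, List


def drop_header_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that do not start with '#', like sed '/^#/d'."""
    return [line for line in lines if not line.startswith("#")]


# Illustrative input: two header lines followed by one record
gtf_lines = [
    "#!genome-build GRCh38\n",
    "#!genome-version GRCh38\n",
    '1\texample\tgene\t100\t200\t.\t+\t.\tgene_id "geneA";\n',
]
body = drop_header_lines(gtf_lines)
print(len(body))
```

For a file on disk, the same function can be applied to an open file handle and the result written to a new file, mirroring the redirect to `annotation_file_no_header.gtf`.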
Circular extrachromosomal DNA promotes tumor heterogeneity in high-risk medulloblastoma
Statistical methods Statistical tests, test statistics and P values are indicated where appropriate in the main text. Categorical associations were established using the chi-squared test of independence if n > 5 for all categories and Fisherʼs exact test otherwise. For both tests, the Python package scipy.stats v1.5.3 implementation was used64. Multiple hypothesis corrections…
Zebrafish danRer11 chr6:43,426,661-43,433,266 UCSC Genome Browser v456
DANIO-CODE Track Hub: 3P-seq tracks, CAGE-seq tracks, ChIP-seq tracks, RNA-seq tracks, Cell Types, Consensus promoters, Conservation and CRISPR targets, COPEs and pooled DOPEs, Enhancer validation, HiC tracks, Stages_Types. Mapping and Sequencing: Base Position, Assembly, Gap, GC Percent, GRC Incident, INSDC, RefSeq Acc, Restr Enzymes, Short Match, …
Now Available! Compare NCBI RefSeq and UniProt Datasets
Do you need to compare and combine data based on NCBI RefSeq and UniProt datasets, and aren’t sure which proteins are comparable? For many years, NCBI Gene has provided information about the relationships between RefSeq and UniProt accessions courtesy of data imported from UniProt, but the tremendous growth of both…
biomaRt Ensembl version 83 (mouse)
Hello, I’m looking for an old Ensembl release (83) for the GRCm38 build: library(biomaRt) mouse <- useEnsembl(biomart = "genes", dataset="mmusculus_gene_ensembl", version=83) Error: Specified Ensembl version is not available. Use listEnsemblArchives() to view available versions. Version 83 is not displayed in the list of Ensembl archives…
Genetics and epidemiology of mutational barcode-defined clonal hematopoiesis
Identification of CH cases from WGS in ISL and UKB We used WGS from 45,510 Icelanders and 130,709 British ancestry participants from the UKB17,18. Average sequencing depth was 33× for UKB and 38× for ISL. Participants with prior diagnoses of hematological disorders or grossly abnormal hematology measurements on entry were…
In VEP annotation, how is the codon field interpreted?
After annotating a VCF file with VEP, we obtain several fields. One of them is called Codons, which represents the affected codon in the transcript of the gene. Below is a screenshot of insertions from a sample: HGVSp_Short RefSeq Codons…
vcf – VEP annotation INFO field Ensembl IDs and locations
I have a VCF file of human data that I annotated with VEP, using some additional parameters (as shown below in the ##VEP-command-line). However, my output is rather strange (mainly the INFO column). ##VEP="v108" time="2023-04-27 15:13:08" cache="workflow/resources/variants/cache_vep/homo_sapiens/108_GRCh38" ensembl-funcgen=108.56bb136 ensembl-variation=108.a885ada ensembl-io=108.58d13c1 ensembl=108.d8a9c80 1000genomes="phase3"…
zero counts for all genes in RNAseq data of Ferret
I have bulk RNA-seq data from ferret and am trying to get counts per gene. To do so, I used hisat2 and got the genome from here: hgdownload.soe.ucsc.edu/goldenPath/musFur1/bigZips/musFur1.2bit After aligning the fastq files I used htseq and the following command:…
DNA hypomethylation characterizes genes encoding tissue-dominant functional proteins in liver and skeletal muscle
Overview of this study In this study, we measured the DNA methylome from mouse liver and skeletal muscle, integrated the data with the transcriptome and proteome of these mouse tissues22,23, and examined how tissue-dominant protein and gene expression were associated with DNA hypomethylation (Fig. 1). In this study, we measured DNA…
Bulk RNAseq Salmon index building which transcriptome to use
Hi all, I am new to the platform. I was wondering what the common/best practice is regarding building a Salmon index for bulk RNAseq analysis of human cells. The tutorial for Salmon/Alevin is using the complete transcriptome from GENCODE (gencode.vM23.transcripts.fa.gz,…
How to get just protein_coding genes using biomart in R
Dear all, I would like help with getting just the protein_coding genes from a gene expression file using biomart. What I have is a file of expression values for all mouse (mm10) genes with Ensembl gene_names, and I need to…
An extremely fast Non-Overlapping Exon Length calculator written in Rust
Hi all! Introducing the Non-Overlapping Exon Length calculator (NOEL), an extremely fast GTF/GFF per gene exon length extractor written in Rust. See the code and latest updates here: github/alejandrogzi/noel In case you do not want to read the whole text: NOEL outperforms all open-sourced scripts/tools for this task. It can…
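To make the quantity concrete, the sketch below computes a non-overlapping exon length for one gene by merging overlapping or adjacent intervals and summing what remains. This is a plain Python illustration of the general idea only; it is not NOEL's implementation, and the function name and example coordinates are hypothetical. Coordinates are assumed to be 1-based and inclusive, as in GTF/GFF.

```python
from typing import Iterable, Optional, Tuple


def non_overlapping_length(exons: Iterable[Tuple[int, int]]) -> int:
    """Sum the bases covered by 1-based inclusive intervals, counting overlaps once."""
    total = 0
    current_start: Optional[int] = None
    current_end: Optional[int] = None
    for start, end in sorted(exons):
        if current_end is None or start > current_end + 1:
            # Gap before this exon: close the previous merged block, if any
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent exon: extend the current block
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total


# Two overlapping exons (100-250 after merging) plus one separate exon (300-350)
print(non_overlapping_length([(100, 200), (150, 250), (300, 350)]))
```

Sorting first means a single pass is enough, so the cost per gene is dominated by the sort over its exons.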
Epigenetic regulation during cancer transitions across 11 tumour types
Specimen data All samples for MM, OV, BRCA, PDAC, UCEC, CRC, CESC/AD, SKCM and HNSCC, as well as 2 NATs for GBM and 1 NAT for ccRCC were collected with informed consent in concordance with Institutional Review Board (IRB) approval at the School of Medicine at Washington University in St…
Genomic disturbance of vitellogenin 2 (vtg2) leads to vitellin membrane deficiencies and significant mortalities at early stages of embryonic development in zebrafish (Danio rerio)
A large deletion mutation of 2811 bp of gDNA was introduced into zebrafish vtg2 via CRISPR/Cas9 genome editing. A schematic representation of the general strategy for CRISPR target design and application is given in Fig. 1A–C. The introduced deletion involved 1692 bp of the vtg2 transcript, encoding 564 aa of its respective protein, and…
How to measure the specificity of RT-qPCR primers
RT-qPCR primer specificity can be measured using a bioinformatics workflow that involves analyzing potential mismatches, cross-matches, co-amplification of multiple gene splice variants, and sub-optimal amplicon sizes in silico. This workflow utilizes publicly available resources such as NCBI Primer BLAST, in silico PCR in UCSC genome browser, and Ensembl DNA database…
Conversion of Gene Name to Ensembl ID
Using the Ensembl REST API: rest.ensembl.org/lookup/symbol/homo_sapiens/A1CF assembly_name: GRCh38 biotype: protein_coding db_type: core description: APOBEC1 complementation factor [Source:HGNC Symbol;Acc:HGNC:24086] display_name: A1CF end: 50885675 id: ENSG00000148584 logic_name: ensembl_havana_gene_homo_sapiens object_type: Gene seq_region_name: 10 source: ensembl_havana species: homo_sapiens start: 50799409 strand: -1 version: 15 rest.ensembl.org/lookup/symbol/homo_sapiens/A1CF?content-type=application/json {"strand":-1,"assembly_name":"GRCh38","version":15,"species":"homo_sapiens","end":50885675,"description":"APOBEC1 complementation factor [Source:HGNC Symbol;Acc:HGNC:24086]","source":"ensembl_havana","db_type":"core","object_type":"Gene","id":"ENSG00000148584","seq_region_name":"10","display_name":"A1CF","start":50799409,"logic_name":"ensembl_havana_gene_homo_sapiens","biotype":"protein_coding"} Look up multiple symbols at…
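The lookup above can be scripted. The following Python sketch builds the same `lookup/symbol` URL and extracts the `id` field from a JSON response. The helper names are hypothetical, the `https://` scheme is an assumption (the post shows the host without one), and the sample response is an abbreviated copy of the JSON quoted above, so nothing here touches the network.

```python
import json
from urllib.parse import quote

ENSEMBL_REST = "https://rest.ensembl.org"  # assumption: HTTPS scheme


def lookup_symbol_url(species: str, symbol: str) -> str:
    """Build the lookup/symbol URL for one gene symbol, requesting JSON."""
    return (
        f"{ENSEMBL_REST}/lookup/symbol/{quote(species)}/{quote(symbol)}"
        "?content-type=application/json"
    )


def gene_id_from_response(body: str) -> str:
    """Extract the Ensembl gene ID from a lookup/symbol JSON response."""
    return json.loads(body)["id"]


# Abbreviated copy of the response quoted in the post
sample = '{"id": "ENSG00000148584", "display_name": "A1CF", "version": 15}'
print(lookup_symbol_url("homo_sapiens", "A1CF"))
print(gene_id_from_response(sample))
```

To query the live service, the URL could be passed to `urllib.request.urlopen` and the decoded body handed to `gene_id_from_response`.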
couldn’t find matching transcriptome, returning non-ranged SummarizedExperiment AND unable to find an inherited method for function ‘seqinfo’ for signature ‘”SummarizedExperiment”‘
Dear Michael, I have not been able to run tximeta properly. I have read #38 but could not get any clue. The quant.sf files were generated by the latest nf-core RNA-seq pipeline (3.12.0), as the pipeline did not save the Salmon index, I generated it myself. Salmon used by nf-core…
multiple alternate alleles for a single variant in R
Ensembl’s POST “vep/:species/region”: multiple alternate alleles for a single variant in R. I am working with this POST request from Ensembl. Let us suppose, as a starting example, that we have the following variant (in the form of an R code snippet): { "variants" : ["21 26960070 26960070 G…
Mosaic chromosomal alterations in blood across ancestries using whole-genome sequencing
Study population We included 67,390 participants from 19 TOPMed studies: Genetics of Cardiometabolic Health in the Amish (n = 1,109) (ref. 32), Atherosclerosis Risk in Communities Study (n = 3,780) (ref. 33), Barbados Genetics Asthma Study (n = 980), Mount Sinai BioMe Biobank (n = 9,392) (ref. 34), Coronary Artery Risk Development in Young Adults (n = 3,293) (ref. 35),…
Ensembl transcript IDs
Hi everyone, From the GENCODE GTF file, I noticed that there are multiple Ensembl transcript IDs for one gene ID, and one Ensembl transcript ID may have different versions (different values after the decimal). There are different transcript isoforms of one gene (due to alternative…
Does chromosome order matter when combining individual primary assembly files?
I’m attempting to create a salmon index file for Gallus gallus, for which I need both the genome and transcriptome files; from Ensembl, I was able to download the cdna.all transcriptome file as well as the individual chromosome primary…
Errors in Functional Enrichment Analysis with Clusterprofiler
library(clusterProfiler) library(org.Hs.eg.db) library(tidyverse) library(DOSE) library(ReactomePA) library(enrichplot) library(fgsea) library(data.table) library(ggplot2) keytypes(org.Hs.eg.db) res = read.csv("coex.Csv") head(res) original_gene_list = res$correlation names(original_gene_list) <- res$gene gene_list<-na.omit(original_gene_list) gene_list = sort(gene_list, decreasing = TRUE) gse <- gseGO(geneList=gene_list, ont ="ALL", keyType = "ENSEMBL", minGSSize = 3, maxGSSize = 800, pvalueCutoff =…
Map genome positions onto protein coordinates?
I am looking for a way to do the following: 1) reliably find a protein structure, e.g. a PDB file or pre-computed AlphaFold results, that is associated with a particular gene/transcript isoform. I found a way to do this to some extent for human genes using biomart, but I’d like to be able…
Eukaryotic Genome Annotation | Genome Annotation Pipeline
PASA was originally developed at The Institute for Genomic Research in 2002 as an effort to automatically improve gene structures in Arabidopsis thaliana. Since then, it has been applied to numerous Eukaryotic genome annotation projects including Rice, Aspergillus species, Plasmodium falciparum, Schistosoma mansoni, Aedes aegypti, mouse, human, among others. Functions of PASA…
GSEA error 1005 The collapsed dataset was empty when used with chip:ftp.broadinstitute.org://pub
I am using the GUI version of GSEA. The samples are of mice. I prepared the required files (.gct) and phenotypelabel (.cls), as required. Expression dataset (partial, feature used are normalized counts) 409 5 NAME description CF355 CF328 WT316 WT351 WT354 ENSMUSG00000025902.14 NA 77 61 110 76 54 ENSMUSG00000102269.2 NA…
NGS one-liner to call variants
This is a tutorial about creating a pipeline for sequence analysis in a single line. It was made with capture/amplicon short-read sequencing of human DNA in mind and tested with the reference exome sequencing data described here. I share the process and debugging steps…
NGS oneliner
This is a tutorial about creating a pipeline for sequence analysis in a single line. I share the process and debugging steps I went through while putting it together. Source is available at: github.com/barslmn/ngsoneliner/ I couldn’t make a longer post; the complete version of this post is at: omics.sbs/blog/NGSoneliner/NGSoneliner.html Pipeline # fastp --in1 "$R1"…
hclust with similar data gives different
I have RNA-seq data with expression indexed by Ensembl ID, which I converted to gene symbols for further analysis. I performed hierarchical clustering and then cut the tree using dynamicTreeCut with a gene set of 20010 genes and got 27 different gene clusters. Now after…
How do I write a correctly formatted gff3 file in R?
Dear all, I am trying to annotate non-coding RNA in a small RNA-seq dataset. The RNACentral gff3 file that I am using has different chromosome identifiers than the genome assembly. I have loaded the gff3 file in R where I changed the chromosome identifiers using the assembly report and…
VEP Ensembl docker and plugins
Hi, I have used VEP locally with docker and a cache, and ran this command: sudo docker run \ -v /mnt/dodl_drive/sarek_cc/vep:/data ensemblorg/ensembl-vep \ vep -i input_sample.vcf \ --cache \ --output_file output_sample.vcf \ --everything which ran successfully, and the output VCF is also as expected…
AnnotationHub data for Mus musculus seems to be missing, while previously it was there
I have been using tximeta to get the annotation data for my RNAseq data, created by salmon. I was previously able to get the annotations for the…
Neuron Navigator 1 (Nav1) regulates the response to cocaine in mice
Mouse strains All mouse procedures were approved by the Institutional Animal Care and Use Committees at Binghamton or Stanford University; and were conducted in accordance with the National Institute of Health Guide for Care and Use of Laboratory Animals, Eighth Edition. All mice were originally obtained from Jackson Laboratories, and…
E-IT hiring Bioinformatics Business Analyst in United States
Company Description E-IT Professionals Corp. (EIT) is an award-winning IT consulting, recruitment, management, and staffing organization founded in 1999. EIT is comprised of around 320 Information Technology Consultants and boasts revenues approaching $21M overall. Our dedicated team of professionals in the US and India serves clients from various geographies. We…
How to annotate function of genes from non-model organism
I conducted RNA differential expression analysis for a non-model organism and obtained a list of significantly different genes identified by their NCBI RefSeq IDs. However, I encountered difficulties in annotating the gene functions due to the limited availability of gene…
map Ensembl gene ID from hg19 to hg38
Hello! I would like to convert Ensembl gene IDs from hg19 to hg38 with R. I tried this code: ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl", host = "grch37.ensembl.org") ensembl_ids <- c("ENSG00000183878", "ENSG00000146083") converted_ids <- getLDS(attributes = c("ensembl_gene_id"), filters = "ensembl_gene_id", values…
Simulation of undiagnosed patients with novel genetic conditions
Simulated patient initialization We simulate patients for each of the 2134 diseases in Orphanet20 (orphadata.org, accessed October 29, 2019) that do not correspond to a group of clinically heterogeneous disorders (i.e., Orphanet’s “Category” classification31), have at least one associated phenotype, and have at least one causal gene. For Orphanet diseases…
Genotyping, sequencing and analysis of 140,000 adults from Mexico City
Recruitment of study participants The MCPS was established in the late 1990s following discussions between Mexican scientists at the National Autonomous University of Mexico (UNAM) and British scientists at the University of Oxford about how best to measure the changing health effects of tobacco in Mexico. These discussions evolved into…
Large-scale genomic analyses with machine learning uncover predictive patterns associated with fungal phytopathogenic lifestyles and traits
Anderson, P. K. et al. Emerging infectious diseases of plants: Pathogen pollution, climate change and agrotechnology drivers. Trends Ecol. Evol. 19, 535–544. doi.org/10.1016/j.tree.2004.07.021 (2004). Fisher, M. C. et al. Emerging fungal threats to animal, plant and ecosystem health. Nature 484, 186–194 (2012). …
rs67047829 genotypes of ERV3-1/ZNF117 are associated with lower body mass index in the Polish population
Datamining showed that most (80/137 = 58%) of the high-frequency PTC-SNPs, defined as having minor-allele frequencies (MAFs) between 1 and 99%, including rs67047829, had MAFs > 5%, indicating most likely near-neutral selection (Supplementary Table S1). Initial regression models for several PTC-SNPs versus BMI gave low p-values (Supplementary Table S3), suggesting possible future studies, but…
Issues with Mixture file when using CIBERSORTx
Hi, I am trying to run a deconvolution analysis of bulk RNA-seq samples using the LM22 signature matrix provided. I converted all ENSEMBL IDs to their symbols, and removed NA and duplicated entries. counts_salmon <- as.data.frame(txi$counts) counts_salmon$symbol <- mapIds(org.Hs.eg.db, keys = rownames(counts_salmon), column = "SYMBOL", keytype = "ENSEMBL") counts_salmon <- counts_salmon…
Actalent hiring 100% Remote- Bioinformatics Scientist in North Chicago, Illinois, United States
Seeking 1 Bioinformatics Scientist. 100% Remote. Description: We are seeking a highly skilled and motivated contractor to join the Genomics Research Center and provide support for bioinformatics projects in the field of Ophthalmology and Specialty Medicine. As a contractor, you will collaborate with a team of scientists and researchers to analyze and interpret…
AlphaMissense Plugin VEP
I’ve installed the AlphaMissense plugin in VEP, but I can’t use it. I downloaded the required files and ran the tabix command before using it. Then I launched the command, but I got this error: WARNING: Failed to instantiate plugin AlphaMissense: ERROR: No file specified Try using…
US Tech Solutions hiring Bioinformatics Scientist in United States
Title: Bioinformatics Scientist Job type: Contract Duration: 06+ Months Client’s Location: Remote Job Description: We are seeking a highly skilled and motivated contractor to join the Genomics Research Center and provide support for bioinformatics projects in the field of Ophthalmology and Specialty Medicine. As a contractor, you will collaborate with…
The status of the human gene catalogue
Understanding our Genetic Inheritance: The US Human Genome Project, The First Five Years 1991-1995 (US Department of Health and Human Services, US Department of Energy, 1990). Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022). Describes the first complete gap-free assembly and annotation of…
How to assign gene names after kallisto when I add GFP?
Hello, I would like to generate a new reference for kallisto where I add GFP. I found this link: github.com/igordot/genomics/blob/master/workflows/ref-genome-gfp.md and it seems pretty straightforward to add the GFP for the alignment. However, I am not sure how…
AlphaFold reveals the construction of the protein universe
Read about solving protein folding at deepmind.com/AlphaFold and see a timeline of our breakthrough here. It’s been one year since we released and open sourced AlphaFold, our AI system to predict the 3D structure of a protein just from its 1D amino acid sequence, and created the AlphaFold Protein Structure Database (AlphaFold…
biomaRT doesn’t report uniprot id
Hi all, I’m working with UniProtKB and would like to get the genomic coordinates of a list of proteins. To my knowledge, the first step is to retrieve the mapping between UniProt and Ensembl via biomaRt: annotLookup <- getBM( mart = mart, attributes = c(…
TRIM25 targets p300 for degradation
Introduction Protein levels are regulated at several nodes. One mode of protein level regulation acts through enhancing or reducing gene transcription. Gene transcription can be stimulated by binding transcription factors to promoters or enhancers of target genes and by posttranslational modifications of transcription factors and histones. Such modifications can be…
Complex multifactorial DE analysis with limma/edgeR based on rnaseq data
Dear Biostars, I would like to ask a specific question regarding DE analysis of an RNA-seq dataset with a multi-factor experimental design. Briefly, unstimulated neutrophils from 4 healthy donors were cultivated under distinct treatment conditions, that is, with supernatant of organoids from different cancer/normal patient samples. There are…
Dataset’s name in BioMart for S. pombe
Can anybody help me find the dataset for S. pombe on BioMart? I would also appreciate some help on how to use makeTranscriptDbFromBiomart to create a TranscriptDb. Cheers. Looks like you figured out another way of getting what…
DESeq2 error – converting result object into dataframe
Hello everyone, I am performing differential expression analysis using DESeq2 and extracting the results for each contrast as follows: res <- as.data.frame(results(dds, contrast=c(Var, effectvar, baselinevar), alpha = as.numeric(pvalue))) res <- res[order(res$padj), ] res$ensembl_gene_id <- rownames(res) rownames(res) <- NULL res <- na.omit(res) res_all <- res…
Debian — Details of package bioperl in trixie
Perl tools for computational molecular biology The Bioperl project is a coordinated effort to collect computational methods routinely used in bioinformatics into a set of standard CPAN-style, well-documented, and freely available Perl modules. It is well-accepted throughout the community and used in many high-profile projects, e.g., Ensembl. The recommended packages…
KCNQ potassium channels modulate Wnt activity in gastro-oesophageal adenocarcinomas
Introduction The KCNQ (potassium voltage-gated channel subfamily Q) family of ion channels encode potassium transporters (1). KCNQ proteins typically repolarise the plasma membrane of a cell after depolarisation by allowing the export of potassium ions, and are therefore involved in wide-ranging biological functions including cardiac action potentials (2), neural excitability…
Role of microRNAs and their downstream target transcription factors in zebrafish thrombopoiesis
Piggyback hybrid knockdowns of microRNAs in adult zebrafish and estimation of total thrombocyte counts Previous work from our laboratory has identified genes expressed in zebrafish thrombocytes through RNASeq analysis by 10X genomic sequencing to identify the genes involved in thrombocyte differentiation and function24. A total of fifteen miRNA genes were…
Calculation of TMB on gene level
Hi all, I have TCGA cancer data and I want to calculate TMB at the gene level. Can anyone please tell me how to do that? TCGA provides TMB scores at the patient level; I need them at the gene level. Thanks!