Tag: geneID

KEGG T01002: 225865

Entry 225865            CDS       T01002                                  Symbol Catsper1, Catsper, KSper Name (RefSeq) cation channel, sperm associated 1   KO K16889   cation channel sperm-associated protein 1 Organism mmu  Mus musculus (house mouse) Brite KEGG Orthology (KO) [BR:mmu00001] 09180 Brite Hierarchies  09183 Protein families: signaling and cellular processes   03037 Cilium and associated proteins [BR:mmu03037]    225865 (Catsper1)   04040 Ion channels [BR:mmu04040]    225865 (Catsper1)Cilium and associated…

KEGG T01001: 5836

Entry 178               CDS       T01001                                  Symbol AGL, GDE Name (RefSeq) amylo-alpha-1, 6-glucosidase, 4-alpha-glucanotransferase   KO K01196   glycogen debranching enzyme [EC:2.4.1.25 3.2.1.33] Organism hsa  Homo sapiens (human) Pathway hsa00500   Starch and sucrose metabolism hsa01100   Metabolic pathways Module hsa_M00855   Glycogen degradation, glycogen => glucose-6P Network nt06017  Glycogen metabolism   Element N00718   Glycogen degradation Disease H00069   Glycogen storage disease H01760  …

KEGG T01001: 57661

Entry 57661             CDS       T01001                                  Symbol PHRF1, PPP1R125, RNF221 Name (RefSeq) PHD and ring finger domains 1   KO K17586   PHD and RING finger domain-containing protein 1 Organism hsa  Homo sapiens (human) Brite KEGG Orthology (KO) [BR:hsa00001] 09180 Brite Hierarchies  09181 Protein families: metabolism   01009 Protein phosphatases and associated proteins [BR:hsa01009]    57661 (PHRF1)Protein phosphatases and associated proteins [BR:hsa01009] Protein serine/threonine…

How to calculate TPM from featureCounts output

I would like to find the TPM counts for the GSE102073 study. When I downloaded the raw data from GEO, I found that they are featureCounts output. First part of the file: # Program:featureCounts v1.4.3-p1; Command:"/data/NYGC/Software/Subread/subread-1.4.3-p1-Linux-x86_64/bin/featureCounts" "-s" "2" "-a" "/data/NYGC/Resources/ENCODE/Gencode/gencode.v18.annotation.gtf" "-o" "/data/analysis/LevineD/Project_LEV_01204_RNA_2014-01-30/Sample_JB4853/featureCounts/Sample_JB4853_counts.txt" "/data/analysis/LevineD/Project_LEV_01204_RNA_2014-01-30/Sample_JB4853/STAR_alignment/Sample_JB4853_Aligned.out.WithReadGroup.sorted.bam"…
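
One common way to approach the question above is to divide each count by the feature length in kilobases and then rescale the sample to one million. The sketch below is only an illustration in Python, not the answer from the original thread. It assumes a single-sample featureCounts table with the usual `Geneid` and `Length` columns and the counts in the last column. The function names `tpm` and `read_featurecounts` and the toy genes are hypothetical.

```python
import csv
import io


def tpm(counts, lengths_bp):
    """Convert raw counts to TPM, given feature lengths in base pairs."""
    # Step 1: reads per kilobase (RPK) removes the feature-length effect.
    rpk = [count / (length / 1000.0) for count, length in zip(counts, lengths_bp)]
    # Step 2: rescale so that the values of one sample sum to one million.
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(rpk)
    return [value / total * 1e6 for value in rpk]


def read_featurecounts(handle):
    """Parse a single-sample featureCounts table into (gene_ids, lengths, counts).

    Assumption: a '#' comment line, then a tab-separated header containing
    'Geneid' and 'Length', with the counts of the one sample in the last column.
    """
    rows = csv.reader(
        (line for line in handle if not line.startswith("#")), delimiter="\t"
    )
    header = next(rows)
    id_col = header.index("Geneid")
    length_col = header.index("Length")
    gene_ids, lengths, counts = [], [], []
    for row in rows:
        if not row:
            continue  # skip blank lines
        gene_ids.append(row[id_col])
        lengths.append(int(row[length_col]))
        counts.append(float(row[-1]))
    return gene_ids, lengths, counts


# Toy table with made-up genes; this is not data from GSE102073.
toy = io.StringIO(
    "# Program:featureCounts (toy example)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
    "geneA\tchr1\t1\t1000\t+\t1000\t10\n"
    "geneB\tchr1\t2001\t4000\t-\t2000\t20\n"
)
gene_ids, lengths, counts = read_featurecounts(toy)
tpm_values = dict(zip(gene_ids, tpm(counts, lengths)))
print(tpm_values)
```

In the toy table both genes have the same count-to-length ratio, so they receive the same TPM. The `Length` column reported by featureCounts is used as a stand-in for effective transcript length, which is a simplification.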

KEGG T01001: 10682

Entry 10682             CDS       T01001                                  Symbol EBP, CDPX2, CHO2, CPX, CPXD, MEND Name (RefSeq) EBP cholestenol delta-isomerase   KO K01824   cholestenol Delta-isomerase [EC:5.3.3.5] Organism hsa  Homo sapiens (human) Pathway hsa00100   Steroid biosynthesis hsa01100   Metabolic pathways Module hsa_M00101   Cholesterol biosynthesis, squalene 2,3-epoxide => cholesterol Network nt06034  Cholesterol biosynthesis   Element N01624   Cholesterol biosynthesis Disease H01194   X-linked chondrodysplasia punctata…

KEGG T01002: 68197

Entry 68197             CDS       T01002                                  Symbol Ndufc2, 1810004I06Rik, 2010300P09Rik, G1 Name (RefSeq) NADH:ubiquinone oxidoreductase subunit C2   KO K03968   NADH dehydrogenase (ubiquinone) 1 subunit C2 Organism mmu  Mus musculus (house mouse) Pathway mmu00190   Oxidative phosphorylation mmu01100   Metabolic pathways mmu04714   Thermogenesis mmu04723   Retrograde endocannabinoid signaling mmu04932   Non-alcoholic fatty liver disease mmu05010   Alzheimer disease mmu05012   Parkinson disease…

KEGG T04921: 106155605

Entry 106155605         CDS       T04921                                  Name (RefSeq) 3-phosphoinositide-dependent protein kinase 1   KO K06276   3-phosphoinositide dependent protein kinase-1 [EC:2.7.11.1] Organism lak  Lingula anatina Pathway lak04068   FoxO signaling pathway lak04140   Autophagy – animal lak04150   mTOR signaling pathway Brite KEGG Orthology (KO) [BR:lak00001] 09130 Environmental Information Processing  09132 Signal transduction   04068 FoxO signaling pathway    106155605   04150 mTOR signaling pathway    106155605 09140 Cellular Processes  09141 Transport…

KEGG T01001: 2914

Entry 2916              CDS       T01001                                  Symbol GRM6, CSNB1B, GPRC1F, MGLUR6, mGlu6 Name (RefSeq) glutamate metabotropic receptor 6   KO K04608   metabotropic glutamate receptor 6 Organism hsa  Homo sapiens (human) Pathway hsa04072   Phospholipase D signaling pathway hsa04080   Neuroactive ligand-receptor interaction hsa04724   Glutamatergic synapse Disease H00787   Congenital stationary night blindness Brite KEGG Orthology (KO) [BR:hsa00001] 09130 Environmental Information…

KEGG T01002: 20597

Entry 20597             CDS       T01002                                  Symbol Smpd1, A-SMase, ASM, Zn-SMase, aSMase Name (RefSeq) sphingomyelin phosphodiesterase 1, acid lysosomal   KO K12350   sphingomyelin phosphodiesterase [EC:3.1.4.12] Organism mmu  Mus musculus (house mouse) Pathway mmu00600   Sphingolipid metabolism mmu01100   Metabolic pathways mmu04071   Sphingolipid signaling pathway mmu04142   Lysosome mmu04217   Necroptosis Brite KEGG Orthology (KO) [BR:mmu00001] 09100 Metabolism  09103 Lipid metabolism   00600 Sphingolipid metabolism    20597…

KEGG T01002: 97418

Entry 97418             ncRNA     T01002                                  Symbol Rnu5g, Rnu5a, U5a Name (RefSeq) RNA, U5G small nuclear   KO K14279   U5 spliceosomal RNA Organism mmu  Mus musculus (house mouse) Pathway mmu03040   Spliceosome Brite KEGG Orthology (KO) [BR:mmu00001] 09120 Genetic Information Processing  09121 Transcription   03040 Spliceosome    97418 (Rnu5g) 09180 Brite Hierarchies  09182 Protein families: genetic information processing   03041 Spliceosome [BR:mmu03041]    97418 (Rnu5g)  09184 RNA family   03100 Non-coding RNAs…

Dot Plot using KEGG

Hi, I'm trying to do a dotplot using data from KEGG. I have my data represented, but I don't want the species name on the X axis. My command is: kegg_gene_list = sort(kegg_gene_list, decreasing = TRUE) kegg_organism = "mmu"…

KEGG T01001: 80347

Entry 80347             CDS       T01001                                  Symbol COASY, DPCK, NBIA6, NBP, PCH12, PPAT, UKR1, pOV-2 Name (RefSeq) Coenzyme A synthase   KO K02318   phosphopantetheine adenylyltransferase / dephospho-CoA kinase [EC:2.7.7.3 2.7.1.24] Organism hsa  Homo sapiens (human) Pathway hsa00770   Pantothenate and CoA biosynthesis hsa01100   Metabolic pathways hsa01240   Biosynthesis of cofactors Module hsa_M00120   Coenzyme A biosynthesis, pantothenate => CoA…

KEGG T01001: 407016

Entry 407016            miRNA     T01001                                  Symbol MIR26A2, MIRN26A2, mir-26a-2 Name (RefSeq) microRNA 26a-2   KO K16984   microRNA 26a Organism hsa  Homo sapiens (human) Pathway hsa05206   MicroRNAs in cancer Brite KEGG Orthology (KO) [BR:hsa00001] 09160 Human Diseases  09161 Cancer: overview   05206 MicroRNAs in cancer    407016 (MIR26A2) 09180 Brite Hierarchies  09183 Protein families: signaling and cellular processes   04147 Exosome [BR:hsa04147]    407016 (MIR26A2)  09184 RNA family   03100 Non-coding…

KEGG T01002: 21426

Entry 21426             CDS       T01002                                  Symbol Tfec, Tcfec, bHLHe34 Name (RefSeq) transcription factor EC   KO K15591   transcription factor EC Organism mmu  Mus musculus (house mouse) Brite KEGG Orthology (KO) [BR:mmu00001] 09180 Brite Hierarchies  09182 Protein families: genetic information processing   03000 Transcription factors [BR:mmu03000]    21426 (Tfec) Transcription factors [BR:mmu03000] Eukaryotic type  Basic helix-loop-helix/leucine zipper (bHLH-ZIP)   Ubiquitous bHLH-ZIP factors    21426 (Tfec) SSDB…

KEGG T01001: 220202

Entry 220202            CDS       T01001                                  Symbol ATOH7, Math5, NCRNA, PHPVAR, RNANC, bHLHa13 Name (RefSeq) atonal bHLH transcription factor 7   KO K09083   atonal protein 1/7 Organism hsa  Homo sapiens (human) Disease H02112   Persistent hyperplastic primary vitreous Brite KEGG Orthology (KO) [BR:hsa00001] 09180 Brite Hierarchies  09182 Protein families: genetic information processing   03000 Transcription factors [BR:hsa03000]    220202 (ATOH7)Transcription factors [BR:hsa03000] Eukaryotic type  Basic…

KEGG T08632: 123713541

Entry 123713541         CDS       T08632                                  Name (RefSeq) alpha-1,3/1,6-mannosyltransferase ALG2   KO K03843   alpha-1,3/alpha-1,6-mannosyltransferase [EC:2.4.1.132 2.4.1.257] Organism pbx  Pieris brassicae (large cabbage white) Pathway pbx00510   N-Glycan biosynthesis pbx00513   Various types of N-glycan biosynthesis pbx01100   Metabolic pathways Brite KEGG Orthology (KO) [BR:pbx00001] 09100 Metabolism  09107 Glycan biosynthesis and metabolism   00510 N-Glycan biosynthesis    123713541   00513 Various types of N-glycan biosynthesis    123713541 09180 Brite Hierarchies  09181 Protein…

KEGG T07277: 120500365

Entry 120500365         CDS       T07277                                  Symbol SRP54 Name (RefSeq) signal recognition particle 54 kDa protein   KO K03106   signal recognition particle subunit SRP54 [EC:3.6.5.4] Organism pmoa  Passer montanus (Eurasian tree sparrow) Pathway pmoa03060   Protein export Brite KEGG Orthology (KO) [BR:pmoa00001] 09120 Genetic Information Processing  09123 Folding, sorting and degradation   03060 Protein export    120500365 (SRP54) 09180 Brite Hierarchies  09183 Protein families: signaling…

KEGG T01001: 388561

Entry 388561            CDS       T01001                                  Symbol ZNF761, ZNF468 Name (RefSeq) zinc finger protein 761   KO K09228   KRAB domain-containing zinc finger protein Organism hsa  Homo sapiens (human) Pathway hsa05168   Herpes simplex virus 1 infection Brite KEGG Orthology (KO) [BR:hsa00001] 09160 Human Diseases  09172 Infectious disease: viral   05168 Herpes simplex virus 1 infection    388561 (ZNF761) 09180 Brite Hierarchies  09182 Protein families: genetic…

KEGG T05101: 110889682

Entry 110889682         CDS       T05101                                  Name (RefSeq) alpha-1,6-mannosyl-glycoprotein 2-beta-N-acetylglucosaminyltransferase   KO K00736   alpha-1,6-mannosyl-glycoprotein beta-1,2-N-acetylglucosaminyltransferase [EC:2.4.1.143] Organism han  Helianthus annuus (common sunflower) Pathway han00510   N-Glycan biosynthesis han00513   Various types of N-glycan biosynthesis han01100   Metabolic pathways Brite KEGG Orthology (KO) [BR:han00001] 09100 Metabolism  09107 Glycan biosynthesis and metabolism   00510 N-Glycan biosynthesis    110889682   00513 Various types of N-glycan biosynthesis    110889682 09180 Brite Hierarchies  09181 Protein families:…

KEGG T01002: 627132

Entry 627132            CDS       T01002                                  Symbol Vmn2r93, EG627132 Name (RefSeq) vomeronasal 2, receptor 93   KO K04613   vomeronasal 2 receptor Organism mmu  Mus musculus (house mouse) Brite KEGG Orthology (KO) [BR:mmu00001] 09180 Brite Hierarchies  09183 Protein families: signaling and cellular processes   04030 G protein-coupled receptors [BR:mmu04030]    627132 (Vmn2r93) G protein-coupled receptors [BR:mmu04030] Others  Chemoreception   Vomeronasal pheromone    627132 (Vmn2r93) SSDB Ortholog, Paralog, Gene cluster, GFIT…

Perl debugging help – miRWoods

Hello, I was wondering if anyone with Perl experience could help me debug miRWoods? I tried reaching out to the authors via e-mail with no response, and issues on GitHub are turned off, so I’d be super grateful if anyone could provide any insight. When I run miRWoods I get…

KEGG T07740: 121914508

Entry 121914652         CDS       T07740                                  Symbol PLB1 Name (RefSeq) LOW QUALITY PROTEIN: phospholipase B1, membrane-associated   KO K14621   phospholipase B1, membrane-associated [EC:3.1.1.4 3.1.1.5] Organism sund  Sceloporus undulatus (fence lizard) Pathway sund00564   Glycerophospholipid metabolism sund00565   Ether lipid metabolism sund00590   Arachidonic acid metabolism sund00591   Linoleic acid metabolism sund00592   alpha-Linolenic acid metabolism sund01100   Metabolic pathways Brite KEGG Orthology…

KEGG T07457: 120810382

Entry 120825006         CDS       T07457                                  Symbol acsl1a Name (RefSeq) long-chain-fatty-acid–CoA ligase 1a isoform X1   KO K01897   long-chain acyl-CoA synthetase [EC:6.2.1.3] Organism gat  Gasterosteus aculeatus (three-spined stickleback) Pathway gat00061   Fatty acid biosynthesis gat00071   Fatty acid degradation gat01100   Metabolic pathways gat01212   Fatty acid metabolism gat03320   PPAR signaling pathway gat04146   Peroxisome gat04216   Ferroptosis gat04920   Adipocytokine signaling pathway…

KEGG T01015: 4329643

Entry 4330469           CDS       T01015                                  Name (RefSeq) monodehydroascorbate reductase 4, peroxisomal   KO K08232   monodehydroascorbate reductase (NADH) [EC:1.6.5.4] Organism osa  Oryza sativa japonica (Japanese rice) (RefSeq) Pathway osa00053   Ascorbate and aldarate metabolism osa01100   Metabolic pathways Brite KEGG Orthology (KO) [BR:osa00001] 09100 Metabolism  09101 Carbohydrate metabolism   00053 Ascorbate and aldarate metabolism    4330469Enzymes [BR:osa01000] 1. Oxidoreductases  1.6  Acting on NADH or NADPH   1.6.5  With a quinone…

KEGG T01001: 2135

Entry 2135              CDS       T01001                                  Symbol EXTL2, EXTR2 Name (RefSeq) exostosin like glycosyltransferase 2   KO K02369   alpha-1,4-N-acetylglucosaminyltransferase EXTL2 [EC:2.4.1.223] Organism hsa  Homo sapiens (human) Pathway hsa00534   Glycosaminoglycan biosynthesis – heparan sulfate / heparin hsa01100   Metabolic pathways Module hsa_M00059   Glycosaminoglycan biosynthesis, heparan sulfate backbone Network nt06029  Glycosaminoglycan biosynthesis   Element N01582   Heparan sulfate biosynthesis Brite KEGG Orthology…

How to get the gene ID

There is a brute-force method for this. You could upload the FASTA sequence to tblastn, set a filter of 100% sequence coverage, and run BLAST. Usually the first hit should give you the GenBank/RefSeq ID for your protein sequence. The next one will require some scripting, but if you…

KEGG T01001: 8789

Entry 8789              CDS       T01001                                  Symbol FBP2, CORLK Name (RefSeq) fructose-bisphosphatase 2   KO K03841   fructose-1,6-bisphosphatase I [EC:3.1.3.11] Organism hsa  Homo sapiens (human) Pathway hsa00010   Glycolysis / Gluconeogenesis hsa00030   Pentose phosphate pathway hsa00051   Fructose and mannose metabolism hsa01100   Metabolic pathways hsa01200   Carbon metabolism hsa04152   AMPK signaling pathway hsa04910   Insulin signaling pathway hsa04922   Glucagon signaling pathway…

KEGG T01001: 64781

Entry 64781             CDS       T01001                                  Symbol CERK, LK4, dA59H18.2, dA59H18.3, hCERK Name (RefSeq) ceramide kinase   KO K04715   ceramide kinase [EC:2.7.1.138] Organism hsa  Homo sapiens (human) Pathway hsa00600   Sphingolipid metabolism hsa01100   Metabolic pathways Brite KEGG Orthology (KO) [BR:hsa00001] 09100 Metabolism  09103 Lipid metabolism   00600 Sphingolipid metabolism    64781 (CERK)Enzymes [BR:hsa01000] 2. Transferases  2.7  Transferring phosphorus-containing groups   2.7.1  Phosphotransferases with an alcohol group as acceptor    2.7.1.138  ceramide kinase     64781…

KEGG T01001: 11146

Entry 11146             CDS       T01001                                  Symbol GLMN, FAP, FAP48, FAP68, FKBPAP, GLML, GVM, VMGLOM Name (RefSeq) glomulin, FKBP associated protein   KO K23345   glomulin Organism hsa  Homo sapiens (human) Pathway hsa05131   Shigellosis Network nt06521  NLR signaling   Element N00948   Shigella IpaH7.8 to NLRP3 Inflammasome signaling pathway Disease H00531   Venous malformations Brite KEGG Orthology (KO) [BR:hsa00001] 09160 Human Diseases  09171…

KEGG T01001: 9055

Entry 9055              CDS       T01001                                  Symbol PRC1, ASE1 Name (RefSeq) protein regulator of cytokinesis 1   KO K16732   Ase1/PRC1/MAP65 family protein Organism hsa  Homo sapiens (human) Brite KEGG Orthology (KO) [BR:hsa00001] 09180 Brite Hierarchies  09182 Protein families: genetic information processing   03036 Chromosome and associated proteins [BR:hsa03036]    9055 (PRC1)  09183 Protein families: signaling and cellular processes   04812 Cytoskeleton proteins [BR:hsa04812]    9055 (PRC1)Chromosome and…

KEGG T01001: 1845

Entry 5801              CDS       T01001                                  Symbol PTPRR, EC-PTP, PCPTP1, PTP-SL, PTPBR7, PTPRQ Name (RefSeq) protein tyrosine phosphatase receptor type R   KO K04458   receptor-type tyrosine-protein phosphatase R [EC:3.1.3.48] Organism hsa  Homo sapiens (human) Pathway hsa04010   MAPK signaling pathway Network nt06526  MAPK signaling   Element N01593   Regulation of GF-RTK-RAS-ERK signaling, PTP Brite KEGG Orthology (KO) [BR:hsa00001] 09130 Environmental Information…

KEGG T01001: 6714

Entry 6714              CDS       T01001                                  Symbol SRC, ASV, SRC1, THC6, c-SRC, p60-Src Name (RefSeq) SRC proto-oncogene, non-receptor tyrosine kinase   KO K05704   tyrosine-protein kinase Src [EC:2.7.10.2] Organism hsa  Homo sapiens (human) Pathway hsa01521   EGFR tyrosine kinase inhibitor resistance hsa01522   Endocrine resistance hsa04012   ErbB signaling pathway hsa04015   Rap1 signaling pathway hsa04062   Chemokine signaling pathway hsa04137   Mitophagy…

KEGG T05163: 107386622

Entry 107386622         CDS       T05163                                  Name (RefSeq) gamma-aminobutyric acid receptor subunit beta-4-like isoform X1   KO K05192   gamma-aminobutyric acid receptor subunit theta Organism nfu  Nothobranchius furzeri (turquoise killifish) Pathway nfu04080   Neuroactive ligand-receptor interaction Brite KEGG Orthology (KO) [BR:nfu00001] 09130 Environmental Information Processing  09133 Signaling molecules and interaction   04080 Neuroactive ligand-receptor interaction    107386622 09180 Brite Hierarchies  09183 Protein families: signaling and cellular…

KEGG T06108: 110176608

Entry 110179502         CDS       T06108                                  Name (RefSeq) probable citrate synthase, mitochondrial isoform X1   KO K01647   citrate synthase [EC:2.3.3.1] Organism dsr  Drosophila serrata Pathway dsr00020   Citrate cycle (TCA cycle) dsr00630   Glyoxylate and dicarboxylate metabolism dsr01100   Metabolic pathways dsr01200   Carbon metabolism dsr01210   2-Oxocarboxylic acid metabolism dsr01230   Biosynthesis of amino acids Module dsr_M00009   Citrate cycle (TCA cycle,…

Obtaining TPM values from STAR alignment and counts with featurecounts using R’s tidyverse syntax (dplyr and tidyr)

Hello! I have a table of counts that I got by aligning RNA-seq samples with STAR and using featureCounts, and my goal is to get TPM values for each gene in the table. As a first step, I imported my table into R and modified it a bit to…

KEGG T00007: b0720

Entry b0720             CDS       T00007                                  Symbol gltA Name (RefSeq) citrate synthase   KO K01647   citrate synthase [EC:2.3.3.1] Organism eco  Escherichia coli K-12 MG1655 Pathway eco00020   Citrate cycle (TCA cycle) eco00630   Glyoxylate and dicarboxylate metabolism eco01100   Metabolic pathways eco01110   Biosynthesis of secondary metabolites eco01120   Microbial metabolism in diverse environments eco01200   Carbon metabolism eco01210   2-Oxocarboxylic acid metabolism…

Retrieve Promoter Sequences by GeneID

Hello! I want to retrieve promoter sequences starting from a list of gene IDs. I tried to use RSAT retrieve sequence, but the problem is that it retrieves the sequence from the start codon or the stop codon, whereas I want to retrieve the sequence 1500bp…

KEGG T05045: 111020632

Entry 111020632         CDS       T05045                                  Name (RefSeq) protein SUPPRESSOR OF K(+) TRANSPORT GROWTH DEFECT 1-like   KO K12196   vacuolar protein-sorting-associated protein 4 Organism mcha  Momordica charantia (bitter melon) Pathway mcha03250   Viral life cycle – HIV-1 mcha04144   Endocytosis Brite KEGG Orthology (KO) [BR:mcha00001] 09120 Genetic Information Processing  09125 Information processing in viruses   03250 Viral life cycle – HIV-1    111020632 09140 Cellular…

Query in indexing human genome

Hello, I have to do RNA-seq analysis of human cancer cell lines, and for that I need to index the human genome as a reference genome. I indexed the human genome GFF file from NCBI. During a lecture I heard that the NCBI human genome file has some…

Potential segfault bug in featureCounts using long read data

Hi, I think I might have found a bug in featureCounts from Rsubread (v2.12.3). I am trying to find reads overlapping exon junctions from a personalised reference, using Nanopore long read BAMs. I am afraid I cannot share fully reproducible code as I am using my own reference, but this…

tx2gene.txt : transcript-to-gene mapping file

Hi, I am trying to quantify gene counts from transcript abundance (from kallisto, salmon, etc.) using tximport. For that I have to create a transcript-to-gene mapping file. How can I create this? I created one from GCF_013265735.2_USDA_OmykA_1.1_rna.fasta (rainbow trout) from NCBI…

Performing GO analysis from Differential Peaks

Hello everyone, I called FindMarkers() in order to find differential peaks between two biological conditions and the following was output ("diff.peaks"). My question is: how would I generate a nice chart for GO analysis from this? My current code is: install.packages("JASPAR2022") library(JASPAR2022)…

KEGG T02677: SSYRP_v1c07610

Entry SSYRP_v1c07610    CDS       T02677                                  Symbol coaE Name (GenBank) dephospho-CoA kinase   KO K00859   dephospho-CoA kinase [EC:2.7.1.24] Organism ssyr  Spiroplasma syrphidicola Pathway ssyr00770   Pantothenate and CoA biosynthesis ssyr01100   Metabolic pathways ssyr01240   Biosynthesis of cofactors Module ssyr_M00120   Coenzyme A biosynthesis, pantothenate => CoA Brite KEGG Orthology (KO) [BR:ssyr00001] 09100 Metabolism  09108 Metabolism of cofactors and vitamins   00770 Pantothenate and…

KEGG T04126: 106758963

Entry 106758963         CDS       T04126                                  Name (RefSeq) ethylene-responsive transcription factor 1   KO K09286   EREBP-like factor Organism vra  Vigna radiata (mung bean) Brite KEGG Orthology (KO) [BR:vra00001] 09180 Brite Hierarchies  09182 Protein families: genetic information processing   03000 Transcription factors [BR:vra03000]    106758963 Transcription factors [BR:vra03000] Eukaryotic type  Other transcription factors   AP2/ERF    106758963 SSDB Ortholog, Paralog, Gene cluster, GFIT Motif Pfam:  AP2 Other DBs NCBI-GeneID: …

Should you specify “-p” for paired end reads using featureCounts?

I’m trying to understand whether or not I should be using the -p flag for featureCounts. Here’s the explanation: -p If specified, fragments (or templates) will be counted instead of reads. This option is only applicable for paired-end reads;…

Questions about DESeq and GOenrichment analysis for tomato

Hello all, I am a beginner in bioinformatics and I have 2 questions about RNA-seq data processing for tomato. 1) I am always confused about DESeq’s normalization function for gene length. I have 2 data sets at hand, one is…

KEGG T07241: 114692077

Entry 114692077         CDS       T07241                                  Symbol Gmpr Name (RefSeq) GMP reductase 1   KO K00364   GMP reductase [EC:1.7.1.7] Organism pleu  Peromyscus leucopus (white-footed mouse) Pathway pleu00230   Purine metabolism pleu01100   Metabolic pathways pleu01232   Nucleotide metabolism Brite KEGG Orthology (KO) [BR:pleu00001] 09100 Metabolism  09104 Nucleotide metabolism   00230 Purine metabolism    114692077 (Gmpr)Enzymes [BR:pleu01000] 1. Oxidoreductases  1.7  Acting on other nitrogenous compounds as donors   1.7.1  With NAD+ or…

why the metabolomics file does not merge?

Hello guys, I am trying to get the metabolomics list, but it seems like it does not merge; it returns an empty list. Where and what am I doing wrong? library(KEGGREST) library(org.Hs.eg.db) library(annotate) ## Get enzyme-gene annotations res1 = keggLink("enzyme",…

KEGG T01001: 10137

Entry 10137             CDS       T01001                                  Symbol RBM12, HRIHFB2091, SCZD19, SWAN Name (RefSeq) RNA binding motif protein 12   KO K24526   RNA-binding protein 12 Organism hsa  Homo sapiens (human) Disease H01649   Schizophrenia Brite KEGG Orthology (KO) [BR:hsa00001] 09180 Brite Hierarchies  09182 Protein families: genetic information processing   03041 Spliceosome [BR:hsa03041]    10137 (RBM12)Spliceosome [BR:hsa03041] Other splicing related proteins  Spliceosome associated proteins (SAPs)   RNA binding proteins…

KEGG T01001: 79685

Entry 8819              CDS       T01001                                  Symbol SAP30 Name (RefSeq) Sin3A associated protein 30   KO K19202   histone deacetylase complex subunit SAP30 Organism hsa  Homo sapiens (human) Pathway hsa05169   Epstein-Barr virus infection Brite KEGG Orthology (KO) [BR:hsa00001] 09160 Human Diseases  09172 Infectious disease: viral   05169 Epstein-Barr virus infection    8819 (SAP30) 09180 Brite Hierarchies  09182 Protein families: genetic information processing   03036 Chromosome and associated…

KEGG T01001: 8741

Entry 8741              CDS       T01001                                  Symbol TNFSF13, APRIL, CD256, TALL-2, TALL2, TNLG7B, TRDL-1, UNQ383/PRO715, ZTNF2 Name (RefSeq) TNF superfamily member 13   KO K05475   tumor necrosis factor ligand superfamily member 13 Organism hsa  Homo sapiens (human) Pathway hsa04060   Cytokine-cytokine receptor interaction hsa04672   Intestinal immune network for IgA production hsa05323   Rheumatoid arthritis Drug target Atacicept:  D09704 Sibeprenlimab: …

KEGG T04662: 101947625

Entry 101947625         CDS       T04662                                  Symbol JPH1 Name (RefSeq) junctophilin-1 isoform X1   KO K19530   junctophilin Organism cpic  Chrysemys picta (western painted turtle) Brite KEGG Orthology (KO) [BR:cpic00001] 09190 Not Included in Pathway or Brite  09193 Unclassified: signaling and cellular processes   99992 Structural proteins    101947625 (JPH1) SSDB Ortholog, Paralog, Gene cluster, GFIT Motif Pfam:  MORN DUF4690 Other DBs NCBI-GeneID: …

KEGG T01003: 502143

Entry 502143            CDS       T01003                                  Symbol Idi2 Name (RefSeq) isopentenyl-diphosphate delta isomerase 2   KO K01823   isopentenyl-diphosphate Delta-isomerase [EC:5.3.3.2] Organism rno  Rattus norvegicus (rat) Pathway rno00900   Terpenoid backbone biosynthesis rno01100   Metabolic pathways Module rno_M00095   C5 isoprenoid biosynthesis, mevalonate pathway rno_M00367   C10-C20 isoprenoid biosynthesis, non-plant eukaryotes Brite KEGG Orthology (KO) [BR:rno00001] 09100 Metabolism  09109 Metabolism of terpenoids and…

Heatmap from count matrix

Hi everyone, I have a feature count matrix which looks like this:

GeneID  sample 1  sample 2  sample 3
gene1   0         1         7
gene2   120       6         0
gene3   0         100       8

I want to create a heatmap of this data where I want to show…

How to import dataset from other software into DESeq2?

Hi, I am new to DESeq2, and I wonder if a dataset (either .csv or .txt) prepared in another program can be imported into DESeq2 as a DESeqDataSet in…

How to Merge RNA Replicates

I am following the manual for a program called TimeReg that says “If there are multiple replicates, merge them to get one expression profile. For gene expression data, you may use the average expression (FPKM or TPM) of the replicates.” I have two replicates…
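
The averaging that the quoted manual describes can be sketched as a per-gene mean over replicates. This is only an illustration under stated assumptions: each replicate is represented as a hypothetical dictionary mapping gene ID to an FPKM or TPM value, the function name `average_replicates` is made up, and genes missing from any replicate are dropped rather than imputed.

```python
from statistics import mean


def average_replicates(replicates):
    """Average expression per gene across replicates.

    `replicates` is a list of dicts mapping gene ID -> expression (FPKM or TPM).
    Assumption: only genes present in every replicate are kept.
    """
    if not replicates:
        return {}
    shared_genes = set(replicates[0])
    for replicate in replicates[1:]:
        shared_genes &= set(replicate)
    return {
        gene: mean(replicate[gene] for replicate in replicates)
        for gene in sorted(shared_genes)
    }


# Made-up values for two replicates of the same condition.
rep1 = {"gene1": 10.0, "gene2": 0.0, "gene3": 4.0}
rep2 = {"gene1": 14.0, "gene2": 2.0}
merged = average_replicates([rep1, rep2])
print(merged)
```

Whether a plain mean is appropriate depends on how consistent the replicates are, so it is worth checking their correlation before merging.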

How to make “Custom annotation File” for GO analysis using TOPgo

Hello Biostars, I would like to perform GO analysis using the R package topGO. I have DESeq data as well as GO term IDs obtained after functional annotation, as in the image presented here. Using this information, I would like…

Chromosome-level genome assembly of the critically endangered Baer’s pochard (Aythya baeri)

Ethics statement: All animal handling and experimental procedures were approved by the Qufu Normal University Biomedical Ethics Committee (approval number: 2022001). Sample and sequencing: Baer’s pochard tissue for whole-genome sequencing was obtained from a dead individual that had strayed into a fishing net in Shandong (China). The muscle tissue that…

Why not use ONLY promoter-bound peaks when testing for enrichment in differentially-bound regions?

In several manuals (example) on ChIP-seq analysis they pre-select, for instance, +1000 bp and -1000 bp from the TSS as the “promoter-bound” regions: peakAnno_bcl11b <- ChIPseeker::annotatePeak(peak = 'bcl11b_peaks.narrowPeak', TxDb = txdb, tssRegion = c(-1000, 1000)) which produces an object with a slot @anno in which each peak is assigned either “Promoter”, “5’ UTR”, “3’ UTR”,…

Continue Reading Why not use ONLY promoter-bound peaks when testing for enrichment in differentially-bound regions?

Error parsing strand (?) from GFF line

I am trying to assemble RNA transcripts using stringtie and am facing the following error. Error parsing strand (?) from GFF line: NC_037304.1 RefSeq gene 58315 59481 . ? . ID=gene-DA397_mgp34;Dbxref=GeneID:36335702;Name=nad1;exception=trans-splicing;gbkey=Gene;gene=nad1;gene_biotype=protein_coding;locus_tag=DA397_mgp34;part=2 My command is: stringtie -p 8 -G Genome/arab_thaliana.gtf -o Assemble/NR1.gtf -l…

Continue Reading Error parsing strand (?) from GFF line
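
To see which annotation lines would trigger the error quoted above, one option is to scan the strand column (the seventh tab-separated field) for "?". The following is a minimal Python sketch, not part of stringtie: the helper name `find_unknown_strand` is hypothetical, and the sample line is the one from the error message with its attribute column shortened for illustration.

```python
def find_unknown_strand(lines):
    """Return (line number, seqid, type, start, end) for features with strand '?'."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Skip comment/directive lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 7 (index 6) holds the strand in GFF/GTF files.
        if len(fields) >= 7 and fields[6] == "?":
            hits.append((number, fields[0], fields[2], fields[3], fields[4]))
    return hits


# Attribute column shortened; the other fields are taken from the error message.
sample = [
    "##gff-version 3\n",
    "NC_037304.1\tRefSeq\tgene\t58315\t59481\t.\t?\t.\tID=gene-DA397_mgp34\n",
]
hits = find_unknown_strand(sample)
print(hits)
```

The sketch only reports the affected features. How to handle them (here, a gene flagged with exception=trans-splicing) is a separate decision that depends on the analysis.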

KEGG enrichment in R and gene IDs

Hi, I am trying to run a KEGG enrichment analysis on my data. My genes are in SYMBOL, which I converted to ENTREZID, but I need them in “kegg” or “ncbi-geneID” to run enrichKEGG. I…

Continue Reading KEGG enrichment in R and gene IDs

Error generating counts df for use with DRIMSeq/DEXseq

Hi, I am attempting to work through the workflow described in “Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification.” I am running into an error message when I try to make the counts data frame for DRIMSeq: Error in data.frame(gene_id = txdf$GENEID, feature_id = txdf$TXNAME, cts) : arguments…

Continue Reading Error generating counts df for use with DRIMSeq/DEXseq

TxDB.Hsapiens.UCSC.hg38.knownGene with locateVariants() identifying SNPs from various chromosome being part of the same gene

I am trying to annotate a list of SNPs using the hg38 genome (knownGene) and locateVariants(). The program is able to successfully run and provide “GeneIDs” for several of the loci. However, some GeneIDs are applied to SNPs in completely different regions and on completely different chromosomes. When I cross…

Continue Reading TxDB.Hsapiens.UCSC.hg38.knownGene with locateVariants() identifying SNPs from various chromosome being part of the same gene

KEGG T01001: 98

Entry 98                CDS       T01001                                  Symbol ACYP2, ACYM, ACYP Name (RefSeq) acylphosphatase 2   KO K01512   acylphosphatase [EC:3.6.1.7] Organism hsa  Homo sapiens (human) Pathway hsa00620   Pyruvate metabolism hsa01100   Metabolic pathways Brite KEGG Orthology (KO) [BR:hsa00001] 09100 Metabolism  09101 Carbohydrate metabolism   00620 Pyruvate metabolism    98 (ACYP2) Enzymes [BR:hsa01000] 3. Hydrolases  3.6  Acting on acid anhydrides   3.6.1  In phosphorus-containing anhydrides    3.6.1.7  acylphosphatase     98 (ACYP2)…

Continue Reading KEGG T01001: 98

KEGG T01001: 6579

Entry 10599             CDS       T01001                                  Symbol SLCO1B1, HBLRR, LST-1, LST1, OATP-C, OATP1B1, OATP2, OATPC, SLC21A6 Name (RefSeq) solute carrier organic anion transporter family member 1B1   KO K05043   solute carrier organic anion transporter family, member 1B Organism hsa  Homo sapiens (human) Pathway hsa04976   Bile secretion Disease H00208   Hyperbilirubinemia H02057   Rotor syndrome Brite KEGG Orthology (KO)…

Continue Reading KEGG T01001: 6579

Genes in 10x don’t match genes in ENSBL

Hi everyone, I am trying to map my genes to chromosome locations so I can remove low-quality cells based on high mitochondrial content. When mapping to the chromosomes I get the error below. gene_annot <- AnnotationDbi::select(ens.hs.107, keys = genes, keytype =…

Continue Reading Genes in 10x don’t match genes in ENSBL

scATAC annotation file for zebrafish

Hi all, I have started to analyze scATAC-seq data, which I obtained from SRA. I am having trouble with gene annotation. I used the Danio_rerio.GRCz11.109.gtf file to create a GRanges object using the AcidGenomics package. Here is the R script I used for that: DanioAnno…

Continue Reading scATAC annotation file for zebrafish

R removes 1st column (gene-id) from featureCounts count.txt table

Hi all, I generated a count.txt for sorted.bam files using featureCounts on Linux, following the RNA-seq data analysis steps. 1- Using a text editor, I checked the count.txt file and found the following columns: geneid Chr Start End Strand Length sample1 sample2…

Continue Reading R removes 1st column (gene-id) from featureCounts count.txt table

KEGG T01001: 3094

Entry 3094              CDS       T01001                                  Symbol HINT1, HINT, NMAN, PKCI-1, PRKCNH1 Name (RefSeq) histidine triad nucleotide binding protein 1   KO K02503   histidine triad (HIT) family protein Organism hsa  Homo sapiens (human) Disease H02390   Autosomal recessive neuromyotonia and axonal neuropathy Brite KEGG Orthology (KO) [BR:hsa00001] 09180 Brite Hierarchies  09183 Protein families: signaling and cellular processes   04147 Exosome [BR:hsa04147]    3094…

Continue Reading KEGG T01001: 3094

Are there any tools that can create a very basic GTF file from contig sequences (no annotations really needed) ?

If anyone still needs help with this, you can use a SAF file as an option with featureCounts. Here’s a script from my VEBA suite github.com/jolespin/veba/blob/main/src/scripts/fasta_to_saf.py Can easily adapt to not require soothsayer_utils below. #!/usr/bin/env python from __future__ import print_function, division import sys, os, argparse import pandas as pd from…

Continue Reading Are there any tools that can create a very basic GTF file from contig sequences (no annotations really needed) ?

KEGG T01001: 29085

Entry 29085             CDS       T01001                                  Symbol PHPT1, CGI-202, HEL-S-132P, HSPC141, PHP, PHP14 Name (RefSeq) phosphohistidine phosphatase 1   KO K01112   phosphohistidine phosphatase [EC:3.9.1.3] Organism hsa  Homo sapiens (human) Brite KEGG Orthology (KO) [BR:hsa00001] 09190 Not Included in Pathway or Brite  09191 Unclassified: metabolism   99980 Enzymes with EC numbers    29085 (PHPT1) Enzymes [BR:hsa01000] 3. Hydrolases  3.9  Acting on phosphorus-nitrogen bonds   3.9.1  Acting on phosphorus-nitrogen bonds (only…

Continue Reading KEGG T01001: 29085

Enrichment analysis based on kegg for zebrafish

Hello there! I am doing an enrichment analysis based on KEGG. The analysis is based on zebrafish entrezid/ncbi-geneid. clusterProfiler seems to work for this example: data(geneList, package="DOSE") de <- names(geneList)[1:100] yy <- enrichKEGG(de, pvalueCutoff=0.01) head(yy) But when I tried my code, it does not work. I did it for my…

Continue Reading Enrichment analysis based on kegg for zebrafish

How to get gseKEGG() to accept an input gene list?

I’ve got a csv file with 2 columns – one of Entrez IDs and another of gene’s measurement/fold-change. I am running code trying to use gseKEGG(), getting the gene list prepared for that function like this: d <- fread("file.csv") geneList <- d[,2] names(geneList) <- as.character(d[,1]) geneList <- sort(geneList, decreasing = TRUE)…

Continue Reading How to get gseKEGG() to accept an input gene list?

Journey from gene id to gene sequence

Can you tell me how to download gene sequences with 2500 gene ids? Hi Shweta, If you are referring to NCBI Gene IDs, you can use NCBI Datasets for that task. To download only gene sequences, you…

Continue Reading Journey from gene id to gene sequence

Answer: using Firebrowser to identify disease type

The solution to this is within the `Samples.mRNASeq` that gives data which can be saved in JSON format: [0] { cohort “ACC”, expression_log2 3.635731, gene “CD274”, geneID 29126, protocol “RSEM”, sample_type “TP”, tcga_participant_barcode “TCGA-PK-A5HB”, z-score -0.01802174 }, [1] { cohort “ACC”, expression_log2 2.725785, gene “CD274”, geneID 29126, protocol “RSEM”, sample_type…

Continue Reading Answer: using Firebrowser to identify disease type

KEGG T01001: 54499

Entry 54499             CDS       T01001                                  Symbol TMCO1, CFSMR1, HP10122, PCIA3, PNAS-136, TMCC4 Name (RefSeq) transmembrane and coiled-coil domains 1   KO K21891   calcium load-activated calcium channel Organism hsa  Homo sapiens (human) Disease H02415   Craniofacial dysmorphism, skeletal anomalies, and mental retardation syndrome Brite KEGG Orthology (KO) [BR:hsa00001] 09180 Brite Hierarchies  09183 Protein families: signaling and cellular processes   02000 Transporters…

Continue Reading KEGG T01001: 54499

failed to find the gene identifier attribute in the 9th column of the provided GTF file.

Hi, I am trying to use featureCounts to analyse my RNA-seq data with Apis mellifera. My code and error are as follows. /softwares/subread-2.0.0-source/bin/featureCounts -T 16 -p -s 1 -a /home/axel/arumoyc/alignment/GCF_003254395.2_Amel_HAv3.1_genomic.gtf -t…

Continue Reading failed to find the gene identifier attribute in the 9th column of the provided GTF file.

Get TPM from RNA counts and gene length?

Hello, I am working with an RNA-seq featureCounts output file that supplies the counts for a given ENSG gene ID, as well as the gene length (according to the documentation this is in base pairs, not kilobases). Is there a way to obtain…

Continue Reading Get TPM from RNA counts and gene length?
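
For the question above, the standard TPM calculation from counts and gene lengths can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `counts_to_tpm`, the gene names, and the numbers are hypothetical, lengths are in base pairs as the question states, and the annotated gene length is used where some pipelines use an effective length instead.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw counts to TPM using gene lengths given in base pairs.

    Both arguments are dicts keyed by gene ID (hypothetical layout).
    """
    # Step 1: reads per kilobase, so longer genes are not favoured.
    rates = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    total = sum(rates.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    # Step 2: scale so that the values of one sample sum to one million.
    return {g: rate / total * 1e6 for g, rate in rates.items()}


# Made-up example: geneB has 3x the reads but is also 3x as long.
counts = {"geneA": 100, "geneB": 300}
lengths_bp = {"geneA": 1000, "geneB": 3000}
tpm = counts_to_tpm(counts, lengths_bp)
print(tpm)
```

Because the scaling is done per sample, the function should be applied to each sample column separately.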

KEGG T01001: 63826

Entry 63826             CDS       T01001                                  Symbol SRR, ILV1, ISO1 Name (RefSeq) serine racemase   KO K12235   serine racemase [EC:5.1.1.18] Organism hsa  Homo sapiens (human) Pathway hsa00260   Glycine, serine and threonine metabolism hsa00470   D-Amino acid metabolism hsa01100   Metabolic pathways Brite KEGG Orthology (KO) [BR:hsa00001] 09100 Metabolism  09105 Amino acid metabolism   00260 Glycine, serine and threonine metabolism    63826 (SRR)  09106 Metabolism of…

Continue Reading KEGG T01001: 63826

Third quartile normalized logFC data to find differentially express gene using limma

I have a count matrix that was normalized using conditional quantile normalization and contains negative values; I understand that these are normalized logFC values. When I use it directly in limma with the following command, it shows…

Continue Reading Third quartile normalized logFC data to find differentially express gene using limma

KEGG T01001: 151176

Entry 151176            CDS       T01001                                  Symbol ERFE, C1QTNF15, CTRP15, FAM132B Name (RefSeq) erythroferrone   KO K24381   erythroferrone Organism hsa  Homo sapiens (human) Brite KEGG Orthology (KO) [BR:hsa00001] 09180 Brite Hierarchies  09183 Protein families: signaling and cellular processes   04990 Domain-containing proteins not elsewhere classified [BR:hsa04990]    151176 (ERFE) Domain-containing proteins not elsewhere classified [BR:hsa04990] C1q domain-containing proteins  CBLN / gliacolin group proteins   151176 (ERFE)…

Continue Reading KEGG T01001: 151176

Design matrix in limma

I have quantile normalized counts, with negative values. The data set is: GeneID 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 1/2-SBSRNA4 1.84545405543259 0.6665398175808 1.59873554207786 1.89298926465623 1.26568208427265 1.88410700890907 1.74378606410793 1.48987360722618 A1BG 1.6165355686345 2.68681326811308 1.50663367983524 2.25377918290859 2.67375515222443 2.37256130394363 2.98553798816952…

Continue Reading Design matrix in limma

KEGG T01002: 16333

Entry 16333             CDS       T01002                                  Symbol Ins1, Ins-1, Ins2-rs1 Name (RefSeq) insulin I   KO K04526   insulin Organism mmu  Mus musculus (house mouse) Pathway mmu04010   MAPK signaling pathway mmu04014   Ras signaling pathway mmu04015   Rap1 signaling pathway mmu04022   cGMP-PKG signaling pathway mmu04066   HIF-1 signaling pathway mmu04068   FoxO signaling pathway mmu04072   Phospholipase D signaling pathway mmu04114   Oocyte…

Continue Reading KEGG T01002: 16333

KEGG T01028: 702118

Entry 702118            CDS       T01028                                  Symbol OR4M1 Name (RefSeq) olfactory receptor 4M1   KO K04257   olfactory receptor Organism mcc  Macaca mulatta (rhesus monkey) Pathway mcc04740   Olfactory transduction Brite KEGG Orthology (KO) [BR:mcc00001] 09150 Organismal Systems  09157 Sensory system   04740 Olfactory transduction    702118 (OR4M1) 09180 Brite Hierarchies  09183 Protein families: signaling and cellular processes   04030 G protein-coupled receptors [BR:mcc04030]    702118 (OR4M1) G protein-coupled receptors [BR:mcc04030] Others  Chemoreception   Olfactory    702118…

Continue Reading KEGG T01028: 702118

KEGG T08233: 119020260

Entry 119020260         CDS       T08233                                  Name (RefSeq) beta-1,3-galactosyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase-like   KO K00727   beta-1,3-galactosyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase [EC:2.4.1.102] Organism alat  Acanthopagrus latus (yellowfin seabream) Pathway alat00512   Mucin type O-glycan biosynthesis alat01100   Metabolic pathways Brite KEGG Orthology (KO) [BR:alat00001] 09100 Metabolism  09107 Glycan biosynthesis and metabolism   00512 Mucin type O-glycan biosynthesis    119020260 09180 Brite Hierarchies  09181 Protein families: metabolism   01003 Glycosyltransferases [BR:alat01003]    119020260 Enzymes [BR:alat01000] 2. Transferases  2.4  Glycosyltransferases   2.4.1  Hexosyltransferases    2.4.1.102  beta-1,3-galactosyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase     119020260 Glycosyltransferases [BR:alat01003] Glycan…

Continue Reading KEGG T08233: 119020260

How does enrichGO function calculated p-value?

Hi, I’m doing an Over Representation Analysis using the clusterProfiler package. When I used the enrichGO function, I obtained a dataframe with the following columns: _ONTOLOGY: BP (in my case) _ID: GO ID. _Description: Description of the Biological Process. _GeneRatio: ratio of input genes that are annotated in a term….

Continue Reading How does enrichGO function calculated p-value?

KEGG T01088: 103635163

Entry 103635163         CDS       T01088                                  Name (RefSeq) hydroxyethylthiazole kinase   KO K00878   hydroxyethylthiazole kinase [EC:2.7.1.50] Organism zma  Zea mays (maize) Pathway zma00730   Thiamine metabolism zma00740   Riboflavin metabolism zma01100   Metabolic pathways zma01240   Biosynthesis of cofactors Module zma_M00899   Thiamine salvage pathway, HMP/HET => TMP Brite KEGG Orthology (KO) [BR:zma00001] 09100 Metabolism  09108 Metabolism of cofactors and vitamins   00730 Thiamine metabolism    103635163   00740…

Continue Reading KEGG T01088: 103635163

gff file from NCBI RefSeq GCF dataset has an invalid format

Thank you for noticing this. It is indeed an issue in the GFF3 file. The root of the problem is it’s a gene that is impossible to correctly represent in GFF3 because it incorporates sequence from both strands via trans_splicing. The complexity of this gene can be seen on the…

Continue Reading gff file from NCBI RefSeq GCF dataset has an invalid format

A chromosome-level genome assembly of Plantago ovata

Genome assembly and chromosome identification A Plantago ovata genome reference was generated by utilizing a total of 5.98 M (7 cells, 40.21 Gb, N50 = 10.45 Kb, 50 bp–121.17 Kb) PacBio long reads and 636.5 million (47.74 Gb) Hi-C short-reads. PacBio reads were used to assemble contigs, while Hi-C reads were used to achieve chromosome-level assembly. The final…

Continue Reading A chromosome-level genome assembly of Plantago ovata

“Error parsing strand (?) from GFF line” happenning in gffread, stringtie and cufflinks

Hi! I’m working with various genomic data and while trying to use gffread, stringtie and cufflinks I ran into the same error: Error parsing strand (?) from GFF line: NC_037304.1 RefSeq gene 58315 59481 . ?…

Continue Reading “Error parsing strand (?) from GFF line” happenning in gffread, stringtie and cufflinks

KEGG T01001: 5742

Entry 5743              CDS       T01001                                  Symbol PTGS2, COX-2, COX2, GRIPGHS, PGG/HS, PGHS-2, PHS-2, hCox-2 Name (RefSeq) prostaglandin-endoperoxide synthase 2   KO K11987   prostaglandin-endoperoxide synthase 2 [EC:1.14.99.1] Organism hsa  Homo sapiens (human) Pathway hsa00590   Arachidonic acid metabolism hsa01100   Metabolic pathways hsa04064   NF-kappa B signaling pathway hsa04370   VEGF signaling pathway hsa04625   C-type lectin receptor signaling pathway hsa04657  …

Continue Reading KEGG T01001: 5742

Having a lot of trouble converting Gene Ranges to GeneID.

I’m having trouble converting gene ranges to gene ids for mm10. For example I have a dataframe of “chromosome”, “start”, “end”, and I want the associated “GENE SYMBOL” for each row. I was looking online, which brought me to…

Continue Reading Having a lot of trouble converting Gene Ranges to GeneID.

Differential gene expression analysis with no replicates using edgeR

Dear all, I have an experimental design where I have only one sample in each condition (2 conditions in total) and want to do differential gene expression analysis using edgeR. This is the script I want to use for the analysis and it runs without any errors – with this…

Continue Reading Differential gene expression analysis with no replicates using edgeR

Differentially expression analysis of orthologous genes between two species

Hi people, I want to use DESeq2 for differential expression analysis of orthologous genes between two different species. I am not at all experienced with R and DESeq2, but I think at the…

Continue Reading Differentially expression analysis of orthologous genes between two species

KEGG T00005: YNL036W

Entry YNL036W           CDS       T00005                                  Symbol NCE103, NCE3 Name (RefSeq) carbonate dehydratase NCE103   KO K01673   carbonic anhydrase [EC:4.2.1.1] Organism sce  Saccharomyces cerevisiae (budding yeast) Pathway sce00910   Nitrogen metabolism sce01100   Metabolic pathways Brite KEGG Orthology (KO) [BR:sce00001] 09100 Metabolism  09102 Energy metabolism   00910 Nitrogen metabolism    YNL036W (NCE103) Enzymes [BR:sce01000] 4. Lyases  4.2  Carbon-oxygen lyases   4.2.1  Hydro-lyases    4.2.1.1  carbonic anhydrase     YNL036W (NCE103)…

Continue Reading KEGG T00005: YNL036W

UCSC Genome Browser | Encyclopedia MDPI

1. History Initially built and still managed by Jim Kent, then a graduate student, and David Haussler, professor of Computer Science (now Biomolecular Engineering) at the University of California, Santa Cruz in 2000, the UCSC Genome Browser began as a resource for the distribution of the initial fruits of the…

Continue Reading UCSC Genome Browser | Encyclopedia MDPI

[SOLVED] Special .bed to .fa conversion (GenomicCoordinates/DNAsequence) ~ Linux Fixes

My aim is to create a custom protein sequence reference file (protein.fa) from genomic coordinates (origin.bed). (origin.bed; with Chromosome, start, end, TranscriptID, strand, GeneID)

chr1   109202569  109202584  ENST00000370031.1_uORF_0  -  ENSG00000162639.11
chr1   109203584  109203617  ENST00000370031.1_uORF_0  -  ENSG00000162639.11
chr11  102188276  102188302  ENST00000263464.3_uORF_0  +  ENSG00000023445.9
chr11  10830291   10830306   ENST00000530211.1_uORF_1  -  ENSG00000110321.11
chr11  10830400…

Continue Reading [SOLVED] Special .bed to .fa conversion (GenomicCoordinates/DNAsequence) ~ Linux Fixes

KEGG T01001: 4171

Entry 4171              CDS       T01001                                  Symbol MCM2, BM28, CCNL1, CDCL1, D3S3194, DFNA70, MITOTIN, cdc19 Name (RefSeq) minichromosome maintenance complex component 2   KO K02540   DNA replication licensing factor MCM2 [EC:5.6.2.3] Organism hsa  Homo sapiens (human) Pathway hsa03030   DNA replication hsa04110   Cell cycle Disease H00604   Deafness, autosomal dominant Brite KEGG Orthology (KO) [BR:hsa00001] 09120 Genetic Information Processing  09124…

Continue Reading KEGG T01001: 4171

KEGG T02666: 101290786

Entry 101298727         CDS       T02666                                  Name (RefSeq) triacylglycerol lipase SDP1   KO K14674   TAG lipase / steryl ester hydrolase / phospholipase A2 / LPA acyltransferase [EC:3.1.1.3 3.1.1.13 3.1.1.4 2.3.1.51] Organism fve  Fragaria vesca (woodland strawberry) Pathway fve00100   Steroid biosynthesis fve00561   Glycerolipid metabolism fve00564   Glycerophospholipid metabolism fve00565   Ether lipid metabolism fve00590   Arachidonic acid metabolism fve00591   Linoleic…

Continue Reading KEGG T02666: 101290786

How can I convert Ensembl ID to gene symbol in R?

I tried several R packages (mygene, org.Hs.eg.db, biomaRt, EnsDb.Hsapiens.v79) to convert Ensembl.gene to gene.symbol, and found that the EnsDb.Hsapiens.v79 package / gene database provides the best conversion quality (in terms of being able to convert most of Ensembl.gene to gene.symbol). Install the package if you have not installed by running…

Continue Reading How can I convert Ensembl ID to gene symbol in R?