Category: HTseq

When I convert the Ensembl IDs to gene symbols, why are lots of genes duplicated?

Hi all, I have raw counts of samples in a data frame. The row names are Ensembl IDs and I want to convert them to gene symbols, so I ran the code below. query <- GDCquery(project = "TCGA-COAD", data.category = "Transcriptome Profiling", data.type = "Gene Expression Quantification", workflow.type…

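Several Ensembl gene IDs can share one gene symbol, and some IDs have no symbol at all, so a one-to-one conversion is not guaranteed. The following is a minimal Python sketch (not the poster's R code) of one possible way to handle this: strip the version suffix from each ID and sum the counts of IDs that map to the same symbol. The IDs, symbols, and function names are hypothetical, and summing is an assumption; keeping the Ensembl IDs as row names is an equally reasonable choice.

```python
def strip_version(ensembl_id):
    """Drop a trailing version suffix: 'TOYG0001.3' -> 'TOYG0001'."""
    return ensembl_id.split(".", 1)[0]


def collapse_to_symbols(counts, id_to_symbol):
    """Sum per-sample counts of all IDs that share a gene symbol.

    counts: dict mapping (possibly versioned) ID -> list of per-sample counts
    id_to_symbol: dict mapping unversioned ID -> gene symbol ('' if none)
    IDs without a symbol keep their unversioned ID as the row name.
    """
    collapsed = {}
    for ensembl_id, row in counts.items():
        base_id = strip_version(ensembl_id)
        # Fall back to the ID itself when no symbol is known.
        symbol = id_to_symbol.get(base_id) or base_id
        if symbol in collapsed:
            collapsed[symbol] = [a + b for a, b in zip(collapsed[symbol], row)]
        else:
            collapsed[symbol] = list(row)
    return collapsed


# Toy data: hypothetical IDs and symbols, two samples per row.
counts = {
    "TOYG0001.3": [10, 0],
    "TOYG0002.1": [5, 7],
    "TOYG0003.2": [1, 1],
}
id_to_symbol = {"TOYG0001": "GENE_A", "TOYG0002": "GENE_A", "TOYG0003": ""}
collapsed = collapse_to_symbols(counts, id_to_symbol)
```

In this toy input, two IDs share the symbol GENE_A and are merged into one row, while the ID without a symbol is kept under its own name.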

Bioinformatics Scientist – Job at DAWSON in Bethesda, MD

Bioinformatics Scientist Full Time Prof-Entry Bethesda, MD, US DAWSON is a Native Hawaiian Organization 8(a) small business that brings the Spirit of Aloha to our employees. As part of the DAWSON “Ohana”, you will be provided a best-in-class benefits program that strives to ensure our great people have peace of…

Finding counts of lncRNAs with htseq-count/featureCounts

Hi, I’m trying to find the counts of novel and known lncRNA transcripts in humans, and I already have a GTF file of these transcripts. However, I’m unsure about the following: should the input GTF file for htseq-count or featureCounts be…

How to convert HTSeq raw read counts to FPKMs?

Hi, I have C. elegans RNA-seq raw read counts that I generated with HTSeq, and I want to convert them to FPKM values. I used “countToFPKM” to do that, but I am not able to get a “Biomart.annotations.hg38.txt” file for C. elegans. Is…

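For orientation, FPKM can be computed directly from raw counts once gene lengths are known: FPKM = count × 10^9 / (gene length in bp × total counted fragments). The following is a minimal Python sketch under stated assumptions, not the method of the package named in the question: the gene lengths are assumed to come from the organism's own annotation, the total is taken as the sum of counts assigned to genes, and the gene names and function name are hypothetical.

```python
def counts_to_fpkm(counts, lengths_bp):
    """Convert raw counts to FPKM.

    counts: dict mapping gene ID -> raw count for one sample
    lengths_bp: dict mapping gene ID -> gene length in base pairs
    """
    # Skip htseq-count summary rows such as '__no_feature'.
    genes = {g: c for g, c in counts.items() if not g.startswith("__")}
    total = sum(genes.values())
    if total == 0:
        raise ValueError("no counted fragments in this sample")
    # count * 1e9 / (length_bp * total) == count / (length_kb * total_millions)
    return {g: c * 1e9 / (lengths_bp[g] * total) for g, c in genes.items()}


# Toy data: hypothetical genes and lengths.
toy_counts = {"geneA": 500, "geneB": 1500, "__no_feature": 300}
toy_lengths = {"geneA": 2000, "geneB": 1000}
fpkm = counts_to_fpkm(toy_counts, toy_lengths)
```

Whether the denominator should be the assigned counts or all mapped fragments is a design choice; the sketch uses assigned counts because that is all a count file provides.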

Converting Ensembl gene id to Gene symbol

Hi all, As mentioned earlier in this post, I tried to convert the Ensembl gene IDs to gene symbols. The code below ran without any error, but the nrow of ens_to_symbol_biomart is 55605 and the length of ens is…

A Pipeline for Analyzing eCLIP and iCLIP Data with Htseq-clip and DEWSeq

Sudeep Sahadevan et al. Methods Mol Biol. 2022. doi: 10.1007/978-1-0716-1851-6_10. Affiliations: 1 European Molecular Biology Laboratory (EMBL), Heidelberg, Germany. 2 European Molecular Biology Laboratory (EMBL), Heidelberg, Germany. schwarzl@embl.de…

Box plot for RNA-seq data

Hi friends, I plotted this box-and-whisker plot for TCGA HTSeq data in R. I want half of the boxes in red and half in blue (control vs. treatment groups). Or is there a better way to draw the box plot? How can I do that? I…

Low assigned alignments

Basecalls performed using CASAVA version v1.8.2. Trimmed reads with fastx_quality_trimmer 0.0.13 with a quality threshold of 18 and a length of 20. Aligned with Bowtie 2.1.0 and TopHat 2.0.10 using GENCODE v19 junctions. Used Samtools 0.1.19-44428cd to make a BAM, sort, and index. Raw counts were generated using…

Bioinformatics Scientist with Security Clearance job in Bethesda at Dawson

Company Description Dawson is a Staffing & Recruiting agency that was founded in 1946 and headquartered in Columbus, OH. They have the vision to help the small as well as large corporations and businesses to recruit the talent that can help them to provide the best customer service with exceptional…

H/ACA snoRNP gene family as diagnostic/prognostic biomarkers

Introduction: Primary liver cancer, including hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma, is the sixth most commonly diagnosed cancer and the fourth leading cause of cancer-related deaths worldwide [1]. High metastasis and recurrence rates, as well as limited treatment options, lead to the poor prognosis of advanced HCC [2]. Among patients diagnosed with…

DEXSeq prepare annotation script throws “object has no attribute ‘next’” for Ensembl GTFs

Hi there, I am trying to run the dexseq_prepare_annotation.py script and the code keeps failing after parsing the first line of the GTF. Specifically, the code is failing…

Is it advisable to input a count matrix that consists of reads aligned using different algorithms (HT-Seq and Salmon)?

Hello! First of all, thank you for the great package and the excellent documentation that supports it, much appreciated! Sadly, I could not find an answer to my problem, so I wanted to ask here. I have two different bulk RNA-seq datasets, one obtained from TCGA using the TCGAbiolinks package,…

Extracting exon level read coverage of a specific gene

Dear all, I am trying to quantify RNA-seq reads at the “exon level” using HTSeq, to achieve a quantitative exon comparison. I am using ENCODE mouse data, which is Illumina reads aligned to GENCODE M27 (GRCm39) using STAR…

All samples have 0 counts for all genes. check the counting script

I am having problems importing my HTSeq count data into DESeq2. It tells me the counts are zero when this is clearly not the case, as the head output shows: > head(WTCHG_862660_71955267) GeneID…

Convert HTSeq-count, raw count to TPM : bioinformatics

Hi Everyone, I am working with a publicly available RNA-Seq dataset for which only the HTSeq-count data is accessible. I have done differential gene expression already (i.e. between sample analysis) however I am also hoping to obtain TPM count for within-sample analysis such as single-sample GSEA and for this I…

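For reference, TPM normalises each count by gene length first and then rescales so that every sample sums to one million. The following is a minimal Python sketch, assuming gene lengths in base pairs are available from the annotation used for counting; the gene names, lengths, and function name are hypothetical, and the thread's own answer is not reproduced here.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw counts to TPM.

    counts: dict mapping gene ID -> raw count for one sample
    lengths_bp: dict mapping gene ID -> gene length in base pairs
    """
    # Reads per kilobase, skipping htseq-count summary rows ('__no_feature', ...).
    rates = {
        g: c / (lengths_bp[g] / 1000.0)
        for g, c in counts.items()
        if not g.startswith("__")
    }
    total_rate = sum(rates.values())
    if total_rate == 0:
        raise ValueError("all length-normalised counts are zero")
    # Rescale so the values of one sample sum to one million.
    return {g: r / total_rate * 1e6 for g, r in rates.items()}


# Toy data: hypothetical genes and lengths.
toy_counts = {"geneA": 500, "geneB": 1500, "__ambiguous": 40}
toy_lengths = {"geneA": 2000, "geneB": 1000}
tpm = counts_to_tpm(toy_counts, toy_lengths)
```

Because the values of every sample sum to the same total, TPM is often preferred for within-sample comparisons; which gene length to use (for example, the union of exons) remains an assumption that depends on the annotation.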

Using HTSeq-count for paired-end data but unsorted by SAMTOOLS

Hi there, as per the thread title: if I am using htseq-count on paired-end mapped BAM files that are unsorted, and I use the default -s yes option, is it advisable? Paired-end .bam need…

How can I convert Ensembl exon ID to gene symbol in a gene count dataframe?

I have an HTSeq count file in which the rows are exon IDs and the columns are exon counts. I need to convert them into gene IDs, given the fact that multiple exon IDs may be…

bioinformatics-ca/HTG_2021: CBW’s High Throughput Genomics Analysis 2021

GitHub repository bioinformatics-ca/HTG_2021: workshop content for HTseq 2021. About: CBW’s High Throughput Genomics Analysis 2021.

Bioconductor – chipseq

This package is for version 3.4 of Bioconductor; for the stable, up-to-date release version, see chipseq. chipseq: A package for analyzing chipseq data. Bioconductor version: 3.4. Tools for helping process short read data for chipseq experiments. Author: Deepayan Sarkar, Robert Gentleman, Michael Lawrence, Zizhen Yao. Maintainer: Bioconductor Package…

Correct way to make multiple comparisons on DESeq2?

I have a project where I have done RNA-seq (paired-end sequencing on Illumina HiSeq) of a worm at different days of development i.e. Ages 0-12. For each age, I have sequenced 3 replicate specimens. I’m new to DESeq2 and I was wondering if what I did below is correct. library(DESeq2)…

ZhaozzReal/SNV_IPA: Detect SNV-associated intronic polyadenylation events from standard RNAseq data

Description: Somatic single nucleotide variants (SNVs) in the cancer genome affect gene expression through various mechanisms depending on their genomic location. In this study, we found that somatic SNVs near splice sites are associated with abnormal intronic polyadenylation (IPA). Here we give examples to show how to detect SNV-associated IPA…

Running htseq-count to “grab” long non coding gene_id names

Hi all, I’m new to bioinformatics, so bear with me. I am trying to find long non-coding RNAs in RNA-seq data. When I checked the human GTF file, there were 2 different types of long non-coding RNA, “lnc_RNA” and “lncRNA”,…

gffread error

Hello, I am currently trying to do RNA-seq analysis using public data for Brassica juncea. To use htseq-count to make a count table, I have to convert the GFF file, which I downloaded from the Brassica database, to a GTF file. So I used gffread to convert the GFF file with the command below: gffread Bju.genome.gff -T -o…

HTSeq doesn’t support multi-threading?

Hello, everyone! I’m looking for a way to use HTSeq with multiple threads, but I couldn’t find any multi-threading options. Does anybody know how, please? (I know there are tools that support multi-threading, like STAR and HISAT2, but I just wonder whether HTSeq doesn’t support it.)…

Fastqc user manual – vodosp.ru

FASTQ format – Wikipedia 06 September 2021 – by TC Collin · 2020 · Cited by 3 — Be accompanied by a step-by-step user-friendly manual, If the user performs FastQC prior to the removal of adapters (step 3), the length Both programs can be used on Linux/MacOS X machines and quite…

ENHANCED GRAVITROPISM 2 encodes a STERILE ALPHA MOTIF–containing protein that controls root growth angle in barley and wheat

Significance: To date, the potential of utilizing root traits in plant breeding remains largely untapped. In this study, we cloned and characterized the ENHANCED GRAVITROPISM2 (EGT2) gene of barley that encodes a STERILE ALPHA MOTIF domain–containing protein. We demonstrated that EGT2 is a key gene of root growth…

which normalization before differential expression analysis (legacy=TRUE vs. legacy=FALSE)

Dear All, I am following the TCGAbiolinks tutorial for conducting differential expression analysis on TCGA data (“TCGAanalyze: Analyze data from TCGA” section). I have 2 questions about it. 1) I don’t understand the following: when dealing with legacy=TRUE data…

r – How to replace row names in DESeq2 rlogTransformation matrix with actual gene name info present on another sheet?

I’m new to R and DESeq2 and I’m trying to run differential expression as below: library(DESeq2) count_file_names <- grep("counts", list.files("HTSeq_counts"), value = TRUE) host_type <- c("Damaged", "Control") sample_information <- data.frame(sampleName = count_file_names, fileName = count_file_names, condition = host_type) DESeq_data <- DESeqDataSetFromHTSeqCount(sampleTable = sample_information, directory = "HTSeq_counts", design = ~condition) colData(DESeq_data)$condition <- factor(colData(DESeq_data)$condition, levels = c('Damaged', 'Control')) rld <-…

htseq-count output: merge into one matrix?

Dear all, I just need a little help merging all my feature counts into one matrix. I have counted features using htseq-count and now want to merge them into one file like: ID c1 c2 c3 … t1 t2 t2 … etc. The problem is…

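A merge of this kind can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not the answer given in the thread: it assumes each htseq-count output is a tab-separated file with the feature ID in the first column and the count in the last, and the function names and sample labels are hypothetical.

```python
import csv


def read_htseq_counts(path):
    """Read one htseq-count output file into a dict of feature ID -> count."""
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            feature_id, value = row[0], row[-1]
            if feature_id.startswith("__"):
                continue  # skip summary rows such as '__no_feature'
            counts[feature_id] = int(value)
    return counts


def merge_htseq_counts(sample_to_path):
    """Merge per-sample files into a header and a list of matrix rows.

    sample_to_path: dict mapping sample label -> path of its count file.
    Features missing from a sample are filled with 0 (an assumption; files
    produced with the same annotation should list the same features).
    """
    per_sample = {s: read_htseq_counts(p) for s, p in sample_to_path.items()}
    samples = list(per_sample)
    features = sorted(set().union(*per_sample.values()))
    header = ["ID"] + samples
    rows = [[f] + [per_sample[s].get(f, 0) for s in samples] for f in features]
    return header, rows


def write_matrix(path, header, rows):
    """Write the merged matrix as a tab-separated file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)
```

Calling merge_htseq_counts with a dict such as {"c1": "c1.txt", "t1": "t1.txt"} (hypothetical file names) returns the header and rows, which write_matrix can then save as one table with one column per sample.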

How can I use R to do many genes survival analysis at the same time?

I plan to use the DESeq2::rlog-transformed TCGA HTseq_counts data and the TCGA clinical data to do survival analysis, but I am unsure how to run survival analysis for many genes (>10000) at the same time…

Differential Gene Expression

Can you analyze in GEO2R? => No, because this is RNA-seq and not microarrays. You are lucky, though, that the authors seem to provide raw counts, so you can easily feed them into DESeq2. Here is a code suggestion; for details please read the DESeq2 vignette extensively, it contains answers…

does not contain a ‘gene’ attribute

htseq-count returns: does not contain a ‘gene’ attribute. Dear BIOSTAR community, I’m trying to make a count matrix with htseq-count: htseq-count -s yes -t gene -i gene 01.sorted.sam annotation_cattle.gff > 01.txt Even with --idattr=gene, it returns the error: Error processing GFF file (line 1864255 of file annotation_cattle.gff): Feature gene-D1Y31_gp1…

Mapping reads and quantifying genes

Hello, I am using the following metagenomic workshop tutorial to analyse my own metagenomic data: metagenomics-workshop.readthedocs.io/en/latest/annotation/quantification.html I performed the following steps: mapped reads with bowtie2 and generated a .bam file with samtools sort; removed duplicates with Picard; extracted gene information from Prokka…

GDCprepare() error.

I’m really struggling with this and I need urgent help. I keep running the following code, but after the GDCprepare function it either crashes my computer or freezes the console. I have no idea what to do; someone please help. library("TCGAbiolinks", quietly = T) library("limma", quietly = T) library("edgeR", quietly…

Error creating DESeq2 Data Set from HTSeq-Count

I am trying to run DESeq2 using gene counts generated by HTSeq-Count. I combine files for different conditions: directory <- "~/GeneCountFiles/" WT_Files <- c("P0CTRS3.aligned.sam.genecount", "P0CTRS4.aligned.sam.genecount", "P0CTRS5.aligned.sam.genecount") KO_Files <- c("P0CTRS1.aligned.sam.genecount", "P0CTRS2.aligned.sam.genecount", "P0CTRS6.aligned.sam.genecount") I then create the sample table: sampleTable <- data.frame(sampleName = c(WT_Files, KO_Files), fileName = c(WT_Files, KO_Files), genotype = c(rep("WT", length(WT_Files)),…

combining quantification (featureCounts) result files into a single dataset

Below is the function I tend to use to read in multiple featureCounts outputs (one per sample): DESeqDataSetFromFeatureCounts <- function (sampleTable, directory = ".", design, ignoreRank = FALSE, ...) { if (missing(design)) stop("design is missing") l <- lapply(as.character(sampleTable[, 2]), function(fn) read.table(file.path(directory, fn), skip=2)) if (!all(sapply(l, function(a) all(a$V1 == l[[1]]$V1)))) stop("Gene…

how htseq-count counts unstranded RNA-seq data

Preliminary: say I have some unstranded RNA-seq data and I’m mapping to the reference human genome using htseq-count (--stranded=no). My understanding (biologically) was that for a given protein_coding gene, reading DNA in the sense strand gives the protein_coding transcript, reading the gene in…

HTSeq-count TruSeq RNA Exome Lib Prep

Hello, I observed a high percentage of “no features” (>80%) while running htseq-count with the --stranded yes option enabled. The library prep kit I am using is Illumina TruSeq RNA Exome, which generates stranded data. If I run htseq-count with strand == “no”…

Differential expression analysis of TCGA data based on tumor staging

Hi everyone, I wanted to analyze TCGA-BRCA data to identify DEGs in different TNM stages (I to IV) between Normal and Tumor. How do I change the following code to get the DEGs based on the staging? CancerProject <- "TCGA-BRCA" DataDirectory <- paste0("../GDC/", gsub("-", "_", CancerProject)) FileNameData <- paste0(DataDirectory, "_", "HTSeq_Counts", ".rda") query <- GDCquery(project =…

hisat2 compatibility for long read

Hi, I am trying to align PacBio transcriptome reads against the genome to count the gene number. For paired-end reads I used the following workflow: # convert gff to gtf /home/software/cufflinks-2.2.1/gffread xxx.gff -T -o xxx.gtf # build index /home/software/hisat2-2.2.1/hisat2_extract_exons.py xxx.gtf > xxx.exon /home/software/hisat2-2.2.1/hisat2_extract_splice_sites.py…