Tag: HTseq

can gff2 reference used in htseq-count?

Dear all, we have recently been working with an E. coli plasmid and tried to summarize the gene counts from our RNA-Seq samples. The short reads were mapped to the E. coli plasmid using TopHat, which generated BAM files accordingly. However, we were unable to obtain a GFF3 version of our target plasmid genome, the…

Multiplexed genome regulation in vivo with hyper-efficient Cas12a

2022 Apr;24(4):590-600. doi: 10.1038/s41556-022-00870-7. Epub 2022 Apr 12. Lucie Y Guo, Jing Bian, Alexander E Davis, Pingting Liu, Hannah R Kempton, Xiaowei Zhang, Augustine Chemparathy, Baokun Gu, Xueqiu Lin, Draven A Rane, Xiaoshu Xu, Ryan M…

HTseq-Count: Long processing time

Hi everyone, I’m processing BAM files using htseq-count and it takes a very long time to produce the counts for each file. The data are paired-end reads (around 50 million sequences each). It takes 75 minutes to count this pair; is that normal? Thanks. htseq-count --max-reads-in-buffer=24000000000…

Can I convert HTSeq count into RPKM or TPM value or standard unit of RNA-Seq

I’m now comparing RNA expression data, some given as RNA-Seq values and some as HTSeq counts. How can I interpret them together when the units differ? Can I convert HTSeq counts to an equivalent RNA-Seq unit? Or if you have other suggestions,…
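
As an illustration of the conversion being asked about, here is a minimal Python sketch of the standard RPKM and TPM formulas applied to raw counts. It assumes you already have a per-gene length in base pairs (for example the total non-overlapping exon length); the gene names, counts, and lengths below are made up, and the function names are hypothetical, not part of HTSeq.

```python
def counts_to_rpkm(counts, lengths_bp):
    """RPKM = count * 1e9 / (gene length in bp * total counted reads)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads were counted")
    return {g: c * 1e9 / (lengths_bp[g] * total) for g, c in counts.items()}


def counts_to_tpm(counts, lengths_bp):
    """TPM: length-normalise first, then scale so the values sum to 1e6."""
    rates = {g: c / lengths_bp[g] for g, c in counts.items()}
    scale = sum(rates.values())
    if scale == 0:
        raise ValueError("no reads were counted")
    return {g: r * 1e6 / scale for g, r in rates.items()}


# Made-up example data (assumption: lengths are exonic lengths in bp).
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

rpkm = counts_to_rpkm(counts, lengths_bp)
tpm = counts_to_tpm(counts, lengths_bp)
```

Both units need gene lengths, which htseq-count output alone does not contain, so the length table has to come from the same annotation that was used for counting.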

HTSeq Counts no longer available

I’m working with breast cancer expression data from the TCGA-BRCA project. All my scripts were written to retrieve HTSeq counts from GDC, but they seem to have been removed from the GDC Data Portal. When using GDCquery,…

A comparison of transcriptome analysis methods with reference genome

Background: The application of RNA-seq technology has become more extensive and the number of analysis procedures available has increased over the past years. Selecting an appropriate workflow has become an important issue for researchers in the field. Methods: In our study, six popular analytical procedures/pipelines were compared using four RNA-seq…

htseq-count error

Hi, htseq-count -f bam -s yes ~/htseq-trial/SRR13826419_Aligned.sortedByName.out.bam ~refgen/gencode.v39.primary_assembly.annotation.gtf > counts.txt I am trying to run htseq-count with the command above, but the err file shows: [E::idx_find_and_load] Could not retrieve index file for '~/htseq-trial/SRR13826419_Aligned.sortedByName.out.bam' 100000 GFF lines processed. 200000 GFF lines processed. 300000 GFF lines processed. 400000 GFF lines…

Feature count is very low using htseq-count

Hello all, I ran bbmap on my paired-end RNA-seq data using the following command: bbmap.sh in1=J2_R1.fastq in2=J2_R2.fastq out=output_J2.sam ref=im4.fasta nodisk The header of the generated SAM file is: @HD VN:1.4 SO:unsorted @SQ SN:k141_1006 LN:2503 @SQ SN:k141_5512 LN:5393 @SQ SN:k141_4772 LN:4387 @SQ SN:k141_3267 LN:4531…

Htseq is giving me 0 counts using the GFF3 of miRBase

Hello! I am trying to annotate a miRNA-seq so that it gives me mature miRNAs where I already have 5p and 3p. For this, I have used the index mm10.fa and the miRBase mmu.gff3. I have aligned with HISAT2 and am trying to count with HTSeq, however I get 0…

RNA Seq HTSeq download GDC portal

Hi friends, I am trying to download HTSeq files from the GDC portal, but it gives me one file for each patient. Can you please let me know how to download one file including all patients together for all 60000 genes? Is there any…

use tcgabiolinks package to download TCGA data

For downloading TCGA data, the RTCGA package is probably the better choice in terms of ease of use, and because it ships data that has already been downloaded, it is relatively stable to use. For the same reason, however, there is no guarantee that the data is up to date. The TCGAbiolinks package is…

python – Packages Not Found Error: Not available from current channel- Bioconda

Using a Mac with an M1 chip, I’m trying to install the following Bioconda packages: cutadapt, trim-galore, samtools, bedtools, htseq, bowtie2, deeptools, macs2. I’ve been able to install picard and fastqc with no issues, but all the others produce one of two error messages: PackagesNotFoundError: The following packages are not available from current channels: or Found conflicts! Looking…

RNA-Seq HTseq galaxy DE analysis

Hi friends, I have HTSeq data from TCGA. It contains patient names in the first row and genes in the first column: 200 columns and 20000 rows. I don’t want to use DESeq2 in R; this needs to be done in Galaxy. My question is how to…

The role of ATXR6 expression in modulating genome stability and transposable element repression in Arabidopsis

Significance: The plant-specific H3K27me1 methyltransferases ATXR5 and ATXR6 play integral roles connecting epigenetic silencing with genomic stability. However, how H3K27me1 relates to these processes is poorly understood. In this study, we performed a comprehensive transcriptome analysis of tissue- and ploidy-specific expression in a hypomorphic atxr5/6 mutant and revealed that the…

downloading RNA seq data

Hi friends, I am using the following code to get the data from TCGA. I want to have only one allocate of each person so that I will have unique patient IDs. Is there any line of code that I should add to this to get…

How to label columns in HTSeq output

I’ve been working to process RNAseq data and I’ve used hisat2 to align my reads to the reference genome. When I take those output files and put them into HTSeq-count using the below code, I get a count matrix but the columns…

htseq-count -t gene not working

I found a little problem. When I set “-t gene”, the read is marked “__no_feature”. But when I set “-t exon”, the read is marked “ENSG00000276104”. The gene “ENSG00000276104” is a single-exon gene. I don’t know why this happens. Read: “TGTCTGTGGCGGTGGGATCCCGCGGCCGTGTTTTCCTGGTGGCCCGGCCGTGCCTGAGGTTTCTCCCCGAGCCGCCGCCTCTGCGGGCTCCCGGGTGCCCTTGCCCTCGCGGTCCCCGGCCCTCGCCCGTCTGTGCCCTCTTCCCCGCCCGCCGATCCTCTTCTTCCCCCCGAGCGGCTCACCGGCTTCACGTCCGTTGGTGGCCCCGCCTGGGAC”. I had aligned to hg38 by…

htseq-count python tutorial attribute counts error

Hello, I’m following the htseq-count tutorial for RNA-seq (counting the overlapping genes and exons) here htseq.readthedocs.io/en/master/tour.html. However, when I get to the point where I need to find the overlaps in the .sam file and .gtf file, I get an error. This is the code I ran originally that gave…

htseq-count Error ‘_StepVector_Iterator_obj’ object has no attribute ‘next’

I am trying to run htseq-count (v. 0.13.5) on a sorted and indexed bam file. The command I entered looks like this: htseq-count -f bam -r pos -s yes -t CDS -i gene_id -m union filename_sorted.bam filename.gtf I get the following…

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.

Hello, I’m following the htseq-count tutorial for RNA-seq (counting the overlapping genes and exons) here htseq.readthedocs.io/en/master/tour.html. However, when I get to the point where I need to find the overlaps in the .sam file and .gtf file, I get an error. This is the code I ran originally that gave…

About outliers and non -separated samples in PCA

Hi all, I have plotted a PCA for my samples (Tumor and Normal) in some cancer types. I used the HTSeq-counts data from TCGA, then normalized them with DESeq2; the total normalized counts are in the cnt dataframe. Head of cnt:…

Analysing high-throughput sequencing data in Python with HTSeq 2.0

Summary: HTSeq 2.0 provides a more extensive API including a new representation for sparse genomic data, enhancements in htseq-count to suit single cell omics, a new script for data using cell and molecular barcodes, improved documentation, testing and deployment, bug fixes, and Python 3 support. Availability and implementation: HTSeq 2.0…

TCGA transcriptome data to R (DESeq2)

This seems to be a frequently asked question, so here is a robust method to fully recapitulate the counts given by TCGA and port them to DESeq2. Why the long way? Tanya and I noticed that TCGA-Biolinks and Firehose did not generate the full count matrix. ~5-10% of genes were missing…

Error “start too small” when running htseq-count on a sorted .bam file

Hello, This is my first time aligning scRNA-seq reads to a reference genome to analyze differential gene expression. I am using htseq-count to obtain count files for my different samples and I am receiving the following error:…

Getting errors trying to run rmats

Hi, I am trying to use rmats for splice variation analysis through ssh using slurm. After loading the rmats module, these are the commands that I tried and the errors they produced: rmats --s1 $PWD/control.txt --s2 $PWD/pdac.txt --gtf mm10/mm10.refGene.gtf Python programming language version 3.6.8 loaded. GNU…

[2112.00939] Analysing high-throughput sequencing data in Python with HTSeq 2.0

Extract Total Non-Overlapping Exon Length Per Gene With Bioconductor

Hi all, It took me a while to figure this out so I thought it might be useful to a few other people. When you have used htseq-count on each of your RNA-seq’ed samples and have combined all of your…
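
The tutorial itself uses Bioconductor; as a language-neutral illustration of what "total non-overlapping exon length" means, here is a small Python sketch that merges overlapping exon intervals before summing them. It assumes 1-based inclusive coordinates as in GTF files; the function names and the example coordinates are hypothetical and not taken from the tutorial.

```python
def union_exon_length(exons):
    """Sum the bases covered by at least one exon.

    exons: iterable of (start, end) pairs, 1-based and inclusive.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end:
            # Previous merged block is finished; add its length.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlaps the current block; extend it if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def gene_lengths(exons_by_gene):
    """Map each gene to its total non-overlapping exon length."""
    return {gene: union_exon_length(ex) for gene, ex in exons_by_gene.items()}


# Made-up coordinates: two overlapping exons plus one separate exon.
example = {
    "geneA": [(1, 100), (50, 150), (201, 300)],
    "geneB": [(10, 19)],
}
lengths = gene_lengths(example)
```

Lengths computed this way are one common choice for the gene length used in RPKM or FPKM calculations.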

is it same to use .bam file or .sam file?

The .sam file was generated by the following code:
samtools sort -n Untreated-3/accepted_hits.bam > Untreated-3_sn.bam
samtools view -o Untreated-3_sn.sam Untreated-3_sn.bam
samtools sort Untreated-3/accepted_hits.bam > Untreated-3_s.bam
samtools index Untreated-3_s.bam
The .gtf file was downloaded by:
wget ftp.ensembl.org/pub/release-70/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP5.70.gtf.gz
gunzip Drosophila_melanogaster.BDGP5.70.gtf.gz
When I use htseq-count:
htseq-count -s no -a 10 Untreated-3_sn.sam Drosophila_melanogaster.BDGP5.70.gtf > Untreated-3.count
an error…

Exception type: ValueError, raised in libcalignmentfile.pyx:990

The .sam file was generated by the following code:
samtools sort -n Untreated-3/accepted_hits.bam > Untreated-3_sn.bam
samtools view -o Untreated-3_sn.sam Untreated-3_sn.bam
samtools sort Untreated-3/accepted_hits.bam > Untreated-3_s.bam
samtools index Untreated-3_s.bam
The .gtf file was downloaded by:
wget ftp.ensembl.org/pub/release-70/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP5.70.gtf.gz
gunzip Drosophila_melanogaster.BDGP5.70.gtf.gz
When I use htseq-count:
htseq-count -s…

How to define the gene length for RPKM calculation

Hi guys, I would like to calculate the RPKM of my RNA-seq experiment. To do this, according to the formula, I need to know the gene length. My starting point is the raw read (single-end) counts resulting from:…

How to use htseq-count with several samples ?

Does anyone know how to use htseq-count with several samples? We can use htseq-count like this: htseq-count sample1.sam reference.gtf > result.count.txt We get sample1’s count data with the above command. But usually we have more than two…

When I convert the Ensembl IDs to gene symbols, why lots of genes are duplicated?

Hi all, I have raw counts of samples in a dataframe. The row names are Ensembl IDs and I want to convert them to gene symbols. So I’ve run the code below. query <- GDCquery(project = "TCGA-COAD", data.category = "Transcriptome Profiling", data.type = "Gene Expression Quantification", workflow.type…

Bioinformatics Scientist – Job at DAWSON in Bethesda, MD

Bioinformatics Scientist Full Time Prof-Entry Bethesda, MD, US DAWSON is a Native Hawaiian Organization 8(a) small business that brings the Spirit of Aloha to our employees. As part of the DAWSON “Ohana”, you will be provided a best-in-class benefits program that strives to ensure our great people have peace of…

Finding counts of lncRNAs with htseq-count /featurecounts

Hi, I’m trying to find the counts of novel and known lncRNA transcripts in humans and I have a GTF file already of these transcripts. However, I’m unsure about the following: should the input GTF file for HTSeq count or featurecounts be…

How to convert HTSeq raw read counts to FPKMs?

Hi, I have C. elegans RNA-seq raw read counts, which I generated with HTSeq. I want to convert them to FPKM values. I used “countToFPKM” to do that, but I am not able to get the “Biomart.annotations.hg38.txt” file for C. elegans. Is…

Converting Ensembl gene id to Gene symbol

Hi all, As mentioned earlier in this post, I tried to convert the Ensembl gene IDs to gene symbols. I didn’t receive any error from the code below, but the nrow of ens_to_symbol_biomart is 55605 and the length of ens is…

A Pipeline for Analyzing eCLIP and iCLIP Data with Htseq-clip and DEWSeq

Sudeep Sahadevan et al. Methods Mol Biol. 2022. doi: 10.1007/978-1-0716-1851-6_10. Affiliations: European Molecular Biology Laboratory (EMBL), Heidelberg, Germany. schwarzl@embl.de…

Box plot for rna seq data

Hi friends, I plotted this box-and-whisker plot for TCGA HTSeq data in R. I want to have half of them in red and half in blue (control vs treatment groups). Or is there any better way to make the boxplot? How can I do that? I…

Low assigned alignments

Basecalls were performed using CASAVA v1.8.2. Reads were trimmed with fastx_quality_trimmer 0.0.13 with a quality threshold of 18 and a length of 20, then aligned with Bowtie 2.1.0 and Tophat 2.0.10 using Gencode v19 junctions. Samtools 0.1.19-44428cd was used to make a BAM, sort, and index it. Raw counts were generated using…

Bioinformatics Scientist with Security Clearance job in Bethesda at Dawson

Company Description Dawson is a Staffing & Recruiting agency that was founded in 1946 and headquartered in Columbus, OH. They have the vision to help the small as well as large corporations and businesses to recruit the talent that can help them to provide the best customer service with exceptional…

H/ACA snoRNP gene family as diagnostic/prognostic biomarkers

Introduction: Primary liver cancer, including hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma, is the sixth most commonly diagnosed cancer and the fourth leading cause of cancer-related deaths worldwide [1]. High metastasis and recurrence rates, as well as limited treatment options, lead to the poor prognosis of advanced HCC [2]. Among patients diagnosed with…

DEXSeq prepare annotation script throws “object has no attribute ‘next'” for Ensemble GTFs

Hi there, I am trying to run the dexseq_prepare_annotation.py script and the code keeps failing after parsing the first line of the gtf. Specifically, the code is failing…

Is it advisable to input a count matrix that consists of reads aligned using different algorithms (HT-Seq and Salmon)?

Hello! First of all, thank you for the great package and the excellent documentation that supports it, much appreciated! Sadly, I could not find an answer to my problem, so I wanted to ask here. I have two different bulk RNA-seq datasets, one obtained from TCGA using the TCGAbiolinks package,…

Extracting exon level read coverage of a specific gene

Dear all, I am trying to quantify RNA-seq reads at the “exon level” using HTSeq, to achieve a quantitative exon comparison. I am using ENCODE mouse data, which is Illumina reads aligned to GENCODE M27 (GRCm39) using STAR…

All samples have 0 counts for all genes. check the counting script

I am having problems importing my HTSeq count data: it tells me the counts are zero when this is clearly not the case, as the output of head shows: >head(WTCHG_862660_71955267) GeneID…

Convert HTSeq-count, raw count to TPM : bioinformatics

Hi Everyone, I am working with a publicly available RNA-Seq dataset for which only the HTSeq-count data is accessible. I have done differential gene expression already (i.e. between sample analysis) however I am also hoping to obtain TPM count for within-sample analysis such as single-sample GSEA and for this I…

Using HTSeq-count for paired-end data but unsorted by SAMTOOLS

Hi there, as per the thread title: if I am using HTSeq-count on paired-end mapped BAM files that are unsorted, and I use the default option -s yes, is that advisable? Paired-end .bam need…

How can I convert ensembl exon ID to gene symbol in a gene count dataframe?

I have an HTSeq count file in which the rows are exon IDs and the columns are exon counts. I need to convert them into gene IDs, given the fact that multiple exon ids may be…
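
One way to picture the collapsing step is the following Python sketch. It assumes you already have a lookup table from exon ID to gene ID or symbol (for example built from the GTF used for counting); the IDs shown are placeholders and the function name is hypothetical. Note that summing exon-level counts can count a read more than once if it was assigned to several exons, so this is only a rough aggregation, not a replacement for counting at the gene level.

```python
from collections import defaultdict


def collapse_to_genes(exon_counts, exon_to_gene):
    """Sum exon-level counts per gene; report exons without a mapping."""
    gene_counts = defaultdict(int)
    unmapped = []
    for exon_id, count in exon_counts.items():
        gene = exon_to_gene.get(exon_id)
        if gene is None:
            unmapped.append(exon_id)  # keep track instead of dropping silently
        else:
            gene_counts[gene] += count
    return dict(gene_counts), unmapped


# Placeholder IDs for illustration only.
exon_counts = {"exon_1": 10, "exon_2": 5, "exon_3": 7, "exon_4": 2}
exon_to_gene = {"exon_1": "GENE_X", "exon_2": "GENE_X", "exon_3": "GENE_Y"}

gene_counts, unmapped = collapse_to_genes(exon_counts, exon_to_gene)
```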

How can I convert ensemble exon ID to gene symbol in a gene count dataframe?

I have an HTSeq count file in which the rows are exon IDs and the columns are exon counts. I need to convert them into gene IDs, given the fact that multiple exon ids may be…

bioinformatics-ca/HTG_2021: CBW’s High Throughput Genomics Analysis 2021

GitHub – bioinformatics-ca/HTG_2021: CBW’s High Throughput Genomics Analysis 2021. Workshop content for HTseq 2021…

Correct way to make multiple comparisons on DESeq2?

I have a project where I have done RNA-seq (paired-end sequencing on Illumina HiSeq) of a worm at different days of development i.e. Ages 0-12. For each age, I have sequenced 3 replicate specimens. I’m new to DESeq2 and I was wondering if what I did below is correct. library(DESeq2)…

Running htseq-count to “grab” long non coding gene_id names

Hi all, I’m new to bioinformatics, so bear with me. I am trying to find long non-coding RNA from RNA-seq data. When I checked the human GTF file, there are 2 different types of long non-coding RNA, “lnc_RNA” and “lncRNA”,…

gffread error

Hello, I am currently trying to do RNA-seq analysis using public data for Brassica juncea. To use htseq-count to make a count table, I have to convert the GFF file, which I downloaded from a Brassica database, to a GTF file. So I used gffread to convert the GFF file with the command below: gffread Bju.genome.gff -T -o…

HTseq doesn’t support Multi-Threading ?

Hello, everyone! I’m looking for a way to use HTSeq with multiple threads. I couldn’t find any options for multi-threading. Does anybody know how to do this, please? (I know there are tools that support multi-threading, like STAR and HISAT2, but I just wonder whether HTSeq doesn’t support it.)…

Fastqc user manual – vodosp.ru

FASTQ format – Wikipedia 06 September 2021 – by TC Collin · 2020 · Cited by 3 — Be accompanied by a step-by-step user-friendly manual, If the user performs FastQC prior to the removal of adapters (step 3), the length Both programs can be used on Linux/MacOS X machines and quite…

ENHANCED GRAVITROPISM 2 encodes a STERILE ALPHA MOTIF–containing protein that controls root growth angle in barley and wheat

Significance: To date, the potential of utilizing root traits in plant breeding remains largely untapped. In this study, we cloned and characterized the ENHANCED GRAVITROPISM2 (EGT2) gene of barley that encodes a STERILE ALPHA MOTIF domain–containing protein. We demonstrated that EGT2 is a key gene of root growth…

which normalization before differential expression analysis (legacy=TRUE vs. legacy=FALSE)

Dear All, I am following the TCGAbiolinks tutorial for conducting differential expression analysis on TCGA data (“TCGAanalyze: Analyze data from TCGA” section). I have 2 questions about it. 1) I don’t understand the following: when dealing with legacy=TRUE data…

r – How to replace row names in DESeq2 rlogTransformation matrix with actual gene name info present on another sheet?

I’m new to R and DESeq2 and I’m trying to run differential expression as below:
library(DESeq2)
count_file_names <- grep("counts", list.files("HTSeq_counts"), value=T)
host_type <- c("Damaged","Control")
sample_information <- data.frame(sampleName = count_file_names, fileName = count_file_names, condition = host_type)
DESeq_data <- DESeqDataSetFromHTSeqCount(sampleTable = sample_information, directory = "HTSeq_counts", design = ~condition)
colData(DESeq_data)$condition <- factor(colData(DESeq_data)$condition, levels = c('Damaged','Control'))
rld <-…

How to replace row names in DESeq2 rlogTransformation matrix with actual gene name info present on another sheet?

I’m new to R and DESeq2 and I’m trying to run differential expression as below:
library(DESeq2)
count_file_names <- grep("counts", list.files("HTSeq_counts"), value=T)
host_type <- c("Damaged","Control")
sample_information <- data.frame(sampleName = count_file_names, fileName = count_file_names, condition = host_type)
DESeq_data <- DESeqDataSetFromHTSeqCount(sampleTable = sample_information, directory = "HTSeq_counts", design = ~condition)
colData(DESeq_data)$condition <- factor(colData(DESeq_data)$condition, levels = c('Damaged','Control'))
rld <-…

htseq-counts output merge into one matrix ??

Dear all, I just need a little help to merge all my feature counts into one matrix. I have counted features using htseq-count and now want to merge them into one file like: ID c1 c2 c3 … t1 t2 t2 … etc. The problem is…
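
A possible way to build such a matrix is sketched below in plain Python. It assumes each per-sample file is tab-separated with the feature ID in the first column and the count in the last column, and that summary rows start with "__" (such as __no_feature); the sample names, file paths, and function names are hypothetical.

```python
import csv


def read_htseq_counts(path):
    """Read one htseq-count output file into {feature_id: count}."""
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and summary rows such as __no_feature.
            if len(row) < 2 or row[0].startswith("__"):
                continue
            counts[row[0]] = int(row[-1])
    return counts


def merge_counts(sample_paths):
    """sample_paths: {sample_name: path}. Returns (header, rows)."""
    per_sample = {name: read_htseq_counts(p) for name, p in sample_paths.items()}
    samples = list(per_sample)
    features = sorted(set().union(*per_sample.values()))
    header = ["ID"] + samples
    # Features missing from a sample are filled with 0.
    rows = [[f] + [per_sample[s].get(f, 0) for s in samples] for f in features]
    return header, rows


def write_matrix(header, rows, out_path):
    """Write the merged matrix as a tab-separated file."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)
```

A call such as merge_counts({"c1": "c1.count.txt", "t1": "t1.count.txt"}) followed by write_matrix(...) would give one row per feature and one column per sample, in the order the samples were listed.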

How can I use R to do many genes survival analysis at the same time?

I plan to use the DESeq2::rlog transformed TCGA HTseq_counts data and the TCGA clinical data to do survival analysis. But I am confused about how to do survival analysis for many genes (>10000) at the same time….

Differential Gene Expression

Can you analyze in GEO2R? => No, because this is RNA-seq and not microarrays. You are lucky, though, that the authors seem to provide raw counts, so you can easily feed them into DESeq2. Here is a code suggestion; for details please read the DESeq2 vignette extensively, it contains answers…

does not contain a ‘gene’ attribute

Dear BIOSTAR community, I’m trying to make a count matrix with htseq-count: htseq-count -s yes -t gene -i gene 01.sorted.sam annotation_cattle.gff > 01.txt Even with --idattr=gene, it returns an error: Error processing GFF file (line 1864255 of file annotation_cattle.gff): Feature gene-D1Y31_gp1…

Mapping reads and quantifying genes

Hello, I am using the following metagenomic workshop tutorial to analyse my own metagenomic data: metagenomics-workshop.readthedocs.io/en/latest/annotation/quantification.html I performed the following steps: mapped reads with bowtie2 and generated a .bam file with samtools sort. Removed duplicates with picard. Extracted gene information from prokka…

Gdcprepare() error.

I’m really struggling with this and I need urgent help. I keep running the following code, but after the GDCprepare function it either crashes my computer or freezes the console. I have no idea what to do; someone please help. library("TCGAbiolinks", quietly = T) library("limma", quietly = T) library("edgeR", quietly…

Error creating DESeq2 Data Set from HTSeq-Count

I am trying to run DESeq2 using gene counts generated by HTSeq-Count. I combine files for different conditions:
directory <- "~/GeneCountFiles/"
WT_Files <- c("P0CTRS3.aligned.sam.genecount", "P0CTRS4.aligned.sam.genecount", "P0CTRS5.aligned.sam.genecount")
KO_Files <- c("P0CTRS1.aligned.sam.genecount", "P0CTRS2.aligned.sam.genecount", "P0CTRS6.aligned.sam.genecount")
I then create the sample table:
sampleTable <- data.frame(sampleName=c(WT_Files, KO_Files), fileName=c(WT_Files, KO_Files), genotype=c(rep("WT", length(WT_Files)),…

how htseq-count counts unstranded RNA-seq data

Preliminary: say I have some unstranded RNA-seq data and I’m mapping to the reference human genome using htseq-count (--stranded=no). My understanding (biologically) was that for a given protein_coding gene, reading DNA in the sense strand gives the protein_coding transcript, reading the gene in…

HTSeq-count TruSeq RNA Exome Lib Prep

Hello, I observed a high percentage of “no features” (>80%) while running HTSeq with the --stranded yes option enabled. The library prep kit I am using is Illumina TruSeq RNA Exome, which generates stranded data. If I run HTseq-count w/ strand == “no”…

Differential expression analysis of TCGA data based on tumor staging

Hi everyone, I wanted to analyze TCGA-BRCA data to identify DEGs in different TNM stages (I to IV) between Normal and Tumor. How can I change the following code to get the DEGs based on the staging?
CancerProject <- "TCGA-BRCA"
DataDirectory <- paste0("../GDC/", gsub("-", "_", CancerProject))
FileNameData <- paste0(DataDirectory, "_", "HTSeq_Counts", ".rda")
query <- GDCquery(project =…

hisat2 compatibility for long read

Hi, I am trying to align PacBio transcriptome reads against the genome to count the gene number. For paired-end reads I used the following workflow:
# convert gff to gtf
/home/software/cufflinks-2.2.1/gffread xxx.gff -T -o xxx.gtf
# build index
/home/software/hisat2-2.2.1/hisat2_extract_exons.py xxx.gtf > xxx.exon
/home/software/hisat2-2.2.1/hisat2_extract_splice_sites.py…
