Tag: gtf

Standard for aligning smallRNA to a reference human rRNA?

Hi, I need to label some smallRNA sequences that I know are rRNA fragments. I know that for mRNA these are discarded by aligning to the human genome and filtering out multimapped reads, but I need to try to pin…

GENCODE – Human Release 32 Statistics

Statistics about the GENCODE Release 32. The statistics derive from the gtf file that contains only the annotation of the main chromosomes. For details about the calculation of these statistics please see the README_stats.txt file.
General stats
Total No of Genes: 60609
Protein-coding genes: 19965
Long non-coding RNA genes: 17910…

Using featureCounts and downloading Rsubread

I am trying to perform a count per gene analysis using featureCounts in R. I have downloaded the gtf file and edited it within R to only contain the gene ID, chr, start, end, and strand,…

Is it correct to use Tophat2 directly followed by Cuffquant to only align to the reference transcriptomes without wishing to assemble new transcripts?

Hi, friends. I only want to perform differential expression analysis on the annotated transcripts of my existing reference genome. I use tophat2 for alignment with --no-novel-juncs…

GTFAnnotation class

Represent Gene Transfer Format (GTF) annotations. Description: The GTFAnnotation class contains annotations for one or more reference sequences, conforming to the GTF file format. You construct a GTFAnnotation object from a GTF-formatted file. Each element in the object represents an annotation. Use the object properties and methods to filter annotations…

Annotated file with gene ID (instead of gene symbol)

Hello, I am using “featureCounts” in the Rsubread package to analyze bulk RNA-seq of Drosophila. Since there are no inbuilt annotations for Drosophila, I am using a GTF file in the homepage of…

Finding DEGs from HISAT2/STRINGTIE output

Hello, I have to search for DEGs from four crop samples. I am following reference-based mapping of reads to the genome using HISAT2. I have completed everything up to the generation of merged .gtf files for the samples using StringTie. Since I am new to…

Transcription Start Site

What are the best databases to check out the transcription start sites of specific genes in human genome?
wget -q -O - "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/wgEncodeGencodeBasicV19.txt.gz" | gunzip -c | awk '(int($7)< int($8)) {if($4=="+") {printf("%s\t%d\t%d\t%s\t%s\n",$3,$7,int($7)+1,$2,$4);}else {printf("%s\t%d\t%d\t%s\t%s\n",$3,int($8)-3,$8,$2,$4);}}'
chr1 69090 69091 ENST00000335137.3 +
chr1 139306 139309 ENST00000423372.3…

HTseq-Count: Long processing time

Hi everyone, I’m processing BAM files using htseq-count and it takes a very long time to produce the counts for each file. The data are paired-end reads (around 50 million sequences each). It takes 75 minutes to count this pair; is that normal? Thanks.
htseq-count --max-reads-in-buffer=24000000000…

how to build index for cdna?

Hello, I can build index for Mus_musculus.GRCm38.dna_sm.toplevel.fa, but when build for Mus_musculus.GRCm38.cdna.all.fa, there is a bug: “rsem-extract-reference-transcripts Mus_musculus.GRCm38.cdna.all.fa 0 Mus_musculus.GRCm38.cdna.all.fa.gtf None 0 Mus_musculus.GRCm38.cdna.all” failed! Plase check if you provide correct parameters/options for the pipeline! Traceback (most recent call last): File “../indrops.py”, line 1770, in project.build_transcriptome(args.genome_fasta_gz, args.ensembl_gtf_gz, mode=args.mode) File “../indrops.py”, line…

Separate exogenous from endogenous transcripts using Salmon RNAseq DTU

Dear friends, We are trying to use Salmon for DTU analysis. We want to separate exogenous from endogenous transcripts by following this post www.biostars.org/p/443701/ and this paper f1000research.com/articles/7-952 We are focusing on a gene called ASCL1 (endo-ASCL1). We transduced cells with lentiviral vector containing ASCL1 ORF only (Lenti-ASCL1). There should…

deeptools plotHeatMap – Convert bed files to gene list?

I might suggest limiting your search to genes:
gtf2bed < Mus_musculus.GRCm38.102.gtf | grep -w "gene" > sorted-mm10.genes.bed
But that’s up to you. Otherwise, I think you may also get transcripts/exons, which may be more than you want. Again, up to you. If hnf4a-ko-downreg-clusters.bed is the file containing peaks, as described…

Difference between knownGene and wgEncodeGencodeCompV39

Hi: I am a bit confused about the relationship/difference between knownGene and wgEncodeGencodeCompV39 on the UCSC Table Browser. Does anyone know the precise difference between them? They both can be downloaded from the goldenPath page. knownGene: The schema is here, which does NOT match the file (knownGene.txt.gz) I downloaded. According to…

Using salmon in Galaxy

Hi everyone. I am executing Salmon in Galaxy in order to carry out gene quantification from mouse RNA-Seq data (6 samples). To do so, I am providing a reference genome (cDNA, in fasta format), the processed reads (in fastqsanger.gz format) of one of these samples (after executing Trim-Galore) and a…

htseq-count error

Hi,
htseq-count -f bam -s yes ~/htseq-trial/SRR13826419_Aligned.sortedByName.out.bam ~refgen/gencode.v39.primary_assembly.annotation.gtf > counts.txt
I am trying to run htseq-count with the command above, but in the err file:
[E::idx_find_and_load] Could not retrieve index file for '~/htseq-trial/SRR13826419_Aligned.sortedByName.out.bam'
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines…

human genome files

Hi all, I have two questions. What is the main difference between the two genome files (Homo_sapiens.GRCh38.dna.primary_assembly.fa and Homo_sapiens.GRCh38.dna.fa) in the Ensembl database? Which one should I use for whole-exome sequence alignment? I used Homo_sapiens.GRCh38.dna.fa for the alignment, and later on,…

Low transcript quantification with Salmon using GRCm39 annotations

Hi everyone, first time working with mouse samples and unfortunately, there are fewer resources available for the latest mouse Ensembl genome than I was expecting. What I’ve done: I performed rRNA depletion on total RNA extracted from mouse tissue and created Illumina libraries using a cDNA synthesis kit with random…

Bioconductor Package Installation

When I try to install the gtf for hg38
BiocManager::install("TxDb.Hsapiens.UCSC.hg38.knownGene")
I get the following error:
'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for details
replacement repositories: CRAN: cran.rstudio.com/
Bioconductor version 3.14 (BiocManager 1.30.16), R 4.1.2 (2021-11-01)
Installing package(s) 'TxDb.Hsapiens.UCSC.hg38.knownGene'
Error in readRDS(dest) : error reading from connection
Per stackoverflow.com/questions/67455984/getoptionrepos-replaces-bioconductor-standard-repositories-see-reposito I…

Htseq is giving me 0 counts using the GFF3 of miRBase

Hello! I am trying to annotate a miRNA-seq so that it gives me mature miRNAs where I already have 5p and 3p. For this, I have used the index mm10.fa and the miRBase mmu.gff3. I have aligned with HISAT2 and am trying to count with HTSeq, however I get 0…

OrgDb (org.Hs.eg.db) missing many EntrezIDs and their Symbols (Compared to Ensembl GRCh38 v103)

The organism package org.Hs.eg.db is updated twice a year, in March and September, a week or two before each Bioconductor release. The current version of org.Hs.eg.db is dated 15 September 2021. org.Hs.eg.db is 100% comprehensive in that it contains all Entrez IDs that exist at the time it is created….

How to extract fasta sequences from assembled transcripts generated by Stringtie

Hi all, I used STAR and stringtie for mapping reads to reference genome and assembly. As you know, the generated assembled transcripts by stringtie are in gtf format. Now, I want to have fasta sequence of assembled transcript….

Find Transposon Element insertions using long reads (nanopore), by alignment directly. (minimap2)

find_te_ins is designed to find Transposon Element (TE) insertions using long reads (nanopore), by alignment directly. (minimap2)
Install
$ git clone github.com/bakerwm/find_te_ins.git
$ cd find_te_ins
Change the following variables upon your condition: genome_fa and te_fa in line-10 and line-11;
$ bash run_pipe.sh run_pipe.sh
Prerequisite
minimap2 – 2.17-r974-dirty, align long…

Error in Rsubread featureCounts

Hi there, Excellent package! I am using it to do RNA-seq. But I encountered a small problem when using featureCounts(). The code is as follows:
featureCounts( "A1.raw_1.fastq.gz.subjunc.BAM", annot.inbuilt = NULL, annot.ext = "GCF_015227675.2_mRatBN7.2_genomic.gtf", isGTFAnnotationFile=TRUE, isPairedEnd=TRUE, nthreads = 8 )
And it returns this:
========== _____ _ _ ____ _____ ______…

Extract longest transcript or longest CDS transcript from GTF annotation file or gencode transcripts fasta file.

There are four types of methods to extract the longest transcript, or the longest CDS region with the longest transcript, from a transcripts fasta file or GTF file.
1. Extract longest transcript from gencode transcripts fasta file.
2. Extract longest transcript from gtf format annotation file based on gencode/ensembl/ucsc database.
3. Extract longest CDS region with longest…

can not upload GTF file to UCSC genomebrowser

We are unable to reproduce the error you are seeing and we also recently experienced temporary issues with our site. Please let us know if you are still having this problem. Post by Gang Wei: Dear manager of UCSC Genome Browser, Glad to write to you. I’m now using UCSC genome browser to check…

Head of bioinformatics – Lausanne

Introduction: UNIL is a leading international teaching and research institution, with over 5,000 employees and 17,000 students split between its Dorigny campus, CHUV and Epalinges. As an employer, UNIL encourages excellence, individual recognition and responsibility. The Lausanne Genomic Technologies Facility (GTF) is a service platform working for…

Why are there more lincRNA genes than transcripts in Mus_musculus.GRCm38.100.gtf?

Hi All, There are 52446 annotated genes (ENSMUSG IDs) and 142,699 transcripts (ENSMUST IDs) in Mus_musculus.GRCm38.100.gtf. It makes sense that there are WAY more transcripts than genes. My question, however, is – why are there more genes than transcripts…

MARS seq alignment

Hello everyone, new here and also new to the field. I was asked to create a pipeline for RNA-seq, and after two months of self-learning how to interact with each code, I’m stuck with the program STAR. What I’m trying to do for now…

genbank to GTF in galaxy

Hi all, I am working on galaxy and have a genome file in genbank format. To use featurecounts for my RNAseq, I need to convert the genbank format to a GTF format because that’s the format the featurecounts tool in galaxy expects. Now, I…

htseq-count python tutorial attribute counts error

Hello, I’m following the htseq-count tutorial for RNA-seq (counting the overlapping genes and exons) here htseq.readthedocs.io/en/master/tour.html. However, when I get to the point where I need to find the overlaps in the .sam file and .gtf file, I get an error. This is the code I ran originally that gave…

“Paired-end reads were detected in single-end read library”

Hello, I am using “featureCounts” in the Rsubread package to analyze bulk RNA-seq of Drosophila. Since there are no inbuilt annotations for Drosophila, I am trying to use a GTF file in the homepage…

NoClassDefFoundError: htsjdk/samtools/util/IntervalTree

When I run circm6A (github.com/canceromics/circm6a) example code:
cd ../..
java -Xmx16g -jar circm6a.jar -ip test_data/HeLa_eluate_rep_1.chr22.bam -input test_data/HeLa_input_rep_1.chr22.bam -r test_data/gencode_chr22.gtf -g test_data/hg38_chr22.fa -o test_data/example_Hela
The following error occurred:
Start at 2021-12-12 16:33:26
Exception in thread "main" java.lang.NoClassDefFoundError: htsjdk/samtools/util/IntervalTree
at main.Method.loadGenes(Method.java:200)
at main.Method.run(Method.java:66)
at main.Main.main(Main.java:9)
Caused by: java.lang.ClassNotFoundException: htsjdk.samtools.util.IntervalTree…

Indexing with STAR

Hello, I am working with RNA seq data and creating an index of reference genome Gossypium hirsutum by using STAR. STAR asks GTF annotation format while my file is GFF3. According to literature, in order to run GFF file I need to remove --sjdbOverhang 50 and…

For Differential Gene Expression, which indexing format is better: GFF or GTF?

Hello, I am working on DGE and wish to create a reference index for mapping. Two file formats are used for it: GFF and GTF. My question is: What is the major difference between GTF and GFF?…

htseq-count Error ‘_StepVector_Iterator_obj’ object has no attribute ‘next’

I am trying to run htseq-count (v. 0.13.5) on a sorted and indexed bam file. The command I entered looks like this:
htseq-count -f bam -r pos -s yes -t CDS -i gene_id -m union filename_sorted.bam filename.gtf
I get the following…

get rRNA FASTA file for a particular bacteria

Hey all, I was trying to find a way to get all rRNA (5S, 16S and 23S) FASTA sequences for a particular bacteria (B. thetaiotaomicron VPI-5482, which is the type strain). I wanted this file so that I could use something…

Conversion Of Gff3 To Gtf

How do I convert GFF file to a GTF file? Is there any tool available? The easiest way is to use the gffread program that comes with the Cufflinks software suite (Tuxedo):
gffread my.gff3 -T -o my.gtf
See gffread…

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.

Hello, I’m following the htseq-count tutorial for RNA-seq (counting the overlapping genes and exons) here htseq.readthedocs.io/en/master/tour.html. However, when I get to the point where I need to find the overlaps in the .sam file and .gtf file, I get an error. This is the code I ran originally that gave…

Tximport in usegalaxy

Devon Ryan: Please help in resolving this issue. How do I use tximport in usegalaxy to convert transcript IDs (DESeq2-Salmon) to gene IDs? I want to get gene IDs from the results of DESeq2 (Salmon). Which GTF should be used for tximport? I am getting the following error in…

Three Differential Expression Analysis Methods for RNA Sequencing: limma, EdgeR, DESeq2

A detailed protocol of differential expression analysis methods for RNA sequencing was provided: limma, EdgeR, DESeq2. Three differential expression analysis methods for RNA sequencing: limma, EdgeR, and DESeq2. Open the RStudio program and then load R file, DEGs. The file can be acquired from supplementary files. One. Downloading and pre-processing of data. 1.1….

Gene Expression Prediction from DNA sequences

Hi everyone! I am a university student working on my Master’s thesis. I worked on a paper called Xpresso which has the purpose to predict the gene expression levels starting from DNA sequences using deep learning techniques. Now, my lecturers have asked me…

featureCounts difference assigned reads summary file and summed up reads in feature count matrix

Dear all, this might be a naive question but my google-fu fails me. I count reads from a BAM, aligned by STAR against a custom hg19 genome, after running picard MarkDuplicates, then counting reads assigned…

TCGA transcriptome data to R (DESeq2)

This seems to be frequently asked question, so here is a robust method to fully recapitulate the counts given by TCGA and port it to DESeq2. Why the long way? Tanya and I noticed via TCGA-Biolinks and Firehose did not generate the full count matrix. ~5-10% of genes were missing…

Adding repeats in a genome fasta at a particular location without messing up the annotations?

I want to add a bunch of expanded repeats in a genome fasta file, e.g. 100 ATTs at a particular location, e.g. Chr1-1:2. How do I do that and at the same time update…

Error “start too small” when running htseq-count on a sorted .bam file

Hello, This is my first time aligning scRNA-seq reads to a reference genome to analyze differential gene expression. I am using htseq-count to obtain count files for my different samples and I am receiving the following error:…

Getting errors trying to run rmats

Hi, I am trying to use rmats for splice variation analysis through ssh using slurm. After loading the rmats module, these are the commands that I tried and the errors they produced:
rmats --s1 $PWD/control.txt --s2 $PWD/pdac.txt --gtf mm10/mm10.refGene.gtf
Python programming language version 3.6.8 loaded. GNU…

GTF file danio rerio

I’ve been trying to find a GTF file for the zebrafish genome and have been unsuccessful. Would anyone be able to point me in the right direction?

featurecounts in command line

I’m trying to convert my bam files to count data with the help of feature counts in command line, I used the code:
featurecounts -T 8 -a /Users/ria/Desktop/bowtie_2/GCF_000001405.39_GRCh38.p13_genomic.gtf -g 'transcrip_id' -o readcounts/readcount1.txt bam files/-.bam
(readcounts is the directory for dumping the output) the error…

Error when using featurecounts

I am doing some RNA analysis and am having issues trying to generate count data. I mapped my reads to a reference genome fasta file (genbank fasta file from ncbi) using bbmap and .sam files as the output. I am now trying to use featurecounts to generate count data but…

STAR index generation for bacterial genome

Hi, I’m trying to analyze RNA-Seq data for a bacteria – Mycobacterium tuberculosis. I used the FASTA and GTF files from NCBI to create the index, and set the --genomeSAindexNbases at 8 based on this previous post. The bash script I used is:…

Generate a count matrix for a consensus peak set and related peaks/reads

Hi everyone, I have a consensus peak file (.bed) that has Chr, start, end. It doesn’t have any header. Something like this:
chr1 721000 726999
chr1 817800 821799
chr1 1027400 1030799
chr1 1033600 1037599
chr1 1047400 1050399
I want to generate a count matrix for my downstream analysis and differential…

How to infer the TSS from a gtf file

I have a csv file that I’ve downloaded from this paper here (abridged version below), and it’s referring to CpG locations in the genome from beadChip array data, using assembly hg18 as follows:
CpGmarker Build Chr MapInfo SourceVersion TSS_Coordinate Gene_Strand Symbol Synonym Accession GID
cg00075967 36 15 72282407 36.1 72282245…
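
For the question in the title, one common convention is to take the TSS of each transcript as its start coordinate on the plus strand and its end coordinate on the minus strand. The sketch below assumes a GTF with `transcript` feature rows and a `transcript_id` attribute (as in Ensembl/GENCODE files); the function names `parse_attributes` and `transcript_tss` are made up for illustration and are not part of any library.

```python
import re


def parse_attributes(field):
    """Turn a GTF attribute column into a dict (first value wins for repeated keys)."""
    attrs = {}
    for key, value in re.findall(r'(\S+)\s+"([^"]*)"', field):
        attrs.setdefault(key, value)
    return attrs


def transcript_tss(gtf_lines):
    """Yield (chrom, tss, strand, transcript_id) with 1-based GTF coordinates."""
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # assumption: the file has explicit transcript rows
        chrom, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        # On the minus strand, transcription starts at the highest coordinate.
        tss = start if strand == "+" else end
        yield chrom, tss, strand, parse_attributes(cols[8]).get("transcript_id")
```

The coordinates stay 1-based, as in the GTF; subtract 1 from the TSS to get a 0-based BED start.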

How To Convert Gencode Gtf Into Bed Format?

My solution, based on Ian’s answer:
zcat ../../../data/annotations/gencode.v24.annotation.gtf.gz | awk 'OFS="\t" {if ($3=="gene") {print $1,$4-1,$5,$10,$16,$7}}' | tr -d '";' | head
chr1 11868 14408 ENSG00000223972.5 . +
chr1 14403 29569 ENSG00000227232.5 . -
chr1 17368 17435 ENSG00000278267.1 . -
chr1 29553 31108 ENSG00000243485.3 . +
chr1 30365 30502 ENSG00000274890.1 ….
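
The same conversion can be sketched in Python, which avoids depending on whitespace field positions inside the attribute column. This is a minimal sketch, not the poster's method: it assumes tab-separated GTF lines with a `gene_id` attribute, writes "." as the BED score, and the function name `gtf_genes_to_bed` is hypothetical.

```python
import re


def gtf_genes_to_bed(gtf_lines):
    """Yield BED6 tuples (chrom, start, end, name, score, strand) for gene rows."""
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        match = re.search(r'gene_id "([^"]+)"', cols[8])
        gene_id = match.group(1) if match else "."
        # GTF is 1-based and end-inclusive; BED is 0-based and end-exclusive,
        # so only the start coordinate shifts by one.
        yield cols[0], int(cols[3]) - 1, int(cols[4]), gene_id, ".", cols[6]
```

To read a gzipped annotation, the lines can come from `gzip.open(path, "rt")` in the standard library, and each tuple can be written out with `"\t".join(map(str, record))`.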

STAR Genome indexing (Homo_sapiens_assembly38.fasta vs. GRCh38.primary_assembly.genome.fa)

I have a query regarding STAR alignment. I used the following commands to generate the genome index (Homo_sapiens_assembly38.fasta):
STAR --runMode genomeGenerate --genomeDir /home/bsh/BC_MCFcellLine_WTS/result/STAR_indexing/ --genomeFastaFiles /data1/database/ftp.broadinstitute.org/bundle/hg38_210610_download/Homo_sapiens_assembly38.fasta --sjdbGTFfile /home/bsh/BC_MCFcellLine_WTS/gencode.v27.annotation.gtf
And I used the following commands for mapping, and the bam file was successfully generated:
STAR --runThreadN 4 --outFilterType BySJout --outFilterMismatchNmax 999 --outFilterMultimapNmax 10…

Has anyone here worked with CNCI before? EXHAUSTED

Has anyone here worked with CNCI before? I’m just about exhausted trying to figure out what I’m doing wrong. So, I tried the following:
python CNCI.py candidate_lncs.gtf -g -o test -m ve -p 16 -d ./dbase GRCm38.primary_assembly.genome.fa
and I received the…

Extract Total Non-Overlapping Exon Length Per Gene With Bioconductor

Hi all, It took me a while to figure this out so I thought it might be useful to a few other people. When you have used htseq-count on each of your RNA-seq’ed samples and have combined all of your…
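
The tutorial itself uses Bioconductor; purely as an illustration of the underlying idea, the total non-overlapping exon length of a gene is the length of the union of its exon intervals. The Python sketch below assumes tab-separated GTF lines with `exon` rows and a `gene_id` attribute, and that each gene ID sits on a single chromosome; `union_exon_lengths` is a made-up name.

```python
import re
from collections import defaultdict


def union_exon_lengths(gtf_lines):
    """Return {gene_id: total length of the union of its exon intervals}."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        match = re.search(r'gene_id "([^"]+)"', cols[8])
        if match:
            exons[match.group(1)].append((int(cols[3]), int(cols[4])))

    lengths = {}
    for gene_id, intervals in exons.items():
        intervals.sort()
        total = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:
                # Overlapping or directly adjacent exon: extend the current block.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start + 1  # GTF ends are inclusive
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        lengths[gene_id] = total
    return lengths
```

Exons shared between transcripts of the same gene are counted once, which is the point of taking the union rather than summing exon lengths.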

STAR alignment in a full directory

Hi! I’m working with STAR and I would like to align multiple files, but separately. I have file pairs like these two: Dros_01_S48_L001_R1_001.fastq.gz Dros_01_S48_L001_R2_001.fastq.gz etc. Only the S[number] and the R1 and R2 change in the names. I have a code, what I…

Explanation of ENSEMBL GTF features

Hi Guys, I am trying to find more info about the features in the ENSEMBL GTF file, but don’t know where to find it. I am using the hg38 GTF file from ENSEMBL, and I am interested in column 3 (feature). More specific I…

How can I produce gene level quantification using Salmon pseudo-aligner?

Hi! I am using Salmon in order to perform pseudo-alignment on paired-end RNA-seq data. I want gene-level quantification but I obtain files with transcript quantification. Command line used:
salmon quant -i Transcriptome_GH38_release_92/Homo_sapiens.GRCh38.92.cdna.ncrna.fa_quasi_index/ -l A…

Annotation of vcftools fst output using GTF or GFF

Hi everyone, I have an outputted file from vcftools for fst calculation in the following format:
CHROM BIN_START BIN_END N_VARIANTS WEIGHTED_FST MEAN_FST
ch1 1 40000 75 0.0516003 0.0355082
ch1 20001 60000 22 -0.00980986 -0.0205035
ch1 40001 80000 46 0.0180676 0.0236424…

STAR for mouse RNA-seq alignment, which parameters should I use besides the basic ones in practice?

Hi Currently I have pair-end mouse RNA-seq data. 3 Tissues with 17 samples/tissue. Length of the sequence is from 100bp to 150bp. My goal is to perform DEG after the alignment. This is…

Weird output of geneCount for STAR?

Hi I’m using STAR to perform alignment and my goal is to do the DEG analysis in the future. The parameter I set up as follows:
Step 1: STAR --runThreadN 12 --runMode genomeGenerate --genomeDir genomedir --genomeFastaFiles ./ref/GRCm39.primary_assembly.genome.fa --sjdbGTFfile ./ref/gencode.vM27.primary_assembly.annotation.gtf --sjdbOverhang 100
Step 2:…

Error after STAR mapping

Hi, I’m doing the STAR mapping, but I get the bam files with some problems. When I use the command samtools flagstat SRR7195620_2.fastq.gz_Aligned.sortedByCoord.out.bam to see the details of the bam file, it shows this:
3266075 + 0 in total (QC-passed reads + QC-failed reads)
1044500 + 0…

is it same to use .bam file or .sam file?

.sam file was generated by following code:
samtools sort -n Untreated-3/accepted_hits.bam > Untreated-3_sn.bam
samtools view -o Untreated-3_sn.sam Untreated-3_sn.bam
samtools sort Untreated-3/accepted_hits.bam > Untreated-3_s.bam
samtools index Untreated-3_s.bam
.gtf file was downloaded by:
wget ftp.ensembl.org/pub/release-70/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP5.70.gtf.gz
gunzip Drosophila_melanogaster.BDGP5.70.gtf.gz
when I use htseq-count:
htseq-count -s no -a 10 Untreated-3_sn.sam Drosophila_melanogaster.BDGP5.70.gtf > Untreated-3.count
an error…

Exception type: ValueError, raised in libcalignmentfile.pyx:990

.sam file was generated by following code:
samtools sort -n Untreated-3/accepted_hits.bam > Untreated-3_sn.bam
samtools view -o Untreated-3_sn.sam Untreated-3_sn.bam
samtools sort Untreated-3/accepted_hits.bam > Untreated-3_s.bam
samtools index Untreated-3_s.bam
.gtf file was downloaded by:
wget ftp.ensembl.org/pub/release-70/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP5.70.gtf.gz
gunzip Drosophila_melanogaster.BDGP5.70.gtf.gz
when I use htseq-count:
htseq-count -s…

Using STAR for mouse RNA-seq alignment, which parameters of STAR besides the basic ones would be better to use in practice?

Hi Currently I have pair-end mouse RNA-seq data. 3 Tissues with 17 samples/tissue. Length of the sequence is from 100bp to 150bp. My goal is to perform DEG…

featureCounts in Subread

Hello Everyone, I am quantifying read counts in the bam using featureCounts in the command line but getting the errors below:
ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file. The specified gene identifier attribute is 'gene_id'
An…

Multi-fasta file for gffread

Hey Guys, I’m having a problem trying to extract the transcripts from a merged StringTie .gtf file with gffread. I have downloaded the cDNA fastq file from ENSEMBL and tried to run the following command:
gffread -w transcripts.fa -g /path/to/genome.fa transcripts.gtf
However I’m getting the…

How to define the gene length for RPKM calculation

Hi guys, I would like to calculate the RPKM of my RNA seq experiment. To do this, as from the formula, I need to know the gene length. My starting point is the raw read (single-end) counts resulting from:…
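
For reference, the usual RPKM formula is reads × 10^9 / (gene length in bp × total mapped reads). The sketch below only shows the arithmetic: which gene length to plug in (for example the union exon length) is exactly the choice the question is about, and using the sum of the counts as the library size is an assumption of this sketch. The function name `rpkm` and its parameters are illustrative.

```python
def rpkm(counts, lengths, total_reads=None):
    """Return {gene: RPKM} from raw counts and gene lengths in base pairs.

    counts      -- {gene: raw read count}
    lengths     -- {gene: gene length in bp}, however that length was defined
    total_reads -- library size; defaults to the sum of the counts (assumption)
    """
    if total_reads is None:
        total_reads = sum(counts.values())
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    # reads * 1e9 / (length_bp * total_reads) is the same as
    # reads / (length_kb * total_reads_in_millions).
    return {
        gene: count * 1e9 / (lengths[gene] * total_reads)
        for gene, count in counts.items()
    }
```

For example, a gene with 100 reads and a 1,000 bp length in a library of 400 counted reads gets 100 × 10^9 / (1000 × 400) = 250,000 RPKM.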

How to convert a .tsv annotation file to .gtf?

How to convert a .tsv annotation file to .gtf? 1 I am a first-time poster so I hope I am doing this okay. I am trying to execute a genome indexing script. The following are my parameters: --runThreadN 14 --runMode genomeGenerate --genomeDir /scratch/projects/ag_transcriptomics/transcriptomes --genomeFastaFiles /scratch/projects/ag_transcriptomics/transcriptomes/PSTR.fna --sjdbGTFfile /scratch/projects/ag_transcriptomics/transcriptomes/INDEX/WHEREMYFILEWOULDBE --sjdbOverhang 100 --sjdbGTFtagExonParentTranscript…

EXOM-seq counting

EXOM-seq counting 0 Hi everyone, Does anyone know where to download the human Annotating Genomes with GFF3 or GTF files. I want to apply featureCounts to quantify read counts in the bam file in the command line. featureCounts -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_SE.bam Best, AD expression…

counting EXOM reads using subread featureCounts

counting EXOM reads using subread featureCounts 0 does anyone know where to download the Annotating Genomes with GFF3 or GTF files? My idea is to download one of these files in the command line or any other way and to apply featureCounts to quantify read counts in the bam file….

Trouble with Ensembl statistics

Hello! I need to know how many coding genes are on the Y chromosome. First I tried to filter the GTF file with R using this code: #load gtf file gtf <- rtracklayer::import('~/lapd/Index_hum/ann/Homo_sapiens.GRCh38.104.gtf') gtf_df=as.data.frame(gtf) ##filter gtf file library(dplyr) gtf_filt= filter(gtf_df, type=='gene', gene_biotype == 'protein_coding') chrY=filter(gtf_filt, chromosome_name == 'Y') Thus, I…

How GenomicFeatures cdsBy() accounts for the frame info in the gff to get the CDS?

How GenomicFeatures cdsBy() accounts for the frame info in the gff to get the CDS? The info in the 8th gff field m.ensembl.org/info/website/upload/gff.html frame – One of ‘0’, ‘1’ or ‘2’. ‘0’ indicates that the first base of the feature is the first base of a codon, ‘1’ that…

counting(quantifying) the expression level of genes from Exom_seq

counting(quantifying) the expression level of genes from Exom_seq 0 Hi everyone, I am quantifying the expression level of genes from EXOM seq. I used the featureCounts function and passed the BAM file on the command line as described on page 792 of the Biostar Handbook, but got errors as shown…

Error at parsing .tlst line (invalid strand): 3210 PGSC0003DMT400030180 ST4.03ch12. 56298573-56298656

Error at parsing .tlst line (invalid strand): 3210 PGSC0003DMT400030180 ST4.03ch12. 56298573-56298656 0 Hi, I am new to RNA-seq and am trying to map paired-end reads with TopHat but I keep getting this error: [2021-11-15 19:37:57] Beginning TopHat run (v2.1.1) ———————————————– [2021-11-15 19:37:57] Checking for Bowtie Bowtie version: 2.4.4.0 [2021-11-15 19:37:58]…

3′ bias on polyA RNAseq goes wrong!

Hi! I’m checking the output from gene_bodyCoverage.py using a bed file with housekeeping genes. We sequenced some samples using a polyA capture library (all in the same kit, same sequencing run), so I would expect a 3′ bias. However, the output I got was somewhat mixed – I see some…

Why does featurecounts give me an output file with only 0s?

Why does featurecounts give me an output file with only 0s? 0 Hello, I’m trying to run featurecounts on my .bam files, but the resulting file yields only 0s in every row and column. Here are the steps I have taken so far: (de novo) Assembled 40 transcripts from RNASeq…

How to use htseq-count with several samples ?

How to use htseq-count with several samples ? 1 Does anyone know how to use htseq-count with several samples? We can use htseq-count like this: htseq-count sample1.sam reference.gtf > result.count.txt We can get sample1’s count data with the above command. But it is usual that we have more than two…
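One possible approach, sketched here in Python under stated assumptions, is to run htseq-count once per sample as in the command above and then merge the per-sample outputs into a single table. The sketch assumes the usual two-column, tab-separated htseq-count output (feature ID, count) with summary rows whose names start with `__` (for example `__no_feature`). The function name `merge_htseq_counts` and the sample data are hypothetical.

```python
def merge_htseq_counts(outputs):
    """Merge per-sample htseq-count outputs into one table.

    `outputs` maps a sample name to the text of that sample's output file.
    Returns a list of rows; the first row is the header.
    """
    samples = list(outputs)
    table = {}
    for sample in samples:
        for line in outputs[sample].splitlines():
            if not line.strip():
                continue
            feature, count = line.split("\t")
            if feature.startswith("__"):
                continue  # skip summary rows such as __no_feature
            table.setdefault(feature, {})[sample] = int(count)
    rows = [["feature"] + samples]
    for feature in sorted(table):
        # A feature missing from one sample's file is reported as 0 here.
        rows.append([feature] + [table[feature].get(s, 0) for s in samples])
    return rows


# Made-up outputs standing in for result files of two samples.
outputs = {
    "sample1": "geneA\t10\ngeneB\t0\n__no_feature\t5\n",
    "sample2": "geneA\t3\ngeneB\t7\n__no_feature\t2\n",
}
for row in merge_htseq_counts(outputs):
    print("\t".join(str(value) for value in row))
```

In practice each text would be read from a file such as result.count.txt, and the merged table could be written out as the count matrix for a downstream differential expression tool.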

Tophat running error

Tophat running error 1 Hello, I’m trying to run tophat and I’m getting the following error that I don’t know how to fix it. Appreciate any help. [2018-05-11 11:14:58] Checking for Bowtie Bowtie version: 2.2.9.0 [2018-05-11 11:14:58] Checking for Bowtie index files (genome).. [2018-05-11 11:14:58] Checking for reference FASTA file…

mm39 genePred file

mm39 genePred file 1 Hello, I need a gene annotation file for the mm39 mouse genome in the genePred format. I found that there is a utility which can convert the information from the GTF format. However, where I would download the GTF file it says that it was created…

StringTie merged transcripts for two different conditions

StringTie merged transcripts for two different conditions 0 Hi guys, When using StringTie for finding new transcripts in two different conditions, for example treatment_A and treatment_B, do I need to have two different (one for each condition) merged .gtf files? For later running StringTie -eB for samples of treatment_A with…

Transform a GTF file into a data frame in R

Transform a GTF file into a data frame in R 4 Hi, I would like to analyse the content of a GTF file. I am quite able with R and dplyr, so I would like to transform my GTF file into a data frame to facilitate my analysis. Does anybody…

Simplify annotation file by collapsing gene isoforms into a single annotation per gene

Simplify annotation file by collapsing gene isoforms into a single annotation per gene 1 Hi, I need a simplified annotation file that contains a single “complete” annotation for each gene of the human genome. In other words, what I need is similar to when an annotation track in the UCSC…

Generate GTF file

Generate GTF file 1 Is there a tool that can generate a GTF from a FASTA file, e.g. GCF_000001405.39_GRCh38.p13_genomic.fna? I can convert the FASTA to BED, and know that BEDOPS can convert GFF to BED, but I want to go the other way. Either GFF or GTF is fine as…

Splice sequence indexing failed with err =127

Tophat2 Error: Splice sequence indexing failed with err =127 0 I’ve been trying to map my RNA-seq results onto an entire genome, and I’ve encountered a problem with splices. The script.pbs I submitted to cluster servers is: #PBS -N tophat_cufflinks_1 #PBS -o tophat_cufflinks_1_out.txt #PBS -e tophat_cufflinks_1_error_out.txt #PBS -l nodes=cu01:ppn=24 export…

Finding counts of lncRNAs with htseq-count /featurecounts

Finding counts of lncRNAs with htseq-count /featurecounts 0 Hi, I’m trying to find the counts of novel and known lncRNA transcripts in humans and I have a GTF file already of these transcripts. However, I’m unsure about the following: should the input GTF file for HTSeq count or featurecounts be…

Help for extraction of fasta sequences

Hello everyone, I hope you are well. I am writing this post because I have a question, or rather a problem, with my workflow. I performed an RNA-seq processing workflow as follows: quality control – Hisat2 – Stringtie – Deseq2. A simple, normal workflow that threw me important…

featureCounts low annotation rate RNA-seq

Hey everybody! I am trying to annotate my RNA-seq files (paired-end) with featureCounts. However, I keep having a quite low annotation rate, ~35-36% for all the files. My command line is the following: featureCounts -T 6 -p -s 2 -a annotation_file.gtf -o output_file.txt input_files.bam I am using the parameter -s…

STAR producing an empty BAM file

STAR producing an empty BAM file 0 I’m trying to run STAR but I am getting an empty BAM file. Does anyone know why this is happening and how to fix it? iCount mapstar demultiplexed/demux_NNNGGCGNN.fastq.gz hs88 mapping_NNNGGCGNN > --annotation homo_sapiens.88.gtf.gz #for context, mapstar needs the following arguments reads, genome_index, out_dir…

replacing ensembl ID with the gene symbol?

I’m a noob with a very unclear idea of what I am doing, but I’m doing my best. The other day, the NCBI webpage for downloading genomes and GTF files was down. As a result, I had to do my analysis on this RNA-seq data using the Ensembl files,…

How to find/extract sequences of protein coding genes and Transposable elements from rice transciptome.gtf file ?

Forum:How to find/extract sequences of protein coding genes and Transposable elements from rice transciptome.gtf file ? 0 Hello researchers, I am stuck on a project and need effective solutions: 1) how to find protein coding genes and transposable elements in the rice transcriptome.gtf file? 2) how to extract protein coding…

Bioconductor – Bioconductor 3.14 Released

Bioconductor 3.14 Released October 27, 2021 Bioconductors: We are pleased to announce Bioconductor 3.14, consisting of 2083 software packages, 408 experiment data packages, 904 annotation packages, 29 workflows and 8 books. There are 89 new software packages, 13 new data experiment packages, 10 new annotation packages, 1 new workflow,…

Low mapping frequency on STAR

Low mapping frequency on STAR 0 Hi all, I’ve been trying to re-map some RNA-seq fasta files to mm39 using STAR. I was told by the sequencing facility that ran the sequencing for me that when they mapped the reads onto mm10 using BWA MEM their mapping frequency for each…

How should I deal with my RNAseq result with high % of mutiple aligned reads?

How should I deal with my RNAseq result with high % of mutiple aligned reads? 1 Here is one of my results from STAR alignment. As you can see, Uniquely mapped reads: 32.37% and % of reads mapped to multiple loci: 60.74% This does not sound that bad because more than…

Trouble to generate genome indexes for human RNA-seq reads in STAR

Trouble to generate genome indexes for human RNA-seq reads in STAR 0 Hi everyone, I’m a master’s student and new to bioinformatics. My research is in RNA-Seq analysis, and right now I’m having trouble generating the genome indexes prior to the alignment in STAR. A couple of minutes after launching…

RNA-seq analysis, wrong genome build

RNA-seq analysis, wrong genome build 2 Hi, I have been doing a feature count and subsequent differential gene expression analysis on some RNA-seq samples which I now suspect is giving me poor results because I used a GRCm39 feature file from Ensembl but bam files which I suspect were aligned…

Qualimap whole exome sequencing depth of coverage

I’m trying to calculate the depth of coverage from my WXS data. Using Qualimap, I first used as the feature file the Gencode human genome (release 38) .gtf file associated with the genome I aligned to: feat="gencode.v38.primary_assembly.annotation.gtf"; for ea in *bam; do $qualimap bamqc --java-mem-size=20G -bam $ea --feature-file $feat; done;…

How to deal with high % of reads mapped to multiple loci on STAR?

How to deal with high % of reads mapped to multiple loci on STAR? 0 Here is one of my results from STAR alignment. As you can see, Uniquely mapped reads: 32.37% and % of reads mapped to multiple loci: 60.74% This does not sound that bad because more than 90%…

Extend 3′ UTR of a GTF file

Tool:Extend 3′ UTR of a GTF file 1 Hello guys. Some time ago I asked here whether there is an existing approach designed to extend the 3′ terminus of genes by a given length. I received no answers, apparently because there is none. In my team we encountered this need because of…
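To make the idea concrete, here is a minimal Python sketch of strand-aware 3′ extension. It is not the tool announced in the post: the function name `extend_three_prime` and its parameters are hypothetical, coordinates are assumed to be 1-based and inclusive as in GTF, and the sketch deliberately ignores collisions with neighbouring genes.

```python
def extend_three_prime(start, end, strand, extension, chrom_length=None):
    """Return (start, end) after extending the 3' end by `extension` bases.

    On the + strand the 3' end is `end`; on the - strand it is `start`.
    Coordinates are clamped to 1 and, if given, to `chrom_length`.
    """
    if strand == "+":
        new_end = end + extension
        if chrom_length is not None:
            new_end = min(new_end, chrom_length)
        return start, new_end
    if strand == "-":
        return max(1, start - extension), end
    raise ValueError("strand must be '+' or '-', got %r" % (strand,))


print(extend_three_prime(100, 200, "+", 50))                    # end moves right
print(extend_three_prime(100, 200, "-", 150))                   # start clamped at 1
print(extend_three_prime(100, 200, "+", 50, chrom_length=220))  # clamped at chromosome end
```

A real implementation would also need to update the matching exon and transcript rows and decide what to do when the extension runs into a downstream gene.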
