Tag: gtf

Head of bioinformatics – Lausanne

Introduction: UNIL is a leading international teaching and research institution, with over 5,000 employees and 17,000 students split between its Dorigny campus, CHUV and Epalinges. As an employer, UNIL encourages excellence, individual recognition and responsibility. The Lausanne Genomic Technologies Facility (GTF) is a service platform working for…

Why are there more lincRNA genes than transcripts in Mus_musculus.GRCm38.100.gtf?

Hi all, there are 52,446 annotated genes (ENSMUSG IDs) and 142,699 transcripts (ENSMUST IDs) in Mus_musculus.GRCm38.100.gtf. It makes sense that there are far more transcripts than genes. My question, however, is: why are there more genes than transcripts…
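One way to investigate a question like this is to tally gene and transcript records per biotype directly from the GTF. The sketch below is only an illustration: it assumes Ensembl-style `gene_biotype` and `transcript_biotype` attributes, and the example records (IDs `G1`, `T1`, and so on) are made up rather than taken from Mus_musculus.GRCm38.100.gtf.

```python
import re
from collections import Counter

# Matches attribute pairs such as: gene_id "ENSMUSG..."
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of key -> value."""
    return dict(ATTR_RE.findall(field))


def count_biotypes(lines):
    """Count 'gene' and 'transcript' records per biotype attribute."""
    genes, transcripts = Counter(), Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = parse_attributes(cols[8])
        if cols[2] == "gene":
            genes[attrs.get("gene_biotype", "NA")] += 1
        elif cols[2] == "transcript":
            transcripts[attrs.get("transcript_biotype", "NA")] += 1
    return genes, transcripts


# Made-up records for illustration only (hypothetical IDs and biotypes).
EXAMPLE = [
    '1\tensembl\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_biotype "lincRNA";',
    '1\tensembl\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; '
    'gene_biotype "lincRNA"; transcript_biotype "retained_intron";',
    '1\tensembl\tgene\t2000\t5000\t.\t-\t.\tgene_id "G2"; gene_biotype "protein_coding";',
    '1\tensembl\ttranscript\t2000\t5000\t.\t-\t.\tgene_id "G2"; transcript_id "T2"; '
    'gene_biotype "protein_coding"; transcript_biotype "protein_coding";',
    '1\tensembl\ttranscript\t2500\t5000\t.\t-\t.\tgene_id "G2"; transcript_id "T3"; '
    'gene_biotype "protein_coding"; transcript_biotype "protein_coding";',
]

genes, transcripts = count_biotypes(EXAMPLE)
print("genes:", dict(genes))
print("transcripts:", dict(transcripts))
```

Comparing the two tallies shows whether the per-biotype counts were taken from the gene-level or the transcript-level attribute, which can differ for the same locus.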

MARS-seq alignment

Hello everyone, I am new here and also new to the field. I was asked to create a pipeline for RNA-seq, and after two months of teaching myself how to work with each piece of code, I'm stuck with the program STAR. What I'm trying to do for now…

GenBank to GTF in Galaxy

Hi all, I am working on Galaxy and have a genome file in GenBank format. To use featureCounts for my RNA-seq data, I need to convert the GenBank format to GTF, because that's the format the featureCounts tool in Galaxy expects. Now, I…

htseq-count python tutorial attribute counts error

Hello, I’m following the htseq-count tutorial for RNA-seq (counting the overlapping genes and exons) here htseq.readthedocs.io/en/master/tour.html. However, when I get to the point where I need to find the overlaps in the .sam file and .gtf file, I get an error. This is the code I ran originally that gave…

“Paired-end reads were detected in single-end read library”

Hello, I am using “featureCounts” in the Rsubread package to analyze bulk RNA-seq of Drosophila. Since there are no inbuilt annotations for Drosophila, I am trying to use a GTF file in the homepage…

NoClassDefFoundError: htsjdk/samtools/util/IntervalTree

When I run the circm6A (github.com/canceromics/circm6a) example code: cd ../.. java -Xmx16g -jar circm6a.jar -ip test_data/HeLa_eluate_rep_1.chr22.bam -input test_data/HeLa_input_rep_1.chr22.bam -r test_data/gencode_chr22.gtf -g test_data/hg38_chr22.fa -o test_data/example_Hela The following error occurred: Start at 2021-12-12 16:33:26 Exception in thread "main" java.lang.NoClassDefFoundError: htsjdk/samtools/util/IntervalTree at main.Method.loadGenes(Method.java:200) at main.Method.run(Method.java:66) at main.Main.main(Main.java:9) Caused by: java.lang.ClassNotFoundException: htsjdk.samtools.util.IntervalTree…

Indexing with STAR

Hello, I am working with RNA-seq data and creating an index of the Gossypium hirsutum reference genome using STAR. STAR asks for the GTF annotation format, while my file is GFF3. According to the literature, in order to run a GFF file I need to remove --sjdbOverhang 50 and…

For Differential Gene Expression, which indexing format is better: GFF or GTF?

Hello, I am working on DGE and wish to create a reference index for mapping. Two file formats are used for it: GFF and GTF. My question is: What is the major difference between GTF and GFF?…

htseq-count Error ‘_StepVector_Iterator_obj’ object has no attribute ‘next’

I am trying to run htseq-count (v. 0.13.5) on a sorted and indexed BAM file. The command I entered looks like this: htseq-count -f bam -r pos -s yes -t CDS -i gene_id -m union filename_sorted.bam filename.gtf I get the following…

get rRNA FASTA file for a particular bacteria

Hey all, I was trying to find a way to get all rRNA (5S, 16S and 23S) FASTA sequences for a particular bacterium (B. thetaiotaomicron VPI-5482, which is the type strain). I wanted this file so that I could use something…

Conversion of GFF3 to GTF

How do I convert a GFF file to a GTF file? Is there any tool available? The easiest way is to use the gffread program that comes with the Cufflinks software suite (Tuxedo): gffread my.gff3 -T -o my.gtf See gffread…
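To make the difference between the two attribute styles concrete, here is a minimal pure-Python sketch of the conversion idea. It is not how gffread works internally; it assumes a simple gene → mRNA/transcript → exon/CDS hierarchy linked by `ID=` and `Parent=` attributes, and the function names and example records are hypothetical.

```python
def parse_gff3_attrs(field):
    """Parse a GFF3 attribute column such as 'ID=x;Parent=y' into a dict."""
    return dict(item.split("=", 1) for item in field.strip().split(";") if "=" in item)


def gff3_to_gtf(lines):
    """Emit GTF exon/CDS lines for a simple gene -> transcript -> exon hierarchy."""
    records = [line.rstrip("\n").split("\t") for line in lines
               if line.strip() and not line.startswith("#")]
    records = [cols for cols in records if len(cols) == 9]

    # First pass: map each transcript ID to its parent gene ID.
    tx_to_gene = {}
    for cols in records:
        attrs = parse_gff3_attrs(cols[8])
        if cols[2] in ("mRNA", "transcript") and "ID" in attrs:
            tx_to_gene[attrs["ID"]] = attrs.get("Parent", attrs["ID"])

    # Second pass: rewrite exon/CDS attributes in GTF style.
    out = []
    for cols in records:
        if cols[2] not in ("exon", "CDS"):
            continue
        attrs = parse_gff3_attrs(cols[8])
        for tx in attrs.get("Parent", "").split(","):
            if tx in tx_to_gene:
                gtf_attrs = f'transcript_id "{tx}"; gene_id "{tx_to_gene[tx]}";'
                out.append("\t".join(cols[:8] + [gtf_attrs]))
    return out


# Made-up GFF3 records for illustration only.
EXAMPLE_GFF3 = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr1\tsrc\texon\t100\t300\t.\t+\t.\tID=ex1;Parent=tx1",
    "chr1\tsrc\texon\t600\t900\t.\t+\t.\tID=ex2;Parent=tx1",
]

gtf_lines = gff3_to_gtf(EXAMPLE_GFF3)
for line in gtf_lines:
    print(line)
```

For real annotation files, a dedicated tool such as the gffread command quoted above is the safer choice, since it handles hierarchies and edge cases this sketch ignores.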

HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments.

Hello, I’m following the htseq-count tutorial for RNA-seq (counting the overlapping genes and exons) here htseq.readthedocs.io/en/master/tour.html. However, when I get to the point where I need to find the overlaps in the .sam file and .gtf file, I get an error. This is the code I ran originally that gave…

Tximport in usegalaxy

Devon Ryan: Please help in resolving this issue. How do I use tximport in usegalaxy to convert transcript IDs (DESeq2-Salmon) to gene IDs? I want to get gene IDs from the results of DESeq2 (Salmon). Which GTF should be used for tximport? I am getting the following error in…

Three Differential Expression Analysis Methods for RNA Sequencing: limma, EdgeR, DESeq2

A detailed protocol of differential expression analysis methods for RNA sequencing was provided: limma, EdgeR, DESeq2. Three differential expression analysis methods for RNA sequencing: limma, EdgeR, and DESeq2. Open the RStudio program and then load the R file, DEGs. The file can be acquired from the supplementary files. One. Downloading and pre-processing of data. 1.1….

Gene Expression Prediction from DNA sequences

Hi everyone! I am a university student working on my Master's thesis. I worked on a paper called Xpresso, whose purpose is to predict gene expression levels from DNA sequences using deep learning techniques. Now, my lecturers have asked me…

featureCounts difference assigned reads summary file and summed up reads in feature count matrix

Dear all, this might be a naive question but my google-fu fails me. I count reads from a BAM file, aligned by STAR against a custom hg19 genome, after running Picard MarkDuplicates, then counting reads assigned…

TCGA transcriptome data to R (DESeq2)

This seems to be a frequently asked question, so here is a robust method to fully recapitulate the counts given by TCGA and port them to DESeq2. Why the long way? Tanya and I noticed that TCGA-Biolinks and Firehose did not generate the full count matrix. ~5-10% of genes were missing…

Adding repeats in a genome fasta at a particular location without messing up the annotations?

I want to add a bunch of expanded repeats to a genome FASTA file, e.g. 100 ATTs at a particular location such as Chr1-1:2. How do I do that and at the same time update…

Error “start too small” when running htseq-count on a sorted .bam file

Hello, this is my first time aligning scRNA-seq reads to a reference genome to analyze differential gene expression. I am using htseq-count to obtain count files for my different samples and I am receiving the following error:…

Getting errors trying to run rmats

Hi, I am trying to use rmats for splice variation analysis through ssh using slurm. After loading the rmats module, these are the commands that I tried and the errors they produced: rmats --s1 $PWD/control.txt --s2 $PWD/pdac.txt --gtf mm10/mm10.refGene.gtf Python programming language version 3.6.8 loaded. GNU…

GTF file danio rerio

I've been trying to find a GTF file for the zebrafish genome and have been unsuccessful. Would anyone be able to point me in the right direction?

featurecounts in command line

I'm trying to convert my BAM files to count data with the help of featureCounts on the command line. I used the code: featurecounts -T 8 -a /Users/ria/Desktop/bowtie_2/GCF_000001405.39_GRCh38.p13_genomic.gtf -g 'transcrip_id' -o readcounts/readcount1.txt bam files/-.bam (readcounts is the directory for dumping the output) the error…

Error when using featurecounts

I am doing some RNA analysis and am having issues trying to generate count data. I mapped my reads to a reference genome fasta file (genbank fasta file from ncbi) using bbmap and .sam files as the output. I am now trying to use featurecounts to generate count data but…

STAR index generation for bacterial genome

Hi, I'm trying to analyze RNA-seq data for a bacterium, Mycobacterium tuberculosis. I used the FASTA and GTF files from NCBI to create the index, and set --genomeSAindexNbases to 8 based on this previous post. The bash script I used is:…

Generate a count matrix for a consensus peak set and related peaks/reads

Hi everyone, I have a consensus peak file (.bed) that has chr, start, and end columns. It doesn't have a header. Something like this: chr1 721000 726999 chr1 817800 821799 chr1 1027400 1030799 chr1 1033600 1037599 chr1 1047400 1050399 I want to generate a count matrix for my downstream analysis and differential…

How to infer the TSS from a gtf file

I have a csv file that I’ve downloaded from this paper here (abridged version below), and it’s referring to CpG locations in the genome from beadChip array data, using assembly hg18 as follows: CpGmarker Build Chr MapInfo SourceVersion TSS_Coordinate Gene_Strand Symbol Synonym Accession GID cg00075967 36 15 72282407 36.1 72282245…

How To Convert Gencode Gtf Into Bed Format?

My solution, based on Ian's answer: zcat ../../../data/annotations/gencode.v24.annotation.gtf.gz | awk 'OFS="\t" {if ($3=="gene") {print $1,$4-1,$5,$10,$16,$7}}' | tr -d '";' | head chr1 11868 14408 ENSG00000223972.5 . + chr1 14403 29569 ENSG00000227232.5 . - chr1 17368 17435 ENSG00000278267.1 . - chr1 29553 31108 ENSG00000243485.3 . + chr1 30365 30502 ENSG00000274890.1 ….
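The same idea can be sketched in Python for readers who prefer it over awk. This is an illustrative variant, not a line-for-line port of the one-liner above: it assumes tab-separated GTF input, takes the name from the `gene_id` attribute, writes "." as the BED score, and converts the 1-based GTF start to a 0-based BED start. The function name and the sample record are illustrative assumptions.

```python
import re

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gtf_genes_to_bed(lines):
    """Convert GTF 'gene' records to 6-column BED lines."""
    bed = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue  # keep only complete gene records
        match = GENE_ID_RE.search(cols[8])
        if match is None:
            continue  # no gene_id attribute to use as the BED name
        start = int(cols[3]) - 1  # GTF is 1-based inclusive, BED is 0-based half-open
        bed.append("\t".join([cols[0], str(start), cols[4], match.group(1), ".", cols[6]]))
    return bed


# Sample record written for illustration, in GENCODE-like layout.
EXAMPLE_GTF = [
    "##description: example",
    'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972.5"; gene_name "DDX11L1";',
    'chr1\tHAVANA\texon\t11869\t12227\t.\t+\t.\tgene_id "ENSG00000223972.5";',
]

bed_lines = gtf_genes_to_bed(EXAMPLE_GTF)
for line in bed_lines:
    print(line)
```

Because the end coordinate is inclusive in GTF and exclusive in BED, only the start needs to be shifted by one.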

STAR Genome indexing (Homo_sapiens_assembly38.fasta vs. GRCh38.primary_assembly.genome.fa)

I have a query regarding STAR alignment. I used the following commands to generate the genome index (Homo_sapiens_assembly38.fasta): STAR --runMode genomeGenerate --genomeDir /home/bsh/BC_MCFcellLine_WTS/result/STAR_indexing/ --genomeFastaFiles /data1/database/ftp.broadinstitute.org/bundle/hg38_210610_download/Homo_sapiens_assembly38.fasta --sjdbGTFfile /home/bsh/BC_MCFcellLine_WTS/gencode.v27.annotation.gtf And I used the following commands for mapping, and the BAM file was successfully generated: STAR --runThreadN 4 --outFilterType BySJout --outFilterMismatchNmax 999 --outFilterMultimapNmax 10…

Has anyone here worked with CNCI before? EXHAUSTED

Has anyone here worked with CNCI before? I'm just about exhausted trying to figure out what I'm doing wrong. So, I tried the following: python CNCI.py candidate_lncs.gtf -g -o test -m ve -p 16 -d ./dbase GRCm38.primary_assembly.genome.fa and I received the…

Extract Total Non-Overlapping Exon Length Per Gene With Bioconductor

Hi all, it took me a while to figure this out, so I thought it might be useful to a few other people. When you have used htseq-count on each of your RNA-seq'ed samples and have combined all of your…
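The tutorial itself uses Bioconductor. As a language-neutral illustration of what "total non-overlapping exon length" means, here is a small Python sketch that merges overlapping exon intervals before summing them. The function name and the coordinates are hypothetical, and intervals are assumed to be 1-based and inclusive, as in GTF.

```python
def union_length(intervals):
    """Total bases covered by 1-based inclusive intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Gap before this interval: close the current merged block.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


# Made-up exons from two isoforms of one gene.
exons = [(100, 200), (150, 300), (400, 450)]
gene_length = union_length(exons)
print(gene_length)
```

Here the first two exons overlap and are merged into 100-300 before their length is added to that of the third exon.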

STAR alignment in a full directory

Hi! I'm working with STAR and I would like to align multiple files, but separately. I have file pairs like these two: Dros_01_S48_L001_R1_001.fastq.gz Dros_01_S48_L001_R2_001.fastq.gz etc. Only the S[number] and the R1 and R2 change in the names. I have a code, what I…

Explanation of ENSEMBL GTF features

Hi guys, I am trying to find more info about the features in the ENSEMBL GTF file, but don't know where to find it. I am using the hg38 GTF file from ENSEMBL, and I am interested in column 3 (feature). More specifically, I…

How can I produce gene level quantification using Salmon pseudo-aligner?

Hi! I am using Salmon to perform pseudo-alignment on paired-end RNA-seq data. I want gene-level quantification, but I obtain files with transcript quantification. Command line used: salmon quant -i Transcriptome_GH38_release_92/Homo_sapiens.GRCh38.92.cdna.ncrna.fa_quasi_index/ -l A…

Annotation of vcftools fst output using GTF or GFF

Hi everyone, I have an output file from vcftools for an Fst calculation in the following format: CHROM BIN_START BIN_END N_VARIANTS WEIGHTED_FST MEAN_FST ch1 1 40000 75 0.0516003 0.0355082 ch1 20001 60000 22 -0.00980986 -0.0205035 ch1 40001 80000 46 0.0180676 0.0236424…

STAR for mouse RNA-seq alignment, which parameters should I use besides the basic ones in practice?

Hi, currently I have paired-end mouse RNA-seq data: 3 tissues with 17 samples per tissue. The read length is from 100 bp to 150 bp. My goal is to perform DEG analysis after the alignment. This is…

Weird output of geneCount for STAR?

Hi, I'm using STAR to perform alignment and my goal is to do DEG analysis in the future. The parameters I set are as follows: Step 1: STAR --runThreadN 12 --runMode genomeGenerate --genomeDir genomedir --genomeFastaFiles ./ref/GRCm39.primary_assembly.genome.fa --sjdbGTFfile ./ref/gencode.vM27.primary_assembly.annotation.gtf --sjdbOverhang 100 Step 2:…

Error after STAR mapping

Hi, I'm doing STAR mapping, but I get BAM files with some problems. When I use the command samtools flagstat SRR7195620_2.fastq.gz_Aligned.sortedByCoord.out.bam to see the details of the BAM file, it shows this: 3266075 + 0 in total (QC-passed reads + QC-failed reads) 1044500 + 0…

Is it the same to use a .bam file or a .sam file?

The .sam file was generated with the following code: samtools sort -n Untreated-3/accepted_hits.bam > Untreated-3_sn.bam samtools view -o Untreated-3_sn.sam Untreated-3_sn.bam samtools sort Untreated-3/accepted_hits.bam > Untreated-3_s.bam samtools index Untreated-3_s.bam The .gtf file was downloaded with: wget ftp.ensembl.org/pub/release-70/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP5.70.gtf.gz gunzip Drosophila_melanogaster.BDGP5.70.gtf.gz When I use htseq-count: htseq-count -s no -a 10 Untreated-3_sn.sam Drosophila_melanogaster.BDGP5.70.gtf > Untreated-3.count an error…

Exception type: ValueError, raised in libcalignmentfile.pyx:990

The .sam file was generated with the following code: samtools sort -n Untreated-3/accepted_hits.bam > Untreated-3_sn.bam samtools view -o Untreated-3_sn.sam Untreated-3_sn.bam samtools sort Untreated-3/accepted_hits.bam > Untreated-3_s.bam samtools index Untreated-3_s.bam The .gtf file was downloaded with: wget ftp.ensembl.org/pub/release-70/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP5.70.gtf.gz gunzip Drosophila_melanogaster.BDGP5.70.gtf.gz When I use htseq-count: htseq-count -s…

Using STAR for mouse RNA-seq alignment, which parameters of STAR besides the basic ones would be better to use in practice?

Hi, currently I have paired-end mouse RNA-seq data: 3 tissues with 17 samples per tissue. The read length is from 100 bp to 150 bp. My goal is to perform DEG…

featureCounts in Subread

Hello everyone, I am quantifying read counts in a BAM file using featureCounts on the command line, but I am getting the errors below: ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file. The specified gene identifier attribute is ‘gene_id’ An…

Multi-fasta file for gffread

Hey guys, I'm having a problem trying to extract the transcripts from a merged StringTie .gtf file with gffread. I have downloaded the cDNA fastq file from ENSEMBL and tried to run the following command: gffread -w transcripts.fa -g /path/to/genome.fa transcripts.gtf However I'm getting the…

How to define the gene length for RPKM calculation

Hi guys, I would like to calculate the RPKM for my RNA-seq experiment. To do this, according to the formula, I need to know the gene length. My starting point is the raw read counts (single end) resulting from:…
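For reference, the standard RPKM definition is reads per kilobase of gene length per million mapped reads. The short sketch below spells out that arithmetic; the function name and the numbers are made up for illustration, and which "gene length" to plug in (for example the total non-overlapping exon length) is exactly the open question of the post.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # Equivalent to: read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Made-up numbers: 500 reads on a 2,000 bp gene in a library of 10 million mapped reads.
value = rpkm(500, 2000, 10_000_000)
print(value)
```

Doubling the assumed gene length halves the RPKM, which is why the choice of length definition matters.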

How to convert a .tsv annotation file to .gtf?

I am a first-time poster, so I hope I am doing this okay. I am trying to execute a genome indexing script. The following are my parameters: --runThreadN 14 --runMode genomeGenerate --genomeDir /scratch/projects/ag_transcriptomics/transcriptomes --genomeFastaFiles /scratch/projects/ag_transcriptomics/transcriptomes/PSTR.fna --sjdbGTFfile /scratch/projects/ag_transcriptomics/transcriptomes/INDEX/WHEREMYFILEWOULDBE --sjdbOverhang 100 --sjdbGTFtagExonParentTranscript…

Exome-seq counting

Hi everyone, does anyone know where to download the human Annotating Genomes with GFF3 or GTF files? I want to apply featureCounts to quantify read counts in the BAM file on the command line. featureCounts -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_SE.bam…

Counting exome reads using Subread featureCounts

Does anyone know where to download the Annotating Genomes with GFF3 or GTF files? My idea is to download one of these files on the command line or in any other way and to apply featureCounts to quantify read counts in the BAM file….

Trouble with Ensembl statistics

Hello! I need to know how many coding genes are on the Y chromosome. First, I tried to filter the GTF file with R using this code: #load gtf file gtf <- rtracklayer::import('~/lapd/Index_hum/ann/Homo_sapiens.GRCh38.104.gtf') gtf_df=as.data.frame(gtf) ##filter gtf file library(dplyr) gtf_filt= filter(gtf_df, type=='gene', gene_biotype == 'protein_coding') chrY=filter(gtf_filt, chromosome_name == 'Y') Thus, I…

How GenomicFeatures cdsBy() accounts for the frame info in the gff to get the CDS?

The info in the 8th GFF field (m.ensembl.org/info/website/upload/gff.html): frame – One of ‘0’, ‘1’ or ‘2’. ‘0’ indicates that the first base of the feature is the first base of a codon, ‘1’ that…

Counting (quantifying) the expression level of genes from exome-seq

Hi everyone, I am quantifying the expression level of genes from exome-seq. I used the featureCounts function and passed the BAM file on the command line, as stated in the Biostar Handbook, page 792, but got errors as shown…

Error at parsing .tlst line (invalid strand): 3210 PGSC0003DMT400030180 ST4.03ch12. 56298573-56298656

Hi, I am new to RNA-seq and am trying to map paired-end reads with TopHat, but I keep getting this error: [2021-11-15 19:37:57] Beginning TopHat run (v2.1.1) ----------------------------------------------- [2021-11-15 19:37:57] Checking for Bowtie Bowtie version: 2.4.4.0 [2021-11-15 19:37:58]…

3′ bias on polyA RNAseq goes wrong!

Hi! I’m checking the output from gene_bodyCoverage.py using a bed file with housekeeping genes. We sequenced some samples using a polyA capture library (all in the same kit, same sequencing run), so I would expect a 3′ bias. However, the output I got was somewhat mixed – I see some…

Why does featurecounts give me an output file with only 0s?

Hello, I'm trying to run featureCounts on my .bam files, but the resulting file contains only 0s in every row and column. Here are the steps I have taken so far: (de novo) Assembled 40 transcripts from RNASeq…

How to use htseq-count with several samples?

Does anyone know how to use htseq-count with several samples? We can use htseq-count like this: htseq-count sample1.sam reference.gtf > result.count.txt This gives us sample1's count data. But it is usual that we have more than two…

Tophat running error

Hello, I'm trying to run TopHat and I'm getting the following error that I don't know how to fix. I appreciate any help. [2018-05-11 11:14:58] Checking for Bowtie Bowtie version: 2.2.9.0 [2018-05-11 11:14:58] Checking for Bowtie index files (genome).. [2018-05-11 11:14:58] Checking for reference FASTA file…

mm39 genePred file

Hello, I need a gene annotation file for the mm39 mouse genome in the genePred format. I found that there is a utility which can convert the information from the GTF format. However, where I would download the GTF file it says that it was created…

StringTie merged transcripts for two different conditions

Hi guys, when using StringTie to find new transcripts in two different conditions, for example treatment_A and treatment_B, do I need to have two different merged .gtf files (one for each condition)? For later running StringTie -eB for samples of treatment_A with…

Transform a GTF file into a data frame in R

Hi, I would like to analyse the content of a GTF file. I am fairly proficient with R and dplyr, so I would like to transform my GTF file into a data frame to facilitate my analysis. Does anybody…

Simplify annotation file by collapsing gene isoforms into a single annotation per gene

Hi, I need a simplified annotation file that contains a single “complete” annotation for each gene of the human genome. In other words, what I need is similar to when an annotation track in the UCSC…

Generate GTF file

Is there a tool that can generate a GTF from a FASTA file, e.g. GCF_000001405.39_GRCh38.p13_genomic.fna? I can convert the FASTA to BED, and know that BEDOPS can convert GFF to BED, but I want to go the other way. Either GFF or GTF is fine as…

Splice sequence indexing failed with err =127

I've been trying to map my RNA-seq results onto an entire genome, and I've encountered a problem with splices. The script.pbs I submitted to the cluster servers is: #PBS -N tophat_cufflinks_1 #PBS -o tophat_cufflinks_1_out.txt #PBS -e tophat_cufflinks_1_error_out.txt #PBS -l nodes=cu01:ppn=24 export…

Finding counts of lncRNAs with htseq-count /featurecounts

Hi, I'm trying to find the counts of novel and known lncRNA transcripts in humans, and I already have a GTF file of these transcripts. However, I'm unsure about the following: should the input GTF file for htseq-count or featureCounts be…

Help for extraction of fasta sequences

Hello everyone, I hope you are well. I am writing this post because I have a problem with my workflow. I ran an RNA-seq processing workflow as follows: quality control – Hisat2 – Stringtie – Deseq2 A simple, normal workflow that threw me important…

featureCounts low annotation rate RNA-seq

Hey everybody! I am trying to annotate my RNA-seq files (paired-end) with featureCounts. However, I keep having a quite low annotation rate, ~35-36% for all the files. My command line is the following: featureCounts -T 6 -p -s 2 -a annotation_file.gtf -o output_file.txt input_files.bam I am using the parameter -s…

STAR producing an empty BAM file

I'm trying to run STAR but I am getting an empty BAM file. Does anyone know why this is happening and how to fix it? iCount mapstar demultiplexed/demux_NNNGGCGNN.fastq.gz hs88 mapping_NNNGGCGNN > --annotation homo_sapiens.88.gtf.gz #for context, mapstar needs the following arguments reads, genome_index, out_dir…

replacing ensembl ID with the gene symbol?

I'm a noob with a very unclear idea of what I am doing, but I'm doing my best. The other day, the NCBI webpage for downloading genomes and GTF files was down. As a result, I had to do my analysis on this RNA-seq data using the Ensembl files,…

How to find/extract sequences of protein coding genes and transposable elements from a rice transcriptome.gtf file?

Hello researchers, I am stuck in my project and need effective solutions: 1) how to find protein coding genes and transposable elements in a rice transcriptome.gtf file? 2) how to extract protein coding…

Bioconductor – Bioconductor 3.14 Released

October 27, 2021. Bioconductors: We are pleased to announce Bioconductor 3.14, consisting of 2083 software packages, 408 experiment data packages, 904 annotation packages, 29 workflows and 8 books. There are 89 new software packages, 13 new data experiment packages, 10 new annotation packages, 1 new workflow,…

Low mapping frequency on STAR

Hi all, I've been trying to re-map some RNA-seq fasta files to mm39 using STAR. I was told by the sequencing facility that ran the sequencing for me that when they mapped the reads onto mm10 using BWA MEM, their mapping frequency for each…

How should I deal with my RNAseq result with high % of multiple aligned reads?

Here is one of my results from STAR alignment. As you can see, Uniquely mapped reads: 32.37% and % of reads mapped to multiple loci: 60.74% This does not sound that bad because more than…

Trouble to generate genome indexes for human RNA-seq reads in STAR

Hi everyone, I'm a master's student and new to bioinformatics. My research is in RNA-seq analysis, and right now I'm having trouble generating the genome indexes prior to alignment in STAR. A couple of minutes after launching…

RNA-seq analysis, wrong genome build

Hi, I have been doing a feature count and subsequent differential gene expression analysis on some RNA-seq samples, which I now suspect is giving me poor results because I used a GRCm39 feature file from Ensembl but BAM files which I suspect were aligned…

Qualimap whole exome sequencing depth of coverage

I’m trying to calculate the depth of coverage from my WXS data. Using Qualimap, I first used as the feature file the Gencode human genome (release 38) .gtf file associated with the genome I aligned to: feat="gencode.v38.primary_assembly.annotation.gtf"; for ea in *bam; do $qualimap bamqc --java-mem-size=20G -bam $ea --feature-file $feat; done;…

Continue Reading Qualimap whole exome sequencing depth of coverage

How to deal with high % of reads mapped to multiple loci on STAR?

Here is one of my results from STAR alignment. As you can see, uniquely mapped reads: 32.37% and % of reads mapped to multiple loci: 60.74%. This sounds not that bad because more than 90%…

Continue Reading How to deal with high % of reads mapped to multiple loci on STAR?

Extend 3′ UTR of a GTF file

Hello guys. Some time ago I asked here whether there is an existing approach designed to extend the 3′ terminus of genes by a provided length. I received no answers, apparently because there isn’t one. In my team we ran into this need because of…

Continue Reading Extend 3′ UTR of a GTF file

StringTie creates .tsv value that has null values for every gene id

Hi everyone, I am new to bioinformatics and Biostars as a whole. I am doing isoform analysis on some samples and I’ve come across a problem. The following code is what I used for StringTie…

Continue Reading StringTie creates .tsv value that has null values for every gene id

Paired-end reads somehow counted twice?

Hi. I’m new to bioinformatics and am trying to extract read counts from fastq files. I compared my result with the expected count matrix, and my read counts are doubled. (The left one is from the expected read count matrix, and the right one is my result.) I used…

Continue Reading Paired-end reads somehow counted twice?

featureCounts has low successfully assigned reads

After finishing STAR two-step alignment, I got 62% uniquely mapped reads. But featureCounts gives me only a 17% successfully assigned rate. featureCounts -T 8 -F GTF -p --countReadPairs -t exon -g gene_id -a ~/genome_ref/gencode.v38.annotation.gtf -o ~/expression/all_counts.txt *.bam I also had a look at other…

Continue Reading featureCounts has low successfully assigned reads

Technical Support Specialist – BioInformatics – Invitae

POSITION SUMMARY: The Technical Support Specialist provides first-level technical support on Invitae Somatic Oncology products from the Invitae office in Boulder, CO. This individual will escalate customer inquiries effectively, collect customer feedback and share this with internal teams to improve products. This individual will assist in coordinating activities for special…

Continue Reading Technical Support Specialist – BioInformatics – Invitae

CrossMap issues changing genome version

I installed CrossMap for conversion of bam, bed, gtf files from one version to another. I use GENCODE so I need to make the files compatible with the UCSC browser. I am running into two issues with this tool… If I use the following command to…

Continue Reading CrossMap issues changing genome version

Technical Support Specialist – BioInformatics – Invitae (Formerly ArcherDx)

POSITION SUMMARY: The Technical Support Specialist provides first-level technical support on Invitae Somatic Oncology products from the Invitae office in Boulder, CO. This individual will escalate customer inquiries effectively, collect customer feedback and share this with internal teams to improve products. This individual will assist in coordinating activities for special…

Continue Reading Technical Support Specialist – BioInformatics – Invitae (Formerly ArcherDx)

GTF upload error UCSC related to strandedness

Hello, I’m having issues with uploading a .gtf file to the UCSC browser. I am getting the following error: “Error GFF/GTF group STRG.155047.1 on chr12+, this line is on chr12-, all group members must be on same seq and strand” I have…

Continue Reading GTF upload error UCSC related to strandedness
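The error quoted in the excerpt above says that every line of a GTF group must sit on the same sequence and strand. As a rough illustration only, the Python sketch below scans GTF lines and reports transcripts whose records disagree on either. The function name `mixed_strand_transcripts` and the example records are hypothetical and are not taken from the original post; the sketch assumes tab-separated GTF lines with a `transcript_id "..."` attribute.

```python
import re
from collections import defaultdict


def mixed_strand_transcripts(gtf_lines):
    """Return {transcript_id: {(seqname, strand), ...}} for transcripts
    whose GTF records are spread over more than one seqname/strand pair."""
    seen = defaultdict(set)
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            # Column 1 is the sequence name, column 7 is the strand.
            seen[match.group(1)].add((fields[0], fields[6]))
    return {tid: locs for tid, locs in seen.items() if len(locs) > 1}


# Made-up records: STRG.1.1 mixes "+" and "-", STRG.2.1 is consistent.
example = [
    'chr12\tStringTie\ttranscript\t100\t500\t.\t+\t.\tgene_id "STRG.1"; transcript_id "STRG.1.1";',
    'chr12\tStringTie\texon\t100\t200\t.\t-\t.\tgene_id "STRG.1"; transcript_id "STRG.1.1";',
    'chr12\tStringTie\texon\t300\t500\t.\t+\t.\tgene_id "STRG.2"; transcript_id "STRG.2.1";',
]
print(mixed_strand_transcripts(example))
```

Running a check like this over a real file (for example with `open("merged.gtf")` as the line source, a placeholder file name) would list the groups that a browser upload is likely to reject, without saying how they arose.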

Technical Support Specialist – BioInformatics at Invitae

POSITION SUMMARY: The Technical Support Specialist provides first-level technical support on Invitae Somatic Oncology products from the Invitae office in Boulder, CO. This individual will escalate customer inquiries effectively, collect customer feedback and share this with internal teams to improve products. This individual will assist in coordinating activities for special…

Continue Reading Technical Support Specialist – BioInformatics at Invitae

What is bigwig file?

Asked by: Vada Ratke Score: 4.7/5 (25 votes) BigWig is a file format for display of dense, continuous data in a genome browser track, created by conversion from Wiggle (WIG) format. BigWig format is described at the UCSC Genome Bioinformatics web site, and the Broad Institute file format guide provides…

Continue Reading What is bigwig file?

What files (fasta, GTF) do I need for RNA seq analysis

I am very new to programming in general, and I’m trying my best to teach myself R for analyzing RNA-seq data we have. I am using this guide and have gotten to the step where I need to…

Continue Reading What files (fasta, GTF) do I need for RNA seq analysis

DEXSeq prepare annotation script throws “object has no attribute ‘next’” for Ensembl GTFs

Hi there, I am trying to run the dexseq_prepare_annotation.py script and the code keeps failing after parsing the first line of the gtf. Specifically, the code is failing…

Continue Reading DEXSeq prepare annotation script throws “object has no attribute ‘next’” for Ensembl GTFs
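An “object has no attribute ‘next’” message is what Python 3 raises when code written for Python 2 calls an iterator’s `.next()` method. Whether that is the cause in the post above is an assumption, since the excerpt is truncated. The sketch below reproduces the message on a plain list iterator and shows the built-in `next()` spelling that works in both Python versions; it does not use DEXSeq or HTSeq code.

```python
# A plain iterator standing in for whatever object the failing script iterates over.
lines = iter(["first line", "second line"])

try:
    # Python 2 spelling: iterators had a .next() method.
    # In Python 3 the method is named __next__, so this raises AttributeError.
    lines.next()
except AttributeError as err:
    message = str(err)
    print(message)

# Portable spelling: the built-in next() works in Python 2.6+ and Python 3.
first = next(lines)
print(first)
```

If a script fails this way, the usual options are to run it with the Python version it was written for or to use a release of the tool that supports Python 3.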

Rsubread FeatureCounts return 0.0% assigned

Using featureCounts in the Rsubread package I am getting 0 annotations. I started from raw sequencing data and the RefSeq genome and RefSeq genomic GTF files downloaded from here: www.ncbi.nlm.nih.gov/assembly/GCF_000001635.27/ through the download assembly button on the side. I had the top option set to RefSeq for both downloads and chose…

Continue Reading Rsubread FeatureCounts return 0.0% assigned

How can I be sure that raw read counts are well processed from fastq files?

Hi. I’m new to bioinformatics and am trying to process fastq files to get a raw read count matrix. I downloaded fastq files from www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63452 I used fasterq-dump to download fastq files from SRR Aligned…

Continue Reading How can I be sure that raw read counts are well processed from fastq files?

Aligning multiple fastq files with genome in one script/one line with STAR

Hi there! This is probably a VERY basic question but I don’t have the best terminal skills so I’m struggling a little. I want to apply what I wrote below for all my fastq scripts without doing a for loop or manually writing the code for each (ideally they all…

Continue Reading Aligning multiple fastq files with genome in one script/one line with STAR

kallisto genomebam error

Hello! I am trying to produce bam files to load to igv after kallisto quant with the --genomebam option. This is the code that I am using: kallisto quant -i Homo_sapiens.GRCh38.cdna.all.release-100.idx -o $FOLDER -t 12 --genomebam -g Homo_sapiens.GRCh38.100.gtf.gz -c hg38.chrom.sizes_clean_tab.txt --rf-stranded ${FILES[$i]} ${FILES[$i+1]} My chromosome file looks like that, tabbed and…

Continue Reading kallisto genomebam error

cellranger count output does not give all genes.

Dear all, I have recently started using cellranger from 10x for scRNA-seq data, after having used my own pipeline (with STAR alignment) for smart-seq2 data, to get the count matrix for later analysis with Seurat or Scanpy. I have…

Continue Reading cellranger count output does not give all genes.

kallisto genomebam not showing reads on igv

Hello! I am trying to produce bam files to load to igv after kallisto quant with the --genomebam option. After producing and loading the pseudoalignment bam to the igv, it is empty. This is my initial command: kallisto quant -i Homo_sapiens.GRCh38.cdna.all.release-100.idx -o pseudo -t 10 --genomebam -g Homo_sapiens.GRCh38.100.gtf -c hg38.chrom.sizes R1.fastq.gz.trim_1.fq.gz…

Continue Reading kallisto genomebam not showing reads on igv

Comparative cellular analysis of motor cortex in human, marmoset and mouse

Statistics and reproducibility: For multiplex fluorescent in situ hybridization (FISH) and immunofluorescence staining experiments, each ISH probe combination was repeated with similar results on at least two separate individuals per species, and on at least two sections per individual. The experiments were not randomized and the investigators were not blinded…

Continue Reading Comparative cellular analysis of motor cortex in human, marmoset and mouse

TPMCalculator returns zero for some genes, which are non-zero with Cuffdiff (FPKM).

I ran TPMCalculator on my RNA-seq data. This was my first time using this package. It seemed to finish without any problems, but I realized that the counts of some genes are zero. I performed DEG analysis…

Continue Reading TPMCalculator returns zero for some genes, which are non-zero with Cuffdiff (FPKM).

Regulation of prefrontal patterning and connectivity by retinoic acid

Data reporting: No statistical methods were used to predetermine sample size. Data collection was performed by independent investigators. Prior to data analysis, all experiments were randomized and analysed by independent blinded observers. Analysis of human and macaque transcriptomic data: Developing human and macaque brain RNA-seq data (counts file) with the…

Continue Reading Regulation of prefrontal patterning and connectivity by retinoic acid

Parsing transcript version in Ensembl mouse annotation

Hi all, I aligned some data to an Ensembl transcriptome with novel transcripts. I am trying to lift over the sites from transcriptome to genome, which I have previously done using the R package GenomicRanges. The Ensembl FASTA headers look like this…

Continue Reading Parsing transcript version in Ensembl mouse annotation

RSEM problem

Good day all! I’m running RSEM as a batch job on a cluster: rsem-prepare-reference --gtf /home/tbeckett/lustre/honours_project/RSEM/ref/Mus_musculus.GRCm39.103.chr.gtf --bowtie2 --bowtie2-path software/bowtie2/2.3.4 /home/tbeckett/lustre/honours_project/RSEM/ref/Mus_musculus.GRCm39.dna.toplevel.fa /home/tbeckett/lustre/honours_project/RSEM/ref/mouse_ref I received an error: Transcript ENSMUST00000132294 is out of chromosome 7’s boundary! Any idea what this error might be? Also, the second line when specifying bowtie,…

Continue Reading RSEM problem
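The boundary error quoted in the excerpt above says that a transcript ends beyond the chromosome it is annotated on. The cause in that post is not known from the excerpt. As a hedged sketch, the Python below shows one way to list GTF records whose end coordinate exceeds a known chromosome length, or whose chromosome is missing from the length table. The function name `out_of_bounds`, the toy sizes, and the toy records are all hypothetical; in practice the lengths could come from a FASTA index (`.fai`) built for the same genome file.

```python
import re


def out_of_bounds(gtf_lines, chrom_sizes):
    """Return (seqname, end, transcript_id) for GTF records that end past
    the chromosome length, or that name a chromosome absent from chrom_sizes."""
    problems = []
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        seqname, end = fields[0], int(fields[4])  # GTF end is 1-based, inclusive
        size = chrom_sizes.get(seqname)
        if size is None or end > size:
            match = re.search(r'transcript_id "([^"]+)"', fields[8])
            problems.append((seqname, end, match.group(1) if match else None))
    return problems


# Toy data: chromosome "7" is 1000 bp long here, so the second record overshoots.
sizes = {"7": 1000}
records = [
    '7\ttest\texon\t10\t900\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    '7\ttest\texon\t950\t1200\t.\t+\t.\tgene_id "g2"; transcript_id "t2";',
]
print(out_of_bounds(records, sizes))
```

A non-empty result only shows that the annotation and the sequence lengths disagree; which of the two files is at fault still has to be checked separately.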

How to filter nanopore transcriptome alignments to trust 3′ ends?

I have direct RNA data mapped to the GENCODE transcriptome with minimap2. Finding the ‘true’ transcript of origin for a read is nontrivial as there are many secondary alignments with very close alignment scores to the primary. After visualising…

Continue Reading How to filter nanopore transcriptome alignments to trust 3′ ends?

How To Convert Bed Format To Gtf?

Hello, I’ve seen a lot of posts that convert gtf to bed files. However, I have a bed file that I’m trying to convert to gtf. Is there any tool that can convert bed->gtf? Thanks…

Continue Reading How To Convert Bed Format To Gtf?
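For the BED to GTF question above, the main detail to get right is the coordinate system: BED starts are 0-based with an exclusive end, while GTF uses 1-based inclusive coordinates, so only the start shifts by one. The Python sketch below converts a single BED6 line under simplifying assumptions: each BED line becomes one `exon` record, the BED name is reused as both `gene_id` and `transcript_id`, and BED12 block columns are ignored. The function name `bed6_to_gtf` and the `bed2gtf` source label are made up for illustration and do not refer to an existing tool.

```python
def bed6_to_gtf(bed_line, source="bed2gtf", feature="exon"):
    """Convert one tab-separated BED6 line into one GTF line."""
    chrom, start, end, name, score, strand = bed_line.rstrip("\n").split("\t")[:6]
    # BED start is 0-based; GTF start is 1-based. The end value is unchanged
    # because BED's exclusive end equals GTF's inclusive end.
    gtf_start = int(start) + 1
    attributes = f'gene_id "{name}"; transcript_id "{name}";'
    return "\t".join(
        [chrom, source, feature, str(gtf_start), end, score, strand, ".", attributes]
    )


# Example with a made-up feature name.
print(bed6_to_gtf("chr1\t999\t2000\tfeatA\t0\t+"))
```

Counting tools that group exons by gene would need real gene and transcript identifiers, so this kind of one-line-per-feature conversion is only a starting point.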

align_and_estimate_abundance error Trinity

Hello, I am trying to prepare a reference for alignment and abundance estimation. I have taken the transcriptome fasta file. Do I need to use the genomic fasta file or a gtf file? I don’t understand this point. Please guide me. I am using this code: perl /cabinfs/opt/applications/trinity/trinityrnaseq-Trinity-v2.5.1/util/align_and_estimate_abundance.pl…

Continue Reading align_and_estimate_abundance error Trinity