Tag: gtf

Problem with DRAGEN RNAseq hashtable directory

Dear all, I recently wrote some code to run the DRAGEN RNA-seq pipeline. I use this command: /opt/edico/bin/dragen -f -l \ -r refdir \ -1 ${forward} \ -2 ${reverse} \ -a ${gtf} \ --output-dir output/${sample} \ --output-file-prefix ${sample} \ --RGID ${sample}_group_id \…


Trying to understand STAR fastqLog.final.out File

Hello, I am analyzing ribo-seq data and am trying to understand whether my interpretation of STAR’s log file is correct. I do not have extensive bioinformatics/computational experience, so it’s been a bit difficult trying to understand how to proceed (the guides online are…


How to setup the pipeline of the RNA-Seq FASTQ file processing (macOS version)

This guide explains how to prepare a Mac for importing RNA-Seq FASTQ files into Subio Platform. If you use a Windows 10 machine, please go to the guide for Windows 10. Subio Platform uses the following tools to process RNA-Seq FASTQ files. fastp to trim adapters and filter low-quality…


FeatureCounts Invalid Parameter Error

Hello! I’m trying to use featureCounts, and it keeps on giving me this error: ERROR: invalid parameter: ‘SRR11860547.bam’ I’m pretty new at using featureCounts, so I have no clue what is wrong. I’ve tried changing the directory and location of the file, but it keeps…


A high-resolution transcriptomic and spatial atlas of cell types in the whole mouse brain

Mouse breeding and husbandry All experimental procedures related to the use of mice were approved by the Institutional Animal Care and Use Committee of the AIBS, in accordance with NIH guidelines. Mice were housed in a room with temperature (21–22 °C) and humidity (40–51%) control within the vivarium of the AIBS…


Single-cell DNA methylome and 3D multi-omic atlas of the adult mouse brain

Mouse brain tissues All experimental procedures using live animals were approved by the Salk Institute Animal Care and Use Committee under protocol number 18-00006. Adult (P56) C57BL/6J male mice were purchased from the Jackson Laboratory at 7 weeks of age and maintained in the Salk animal barrier facility on 12-h dark–light…


how to merge human reference genome and GTF file with a custom sequence.

Hello Biostars, I am looking for some guidance on how to merge some files for my bulk RNA sequencing analysis. Let me start by describing the problem: I received an mRNA sequence of 4775 characters which I would like to merge with the human reference genome that I downloaded from NCBI…


Transcript Assembly for Multiple Species Using StringTie and Orthogroup Discovery using OrthoFinder

Hi all, I am running a workflow to identify single-copy orthogroups from RNAseq data including 9 species in a family of non-model organisms. All 9 species are closely related enough that they can be aligned to…


Convert NCBI Downloaded files to ANNOVAR format

I have been trying to understand from the ANNOVAR documentation and other sites the steps needed to make these files from NCBI available to ANNOVAR. I admit to being new to bioinformatics, but have been a software developer for 30+ years. My…


Read count vs Depth

Hi! I have short-read RNA-seq data for 112 dengue samples. I need to know what percentage of the transcriptome is covered by our sequencing reads. I found bedtools to be an appropriate tool for this. However, I am unable to understand two different outputs from this tool…


Very low successfully assigned alignments with feature counts

Hello everyone, I am stuck trying to analyze some single-end RNAseq data from human tissue. My issue is that the alignment with HISAT2 went very well: 94.95% overall alignment rate. However, when I use featureCounts, I get: 5.7% when I set the strandSpecific parameter to 1. 5.3% when I…


How To Install bedtools on Debian 11

In this tutorial we learn how to install bedtools on Debian 11. bedtools is a suite of utilities for comparing genomic features. What is bedtools? The BEDTools utilities allow one to address common genomics tasks such…


Annotation GTF/GFF Arabidopsis thaliana

Hello, this is my first time working with Arabidopsis and I am quantifying with featureCounts as follows: featureCounts -p --countReadPairs -t exon -g gene_id -a ../genome_arabidopsis/Arabidopsis_thaliana.TAIR10.57.gtf -o SRR14059988.txt ../alignment_hisat2/SRR14059988_sorted.bam However, in my counts I am getting counts associated with long non-coding, ribosomal, mitochondrial and…


An issue with gtf file (ballgownrsem)

Hi everyone, When I tried to run ballgownrsem I encountered an issue caused by an inappropriately structured GTF file. I also tried to run the code that is part of ballgownrsem but did not find the root of the issue. GTF files…


Bam files generated with STAR cause a segmentation fault core dump error when used with another tool

I am mapping RNA-Seq data using STAR, using multi-sample two-pass mapping. I first mapped all samples with one pass, then concatenated their SJOut files and filtered junctions. I launched the second mapping using this SJOut file. I used this command to generate the genome: ` /home/STAR-2.7.10b/bin/Linux_x86_64/STAR \ --runThreadN 10 \…


Fastest way to convert BED to GTF/GFF with gene_ids?

This is probably a duplicated question from: How To Convert Bed Format To Gtf? How to convert original BED file to a GTF ? Converting different annotation file formats (GTF/GFF/BED) to each other How to change scaffold.fasta file or scaffold.bed file to GTF file? Convert bed12 to GFF convert bed12…


Compute matrix skipping many regions stating not found in compute matrix output

Hello all, I am working on ChIP-seq data, and when I compute the matrix for generating the TSS plots it gives a very long list stating “skipping NR_049895_r4, due to being absent in the…


Viral genes not showing up in combined mouse+virus alignment

I created a combined MHV-A59 and mm10 FASTA and GTF file using the Linux cat command. The last two entries of the mm10 and the first two of the A59 in the combined GTF look like this: I then made a…


Generate Read counts from bam file

I am currently working on a project related to LHON disease (a rare mitochondrial disorder which leads to progressive visual loss). I have 9 RNA-seq FASTQ files, of which 3 are for carriers, 3 for affected and 3 for controls. Data downloaded is…


low rate of ‘Successfully assigned alignments’

Hello everybody. I’m a newbie in RNA-seq Analysis, and I have this situation that I don’t really understand. While working with featureCounts for RNA-seq read quantification, I came across an intriguing issue. The rate of successfully assigned alignments turned out to be unexpectedly low, totalling just 15463270 (7.6%). This was…


hg38 RepeatMasker v4.0.7 Dfam_2.0 question

Dear UCSC Genome Browser Support Team: I hope this message finds you well. I am writing to you as a postdoctoral researcher from Dr. Xianjun Dong’s Lab at Harvard Medical School. Firstly, I would like to extend my heartfelt gratitude for your significant efforts in developing the human…


issue in RNA -seq analysis

Hello all. I am working on RNA-seq analysis and would like to know the following things: first, I downloaded the genome FASTA file for non-coding RNA from Ensembl and got the GTF file for hg38 from there as well. I ran HISAT2 and got 17% alignment for…


Creating a Variant containing FASTA for proteomics search from VCF and genomic FASTA

Dear Biostar Community, I’m currently trying to generate a protein FASTA containing all known variants from HeLa (from Cosmic CellLinesProject) for variant detection in proteomics measurements. For this, I’ve downloaded the variants file (VCF) and the…


Salmon (or other pseudo-mappers) for multi-species RNAseq read filtering

Hello all, Background: I’ve inherited a new RNAseq data set and am thinking about updating my approaches (last time I did this I was using HISAT and Cuffdiff). I’d like some opinions on best strategies to disentangle/filter out parasite microbe reads from infected host reads before performing a differential gene…


A single pseudouridine on rRNA regulates ribosome structure and function in the mammalian parasite Trypanosoma brucei

Cell growth and transfections Procyclic form (PCF) T. brucei, strain 29-1354, which carries integrated genes for the T7 polymerase and the tetracycline repressor, was grown in SDM-79 medium supplemented with 10% fetal calf serum, in the presence of 50 μg/ml hygromycin. Cells were grown in the presence of 15 μg/ml G418 for…


Issues while running htseq-count

My data is from Candida glabrata, and when I use htseq-count, no reads are assigned to any gene_id. Thank you for your time and help. Foad htseq-count GSNO_SRR1582646.sam Candida_glabrata_genome.gtf > GSNO_SRR1582646.count 10975 GFF lines processed. 8843 alignment record pairs processed. head GSNO_SRR1582646.count gene-CAGL0A00165g 0 gene-CAGL0A00187g 0…


Transgenerational epigenetic effects imposed by neonicotinoid thiacloprid exposure

This study is aimed at revealing the transgenerational effects of thia. We chose the developmental window from embryonic days 6.5 to E15.5 because of its importance in germ cell program establishment. The mice breeding was described in the Materials and Methods section “Mouse treatment and dissection.” The design of the…


Error with HTseq RNAseq read count – rna-seq

Hi, I am getting an error while running HTSeq. This is the command and the error: htseq-count -q -f bam -s yes Ac1_mapped/ac1_mappedAligned.bam /global/home/users/catalinacastro/star/genome/genomic_v2.gtf count.txt Error occurred when processing GFF file (line 637338 of file /global/home/users/catalinacastro/star/genome/genomic_v2.gtf): not enough values to unpack (expected 9, got 1) [Exception type: ValueError, raised in __init__.py:221]…
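
The message "expected 9, got 1" suggests that the reported line does not split into the nine tab-separated GTF columns. A minimal Python sketch for locating such lines is shown below; the function name and the toy input are hypothetical, and it only checks the column count, not the validity of each field.

```python
def find_malformed_gtf_lines(lines):
    """Return (line_number, field_count) for lines lacking 9 tab-separated fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        # Header/comment lines start with '#' and are not data records.
        if stripped.startswith("#"):
            continue
        fields = stripped.split("\t")
        # Blank lines and space-separated lines both yield a single field.
        if len(fields) != 9:
            bad.append((number, len(fields)))
    return bad


# Toy input (hypothetical): one valid record, one space-separated record, one blank line.
example = [
    "#!genome-build example\n",
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1";\n',
    'chr1 src exon 1 100 . + . gene_id "g2";\n',
    "\n",
]
print(find_malformed_gtf_lines(example))
```

On a real file the same function can be fed an open file handle, for example `find_malformed_gtf_lines(open(path))`, where `path` points at the GTF named in the error.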


featureCount Error “No paired-end reads were detected in paired-end read library”

I created a combined mm39 and MHV-A59 (Viral) reference genome and aligned my paired end reads using STAR with the following input commands: STAR --runMode alignReads --runThreadN 16 --genomeDir /genomeDir --readFilesIn /FastqDir/1.fq.gz, /FastqDir/2.fq.gz --readFilesCommand gunzip -c --outReadsUnmapped Fastx --outSAMtype BAM SortedByCoordinate It seems that everything went fine. Here is a…


Error with HTseq RNAseq read count

Hi, I am getting an error while running HTSeq. This is the command and the error: htseq-count -q -f bam -s yes Ac1_mapped/ac1_mappedAligned.bam /global/home/users/catalinacastro/star/genome/genomic_v2.gtf count.txt Error occurred when processing GFF file (line 637338 of file /global/home/users/catalinacastro/star/genome/genomic_v2.gtf): not enough values to unpack (expected 9, got…


Should unique gene names/transcript IDs be used for ribosomal gene copies in a GTF/GFF file?

Hi, I have a GTF/GFF transcriptome that includes ribosomal sequences annotated from barrnap. I end up with ribosomal sequences that are present with the same gene IDs / transcript IDs at different sites and…


How to get the read counts from the BAM file for the regions specified in the BED file and get output as JSON?

Hi, I have two input files: a BAM file and a BED file. I want to get the read counts for the regions specified in the BED file as JSON. I have previously used featureCounts to get read counts, but there I used a GTF and an aligned BAM file…


GTF file for Rhinolophus sinicus

Hello, I am currently working on a project investigating tissue-specific alternative splicing in Rhinolophus sinicus, particularly focusing on cochlear adaptations during seasonal transitions. I am in need of a GTF (Gene Transfer Format) file for this species to aid in my RNA-sequencing analysis. Could…


Filter a BED file based on genome coordinates for gene names

Hi, I have a BED file with certain regions of interest that looks like this: chr1 0 91923 chr1 323234 4596845 … with the start and end coordinates for each gene for the respective chromosome. But I want to…


STAR GeneCounts for most genes are 0

Hello all, I am relatively new to this field. I am doing an RNA-seq alignment and expecting a gene count output for a prokaryote genome. I have an annotation GTF file in which I have converted the third column into “exon” for…


featureCounts error???

# get gtf # wget #https://ftp.ensembl.org/pub/release-110/gtf/mus_musculus/Mus_musculus.GRCm39.110.gtf.gz input.dir <- "/home/laudy/data/featurecounts/" setwd(input.dir) featureCounts -p -O -T -a /input.dir/Mus_musculus.GRCm39.110.gtf -o /input.dir/quants.txt /input.dir/PMN_CTRAligned.sortedByCoord.RD.RG.RC.out.bam Please can someone tell me what’s wrong? I tried all the options and it gives me the same error: Error: object ‘p’ not found or Error: unexpected symbol…


RNAseq how to map Mouse+Virus Genome with STAR

I have FASTA and GTF files from the mouse and virus genomes I would like to map to. From what I have read, it’s best to combine the FASTA and GTF files, but I’m not sure how this is accomplished. For…


PIGx ChIP-seq pipeline error

Hi Lisa, You also need to modify the gtf annotation file using: sed '/^#/d' annotation_file.gtf > annotation_file_no_header.gtf Best, Alex > On 12. Oct 2022, at 15:07, Bora Uyar <borauy…@gmail.com> wrote: > > You would need to check how your fasta headers look and how the chromosomes are represented in…
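
For readers who prefer to do the same header removal from a script, the sed command above (delete every line that starts with `#`) can be approximated in Python as follows. This is a minimal sketch; the function name is hypothetical and the file names in the usage note are the ones from the message.

```python
def drop_comment_lines(lines):
    """Keep only lines that do not start with '#', like sed '/^#/d'."""
    return [line for line in lines if not line.startswith("#")]


# Toy input (hypothetical) with two header lines and one record.
gtf_lines = [
    "#!genome-build example\n",
    "#!genome-version example\n",
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";\n',
]
print(len(drop_comment_lines(gtf_lines)))
```

To apply it to files, read `annotation_file.gtf` line by line, pass the lines through `drop_comment_lines`, and write the result to `annotation_file_no_header.gtf`.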


Comparative transcriptomics between species

Hey all, I am new to the transcriptomics world, so I have some questions. I am currently working on a study where the goal is to compare transcriptomes across 5 species. I mapped all RNA-seq reads to reference genomes (different for every species) using HISAT2, then…


Indexing the reference genome

nohup STAR --runMode genomeGenerate \ --genomeDir /Users/yasi/mockexp/genome/genome_index/ \ --genomeFastaFiles /Users/yasi/mockexp/genome/GCF_000001405.39_GRCh38.p13_genomic.fna\ --sjdbGTFfile /Users/yasi/mockexp/genome/genomic.gtf --sjdbOverhang 80 > star_genome_generate.log 2>&1 What am I missing or doing wrong while trying to index the reference genome with STAR?


zero counts for all genes in RNAseq data of Ferret

I have bulk RNAseq data from ferret and am trying to get counts per gene. To do so, I used hisat2 and got the genome from here: hgdownload.soe.ucsc.edu/goldenPath/musFur1/bigZips/musFur1.2bit After aligning the fastq files I used htseq and the following command:…


all(rownames(cts) %in% txdf$TXNAME) is FALSE in DTU Analysis in R

Good afternoon, I am trying to do a DTU analysis for my research, but I am kinda new to this stuff and I have some problems. In particular on point 5). I am following the workflow of Bioconductor vignette rnaseqDTU and my pipeline is this: 1) read salmon quants ##…


HT-Seq and count matrix

Hello, I am trying to count reads using HT-Seq. In my gtf file, there are 60603 genes. When I use the gene_id option I get all the genes in the count file with reads, but when I use the gene_name option I get only 59055…
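
One possible explanation for the smaller number, not confirmed by the post, is that several gene_id values share the same gene_name, so counting by name collapses them. The Python sketch below lists such names from the `gene` records of a GTF; the function names and the toy records are hypothetical, and the attribute parsing assumes the usual `key "value";` layout in column 9.

```python
import re

# Matches GTF attributes of the form: key "value"
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(column9):
    """Turn the ninth GTF column into a dict of attribute name to value."""
    return dict(ATTRIBUTE.findall(column9))


def names_shared_by_several_ids(gtf_lines):
    """Map each gene_name used by more than one gene_id to its sorted IDs."""
    ids_by_name = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue
        attrs = parse_attributes(fields[8])
        gene_id, gene_name = attrs.get("gene_id"), attrs.get("gene_name")
        if gene_id is None or gene_name is None:
            continue
        ids_by_name.setdefault(gene_name, set()).add(gene_id)
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}


# Toy records (hypothetical IDs and names).
records = [
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_name "ABC";\n',
    'chrX\tsrc\tgene\t5\t90\t.\t-\t.\tgene_id "G2"; gene_name "ABC";\n',
    'chr2\tsrc\tgene\t1\t50\t.\t+\t.\tgene_id "G3"; gene_name "XYZ";\n',
]
print(names_shared_by_several_ids(records))
```

If the returned dictionary is non-empty for the real annotation, the difference between the two counts would be consistent with duplicated names rather than lost reads.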


An extremely fast Non-Overlapping Exon Length calculator written in Rust

Hi all! Introducing the Non-Overlapping Exon Length calculator (NOEL), an extremely fast GTF/GFF per gene exon length extractor written in Rust. See the code and latest updates here: github/alejandrogzi/noel In case you do not want to read the whole text: NOEL outperforms all open-sourced scripts/tools for this task. It can…
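
For context, the quantity being computed (per-gene exon length with overlapping exons counted once) can be illustrated with a simple interval merge. The Python sketch below is not NOEL's implementation, which is written in Rust; it is a hypothetical helper that assumes 1-based inclusive exon coordinates as used in GTF.

```python
def non_overlapping_length(intervals):
    """Total bases covered by 1-based inclusive (start, end) intervals.

    Overlapping or directly adjacent intervals are merged, so every base
    is counted once.
    """
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end + 1:
            # A gap: close the previous merged block and start a new one.
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlap or adjacency: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total


# Exons from two hypothetical transcripts of one gene.
exons = [(1, 100), (50, 150), (200, 250)]
print(non_overlapping_length(exons))  # 150 bases (1-150) + 51 bases (200-250) = 201
```

Grouping exon records by gene_id and calling this function per group would give one length per gene.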


Mouse genome rewriting and tailoring of three important disease loci

BAC plasmids Human (CH17-203N23, CH17-449P15 and CH17-339H2) and mouse (RP23-51O13, RP23-75P20 and RP23-204E8) BACs were purchased from BACPAC Resources Center. Yeast–bacterium shuttle vector pLM1050 was modified by L. Mitchell based on a previous study [28]. pWZ699 was constructed by inserting a cassette containing pPGK-ΔTK-SV40pA transcription unit and the Actb gene into…


couldn’t find matching transcriptome, returning non-ranged SummarizedExperiment AND unable to find an inherited method for function ‘seqinfo’ for signature ‘"SummarizedExperiment"’

Dear Michael, I have not been able to run tximeta properly. I have read #38 but could not get any clue. The quant.sf files were generated by the latest nf-core RNA-seq pipeline (3.12.0), as the pipeline did not save the Salmon index, I generated it myself. Salmon used by nf-core…


“IndexError: list index out of range” error while running BETA (Cistrome)

Dear all, I am getting the error “IndexError: list index out of range” while running BETA basic on a CentOS server. Please find the error and input files screenshot attached with this message. Command used: BETA basic -p hcleaf.27me3.final_peaks_macs3_5col.bed…


Ensembl transcript IDs

Hi everyone, From the GENCODE GTF file, I noticed that there are multiple Ensembl transcript IDs for one gene ID, and one Ensembl transcript ID may have different versions (different values after the decimal point). There are different transcript isoforms of one gene (due to alternative…
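
When comparing IDs across files or releases, it is common to separate the stable identifier from the version suffix after the dot. A minimal Python sketch is given below; the function name is hypothetical, the IDs are used only as illustrations, and the sketch assumes the version is whatever follows the first dot.

```python
def split_versioned_id(identifier):
    """Split 'STABLE.VERSION' into (stable_id, version); version is None if absent."""
    stable, _, version = identifier.partition(".")
    return stable, version or None


print(split_versioned_id("ENST00000456328.2"))
print(split_versioned_id("ENSG00000223972"))
```

Matching on the stable part alone treats all versions of a transcript as the same entry, which may or may not be what a given analysis wants.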


Fisher exact test on different gene sets

Hey everyone! I am working on a project exploring the role of one specific transcription factor in acquired chemoresistance. I have Illumina transcriptome data, which has been processed with DESeq2 to find differentially expressed genes, and the resulting table containing only…


Converting STAR Gene-level alignment to TPM expression

Hi, I have recently performed gene-level alignment with STAR on 20 samples with the parameters --quantMode GeneCounts and --outSAMtype BAM SortedByCoordinate. I have the output files ReadsPerGene.out.tab and Aligned.sortedByCoord.out.bam. From this, how can I generate reliable TPM values with either the sorted…
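
As background for this question, TPM for gene i is (count_i / length_i) divided by the sum of count_j / length_j over all genes, multiplied by one million. The Python sketch below applies that formula; it assumes per-gene lengths in base pairs are available from elsewhere (for example from the annotation), since ReadsPerGene.out.tab holds counts only, and it does not use the effective-length corrections that tools such as RSEM or Salmon apply. The function and gene names are hypothetical.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw gene counts to TPM using gene lengths in base pairs."""
    # Reads per base for each gene.
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Scale so that the values sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Toy values (hypothetical): both genes have the same reads-per-base rate.
tpm = counts_to_tpm({"geneA": 10, "geneB": 20}, {"geneA": 1000, "geneB": 2000})
print(tpm)
```

The result is only as good as the lengths supplied, so values from this kind of calculation are best treated as approximate.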


Queries on Tophat and Cufflinks Data

Hello, A previous student sent some RNA extractions to a company for sequencing and analysis, and the files are confusing me, so I was wondering if anyone could help. The company sent the raw RNA-seq data, Tophat analysis and Cufflinks analysis for 3…


Different “Reads Mapped Confidently to Transcriptome” values in scRNA

Hello everyone. My question is about single-cell RNA-seq. I am re-analyzing raw scRNA data in my lab, which was previously analyzed by a sequencing company. I am trying to replicate the results and update/optimize my pipeline. Currently my pipeline is as follows: 1. Creating custom reference I indexed the reference…


Proper HTSeq usage on bacterial genome. Don’t quite understand –t

Hi everyone, I’m trying to run HTSeq on a group of BAM files generated by aligning Illumina RNA-seq reads to a reference genome. The reference genome is the sequence with highest quality available and was…


file conversion from gtf to gff3 for evidence modeler

Hi, could you please guide me on how to convert the stringtie output file stringtie_transcript.gtf into .gff3 format for the evidence modeler of genome annotation?


DESeq2 Results Annotation

Hi, I am getting NA values instead of gene names in my Feature column after I annotate my DESeq2 results in Galaxy using the mm39 version of the genome and GTF file for the mouse genome. Help if you have an idea.


Widespread occurrence of chitinase-encoding genes suggests the Endozoicomonadaceae family as a key player in chitin processing in the marine benthos

Genes coding for endo-chitinases (EC 3.2.1.14), the enzymes that cleave chitin polymers into oligomers [16], were found on 32 of 42 Endozoicomonadaceae genomes (Fig. 1 and Table S1), including representatives of all formally described genera and all Candidatus Gorgonimonas MAGs. Several genomes harbored more than one endo-chitinase encoding gene, resulting in…


What should I do with STAR two-pass novel splice junctions?

Hi, I have a few relatively naive questions about things I don’t fully understand. I know that the STAR two-pass mode can detect novel splice junctions on top of the annotations from GTF/GFF3 files. Let’s say I run a…


ROSE Algorithm: index out of range

Hi again, I am trying to run the ROSE algorithm created by the Young lab, url here: younglab.wi.mit.edu/super_enhancer_code.html Specifically, I am running the ROSE_main.py script: younglab.wi.mit.edu/super_enhancer_code.html I created a Python 2.7 environment to run the script as it is compatible with Python 2.7. When I run the script in Ubuntu:…


Converting GFF to GTF

Hello, I am having trouble converting my gff file to a gtf. I have tried using gffread, gffcompare, and rtracklayer, all of which have left me with the same or no output. Here are my files. Please help!


How to get RPKM from count matrix

Hi Biostars, I have a count matrix with mouse gene names and need to get RPKM. I know it is not a good metric, but biologists are used to it. gtf <- readGFF("/reference_genome/mm39.ncbiRefSeq.gtf") gtf_exon <- gtf[gtf$type == "exon", ] width <- gtf_exon$end -…


Indexing human chromosome assembly of GRCh38.p14 using STAR

I want to index the genome assembly “GRCh38.p14” before aligning my reads. However, one parameter that STAR needs is the overhang length, --sjdbOverhang ReadLength-1. I only have the chromosome assembly and the gtf file; how should I find out what is…
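
The overhang value depends on the reads rather than on the assembly or the GTF: the STAR manual describes the ideal --sjdbOverhang as the maximum read length minus one. A small Python sketch for deriving that number from FASTQ records is shown below; the function name and the toy reads are hypothetical, and it assumes the standard four-line FASTQ layout.

```python
def sjdb_overhang(fastq_lines):
    """Return max(read length) - 1 from four-line FASTQ records."""
    # In four-line FASTQ, the sequence is the second line of each record.
    lengths = [
        len(line.strip())
        for index, line in enumerate(fastq_lines)
        if index % 4 == 1
    ]
    if not lengths:
        raise ValueError("no reads found")
    return max(lengths) - 1


# Two toy reads (hypothetical) of length 5 and 8.
fastq = [
    "@read1\n", "ACGTA\n", "+\n", "IIIII\n",
    "@read2\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
]
print(sjdb_overhang(fastq))  # 8 - 1 = 7
```

Scanning the first few thousand reads of a real file is usually enough when all reads come from the same sequencing run.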


bedtools: coordinates of read covered

Hi, bedtools is quite impressive. I could generate data with 13 columns when giving it my gtf and sampl.bam. I do get the coverage breadth, depth of reads, and coordinates of the gene/other feature against which the coverage was calculated… though I also wanted to…


Discrepancy in Alignment Rates: HISAT2 vs FeatureCounts

Hi everyone, I hope you’re doing well. I’ve been encountering a puzzling issue in my RNA-seq analysis pipeline and was hoping to get some insights from this knowledgeable community. I’m currently working on an RNA-seq project, where I’ve aligned my trimmed reads to the mouse reference genome (GRCm39) using HISAT2…


Stringtie coverage calculation for DE analysis

Hello Biostars, I have a question about stringtie and how it assigns reads to a specific transcript isoform. I wanted to perform a Differential Expression analysis using a genome assembly without a gene annotation, as a reference. To do this I have aligned…


AWS STAR Genome Index Error

Hello, I have been trying to run this line of code for the longest time: STAR --runThreadN 20 --runMode genomeGenerate --genomeDir genomeDir/ --genomeFastaFiles Homo_sapiens.GRCh38.dna.toplevel.fa --sjdbGTFfile Homo_sapiens.GRCh38.110.chr.gtf I first tried running it on my home terminal but then realized that it would take several…


Help with error velocyto

Hi Biostars, I am trying to get the loom file, and after several attempts I still don’t know how to fix this error: velocyto run -b MT/outs/filtered_feature_bc_matrix/barcodes.tsv -o output_dir -m Hg38_rmsk.gtf MT/outs/possorted_genome_bam.bam /cellranger/reference/refdata-cellranger-GRCh38 -3.0.0/genes/genes.gtf --samtools-memory 8000 --samtools-threads 8 MemoryError: bam file #0 could not be sorted by…


Converting from BED to SAF/GFF

I believe that the SAF format uses 1-based coordinates that are closed on both ends. Here is how I reached this conclusion. First make some toy data. $ cat genome.fa >chr1 AATTCCGGAAAATTTTCCCCGGGGAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCC $ cat reads.fa >q1 AAAATTTTCCCCGGGGAAAAAAAAAAAAAAAAAACC Map reads to the genome: $ STAR --runMode genomeGenerate --genomeDir test_star --genomeFastaFiles genome.fa --genomeSAindexNbases…
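
Taking the post's conclusion as given (SAF is 1-based and closed, while BED is 0-based and half-open), a conversion only needs to add one to the BED start and keep the end unchanged. The Python sketch below writes the five SAF columns GeneID, Chr, Start, End and Strand; the function name, the fallback region names and the default strand for BED files without a strand column are assumptions made for illustration.

```python
def bed_to_saf(bed_lines, default_strand="+"):
    """Convert tab-separated BED lines to SAF lines (with a header row)."""
    saf = ["GeneID\tChr\tStart\tEnd\tStrand"]
    for index, line in enumerate(bed_lines, start=1):
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Assumption: use the BED name column if present, else a made-up label.
        name = fields[3] if len(fields) > 3 else f"region_{index}"
        # Assumption: fall back to default_strand when BED has no strand column.
        strand = fields[5] if len(fields) > 5 else default_strand
        # 0-based half-open -> 1-based closed: shift the start only.
        saf.append(f"{name}\t{chrom}\t{start + 1}\t{end}\t{strand}")
    return saf


print("\n".join(bed_to_saf(["chr1\t0\t100\tgeneA\t0\t-\n", "chr1\t200\t300\n"])))
```

The interval length is preserved by the conversion: a BED interval 0 to 100 and a SAF interval 1 to 100 both span 100 bases.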


STAR Genome index Error

I tried to run the STAR command for RNAseq but I got the following error. /home/pshekar/RNAseq/STAR-2.7.11a/source/STAR --runMode genomeGenerate \ --genomeDir GRCh38.79.chrom1 \ --genomeFastaFiles genome/Homo_sapiens.GRCh38.dna.chromosome.1.fa \ --sjdbGTFfile gtf/Homo_sapiens.GRCh38.79.chrom1.gtf \ /home/pshekar/RNAseq/STAR-2.7.11a/source/STAR --runMode genomeGenerate --genomeDir GRCh38.79.chrom1 --genomeFastaFiles genome/Homo_sapiens.GRCh38.dna.chromosome.1.fa --sjdbGTFfile gtf/Homo_sapiens.GRCh38.79.chrom1.gtf --sjdbOverhang 62 --sjdbOverhang 62 *!!!!! WARNING: --genomeSAindexNbases 14 is…


Issues with featureCounts

Dear community, I have been struggling to find the problem for the past few days. I am working on a differential expression analysis (DEA) with direct RNA nanopore long reads from the MinION platform. I ran into this wall and I would really appreciate your help! I sequenced the RNA…


Filter, Plot, and Explore with Seurat in RStudio

First things first: we need to load the packages we will be using. In order to use any functions of a package, we must first call the library of that package. In your console (likely in the lower left corner of your RStudio window), run the following lines of code…


How to assign gene names after kallisto when I add GFP?

Hello, I would like to generate a new reference for kallisto where I add GFP. I found this link: github.com/igordot/genomics/blob/master/workflows/ref-genome-gfp.md and it seems pretty straightforward to add the GFP for the alignment. However, I am not sure how…

Continue Reading How to assign gene names after kallisto when I add GFP?

rna-seq analysis with Salmon – how to Import and summarize using tximport

Hi! I’m trying to do RNA-seq analysis using Salmon and would like to have a matrix of read counts for 10 RNA FASTQ files. I installed Salmon with Bioconda; however, I can only find version 0.8.1, even after 'conda update salmon'. So I have been working with version 0.8.1…

Continue Reading rna-seq analysis with Salmon – how to Import and summarize using tximport

Help with error running velocyto

Hi Biostars, I am trying to get the loom file for velocity analysis: velocyto run10x -m Hg38_rmsk.gtf WT/outs /cellranger/reference/refdata-cellranger-GRCh38-3.0.0/genes/genes.gtf Inside WT/outs I have some folders such as filtered_feature_bc_matrix. Inside filtered_feature_bc_matrix, I have barcodes.tsv.gz ERROR - This is an older version of cellranger, cannot check…

Continue Reading Help with error running velocyto

TPM from STAR output without re-allign the file using RSEM or Salmon

Hi, I want to get the TPM files from aligned files generated with STAR, and from reading I found out that the easiest way is to use RSEM or Salmon. My code for the alignment is /Users/c/STAR/bin/MacOSX_x86_64/STAR runThreadN 4 --genomeDir /Users/c/Desktop/Human_genome_index --readFilesIn /Users/c/Desktop/test/C1D20_R1_001_paired.fastq /Users/c/Desktop/test/C1D20_R2_001_paired.fastq --quantMode TranscriptomeSAM GeneCounts --outFileNamePrefix C1D20 --outSAMtype BAM SortedByCoordinate…

Continue Reading TPM from STAR output without re-allign the file using RSEM or Salmon

Getting the overlap between two GTF files

Hello, I have two GTF files which contain transcript information, and I want to get the overlap of transcripts between the two GTF files. Can anyone give me some advice? Thanks! bedtools intersect \…

Continue Reading Getting the overlap between two GTF files

rMats Run Does Not Generate More Than One Output Row Per File

Using Singularity, I pulled the Docker image from the mcfonsecalab/rmats repository. Then I tried to run it with the following script on zebrafish, the model organism in my case (paths are removed due to the rules of…

Continue Reading rMats Run Does Not Generate More Than One Output Row Per File

AMD Unveils Purpose-Built, FPGA-Based Accelerator for Ultra-Low Latency Electronic Trading

— New AMD Alveo fintech accelerator card provides trading firms and brokerages with breakthrough trade execution performance at nanosecond speed and AI-enabled trading strategies — — Solution partners Alpha Data, Exegy and Hypertec add to growing ecosystem of ultra-low latency solutions for fintech market — LONDON, Sept. 29, 2023 /PRNewswire/…

Continue Reading AMD Unveils Purpose-Built, FPGA-Based Accelerator for Ultra-Low Latency Electronic Trading

Issues with featureCounts (No

Dear community, I have been struggling to find the problem for the past few days now. I am working on a differential expression analysis (DEA) with direct RNA nanopore long reads from the MinION platform. I ran into this wall and I would really appreciate your help! I sequenced the RNA…

Continue Reading Issues with featureCounts (No

How to get the gft file to run velocyto for velocity analysis?

You need the same GTF file that was used during mapping of your 10x data. If you used CellRanger and it was mouse (given you used recent CellRanger) you can find it in the CellRanger folder under…

Continue Reading How to get the gft file to run velocyto for velocity analysis?

Splitting VCF/BCF file into seperate gene files

I have a multi-sample BCF file which I would like to split into smaller files per gene so I can use them for some downstream eQTL analysis. I’ve started a bash script which pipes bcftools query -f '%SAMPLE\t%POS\t%REF\t%ALT\t%GT\n' into an awk script…

Continue Reading Splitting VCF/BCF file into seperate gene files

Dataset’s name in BioMart for S. pombe

Can anybody help me to find the dataset for S. pombe on BioMart? And also some help on how to use makeTranscriptDbFromBiomart to create TranscriptDB? Cheers. Looks like you figured out another way of getting what…

Continue Reading Dataset’s name in BioMart for S. pombe

cannot open file 2 for reading From Cufflinks Version 2.2.1 When Attempting To Use CuffDiff

I am trying to use CuffDiff (from Cufflinks version 2.2.1) on 3 controls and 3 experimental samples. The variables used and the main section of code in a bash script are below: The…

Continue Reading cannot open file 2 for reading From Cufflinks Version 2.2.1 When Attempting To Use CuffDiff

capTEs enables locus-specific dissection of transcriptional outputs from reference and nonreference transposable elements

Cell culture. All cell lines were grown in 6 cm dishes at 37 °C in a 5% CO2 incubator. The K562, MDA-MB-231 and HCT 116 cell lines were cultured in high-glucose DMEM supplemented with 10% fetal bovine serum and 1% penicillin-streptomycin antibiotics (pen-strep). NCM460 cells were cultured in RPMI 1640 medium supplemented…

Continue Reading capTEs enables locus-specific dissection of transcriptional outputs from reference and nonreference transposable elements

Genes with promoter and enhancer regions as GTF

Okay, first things first. "Please I am in hurry and need help": slow down and think again about what you are doing. "I used MACS": used MACS for what? (BTW, convert SAM to BAM and save some space. Just a suggestion.) I’m sure you wanted to find out ChIP enriched regions,…

Continue Reading Genes with promoter and enhancer regions as GTF

STAR Intron Motif Script Gives Segmentation fault Error

I have the following inputs: # Define input directory containing FASTQ files Input_directory="/path/to/fastq/folder" # Define output directory for STAR output files Output_directory="/path/to/output/directory" # Define paths to reference files Annotation_GTF="/path/to/Zebra/fish/GRCz11.110.chr.gtf" Genome_FASTA="/path/to/soft/masked/Zebra/fish/primary_assembly.fa" Reference="/path/to/soft/masked/STAR/created/reference/only/for/use/with/STAR" # Define the number of threads to use num_threads=4 To…

Continue Reading STAR Intron Motif Script Gives Segmentation fault Error

Finding sequences in unannotated genomes using reference coordinates

Hey Stars! I have a really confounding issue at hand. I am working on extracting upstream regions of genes from 100 different genomes of A. thaliana. The problem being, I have one reference genome for TAIR10 version (which has an annotated…

Continue Reading Finding sequences in unannotated genomes using reference coordinates

STAR index not working

Hi, I am trying to build the index for the STAR alignment and it basically doesn’t work, as it does not progress at all. I have an M1 Mac and I have enough memory; is my computer the problem? Yesterday I tried my code on…

Continue Reading STAR index not working

gffread outputs empty gtf file

Hi, I’ve been trying to convert my Prokka output from GFF format to GTF format so that I can use it for my HISAT-StringTie analysis. However, using gffread for the conversion yields an empty GTF file. I’m not sure if I’m going wrong somewhere. Any help…

Continue Reading gffread outputs empty gtf file

Allele specific binding of histone modifications and a transcription factor does not predict allele specific expression in correlated ChIP-seq peak-exon pairs

ChIP-seq and RNA-seq. Tissue sampling and RNA-sequencing for three Holstein dairy cows and two of their foetuses (one male and one female with a shared sire) are described in refs. 17 and 18. ChIP-sequencing for all tissues was as described in ref. 16, with the inclusion of more tissues. Whole genome sequence for each animal…

Continue Reading Allele specific binding of histone modifications and a transcription factor does not predict allele specific expression in correlated ChIP-seq peak-exon pairs

Gene-based differential expression analysis of genetically modified mouse line

Hello, I am trying to analyze my bulk RNAseq data set from hippocampal tissue extracted from our WT/KO mice. The knockout consists of a 10kb deletion in a single exon of our gene of interest. I want to look at…

Continue Reading Gene-based differential expression analysis of genetically modified mouse line

GTF annotation file for Hg38 Dec 2013 (First Release)

Hello, I am looking for the GTF file corresponding to the initial Dec. 2013 hg38 release. I have tried looking in several places, including here: hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/ but the directory does not contain the file (it does contain the original .fa file, though). I…

Continue Reading GTF annotation file for Hg38 Dec 2013 (First Release)

The genomic footprint of whaling and isolation in fin whale populations

Samples and sequencing. Tissue samples from 50 fin whales (Balaenoptera physalus) were collected using a standard protocol to obtain skin biopsies from free-ranging cetacean species, which uses a small stainless-steel biopsy dart deployed from a crossbow or rifle (refs. 73, 74). These samples were collected throughout the Eastern North Pacific (ENP; N = 30, represented…

Continue Reading The genomic footprint of whaling and isolation in fin whale populations

Ensembl Release 104 and newer GTF files no longer have genes sorted by position

Following up on my previous post, I dug deeper and want to more precisely describe my “problem”. Up until and including Ensembl Release 103, the GTF files provided had all the gene entries in strictly sorted order (with all the transcript, exon, etc. entries pertaining to a gene entry listed…

Continue Reading Ensembl Release 104 and newer GTF files no longer have genes sorted by position

GTF files from Ensembl Releases 105 and 106 unsorted

There is nothing wrong with these files. Sort (as any GTF): zcat Homo_sapiens.GRCh38.105.gtf.gz \ | awk '$1 ~ /^#/ {print $0;next} {print $0 | "sort -k1,1 -k4,4n -k5,5n"}' \ | bgzip > Homo_sapiens.GRCh38.105_sorted.gtf.gz That said, if you need the file to be strictly coordinate-sorted then you always have to do…

Continue Reading GTF files from Ensembl Releases 105 and 106 unsorted
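
The shell one-liner in the excerpt above can be mirrored in Python. This is only a sketch under stated assumptions: sort_gtf_lines is a hypothetical helper name, header lines are taken to be those starting with "#", and records are ordered by sequence name (plain string comparison), then start, then end. That mirrors sort -k1,1 -k4,4n -k5,5n, but ties and locale-specific collation may be handled differently than by sort(1).

```python
def sort_gtf_lines(lines):
    """Return GTF lines with '#' header lines first, then sorted records.

    Records are ordered by column 1 (sequence name, as a string),
    column 4 (start, numeric) and column 5 (end, numeric).
    """
    header = [line for line in lines if line.startswith("#")]
    # Skip blank lines; everything else is treated as a 9-column GTF record.
    records = [line for line in lines if line.strip() and not line.startswith("#")]

    def sort_key(line):
        fields = line.split("\t")
        return (fields[0], int(fields[3]), int(fields[4]))

    return header + sorted(records, key=sort_key)


if __name__ == "__main__":
    # Toy records; the attribute values are made up for illustration.
    example = [
        "#!genome-build GRCh38\n",
        '2\tsrc\tgene\t500\t900\t.\t+\t.\tgene_id "B";\n',
        '1\tsrc\tgene\t2000\t3000\t.\t-\t.\tgene_id "C";\n',
        '1\tsrc\tgene\t100\t400\t.\t+\t.\tgene_id "A";\n',
    ]
    for line in sort_gtf_lines(example):
        print(line, end="")
```

Like the shell version, this orders every feature line independently, so it does not keep a gene's transcript and exon lines grouped under their parent gene.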

convert bed12 to sorted gtf

Hello, I’m trying to convert BED12 to sorted GTF, but the output file 'Precapture_uniq.gff' is empty. I’m very new to this work, and I would appreciate it if you could help me solve this. awk -f bed12togff Postcapture_uniq_chr.bed12 | sort -k1,1 -k4,4n -k5,5n "$@"…

Continue Reading convert bed12 to sorted gtf

Is there a tool that sorts gtf files?

gff3sort.pl seems to make sure that lines having no "Parent=" attribute come before those having it, if chrom and start position are the same. I think with standard Unix programs it should go like this: $ (grep -v "Parent=" sortme.gtf;grep "Parent=" sortme.gtf)| sort -k1,1 -k4,4n -s EDIT: Shouldn’t we have to…

Continue Reading Is there a tool that sorts gtf files?

TAPIS installation and usage

Download gmap and install: cd ~/software wget research-pub.gene.com/gmap/src/gmap-gsnap-2023-07-20.tar.gz tar xvzf gmap-gsnap-2023-07-20.tar.gz cd ~/software/gmap-gsnap-2023-07-20 ./configure --prefix=$HOME/jcbu/software/gmap make -j 20 make check make install Build gmap index: cd /home/jcbu/refGenome/gencode/mouse/GRCm38.p6_releaseM20/Sequence/gmapIndex nohup /home/jcbu/jcbu/software/gmap/bin/gmap_build -d GRCm38.gmap -D /home/jcbu/refGenome/gencode/mouse/GRCm38.p6_releaseM20/Sequence/gmapIndex \ /home/jcbu/refGenome/gencode/mouse/GRCm38.p6_releaseM20/Sequence/WholeGenomeFasta/GRCm38.p6.genome.fa & 3. alignPacBio.py: cd /home/jcbu/YJ/PAIsoseq/ES/TAPIS/gmap/ nohup /usr/bin/python2 /home/jcbu/YJ/PAIsoseq/comp_bio-tapis-44cc05ebc78c/scripts/alignPacBio.m.py \ -p 20 -o Gm_Ctrl_rep1.alignPacBio \ /home/jcbu/refGenome/gencode/mouse/GRCm38.p6_releaseM20/Sequence/gmapIndex/GRCm38.gmap \…

Continue Reading TAPIS installation and usage

Searching a tool to modify annotation files.

In two different projects I need to modify annotation files. For instance I need to split a gene into two independent ones following evidence that they are separate transcriptional units. I also need to create a new alternative isoform of a gene…

Continue Reading Searching a tool to modify annotation files.

Genome-wide DNA methylation patterns in bumble bee (Bombus vosnesenskii) populations from spatial-environmental range extremes

Orr, H. A. The genetic theory of adaptation: A brief history. Nat. Rev. Genet. 6, 119–127 (2005). Dillon, M. E. & Lozier, J. D. Adaptation to the abiotic environment in insects: the influence of variability on ecophysiology and evolutionary genomics. Curr. Opin. Insect Sci. 36,…

Continue Reading Genome-wide DNA methylation patterns in bumble bee (Bombus vosnesenskii) populations from spatial-environmental range extremes

How many ‘novel’ splice junctions/splice events are resonably expected from human RNA,

Hello all, I was just wondering what a reasonable percentage of ‘novel’ splice junctions/splice events is for human RNAseq data using the program junction_annotation.py. I am new to RNAseq and just running some published human RNAseq data through my pipeline in order to familiarize myself with the programs and protocols….

Continue Reading How many ‘novel’ splice junctions/splice events are resonably expected from human RNA,