Tag: GFF

genbank sequence format

This document is an overview of the Entrez databases, with general information on … If you are not sure that the “Save” option in your program will do this for you, use “Save As”. In Excel, select “Save As” from the File menu. … optimizations to reduce memory…

Bug#1024835: python-pauvre: ftbfs with biopython 1.80

Source: python-pauvre
Version: 0.2.3-1
Severity: important
Tags: ftbfs

Dear Maintainer, python-pauvre fails its build-time test suite when built against biopython 1.80 in experimental, but builds successfully against biopython 1.79. The relevant part of the log in case of failure shows:

I: pybuild base:240: cd /<<PKGBUILDDIR>>/.pybuild/cpython3_3.10_pauvre/build; python3.10 -m unittest discover -v
test_normal_plotting_scenario (pauvre.tests.test_synplot.libSeq_test_case) This…

Bedtools Bam To Bed With Code Examples

In this article, we’ll look at some examples of how to address the Bedtools Bam To Bed problem. The usage line is: bedtools bamtobed [OPTIONS] -i <BAM> As we have seen, a large number of examples were utilised in order to solve the Bedtools Bam To…

Freebayes-parallel with large bam file – individual threads running for >6 days

Context: I’m trying to call variants on a sequencing project using pooled genotyping-by-sequencing. Pools consist of 94 samples each, alongside a number of individuals. Sequence data was demultiplexed and then aligned to a reference genome using hisat2, and the resultant bams were merged with samtools merge. The problem bam is…

Python pandas transforming int to float in gff subsetting

Hey guys, I’ve written this Python code:

import pandas as pd
from Bio import SeqIO
import argparse

parser = argparse.ArgumentParser(add_help=False)
parser.add_argument("-h", "--help", action="help", default=argparse.SUPPRESS, help="Get partial gff given a pattern on Names field")
parser.add_argument("-g", help="-g: gff file", required=True)
parser.add_argument("-l", help="-l: list of patterns to search on…
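The full script is not shown, so the cause of the int-to-float change can only be guessed at. One common reason, offered here as an assumption, is that pandas upcasts an integer column to float64 as soon as it contains a missing value. The sketch below reads a GFF file with the nullable Int64 dtype so that start and end stay integers; the column names, function names and sample rows are illustrative and not taken from the original post.

```python
import io

import pandas as pd

# The nine standard GFF columns; these labels are illustrative, not stored in the file.
GFF_COLUMNS = ["seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes"]


def read_gff(handle):
    """Read a GFF file, keeping start/end as integers even when values are missing."""
    return pd.read_csv(
        handle,
        sep="\t",
        comment="#",      # skip directive/comment lines (also cuts any '#' inside a field)
        header=None,
        names=GFF_COLUMNS,
        dtype={"start": "Int64", "end": "Int64"},  # nullable integer dtype
    )


def subset_by_pattern(gff, pattern):
    """Return the rows whose attributes column contains the literal pattern."""
    mask = gff["attributes"].str.contains(pattern, regex=False, na=False)
    return gff[mask]


# Hypothetical data: the second row has an empty end coordinate.
example = (
    "##gff-version 3\n"
    "chr1\tsrc\tgene\t100\t500\t.\t+\t.\tID=gene1;Name=abc\n"
    "chr1\tsrc\tgene\t700\t\t.\t-\t.\tID=gene2;Name=xyz\n"
)
gff = read_gff(io.StringIO(example))
subset = subset_by_pattern(gff, "Name=abc")
```

Writing the subset back with subset.to_csv(path, sep="\t", header=False, index=False) should then emit the coordinates without a trailing ".0", provided the columns kept the Int64 dtype.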

can gff2 reference used in htseq-count?

Dear all, we have recently been working with an E. coli plasmid and tried to summarize the gene counts from our RNA-Seq samples. The short reads were mapped to the E. coli plasmid using TopHat, which generated BAM files accordingly. However, we were unable to obtain a GFF3 version of our target plasmid genome, the…

Running synteny of 2 strain.

I have 2 strains of the same species, and I have genomic island regions in an Excel sheet. Now I want to view the synteny of those regions in both strains. How can I do this? I have used a tool named synvisio there…

The low successful assignment ratio of FeatureCounts

Hello, I would like to confirm if the low assignment ratio (54%) is normal, and please check the possible reason I found. I used Hisat2 to assign paired-end strand-specific transcriptomic sequences (rRNA removed) to a reference genome. Because I filtered out the unmapped sequences in advance, the overall assignment ratio…

Use RSEM and Bowtie2 to align paired-end sequences

I want to use rsem-calculate-expression and the bowtie2 aligner to align paired-end sequences based on the following conditions: 2 processors; generate BAM file; very fast bowtie2 sensitivity; append gene/transcript name.

My code:

rsem-refseq-extract-primary-assembly GCF_000001405.31_GRCh38.p5_genomic.fna GCF_000001405.31_GRCh38.p5_genomic.primary_assembly.fna
rsem-prepare-reference --gff3 GCF_000001405.31_GRCh38.p5_genomic.gff --bowtie2 --bowtie2-path /bowtie2-2.4.5-py39hd2f7db1_2 --trusted-sources…

Accurate assembly of multi-end RNA-seq data with Scallop2

Trapnell, C. et al. Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010). Guttman, M. et al. Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat. Biotechnol. 28,…

htseq-count error

Hi,

htseq-count -f bam -s yes ~/htseq-trial/SRR13826419_Aligned.sortedByName.out.bam ~refgen/gencode.v39.primary_assembly.annotation.gtf > counts.txt

I am trying to run htseq-count with the command above, but the err file contains:

[E::idx_find_and_load] Could not retrieve index file for '~/htseq-trial/SRR13826419_Aligned.sortedByName.out.bam'
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines…

“transcript reads were aligned to the RefSeq transcriptome (downloaded March 2013) using Tophat” was done by the original author. Please help me to get that 2013 transcriptome and gff data

Greetings all, I have RNA-Seq data generated with the CEL-Seq protocol. I want to replicate the same transcriptome mapping as the previous study. As they…

Feature count is very low using htseq-count

Hello all, I ran bbmap on my paired RNA-seq data using the following command:

bbmap.sh in1=J2_R1.fastq in2=J2_R2.fastq out=output_J2.sam ref=im4.fasta nodisk

The header of the generated SAM file is:

@HD VN:1.4 SO:unsorted
@SQ SN:k141_1006 LN:2503
@SQ SN:k141_5512 LN:5393
@SQ SN:k141_4772 LN:4387
@SQ SN:k141_3267 LN:4531…

Petabase-scale sequence alignment catalyses viral discovery

Serratus alignment architecture: Serratus (v0.3.0) (github.com/ababaian/serratus) is an open-source cloud infrastructure designed for ultra-high-throughput sequence alignment against a query sequence or pangenome (Extended Data Fig. 1). Serratus compute costs are dependent on search parameters (expanded discussion available: github.com/ababaian/serratus/wiki/pangenome_design). The nucleotide vertebrate viral pangenome search (bowtie2, database size: 79.8 MB) reached processing rates…

How to label columns in HTSeq output

I’ve been working to process RNA-seq data and I’ve used hisat2 to align my reads to the reference genome. When I take those output files and put them into htseq-count using the below code, I get a count matrix but the columns…

Indexing with STAR

Hello, I am working with RNA-seq data and creating an index of the Gossypium hirsutum reference genome using STAR. STAR asks for the GTF annotation format, while my file is GFF3. According to the literature, in order to run a GFF file I need to remove --sjdbOverhang 50 and…

For Differential Gene Expression , which indexing format is better: GFF or GTF?

Hello, I am working on DGE and wish to create a reference index for mapping. Two file formats are used for it: GFF and GTF. My question is: what is the major difference between GTF and GFF?…
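Both formats have nine tab-separated columns. One well-known difference is the syntax of the ninth (attributes) column: GFF3 uses key=value pairs separated by semicolons, while GTF uses key "value"; pairs. The sketch below parses both styles. The function names and example strings are illustrative, and it deliberately ignores edge cases such as URL-encoded characters in GFF3 or semicolons inside quoted GTF values.

```python
def parse_gff3_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=gene1;Name=abc'."""
    attributes = {}
    for item in field.strip().split(";"):
        if not item.strip():
            continue
        key, _, value = item.partition("=")
        attributes[key.strip()] = value.strip()
    return attributes


def parse_gtf_attributes(field):
    """Parse a GTF column-9 string such as 'gene_id "g1"; transcript_id "t1";'."""
    attributes = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    return attributes


# Illustrative attribute strings in each style.
gff3_example = parse_gff3_attributes("ID=gene1;Name=abc")
gtf_example = parse_gtf_attributes('gene_id "g1"; transcript_id "t1";')
```

Tools that expect gene_id and transcript_id will generally want the GTF style, which is why a conversion step is often needed when only GFF3 is available.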

How to retrieve fasta sequence after local blast?

Hello, I have created a BLAST database using a reference genome. Then, I performed a local BLAST search on the command line using a gene of interest. I have obtained some hits with the usual BLAST information. Now, I want to…

Conversion Of Gff3 To Gtf

How do I convert a GFF file to a GTF file? Is there any tool available?

The easiest way is to use the gffread program that comes with the Cufflinks software suite (Tuxedo):

gffread my.gff3 -T -o my.gtf

See gffread…

How to identify real isomers in mirge3.0’s output files.

How do you distinguish/extract ‘real’ isomiRNAs from the exhaustive output of mirge3.0? I’m trying to do a differential expression analysis on the isomers of miRNA in my dataset. I’m using mirge3.0 with the -gff and other outputs (basically all of…

Submit sequence data to NCBI

Data provision and standards. GEO sequence submission procedures are designed to encourage provision of MINSEQE elements: Thorough descriptions of the biological samples under investigation, and procedures to which they were subjected. Thorough descriptions of the protocols used to generate and process the data. Request updates to accessioned records per the…

How to get genome feature annotation through the NCBI API?

Hi, I want to get the whole-genome annotation results with information such as transcript, exon, gene, etc. As we know, NCBI provides a GFF file containing this information, but I want to get the latest content from NCBI…

Refseq annotation for processed/unprocessed Pseudogenes

Hi, I have extracted the pseudogenes from the RefSeq annotation file. However, the GFF file has no information about whether a pseudogene is processed or unprocessed. On the other hand, Ensembl/GENCODE GFF files do have this type of information. The problem is not…

How to write gffutils.feature.Feature object to file

How do you most efficiently write a collection of gffutils.feature.Feature objects to file, so that you can create a gff3 file from a collection of Feature objects? I am trying to create a gff3 file without the ##FASTA part at the bottom,…

read count to gene

I am using this command to get read counts per gene with bedtools intersect:

samtools view -Shu -q10 -@ 20 UE-2955-CMLib12_sorted.bam | bedtools intersect -c -a GCA_900659725.1_ASM90065972v1_genomic.gff -b stdin > UE-2955-CMLib{i}_intersect_counts2.bed

The command works for other files but not for one file. Which…

fetch out common/conserved genes from a bunch of bacteria species

Hi all, I am having difficulty determining and fetching the common/conserved regulator genes from a group of species. I fetched all the regulator genes from each bacterial species according to the GFF annotation. I would like…

gffread error

Hello, I am currently trying to do RNA-seq using public data for Brassica juncea. To use htseq-count to make a count table, I have to convert the GFF file, which I downloaded from the Brassica database, to a GTF file. So I used gffread to convert the GFF file with the command below:

gffread Bju.genome.gff -T -o…

Incubator for useful bioinformatics code, primarily in Python and R

Collection of useful code related to biological analysis. Much of this is discussed with examples at Blue collar bioinformatics. All code, images and documents in this repository are freely available for all uses. Code is available under the MIT license and images, documentations and talks under the Creative Commons No…

wont recognize the gtf or gff3 files (runtime exception)

Hi, I am trying to build a custom database for snpEff. As instructed both in the forum and in the snpEff instructions, I did the following; then I added the following to the snpEff.config file:

# BG94_1
BG94_1.genome : BG94_1

Then…

Are there any alternatives to Liftoff

Mapping annotations (GFF/GTF) between assemblies. Hi, I am annotating closely related accessions (varieties) using a reference assembly (please note that I am using only a region, which is why you don’t see chromosome info). I really liked Liftoff (ver 1.6.1:…

convert genomic bigWig file to transcriptome space

Hi all, Is anyone aware of a function to convert a bw file mapped to a genome to map to a transcriptome (of said genome), where the input would be the genomic bw file and gff/gtf/bed annotation and output a single ‘transcriptomic’…

Blank output When converting GFF3 file to GTF using either gffread or AGAT

Hi, I am trying to convert a GFF3 file (please see below) to GTF. I used two suggested tools, gffread and AGAT.

#gff-version 3
Bg_94-1_CX35|chr01_10700000_16500000 Liftoff gene 1 1345 . + . ID=gene_1;Name=Os01g0293800 gene;coverage=0.997;sequence_ID=0.982;extra_copy_number=0;copy_num_ID=gene_1_0
Bg_94-1_CX35|chr01_10700000_16500000…

How to download the Homo_sapiens.GRCh38.100.gtf and Homo_sapiens.GRCh38.dna.primary_assembly.fa files for my analysis?

I am trying to perform STAR alignment and I need the reference files for indexing. I would like to know how to download the Homo_sapiens.GRCh38.100.gtf and Homo_sapiens.GRCh38.dna.primary_assembly.fa files so that I can use my following code for indexing…

Handy online tool for genomic analysis and data visualization

Previously, I recommended two powerful online tools for genomic analysis and data visualization here. I want to share with you other handy tools that I found recently. iTOL is perfect for beautifying genomic data. Circos is useful for displaying the relationships between objects and positions. You could discover their…

How to align and visualize data with .fasta and .gff3 files in IGV?

Hi everyone, I have an issue aligning and visualizing my data in IGV. As I read in the IGV manual, to align and visualize data, I need to prepare .BAM/.SAM or another input format…

does not contain a ‘gene’ attribute

Dear BIOSTAR community, I’m trying to make a count matrix with htseq-count:

htseq-count -s yes -t gene -i gene 01.sorted.sam annotation_cattle.gff > 01.txt

Even with --idattr=gene, it returns the error:

Error processing GFF file (line 1864255 of file annotation_cattle.gff): Feature gene-D1Y31_gp1…
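The message suggests that at least one feature of the requested type lacks the attribute named as the ID attribute. Before rerunning the count, a quick scan of the annotation can list every such line. The sketch below assumes GFF3-style key=value attributes; the function name and the example lines are hypothetical.

```python
def features_missing_attribute(lines, feature_type, attribute):
    """Yield (line_number, line) for features of feature_type that lack attribute."""
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9 or columns[2] != feature_type:
            continue
        keys = {item.partition("=")[0].strip()
                for item in columns[8].split(";") if item.strip()}
        if attribute not in keys:
            yield number, line.rstrip("\n")


# Hypothetical annotation: the second gene has no 'gene' attribute.
example_lines = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=gene-a;gene=a\n",
    "chr1\tsrc\tgene\t200\t300\t.\t+\t.\tID=gene-b\n",
    "chr1\tsrc\texon\t200\t300\t.\t+\t.\tID=exon-1\n",
]
missing = list(features_missing_attribute(example_lines, "gene", "gene"))
```

On a real file the same function can be given an open file handle instead of a list. If it reports any lines, one possible way forward is to pick an ID attribute that every feature of that type carries.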

Bio-DB-HTS installation and ensembl-vep

I want to use ensembl-vep with custom annotation. In order to use a gff file I need to have the Bio-DB-HTS library installed. I downloaded Bio-DB-HTS and used Build.PL with no errors. When I try to install ensembl-vep it still gives an error asking for the Bio-DB-HTS library…

GRCh37 GFF filter transcript isoforms by RefSeq Select tag or longest

Dear all, I tried to filter the “RefSeq Select” transcript isoforms in the GRCh37.p13 human genome annotation gff (GCF_000001405.25_GRCh37.p13_genomic.gff.gz). Specifically, my goal is to retain for each gene a transcript isoform with a tag=RefSeq Select attribute if one exists,…
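The selection rule described in the title (prefer the transcript tagged RefSeq Select, otherwise take the longest) can be sketched as a single pass over the GFF lines. Several things here are assumptions rather than details from the post: transcripts are taken to be features of type mRNA or transcript whose Parent attribute names the gene, the tag attribute may hold several comma-separated values, and "longest" is measured as genomic span rather than summed exon length. All names and example rows are hypothetical.

```python
def parse_attributes(field):
    """Split a GFF3 attributes string into a dict of key -> value."""
    return dict(item.split("=", 1)
                for item in field.strip().split(";") if "=" in item)


def choose_transcripts(lines, transcript_types=("mRNA", "transcript")):
    """Return {gene ID: transcript ID}, preferring RefSeq Select, else the longest span."""
    best = {}  # gene ID -> (is_select, span_length, transcript ID)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9 or columns[2] not in transcript_types:
            continue
        attributes = parse_attributes(columns[8])
        gene = attributes.get("Parent")
        transcript = attributes.get("ID")
        if gene is None or transcript is None:
            continue
        tags = attributes.get("tag", "").split(",")
        span = int(columns[4]) - int(columns[3]) + 1
        candidate = ("RefSeq Select" in tags, span, transcript)
        # Compare (is_select, span) only, so the first of two tied transcripts is kept.
        if gene not in best or candidate[:2] > best[gene][:2]:
            best[gene] = candidate
    return {gene: chosen[2] for gene, chosen in best.items()}


# Hypothetical rows: gene-A has a tagged isoform, gene-B does not.
example_lines = [
    "chr1\tsrc\tmRNA\t1\t1000\t.\t+\t.\tID=rna-1;Parent=gene-A\n",
    "chr1\tsrc\tmRNA\t1\t500\t.\t+\t.\tID=rna-2;Parent=gene-A;tag=RefSeq Select\n",
    "chr1\tsrc\tmRNA\t2000\t2299\t.\t-\t.\tID=rna-3;Parent=gene-B\n",
    "chr1\tsrc\tmRNA\t2000\t2899\t.\t-\t.\tID=rna-4;Parent=gene-B\n",
]
chosen = choose_transcripts(example_lines)
```

The returned mapping can then be used to keep only the chosen transcripts and their child features in a second pass over the file.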

MiRBase miRNA analysis with STAR

Hi all, I am using the latest mouse reference genome (GRCm39) for small RNA-seq/miRNA-seq analysis. The miRBase database doesn’t have any GFF/GTF file for the mouse mature-miRNA/loop-miRNA. I just have the mature-miRNA and loop-miRNA FASTA sequences from miRBase. How can I use the STAR tool to…

Answer: PopGenome – VCF, fasta, GTF and codons still missing

Dear Maciek Hopefully you were able to solve these problems already. I cannot comment on the main set of issues you reported. However, I also encountered the error: `Error in START[!REV, 3] : incorrect number of dimensions` following certain instances of `set.synnonsyn` which I also noticed occurred for genes which…

MAKER genome annotation error with SNAP ab initio prediction

I am trying to do a second round of MAKER genome annotation with ab initio prediction by SNAP. The error I am getting is as follows:

error: unknown command "genome.hmm", see 'snap help'.
ERROR: Snap failed --> rank=NA, hostname=bioinformatics
ERROR: Failed while preparing ab-inits
ERROR: Chunk failed at level:0, tier_type:2…

How to trim a GFF3 file based on specific coordinates?

Hi, I would like to create a GFF3 file containing information only for specific coordinates from the chromosome-level GFF3 file. I know how to extract gene and CDS info separately but don’t know how to do trimming based…
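One simple reading of "trimming" is to keep only the features that lie entirely inside a region on one sequence. The sketch below does that with plain text processing. It is a minimal example under stated assumptions: the function name and example rows are hypothetical, coordinates are left unshifted, features that only partly overlap the region are dropped, and parent/child relationships are not repaired.

```python
def trim_gff3(lines, seqid, region_start, region_end):
    """Yield header lines plus features fully inside the region (1-based, inclusive)."""
    for line in lines:
        if line.startswith("##FASTA"):
            break  # stop before any embedded sequence section
        if line.startswith("#"):
            yield line  # keep directives and comments
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9 or columns[0] != seqid:
            continue
        start, end = int(columns[3]), int(columns[4])
        if start >= region_start and end <= region_end:
            yield line


# Hypothetical annotation with one feature inside, one outside, one on another sequence.
example_lines = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t150\t900\t.\t+\t.\tID=gene1\n",
    "chr1\tsrc\tgene\t950\t1500\t.\t+\t.\tID=gene2\n",
    "chr2\tsrc\tgene\t150\t900\t.\t+\t.\tID=gene3\n",
]
trimmed = list(trim_gff3(example_lines, "chr1", 100, 1000))
```

The trimmed lines can be written straight to a new file, since each one is passed through unchanged.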

STAR rna-seq for bacterial genomes

Hi, I’d like to use STAR for bacterial genomes. I wanted to ask whether this is strongly discouraged or whether there is a way to manage the main challenges of mapping reads to prokaryotes. (I know there are specific tools for this purpose, e.g. EdgePro, but I’m a beginner in…

hisat2 compatibility for long read

Hi, I am trying to align PacBio transcriptome reads against the genome to count the gene number. For paired-end reads I used the following workflow:

# convert gff to gtf
/home/software/cufflinks-2.2.1/gffread xxx.gff -T -o xxx.gtf
# build index
/home/software/hisat2-2.2.1/hisat2_extract_exons.py xxx.gtf > xxx.exon
/home/software/hisat2-2.2.1/hisat2_extract_splice_sites.py…

How to identify mutations from FASTA sequences?

I have two full genome sequences (in FASTA format) plus an annotation file (in GFF format) from the same organism. One sequence is the reference genome and the other is my test sequence. Could you please suggest some pipeline or tools (preferably R…
