Tag: GFF

amrfinder not working on loop?

Hi, I am trying to run amrfinder on multiple genomes in a loop, and it gives the following error; also, none of the input files I am using are shown after it runs. #!/bin/bash for k in /home/bvs/neelam/AMRFINDER_hypo/hyocool/*.fasta;do NAME=$(basename $k .fasta) echo…

Continue Reading amrfinder not working on loop?

What Are The Most Common Stupid Mistakes In Bioinformatics?

While I of course never make stupid mistakes… ahem… I have many “friends” who: forget to check both strands; generate random genomic sites without avoiding masked (NNN) gaps; confuse genome freezes and even species. But I’m sure there are some other very…

Continue Reading What Are The Most Common Stupid Mistakes In Bioinformatics?

failed to find the gene identifier attribute

featureCounts: ERROR: failed to find the gene identifier attribute. Hello, I made my own GTF file from HMMER results and used it with the featureCounts program to calculate the abundance of the genes annotated in that GTF file. The error message I got is the following: featureCounts…

Continue Reading failed to find the gene identifier attribute
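For the featureCounts error above: by default featureCounts counts `exon` rows and groups them by the `gene_id` attribute (its `-t` and `-g` options change both). A GTF built by hand from HMMER output often lacks `gene_id` on some rows. The sketch below is a minimal, stdlib-only check, assuming a standard nine-column, tab-separated GTF. It lists the rows featureCounts would probably reject. The example rows and the PF00001/PF00002 identifiers are hypothetical.

```python
import re

# A GTF attribute column looks like: gene_id "g1"; transcript_id "t1";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_attributes(field):
    """Return {key: value} parsed from a GTF attribute column."""
    return dict(ATTR_RE.findall(field))


def rows_missing_attribute(gtf_lines, feature_type="exon", attribute="gene_id"):
    """List (line_number, reason) for rows that lack the grouping attribute.

    The defaults mirror featureCounts' defaults (-t exon, -g gene_id).
    """
    problems = []
    for n, line in enumerate(gtf_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            problems.append((n, f"{len(cols)} columns instead of 9"))
        elif cols[2] == feature_type and attribute not in parse_gtf_attributes(cols[8]):
            problems.append((n, f"{feature_type} row without {attribute}"))
    return problems


# Hypothetical rows built from HMMER hits; the second uses GFF3-style "ID=".
example = [
    'contig_1\thmmer\texon\t10\t300\t1e-20\t+\t.\tgene_id "PF00001"; transcript_id "PF00001.1";',
    'contig_1\thmmer\texon\t500\t900\t1e-10\t-\t.\tID=PF00002',
]
for n, reason in rows_missing_attribute(example):
    print(n, reason)
```

If the rows carry a different key, passing it to featureCounts with `-g` is an alternative to rewriting the file.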

Using RNA-seq to detect pathogen sequences in host tissue

Hello all, I am working on a project and would like some guidance on it, if possible. Basically, we have sent samples for RNA-seq in which we want to determine infection and levels of infection in…

Continue Reading Using RNA-seq to detect pathogen sequences in host tissue

ChIP-Seq

ChIP-Seq Input Data (Reference Feature). LiftOver: We provide on-the-fly lift-over of reference data sets between different genome assemblies for broader comparison among annotations. Upload Custom Data, File Format: All ChIP-seq tools use SGA (Simplified Genome Annotation) files as an internal working format. SGA input…

Continue Reading ChIP-Seq

Extract transcript ID and gene ID from ITAG4.1_gene_models.gff

Hello all, I was hoping to extract the transcript IDs and corresponding gene IDs from ITAG4.1_gene_models.gff (downloaded from solgenomics.net/ftp/genomes/Solanum_lycopersicum/annotation/ITAG4.1_release/) using R. I have tried different methods. First method: List <- tr2g_gff3(file = directory, write_tr2g = FALSE, get_transcriptome = FALSE, save_filtered_gff =…

Continue Reading Extract transcript ID and gene ID from ITAG4.1_gene_models.gff
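The question above uses R. The same transcript-to-gene table can be built from any GFF3 in which transcript rows carry both `ID` and `Parent`. This is a minimal Python sketch. It assumes transcript rows have type `mRNA`, which you should check in your file. The `mRNA:`/`gene:` prefixes and Solyc identifiers in the example are hypothetical. The tab-separated output can be read back into R with `read.delim`.

```python
from urllib.parse import unquote


def parse_gff3_attributes(field):
    """Parse GFF3 column 9 (key=value;key=value) and undo %XX escapes."""
    attrs = {}
    for part in field.strip().strip(";").split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = unquote(value)
    return attrs


def transcript_to_gene(gff_lines, transcript_types=("mRNA",)):
    """Return (transcript_id, gene_id) pairs from transcript rows' ID and Parent."""
    pairs = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the feature table
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] in transcript_types:
            attrs = parse_gff3_attributes(cols[8])
            if "ID" in attrs and "Parent" in attrs:
                pairs.append((attrs["ID"], attrs["Parent"]))
    return pairs


# Hypothetical rows in the style of a tomato gene model file.
example = [
    "##gff-version 3",
    "SL4.0ch01\tmaker_ITAG\tgene\t100\t900\t.\t+\t.\tID=gene:Solyc00g000010.1",
    "SL4.0ch01\tmaker_ITAG\tmRNA\t100\t900\t.\t+\t.\tID=mRNA:Solyc00g000010.1.1;Parent=gene:Solyc00g000010.1",
]
for tx, gene in transcript_to_gene(example):
    print(tx, gene, sep="\t")
```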

How to extract summary statistics from GFF3 /GTF file?

Hi! You could try the gffutils Python library as an alternative to the AGAT toolkit for extracting summary statistics from GFF3/GTF files. gffutils loads a GFF or GTF file into a SQLite database that you can then query by feature type, ID, or parent-child relationship. Here’s an example of how to use gffutils…

Continue Reading How to extract summary statistics from GFF3 /GTF file?
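For quick summary statistics you do not necessarily need a database. This dependency-free sketch, which is a swapped-in alternative to the gffutils approach in the answer above, counts features per type and reports their mean length. It assumes a nine-column, tab-separated file and uses GFF's 1-based, inclusive coordinates.

```python
from collections import Counter, defaultdict


def gff_summary(lines):
    """Count features per type and report their mean length (1-based, inclusive)."""
    counts = Counter()
    total_length = defaultdict(int)
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows rather than guessing
        feature_type, start, end = cols[2], int(cols[3]), int(cols[4])
        counts[feature_type] += 1
        total_length[feature_type] += end - start + 1
    return {
        ftype: {"count": n, "mean_length": total_length[ftype] / n}
        for ftype, n in counts.items()
    }


example = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t1000\t.\t+\t.\tID=g1",
    "chr1\tsrc\tmRNA\t1\t1000\t.\t+\t.\tID=t1;Parent=g1",
    "chr1\tsrc\texon\t1\t200\t.\t+\t.\tParent=t1",
    "chr1\tsrc\texon\t501\t1000\t.\t+\t.\tParent=t1",
]
for ftype, stats in gff_summary(example).items():
    print(ftype, stats["count"], stats["mean_length"])
```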

Error when converting hmmsearch output to gff file

Hello, I’m trying to convert hmmsearch output to GFF format. For this, I ran the following: hmmsearch --domtblout dom_results.txt --cpu 10 hydrocarbon.hmm orfs_file.faai > demo.log After getting the dom_results.txt table, I ran the hmmer2gff program from the MGKit package: hmmer2gff…

Continue Reading Error when converting hmmsearch output to gff file

error when converting hmmer table to off table

Hello, I searched metagenomic contigs against a hydrocarbon profile database (hydrocarbon.hmm file) with the HMMER hmmsearch tool. First, I retrieved all ORFs from the contigs and translated them with the esl-translate program as follows: esl-translate -c 11 input_contigs.fa > translated_orfs.fa After getting the…

Continue Reading error when converting hmmer table to off table

Combination of whole genome sequencing and supervised machine learning provides unambiguous identification of eae-positive Shiga toxin-producing Escherichia coli

1. Introduction Shiga toxin-producing Escherichia coli (STEC) are important zoonotic pathogens comprising more than 400 serotypes (Beutin and Fach, 2015). Pathogenic STEC strains such as enterohemorrhagic E. coli (EHEC) may cause hemorrhagic colitis (HC) and hemolytic-uremic syndrome (HUS) in humans. However, it remains difficult to fully define human pathogenic STEC…

Continue Reading Combination of whole genome sequencing and supervised machine learning provides unambiguous identification of eae-positive Shiga toxin-producing Escherichia coli

Bioinformatics Analyst job with Cincinnati Children’s- Bioinformatics

Expected Starting Salary Range: 27.60 – 35.28 SUBFUNCTION DEFINITION: Bioinformatics applies biology, computer science, data science, and statistics to analyze and interpret biological and clinical data. REPRESENTATIVE RESPONSIBILITIES Management and “wrangling” of genomic data. Help manage Xenopus genomes and participate in genome annotation and gene nomenclature. Perform gene orthology…

Continue Reading Bioinformatics Analyst job with Cincinnati Children’s- Bioinformatics

Using a transcriptome from Trinity in Phyluce

Hi there, I’m working with multiple RNA-seq samples from bird blood infected with malaria parasites. No genome reference is available for the parasite species. Most of my reads come from the bird, and I’m not interested in them… only…

Continue Reading Using a transcriptome from Trinity in Phyluce

Adding ‘gene_name’ attribute to each row of GTF/GFF file (missing for CDS, transcript, and exon rows)

Hello, can someone please help me with this issue? Thank you in advance! I have a GFF file that contains the gene_name attribute, but only on gene entries; it is absent from the transcript, CDS, and exon rows. I want…

Continue Reading Adding ‘gene_name’ attribute to each row of GTF/GFF file (missing for CDS, transcript, and exon rows)
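In GFF3, transcript, exon and CDS rows point to their gene through `Parent` links. A missing `gene_name` can therefore be inherited by walking up that chain. This is a minimal two-pass sketch. It assumes a GFF3 file (not GTF), an attribute literally named `gene_name`, and only follows the first `Parent` when a feature lists several. Row order does not matter because IDs are collected before any row is rewritten.

```python
def parse_attributes(field):
    """Parse GFF3 column 9 into a dict, keeping the original key order."""
    attrs = {}
    for part in field.strip().strip(";").split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value
    return attrs


def propagate_attribute(lines, key="gene_name"):
    """Copy `key` from each feature down to descendants that lack it."""
    records, parent_of, value_of = [], {}, {}
    for line in lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            records.append(line)  # comments and directives pass through untouched
            continue
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            if "Parent" in attrs:
                parent_of[attrs["ID"]] = attrs["Parent"].split(",")[0]
            if key in attrs:
                value_of[attrs["ID"]] = attrs[key]
        records.append((cols, attrs))

    def inherited(feature_id):
        seen = set()  # guards against cyclic Parent references
        while feature_id and feature_id not in seen:
            if feature_id in value_of:
                return value_of[feature_id]
            seen.add(feature_id)
            feature_id = parent_of.get(feature_id)
        return None

    out = []
    for rec in records:
        if isinstance(rec, str):
            out.append(rec)
            continue
        cols, attrs = rec
        if key not in attrs:
            value = inherited(attrs.get("Parent", "").split(",")[0])
            if value is not None:
                attrs[key] = value
                cols = cols[:8] + [";".join(f"{k}={v}" for k, v in attrs.items())]
        out.append("\t".join(cols))
    return out


example = [
    "chr1\tsrc\tgene\t1\t900\t.\t+\t.\tID=g1;gene_name=ABC1",
    "chr1\tsrc\tmRNA\t1\t900\t.\t+\t.\tID=t1;Parent=g1",
    "chr1\tsrc\texon\t1\t300\t.\t+\t.\tID=e1;Parent=t1",
    "chr1\tsrc\tCDS\t50\t300\t.\t+\t0\tParent=t1",
]
for row in propagate_attribute(example):
    print(row)
```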

Annotate CDS and UTR given transcript

I am annotating a new genome by combining several sources of information: a de novo annotation as well as annotations lifted over from closely related species. I have been using GFFCompare (ccb.jhu.edu/software/stringtie/gffcompare.shtml) to merge GFF files…

Continue Reading Annotate CDS and UTR given transcript

How to convert SAM/BAM file to GTF/GFF file?

Hello, I’m curious to know whether there’s a way to convert a SAM/BAM file generated with minimap2 to a GTF/GFF file. The purpose is to use it as transcript alignment evidence for EVM. Any suggestions are welcome! Regards, B

Continue Reading How to convert SAM/BAM file to GTF/GFF file?

Query in indexing human genome

Hello, I have to do RNA-seq analysis of human cancer cell lines, for which I need to index the human genome as a reference genome. I indexed the human genome using the GFF file from NCBI. During a lecture I heard that the NCBI human genome file has some…

Continue Reading Query in indexing human genome

dosen’t show options in bash ln ubuntu

foad@Linux:~/Example/Sam$ htseq-count -h usage: htseq-count [options] alignment_file gff_file This script takes one or more alignment files in SAM/BAM format and a feature file in GFF format and calculates for each feature the number of reads mapping to it. See htseq.readthedocs.io/en/master/count.html

Continue Reading dosen’t show options in bash ln ubuntu

Supported Tools – MultiQC

Removes adapter sequences and trims low-quality bases from the 3′ end of reads. Overlapping paired-end reads can be merged into consensus sequences, and the adapter sequence can be found for paired-end data if not known. Automatic filtering, trimming, error removal and quality control for FASTQ data…

Continue Reading Supported Tools – MultiQC

use ROSE to identify super enhancer

Hey everyone, I want to use ROSE to identify super-enhancers and see whether they differ after a treatment in a lung cancer cell line. I see that this is the typical usage: [user@cn3107 ~]$ ROSE_main.py -h Usage:…

Continue Reading use ROSE to identify super enhancer

Find Pathogenic Variants

Hi, dear community. I don’t have any experience in variant calling, and I have to solve this problem: Using the most recent VCF file describing ClinVar variants and a BED/GFF file of the coding sequences of curated RefSeq genes, write a script that outputs all the…

Continue Reading Find Pathogenic Variants
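The task above is essentially an interval lookup. It takes variants from the ClinVar VCF whose `CLNSIG` marks them pathogenic and keeps those inside a coding interval. This is a minimal stdlib sketch under several assumptions. "Pathogenic" means the terms `Pathogenic` or `Likely_pathogenic`. Only the variant's POS is tested, so a deletion that merely overlaps a CDS edge is missed. BED chromosome names like `chr1` are matched to ClinVar's `1` by stripping the prefix. Remember that BED is 0-based and half-open while VCF POS is 1-based. All example records are hypothetical.

```python
import bisect
import re

# Assumption: these CLNSIG terms count as "pathogenic"; adjust to your definition.
PATHOGENIC_TERMS = {"Pathogenic", "Likely_pathogenic"}


def norm_chrom(chrom):
    """ClinVar VCFs name chromosomes '1', '2', ...; RefSeq BEDs often use 'chr1'."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def load_cds_bed(bed_lines):
    """Merge BED intervals (0-based, half-open) per chromosome for binary search."""
    raw = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        raw.setdefault(norm_chrom(chrom), []).append((int(start), int(end)))
    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def is_pathogenic(info):
    for field in info.split(";"):
        if field.startswith("CLNSIG="):
            terms = re.split(r"[/|,]", field[len("CLNSIG="):])
            return any(term in PATHOGENIC_TERMS for term in terms)
    return False


def pathogenic_in_cds(vcf_lines, cds):
    hits = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, pos, info = norm_chrom(cols[0]), int(cols[1]), cols[7]
        if chrom not in cds or not is_pathogenic(info):
            continue
        starts, ends = cds[chrom]
        pos0 = pos - 1  # VCF POS is 1-based; BED is 0-based
        i = bisect.bisect_right(starts, pos0) - 1
        if i >= 0 and pos0 < ends[i]:
            hits.append(line.rstrip("\n"))
    return hits


bed = ["chr1\t100\t200\tNM_hypothetical.1", "chr1\t150\t250\tNM_hypothetical.2"]
vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t150\t1\tA\tG\t.\t.\tCLNSIG=Pathogenic",
    "1\t120\t2\tC\tT\t.\t.\tCLNSIG=Benign",
    "1\t240\t3\tG\tA\t.\t.\tCLNSIG=Likely_pathogenic",
    "1\t260\t4\tG\tA\t.\t.\tCLNSIG=Pathogenic",
    "1\t130\t5\tT\tC\t.\t.\tCLNSIG=Conflicting_interpretations_of_pathogenicity",
]
for hit in pathogenic_in_cds(vcf, load_cds_bed(bed)):
    print(hit)
```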

PathoFact GFF output

theadrijasaha asks: How can I convert PathoFact ARG output to GFF?

Continue Reading PathoFact GFF output

hg38 Ig regions

Hi, I’m looking for the coordinates of the immunoglobulin regions in the hg38 assembly. I want to exclude them from my CNV analysis. I know the hg19 regions, but I do not want to simply lift them over. Many thanks! Reply: You can use Ensembl’s BioMart…

Continue Reading hg38 Ig regions

Count miRNA reads from gtf file

Hi there! I counted reads with featureCounts using the miRBase GFF annotation file and Lawson’s GTF annotation file (here). I have 3 replicates for each sample (n = 3). The counts based on the miRBase annotation show one outlier replicate in one of my samples in DESeq2…

Continue Reading Count miRNA reads from gtf file

gff to gtf missing gene id

Hi, I was trying my hand at annotating a genome with Prokka, and I converted the output GFF file to GTF (gffread file.gff -T -o file.gtf). This is what my GTF file looks like: CP001095.1 prokka transcript 210 1712 . + …

Continue Reading gff to gtf missing gene id

Are there newer versions of NCBI Protein Clusters?

Within NCBI, the Protein Clusters database provides an FTP server (described at www.ncbi.nlm.nih.gov/proteinclusters/faq/) with text files containing protein clusters. The PCLA cluster set for prokaryotic genomes, which is the one I’m interested in, was last updated in 2017. I tried…

Continue Reading Are there newer versions of NCBI Protein Clusters?

Maker Gff3 file issues

Hi community, this is really a technical question; I hope it is OK to post it here… I am trying to import the GFF3 file from MAKER into my JBrowse to view the annotations. I am using the maker2jbrowse script and getting constant errors. There…

Continue Reading Maker Gff3 file issues

Live-attenuated vaccine sCPD9 elicits superior mucosal and systemic immunity to SARS-CoV-2 variants in hamsters

Ethics statement In vitro and animal work were conducted under appropriate biosafety conditions in a BSL-3 facility at the Institut für Virologie, Freie Universität Berlin, Germany. All animal experiments were performed in compliance with relevant institutional, national and international guidelines for the care and humane use of animals and approved…

Continue Reading Live-attenuated vaccine sCPD9 elicits superior mucosal and systemic immunity to SARS-CoV-2 variants in hamsters

GFF/GTF file error / featureCounts

Hi all, I am trying to generate a count matrix from sorted BAM files using featureCounts on Linux. I am working with a non-model organism (a bacterium), so I generated the annotation file with both PROKKA and RAST. I tried all of the following files in featureCounts: PROKKA.gff, RAST.gff, RAST.gtf, and a gffread-converted PROKKA.gtf file. But I am still facing…

Continue Reading GFF/GTF file error / featureCounts

gff3 – Extracting animo acid and nucleotide sequences from KofamScan output and codon alignment

I want to extract the amino acid sequences from KofamScan output; my workflow is shown in the attached picture. For the analysis I am doing, I need to get the amino acid sequences, align them, and do a codon alignment with the corresponding nucleotide sequences, so that I can get…

Continue Reading gff3 – Extracting animo acid and nucleotide sequences from KofamScan output and codon alignment

Error at phase 4 when running GeMoMa (homology-based annotation)

I am using two external GFFs of related organisms to perform a homology-based annotation of a de novo genome assembly. This is my code and the error output I received at stage 4, an error with the ‘GeMoMa Annotation…

Continue Reading Error at phase 4 when running GeMoMa (homology-based annotation)

featureCounts problem in reading Gff

Hi, I am performing RNA-seq analysis on 12 samples. After mapping the reads with HISAT2, I want to count the reads using featureCounts, but I encountered a problem reading the GFF file downloaded from TAIR (I also tried downloading it from Ensembl, with no difference). featureCounts -p…

Continue Reading featureCounts problem in reading Gff

Extract CDS from maker gff

Hello, I have annotated several genomes using the Maker2 pipeline with the goal of estimating dN/dS ratios for many genes. I have the GFF files, and I would like to extract just the coding sequences into a FASTA file. Previously I have been using…

Continue Reading Extract CDS from maker gff
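Tools such as gffread (its `-x` option) write spliced CDS sequences directly. The logic behind them is small. You collect the `CDS` rows of each transcript, join the genome slices in coordinate order, and reverse-complement the result for minus-strand transcripts. This is a minimal sketch. It assumes CDS rows name their transcript in `Parent`, and it ignores the phase column, so partial CDSs are not trimmed. The genome and rows are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Return {first word of header: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_cds(gff_lines, genome):
    """Concatenate CDS segments per transcript; reverse-complement on '-'."""
    parts = defaultdict(list)  # transcript ID -> [(start, end, seqid, strand)]
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(p.split("=", 1) for p in cols[8].strip(";").split(";") if "=" in p)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                parts[parent].append((int(cols[3]), int(cols[4]), cols[0], cols[6]))
    cds = {}
    for tid, segments in parts.items():
        segments.sort()  # genomic order; GFF is 1-based and inclusive
        seq = "".join(genome[seqid][start - 1:end] for start, end, seqid, _ in segments)
        if segments[0][3] == "-":
            seq = seq.translate(COMPLEMENT)[::-1]
        cds[tid] = seq
    return cds


genome = read_fasta([">chr1 hypothetical", "AAATGCCCGG", "GTAAA"])
gff = [
    "chr1\tmaker\tCDS\t3\t5\t.\t+\t0\tID=t1:cds1;Parent=t1",
    "chr1\tmaker\tCDS\t9\t11\t.\t+\t0\tID=t1:cds2;Parent=t1",
    "chr1\tmaker\tCDS\t4\t6\t.\t-\t0\tID=t2:cds1;Parent=t2",
]
for tid, seq in extract_cds(gff, genome).items():
    print(f">{tid}\n{seq}")
```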

Link products to their genes

Hello everyone, in my GFF/GTF files I have complete product names (proteins) for the up/down-regulated genes, but my edgeR-generated CSV file contains only the gene_IDs and their statistics. After checking both GFF/GTF files, I found that they only contain complete product names but not…

Continue Reading Link products to their genes

Error parsing strand (?) from GFF line

I am trying to assemble RNA transcripts using StringTie and am getting the following error: Error parsing strand (?) from GFF line: NC_037304.1 RefSeq gene 58315 59481 . ? . ID=gene-DA397_mgp34;Dbxref=GeneID:36335702;Name=nad1;exception=trans-splicing;gbkey=Gene;gene=nad1;gene_biotype=protein_coding;locus_tag=DA397_mgp34;part=2 My command is: stringtie -p 8 -G Genome/arab_thaliana.gtf -o Assemble/NR1.gtf –l…

Continue Reading Error parsing strand (?) from GFF line
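The GFF3 specification allows `?` in the strand column for features whose strand is relevant but unknown. As the posts here show, however, gffread, StringTie and Cufflinks reject it. In this case the `?` comes from the trans-spliced mitochondrial gene nad1. This is a minimal workaround sketch. It assumes that losing that gene is acceptable for your analysis. It drops every `?`-strand row plus all features that descend from it through `Parent`, so no orphaned children remain. The exon row and the second gene in the example are hypothetical.

```python
def attrs_of(col9):
    """Parse GFF3 column 9 into a dict."""
    return dict(p.split("=", 1) for p in col9.strip().strip(";").split(";") if "=" in p)


def drop_unknown_strand(lines):
    """Remove rows whose strand is '?' together with all of their descendants."""
    rows = [line.rstrip("\n") for line in lines]
    links = []  # (ID, parent IDs) per feature row; None for comments/directives
    dropped = set()
    for row in rows:
        cols = row.split("\t")
        if row.startswith("#") or len(cols) != 9:
            links.append(None)
            continue
        attrs = attrs_of(cols[8])
        parents = [p for p in attrs.get("Parent", "").split(",") if p]
        links.append((attrs.get("ID"), parents))
        if cols[6] == "?" and attrs.get("ID"):
            dropped.add(attrs["ID"])

    # Propagate down the Parent hierarchy until no new IDs are dropped.
    changed = True
    while changed:
        changed = False
        for link in links:
            if link and link[0] and link[0] not in dropped and any(p in dropped for p in link[1]):
                dropped.add(link[0])
                changed = True

    kept = []
    for row, link in zip(rows, links):
        if link is not None:
            if row.split("\t")[6] == "?" or any(p in dropped for p in link[1]):
                continue
        kept.append(row)
    return kept


example = [
    "##gff-version 3",
    "NC_037304.1\tRefSeq\tgene\t58315\t59481\t.\t?\t.\tID=gene-DA397_mgp34;Name=nad1",
    "NC_037304.1\tRefSeq\texon\t58315\t58700\t.\t+\t.\tID=exon-1;Parent=gene-DA397_mgp34",
    "NC_037304.1\tRefSeq\tgene\t60000\t61000\t.\t+\t.\tID=gene-hypothetical",
    "NC_037304.1\tRefSeq\texon\t60000\t61000\t.\t+\t.\tParent=gene-hypothetical",
]
print("\n".join(drop_unknown_strand(example)))
```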

Convert Abricate output tsv file to gff3 format

Here’s one way using awk, that I think fulfills the requirements. It adds each of the column names (on the first line) to an array to make accessing each of the fields a bit easier. This approach isn’t strictly necessary, but it does make for a more readable solution in…

Continue Reading Convert Abricate output tsv file to gff3 format
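The same header-driven idea as the awk answer, which looks fields up by column name instead of position, carries over to Python's `csv.DictReader`. This is a minimal sketch under stated assumptions. It expects Abricate's usual column names (`#FILE`, `SEQUENCE`, `START`, `END`, `STRAND`, `GENE`, `%COVERAGE`, `%IDENTITY`, `DATABASE`, `ACCESSION`, `PRODUCT`, `RESISTANCE`). Which attributes to keep and the feature type `gene` are illustrative choices, not requirements. Values are percent-encoded where GFF3 reserves characters, for example the `;` inside a resistance list. The example hit is hypothetical.

```python
import csv

# Characters reserved in GFF3 column 9; '%' is handled first so that the
# codes introduced afterwards are not escaped a second time.
GFF3_ESCAPES = [("%", "%25"), (";", "%3B"), ("=", "%3D"), ("&", "%26"), (",", "%2C"), ("\t", "%09")]


def gff3_escape(value):
    for char, code in GFF3_ESCAPES:
        value = value.replace(char, code)
    return value


def abricate_to_gff3(tsv_lines, source="abricate", feature_type="gene"):
    """Convert Abricate TSV rows (header on the first line) to GFF3 lines."""
    out = ["##gff-version 3"]
    for n, row in enumerate(csv.DictReader(tsv_lines, delimiter="\t"), start=1):
        attrs = {
            "ID": f"{row['GENE']}_{n}",  # numbered so repeated genes get unique IDs
            "Name": row["GENE"],
            "product": row["PRODUCT"],
            "database": row["DATABASE"],
            "accession": row["ACCESSION"],
            "coverage": row["%COVERAGE"],
            "identity": row["%IDENTITY"],
            "resistance": row.get("RESISTANCE") or "",  # tolerate a missing column
        }
        col9 = ";".join(f"{key}={gff3_escape(value)}" for key, value in attrs.items() if value)
        out.append("\t".join([
            row["SEQUENCE"], source, feature_type, row["START"], row["END"],
            ".", row["STRAND"], ".", col9,
        ]))
    return out


header = "\t".join(["#FILE", "SEQUENCE", "START", "END", "STRAND", "GENE", "COVERAGE",
                    "COVERAGE_MAP", "GAPS", "%COVERAGE", "%IDENTITY", "DATABASE",
                    "ACCESSION", "PRODUCT", "RESISTANCE"])
# A hypothetical hit; real Abricate values will differ.
hit = "\t".join(["sample.fna", "contig_1", "101", "961", "+", "blaTEM-1", "1-861/861",
                 "===============", "0/0", "100.00", "99.88", "ncbi",
                 "hypothetical_accession", "class A beta-lactamase TEM-1",
                 "AMPICILLIN;PENICILLIN"])
for line in abricate_to_gff3([header, hit]):
    print(line)
```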

Improving conversion of abricate tsv file to gff3 file

Since Steve provided such a neat solution (Abricate TSV to GFF3), here are a few other steps I would like to add so that the script matures into something many others can use. I have two files: (1) a FASTA file with the .fna extension, and…

Continue Reading Improving conversion of abricate tsv file to gff3 file

How to get gene from PSIBLAST resuts

Hello, I am currently using the local version of PSI-BLAST to search for homologous genes by comparing a protein sequence against a protein database. However, I am interested in obtaining the nucleotide sequences from the PSI-BLAST results. Although I have access to…

Continue Reading How to get gene from PSIBLAST resuts

Stringtie does not work with NCBI GTF file?

Hi all, I wanted to rerun my DGE analysis to see whether there were any differences between HTSeq-count -> edgeR and StringTie-Ballgown. However, when I tried to run my StringTie command using the same BAM file, I got an error: “Error:…

Continue Reading Stringtie does not work with NCBI GTF file?

Converting Abricate output (.tsv) to gff3 format

Hello everyone, I have a TSV file generated by Abricate (github.com/tseemann/abricate). I need to convert it to GFF3 format, retaining certain columns, reordering others, and deleting the rest. We are trying to use these GFF3 files for downstream applications and…

Continue Reading Converting Abricate output (.tsv) to gff3 format

Where do find virulence gene information in a gff/gtf file?

Sorry for the rookie question, but I don’t have a ton of experience with genome annotation and microbial genomics. I want to identify the virulence genes in my microbial species of interest from the GTF/GFF file. How do I…

Continue Reading Where do find virulence gene information in a gff/gtf file?

org.biojava.nbio.core.sequence.CDSSequence.getSequenceAsString java code examples | Tabnine

/** * A CDS sequence if negative stranded needs to be reverse complement * to represent the actual coding sequence. When getting a ProteinSequence * from a TranscriptSequence this method is callled for each CDSSequence * {@link www.sequenceontology.org/gff3.shtml} * {@link biowiki.org/~yam/bioe131/GFF.ppt} * @return coding sequence */ public String getCodingSequence() {…

Continue Reading org.biojava.nbio.core.sequence.CDSSequence.getSequenceAsString java code examples | Tabnine

Post about extracting information from GFF

Hello everyone, I want to get some “clean comments” from a GFF file. Is there any software or code that can do this? Any help will be appreciated.

Continue Reading Post about extracting information from GFF

Visualising Roary Results

I ran Roary (roary -e --mafft -p 32 *.gff) to produce a core-genome alignment of hundreds of Salmonella sequences, and I have the results. They include the following files: gene_presence_absence.csv, gene_presence_absence.Rtab, pan_genome_reference.fa, accessory_binary_genes.fa.newick, accessory_graph.dot, core_accessory_graph.dot, core_gene_alignment.aln, clustered_proteins. I would like to visualise the panSNP…

Continue Reading Visualising Roary Results

how to make a .tbi file of .gtf.gz?

Hello, I have a .gtf.gz file that I am going to use in Python code. Does the pysam module require an indexed .gtf.gz file? How can I index that file? Thank you in advance.

Continue Reading how to make a .tbi file of .gtf.gz?
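A tabix index needs two things. The file must be compressed with bgzip (BGZF), not plain gzip. It must also be coordinate-sorted, with each sequence's records contiguous and ordered by start. This is a minimal sketch of the sorting step. It writes an uncompressed, sorted copy of the annotation. The file names in the usage note are hypothetical.

```python
import gzip


def sort_gtf_lines(lines):
    """Header lines first, then features sorted by seqname, start and end.

    Seqnames sort lexicographically (chr10 before chr2); tabix only needs
    each sequence's records to be contiguous and start-ordered.
    """
    header = [line for line in lines if line.startswith("#")]
    body = [line for line in lines if line.strip() and not line.startswith("#")]

    def key(line):
        cols = line.split("\t")
        return cols[0], int(cols[3]), int(cols[4])

    return header + sorted(body, key=key)


def sort_gtf_file(in_path, out_path):
    """Write a sorted, uncompressed copy of a (possibly gzipped) GTF."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src:
        lines = src.read().splitlines()
    with open(out_path, "w") as dst:
        dst.write("\n".join(sort_gtf_lines(lines)) + "\n")


example = [
    "#!genome-build hypothetical",
    'chr2\tsrc\texon\t50\t60\t.\t+\t.\tgene_id "b";',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "a";',
    'chr1\tsrc\texon\t20\t30\t.\t+\t.\tgene_id "a";',
]
print("\n".join(sort_gtf_lines(example)))
```

After `sort_gtf_file("annotation.gtf.gz", "annotation.sorted.gtf")`, running `bgzip annotation.sorted.gtf` and then `tabix -p gff annotation.sorted.gtf.gz` should produce the `.tbi`. Alternatively, `pysam.tabix_index("annotation.sorted.gtf", preset="gff")` compresses and indexes in one call.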

HTseq no features

I have run into a problem while analyzing my RNA-seq data and would like some help. Here is a brief description of my pipeline: I obtained FASTA files of 150 bp paired-end reads from the NovaSeq platform, prepared with a non-stranded library prep. I ran FastQC and…

Continue Reading HTseq no features

Can ChiPseeker be as highly customizable as “computeMatrix”? for example if I want to plot the distribution of genes on the TAD boundary?

Something like “computeMatrix” in deepTools: computeMatrix scale-regions -R TAD.boundaries.bed -S gene.density.bw gives the distribution of the eigenvalues (in…

Continue Reading Can ChiPseeker be as highly customizable as “computeMatrix”? for example if I want to plot the distribution of genes on the TAD boundary?

bwa-mem2 vs htslib – compare differences and reviews?

What are some alternatives? When comparing bwa-mem2 and htslib you can also consider the following projects: minimap2 – A versatile pairwise aligner for genomic and spliced nucleotide sequences bowtie2 – A fast and sensitive gapped read aligner genozip – A modern compressor for genomic files (FASTQ, SAM/BAM/CRAM, VCF, FASTA, GFF/GTF/GVF,…

Continue Reading bwa-mem2 vs htslib – compare differences and reviews?

Unable to extract cds and exons fasta file from exonerate gff file using exonerate-protein2genome-gff-to-fasta.pl

Hey, can anyone help? I got this error: Use of uninitialized value $orientation in string eq at exonerate-protein2genome-gff-to-fasta.pl line 90, line 164 Line 164 of my gff: genex xnt2h CDSpart 5231111 5234643 . + …

Continue Reading Unable to extract cds and exons fasta file from exonerate gff file using exonerate-protein2genome-gff-to-fasta.pl

Custom Annotaion file

Hi everyone, can anyone please guide me on how to generate an annotation file for the 5′ and 3′ UTRs and CDS (all in one GFF/GTF file) from an existing hg38 annotation file? I downloaded an annotation file from genome.ucsc.edu/cgi-bin/hgTables, but first of all it's in BED…

Continue Reading Custom Annotaion file
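A BED12 export from the UCSC Table Browser already encodes everything needed here. Columns 10 to 12 give the exons (block count, sizes, starts relative to chromStart). thickStart/thickEnd (columns 7 and 8) give the coding span. The CDS is the part of each exon inside the thick region, and the UTRs are the parts outside it. Which side is 5′ depends on the strand. This is a minimal sketch, assuming well-formed BED12 lines. It skips non-coding transcripts (thickStart equal to thickEnd). The SO-style feature names it emits are an illustrative choice. It returns 1-based, inclusive coordinates ready for GFF/GTF rows. The example transcript is hypothetical.

```python
def bed12_to_regions(bed_line):
    """Split one BED12 transcript into 5'UTR, CDS and 3'UTR pieces.

    BED is 0-based, half-open; returned coordinates are 1-based, inclusive.
    """
    f = bed_line.rstrip("\n").split("\t")
    chrom, tx_start, name, strand = f[0], int(f[1]), f[3], f[5]
    thick_start, thick_end = int(f[6]), int(f[7])
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    offsets = [int(x) for x in f[11].rstrip(",").split(",")]
    if thick_start == thick_end:
        return []  # non-coding transcript: no CDS, so UTRs are undefined here
    # On the minus strand the left-hand (lower-coordinate) UTR is the 3'UTR.
    if strand == "+":
        left, right = "five_prime_UTR", "three_prime_UTR"
    else:
        left, right = "three_prime_UTR", "five_prime_UTR"
    regions = []
    for size, offset in zip(sizes, offsets):
        ex_start = tx_start + offset
        ex_end = ex_start + size
        pieces = [
            (left, ex_start, min(ex_end, thick_start)),
            ("CDS", max(ex_start, thick_start), min(ex_end, thick_end)),
            (right, max(ex_start, thick_end), ex_end),
        ]
        for feature, start, end in pieces:
            if start < end:  # keep only non-empty overlaps
                regions.append((chrom, feature, start + 1, end, strand, name))
    return regions


example = "chr1\t100\t1000\ttx1\t0\t+\t150\t900\t0\t2\t200,300,\t0,600,"
for region in bed12_to_regions(example):
    print("\t".join(map(str, region)))
```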

Error while converting GFF file to GTF using AGAT

Hi, I am trying to convert a GFF file to a GTF file that I want to use with STAR. I tried AGAT (latest version) for the conversion, but it gives me a series of errors (mainly two types). I have attached the error…

Continue Reading Error while converting GFF file to GTF using AGAT

gff format to genome annotation

I am mapping RNA-seq transcripts against a genome to annotate it. I am looking at Spaln and GMAP, and they both produce two types of GFF output (GFF3 gene format and GFF3 match format). Which one is better for proceeding with the annotation?…

Continue Reading gff format to genome annotation

TRF output to .gff file

Hello, Biostars! I’m trying to get a .gff file from Tandem Repeat Finder output. Since TRF can’t do that, I found the TRAP tool, which can create .gff files. However, TRAP creates one .gff file per contig (OK, there is the ‘cat’ command). The…

Continue Reading TRF output to .gff file

Strain-level bacterial typing directly from patient samples using optical DNA mapping

Figure 1a provides an overview of the experimental procedure and it is summarized in the following sections. Fig. 1: Schematic overview of high-resolution optical DNA mapping-based bacterial typing. a Experimental pipeline. The DNA, extracted via plug-lysis, is labelled with YOYO-1 and netropsin in a single step. The DNA is confined to…

Continue Reading Strain-level bacterial typing directly from patient samples using optical DNA mapping

rna seq – Which candida albicans fasta and gff file should I use for alignment?

The RefSeq assembly is Candida albicans SC5314. I assume you are performing a reference-based assembly against a FASTA reference. Its 8 chromosomes are NC_032089.1 to NC_032096.1 inclusive: chromosome 1 through chromosome 7 (NC_032095.1), then chromosome R. It's here. Most of the files you downloaded are SC5314. So I don't know; it depends on what…

Continue Reading rna seq – Which candida albicans fasta and gff file should I use for alignment?

How to use chado after installation?

Hello, this is my first contact with Chado and Perl. After some problems I managed to install it; however, I don’t know how to continue. GMOD provides documentation for converting GFF files to GFF3 and other data. However, I am…

Continue Reading How to use chado after installation?

[E::idx_find_and_load] warning in htseq-count

I am running htseq-count but get the [E::idx_find_and_load] warning. The BAM files were sorted by name but not indexed. An index is not required for htseq-count, correct? Code: htseq-count -f bam \ -s no \ -t exon \ -m union \ -i…

Continue Reading [E::idx_find_and_load] warning in htseq-count

PROKKA.gff file is not compatible with featureCounts

Hi all, I am trying to count the number of reads that map to each gene using featureCounts (paired-end RNA-seq, Linux). My inputs: a GFF file generated by Prokka; a GTF file generated by the NCBI annotation; sorted BAM files generated with Bowtie2 and SAMtools. When I used the GTF file generated by NCBI, featureCounts ran without…

Continue Reading PROKKA.gff file is not compatible with featureCounts

Gene names/ids on annotated protein files

Hello all, I have annotated around 276 protein files using eggNOG. The protein files have the following headers (example from one of the species): head Spodoptera_frugiperda.fa file_1_file_1_g22553.t1 gene=file_1_file_1_g22553 MNRLGMIVDLSHVGENTTRAAIKLSRAPVVFTHSSVYSLCNHKRNVPDDIIHSLKENGGIIMVNFFPDFVKCAPNATISDVAEHFHYIKRMVGADYVGIGGDFDGVNRVPRGLEDVSRYPELFAELLRSGQWTVQELKNLAGLNMLRVMRQVEKVRDEMRTNGVEPEEHPDSPNDNGNCTSNAFYTEYV The annotation file from eggNOG has the following: head Spodoptera_frugiperda.softmasked.prot.fa.emapper.annotations file_1_file_1_g22553.t1 13037.EHJ66618 2.39e-121 357.0…

Continue Reading Gene names/ids on annotated protein files

How can I easily remove overlapping transcripts, keeping only longest transcript, in a GFF file.

I have recently annotated a genome using Mikado. However, the resulting GFF file contains loci that overlap but are not annotated as isoforms, e.g.: chr17 Mikado_loci ncRNA_gene 1014098 1014976 17 - . ID=Fly1;Name=Fly1;multiexonic=True;superlocus=Mikado_superlocus:chr17-:1014059-1015028 chr17…

Continue Reading How can I easily remove overlapping transcripts, keeping only longest transcript, in a GFF file.
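One way to approach the question above is to cluster transcripts whose intervals overlap on the same sequence and strand, then keep the longest member of each cluster. This is a minimal sweep-line sketch. It assumes you have already pulled `(transcript_id, chrom, strand, start, end)` tuples from the transcript rows. It uses genomic span as "length", which is an assumption; summed exon or CDS length are reasonable alternatives. Overlap is treated transitively, so a chain A-B-C collapses to one transcript even if A and C do not touch. The example IDs are hypothetical.

```python
from itertools import groupby


def longest_per_cluster(transcripts):
    """Keep the longest transcript from each cluster of overlapping transcripts.

    transcripts: iterable of (transcript_id, chrom, strand, start, end), 1-based inclusive.
    """
    def span(t):
        return t[4] - t[3] + 1

    kept = []
    ordered = sorted(transcripts, key=lambda t: (t[1], t[2], t[3], t[4]))
    for _, group in groupby(ordered, key=lambda t: (t[1], t[2])):
        cluster, cluster_end = [], 0
        for t in group:
            if cluster and t[3] > cluster_end:  # starts after the cluster ends
                kept.append(max(cluster, key=span))
                cluster = []
            cluster.append(t)
            cluster_end = t[4] if len(cluster) == 1 else max(cluster_end, t[4])
        if cluster:
            kept.append(max(cluster, key=span))
    return [t[0] for t in kept]


example = [
    ("a", "chr17", "-", 100, 500),
    ("b", "chr17", "-", 450, 900),
    ("c", "chr17", "-", 1000, 1100),
    ("d", "chr17", "+", 100, 2000),
]
print(longest_per_cluster(example))
```

The kept IDs can then be used to filter the GFF, retaining each kept transcript together with its exon/CDS children and parent gene.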

What’s the absolute easiest way of visualizing stranded RNA-seq data on custom genomes?

I’ve googled and found nearly every thread on this topic on Biostars, and the consensus for visualizing stranded RNA-seq data seems to be splitting your BAM file into one file per strand. I’m dealing with…

Continue Reading What’s the absolute easiest way of visualizing stranded RNA-seq data on custom genomes?

warning message after HTSeq

I have analyzed RNA-seq data with HTSeq. The command I used is: python -m HTSeq.scripts.count -f bam -r pos -s reverse -t ORF -i group -m union input.bam my.gff > output.txt The warning message is: Warning: Mate records missing for 3752 records; first such record:…

Continue Reading warning message after HTSeq

Sort gff3 on chromosome, position and then featuretype (gene, mRNA, exon, CDS)

Is it possible to sort a GFF3 file by chromosome, position and then feature type (gene, mRNA, exon, CDS)? The order of the feature types is important when converting a GFF file to a GTF file with gffread. If the…

Continue Reading Sort gff3 on chromosome, position and then featuretype (gene, mRNA, exon, CDS)
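The requested order translates directly into a composite sort key: sequence ID, start, then a rank for the feature type. This is a minimal sketch. It ranks the four types named in the question and places any other type after them. As a tie-breaker it sorts longer features first. It drops the `###` directive, which no longer means anything once rows are re-sorted, and it drops any embedded `##FASTA` section.

```python
FEATURE_RANK = {"gene": 0, "mRNA": 1, "exon": 2, "CDS": 3}


def sort_gff3(lines):
    """Sort by seqid, start, then feature type (gene, mRNA, exon, CDS, others)."""
    header, features = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # drop an embedded FASTA section
        if line.startswith("###"):
            continue  # the '###' directive loses its meaning after re-sorting
        if line.startswith("#"):
            header.append(line)
        elif line.strip():
            features.append(line)

    def key(line):
        cols = line.split("\t")
        rank = FEATURE_RANK.get(cols[2], len(FEATURE_RANK))
        return cols[0], int(cols[3]), rank, -int(cols[4])  # longer first on ties

    return header + sorted(features, key=key)


example = [
    "##gff-version 3",
    "chr1\tsrc\tCDS\t100\t250\t.\t+\t0\tParent=t1",
    "chr1\tsrc\texon\t300\t400\t.\t+\t.\tParent=t1",
    "chr1\tsrc\texon\t100\t250\t.\t+\t.\tParent=t1",
    "chr1\tsrc\tgene\t100\t400\t.\t+\t.\tID=g1",
    "chr1\tsrc\tmRNA\t100\t400\t.\t+\t.\tID=t1;Parent=g1",
]
print("\n".join(sort_gff3(example)))
```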

gff2bed fails with “Non-numeric start coordinate. “

I am running gff2bed 2.4.41 and get the error: Warning: If your Wiggle data is a significant portion of available system memory, use the --max-mem and --sort-tmpdir options, or use --do-not-sort to disable post-conversion sorting. See --help for more information. Non-numeric start…

Continue Reading gff2bed fails with “Non-numeric start coordinate. “
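The post does not say which line triggers the error. Two plausible causes, both assumptions, are an embedded `##FASTA` section at the end of the GFF (Prokka, for example, writes one) and rows separated by spaces instead of tabs. Either would put non-numeric text where the converter expects a start coordinate. This minimal diagnostic sketch reports such rows by line number. It also includes a helper that keeps only the feature table before `##FASTA`. The example rows are hypothetical.

```python
def find_bad_rows(lines):
    """Report rows that would not parse as 9-column GFF with numeric coordinates."""
    problems = []
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            problems.append((n, "embedded ##FASTA section starts here"))
            break  # everything after this is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append((n, f"expected 9 tab-separated columns, found {len(cols)}"))
        elif not (cols[3].isdigit() and cols[4].isdigit()):
            problems.append((n, f"non-numeric start/end: {cols[3]!r}, {cols[4]!r}"))
    return problems


def features_only(lines):
    """Drop an embedded ##FASTA section, keeping the feature table."""
    kept = []
    for line in lines:
        if line.startswith("##FASTA"):
            break
        kept.append(line)
    return kept


example = [
    "##gff-version 3",
    "contig_1\tProkka\tCDS\t1\t300\t.\t+\t0\tID=x_00001",
    "contig_1\tProkka\tCDS\tstart\t300\t.\t+\t0\tID=x_00002",
    "contig_1 Prokka CDS 400 900 . - 0 ID=x_00003",
    "##FASTA",
    ">contig_1",
    "ACGT",
]
for n, reason in find_bad_rows(example):
    print(n, reason)
```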

STAR is running but .sam file size does not increase after hours mapping

Hi there, I’m using STAR with a small genome. My samples are paired-end. The commands are, for genome indexing: STAR --runThreadN 20 --runMode genomeGenerate --genomeDir /path/to/folder/Analyses/STAR/ --genomeFastaFiles /path/to/genome_reference/genome.fna --readFilesCommand zcat path/to/folder/with/giz_samples/R1.fq.gz R2.fq.gz --sjdbGTFfile path/to/genome_reference/genome.gff --genomeSAindexNbases 11…

Continue Reading STAR is running but .sam file size does not increase after hours mapping

HTSeq error processing GFF file

Hello, I am trying to run HTSeq, but it tells me that there is a problem in my GFF and GTF files. How can I fix this? enrique@L:~prueba_guess$ htseq-count -f bam SRR214880.bam -s no -i ID -r pos -t exon GCF_902167145.1_Zm-B73-REFERENCE-NAM-5.0_genomic.gff > SRR214880.txt 100000…

Continue Reading HTSeq error processing GFF file

Phylogenetic and AlphaFold predicted structure analyses provide insights for A1 aspartic protease family classification in Arabidopsis

Introduction Proteases regulate various biological processes including protein synthesis and maturation, activity modification, degradation and turnover. Depending on their catalytic mechanisms, these proteases are primarily classified into the cysteine, metallo-, serine, threonine and aspartic protease families (Beers et al., 2004). The latter protease family is known as the acid protease family because they…

Continue Reading Phylogenetic and AlphaFold predicted structure analyses provide insights for A1 aspartic protease family classification in Arabidopsis

Seqlengths of x contains NA values!

Hello, I would like to use ORFik to determine the coverage of the different ORFs across the maize genome. I have ribo-seq data, the latest annotation file (a GFF3), and the v5 genome fasta file for B73. After running my code, three Large CompressedGRangesLists are created and none of them…

Continue Reading Seqlengths of x contains NA values!

gff file from NCBI RefSeq GCF dataset has an invalid format

Thank you for noticing this. It is indeed an issue in the GFF3 file. The root of the problem is it’s a gene that is impossible to correctly represent in GFF3 because it incorporates sequence from both strands via trans_splicing. The complexity of this gene can be seen on the…

Continue Reading gff file from NCBI RefSeq GCF dataset has an invalid format

unable to open file or unable to determine types for file**

Error: unable to open file or unable to determine types for file. Hi, I am trying to run this command: bedtools intersect -a Ec_k12.gff -b target_genes.txt -f 0.5 -wa -wb > genes_with_coordinates.bed and I am getting this…

Continue Reading unable to open file or unable to determine types for file**

“Error parsing strand (?) from GFF line” happenning in gffread, stringtie and cufflinks

Hi! I’m working with various genomic data, and while trying to use gffread, StringTie and Cufflinks I ran into the same error: Error parsing strand (?) from GFF line: NC_037304.1 RefSeq gene 58315 59481 . ?…

Continue Reading “Error parsing strand (?) from GFF line” happenning in gffread, stringtie and cufflinks

Mapping a small number of contigs to a reference subsequence

Hello, is there a method or a tool to align/map a small number of contigs (obtained with Canu) to a subsequence extracted from a reference genome? For example, in a similar way to aligning reads to a reference (using bwa…

Continue Reading Mapping a small number of contigs to a reference subsequence

Bedtools sort -faidx producing empty output

Hello, I am trying to sort a GFF file using the FASTA index output from samtools faidx. Both of my input files have content, but when I use the command below to sort my GFF, I get no output to the screen or file. bedtools sort -faidx F._oxysporum_f._sp._cubense_UK0001.fna.fai -i ../Mimps/F._oxysporum_f._sp._cubense_UK0001.fna_mimp_hits.gff…

Continue Reading Bedtools sort -faidx producing empty output
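
The thread is truncated, but one cause worth ruling out is a naming mismatch: bedtools sort -faidx orders records by the sequence names in the first column of the .fai, so if the GFF seqids do not appear in the index, the records cannot be placed in that order. The Python sketch below mimics that ordering (index order, then start) and reports any seqids missing from the .fai; it is an illustration, not bedtools' own implementation, and it drops directive lines.

```python
def read_fai_order(fai_lines):
    """Map each sequence name in a samtools .fai index to its position."""
    order = {}
    for line in fai_lines:
        if line.strip():
            name = line.split("\t", 1)[0]
            order.setdefault(name, len(order))
    return order


def sort_gff_by_fai(gff_lines, fai_lines):
    """Sort GFF feature lines by .fai sequence order, then by start.
    Returns (sorted_lines, missing), where missing lists seqids that are
    absent from the index (a likely cause of empty output)."""
    order = read_fai_order(fai_lines)
    features, missing = [], set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # directives and comments are not kept
        cols = line.split("\t")
        if len(cols) < 9:
            continue
        if cols[0] not in order:
            missing.add(cols[0])
            continue
        features.append((order[cols[0]], int(cols[3]), line))
    features.sort(key=lambda item: (item[0], item[1]))
    return [line for _, _, line in features], sorted(missing)
```

If missing is non-empty, the seqids in the GFF (for example, contig names written by the tool that produced the hits) probably need renaming to match the FASTA headers.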

genbank sequence format

This document is an overview of the Entrez databases, with general information on… If you are not sure that the “Save” option in your program will do this for you, use “Save As”. In Excel, select “Save As” from the File menu. …optimizations to reduce memory…

Continue Reading genbank sequence format

Bug#1024835: python-pauvre: ftbfs with biopython 1.80

Source: python-pauvre Version: 0.2.3-1 Severity: important Tags: ftbfs Dear Maintainer, python-pauvre fails its build time test suite when built against biopython 1.80 in experimental but builds successfully against biopython 1.79. The relevant part of the log in case of failure shows: I: pybuild base:240: cd /<<PKGBUILDDIR>>/.pybuild/cpython3_3.10_pauvre/build; python3.10 -m unittest discover -v test_normal_plotting_scenario (pauvre.tests.test_synplot.libSeq_test_case) This…

Continue Reading Bug#1024835: python-pauvre: ftbfs with biopython 1.80

Bedtools Bam To Bed With Code Examples

In this article, we’ll look at some examples of how to address the Bedtools Bam To Bed problem: bedtools bamtobed [OPTIONS] -i <BAM>. As we have seen, a large number of examples were utilised in order to solve the Bedtools Bam To…

Continue Reading Bedtools Bam To Bed With Code Examples
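
The core of a BAM-to-BED conversion is coordinate arithmetic: SAM/BAM positions are 1-based, BED starts are 0-based with exclusive ends, and the alignment end is found by summing the CIGAR operations that consume reference bases (M, D, N, = and X). The Python sketch below applies this to SAM text lines, such as those printed by samtools view, producing chrom, start, end, name, MAPQ and strand. It is not a reimplementation of bedtools bamtobed; options such as -split and -bed12 are not modelled.

```python
import re

# CIGAR operations that consume reference bases
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Number of reference bases covered by an alignment's CIGAR string."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)


def sam_to_bed6(sam_line):
    """Convert one SAM alignment line to a BED6 tuple, or None for header
    lines and unmapped reads. SAM POS is 1-based; BED start is 0-based."""
    if sam_line.startswith("@"):
        return None
    fields = sam_line.rstrip("\n").split("\t")
    qname, flag, rname = fields[0], int(fields[1]), fields[2]
    pos, mapq, cigar = int(fields[3]), int(fields[4]), fields[5]
    if flag & 0x4 or cigar == "*":
        return None  # unmapped read
    start = pos - 1
    end = start + reference_span(cigar)
    strand = "-" if flag & 0x10 else "+"
    return (rname, start, end, qname, mapq, strand)
```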

Freebayes-parallel with large bam file – individual threads running for >6 days

Context: I’m trying to call variants on a sequencing project using pooled genotyping-by-sequencing. Pools consist of 94 samples each, alongside a number of individuals. Sequence data was demultiplexed and then aligned to a reference genome using hisat2, and the resultant bams were merged with samtools merge. The problem bam is…

Continue Reading Freebayes-parallel with large bam file – individual threads running for >6 days

Python pandas transforming int to float in gff subsetting

Hey guys, I’ve written this Python code:
import pandas as pd
from Bio import SeqIO
import argparse
parser = argparse.ArgumentParser(add_help=False)
parser.add_argument("-h", "--help", action="help", default=argparse.SUPPRESS, help="Get partial gff given a pattern on Names field")
parser.add_argument("-g", help="-g: gff file", required=True)
parser.add_argument("-l", help="-l: list of patterns to search on…

Continue Reading Python pandas transforming int to float in gff subsetting
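
A likely explanation, assuming the truncated script filters or merges rows: NumPy's int64 dtype cannot store NaN, so any pandas step that introduces missing values (a left merge, a reindex, or Series.where) silently upcasts the column to float64. The Python sketch below reads a GFF with explicit integer coordinates and subsets it with boolean indexing, which drops rows instead of masking them; the Name= matching rule is a hypothetical choice, and a ##FASTA section is assumed to be absent.

```python
import csv
import io

import pandas as pd

GFF_COLUMNS = ["seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes"]


def read_gff(text):
    """Load GFF feature lines into a DataFrame with int64 coordinates.
    Lines starting with '#' are dropped before parsing."""
    body = "".join(line for line in text.splitlines(keepends=True)
                   if line.strip() and not line.startswith("#"))
    return pd.read_csv(io.StringIO(body), sep="\t", header=None,
                       names=GFF_COLUMNS, quoting=csv.QUOTE_NONE,
                       dtype={"start": "int64", "end": "int64"})


def subset_by_name(df, pattern):
    """Keep rows whose attributes contain 'Name=<pattern>'.
    Boolean indexing removes rows rather than filling them with NaN,
    so start/end keep their int64 dtype."""
    mask = df["attributes"].str.contains(f"Name={pattern}", regex=False)
    return df[mask]
```

If missing values cannot be avoided, the nullable "Int64" dtype keeps integers alongside them: df["start"].where(mask) yields float64, whereas df["start"].where(mask).astype("Int64") gives integers with <NA>.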

can gff2 reference used in htseq-count?

Dear all, we are currently working with an E. coli plasmid and tried to summarize the gene counts from our RNA-Seq samples. The short reads were mapped to the E. coli plasmid using TopHat, which generated BAM files accordingly. However, we were unable to obtain a GFF3 version of our target plasmid genome, the…

Continue Reading can gff2 reference used in htseq-count?

Running synteny of 2 strain.

“If I have 2 strains of the same species, and I have genomic island regions in an Excel sheet, and now I want to view the synteny of those regions in both strains, how can I do this? I have used a tool named SynVisio there…

Continue Reading Running synteny of 2 strain.

The low successful assignment ratio of FeatureCounts

Hello, I would like to confirm whether the low assignment ratio (54%) is normal, and to ask you to check the possible reason I found. I used Hisat2 to align paired-end, strand-specific transcriptomic reads (rRNA removed) to a reference genome. Because I filtered out the unmapped sequences in advance, the overall assignment ratio…

Continue Reading The low successful assignment ratio of FeatureCounts

Use RSEM and Bowtie2 to align paired-end sequences

I want to use rsem-calculate-expression with the bowtie2 aligner to align paired-end sequences under the following conditions: 2 processors; generate a BAM file; very fast bowtie2 sensitivity; append gene/transcript names. My code:
rsem-refseq-extract-primary-assembly GCF_000001405.31_GRCh38.p5_genomic.fna GCF_000001405.31_GRCh38.p5_genomic.primary_assembly.fna
rsem-prepare-reference --gff3 GCF_000001405.31_GRCh38.p5_genomic.gff --bowtie2 --bowtie2-path /bowtie2-2.4.5-py39hd2f7db1_2 --trusted-sources…

Continue Reading Use RSEM and Bowtie2 to align paired-end sequences

Accurate assembly of multi-end RNA-seq data with Scallop2

Trapnell, C. et al. Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010). Guttman, M. et al. Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat. Biotechnol. 28,…

Continue Reading Accurate assembly of multi-end RNA-seq data with Scallop2

htseq-count error

Hi, I am trying to run htseq-count with the command htseq-count -f bam -s yes ~/htseq-trial/SRR13826419_Aligned.sortedByName.out.bam ~refgen/gencode.v39.primary_assembly.annotation.gtf > counts.txt but the err file shows: [E::idx_find_and_load] Could not retrieve index file for ‘~/htseq-trial/SRR13826419_Aligned.sortedByName.out.bam’ 100000 GFF lines processed. 200000 GFF lines processed. 300000 GFF lines processed. 400000 GFF lines…

Continue Reading htseq-count error

“transcript reads were aligned to the RefSeq transcriptome (downloaded March 2013) using Tophat” was done by original author. Please help me to get that 2013 transcriptome and gff data

Greetings all, I have RNA-Seq data generated with the CEL-Seq protocol. I want to replicate the same transcriptome mapping as a previous study. As they…

Continue Reading “transcript reads were aligned to the RefSeq transcriptome (downloaded March 2013) using Tophat” was done by original author. Please help me to get that 2013 transcriptome and gff data

Feature count is very low using htseq-count

Hello all, I ran BBMap on my paired-end RNA-seq data using the following command: bbmap.sh in1=J2_R1.fastq in2=J2_R2.fastq out=output_J2.sam ref=im4.fasta nodisk The header of the generated SAM file is: @HD VN:1.4 SO:unsorted @SQ SN:k141_1006 LN:2503 @SQ SN:k141_5512 LN:5393 @SQ SN:k141_4772 LN:4387 @SQ SN:k141_3267 LN:4531…

Continue Reading Feature count is very low using htseq-count
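
Two things worth checking, stated as assumptions since the post is cut off. htseq-count only counts a read if it overlaps a feature on a reference with the same name, so the @SQ SN: names in the SAM header (the k141_… contigs here) must match column 1 of the annotation. Also, htseq-count counts exon features by default (-t/--type) and groups them by gene_id (-i/--idattr), so an annotation with only CDS lines or different attribute names needs those options changed. The Python sketch below covers the first check.

```python
def sam_reference_names(header_lines):
    """Collect SN: values from the @SQ lines of a SAM header."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gff_seqids(gff_lines):
    """Collect column-1 sequence IDs from GFF/GTF feature lines."""
    ids = set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.split("\t")
        if len(cols) >= 9:
            ids.add(cols[0])
    return ids


def compare_names(header_lines, gff_lines):
    """Return (shared, only_in_sam, only_in_gff) sequence names."""
    sam_names = sam_reference_names(header_lines)
    gff_names = gff_seqids(gff_lines)
    return sam_names & gff_names, sam_names - gff_names, gff_names - sam_names
```

If shared is small relative to the other two sets, most reads will land in htseq-count's __no_feature bin regardless of other settings.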

Petabase-scale sequence alignment catalyses viral discovery

Serratus alignment architecture. Serratus (v0.3.0) (github.com/ababaian/serratus) is an open-source cloud infrastructure designed for ultra-high-throughput sequence alignment against a query sequence or pangenome (Extended Data Fig. 1). Serratus compute costs depend on search parameters (expanded discussion available: github.com/ababaian/serratus/wiki/pangenome_design). The nucleotide vertebrate viral pangenome search (bowtie2, database size: 79.8 MB) reached processing rates…

Continue Reading Petabase-scale sequence alignment catalyses viral discovery

How to label columns in HTSeq output

I’ve been working to process RNAseq data and I’ve used hisat2 to align my reads to the reference genome. When I take those output files and put them into HTSeq-count using the code below, I get a count matrix, but the columns…

Continue Reading How to label columns in HTSeq output
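
htseq-count writes a header-less table: the feature ID followed by one count column per input BAM, with summary rows such as __no_feature and __ambiguous at the end. A minimal pandas sketch for attaching sample labels, assuming you supply the names in the same order as the BAM files were given (the sample names in any call are your own, hypothetical labels):

```python
import io

import pandas as pd


def load_htseq_counts(text, sample_names):
    """Parse htseq-count output (feature ID, then one count column per BAM,
    no header) and label the count columns with sample_names.
    Summary rows whose IDs start with '__' (e.g. __no_feature) are dropped."""
    table = pd.read_csv(io.StringIO(text), sep="\t", header=None, index_col=0)
    if table.shape[1] != len(sample_names):
        raise ValueError(
            f"expected {table.shape[1]} sample names, got {len(sample_names)}")
    table.columns = list(sample_names)
    table.index.name = "gene_id"
    keep = [not str(feature).startswith("__") for feature in table.index]
    return table.loc[keep]
```

If each sample was counted into its own file, load each with a single name and combine them with pd.concat(tables, axis=1).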

Indexing with STAR

Hello, I am working with RNA-seq data and creating an index of the Gossypium hirsutum reference genome using STAR. STAR expects annotations in GTF format, while my file is GFF3. According to the literature, in order to use a GFF file I need to remove --sjdbOverhang 50 and…

Continue Reading Indexing with STAR

For Differential Gene Expression , which indexing format is better: GFF or GTF?

Hello, I am working on DGE and wish to create a reference index for mapping. Two file formats are used for this: GFF and GTF. My question is: what is the major difference between GTF and GFF?…

Continue Reading For Differential Gene Expression , which indexing format is better: GFF or GTF?

How to retrieve fasta sequence after local blast?

Hello, I have created a BLAST database from a reference genome. Then I performed a local BLAST search on the command line using a gene of interest. I obtained some hits with the usual BLAST information. Now, I want to…

Continue Reading How to retrieve fasta sequence after local blast?
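
If the database was built with -parse_seqids, blastdbcmd with -entry, -range and -strand can extract hit regions directly. As a library-free alternative, the Python sketch below reads tabular BLAST output (-outfmt 6, whose default columns put sseqid, sstart and send in columns 2, 9 and 10) and slices each subject region from the reference FASTA, reverse-complementing minus-strand hits. It assumes sseqid equals the first word of each FASTA header; the output header format is a hypothetical choice.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Return {first word of header: sequence} from FASTA lines."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def hits_to_fasta(blast_lines, genome):
    """Yield a FASTA record for each hit in BLAST -outfmt 6 output.
    sstart/send are 1-based and inclusive; sstart > send marks a
    minus-strand hit, which is reverse-complemented."""
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        qseqid, sseqid = cols[0], cols[1]
        sstart, send = int(cols[8]), int(cols[9])
        lo, hi = min(sstart, send), max(sstart, send)
        seq = genome[sseqid][lo - 1:hi]
        if sstart > send:
            seq = seq.translate(COMPLEMENT)[::-1]
        yield f">{qseqid}|{sseqid}:{sstart}-{send}\n{seq}\n"
```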

Conversion Of Gff3 To Gtf

How do I convert a GFF file to a GTF file? Is there any tool available? The easiest way is to use the gffread program that comes with the Cufflinks software suite (Tuxedo):
gffread my.gff3 -T -o my.gtf
See gffread…

Continue Reading Conversion Of Gff3 To Gtf
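
gffread remains the practical tool; the Python sketch below only illustrates the core syntactic difference it handles. GFF3 column 9 uses key=value pairs linked by ID/Parent, while GTF uses key "value"; pairs and expects gene_id and transcript_id on each exon line. The sketch assumes exons point to a transcript (mRNA or transcript) via Parent and that transcripts point to a gene; multi-parent features, CDS lines and percent-escaping are deliberately ignored.

```python
def parse_gff3_attributes(column9):
    """Parse 'key=value;key=value' into a dict (no unescaping)."""
    attrs = {}
    for field in column9.strip().strip(";").split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_exons_to_gtf(lines):
    """Minimal GFF3 -> GTF conversion for exon features only."""
    rows = [line.rstrip("\n").split("\t") for line in lines
            if line.strip() and not line.startswith("#")]
    rows = [cols for cols in rows if len(cols) == 9]
    # First pass: map each transcript ID to its parent gene ID
    transcript_to_gene = {}
    for cols in rows:
        attrs = parse_gff3_attributes(cols[8])
        if cols[2] in ("mRNA", "transcript") and "ID" in attrs:
            transcript_to_gene[attrs["ID"]] = attrs.get("Parent", attrs["ID"])
    # Second pass: rewrite exon attributes in GTF syntax
    out = []
    for cols in rows:
        if cols[2] != "exon":
            continue
        tid = parse_gff3_attributes(cols[8]).get("Parent")
        if tid not in transcript_to_gene:
            continue
        gtf_attrs = f'gene_id "{transcript_to_gene[tid]}"; transcript_id "{tid}";'
        out.append("\t".join(cols[:8] + [gtf_attrs]))
    return out
```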

how to identify real isomers in mirge3.0’s output files.

How do you distinguish/extract ‘real’ isomiRNAs from the exhaustive output of miRge3.0? I’m trying to do a differential expression analysis on the miRNA isomers in my dataset. I’m using miRge3.0 with the -gff and other outputs (basically all of…

Continue Reading how to identify real isomers in mirge3.0’s output files.

Submit sequence data to NCBI

Data provision and standards. GEO sequence submission procedures are designed to encourage provision of MINSEQE elements: Thorough descriptions of the biological samples under investigation, and procedures to which they were subjected. Thorough descriptions of the protocols used to generate and process the data. Request updates to accessioned records per the…

Continue Reading Submit sequence data to NCBI

How to get genome feature annotation through NCBI API?

Hi, I want to get the whole-genome annotation with information such as transcripts, exons and genes. As we know, NCBI provides a GFF file containing this information, but I want to get the latest content from NCBI…

Continue Reading How to get genome feature annotation through NCBI API?

Refseq annotation for processed/unprocessed Pseudogenes

Hi, I have extracted the pseudogenes from the RefSeq annotation file. However, the GFF file contains no information on whether each pseudogene is processed or unprocessed. On the other hand, Ensembl/GENCODE GFF files do have this type of information. The problem is not…

Continue Reading Refseq annotation for processed/unprocessed Pseudogenes

How to write gffutils.feature.Feature object to file

How do you most efficiently write a collection of gffutils.feature.Feature objects to a file, so that you can create a GFF3 file from them? I am trying to create a GFF3 file without the ##FASTA part at the bottom,…

Continue Reading How to write gffutils.feature.Feature object to file
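
As far as we know, str() on a gffutils Feature returns its tab-delimited GFF line, so writing a collection amounts to emitting the version pragma and one line per feature; no ##FASTA section appears unless you add one. This Python sketch streams lines rather than building one large string, and it accepts any objects whose str() is a GFF line, so it runs without gffutils installed; verify the str() behaviour against your gffutils version.

```python
def gff3_lines(features):
    """Yield the GFF3 version pragma, then one line per feature.
    Assumes str(feature) is a single tab-delimited GFF line."""
    yield "##gff-version 3\n"
    for feature in features:
        yield str(feature).rstrip("\n") + "\n"


def write_gff3(features, path):
    """Stream features to path as GFF3 without a ##FASTA section."""
    with open(path, "w") as handle:
        handle.writelines(gff3_lines(features))
```

For example, write_gff3(db.features_of_type("gene"), "genes.gff3"), where db is an existing gffutils FeatureDB and the output path is hypothetical.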

read count to gene

I am using this command to get read counts per gene with bedtools intersect: samtools view -Shu -q10 -@ 20 UE-2955-CMLib12_sorted.bam | bedtools intersect -c -a GCA_900659725.1_ASM90065972v1_genomic.gff -b stdin > UE-2955-CMLib{i}_intersect_counts2.bed The command works for other files but not for one file. Which…

Continue Reading read count to gene

fetch out common/conserved genes from a bunch of bacteria species

Hi all, I am having difficulty identifying and extracting the common/conserved regulator genes from a set of species. I extracted all the regulator genes from each bacterial species according to the GFF annotation. I would like…

Continue Reading fetch out common/conserved genes from a bunch of bacteria species
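
A naive first pass is to intersect the gene names taken from each species' GFF. The Python sketch below does that, assuming the regulator genes carry a comparable Name= attribute on their gene lines; bacterial gene names are often missing or inconsistent, so sequence-based orthology inference (for example reciprocal BLAST hits) is more reliable for deciding which genes are conserved.

```python
def gene_names(gff_lines, feature_type="gene", key="Name"):
    """Collect the values of one attribute (default Name) from GFF3
    lines of a given feature type."""
    names = set()
    prefix = key + "="
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        for field in cols[8].split(";"):
            field = field.strip()
            if field.startswith(prefix):
                names.add(field[len(prefix):])
    return names


def shared_genes(per_species):
    """per_species maps species -> set of gene names; return the names
    present in every species (empty set if no species are given)."""
    sets = list(per_species.values())
    return set.intersection(*sets) if sets else set()
```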

gffread error

Hello, I am currently trying to analyze public RNA-seq data for Brassica juncea. To use htseq-count to make a count table, I have to convert the GFF file downloaded from the Brassica database to a GTF file. So I used gffread to convert the GFF file with the command below: gffread Bju.genome.gff -T -o…

Continue Reading gffread error

Incubator for useful bioinformatics code, primarily in Python and R

Collection of useful code related to biological analysis. Much of this is discussed with examples at Blue collar bioinformatics. All code, images and documents in this repository are freely available for all uses. Code is available under the MIT license and images, documentations and talks under the Creative Commons No…

Continue Reading Incubator for useful bioinformatics code, primarily in Python and R