Tag: gff3

Extract fasta sequence from gff3 file

Hi everyone, I have many .gff3 files that contain CDS features followed by the FASTA sequence. The sequence is separated from the CDS features like this: ##FASTA >NZ_NZ_LR130533.1 I would like to extract all the FASTA sequences into new fasta…
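A minimal Python sketch, assuming each file follows the GFF3 convention that everything after a `##FASTA` directive line is plain FASTA. The helper names and the output naming (same stem, `.fasta` suffix) are hypothetical.

```python
from pathlib import Path


def extract_fasta(gff3_text: str) -> str:
    """Return everything after the ##FASTA directive, or "" if the file has none."""
    lines = gff3_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "##FASTA":
            fasta = lines[i + 1:]
            # Re-join the sequence section and keep a trailing newline.
            return "\n".join(fasta) + "\n" if fasta else ""
    return ""


def extract_fasta_file(gff3_path: Path, out_path: Path) -> None:
    """Write the sequence section of one GFF3 file to a separate FASTA file."""
    out_path.write_text(extract_fasta(gff3_path.read_text()))
```

To process a whole directory, loop over `Path(".").glob("*.gff3")` and call `extract_fasta_file(p, p.with_suffix(".fasta"))` for each file.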

Continue Reading Extract fasta sequence from gff3 file

The Biostar Herald for Monday, December 11, 2023

The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contributions from Istvan Albert and cmdcolin, and was edited by…

Continue Reading The Biostar Herald for Monday, December 11, 2023

Where can I get a list of SNPs mapping overlapping genes in humans?

Given files genes.bed and snps.bed, you could do something like:
$ bedmap --echo --echo-map-id --delim '\t' genes.bed snps.bed > answer.bed
The file answer.bed will contain the gene annotation and a semicolon-delimited list of SNP identifiers that overlap each gene. In order to get genes.bed, you could use Gencode v44…
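To show what the overlap step computes, here is a naive pure-Python sketch. It is not bedmap itself. It assumes BED coordinates (0-based, half-open) and uses the 4th column as the gene or SNP name. It scans every gene against every SNP, so it only suits small inputs. bedmap scales better but expects inputs sorted with sort-bed.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int, str]  # (chrom, start, end, name); BED 0-based half-open


def parse_bed(text: str) -> List[Interval]:
    """Read the first four BED columns, skipping blank, comment, track and browser lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name = line.split("\t")[:4]
        rows.append((chrom, int(start), int(end), name))
    return rows


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals overlap if they share a chromosome and each starts before the other ends."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def snps_per_gene(genes: List[Interval], snps: List[Interval]) -> Dict[str, List[str]]:
    """Map each gene name to the IDs of overlapping SNPs (naive O(genes * snps) scan)."""
    return {g[3]: [s[3] for s in snps if overlaps(g, s)] for g in genes}
```

Joining each list with `";".join(ids)` gives output similar to the semicolon-delimited column described above. Genes with duplicate names would collapse into a single dictionary key.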

Continue Reading Where can I get a list of SNPs mapping overlapping genes in humans?

How to download multiple genome files using command line (MacOS) using datasets

datasets download genome accession --inputfile accessions.txt --include gff3,gbff,rna,cds,protein,genome,seq-report
Or simply specify multiple accessions on the command line:
datasets download genome accession GCF_000001405.40 GCA_003774525.2 GCA_000001635
Edit: Sorry, I overlooked the --inputfile option. This is necessary unless all accessions are from a common taxon or BioProject. In the first case you can…

Continue Reading How to download multiple genome files using command line (MacOS) using datasets

Fastest way to convert BED to GTF/GFF with gene_ids?

This is probably a duplicate of: How To Convert Bed Format To Gtf?; How to convert original BED file to a GTF?; Converting different annotation file formats (GTF/GFF/BED) to each other; How to change scaffold.fasta file or scaffold.bed file to GTF file?; Convert bed12 to GFF; convert bed12…
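A minimal sketch of the core conversion, assuming simple BED4/BED6 input with one feature per line (BED12 blocks are not expanded). The BED name is reused as both `gene_id` and `transcript_id`. The source label `bed2gtf` and the feature type `exon` are hypothetical choices. The only coordinate change is the start, because BED is 0-based half-open and GTF is 1-based inclusive.

```python
from typing import Iterable, List


def bed_to_gtf_line(bed_line: str, source: str = "bed2gtf", feature: str = "exon") -> str:
    """Convert one BED4+ line to a GTF line, using the BED name as gene_id and transcript_id."""
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
    strand = fields[5] if len(fields) > 5 else "."
    # BED is 0-based half-open, GTF is 1-based inclusive: shift the start, keep the end.
    attributes = f'gene_id "{name}"; transcript_id "{name}";'
    return "\t".join([chrom, source, feature, str(start + 1), str(end), ".", strand, ".", attributes])


def bed_to_gtf(lines: Iterable[str]) -> List[str]:
    """Convert every data line, skipping blank, comment, track and browser lines."""
    return [
        bed_to_gtf_line(line)
        for line in lines
        if line.strip() and not line.startswith(("#", "track", "browser"))
    ]
```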

Continue Reading Fastest way to convert BED to GTF/GFF with gene_ids?

HTseq reports missing attribute name

Hello, I am running this htseq command:
htseq-count -r pos -t gene -i gene -s yes -f bam \
  /Volumes/cachannel/ZebraFinchBrain/CB-4a_genomemapping/sorted_alignmentcb4a.bam \
  /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/Taeniopygia_guttata.bTaeGut1_v1.p.110.chr.gff3 > \
  /Volumes/cachannel/ZebraFinchBrain/HTSEQ_withautomate/output_counts.txt
However, I get this error: Error processing GFF file (line 75 of file /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/Taeniopygia_guttata.bTaeGut1_v1.p.110.chr.gff3): Feature gene:ENSTGUG00000013637 does not contain…
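In htseq-count, `-t` selects the feature type (column 3) and `-i` names the column-9 attribute used as the feature ID. The error says the `gene` feature has no attribute with the requested name. A small diagnostic sketch, not part of HTSeq, can list which attribute keys each feature type actually carries, so you can choose an `-i` value that exists in your file. It does not decode percent-escaped values.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def parse_gff3_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 column 9 ('key=value;key=value') into a dict."""
    attrs = {}
    for part in column9.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def attribute_keys_by_type(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect the attribute keys seen for each feature type (column 3)."""
    keys: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # skips FASTA lines and malformed rows
        keys[fields[2]].update(parse_gff3_attributes(fields[8]))
    return dict(keys)
```

With `-t gene`, any key reported in the `gene` set is a candidate for `-i`.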

Continue Reading HTseq reports missing attribute name

Functional filter for whole-genome sequencing data identifies HHT and stress-associated non-coding SMAD4 polyadenylation site variants >5 kb from coding DNA

Summary: Despite whole-genome sequencing (WGS), many cases of single-gene disorders remain unsolved, impeding diagnosis and preventative care for people whose disease-causing variants escape detection. Since early WGS data analytic steps prioritize protein-coding sequences, to simultaneously prioritize variants in non-coding regions rich in transcribed and critical regulatory sequences, we developed GROFFFY,…

Continue Reading Functional filter for whole-genome sequencing data identifies HHT and stress-associated non-coding SMAD4 polyadenylation site variants >5 kb from coding DNA

Error with HTseq RNAseq read count – rna-seq

Hi, I am getting an error while running HTSeq. This is the command and the error:
htseq-count -q -f bam -s yes Ac1_mapped/ac1_mappedAligned.bam /global/home/users/catalinacastro/star/genome/genomic_v2.gtf count.txt
Error occurred when processing GFF file (line 637338 of file /global/home/users/catalinacastro/star/genome/genomic_v2.gtf): not enough values to unpack (expected 9, got 1) [Exception type: ValueError, raised in __init__.py:221]…
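"expected 9, got 1" suggests that at least one line split into a single field. A line with no tab characters, such as a blank line or a space-delimited line, would do that. This small checker sketch reports every non-comment line whose tab-separated field count is not 9, so you can inspect the offending lines, starting with line 637338.

```python
from typing import Iterable, List, Tuple


def malformed_lines(lines: Iterable[str], expected: int = 9) -> List[Tuple[int, int]]:
    """Return (line_number, field_count) for non-comment lines without `expected` tab-separated fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        if stripped.startswith("#"):
            continue
        n_fields = len(stripped.split("\t"))  # a blank line splits into one empty field
        if n_fields != expected:
            bad.append((number, n_fields))
    return bad
```

For example, `with open("genomic_v2.gtf") as fh: print(malformed_lines(fh)[:10])` lists the first ten problem lines.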

Continue Reading Error with HTseq RNAseq read count – rna-seq

The Imageable Genome | Nature Communications

For the Imageable Genome project, we developed a data pipeline that identifies texts containing radiotracers, recognizes and extracts names of radiotracers from texts, filters for clinically relevant radiotracers and their associated targets, and translates protein names, i.e. of radiotracer targets, to names of the coding genes. We then downloaded the…

Continue Reading The Imageable Genome | Nature Communications

Braker – gc_content.stderr

Good day! I am new to genome annotation. I want to annotate my genome using the Braker tool. I assembled my sample using SOAPdenovo2 and got a genome size of 1.3 Gb. Now, when I try to annotate my genome, I get a GC content error (I have shared the error…

Continue Reading Braker – gc_content.stderr

An extremely fast Non-Overlapping Exon Length calculator written in Rust

Hi all! Introducing the Non-Overlapping Exon Length calculator (NOEL), an extremely fast GTF/GFF per gene exon length extractor written in Rust. See the code and latest updates here: github/alejandrogzi/noel In case you do not want to read the whole text: NOEL outperforms all open-sourced scripts/tools for this task. It can…
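The underlying task, non-overlapping exon length per gene, amounts to merging each gene's exon intervals and summing the merged spans. The following is a small Python illustration of that idea, not NOEL's Rust implementation. It assumes GTF input with a `gene_id "…"` attribute and gene IDs that are unique across chromosomes.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def merged_length(intervals: List[Tuple[int, int]]) -> int:
    """Bases covered by 1-based inclusive intervals, counting overlapping bases once."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)  # overlaps the current run: extend it
        else:
            if cur_end is not None:
                total += cur_end - cur_start + 1  # close the previous run
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def exon_lengths_per_gene(gtf_lines: Iterable[str]) -> Dict[str, int]:
    """Non-overlapping exon length per gene_id, from GTF 'exon' lines."""
    exons: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        match = GENE_ID.search(fields[8])
        if match:
            exons[match.group(1)].append((int(fields[3]), int(fields[4])))
    return {gene: merged_length(iv) for gene, iv in exons.items()}
```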

Continue Reading An extremely fast Non-Overlapping Exon Length calculator written in Rust

Converting STAR Gene-level alignment to TPM expression

Hi, I have recently performed gene-level alignment with STAR on 20 samples with the parameters --quantMode GeneCounts and --outSAMtype BAM SortedByCoordinate. I have the output files ReadsPerGene.out.tab and Aligned.sortedByCoord.out.bam. From these, how can I generate reliable TPM values with either the sorted…
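TPM divides each gene's count by its length, then rescales so the values sum to one million: TPM_i = (c_i / l_i) / Σ_j (c_j / l_j) × 10^6. ReadsPerGene.out.tab does not contain gene lengths. You need a separate length per gene, for example a non-overlapping exon length from your GTF. The sketch below assumes such a `lengths` mapping exists. It reads STAR's table, which starts with four `N_` summary rows and has an unstranded count column followed by two strand-specific columns. Pick the column that matches your library strandedness.

```python
from typing import Dict


def read_star_counts(path: str, column: int = 1) -> Dict[str, int]:
    """Read ReadsPerGene.out.tab, skipping the N_* summary rows.

    column=1 is the unstranded count; 2 and 3 are the two strand-specific counts.
    """
    counts = {}
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if fields[0].startswith("N_"):
                continue
            counts[fields[0]] = int(fields[column])
    return counts


def tpm(counts: Dict[str, float], lengths: Dict[str, float]) -> Dict[str, float]:
    """Length-normalise counts, then scale so all values sum to 1e6.

    Genes without a positive length are dropped.
    """
    rates = {g: counts[g] / lengths[g] for g in counts if lengths.get(g, 0) > 0}
    total = sum(rates.values())
    if total == 0:
        return {g: 0.0 for g in rates}
    return {g: r / total * 1e6 for g, r in rates.items()}
```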

Continue Reading Converting STAR Gene-level alignment to TPM expression

How do I write a correctly formatted gff3 file in R?

Dear all, I am trying to annotate non-coding RNA in a small RNA-seq dataset. The RNAcentral gff3 file that I am using has different chromosome identifiers than the genome assembly. I have loaded the gff3 file in R, where I changed the chromosome identifiers using the assembly report and…
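The renaming logic is the same in any language. Here it is sketched in Python rather than R. The sketch assumes you have already built a dictionary from old to new sequence IDs from the assembly report; the example mapping is hypothetical. It rewrites column 1 and the IDs in `##sequence-region` directives, and it keeps fields tab-separated, which is what GFF3 requires. Keep `##gff-version 3` as the first line of the written file.

```python
from typing import Dict, Iterable, Iterator


def rename_seqids(lines: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Yield GFF3 lines with column 1 and ##sequence-region IDs renamed via `mapping`.

    IDs missing from the mapping are left unchanged.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##sequence-region"):
            parts = line.split()
            if len(parts) >= 2:
                parts[1] = mapping.get(parts[1], parts[1])
            yield " ".join(parts)
        elif line.startswith("#") or not line.strip():
            yield line  # other directives and comments pass through
        else:
            fields = line.split("\t")
            fields[0] = mapping.get(fields[0], fields[0])
            yield "\t".join(fields)
```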

Continue Reading How do I write a correctly formatted gff3 file in R?

file conversion from gtf to gff3 for evidence modeler

Hi, could you please guide me on how to convert the StringTie output file stringtie_transcript.gtf into .gff3 format for EVidenceModeler genome annotation?

Continue Reading file conversion from gtf to gff3 for evidence modeler

VG autoindex with pangenome constructed using minigraph-cactus

Dear developers, I am trying to construct a reference pangenome of a fungal species. After successfully constructing my pangenome using minigraph-cactus, I am struggling to add my isolates’ annotations. For some background: We have de novo assembled and annotated 11 isolates and used the current reference (which has a chromosomal…

Continue Reading VG autoindex with pangenome constructed using minigraph-cactus

What should I do with STAR two-pass novel splice junctions?

Hi, I have a few relatively naive questions about things I don’t fully understand. I know that STAR two-pass mode can detect novel splice junctions on top of the annotations from GTF/GFF3 files. Let’s say I run a…

Continue Reading What should I do with STAR two-pass novel splice junctions?

ROSE Algorithm: index out of range

Hi again, I am trying to run the ROSE algorithm created by the Young lab, URL here: younglab.wi.mit.edu/super_enhancer_code.html Specifically, I am running the ROSE_main.py script: younglab.wi.mit.edu/super_enhancer_code.html I created a Python 2.7 environment to run the script, since it is compatible with Python 2.7. When I run the script in Ubuntu:…

Continue Reading ROSE Algorithm: index out of range

How Do I Convert From Bed Format To Gff Format?

I have a file in GFF format and I need to convert it to BED format. What do I do? Both formats are tab-delimited text files used to represent DNA features in…
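For the GFF-to-BED direction described in the question, a minimal sketch: take one GFF3 feature line, subtract 1 from the start (GFF is 1-based inclusive, BED is 0-based half-open), and emit BED6. Using the `ID` attribute as the BED name, falling back to the feature type, is a hypothetical choice. Scores are passed through, but UCSC-style BED expects integers from 0 to 1000.

```python
def gff_to_bed_line(gff_line: str) -> str:
    """Convert one GFF3 feature line to BED6: chrom, start, end, name, score, strand."""
    fields = gff_line.rstrip("\n").split("\t")
    chrom, _source, feature_type, start, end, score, strand = fields[:7]
    attributes = dict(
        part.split("=", 1) for part in fields[8].split(";") if "=" in part
    ) if len(fields) > 8 else {}
    name = attributes.get("ID", feature_type)  # fall back to the feature type if there is no ID
    bed_score = "0" if score == "." else score
    # GFF is 1-based inclusive, BED is 0-based half-open: subtract 1 from the start only.
    return "\t".join([chrom, str(int(start) - 1), end, name, bed_score, strand])
```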

Continue Reading How Do I Convert From Bed Format To Gff Format?

How to check RNAseq support for annotated genes?

Hello all, I have a set of annotated genes in gff3 format and corresponding RNA-seq data. What is the recommended approach, and are there specific tools and parameters, to determine the percentage of genes supported by the RNA-seq data? Regards, B…

Continue Reading How to check RNAseq support for annotated genes?

Assistance with Fungal Genome Annotation Using Maker and BLAST

Hello everyone, I’m a new user of Maker and I’m seeking assistance with the protocol I’m using. Currently, I’m working on annotating the genome of a non-model ascomycete fungal species belonging to the Sporocadaceae family. After running the analysis with Maker, I obtained FASTA and GFF files using fasta_merge and…

Continue Reading Assistance with Fungal Genome Annotation Using Maker and BLAST

there are extra regions when calculating Tajima’s D per gene

Hello all, I am new to PopGenome and would like to ask a question that has greatly confused me. I was trying to calculate Tajima’s D per gene for my whole-genome data. I imported the gff files and subset the data by “gene”. See my code below. However, when I…

Continue Reading there are extra regions when calculating Tajima’s D per gene

How to order a gff3 file by coordinates

I have discovered that my gff3 file is not in order at the time of defining the gene, mRNA and CDS. An example LG1 phytozomev10 gene 10835748 10846741 . – . ID=gene00257-v1.0-hybrid.v1.1;Name=gene00257-v1.0-hybrid LG1 phytozomev10 mRNA 10835748 10846741 . – . ID=mrna00257.1-v1.0-hybrid.v1.1;Name=mrna00257.1-v1.0-hybrid;pacid=27244575;longest=1;Parent=gene00257-v1.0-hybrid.v1.1 LG1 phytozomev10 CDS 10846566 10846741 . – 2 ID=mrna00257.1-v1.0-hybrid.v1.1.CDS.1;Parent=mrna00257.1-v1.0-hybrid.v1.1;pacid=27244575…
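A minimal Python sketch of a coordinate sort. It orders features by seqid, then start, then longer features first. At equal coordinates it places a gene before its mRNA and an mRNA before its exons and CDS. The type ranking is a hypothetical choice, seqids are compared as plain strings, and the input is assumed to have no embedded `##FASTA` section. Dedicated tools such as GenomeTools (`gt gff3 -sort`) are a more robust option.

```python
from typing import List, Tuple

# Hypothetical ranking so that, at equal coordinates, parents precede children.
TYPE_RANK = {"gene": 0, "mRNA": 1, "transcript": 1, "exon": 2, "CDS": 3}


def sort_key(line: str) -> Tuple[str, int, int, int]:
    fields = line.split("\t")
    # seqid, start, longer features first, then parent-before-child type order
    return (fields[0], int(fields[3]), -int(fields[4]), TYPE_RANK.get(fields[2], 9))


def sort_gff3(lines: List[str]) -> List[str]:
    """Keep '#' lines on top and sort feature lines by coordinates."""
    header = [line for line in lines if line.startswith("#")]
    features = [line for line in lines if line.strip() and not line.startswith("#")]
    return header + sorted(features, key=sort_key)
```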

Continue Reading How to order a gff3 file by coordinates

How to set weight for merge legacy annotations

Good day. I am new to genome annotation. I am running MAKER to merge evidence from EST, homolog, AUGUSTUS and BRAKER. The following is my maker_opts.ctl:
genome=genome.fasta
est_gff=transcript.gff3
protein=homolog.gffs
pred_gff=Augustus.gff3, Braker.gff3
I have executed "mpiexec -n 30 maker > maker.out"…

Continue Reading How to set weight for merge legacy annotations

how to use RNAseq data to assist annotation?

Hello, I am performing a MAKER annotation of a de novo plant genome. I have RNA sequencing reads (Illumina paired-end 150 bp) to include in the annotation. However, I am confused about the inputs MAKER allows in the maker_opts.ctl file. I…

Continue Reading how to use RNAseq data to assist annotation?

CDS phase 0,1,2 in GFF format

This question has been asked before (Calculate CDS phase in gff3 format; Negative value in “phase” line of a gff3 file. What does it mean?; etc.), but I still don’t get it. So let’s use an existing GFF3 file: github.com/samtools/bcftools/blob/develop/test/csq/ENST00000580206/short.gff The GFF3 is valid for ‘bcftools csq’. This is…
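In GFF3, phase is the number of bases to remove from the start of a CDS segment to reach the first base of the next codon. Walking the segments 5' to 3' along the transcript (ascending coordinates on `+`, descending on `-`), each phase follows from the coding bases already consumed: phase = (3 - consumed mod 3) mod 3. A small sketch, assuming the first segment starts on a codon boundary (phase 0):

```python
from typing import List, Tuple


def cds_phases(segments: List[Tuple[int, int]], strand: str) -> List[int]:
    """GFF3 phase for each CDS segment (1-based inclusive), returned in input order.

    Segments are walked 5'->3' along the transcript: ascending start on '+',
    descending start on '-'. Assumes the first segment has phase 0.
    """
    order = sorted(range(len(segments)), key=lambda i: segments[i][0], reverse=(strand == "-"))
    phases = [0] * len(segments)
    consumed = 0  # coding bases seen so far, in transcript order
    for i in order:
        start, end = segments[i]
        phases[i] = (3 - consumed % 3) % 3
        consumed += end - start + 1
    return phases
```

For example, a first segment of 176 bp leaves 176 mod 3 = 2 bases of an incomplete codon, so the next segment needs 1 base to finish it and gets phase 1.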

Continue Reading CDS phase 0,1,2 in GFF format

downloading genomes in fasta format from accession ids

Hi all, I have a list of accession numbers (GCF/GCA) and I want to download their complete genomes from NCBI in FASTA format. I saw many recommendations to use the NCBI datasets and dataformat tools; is it really the…

Continue Reading downloading genomes in fasta format from accession ids

Using RNAcentral to explore and investigate non-coding RNA sequences

Non-coding RNAs (ncRNAs) are key molecules for life. They are involved in many complex biological processes, from gene regulation to translation. This complexity has led to an explosion of databases with specialised focuses and data. RNAcentral provides users with a single entry point into the complexity of ncRNA…

Continue Reading Using RNAcentral to explore and investigate non-coding RNA sequences

How can I transfer gene models to a new assembly?

Here’s my data: sample_A: canonical assembly with gene models (sample_A.fasta, sample_A.gff3); sample_B: mutant, de novo assembly, no gene models (sample_B.fasta). I want to transfer the gene models from sample_A to sample_B. I thought this would be straightforward, but it’s…

Continue Reading How can I transfer gene models to a new assembly?

Converting FASTA/FASTQ file into GFF3/GTF

I have tried to convert a FASTA/FASTQ file into a GFF3/GTF file. Firstly, I converted the FASTA/FASTQ file into BAM (with samtools) as well as into a BED file, and then converted them into a GFF file. But the…

Continue Reading Converting FASTA/FASTQ file into GFF3/GTF

Mapping RNA-Seq reads onto viral genome

Hi everyone, I have 6 files of paired-end 75 nt RNA-Seq reads from HEK293 cells that I want to map onto the AAV genome. I got the reference genome as a FASTA file and the annotation file as GFF3/GTF from NCBI. For mapping onto the…

Continue Reading Mapping RNA-Seq reads onto viral genome

jannovar download problem

I am trying to convert some HGVS notations to chrom:pos:ref:alt format. I was thinking of using Jannovar. Following the documentation, I ran: jannovar download -d hg19/refseq which gives me this: Options JannovarDownloadOptions [downloadDir=data, getDataSourceFiles()=[bundle:///default_sources.ini], isReportProgress()=true, getHttpProxy()=null, getHttpsProxy()=null, getFtpProxy()=null, geneIdentifiers=[], outputFile=] Downloading/parsing for data source "hg19/refseq" INFO…

Continue Reading jannovar download problem

reference annotation for the human and mouse genomes in 2023

Nucleic Acids Research, 2023, Vol. 51, Database issue, D942–D949. Published online 24 November 2022. doi.org/10.1093/nar/gkac1071. GENCODE: reference annotation for the human and mouse genomes in 2023. Adam Frankish, Sílvia Carbonell-Sala, Mark Diekhans, Irwin Jungreis, Jane E. Loveland, Jonathan M. Mudge,…

Continue Reading reference annotation for the human and mouse genomes in 2023

Contig order rearranged agat

Hi, I annotated a genome with Prokka, and while converting to GTF with AGAT, I get the following error: => Version of the Bioperl GFF parser selected by AGAT: 3 gff3 reader error level1: No ID attribute found @ for the feature: … 1 warning…

Continue Reading Contig order rearranged agat

gff3ToGenePred

Hello guys, I downloaded gff3ToGenePred and tried to convert the apricot gff3 file, but gff3ToGenePred gave an error even though it was loaded. This is the command I am using: gff3ToGenePred -genePredExt /Users/uguremre/snpEff/snpEff/stella.gff3 AT_refGene.txt

Continue Reading gff3ToGenePred

Different alignment results on mirnaseq data upon using Bowtie vs Bowtie2.

Hi, I first aligned my miRNA-seq data against the hsa.gff3 file using Bowtie. However, after generating the read count file (using bedtools) and running DESeq2 on it, very few miRNAs were observed. Also, the PCA plot showed too…

Continue Reading Different alignment results on mirnaseq data upon using Bowtie vs Bowtie2.

Extracting exons using GenomicFeatures is different from manual extraction

If I try to extract the length of all exons (including overlapping ones) using the GenomicFeatures R package with this code and this GENCODE file:
library(GenomicFeatures)
txdb <- makeTxDbFromGFF("tables/gencode.v43.basic.annotation.gtf.gz", format = "gtf")
exons.list.per.gene <- exonsBy(txdb, by = "gene")
sort(width(exons.list.per.gene)[["ENSG00000000003.15"]])
the result is [1] 75 84 99 108 135 189 189…

Continue Reading Extracting exons using GenomicFeatures is different from manual extraction

RSEM implementation

I have the virus genome (FASTA) and GFF file, and I am trying to prepare the reference with the following commands:
rsem-prepare-reference --gff3 KT992094.1.gff3 KT992094.1.fasta
or
rsem-prepare-reference --gff3 KT992094.1.gff \
  --gff3-genes-as-transcripts \
  --bowtie \
  KT992094.1.fasta \
  ref/virus
But it’s saying: Invalid number of arguments! How can I solve this issue?…

Continue Reading RSEM implementation

A pangenome reference of 36 Chinese populations

Populations and samples: For Phase I of the CPC project, we selected 68 samples from 731 individuals with genomes deep-sequenced using next-generation sequencing. Following a previous study (ref. 5), we applied a procedure to quantitatively evaluate the genetic diversity coverage based on principal component analysis results. We selected individuals using a statistic…

Continue Reading A pangenome reference of 36 Chinese populations

snpEff.config Error

Hello guys, I have a problem with snpEff.config. I have an apricot gff3 file, which I converted to a GTF file, and I also have a reference apricot FASTA file. To build snpEff.config, I used the commands below, step by step. Firstly, I create a new…

Continue Reading snpEff.config Error

gff3 to gtf

Hello everyone, I am trying to convert a gff3 file to a gtf file. For that, I used the command below: agat_convert_sp_gff2gtf.pl -i input.gff3 -o genes.gtf But it did not work. Can anyone help me with this issue?

Continue Reading gff3 to gtf

Rename GFF3 File

I have a GFF3 file that looks like this:
X_Chr1 maker exon 225515 226772 . - . ID=X-6_Chr1v1_00045.1:13;Parent=X_Chr1v1_00045.1
X_Chr1 maker exon 227294 227414 . - . ID=X-6_Chr1v1_00045.1:12;Parent=X_Chr1v1_00045.1
X_Chr1 maker exon 227583 227973 . - . ID=X-6_Chr1v1_00045.1:11;Parent=X_Chr1v1_00045.1
X_Chr1 maker exon 228164 228232 . - . ID=X-6_Chr1v1_00045.1:10;Parent=X_Chr1v1_00045.1
I…

Continue Reading Rename GFF3 File

How to sort gff3 according to chromosome order?

Hello, I am curious how to sort a gff3 file by chromosome while keeping the parent (gene) and child features (mRNA, CDS and exon) together. Input example: Chr6 EVM gene 212579245 212580018 . + . ID=evm.TU.Chr6.3631;Name=EVM prediction Chr6.3631…
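One way to keep parents and children together is to group each top-level feature with all its descendants (via `ID`/`Parent`) into a block, then sort whole blocks by chromosome and start. The sketch below uses a "natural" chromosome order, so Chr2 comes before Chr10. It assumes every child appears after its parent in the input. All `#` lines, including `###`, are moved to the top, and an embedded `##FASTA` section is not handled.

```python
import re
from typing import Dict, List


def natural_key(name: str) -> list:
    """Split 'Chr10' into ['Chr', 10, ''] so that Chr2 sorts before Chr10."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


def attributes(column9: str) -> Dict[str, str]:
    return dict(part.split("=", 1) for part in column9.strip().split(";") if "=" in part)


def sort_by_chromosome(lines: List[str]) -> List[str]:
    """Sort gene blocks by chromosome (natural order) and start, keeping children with their gene."""
    header: List[str] = []
    blocks: List[dict] = []
    block_of: Dict[str, int] = {}  # feature ID -> index of the block it belongs to
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.split("\t")
        attrs = attributes(fields[8])
        parent = attrs.get("Parent", "").split(",")[0]
        idx = block_of.get(parent) if parent else None
        if idx is None:  # top-level feature (or unseen parent): start a new block
            idx = len(blocks)
            blocks.append({"key": (natural_key(fields[0]), int(fields[3])), "lines": []})
        if "ID" in attrs:
            block_of[attrs["ID"]] = idx
        blocks[idx]["lines"].append(line)
    ordered = sorted(blocks, key=lambda block: block["key"])
    return header + [line for block in ordered for line in block["lines"]]
```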

Continue Reading How to sort gff3 according to chromosome order?

featurecounts not working on mirbase annotation file

Hello, I am trying to analyze miRNA-seq data, but I am having problems with the mapping. I always get pretty much 0 counts with the built-in annotation file, so I got one from miRBase. However, I always get an error when…

Continue Reading featurecounts not working on mirbase annotation file

MacOS Quicklook plugin for gtf and gff3 files?

Does anyone know of any macOS Quick Look plugins that can handle gtf and/or gff3 files? Google searches are turning up nothing.

Continue Reading MacOS Quicklook plugin for gtf and gff3 files?

ChIP-Seq

ChIP-Seq input data (reference feature). LiftOver: we provide on-the-fly lift-over of reference data sets between different genome assemblies for broader comparison among annotations. Upload custom data (file format): all ChIP-seq tools use SGA (Simplified Genome Annotation) files as an internal working format. SGA input…

Continue Reading ChIP-Seq

Perl debugging help – miRWoods

Hello, I was wondering if anyone with Perl experience could help me debug miRWoods. I tried reaching out to the authors via e-mail with no response, and issues on GitHub are turned off, so I’d be super grateful if anyone could provide any insight. When I run miRWoods I get…

Continue Reading Perl debugging help – miRWoods

Extract transcript ID and gene ID from ITAG4.1_gene_models.gff

Hello all, I was hoping to extract the transcript ID and corresponding gene ID from ITAG4.1_gene_models.gff (downloaded from solgenomics.net/ftp/genomes/Solanum_lycopersicum/annotation/ITAG4.1_release/) using R. I have tried different methods. First method: List <- tr2g_gff3(file = directory, write_tr2g = FALSE, get_transcriptome = FALSE, save_filtered_gff =…
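The underlying operation is small enough to sketch directly. It is shown in Python here, although the question uses R. The sketch assumes transcript-level features (`mRNA` or `transcript`) point to their gene through the `Parent` attribute, and it returns `(transcript_id, gene_id)` pairs. If the IDs in your file carry a type prefix such as `mRNA:`, the hypothetical `strip_type_prefix` helper removes it. Check the actual ITAG4.1 attributes before relying on either assumption.

```python
from typing import Iterable, List, Tuple


def tx2gene(lines: Iterable[str], feature_types=("mRNA", "transcript")) -> List[Tuple[str, str]]:
    """(transcript ID, gene ID) pairs from the ID and first Parent of transcript-level features."""
    pairs = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] not in feature_types:
            continue
        attrs = dict(part.split("=", 1) for part in fields[8].split(";") if "=" in part)
        if "ID" in attrs and "Parent" in attrs:
            pairs.append((attrs["ID"], attrs["Parent"].split(",")[0]))
    return pairs


def strip_type_prefix(identifier: str) -> str:
    """Hypothetical helper: drop a leading 'type:' prefix if present."""
    return identifier.split(":", 1)[1] if ":" in identifier else identifier
```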

Continue Reading Extract transcript ID and gene ID from ITAG4.1_gene_models.gff

IGV custom tracks from gff3 files; how to customize feature blocks “shape”?

Hi, in IGV I am using gff3 files to visualize the genomic location of features that I identified through my experiments. The features are visualized as rectangular “boxes” with strand direction shown as arrowheads. I would…

Continue Reading IGV custom tracks from gff3 files; how to customize feature blocks “shape”?

SnpEff Error

Hello guys, I ran this command: snpEff Prunus_armeniaca_cv_Stella.gff3.gz output.vcf > output.txt I am getting this error. Could you please help me with this issue? java.lang.RuntimeException: Property: 'Prunus_armeniaca_cv_Stella.gff3.gz.genome' not found at org.snpeff.interval.Genome.<init>(Genome.java:104) at org.snpeff.snpEffect.Config.readGenomeConfig(Config.java:784) at org.snpeff.snpEffect.Config.readConfig(Config.java:751) at org.snpeff.snpEffect.Config.init(Config.java:529) at org.snpeff.snpEffect.Config.<init>(Config.java:116) at org.snpeff.SnpEff.loadConfig(SnpEff.java:429) at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:889) at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:875) at…

Continue Reading SnpEff Error

How to extract summary statistics from GFF3 /GTF file?

Hi! You could try using the gffutils Python library as an alternative to the AGAT toolkit for extracting summary statistics from GFF3/GTF files. gffutils is a flexible and efficient library for working with GFF and GTF files in a variety of formats. Here’s an example of how to use gffutils…
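If installing a library is not an option, basic per-type statistics need only the standard library. This sketch is not gffutils. It counts features and reports the mean length (1-based inclusive) for each feature type, and it stops at an embedded `##FASTA` section.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def feature_summary(lines: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Feature count and mean length (bp) per feature type (column 3)."""
    lengths: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        if line.startswith("##FASTA"):
            break  # stop before an embedded sequence section
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue
        lengths[fields[2]].append(int(fields[4]) - int(fields[3]) + 1)
    return {ftype: (len(v), mean(v)) for ftype, v in lengths.items()}
```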

Continue Reading How to extract summary statistics from GFF3 /GTF file?

Homer detailed annotation

Dear all, I used HOMER annotatePeaks.pl to annotate my peaks. Here is my command: annotatePeaks.pl peak.bed ref.fa -gff3 ref.gff3 > PeakAnno.txt However, I don’t know why the columns “Focus Ratio/Region Size” and “Detailed Annotation” are “NA”. I am more interested in…

Continue Reading Homer detailed annotation

Issue about generating EMBL Flat file for ENA submission

Hello all! I am trying to generate an EMBL flat file to submit an annotated assembly to ENA. I am using EMBLmyGFF3 to generate the flat file from the whole-genome FASTA file and the GFF3 file. I am getting…

Continue Reading Issue about generating EMBL Flat file for ENA submission

Detection of Burkholderia pseudomallei with CRISPR-Cas12a based on specific sequence tags

1. Introduction: Melioidosis is a tropical disease caused by Burkholderia pseudomallei, an aerobic, Gram-negative, motile bacillus classified as a category B biological agent by the US Centers for Disease Control and Prevention (CDC) (1, 2). It is a highly pathogenic endemic zoonotic disease in many tropical countries, particularly in…

Continue Reading Detection of Burkholderia pseudomallei with CRISPR-Cas12a based on specific sequence tags

snpeff not recognizes Gff3 file

I built a database for a different genome version, Zea_mays B73v4, and provided a gff3 annotation file of the same version. When I run the command with the snpEff database, output is generated, but the Genes text file contains the Gene IDs of the…

Continue Reading snpeff not recognizes Gff3 file

How to convert GTF output of TSEBRA to gff3 file as an input for EVM ?

Hello, I am curious whether anyone has experience using the TSEBRA GTF output in EVM. The GTF file generated by TSEBRA gives an error when converting to GFF3 format to be used as an input for…

Continue Reading How to convert GTF output of TSEBRA to gff3 file as an input for EVM ?

ANNOVAR – Bioinformatics DB

ANNOVAR is a software tool that annotates single nucleotide variants (SNVs) and insertions/deletions. This tool is particularly useful in the field of genetics research, where high-throughput sequencing platforms generate massive amounts of genetic variation data. However, it can be a challenge to pinpoint a small subset of functionally essential variants…

Continue Reading ANNOVAR – Bioinformatics DB

FeatureCounts tool

Can we use an annotation file in GFF3 format with the featureCounts tool?

Continue Reading FeatureCounts tool

Maker Gff3 file issues

Hi community, this is really a technical question; I hope it is OK to post it here. I am trying to import the gff3 file from MAKER into my JBrowse to view the annotations. I am using the maker2jbrowse script and getting constant errors. There…

Continue Reading Maker Gff3 file issues

Antismash on Fasta files

Hello, you can provide FASTA files to it:
########### antiSMASH 6.1.1 #############
usage: antismash [--taxon {bacteria,fungi}] [--output-dir OUTPUT_DIR] [--output-basename OUTPUT_BASENAME] [--reuse-results PATH] [--limit LIMIT] [--minlength MINLENGTH] [--start START] [--end END] [--databases PATH] [--write-config-file PATH] [--without-fimo] [--executable-paths EXECUTABLE=PATH,EXECUTABLE2=PATH2,…] [--allow-long-headers] [-v] [-d] [--logfile PATH] [--list-plugins] [--check-prereqs] [--limit-to-record RECORD_ID] [-V] [--profiling] [--skip-sanitisation] [--skip-zip-file]…

Continue Reading Antismash on Fasta files

Can’t install Transdecoder –

I was trying to install TransDecoder for transcriptome annotation, but when I run make test, this shows up. I’ve tried to install the module, but it is not working. Is there any way around it? Can’t locate URI/Escape.pm in @INC (you may need to…

Continue Reading Can’t install Transdecoder –

gff3 – Extracting amino acid and nucleotide sequences from KofamScan output and codon alignment

I want to extract the amino acid sequences from KofamScan output; my workflow is shown in the attached picture. For the analysis I am doing, I need to get the amino acid sequences, align them, and do a codon alignment with the corresponding nucleotide sequences, so that I can get…

Continue Reading gff3 – Extracting amino acid and nucleotide sequences from KofamScan output and codon alignment

Convert Abricate output tsv file to gff3 format

Here’s one way using awk that I think fulfills the requirements. It adds each of the column names (on the first line) to an array to make accessing each of the fields a bit easier. This approach isn’t strictly necessary, but it does make for a more readable solution in…
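The same header-keyed idea carries over to Python's `csv.DictReader`, which maps each row to the column names on the first line. This is an alternative sketch, not the awk answer. The column names (`SEQUENCE`, `START`, `END`, `STRAND`, `GENE`, `PRODUCT`, `%IDENTITY`) are assumed from abricate's usual report header, so check them against your file. The feature type `gene`, the `ID` scheme and the lowercase custom attributes are hypothetical choices. Attribute values are percent-encoded so that characters like `;` cannot break column 9.

```python
import csv
import io
from typing import List
from urllib.parse import quote


def abricate_to_gff3(tsv_text: str, source: str = "abricate") -> List[str]:
    """Convert abricate TSV rows into GFF3 lines, looking columns up by header name."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = ["##gff-version 3"]
    for n, row in enumerate(reader, start=1):
        attrs = {
            "ID": f"{row['GENE']}_{n}",  # hypothetical unique ID: gene name plus row number
            "Name": row["GENE"],
            "product": row.get("PRODUCT", ""),
            "identity": row.get("%IDENTITY", ""),
        }
        # Escape reserved GFF3 characters (e.g. ';' '=' ',') in attribute values.
        col9 = ";".join(f"{k}={quote(v, safe=' .-_()')}" for k, v in attrs.items() if v)
        out.append("\t".join([row["SEQUENCE"], source, "gene", row["START"], row["END"],
                              ".", row["STRAND"], ".", col9]))
    return out
```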

Continue Reading Convert Abricate output tsv file to gff3 format

Improving conversion of abricate tsv file to gff3 file

Since such a neat solution (abricate tsv to gff3) was provided by Steve, here are a few other steps I would like to add so that the script matures into something usable by many others. I have two files: (1) a FASTA file with the .fna extension, and…

Continue Reading Improving conversion of abricate tsv file to gff3 file

list of old gene name in C. elegans

Hi, a C. elegans gene may have two different names. For instance, WBGene00006993 has two locus names: (1) “zyg-8” and (2) the old/other name “apo-1” (www.wormbase.org/species/c_elegans/gene/WBGene00006993#0-9e-3). I have compiled (1) the WB ID, (2) the locus, and (3) the cosmid ID of all genes from the C. elegans GFF3…

Continue Reading list of old gene name in C. elegans

Converting Abricate output (.tsv) to gff3 format

Hello everyone, I have a TSV file generated by abricate (github.com/tseemann/abricate). I need to convert it to gff3 format with certain columns retained, certain columns reordered, and other columns deleted. We are trying to use these gff3 files for downstream applications and…

Continue Reading Converting Abricate output (.tsv) to gff3 format

Converting an output de-novo transcriptome assembled with Trinity to a .gff3 file

Hello! I’ve de novo assembled a transcriptome with Trinity, resulting in Trinity.fasta, whose headers look like this: >TRINITY_DN29256_c0_g1_i1 len=323 path=[0:0-322] followed, on the next line, by the sequence. To run an external downstream analysis with an R script,…
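Because each Trinity contig is its own reference sequence, a minimal GFF3 can describe every contig as one feature spanning 1 to its length. The sketch below counts residues instead of trusting the `len=` header field, though the two should agree. It uses the first header word as the seqid. The feature type `transcript`, the `+` strand (relative to the contig itself) and the source label are hypothetical choices.

```python
from typing import Iterable, List, Tuple


def fasta_lengths(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """(record ID, sequence length) for each FASTA record; the ID is the first header word."""
    records: List[Tuple[str, int]] = []
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, length))
            name, length = line[1:].split()[0], 0
        elif line:
            length += len(line)
    if name is not None:
        records.append((name, length))
    return records


def trinity_to_gff3(lines: Iterable[str], source: str = "Trinity") -> List[str]:
    """One 'transcript' feature per Trinity contig, spanning the whole contig (1..length)."""
    out = ["##gff-version 3"]
    for name, length in fasta_lengths(lines):
        out.append("\t".join([name, source, "transcript", "1", str(length), ".", "+", ".", f"ID={name}"]))
    return out
```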

Continue Reading Converting an output de-novo transcriptome assembled with Trinity to a .gff3 file

org.biojava.nbio.core.sequence.CDSSequence.getSequenceAsString java code examples | Tabnine

/**
 * A CDS sequence, if negative stranded, needs to be reverse complemented
 * to represent the actual coding sequence. When getting a ProteinSequence
 * from a TranscriptSequence, this method is called for each CDSSequence.
 * {@link www.sequenceontology.org/gff3.shtml}
 * {@link biowiki.org/~yam/bioe131/GFF.ppt}
 * @return coding sequence
 */
public String getCodingSequence() {…
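The idea in that comment can be illustrated in Python. This is not BioJava's implementation. On the minus strand, the coding sequence is the reverse complement of the genomic CDS segments joined in ascending coordinate order. The sketch assumes 1-based inclusive segment coordinates on a plain genome string and handles only A/C/G/T/N.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement DNA (ambiguity codes other than N are left as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def spliced_cds(genome: str, segments: List[Tuple[int, int]], strand: str) -> str:
    """Join CDS segments (1-based inclusive) and reverse-complement them on the '-' strand."""
    joined = "".join(genome[start - 1:end] for start, end in sorted(segments))
    return reverse_complement(joined) if strand == "-" else joined
```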

Continue Reading org.biojava.nbio.core.sequence.CDSSequence.getSequenceAsString java code examples | Tabnine

How to perform synteny alignments and plots only with a gene?

Hi everyone, I’m trying to perform synteny alignments and plots for a gene of interest and its exons. I have two genomes in FASTA format and their corresponding annotations in GFF3 format. Does anyone know of software that…

Continue Reading How to perform synteny alignments and plots only with a gene?

Error while converting GFF file to GTF using AGAT

Hi, I am trying to convert a GFF file to a GTF file, which I want to use for STAR. I tried AGAT (latest version) to convert it, but it gives me a series of errors (mainly two types). I have attached the error…

Continue Reading Error while converting GFF file to GTF using AGAT

gff3 file format

Can I use a gff3 format file as a reference genome? I added a screenshot; how can I find a reference genome in this picture?

Continue Reading gff3 file format

In addition to the chado, are there other biological database schemas?

I would like to know what other biological database schemas exist besides Chado. Edit: I’m participating in a project, and they asked me to create a database for plants that uses ontologies, a…

gff format to genome annotation

I am mapping RNA-seq transcripts against a genome to annotate it. I am looking at Spaln and GMAP, and both produce two types of GFF output (GFF3 gene format and GFF3 match format). Which one is better for proceeding with the annotation?…

TRF output to .gff file

Hello, Biostars! I’m trying to get a .gff file from Tandem Repeats Finder output. Since TRF can’t do that, I found the TRAP tool, which can create .gff files. However, TRAP creates as many .gff files as there are contigs (OK, there is the ‘cat’ command). The…
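
Before reaching for cat, note that the GFF3 specification allows the ##gff-version pragma only once, as the first line, so concatenating files that each start with one yields an invalid file. Below is a minimal Python sketch that keeps a single copy. It assumes TRAP's per-contig files each begin with such a line and contain no ##FASTA section (both are assumptions, since TRAP's output is not shown in the post). merge_gff_texts is a hypothetical name.

```python
import sys
from pathlib import Path


def merge_gff_texts(texts: list[str]) -> list[str]:
    """Concatenate GFF texts, keeping a single '##gff-version' line at the top."""
    version_line = None
    body: list[str] = []
    for text in texts:
        for line in text.splitlines():
            if line.startswith("##gff-version"):
                if version_line is None:
                    version_line = line  # keep the first pragma only
                continue  # drop repeats from later files
            body.append(line)
    return ([version_line] if version_line else []) + body


# Run only when file names are given, e.g.:
#   python merge_gff.py contig1.gff contig2.gff > all_contigs.gff
if __name__ == "__main__" and len(sys.argv) > 1:
    texts = [Path(p).read_text() for p in sys.argv[1:]]
    for out_line in merge_gff_texts(texts):
        print(out_line)
```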

Detection of Streptococcus pyogenes M1UK in Australia and characterization of the mutation driving enhanced expression of superantigen SpeA

Walker, M. J. et al. Disease manifestations and pathogenic mechanisms of Group A Streptococcus. Clin. Microbiol. Rev. 27, 264–301 (2014).
Carapetis, J. R., Steer, A. C., Mulholland, E. K. & Weber, M. The global burden of group A streptococcal diseases. Lancet Infect. Dis. 5,…

How to use chado after installation?

Hello, this is my first time working with Chado and Perl. After some problems I managed to install it; however, I don’t know how to continue. GMOD provides documentation for converting GFF files to GFF3 and other data. However, I am…

Genome data visualization

Hi, I need help producing visualizations of genomic DNA regions like those in these figures I obtained from a publication. The other image also shows the regions of a chromosome by color. I just need information on the right tools (not IGV) that…

PROKKA.gff file is not compatible with featureCounts

Hi all, I am trying to count the number of reads that map to each gene using featureCounts (paired-end RNA-seq, Linux). My inputs: a GFF file generated by Prokka, a GTF file from the NCBI annotation, and sorted BAM files generated with bowtie2 and samtools. When I used the NCBI GTF file, featureCounts ran without…
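
Prokka's .gff output is GFF3 with the genome sequence appended after a ##FASTA directive, and its CDS features carry ID and locus_tag attributes, whereas featureCounts by default counts exon features grouped by gene_id. Which of these differences breaks the run here is an assumption, since the post is cut off. A minimal Python sketch (hypothetical helper name) that strips the FASTA section:

```python
import sys


def strip_fasta_section(gff_text: str) -> str:
    """Drop the '##FASTA' directive and everything after it.

    In GFF3, '##FASTA' marks the end of the annotation lines; what follows is
    plain FASTA sequence, which annotation-only tools do not expect.
    """
    kept = []
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break
        kept.append(line)
    return "\n".join(kept) + "\n"


# Run only when a file name is given, e.g.:
#   python strip_fasta.py prokka.gff > prokka.annotation_only.gff
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        sys.stdout.write(strip_fasta_section(handle.read()))
```

featureCounts can also be pointed at other feature types and attributes with -t and -g (a commonly suggested choice for Prokka output is -t CDS -g locus_tag). Check which types and attributes actually occur in your file before choosing.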

Sort gff3 on chromosome, position and then featuretype (gene, mRNA, exon, CDS)

Is it possible to sort a GFF3 file by chromosome, position and then feature type (gene, mRNA, exon, CDS)? The order of the feature types is important when converting a GFF file to a GTF file with gffread. If the…
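
Here is a minimal Python sketch of that sort order. It assumes a tab-delimited GFF3 with no ##FASTA section. The rank table encodes the gene, mRNA, exon, CDS order from the question, and the function names are hypothetical. This is a plain key sort, not a hierarchy-aware one, so two transcripts starting at the same position will have both mRNA lines placed before their exons.

```python
import sys

# Order of feature types at the same chromosome and start position, as asked
# in the post; unlisted types sort after these.
FEATURE_RANK = {"gene": 0, "mRNA": 1, "exon": 2, "CDS": 3}


def sort_key(line: str) -> tuple[str, int, int]:
    cols = line.split("\t")
    return cols[0], int(cols[3]), FEATURE_RANK.get(cols[2], len(FEATURE_RANK))


def sort_gff3(lines: list[str]) -> list[str]:
    """Sort feature lines by (seqid, start, feature-type rank).

    Comment and directive lines are moved to the top. Seqids sort lexically
    (chr10 before chr2). Python's sort is stable, so ties keep input order.
    """
    header = [line for line in lines if line.startswith("#")]
    features = [line for line in lines if line and not line.startswith("#")]
    return header + sorted(features, key=sort_key)


# Run only when a file name is given, e.g.: python sort_gff3.py in.gff3 > sorted.gff3
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for out_line in sort_gff3(handle.read().splitlines()):
            print(out_line)
```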

Converting GFF3 and FASTA files to GenBank format – Job in Data Science And Analytics

Posted on Toogit, Feb 6, 2023. I have GFF3 annotation files for my bacterial genomes. I want a script that can convert a GFF3 file and its FASTA file into a GenBank file. Thanks…

mVISTA annotation

Hello Biostars, I have been trying to use mVISTA to compare chloroplast DNA. For this purpose I used the NCBI reference sequences as input and downloaded the GFF3 annotation for Arabidopsis thaliana (MZ323108) as the reference sequence. However, the result does not show the…

Seqlengths of x contains NA values!

Hello, I would like to use ORFik to determine the coverage of the different ORFs across the maize genome. I have ribo-seq data, the latest annotation file (a GFF3), and the v5 genome fasta file for B73. After running my code, three Large CompressedGRangesLists are created and none of them…

gff file from NCBI RefSeq GCF dataset has an invalid format

Thank you for noticing this. It is indeed an issue in the GFF3 file. The root of the problem is it’s a gene that is impossible to correctly represent in GFF3 because it incorporates sequence from both strands via trans_splicing. The complexity of this gene can be seen on the…

Retrieve specific fasta sequences from a group of assemblies

Hi all, sorry if this question has been addressed before, but I haven’t been able to find a solution. I have a lot of assemblies (around 800) and I would like to retrieve the FASTA sequence for a…

error making Txdb from GTF and fasta files

Hello, I would like to use ORFik to map Ribo-seq reads to different ORFs in the maize genome. The latest version of the genome is Zm-B73-REFERENCE-NAM-5.0.fa, and the annotation file is a GFF3. I have the genome FASTA file, its .fai index, and the GFF3 file. The ORFik package uses GTF…

How to convert VCF (with possible predicted gene effects) to protein fasta/MSA

How can I convert a VCF (possibly with predicted gene effects) with multiple samples to protein FASTA/MSA? Input: a VCF (possibly with gene/protein effects already predicted, e.g. via SnpEff) and a GFF3 (for the reference protein sequence and maybe to predict effects)…

genbank sequence format

This document is an overview of the Entrez databases, with general information on… If you are not sure that the “Save” option in your program will do this for you, use “Save As”; in Excel, select “Save As” from the File menu. …optimizations to reduce memory…

Can a GFF2 reference be used in htseq-count?

Dear all, we have recently been working with an E. coli plasmid and are trying to summarize the gene counts from our RNA-seq samples. The short reads were mapped to the E. coli plasmid using TopHat, which generated BAM files accordingly. However, we were unable to obtain a GFF3 version of our target plasmid genome, the…

Use RSEM and Bowtie2 to align paired-end sequences

I want to use rsem-calculate-expression with the bowtie2 aligner to align paired-end sequences under the following conditions: 2 processors, generate a BAM file, very fast bowtie2 sensitivity, and append gene/transcript names. My code:
rsem-refseq-extract-primary-assembly GCF_000001405.31_GRCh38.p5_genomic.fna GCF_000001405.31_GRCh38.p5_genomic.primary_assembly.fna
rsem-prepare-reference --gff3 GCF_000001405.31_GRCh38.p5_genomic.gff --bowtie2 --bowtie2-path /bowtie2-2.4.5-py39hd2f7db1_2 --trusted-sources…

Htseq is giving me 0 counts using the GFF3 of miRBase

Hello! I am trying to annotate miRNA-seq data so that I get mature miRNAs, where I already have the 5p and 3p arms. For this, I used the mm10.fa index and the miRBase mmu.gff3. I aligned with HISAT2 and am trying to count with HTSeq; however, I get 0…
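
One possible explanation is an assumption, since the post is cut off before any diagnosis. htseq-count counts exon features grouped by the gene_id attribute by default (--type, --idattr). miRBase's GFF3, by contrast, describes miRNA_primary_transcript and miRNA features with ID, Alias, Name and Derives_from attributes. The hypothetical checker below reports how many features a given type/attribute pair would select, so you can see whether the defaults match anything at all.

```python
import sys


def matching_features(gff_text: str, feature_type: str = "exon",
                      id_attr: str = "gene_id") -> tuple[int, int]:
    """Return (number of feature_type lines, how many of those carry id_attr).

    Defaults mirror htseq-count's defaults. Only GFF3-style key=value
    attributes are parsed.
    """
    n_type = n_with_id = 0
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # sequence section: no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        n_type += 1
        keys = {field.split("=", 1)[0].strip()
                for field in cols[8].split(";") if "=" in field}
        if id_attr in keys:
            n_with_id += 1
    return n_type, n_with_id


# Run only when a file name is given, e.g.: python check_gff.py mmu.gff3 miRNA Name
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        print(matching_features(handle.read(), *sys.argv[2:4]))
```

If the defaults select nothing, htseq-count's -t and -i options can name a type and attribute that do exist (for example -t miRNA -i Name).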

genbank to GTF in galaxy

Hi all, I am working in Galaxy and have a genome file in GenBank format. To use featureCounts for my RNA-seq data, I need to convert the GenBank file to GTF, because that’s the format the featureCounts tool in Galaxy expects. Now, I…

What is RNAcentral? | RNAcentral

RNAcentral is a database of non-coding RNA sequences that aggregates ncRNA data from over 40 member resources known as Expert Databases. Non-coding RNAs: similar to mRNAs, non-coding RNAs (ncRNAs) are transcribed from DNA but are not translated into proteins. ncRNAs are found in all organisms and have a broad range…

computeMatrix in deepTools runs with no result

Hi all, could someone explain what to provide for the -R <bed file> argument of the command below?
computeMatrix scale-regions -S <bigwig file(s)> -R <bed file> -b 1000
What I did, for example: I download…
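
computeMatrix's -R option takes the regions to plot as a BED file (deepTools also accepts GTF). If the downloaded annotation is GFF3 (an assumption, since the post is cut off), the main pitfall when turning it into BED is the coordinate system. GFF3 is 1-based and fully closed, while BED is 0-based and half-open, so only the start changes. Below is a minimal sketch with a hypothetical function name. It keeps the ID attribute as the BED name and writes a placeholder score of 0.

```python
import sys


def gff3_to_bed(lines: list[str], feature_type: str = "gene") -> list[str]:
    """Convert GFF3 features of one type to BED6 lines.

    GFF3 coordinates are 1-based, inclusive; BED is 0-based, half-open, so the
    start is decremented by one and the end is left unchanged.
    """
    bed = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # no features after the sequence section starts
        if not line or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), cols[4], cols[6]
        attr = dict(field.split("=", 1) for field in cols[8].split(";") if "=" in field)
        name = attr.get("ID", ".")
        # Score column is a placeholder 0; GFF3 scores are not carried over.
        bed.append(f"{seqid}\t{start - 1}\t{end}\t{name}\t0\t{strand}")
    return bed


# Run only when a file name is given, e.g.: python gff3_to_bed.py annotation.gff3 > genes.bed
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for bed_line in gff3_to_bed(handle.read().splitlines()):
            print(bed_line)
```

The resulting genes.bed would then be the file passed to -R.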

Indexing with STAR

Hello, I am working with RNA-seq data and creating an index of the Gossypium hirsutum reference genome with STAR. STAR asks for a GTF annotation, while my file is GFF3. According to the literature, in order to use a GFF file I need to remove --sjdbOverhang 50 and…

Senior Bioinformatics Scientist II/ Staff Bioinformatics Scientist

Inscripta was founded in 2015 and recently launched the world’s first benchtop Digital Genome Engineering platform. The company is growing aggressively, investing in its leadership, team, and technology with a recent $150mm financing round led by Fidelity and T. Rowe Price. The company’s advanced CRISPR-based platform, consisting of an instrument, reagents,…

Conversion of GFF3 to GTF

How do I convert a GFF file to a GTF file? Is there any tool available?
The easiest way is to use the gffread program that comes with the Cufflinks software suite (Tuxedo):
gffread my.gff3 -T -o my.gtf
See gffread…

Adding repeats in a genome fasta at a particular location without messing up the annotations?

I want to add a number of expanded repeats to a genome FASTA file, e.g. 100 ATTs at a particular location, e.g. Chr1-1:2. How do I do that and at the same time update…
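
The bookkeeping involved can be sketched in Python. The sketch assumes the repeats are inserted after a 1-based position pos on one sequence and that the annotation is GFF3 with 1-based, inclusive coordinates. All function names are hypothetical. Features entirely downstream of the insertion shift by its length, features that span the insertion point are lengthened, and upstream features are untouched. Other coordinate-bearing lines, such as ##sequence-region pragmas, would need the same update and are not handled here.

```python
def insert_sequence(seq: str, pos: int, insert: str) -> str:
    """Insert `insert` after 1-based position `pos` (pos=0 means at the very start)."""
    if not 0 <= pos <= len(seq):
        raise ValueError("pos must lie within the sequence")
    return seq[:pos] + insert + seq[pos:]


def shift_coords(start: int, end: int, pos: int, n: int) -> tuple[int, int]:
    """Shift a 1-based, inclusive interval after inserting n bases after pos."""
    if start > pos:
        return start + n, end + n  # feature lies entirely downstream
    if end > pos:
        return start, end + n  # insertion falls inside the feature
    return start, end  # feature lies entirely upstream


def shift_gff3_lines(lines: list[str], seqid: str, pos: int, n: int) -> list[str]:
    """Update columns 4 and 5 of GFF3 feature lines on `seqid`; copy other lines."""
    out = []
    for line in lines:
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[0] != seqid:
            out.append(line)
            continue
        start, end = shift_coords(int(cols[3]), int(cols[4]), pos, n)
        cols[3], cols[4] = str(start), str(end)
        out.append("\t".join(cols))
    return out
```

For 100 ATT copies the calls would be insert_sequence(chrom_seq, pos, "ATT" * 100) and shift_gff3_lines(lines, "Chr1", pos, 300), with chrom_seq and pos standing in for your own values.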

biopython – Updating the GFF3 + FASTA to GenBank code

I’m trying to convert GFF3 and FASTA into a GBK file for use in Mauve. I’ve found a solution, but the code is outdated:
"""Convert a GFF and associated FASTA file into GenBank format.
Usage:
gff_to_genbank.py <GFF annotation file> <FASTA sequence file>
"""
import sys
import os
from Bio import…

Change separator just between specific columns

I am trying to change the separator only between columns 1 and 9; after that, I would like to keep the original separator. These are the first lines of my file, both when reading it directly and when running od -c on it: #description: evidence-based annotation of the human genome (GRCh38),…
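
Here is a minimal Python sketch of one way to do this. It assumes the first nine columns are separated by a single known separator (old_sep) that may also occur inside column 9, and that lines starting with # should pass through unchanged. Both separators in the example are hypothetical placeholders, since the post does not show which ones are involved. Splitting at most eight times yields exactly nine fields, so any separators inside column 9 stay as they were.

```python
import sys


def change_leading_separators(line: str, old_sep: str, new_sep: str,
                              n_columns: int = 9) -> str:
    """Replace old_sep with new_sep only between the first n_columns fields."""
    if line.startswith("#"):
        return line  # keep header/comment lines untouched
    # maxsplit = n_columns - 1 leaves separators inside the last column intact
    fields = line.split(old_sep, n_columns - 1)
    return new_sep.join(fields)


# Run only when a file name is given. Hypothetical choice: spaces between the
# first nine columns become tabs.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for raw in handle:
            print(change_leading_separators(raw.rstrip("\n"), " ", "\t"))
```

If columns are separated by runs of whitespace rather than a single character, line.split(None, 8) would be the variant to use instead.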

How to assess structural variation in your genome, and identify jumping transposons

Prerequisites
Data: an annotated genome, long reads, a repeat annotation
Software: minimap2, samtools, bedtools (for comparisons only), tabix (for visualization only)
Installation:
/work/gif/remkv6/USDA/04_TEJumper
conda create -n svim_env --channel bioconda svim
source activate svim_env
Map your long reads to your genome with minimap
My directory locale…

Exome-seq counting

Hi everyone, does anyone know where to download human genome annotation files in GFF3 or GTF format? I want to apply featureCounts to quantify read counts in the BAM file on the command line:
featureCounts -t exon -g gene_id -a annotation.gtf -o counts.txt mapping_results_SE.bam
Best, AD
