Tag: GFF
Ribo-Seq Samtools pileup
I have the results of samtools mpileup:
ref|NC_001133| 32065 A 17 .....^".^".^".^".^!.^".^".^".^".^".^".^".
ref|NC_001133| 32066 G 18 .................. C@1C.1CCCCCCCCCCCC
ref|NC_001133| 32067 A 22 ...................... CC98?C91?CCC;;CCCC;C;C
ref|NC_001133| 32068 T 21 ..................... CCCCCCCCCCCCCCCCCCCCC
What I want to do is use the gene start and stop position in the reference…
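Below is a minimal Python sketch of the first step. It assumes each pileup row has the usual tab-separated columns (sequence name, 1-based position, reference base, depth, read bases, base qualities). It also assumes the gene start and stop come from a GFF-style annotation, so they are 1-based and inclusive. The function name `depth_in_gene` is hypothetical and not part of samtools.

```python
def depth_in_gene(pileup_lines, chrom, start, end):
    """Collect {position: depth} for pileup rows on `chrom` within [start, end].

    Coordinates are 1-based and inclusive on both ends, as in samtools mpileup
    output and GFF gene features.
    """
    depths = {}
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or truncated rows
        name, pos, depth = fields[0], int(fields[1]), int(fields[3])
        if name == chrom and start <= pos <= end:
            depths[pos] = depth
    return depths


# Toy rows shaped like the pileup above (read bases simplified to matches).
rows = [
    f"ref|NC_001133|\t{pos}\t{base}\t{depth}\t{'.' * depth}\t{'C' * depth}"
    for pos, base, depth in [(32065, "A", 17), (32066, "G", 18), (32067, "A", 22), (32068, "T", 21)]
]
hits = depth_in_gene(rows, "ref|NC_001133|", 32066, 32067)
print(hits)                            # {32066: 18, 32067: 22}
print(sum(hits.values()) / len(hits))  # 20.0
```

By default, mpileup leaves zero-coverage positions out of its output. The mean above is therefore taken over covered positions only. If uncovered bases should count as zero, divide by `end - start + 1` instead.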
The landscape of genomic structural variation in Indigenous Australians
Cohorts Saliva and/or blood samples were collected from consenting individuals among four NCIG-partnered communities: Tiwi Islands (comprising the Wurrumiyanga, Pirlangimpi and Millikapiti communities), Galiwin’ku, Titjikala and Yarrabah, between 2015 and 2019. Non-Indigenous comparison data, generated from unrelated Australian individuals of European ancestry, was drawn from two existing biomedical research cohorts:…
Extract fasta sequence from gff3 file
Hi everyone, I have a lot of .gff3 files with the CDS features followed by the FASTA sequence. The sequence is separated from the CDS features like this:
##FASTA
>NZ_NZ_LR130533.1
I would like to extract all the FASTA sequences into new fasta…
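In GFF3, the `##FASTA` directive marks the point after which the rest of the file is plain FASTA. Extraction can therefore be a simple split at that line. The sketch below assumes exactly that layout. The helper names `fasta_from_gff3` and `convert_directory` are hypothetical, and the sample record (`contig1`, source `Prodigal`) is made up for illustration.

```python
from pathlib import Path


def fasta_from_gff3(text):
    """Return the sequence section that follows the ##FASTA directive ('' if absent)."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "##FASTA":
            fasta = lines[i + 1:]
            return "\n".join(fasta) + "\n" if fasta else ""
    return ""


def convert_directory(directory):
    """Write <name>.fasta next to every <name>.gff3 in `directory`."""
    for gff in Path(directory).glob("*.gff3"):
        gff.with_suffix(".fasta").write_text(fasta_from_gff3(gff.read_text()))


example = (
    "##gff-version 3\n"
    "contig1\tProdigal\tCDS\t1\t9\t.\t+\t0\tID=cds1\n"
    "##FASTA\n"
    ">contig1\n"
    "ATGAAATAG\n"
)
print(fasta_from_gff3(example), end="")
```

If all sequences should end up in one file, concatenating the per-file outputs works, provided the sequence names are unique across files.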
overlapping duplicate dispersed_repeat feature in stringtie
Hi. I got the following errors when I ran StringTie with a RepeatMasker annotation GFF file and RNA-seq BAM files that were already sorted with samtools:
GFF Error: overlapping duplicate dispersed_repeat feature (ID=461)
GFF Error: overlapping duplicate dispersed_repeat feature (ID=712)
GFF Error: overlapping…
how to run the compare_genomes for comparative analysis
Hi, I am interested in comparing 10 genomes for a comparative analysis. I have genome, CDS, protein and GFF files for this analysis. I want to ask if someone has experience running the compare_genomes tool for this. I have…
Convert NCBI Downloaded files to ANNOVAR format
I have been trying to understand, from the ANNOVAR documentation and other sites, the steps needed to make these files from NCBI available to ANNOVAR. I admit to being new to bioinformatics, but I have been a software developer for 30+ years. My…
Where can I get a list of SNPs mapping overlapping genes in humans?
Given files genes.bed and snps.bed, you could do something like:
$ bedmap --echo --echo-map-id --delim '\t' genes.bed snps.bed > answer.bed
The file answer.bed will contain the gene annotation and a semicolon-delimited list of SNP identifiers that overlap each gene. In order to get genes.bed, you could use Gencode v44…
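For readers who want to see what the overlap test does, here is a small Python sketch of the same idea. It is not how bedmap is implemented. It assumes both inputs use BED coordinates (0-based start, exclusive end). Gene names and SNP IDs are hypothetical.

```python
def snps_per_gene(genes, snps):
    """Map each gene name to the IDs of SNPs overlapping it.

    genes: (chrom, start, end, name) tuples; snps: (chrom, start, end, snp_id)
    tuples. Both use BED coordinates (0-based start, half-open end), so two
    intervals overlap when each one starts before the other ends.
    """
    result = {}
    for g_chrom, g_start, g_end, g_name in genes:
        result[g_name] = [
            s_id
            for s_chrom, s_start, s_end, s_id in snps
            if s_chrom == g_chrom and s_start < g_end and g_start < s_end
        ]
    return result


genes = [("chr1", 100, 200, "GENE_A"), ("chr1", 300, 400, "GENE_B")]
snps = [("chr1", 150, 151, "rs1"), ("chr1", 199, 200, "rs2"),
        ("chr1", 200, 201, "rs3"), ("chr2", 150, 151, "rs4")]
for name, ids in snps_per_gene(genes, snps).items():
    print(name, ";".join(ids), sep="\t")
```

This nested scan costs genes × SNPs comparisons. For genome-wide files, the sorted, streaming approach of bedmap is the practical choice. The sketch only shows why rs3, which starts exactly at GENE_A's exclusive end, is not reported.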
Annotation Visualization IGV
I was wondering if anyone had any insight into visualizing a GFF in IGV. I have some reads aligned to a reference sequence to visualize coverage and would like to include the annotations in that visualization as well, for ease of presentation. I can't seem to…
How To Install bedtools on Debian 11
In this tutorial we learn how to install bedtools on Debian 11. bedtools is a suite of utilities for comparing genomic features. The BEDTools utilities allow one to address common genomics tasks such…
AssemblyMAFFromAnchorWavePlugin IndexOutOfBoundsException
Hello, I'm attempting to create a test database with a smaller genome just to confirm I can get the pipeline running. I'm using PHG 1.8 with Singularity. I was able to successfully run MakeDefaultDirectoryPlugin, CreateValidIntervalsFilePlugin, and MakeInitialPHGDBPipelinePlugin. Running AssemblyMAFFromAnchorWavePlugin yields an "IndexOutOfBoundsException" error, but the cause is…
Annotation GTF/GFF Arabidopsis thaliana
Hello, this is my first time working with Arabidopsis and I am quantifying with featureCounts as follows:
featureCounts -p --countReadPairs -t exon -g gene_id -a ../genome_arabidopsis/Arabidopsis_thaliana.TAIR10.57.gtf -o SRR14059988.txt ../alignment_hisat2/SRR14059988_sorted.bam
However, my counts include long non-coding, ribosomal, mitochondrial and…
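One way to keep only protein-coding genes is to filter the GTF before passing it to `-a`. The sketch below assumes every feature line carries a `gene_biotype "..."` attribute, as Ensembl GTFs usually do. Check your file before relying on it. The helper `keep_biotype` and the sample lines are hypothetical.

```python
import re

# Assumption: each feature line has a gene_biotype attribute (Ensembl style).
BIOTYPE_RE = re.compile(r'gene_biotype "([^"]+)"')


def keep_biotype(gtf_lines, wanted=("protein_coding",)):
    """Yield header lines plus feature lines whose gene_biotype is in `wanted`."""
    for line in gtf_lines:
        if line.startswith("#"):
            yield line
            continue
        match = BIOTYPE_RE.search(line)
        if match and match.group(1) in wanted:
            yield line


lines = [
    "#!genome-build TAIR10\n",
    '1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; gene_biotype "protein_coding";\n',
    '1\tsrc\texon\t500\t600\t.\t-\t.\tgene_id "geneB"; gene_biotype "lncRNA";\n',
]
kept = list(keep_biotype(lines))
print(len(kept))  # 2 (header + geneA)
```

Alternatively, featureCounts can count against the full annotation, and the count table can be filtered afterwards using the same biotype lookup. Either way, the reads themselves are assigned the same way.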
How to resolve the error of protein lacking a stop codon when using GenomeThreader for homology prediction?
Dear all, the error message and the commands I ran are as follows. Thank you for your answers. makeblastdb -in pudorinus.fa -parse_seqids -dbtype nucl -out index/pu& nohup tblastn -query all.pep.fa -out pu.blast -db index/pu -outfmt…
Fastest way to convert BED to GTF/GFF with gene_ids?
This is probably a duplicated question from: How To Convert Bed Format To Gtf? How to convert original BED file to a GTF ? Converting different annotation file formats (GTF/GFF/BED) to each other How to change scaffold.fasta file or scaffold.bed file to GTF file? Convert bed12 to GFF convert bed12…
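For simple BED3 to BED6 files, the conversion reduces to a coordinate shift plus building the attribute column. Below is a minimal Python sketch. It assumes the BED name column should serve as both `gene_id` and `transcript_id`, and that each BED interval becomes one `exon` feature. The function name and the `source` value are hypothetical.

```python
def bed_to_gtf(bed_line, source="bed_to_gtf", feature="exon"):
    """Convert one BED line (BED3 to BED6) into one GTF line.

    BED starts are 0-based and ends are exclusive; GTF is 1-based and
    inclusive, so only the start moves (+1). The BED name becomes both
    gene_id and transcript_id.
    """
    f = bed_line.rstrip("\n").split("\t")
    chrom, start, end = f[0], int(f[1]), int(f[2])
    name = f[3] if len(f) > 3 else f"{chrom}:{start + 1}-{end}"
    score = f[4] if len(f) > 4 else "."
    strand = f[5] if len(f) > 5 else "."
    attributes = f'gene_id "{name}"; transcript_id "{name}";'
    return "\t".join([chrom, source, feature, str(start + 1), str(end), score, strand, ".", attributes])


print(bed_to_gtf("chr1\t99\t200\tgeneA\t0\t+"))
```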
Which program, tool, or strategy do you use to visualize genomic rearrangements?
Which program, tool, or strategy do you use to visualize genomic rearrangements? For my master's thesis I'm working on tools to visualize fusion genes. In that regard I'm interested in any and all strategies and…
Common analysis of direct RNA sequencinG CUrrently leads to misidentification of m5C at GCU motifs
Introduction Oxford Nanopore Technologies (ONT) direct RNA sequencing (Fig 1A) enables detection of RNA modifications. A modified base produces an altered electrical current and/or dwell time relative to a canonical base that can be detected with algorithms (Garalde et al, 2018; Smith et al, 2019; Workman et al, 2019). Figure…
Adding functional annotation and meta-data of MAKER/BRAKER GFF
Happy new year, 2023. I am doing genome assembly and annotation. I have a problem with the functional annotation part, since I am new and still learning this. I have done the following: Assembled the genome (using Verkko and…
HTseq reports missing attribute name
Hello, I am running this htseq-count command:
htseq-count -r pos -t gene -i gene -s yes -f bam \
/Volumes/cachannel/ZebraFinchBrain/CB-4a_genomemapping/sorted_alignmentcb4a.bam \
/Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/Taeniopygia_guttata.bTaeGut1_v1.p.110.chr.gff3 > \
/Volumes/cachannel/ZebraFinchBrain/HTSEQ_withautomate/output_counts.txt
However, I get this error: Error processing GFF file (line 75 of file /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/Taeniopygia_guttata.bTaeGut1_v1.p.110.chr.gff3): Feature gene:ENSTGUG00000013637 does not contain…
Best practices for unstranded sequences in featureCounts
Hi everyone, I'm using featureCounts to analyze some RNA-Seq data, but I have several doubts about using it with an unstranded library. First, when I analyze some SRA sequences or when I don't know the library type, I use Salmon to infer it with the following command: salmon quant -p 32…
Creating a Variant containing FASTA for proteomics search from VCF and genomic FASTA
Dear Biostar Community, I'm currently trying to generate a protein FASTA containing all known variants from HeLa (from the Cosmic CellLinesProject) for variant detection in proteomics measurements. For this, I've downloaded the variants file (VCF) and the…
Issues while running htseq-count
My data is from Candida glabrata, and when I use htseq-count no reads are counted for any gene_id. Thank you for your time and help. Foad
htseq-count GSNO_SRR1582646.sam Candida_glabrata_genome.gtf > GSNO_SRR1582646.count
10975 GFF lines processed.
8843 alignment record pairs processed.
head GSNO_SRR1582646.count
gene-CAGL0A00165g 0
gene-CAGL0A00187g 0…
Error with HTseq RNAseq read count
Hi, I am getting an error while running HTSeq. This is the command and the error:
htseq-count -q -f bam -s yes Ac1_mapped/ac1_mappedAligned.bam /global/home/users/catalinacastro/star/genome/genomic_v2.gtf count.txt
Error occurred when processing GFF file (line 637338 of file /global/home/users/catalinacastro/star/genome/genomic_v2.gtf): not enough values to unpack (expected 9, got 1) [Exception type: ValueError, raised in __init__.py:221]…
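The message says that one line split into a single field instead of the nine tab-separated GTF columns. That could point to a blank line, a line that uses spaces instead of tabs, or some non-GTF text in the file. Below is a minimal Python sketch for locating such lines. The function name `bad_gtf_lines` is hypothetical.

```python
def bad_gtf_lines(lines):
    """Yield (line_number, column_count) for lines that are not comments
    and do not split into exactly 9 tab-separated columns."""
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text.startswith("#"):
            continue
        columns = text.split("\t")
        if len(columns) != 9:
            yield number, len(columns)


sample = [
    "#!gtf comment\n",
    'chr1\tsrc\texon\t1\t10\t.\t+\t.\tgene_id "g1";\n',
    "\n",                            # blank line: one empty column
    "chr1 src exon 1 10 . + . x\n",  # spaces instead of tabs
]
print(list(bad_gtf_lines(sample)))  # [(3, 1), (4, 1)]
```

On the real file, passing an open file handle (`open("genomic_v2.gtf")`) and looking at the entries around line 637338 should show what that line contains.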
Should unique gene names/transcript IDs be used for ribosomal gene copies in a GTF/GFF file?
Hi, I have a GTF/GFF transcriptome that includes ribosomal sequences annotated with barrnap. I end up with ribosomal sequences that are present with the same gene IDs / transcript IDs at different sites and…
Htseq-count reads with missing mate encountered
Hello. I ran this HTSeq command:
htseq-count -r name -t gene -i gene -s yes -f bam /Volumes/cachannel/ZebraFinchBrain/CB-4a_genomemapping/sorted_alignmentcb4a.bam /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/ncbi_dataset/data/GCF_003957565.2/genomic.gff > /Volumes/cachannel/ZebraFinchBrain/HTSEQ_withautomate/output_counts.txt
and got this message: Warning: 72583723 reads with missing mate encountered. 80015507 alignment record pairs processed. Is there a setting I am…
Htseq Count
Hello, I have run htseq-count numerous times and keep getting the same result: none of my genes are counted, as seen here:
ZXDC 0
ZYG11B 0
ZYX 0
ZZEF1 0
ZZZ3 0
__no_feature 70257177
__ambiguous 0
__too_low_aQual 1509790
__not_aligned 3970775
__alignment_not_unique 4277765
However, I have a very high…
Filter a BED file based on genome coordinates for gene names
Hi, I have a BED file with certain regions of interest that looks like this:
chr1 0 91923
chr1 323234 4596845
…
with the start and end coordinates for each gene on the respective chromosome. But I want to…
Bioconductor – rtracklayer
DOI: 10.18129/B9.bioc.rtracklayer R interface to genome annotation files and the UCSC genome browser Bioconductor version: Release (3.6) Extensible framework for interacting with multiple genome browsers (currently UCSC built-in) and manipulating annotation tracks in various formats (currently GFF, BED, bedGraph, BED15, WIG, BigWig and 2bit built-in). The user may…
An extremely fast Non-Overlapping Exon Length calculator written in Rust
Hi all! Introducing the Non-Overlapping Exon Length calculator (NOEL), an extremely fast GTF/GFF per gene exon length extractor written in Rust. See the code and latest updates here: github/alejandrogzi/noel In case you do not want to read the whole text: NOEL outperforms all open-sourced scripts/tools for this task. It can…
papain family cysteine protease containing protein, maker-scaffold1702_size30647-snap-gene-0.14 (gene) Tigriopus kingsejongensis
Homology
BLAST of papain family cysteine protease containing protein vs. L. salmonis genes
Match: EMLSAG00000006045 (supercontig:LSalAtl2s:LSalAtl2s327:400616:404607:-1 gene:EMLSAG00000006045 transcript:EMLSAT00000006045 description:"augustus_masked-LSalAtl2s327-processed-gene-4.0")
HSP 1 Score: 590.497 bits (1521), Expect = 0.000e+0, Identity = 283/525 (53.90%), Positives = 368/525 (70.10%), Query Frame = 0
Query: 49 GHVARPLGKSPPNFVRDPPPRTTPPAQWLWNNVNETNFLTVSRNQHLPTYCGSCWAHAATSSLSDRIKIARQGAWPDINLAPQVLISCGPGDGCHGGEAGDANAYMHAQGITDETCSIYRARGQDNGLPCSKLEICSTCE---SKCYQPQHFFTYRVDEFHDVEGESNGEQEANMMAEIHHRGPISCGIAVTQALV-NYTGGLFHDKTGAQEIDHDISVVGYGVDEGTQEKYWLIRNSWGTYWGEQGFFRLIRGVNNLGIESGTCSWATPADTWSDAARE---RAAILSNEITLQKP------LWKQLWTVVADFVDNTRDTDLFRRLKLMQKGCKKLSSPRVPVVNIRPRPQDYVSTADLPEALDWRSVNGTNFLSWSVNQHLPVYCGSCWAQAGLSSLADRFTIADRKRFANLALSVQYILNCQAGGSCHGGDAFPLYAFIQKQGVPDVTCQPYEALDEGPLTDCSKPSKLVCKDCTWPPPEPGQEGNCWAKEKFHRYYVDEYNGVEGADNMKKEILERGPVT 560
GH+ R G+…
Removing all genome annotations from a list of sequences in Geneious
I have a folder in Geneious which contains over a hundred read-mapping files, showing RNA-seq reads mapping to genomic scaffolds. Each of these files is also annotated with multiple GFF files, which I will use for…
Map genome positions onto protein coordinates?
I am looking for a way to do the following: 1) reliably find a protein structure, e.g. a PDB file or pre-computed AlphaFold results, that is associated with a particular gene/transcript isoform. I found a way to do this, to some extent, for human genes using BioMart, but I'd like to be able…
How to make a proteome file
I have the FASTA files and genome annotation (GFF) files for a number of species, and I am now trying to create proteome files for these species. I have tried extracting and translating only CDS sequences, and also only protein-coding gene sequences, however…
How do I write a correctly formatted gff3 file in R?
Dear all, I am trying to annotate non-coding RNA in a small RNA-seq dataset. The RNACentral gff3 file that I am using has different chromosome identifiers than the genome assembly. I have loaded the gff3 file in R, where I changed the chromosome identifiers using the assembly report and…
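The post works in R. As a language-swapped illustration, here is the same step in Python. Only column 1 (seqid) changes, and each feature line is written back as exactly nine tab-separated columns, which is what GFF3 readers expect. The mapping shown is illustrative. In practice it would be built from the assembly report. The helper `rename_seqids` is hypothetical.

```python
def rename_seqids(gff_lines, mapping):
    """Replace column 1 (seqid) of GFF3 feature lines using `mapping`.

    Comment/directive lines are passed through unchanged, so ##sequence-region
    lines would still need editing separately. Assumes no embedded ##FASTA
    section. Seqids missing from `mapping` are left as they are.
    """
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        columns[0] = mapping.get(columns[0], columns[0])
        yield "\t".join(columns) + "\n"


# Illustrative mapping only.
mapping = {"1": "chr1"}
rows = ["##gff-version 3\n", "1\tRNAcentral\tncRNA\t10\t20\t.\t+\t.\tID=x\n"]
print("".join(rename_seqids(rows, mapping)), end="")
```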
Proper HTSeq usage on bacterial genome. Don't quite understand -t
Hi everyone, I'm trying to run HTSeq on a group of BAM files generated by aligning Illumina RNA-seq reads to a reference genome. The reference genome is the highest-quality sequence available and was…
BBTools 39.03 released!
Hi Everyone, I just released a new version of BBTools (39.03). There are some exciting new features like neural networks. But let me list them: 1) New program called “bbcrisprfinder.sh”. It finds CRISPRs… designed for short reads in metagenomes (but it’s also OK on full genomes). It works better if…
VG autoindex with pangenome constructed using minigraph-cactus
Dear developers, I am trying to construct a reference pangenome of a fungal species. After successfully constructing my pangenome using minigraph-cactus, I am struggling to add my isolates' annotations. For some background: We have de novo assembled and annotated 11 isolates and used the current reference (which has a chromosomal…
ROSE Algorithm: index out of range
Hi again, I am trying to run the ROSE algorithm created by the Young lab, URL here: younglab.wi.mit.edu/super_enhancer_code.html Specifically, I am running the ROSE_main.py script: younglab.wi.mit.edu/super_enhancer_code.html I created a Python 2.7 environment to run the script, as it is compatible with Python 2.7. When I run the script in Ubuntu:…
Converting GFF to GTF
Hello, I am having trouble converting my GFF file to GTF. I have tried gffread, gffcompare, and rtracklayer, which have all left me with the same output or no output. Here are my files. Please help!
Errors running genome polishing with Arrow
Dear Biostars community, I am polishing a genome with Arrow, but I get errors as soon as I launch the script. I have tried running Arrow through gcpp version 2.0.2-2.0.2 (installed via bioconda) and through variantCaller version 2.3.3. Using gcpp, I got…
How Do I Convert From Bed Format To Gff Format?
I have a file in GFF format and I need to convert it to BED format. What do I do? Both formats are tab-delimited text files used to represent DNA features in…
Converting from BED to SAF/GFF
I believe that the SAF format uses 1-based coordinates that are closed on both ends. Here is how I reached this conclusion. First, make some toy data.
$ cat genome.fa
>chr1
AATTCCGGAAAATTTTCCCCGGGGAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCC
$ cat reads.fa
>q1
AAAATTTTCCCCGGGGAAAAAAAAAAAAAAAAAACC
Map reads to the genome:
$ STAR --runMode genomeGenerate --genomeDir test_star --genomeFastaFiles genome.fa --genomeSAindexNbases…
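If SAF is 1-based and closed, as the post concludes, converting from BED (0-based start, exclusive end) only shifts the start by one. Below is a minimal Python sketch under that assumption, using the SAF column order featureCounts documents (GeneID, Chr, Start, End, Strand). The helper `bed_to_saf` is hypothetical. So is the fallback strand of "+" for BED lines without a strand, which should only matter if counting is run in a stranded mode.

```python
SAF_HEADER = "GeneID\tChr\tStart\tEnd\tStrand"


def bed_to_saf(bed_line):
    """Convert one BED line to one SAF line (GeneID, Chr, Start, End, Strand).

    BED is 0-based, half-open; SAF is taken here to be 1-based and closed,
    so Start = BED start + 1 and End = BED end.
    """
    f = bed_line.rstrip("\n").split("\t")
    chrom, start, end = f[0], int(f[1]), int(f[2])
    gene_id = f[3] if len(f) > 3 else f"{chrom}_{start + 1}_{end}"
    # Assumption: fall back to "+" when the BED line has no usable strand.
    strand = f[5] if len(f) > 5 and f[5] in ("+", "-") else "+"
    return "\t".join([gene_id, chrom, str(start + 1), str(end), strand])


print(SAF_HEADER)
print(bed_to_saf("chr1\t99\t200\tregion1\t0\t-"))
```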
HG38 / GRCh38 Human Y Chromosome hg38/GRCh38: 101 bp from chrY:14,000,000..14,000,100
International Society of Genetic Genealogy (ISOGG) YBrowse, Human Y Chromosome Pangenome Browser. The current view can be exported as a low-res PNG image, an editable SVG image, a GFF annotation table or a FASTA sequence file…
The blackcap (Sylvia atricapilla) genome reveals a recent accumulation of LTR retrotransposons
The genome assembly was performed with pipeline v1.5 of the Vertebrate Genomes Project (VGP) and can be found under NCBI BioProject PRJNA558064, accession number GCA_009819655.1; for further details on the sample collection and assembly, see Ishigohoka et al. [9]. In brief, a female blackcap from mainland Spain was caught to…
TPM from STAR output without re-aligning the files using RSEM or Salmon
Hi, I want to get TPM values from files aligned with STAR, and from my reading, the easiest way is to use RSEM or Salmon. My code for the alignment is
/Users/c/STAR/bin/MacOSX_x86_64/STAR --runThreadN 4 --genomeDir /Users/c/Desktop/Human_genome_index --readFilesIn /Users/c/Desktop/test/C1D20_R1_001_paired.fastq /Users/c/Desktop/test/C1D20_R2_001_paired.fastq --quantMode TranscriptomeSAM GeneCounts --outFileNamePrefix C1D20 --outSAMtype BAM SortedByCoordinate…
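With `--quantMode GeneCounts`, STAR already writes per-gene read counts (ReadsPerGene.out.tab). TPM could therefore be computed from those counts plus one length per gene, without re-aligning. `TranscriptomeSAM` also produces a transcriptome BAM that RSEM or Salmon can quantify directly. Below is a minimal sketch of the TPM formula itself. It assumes plain gene lengths in bp (for example, summed non-overlapping exon length). RSEM and Salmon also model effective length and multi-mapping reads, so values from this sketch will not match theirs exactly. Gene names and numbers are hypothetical.

```python
def tpm(counts, lengths):
    """Transcripts per million from raw counts and feature lengths (bp).

    rate_i = count_i / length_i ; TPM_i = rate_i / sum(rates) * 1e6.
    Lengths are assumed to be plain feature lengths, not effective lengths.
    """
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


counts = {"geneA": 100, "geneB": 300, "geneC": 0}
lengths = {"geneA": 1000, "geneB": 1500, "geneC": 800}
print(tpm(counts, lengths))
```

Because every rate is divided by the same total, the length unit (bp or kb) cancels out, and the values always sum to one million.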
Dataset’s name in BioMart for S. pombe
Can anybody help me find the dataset for S. pombe in BioMart? And also some help on how to use makeTranscriptDbFromBiomart to create a TranscriptDB? Cheers. Looks like you figured out another way of getting what…
Bacterial Pangenome Analysis
Hello, Current Scenario: I am currently engrossed in the analysis of gram-positive pathogenic bacterial genomes, having meticulously selected approximately 300 genomes for my research. To annotate these genomes, I employed Prokka, which yielded intriguing results, indicating a gene count ranging from 6,000 to 7,000 genes per genome. Subsequently, I harnessed…
Debian — Details of package python3-htseq in bookworm
Python3 high-throughput genome sequencing read analysis utilities. HTSeq can be used to perform a number of common analysis tasks when working with high-throughput genome sequencing reads: * Getting statistical summaries about the base-call quality scores to study the data quality. * Calculating a coverage vector and exporting it for visualization…
How to identify unique genes between two GFF files
I have two GFF files of the same species obtained from different annotation methods, and I want to identify unique genes by comparing both GFF files. Thank you
Finding sequences in unannotated genomes using reference coordinates
Hey Stars! I have a really confounding issue at hand. I am working on extracting upstream regions of genes from 100 different genomes of A. thaliana. The problem is that I have one reference genome, the TAIR10 version (which has an annotated…
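Once gene coordinates are available for a particular genome, the upstream region is a strand-aware slice. For an unannotated assembly, those coordinates would first have to be transferred from TAIR10, for example by alignment, and this sketch does not do that step. It assumes 1-based inclusive GFF-style coordinates and a chromosome held as a Python string. `upstream` is a hypothetical helper.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def upstream(seq, start, end, strand, n):
    """Return up to n bp immediately 5' of a gene, in the gene's orientation.

    seq: chromosome sequence (index 0 = position 1);
    start/end: 1-based inclusive gene coordinates.
    """
    if strand == "+":
        lo = max(0, start - 1 - n)   # positions start-n .. start-1
        return seq[lo:start - 1]
    if strand == "-":
        region = seq[end:end + n]    # positions end+1 .. end+n
        return region.translate(COMPLEMENT)[::-1]  # reverse complement
    raise ValueError("strand must be '+' or '-'")


chrom = "AACCGGTTAC"
print(upstream(chrom, 5, 8, "+", 3))  # ACC
print(upstream(chrom, 3, 8, "-", 2))  # GT
```

Near chromosome ends, the slice is simply shorter than `n`. Whether such truncated regions should be kept is a choice for the analysis.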
How to check RNAseq support for annotated genes?
Hello All, I have a set of annotated genes in GFF3 format and corresponding RNA-seq data. What is the recommended approach, and are there specific tools and parameters, to determine the percentage of genes supported by the RNA-seq data? Regards, B…
How to download genomes and proteins from JGI in bulk via the command line?
I'm trying to figure out how to download from JGI in bulk. In particular, I'd like to grab the genome and MycoCosm (mycocosm.jgi.doe.gov/). I honestly have no idea where to start. I see that there…
gffread outputs empty gtf file
Hi, I've been trying to convert my Prokka output from GFF to GTF format so I can use it in my HISAT-StringTie analysis. However, converting with gffread yields an empty GTF file. I'm not sure where I'm going wrong. Any help…
Assistance with Fungal Genome Annotation Using Maker and BLAST
Hello everyone, I’m a new user of Maker and I’m seeking assistance with the protocol I’m using. Currently, I’m working on annotating the genome of a non-model ascomycete fungal species belonging to the Sporocadaceae family. After running the analysis with Maker, I obtained FASTA and GFF files using fasta_merge and…
there are extra regions when calculating Tajima’s D per gene
Hello all, I am new to PopGenome and would like to ask one question that greatly confused me. I was trying to calculate Tajima's D by gene for my whole-genome data. I imported the GFF files and subset the data by "gene". See my code below. However, when I…
convert bed12 to sorted gtf
Hello, I'm trying to convert BED12 to a sorted GTF, but the output file 'Precapture_uniq.gff' is empty. I'm very new to this work, so I'd appreciate any help solving this.
awk -f bed12togff Postcapture_uniq_chr.bed12 | sort -k1,1 -k4,4n -k5,5n "$@"…
Is there a tool that sorts gtf files?
gff3sort.pl seems to make sure that lines with no "Parent=" attribute come before those that have it, if chrom and start position are the same. I think with standard Unix programs it should go like this:
$ (grep -v "Parent=" sortme.gtf; grep "Parent=" sortme.gtf) | sort -k1,1 -k4,4n -s
EDIT: Shouldn't we have to…
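The grep-then-stable-sort idea can be written as a single sort key: seqid, then numeric start, then "has Parent=". False sorts before True, so parent-less lines come first on ties. The minimal Python sketch below mirrors the shell pipeline and is not gff3sort.pl's actual logic. Python compares strings by code point, which matches `sort` only under `LC_ALL=C`. Also note that "Parent=" is a GFF3 attribute, so plain GTF lines would not contain it. `sort_features` is a hypothetical helper.

```python
def sort_features(lines):
    """Sort GFF-style feature lines by seqid, then numeric start; on ties,
    lines without a "Parent=" attribute come first. Comment and blank
    lines are dropped."""
    records = [line for line in lines if line.strip() and not line.startswith("#")]

    def key(line):
        columns = line.split("\t")
        return (columns[0], int(columns[3]), "Parent=" in line)

    return sorted(records, key=key)


lines = [
    "chr1\ts\texon\t100\t200\t.\t+\t.\tParent=t1\n",
    "chr1\ts\tmRNA\t100\t300\t.\t+\t.\tID=t1\n",
    "##gff-version 3\n",
    "chr1\ts\tgene\t50\t300\t.\t+\t.\tID=g1\n",
]
print([line.split("\t")[2] for line in sort_features(lines)])  # ['gene', 'mRNA', 'exon']
```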
htseq-count reports count values for deleted genes
I am using htseq-count on BAM files from a bacterial species. We are comparing WT strains as well as two strains with genes knocked out. The knockouts have been verified with whole-genome sequencing, but do retain the first 15 and last…
Searching for a tool to modify annotation files.
In two different projects I need to modify annotation files. For instance, I need to split a gene into two independent ones following evidence that they are separate transcriptional units. I also need to create a new alternative isoform of a gene…
Genome-wide DNA methylation patterns in bumble bee (Bombus vosnesenskii) populations from spatial-environmental range extremes
Orr, H. A. The genetic theory of adaptation: A brief history. Nat. Rev. Genet. 6, 119–127 (2005).
Dillon, M. E. & Lozier, J. D. Adaptation to the abiotic environment in insects: the influence of variability on ecophysiology and evolutionary genomics. Curr. Opin. Insect Sci. 36,…
Top 25 Bioconductor Interview Questions and Answers
Bioconductor is an open-source software project that provides tools for the analysis and comprehension of high-throughput genomic data. It’s a powerful tool, widely used in bioinformatics and computational biology to process and analyze intricate biological data. Bioconductor’s strength lies in its vast array of packages specifically tailored for genomics research,…
Convert bed12 to GFF
There are several posts online about converting a GFF/GTF to BED12, but is there any way to go in the other direction and convert BED12 to GFF?
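A BED12 record describes its exons as blocks: column 11 holds the block sizes and column 12 the block starts relative to chromStart. Each block can become one GTF exon line. Below is a minimal Python sketch. It assumes the BED name should serve as both `gene_id` and `transcript_id` and that only exon lines are wanted. thickStart/thickEnd (the CDS) are ignored. The function name and `source` value are hypothetical.

```python
def bed12_to_gtf(bed_line, source="bed12_to_gtf"):
    """Expand one BED12 record into GTF exon lines (one per block)."""
    f = bed_line.rstrip("\n").split("\t")
    chrom, chrom_start, name, strand = f[0], int(f[1]), f[3], f[5]
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    starts = [int(x) for x in f[11].rstrip(",").split(",")]
    attributes = f'gene_id "{name}"; transcript_id "{name}";'
    exons = []
    for size, offset in zip(sizes, starts):
        exon_start = chrom_start + offset  # 0-based block start on the chromosome
        exons.append("\t".join([chrom, source, "exon", str(exon_start + 1),
                                str(exon_start + size), ".", strand, ".", attributes]))
    return exons


record = "chr1\t100\t500\ttx1\t0\t+\t100\t500\t0\t2\t50,100,\t0,300,"
for line in bed12_to_gtf(record):
    print(line)
```

The exon lines come out in block order. Sorting the whole file afterwards by seqid and start gives the sorted GTF the earlier thread asks for.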
How to set weight for merge legacy annotations
Good day. I am new to genome annotation. I am running MAKER to merge evidence from EST, homolog, AUGUSTUS and BRAKER. The following is my maker_opts.ctl:
genome=genome.fasta
est_gff=transcript.gff3
protein=homolog.gffs
pred_gff=Augustus.gff3, Braker.gff3
I have executed "mpiexec -n 30 maker > maker.out"…
how to use RNAseq data to assist annotation?
Hello, I am performing a MAKER annotation of a de novo plant genome. I have RNA sequencing reads (Illumina paired-end 150 bp) to include in the annotation. However, I am confused about the inputs MAKER allows in the maker_opts.ctl file. I…
Differential Expression using Isoseq-supplemented reference transcriptome
Hi all, I have a dataset of Illumina short-read RNA-Seq data (n = 6 per group) from three different mouse genotypes, and paired PacBio Iso-Seq data from a subset of these (n = 2 per group). I have processed the Iso-Seq data…
Confusion about transcript ablation
I'm analyzing the WES data of a patient. After calling variants with GATK, I use the Ensembl Variant Effect Predictor (VEP) to annotate my VCF file. Here is one record from the output file:
#Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation Extra
chr11_64341844_GTTGTGGTCTGAGGTCTTGGGCCATCAGTGATGTCACAACCAGATGGCCCAAGACCCCAGACCACAACCCCATGTCTGGT/- chr11:64341844-64341923- ENSG00000278359…
CDS phase 0,1,2 in GFF format
The question was asked before in Calculate CDS phase in gff3 format ; Negative value in “phase” line of a gff3 file.What does it mean? ; etc… but I still don’t get it. So let’s use an existing GFF3 file: github.com/samtools/bcftools/blob/develop/test/csq/ENST00000580206/short.gff The GFF3 is valid in ‘bcftools csq’ This is…
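In the GFF3 specification, the phase of a CDS feature is the number of bases to remove from its 5' end (in the direction of transcription) to reach the first base of the next codon. Assume the first CDS piece starts with a complete codon (phase 0). Then each later piece's phase depends only on how many coding bases precede it: phase = (3 - (bases so far mod 3)) mod 3. Below is a minimal Python sketch under that assumption of a 5'-complete CDS. `cds_phases` is a hypothetical helper, not part of bcftools.

```python
def cds_phases(segments, strand="+"):
    """Compute GFF3 phases for the CDS pieces of one transcript.

    segments: (start, end) tuples, 1-based inclusive, in any order.
    Pieces are walked in transcription order (descending on "-"), and
    the first piece is assumed to start at phase 0.
    """
    ordered = sorted(segments, reverse=(strand == "-"))
    result, consumed = [], 0
    for start, end in ordered:
        result.append(((start, end), (3 - consumed % 3) % 3))
        consumed += end - start + 1  # coding bases seen so far
    return result


for segment, phase in cds_phases([(1, 10), (21, 30), (41, 50)], "+"):
    print(segment, phase)  # (1, 10) 0 / (21, 30) 2 / (41, 50) 1
```

In the example, the first piece contributes 10 bases: three full codons plus one leftover base. The second piece must therefore skip 2 bases to finish that codon, giving phase 2. After 20 bases, 2 are left over, so the third piece has phase 1.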
Build databases for a genome using snpEff
Hello, I got this error when I tried building a database for the date palm genome: Total: 363391 markers added. Create exons from CDS (if needed): …………………………………………………………+………………………………………………………………………………………………………. Exons created for 138 transcripts. Deleting redundant exons (if needed): Total transcripts with deleted exons: 0…
How are duplicated genes named in a GTF file?
Hi, some genes in the genome are known to be duplicated, so there are multiple copies of the same protein-coding sequence at different loci. My question is, how are these duplicated genes named in the GENCODE or RefSeq annotation (gtf or…
Evolutionary genomics of camouflage innovation in the orchid mantis
Sample collection Captive breeding individuals of H. coronatus (Mantodea, Hymenopodidae) hatched from the same ootheca that was collected from the Xishuangbanna rainforest, Yunnan Province, China in 2018. Individuals of D. lobata (Mantodea, Deroplatyidae) were collected from a captive breeding center in Beijing, China in 2018. All individuals were housed in semitransparent…
Converting FASTA/FASTQ file into GFF3/GTF
I have tried to convert a FASTA/FASTQ file into a GFF3/GTF file. First, I converted the FASTA/FASTQ file into BAM (with samtools) as well as into a BED file, and then converted those into a GFF file. But the…
A genome visualization python package for comparative genomics
pyGenomeViz is a genome visualization Python package for comparative genomics, implemented on top of matplotlib. It was developed to make it easy to produce attractive plots of genomic features and sequence similarity comparison links between multiple genomes. It supports genome…
Which refseq_protein db to choose for zingiberaceae
Hello everyone, I am trying to BLAST the protein sequences I extracted from a .gff file against a protein database from NCBI. The crop I have data on is from the Zingiberaceae and I was wondering which database from this link…
Segmentation fault (core dumped) while running Augustus gene finder on Ubuntu
Hi, I am Latha. I am running AUGUSTUS on plant genome contigs. Although my computer has sufficient memory and disk space for this tool, I am getting the error below:
kulaganathan@kulab:~/busco-master/augustus.2.5.5$ augustus --strand=both --species=rice --maxDNAPieceSize=20000000 sandalnew.fa >output1.gff
replaced tx with 0…
jannovar download problem
I am trying to convert some HGVS descriptions to chrom:pos:ref:alt format. I was thinking of using Jannovar. As per the documentation I run: jannovar download -d hg19/refseq which gives me this: Options JannovarDownloadOptions [downloadDir=data, getDataSourceFiles()=[bundle:///default_sources.ini], isReportProgress()=true, getHttpProxy()=null, getHttpsProxy()=null, getFtpProxy()=null, geneIdentifiers=[], outputFile=] Downloading/parsing for data source "hg19/refseq" INFO…
How to extract protein sequences from a .gff file
Hello everyone! I am a beginner in bioinformatics, but the company I work at has a genome assembly of one of our crops. I wanted to annotate the genome, and to do so I used a piece of Python code in Ubuntu. I used the Augustus Arabidopsis database…
Investigating open reading frames in known and novel transcripts using ORFanage
O'Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
Frankish, A. et al. GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res. 51, D942–D949 (2023).
Pertea,…
GTF file tomato
Hi everybody, I'd like to map tomato reads, but I am not able to find the GTF file. Is there a way to locate a file in this format, or to convert the GFF file into GTF? Visit: www.ncbi.nlm.nih.gov/datasets/taxonomy/4081/ Click on Download…
Counting RNA-seq reads for more features than GTF files contain
Hi all, I have an RNA-seq dataset, and I would like to generate read counts for a wide variety of features, for example, transposable elements, telomeres, centromeres, multiple RNA subtypes, and introns. As far as I can tell, GTF files only permit a very restricted number of features in their…
GTF or GFF with target genes regulated by ncRNAs
I was wondering if there is a GTF containing target genes regulated by ncRNAs. In databases like GENCODE I can download a GFF/GTF with gene_ids that correspond to the ID of the lncRNA. Is it possible to find or…
How can I approach this problem? Please help. Reference genome assembly.
Hello everyone, I need help with this work. I need a workflow for it; could someone just tell me what I have to do? In my view, I have to map all the…
Rnammer Gives Mysterious Error
Does anyone have experience with RNAmmer? On my new machine it gives an error every time I run it. I have tried to edit the file to understand it better, so the error message might be slightly different from yours. I altered SIG{INT} to…
Insertion sequence transposition inactivates CRISPR-Cas immunity
Primers, plasmids, bacterial strains, and growth conditions
Primers, plasmids, and strains used in this study are listed in Tables S1–S3, respectively. Escherichia coli strains MG1655, DH10B, MDS42, and their derivatives were cultured in LB medium at 37 °C with shaking at 220 rpm. All the recombinants containing the temperature-sensitive pSC101-derived plasmid were cultured…
Help with htseq-count read counts
Hello, I am doing a transcriptome analysis on Pseudomonas putida and have been trying to count reads using htseq-count. The program always gives an error. I have tried different genome references (.fna) and annotation files (GTF and GFF), but it does not work. The mapping works…
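A frequent cause of htseq-count errors with bacterial annotations is a mismatch with htseq-count's defaults. It counts exon features (--type exon) and groups them by the gene_id attribute (--idattr gene_id). NCBI prokaryotic GFF3 files, by contrast, typically contain gene and CDS rows identified by ID and locus_tag attributes, with few or no exon rows. Whether that applies here is an assumption, but it is quick to check. The Python sketch below tallies the feature types in column 3 and the attribute keys in column 9, so that suitable -t and -i values can be chosen.

```python
import re
from collections import Counter


def tally_gff(lines):
    """Count feature types (column 3) and attribute keys (column 9).

    Works for both GFF3 (key=value) and GTF (key "value") attribute styles.
    Returns two Counters: (feature_types, attribute_keys).
    """
    feature_types = Counter()
    attribute_keys = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # GFF3 files may end with an embedded sequence section
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        feature_types[cols[2]] += 1
        for part in cols[8].split(";"):
            part = part.strip()
            if part:
                # The key ends at the first "=" (GFF3) or space (GTF).
                attribute_keys[re.split(r"[= ]", part, maxsplit=1)[0]] += 1
    return feature_types, attribute_keys
```

Printing most_common() on both Counters shows at a glance whether, for example, CDS with locus_tag would be a better choice than the exon and gene_id defaults.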
Contig order rearranged agat
Hi, I annotated a genome with prokka and while converting to GTF with agat, I get the following error: => Version of the Bioperl GFF parser selected by AGAT: 3 gff3 reader error level1: No ID attribute found @ for the feature: … 1 warning…
snpEff and SIFT calculation
Hi, I annotated my VCF files using snpEff by creating a new database (I use my own assembly and GTF file). I would also like to calculate SIFT (prediction of the consequence of missense variants). I found that I can do this using SnpSift, but I see that…
adding features to gtf file using agat tool function
I'm trying to use this AGAT tool, which adds new attributes from a TSV file to a GTF file. My input files look like this. The input TSV, which is my reference file:
gene_id	Entrez_ID
ENSCAFG00845006432	399518
ENSCAFG00845002136	399530
ENSCAFG00845029798	399544
ENSCAFG00845011460	399545
ENSCAFG00845001610	399653
ENSCAFG00845013158	403157
ENSCAFG00845014982	403168
ENSCAFG00845021967	403170
ENSCAFG00845019241	40340
Next…
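If the AGAT route keeps failing, the same join can be sketched in a few lines of Python. This is a swapped-in alternative, not AGAT's own implementation. It assumes a tab-separated file whose header names the two columns (here gene_id and Entrez_ID) and a GTF whose column 9 contains gene_id "<ID>" entries. It appends the new attribute, named after the second TSV header, to every matching row. IDs must match exactly, including any version suffix.

```python
import re

# Matches gene_id "..." but not e.g. ref_gene_id "..." (\b needs a non-word char before).
GENE_ID = re.compile(r'\bgene_id "([^"]+)"')


def load_mapping(tsv_lines):
    """Read a headed, two-column TSV (gene_id, new value) into (column_name, dict)."""
    rows = iter(tsv_lines)
    header = next(rows).rstrip("\n").split("\t")
    mapping = {}
    for line in rows:
        if line.strip():
            key, value = line.rstrip("\n").split("\t")[:2]
            mapping[key] = value
    return header[1], mapping


def add_attribute(gtf_lines, attr_name, mapping):
    """Append attr_name "value"; to each GTF row whose gene_id is in mapping."""
    out = []
    for line in gtf_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            out.append(line)  # comments and malformed rows pass through unchanged
            continue
        match = GENE_ID.search(cols[8])
        if match and match.group(1) in mapping:
            attrs = cols[8].rstrip()
            if not attrs.endswith(";"):
                attrs += ";"
            cols[8] = f'{attrs} {attr_name} "{mapping[match.group(1)]}";'
        out.append("\t".join(cols))
    return out
```

Rows whose gene_id is absent from the TSV are left untouched, so a quick count of rows that gained the attribute is a useful sanity check.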
Prediction of Ribosomal RNA Genes Using RNAmmer Software
Introduction
Ribosomal RNA (rRNA) genes are known to be an integral part of the ribosome synthesis machinery and have hence been studied extensively. Due to their repetitive nature, evolutionary conservation, and ubiquitous distribution, these genes play a key role in various functions and mechanisms, including maintenance of genome integrity, control of…
Construct a pantranscriptome reference with two haplotypes from a single sample.
Hello, I am trying to construct a graph reference for rpvg using two haplotypes from a single sample. I created a GFA file from the two haplotypes of that sample with pggb. Then, I generated a VCF file and graph.pg…
Stringtie problem
I found a problem while doing the estimation step with StringTie. I work on the TB bacterium genome and its GFF file (I only found an annotation file in GFF on Ensembl), and it gives me an error that says “can not find gene ID…
Extracting exons using GenomicFeatures is different from manual extraction
If I try to extract the length of all exons (including overlapping ones) using the GenomicFeatures R package with this code and this GENCODE file:
library(GenomicFeatures)
txdb <- makeTxDbFromGFF("tables/gencode.v43.basic.annotation.gtf.gz", format = "gtf")
exons.list.per.gene <- exonsBy(txdb, by = "gene")
sort(width(exons.list.per.gene)[["ENSG00000000003.15"]])
the result is [1] 75 84 99 108 135 189 189…
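One possible explanation, offered as a hypothesis rather than a diagnosis, is that "exon length" depends on how repeated and overlapping exons are treated. A GTF lists exons once per transcript, so the same exon can appear several times. Partially overlapping exons can either be counted separately or merged into a union, in the spirit of GenomicRanges' reduce(). The Python sketch below computes all three totals for one gene's exons, given as 1-based, closed (start, end) pairs, so that a manual extraction can be compared against each.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based closed (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def total_width(intervals):
    """Sum of widths, counting both endpoints (width = end - start + 1)."""
    return sum(end - start + 1 for start, end in intervals)


def exon_length_summaries(exons):
    """Three common ways of totalling one gene's exon lengths."""
    return {
        "all_rows": total_width(exons),                       # every GTF exon row
        "distinct_exons": total_width(set(exons)),            # identical exons once
        "merged_union": total_width(merge_intervals(exons)),  # overlaps collapsed
    }
```

For example, the exons (1, 100), (1, 100), (50, 150) and (300, 349) give 351, 251 and 200 for the three totals, which shows how far the numbers can drift apart for a single gene.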
How to obtain gtf file for plant genomes?
I would like to perform RNA-seq analysis for a plant genome, for which I need to download the genome and GTF files of the plant. However, the NCBI database has a GFF file instead of a GTF file. Even the Ensembl Plants database has GFF…
RSEM implementation
I have the virus genome (FASTA) and GFF file, and I am trying to prepare the reference with the following commands:
rsem-prepare-reference --gff3 KT992094.1.gff3 KT992094.1.fasta
or
rsem-prepare-reference --gff3 KT992094.1.gff \
  --gff3-genes-as-transcripts \
  --bowtie \
  KT992094.1.fasta \
  ref/virus
But it’s saying: Invalid number of arguments! How can I solve this issue?…
Visualize mitochondrial genome
Hi all, I have a simple question. I want to visualize the mitochondrial genome together with its genes, the typical image that you find when you search images for mitochondrial genome: www.ecosia.org/images?q=mitochonrial%20genome I already have the FASTA file and the annotation done, and I have…
A pangenome reference of 36 Chinese populations
Populations and samples
For Phase I of the CPC project, we selected 68 samples from 731 individuals with genomes deep-sequenced using next-generation sequencing. Following a previous study (ref. 5), we applied a procedure to quantitatively evaluate the genetic diversity coverage based on principal component analysis results. We selected individuals using a statistic…
amrfinder not working on loop?
Hi, I am trying to run amrfinder on multiple genomes in a loop, and it gives the following error; also, the input files I am using are not shown after it runs.
#!/bin/bash
for k in /home/bvs/neelam/AMRFINDER_hypo/hyocool/*.fasta;do
NAME=$(basename $k .fasta)
echo…
What Are The Most Common Stupid Mistakes In Bioinformatics?
While I of course never make stupid mistakes…ahem…I have many “friends” who: forget to check both strands; generate random genomic sites without avoiding masked (NNN) gaps; confuse genome freezes and even species. But I’m sure there are some other very…
featureCounts: ERROR: failed to find the gene identifier attribute
Hello, I made my own GTF file from hmmer results and used it to calculate the abundance of genes from the annotated features of my GTF file using the featureCounts program. The error message that I got is the following: featureCounts…
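By default, featureCounts counts exon rows (-t exon) and groups them by the gene_id attribute (-g gene_id). This error therefore most likely means the selected rows lack a gene_id "<ID>" entry in column 9, which is easy to miss in a hand-built GTF, for example one that uses GFF3-style ID=… attributes. The Python sketch below is a hypothetical checker, not part of featureCounts. It reports malformed rows and rows of the chosen feature type that lack the chosen attribute. If the attribute exists under another name, the -t and -g options can point featureCounts at it instead.

```python
import re

# GTF attributes look like: key "value"; keys stop at whitespace or ";".
GTF_ATTR = re.compile(r'([^\s;]+)\s+"([^"]*)"')


def find_missing_attribute(gtf_lines, feature_type="exon", attribute="gene_id"):
    """Return 1-based line numbers of suspicious rows.

    Reports rows that are not 9 tab-separated columns, plus rows of
    feature_type whose column 9 has no attribute "value" entry.
    """
    problems = []
    for lineno, line in enumerate(gtf_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            problems.append(lineno)  # malformed row
            continue
        if cols[2] != feature_type:
            continue
        keys = {m.group(1) for m in GTF_ATTR.finditer(cols[8])}
        if attribute not in keys:
            problems.append(lineno)
    return problems
```

Calling it with feature_type set to whatever column 3 holds in the hmmer-derived file (CDS, for instance) gives the line numbers to inspect.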
Using RNA-seq to detect pathogen sequences in host tissue
Hello all, I have a project that I am working on that I wanted to get some guidance on if possible. Basically, we have sent samples for RNA-seq in which we want to determine infection and levels of infection in…
ChIP-Seq
ChIP-Seq Input Data (Reference Feature). LiftOver: We provide on-the-fly lift-over of reference data sets between different genome assemblies for broader comparison among annotations. Upload custom data, file format: All ChIP-seq tools use SGA (Simplified Genome Annotation) files as an internal working format. SGA input…