Tag: FASTA

Get mRNA sequence from amino acid sequence using BioPython

I have a sequence of amino acids like “MVLLV” and I want to know which mRNA corresponds to it. I’m using Biopython for this, but I only know back_transcribe(), which reverts RNA to DNA. How can I do…

Contigs number vs NNN gap % in WGS

Hi all, I have two FASTA files of a bacterial whole-genome sequence assembly (same genome): one generated by SPAdes, and one generated by SPAdes followed by an extra scaffolding step using Multi-CSAR, comparing it to 5 close genomes. The QUAST statistics are below: Genome 1…

Ubuntu Manpage: gt-encseq-encode – Encode sequence files (FASTA/FASTQ, GenBank, EMBL) efficiently.

Provided by: genometools_1.6.5+ds-2_amd64 NAME gt-encseq-encode – Encode sequence files (FASTA/FASTQ, GenBank, EMBL) efficiently. SYNOPSIS gt encseq encode sequence_file [sequence_file [sequence_file …]] DESCRIPTION -showstats [yes|no] show compression results (default: no) -ssp [yes|no] output sequence separator positions to file (default: yes) -des [yes|no] output sequence descriptions to file (default: yes) -sds [yes|no]…

Ubuntu Manpage: Bio::Roary::External::Makeblastdb – Wrapper around NCBIs makeblastdb command

Provided by: roary_3.13.0+dfsg-1_all NAME Bio::Roary::External::Makeblastdb – Wrapper around NCBIs makeblastdb command VERSION version 3.13.0 SYNOPSIS Take in a fasta file and create a temporary blast database. use Bio::Roary::External::Makeblastdb; my $blast_database= Bio::Roary::External::Makeblastdb->new( fasta_file => ‘contigs.fa’, exec => ‘makeblastdb’ ); $blast_database->run(); METHODS output_database Returns the path to the temporary blast database files…

FASTQ to FASTA Converter

About the tool The FASTA format is a text-based format for representing nucleotide or peptide sequences. The FASTQ format additionally includes the corresponding quality scores. This tool allows you to convert FASTQ files to FASTA. The resulting FASTA file will contain only the sequence data from the input FASTQ file….
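The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the converter's actual implementation: the function name `fastq_to_fasta` is hypothetical, and the sketch assumes every FASTQ record occupies exactly four non-empty lines (header, sequence, `+` separator, quality string) with no line-wrapped sequences.

```python
def fastq_to_fasta(fastq_text: str) -> str:
    """Convert FASTQ text to FASTA text, dropping the quality scores.

    Assumes strict 4-line records: '@header', sequence, '+...', quality.
    """
    # Ignore blank lines so a trailing newline does not break the grouping.
    lines = [ln.rstrip("\r\n") for ln in fastq_text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of 4 lines")

    out = []
    for i in range(0, len(lines), 4):
        header, seq, plus, _quality = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        out.append(">" + header[1:])  # '@' becomes '>'
        out.append(seq)               # the quality line is discarded
    return "\n".join(out) + "\n" if out else ""


if __name__ == "__main__":
    example = "@read1 desc\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!!!\n"
    print(fastq_to_fasta(example), end="")
```

Records are parsed by position rather than by looking for `@`, because a quality string may itself begin with `@`.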

NifH database for taxonomic assignment in qiime2 – General Discussion

JThurston (Josh Thurston) January 19, 2024, 8:05pm 1 Hi all! I’m currently working through a QIIME 2 pipeline analysing Illumina MiSeq paired-end amplicon data. I’ve successfully analysed bacterial (16S) amplicons from importing and filtering through to taxonomic assignment. However, I also have amplicon data for a functional gene (nifH; nitrogenase for N2…

Problem with DRAGEN RNAseq hashtable directory

Dear all, I recently wrote some code to run the DRAGEN RNA-seq pipeline. I use this command: /opt/edico/bin/dragen -f -l \ -r refdir \ -1 ${forward} \ -2 ${reverse} \ -a ${gtf} \ --output-dir output/${sample} \ --output-file-prefix ${sample} \ --RGID ${sample}_group_id \…

Remove Uncharacterized chromosomes before alignment in chipseq

Hi, I have a question. I am processing a ChIP-seq experiment on the mm10 genome. I did quality checks, trimming, alignment, and duplicate removal. The “problem” is that I did not remove the uncharacterized chromosomes from the reference FASTA genome. I was planning to remove them…

From nucleotide or proteine sequences to EC number using biopython

Hi, if I have a FASTA file containing nucleotide or protein sequences, is it possible to get EC numbers using Biopython? For example: 1.1.1.169 1.1.1.205 1.1.1.25 1.1.1.302 1.1.1.330 1.1.1.34. PS: I’m working on fungi, so I need…

Reference genome, BWA and right algorithm

Hello, I’m using BWA to create the index for aligning some RNA-seq FASTQ files. The first thing I did was download hg38.fa.align.gz from UCSC. Then I ran gzip -d hg38.fa.align.gz and sudo apt-get install bwa. Here comes the problem: the BWA instructions recommend the bwtsw algorithm, but when I…

How to trim miRNA reads?

Hi there, I am new to bioinformatics. I am trying to prepare fasta.gz files for uploading to CPSS, a web server for miRNA-seq datasets. My data is from the Gene Expression Omnibus (GEO) database. The sample FASTA file looks like this: >SRR1658346.1 HISEQ1:187:D0NWFACXX:3:1101:2565:2050 length=51 ATCATACAAGGACAATTTCTTTTAACGTCGTATGCCGTCTTCTGCTTGNAA >SRR1658346.2 HISEQ1:187:D0NWFACXX:3:1101:2654:2232…

Remove sequences from a fasta file with IDs from a text file using Python

A Python beginner here. I have a FASTA file with 2500+ sequences, and after some analysis I want to remove 200+ sequences based on matching IDs. I have one FASTA file (sample.fa) and a text file with a list of IDs for the sequences that…

extract sequence from genome by region

Written by gernophil: You can integrate Bioio in Python by using the subprocess module, which is a built-in Python module. Here is an example: import…

Bwa-mem2 indexing not working down stream

I am using bwa-mem2 to create my index and to map my reads to the reference genome. This is my indexing code: #!/bin/bash #SBATCH -J index #SBATCH -A gts-rro3 #SBATCH -N 1 --ntasks-per-node=24 #SBATCH --mem-per-cpu=8G #SBATCH -t 1:00:00 #SBATCH -o index.out cd $SLURM_SUBMIT_DIR…

DADA2 formatted 16S rRNA gene sequences for both bacteria & archaea

Description: This version keeps up to date with the improvements and the increase in 16S rRNA gene sequences (SSU) added in GTDB release 214.1. Please read this post for the stats on the updates: gtdb.ecogenomic.org/stats/r214. There has been no change to the RDP-RefSeq reference database. If anyone…

SnapGene Version 7.1.1

SnapGene 7.1.1 was released on December 18, 2023. Fixes Fix a regression that could result in searches for queries longer than 4000 bp failing Ensure files with standard FASTA file extensions are opened as sequences regardless of whether they include a FASTA description. Fixed a crash that could occur when…

Filter.seqs error: Sequences are not all the same length

jpits December 15, 2023, 8:53am 1 Hi Mothur, I am trying to analyze samples from the V3-V4 region following the SOP. I have aligned the sequences and screened the aligned FASTA. Then comes the filter.seqs step. Here is the summary.seqs output for my input to filter.seqs and the filter.seqs command as I…

how to get sequences by location when pyfaidx.Fasta(read_long_names=True) creates keys from FastaRecord (dmel r6 genome build)?

I’m attempting to use pyfaidx to index the dmel r6 genome build so that I can get actual sequences from tuples like (chromosome, start, end). In the pyfaidx documentation, they describe this process using pyfaidx.Fasta, which is able to access sequences using discrete chromosome locations: >>> genes = Fasta('tests/data/genes.fasta') >>> genes['NM_001282543.1'][200:230]…

Q&A Report from the workshop: Exploring EMBL-EBI sequence analysis tools and managing bioinformatics workflows | PDF | Sequence Alignment

Q&A Report from the workshop. Question: What is the best MSA tool? Clustal 2 and Clustal Omega are the same. How would we enter multiple sequences? Because there is only one input box. Could the legend explaining symbols (*, -,…) be shown in the result window? What is the max number of sequences one…

Tax4Fun2 package are not found and github repository is not maintained anymore

Hi everyone! I have tried to find the R package Tax4Fun2 from the paper (pubmed.ncbi.nlm.nih.gov/33902725/). This R package lets you analyze the microbiome in an easy way to predict functional profiles from metagenomic 16S rRNA data….

A high-resolution transcriptomic and spatial atlas of cell types in the whole mouse brain

Mouse breeding and husbandry All experimental procedures related to the use of mice were approved by the Institutional Animal Care and Use Committee of the AIBS, in accordance with NIH guidelines. Mice were housed in a room with temperature (21–22 °C) and humidity (40–51%) control within the vivarium of the AIBS…

Accurate prediction of protein assembly structure by combining AlphaFold and symmetrical docking

Prediction with AlphaFold2 and AlphaFold-Multimer: For each PDB the release date in the Protein Data Bank34 was recorded. AlphaFold 2 (2.2.2) was run setting the --max_template_date flag to be the day before the release date of the PDB and the --model_preset to be either monomer for AF or multimer for…

convert VCF to gVCF

Your question is not completely clear, but since the most sensible ways to understand it have the same answer, I’m gonna go with that. I have the exact reference fasta used for generating the VCFs TLDR: You don’t have enough information to do this with just VCFs and reference fasta….

Explanation of definition lines for Trinity .fasta and .SuperTrans.fasta files

Hi folks, I assembled a transcriptome in Trinity v2.8.5 using the --include_supertranscripts parameter. These are the deflines for the .fasta file: >TRINITY_DN8_c3_g1_i1 len=330 path=[0:0-329] >TRINITY_DN8_c1_g1_i1 len=271 path=[0:0-270] >TRINITY_DN8_c2_g1_i1 len=357 path=[0:0-356] >TRINITY_DN8_c0_g1_i4 len=2132 path=[0:0-1596 2:1597-1673 3:1674-1734 4:1735-1789 8:1790-1797 9:1798-1927 11:1928-2025…

How to query NCBI to extract Virus fasta files using BioPython?

Hi! I want to automatically extract the genome FASTA files of 30 samples using a Python script from www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=10239&host=bacteria. I want the viruses that have bacteria as their host, and I am using the Biopython package. Entrez.email = "mail"…

Extract fasta sequence from gff3 file

Hi everyone, I have a lot of .gff3 files with the CDS features followed by the FASTA sequence. The sequence is separated from the CDS features like this: ##FASTA >NZ_NZ_LR130533.1 I would like to extract all the FASTA sequences into new fasta…

Metadata for RNAseq project analysing differential expression in Culex pipiens mosquitoes infected by two avian Plasmodium species

Authors: Garrigós, Marta; Ylla, Guillem; Martínez de la Puente, Josué; Figuerola, Jordi; Ruiz-López, María José. Keywords: Transcriptomes; Avian malaria; Culex; Gene expression. Publication date: 12 Dec 2023. Publisher: DIGITAL.CSIC. Citation: Garrigós, Marta; Ylla, Guillem; Martínez de la Puente, Josué; Figuerola, Jordi; Ruiz-López, María…

ubuntu – Medaka: unrecognized command ‘tools’ and samtools not found

When trying to run medaka_consensus in ubuntu, I am getting the following error. I installed into a virtualenv to run on ubuntu. (medaka) ubuntu:~/medaka$ medaka_consensus -i combined.fastq -d curated.fasta -t -o ~/medaka 10 -m r941_sup_plant_g610 TF_CPP_MIN_LOG_LEVEL is set to ‘3’ [main] unrecognized command ‘tools’ Attempting to automatically select model version….

phylogenetics – How to remove third codon positions from a charset in iqtree?

Yes, it’s easy: #nexus BEGIN SETS; charset atp6.fasta = 1-651\3, 2-651\3; charset atp8.fasta = 652-729\3, 653-729\3; charset cob.fasta = 730-1749\3, 731-1749\3; # … so on for every gene END; So for atp6 you will get 1,2,4,5,7,8,10,11 …. up to 650, and so on for each gene. Thus the residues at…
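To see which alignment columns a charset such as `1-651\3, 2-651\3` selects, the stride notation can be expanded by hand. The following is a small sketch in Python, not part of IQ-TREE; the helper names `charset_sites`, `first_two_codon_positions` and `drop_third_positions` are hypothetical and only illustrate the arithmetic (1-based, inclusive ranges with a step of 3).

```python
def charset_sites(start: int, end: int, step: int = 1) -> list:
    """Expand a NEXUS-style range 'start-end\\step' into 1-based site numbers."""
    return list(range(start, end + 1, step))


def first_two_codon_positions(start: int, end: int) -> list:
    """Sites kept by 'start-end\\3, (start+1)-end\\3': codon positions 1 and 2."""
    return sorted(charset_sites(start, end, 3) + charset_sites(start + 1, end, 3))


def drop_third_positions(aligned_seq: str, start: int, end: int) -> str:
    """Return the gene's columns from one aligned sequence without 3rd positions."""
    keep = first_two_codon_positions(start, end)
    return "".join(aligned_seq[i - 1] for i in keep)  # convert to 0-based index


if __name__ == "__main__":
    sites = first_two_codon_positions(1, 651)   # the atp6 charset
    print(sites[:8], "...", sites[-1], len(sites))
    print(drop_third_positions("ATGGCCTAA", 1, 9))
```

For atp6, site 651 is itself a third codon position, so the last site kept is 650 and 434 of the 651 columns remain.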

Long PCRSeq – Microsynth – CH

    Explore expanded possibilities with Microsynth’s Long PCRSeq, leveraging the cutting-edge long-read sequencing technology from Oxford Nanopore Technologies (ONT) to sequence clonal linear DNA ranging from 600 bp to 50 kb in length. Conveniently accessible for samples in tubes and 96-well plates, this service builds upon the capabilities of…

how to merge human reference genome and GTF file with a custom sequence.

Hello Biostars, I am looking for some guidance on how to merge some files for my bulk RNA-seq analysis. Let me start by describing the problem: I received an mRNA sequence of 4775 characters which I would like to merge with the human reference genome that I downloaded from NCBI…

What is the troubleshoot for this error: conversion of .SRA to FASTA file on command prompt?

I am getting this error message after using the following code: C:\sratoolkit.3.0.7-win64\sratoolkit.3.0.7-win64\bin>fastq-dump --fasta SRR1658345 Error: 2023-12-11T06:08:04 fastq-dump.3.0.7 err: timeout exhausted while waiting condition within process system module – failed SRR1658345 ============================================================= An error occurred during processing. A report was generated into the file 'C:\Users\Hp/ncbi_error_report.txt'. If the problem persists, you may…

SRA toolkit (NCBI) – sra to fasta

Dear all, At the moment I’m trying to download sequences from the NCBI Sequence Read Archive (SRA) and put them into FASTA format. For this I downloaded the NCBI SRA Toolkit and used the following code: set PATH=%PATH%;C:\Users\Admin\Desktop\sratoolkit.2.9.0-win64\sratoolkit.2.9.0-win64\bin prefetch --max-size 100000000…

Renaming fasta files with their headers

Hi, I have around 85 gene sequences in individual FASTA files. I’d like to rename each file with its header name, which contains the gene name in [gene=]. For each header, I only want what is between the brackets. I’m trying to do this…

vcfdist: accurately benchmarking phased small variant calls in human genomes

The affine gap design space for selecting variant representations As demonstrated in Fig. 1, the main issue with a difference-based format such as VCF is that often there are multiple reasonable sets of variant calls that can be used to represent the same final sequence relative to a reference FASTA. Since…

Genome sequence and characterization of a novel Pseudomonas putida phage, MiCath

Bacterial strains We used P. putida strains S12, DOT-T1E, F1 (kindly gifted by Grant Rybnicky), ATCC 12633 (purchased from ATCC), JUb85 (kindly provided by Samuel Buck), EM383 (kindly gifted by Huseyin Tas), p106 (kindly provided by Carey-Ann Burnham), and KT2440 (obtained from lab stocks). An overnight culture of each P….

How to create interval list from reference fasta or dict file?

I am using the GATK pipeline on WGS data. My BAM files are aligned to GRCh38 from GENCODE, so I want to create an interval file for this GRCh38 instead of downloading one from the GATK bundle, because some of their contigs have…

GetPileupSummaries intervals-list with Targeted Sequencing?

Hi! I am applying GetPileupSummaries for somatic variant calling, starting from targeted sequencing .fasta. I aligned the file to the GRCh38 reference, and I am currently at the GetPileupSummaries step. gatk --java-options -Xmx200G GetPileupSummaries \ -I $RECBAM \ -L ???? \ -O $OUTPUT \…

Apply BSQR for Targeted Sequencing

Hi! I am performing variant calling starting from a FASTA resulting from targeted sequencing of ~320 cancer genes. I followed the GATK best practices, aligning to the GRCh38 reference. For the Apply Base Quality Score Recalibration step, which files should I use for “--known-sites” given…

How can I obtain the DNA sequences of each CDS for several genbank files?

Hello, I want to obtain the DNA sequences of all the CDS from multiple GenBank files in one FASTA file. I tried several solutions with Biopython but nothing is working for me. I tried, for example…

Need help to find FASTA sequence from dbSNP

Hello, I am trying to find a FASTA sequence to locate a SNP using dbSNP, but I could not find the FASTA sequence. The sequence that I want is shown in the picture. Could anyone tell me the steps to reach…

Issues while running blastx

Hi, I face this problem when I run the blastx command on Linux: blastx -db ~/Downloads/uniprot_sprot.dat -query ../../../trinity_out_dir.Trinity.fasta -num_threads 2 -max_target_seqs 1 -outfmt 6 > balstx.outfmt6 Warning: [blastx] Examining 5 or more matches is recommended BLAST Database error: No alias or index file found for protein…

sequence alignment – BioPython bootstrap is not reliable?

I think this is a bug. It seems to work if you do this, creating an equivalent Alignment object instead of a MultipleSeqAlignment to give the bootstrap step: from Bio.Align import Alignment alignment2 = Alignment(list(alignment)) consensus_tree = bootstrap_consensus(alignment=alignment2, times=50, tree_constructor=constructor, consensus=majority_consensus) bootstrap_consensus calls bootstrap_trees, which makes however many randomly shuffled…

Aligning sequences with multiple genetic codes!

Hello everyone! I am doing a project on duplicated genes and I am having major difficulty aligning sequences that use different genetic codes. I work with FASTA files that contain sequences of protein-coding genes; every FASTA file includes genes…

Calculate GC content for entire chromosome

If you’re comfortable using Python, I’ve created a script that calculates the GC content and GC-skew for each contig, scaffold, or chromosome in a fasta file. This is specifically designed for generating data for a circos plot. To use the script, make sure you have Biopython installed in your conda…
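The two quantities mentioned above can be computed with plain Python. This is a minimal sketch under stated assumptions, not the author's script: it uses the common definitions GC content = (G + C) / (A + C + G + T) and GC skew = (G - C) / (G + C), ignores ambiguous bases such as N, and takes sequences as an already-parsed dictionary instead of reading a FASTA file with Biopython. The names `gc_stats` and `per_sequence_gc` are hypothetical.

```python
def gc_stats(seq: str) -> tuple:
    """Return (gc_content, gc_skew) for one sequence.

    gc_content = (G + C) / (A + C + G + T); gc_skew = (G - C) / (G + C).
    Ambiguous bases (e.g. N) are ignored; empty denominators give 0.0.
    """
    seq = seq.upper()
    a, c, g, t = (seq.count(base) for base in "ACGT")
    total = a + c + g + t
    gc_content = (g + c) / total if total else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return gc_content, gc_skew


def per_sequence_gc(sequences: dict) -> dict:
    """Map each contig, scaffold, or chromosome name to its (gc_content, gc_skew)."""
    return {name: gc_stats(seq) for name, seq in sequences.items()}


if __name__ == "__main__":
    demo = {"chr1": "GGGCATAT", "chr2": "ATATNNNN"}
    for name, (content, skew) in per_sequence_gc(demo).items():
        print(f"{name}\tGC={content:.3f}\tskew={skew:.3f}")
```

For a circos-style plot the same function would typically be applied to fixed-size windows along each sequence instead of the whole sequence at once.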

How to remove 3rd codon positions in a multiple sequence allignment?

Hello everyone. I need to build a phylogenetic tree using IQ-TREE, starting from a sequence alignment in CODON format of several invertebrates. These are my charsets: BEGIN SETS; charset atp6.fasta = 1-651; charset atp8.fasta = 652-729; charset cob.fasta…

How to convert UNITE dataset into ecoPCR format to perform insilico PCR?

Hi all, I want to perform an in silico analysis to test the ITS primers against UNITE datasets. I have the reference sequence and taxonomy files, but I am unable to convert them into the format required by…

FASTA – Packages – Package Control

Let Sublime Text know better of FastA format. FastA format is a commonly used text-based format for representing biological sequence data, such as DNA or protein sequences. It consists of a single-line sequence header, which begins with a > character, followed by one or more lines of sequence data. The…
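The layout described above (a header line starting with `>`, followed by one or more sequence lines) is simple enough to parse by hand. The sketch below is illustrative only and has nothing to do with the Sublime Text package itself; `parse_fasta` is a hypothetical helper that joins wrapped sequence lines and returns `(header, sequence)` pairs.

```python
def parse_fasta(text: str) -> list:
    """Parse FASTA text into a list of (header, sequence) tuples.

    The header is the text after '>' on the description line; all following
    lines up to the next '>' are concatenated into one sequence string.
    """
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    demo = ">seq1 demo\nACGT\nTTAA\n>seq2\nMVLLV\n"
    for name, seq in parse_fasta(demo):
        print(name, len(seq), seq)
```

For real data, an established parser such as Biopython's `SeqIO.parse(handle, "fasta")` is usually the safer choice.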

BLAST: overflow error

Hi, I’m using blastn in BLAST 2.11.0 and it keeps failing for specific sequences for a reason that I have yet to understand. Any lead on what the problem might be? The error message is Error: NCBI C++ Exception: T0 “/tmp/BLAST/2.11.0/gompi-2020b/ncbi-blast-2.11.0+-src/c++/src/serial/objistrasnb.cpp”, line 499: Error: (CSerialException::eOverflow) byte 132: overflow error ( at…

Error in blast+

Hello, I have a problem with creating a local database (BLAST+). I downloaded NCBI BLAST and then put a FASTA file in the bin folder. Later I opened this folder in PowerShell and ran the command “makeblastdb -in ownBLASTdb.fasta -out DataBase -dbtype prot -parse_seqids”. I got…

File mismatch detected after align.seqs or screen.seqs – Commands in mothur

Hello, I’ve seen the topic of file mismatch opened on this forum multiple times, but have not seen a solution to my issue. I have V4 region 16S sequences, and I’m using the silva v148.1 reference alignment. After cleaning up my sequences (using the SOP until the alignment point), and…

Fastest way to convert BED to GTF/GFF with gene_ids?

This is probably a duplicated question from: How To Convert Bed Format To Gtf? How to convert original BED file to a GTF ? Converting different annotation file formats (GTF/GFF/BED) to each other How to change scaffold.fasta file or scaffold.bed file to GTF file? Convert bed12 to GFF convert bed12…

Yes .. BBMap can do that!

NOTE: This collection was originally posted at SeqAnswers.com. Creating a copy here to preserve the information. Part I is available here: Yes .. BBMap can do that! – Part I : bbmap (aligner), bbduk (scan/trim), repair (fix PE reads) and reformat (format conversions). Part II is available here: Yes .. BBMap can…

r-bioc-phyloseq 1.22.3-1

/usr/ root:root 0o755 /usr/lib/ root:root 0o755 /usr/lib/R/ root:root 0o755 /usr/lib/R/site-library/ root:root 0o755 /usr/lib/R/site-library/phyloseq/ root:root 0o755 /usr/lib/R/site-library/phyloseq/CITATION text/plain root:root 0o644 606 bytes /usr/lib/R/site-library/phyloseq/data/ root:root 0o755 /usr/lib/R/site-library/phyloseq/data/datalist text/plain root:root 0o644 44 bytes /usr/lib/R/site-library/phyloseq/data/enterotype.RData application/x-xz root:root 0o644 190.7 KB /usr/lib/R/site-library/phyloseq/data/esophagus.RData application/x-xz root:root 0o644 1.8 KB /usr/lib/R/site-library/phyloseq/data/GlobalPatterns.RData application/x-xz root:root 0o644 425.4 KB /usr/lib/R/site-library/phyloseq/data/soilrep.RData application/x-xz root:root 0o644 104.9 KB /usr/lib/R/site-library/phyloseq/DESCRIPTION text/plain…

Which program, tool, or strategy do you use to visualize genomic rearrangements?

In relation to my master’s thesis, I’m working on tools to visualize fusion genes. In that regard I’m interested in any and all strategies and…

Beginner questions: Working step for organism identification

Kanapol November 30, 2023, 9:57am 1 Hello everyone, I am a beginner in this area of study, and I have no one to ask these simple questions. I want to identify organisms using NGS sequencing. I read in a tutorial that I have to make contigs and filter…

Common analysis of direct RNA sequencinG CUrrently leads to misidentification of m5C at GCU motifs

Introduction Oxford Nanopore Technologies (ONT) direct RNA sequencing (Fig 1A) enables detection of RNA modifications. A modified base produces an altered electrical current and/or dwell time relative to a canonical base that can be detected with algorithms (Garalde et al, 2018; Smith et al, 2019; Workman et al, 2019). Figure…

How to estimate -g genome-size for Flye for de novo genome or for --meta mode when using Metaflye?

A few questions regarding the -g genome-size argument for Flye and MetaFlye: How much does this value influence the performance and output of the assemblies? How can I estimate the genome…

Extracting only soft/hard clipped reads from a bam file

Hello all! I am working on some data but need a little help with a somewhat unusual task. We are looking at where lentiviral DNA has inserted itself in our host genome, and to do this…

Can you help me to download list of miRNA from a SRA under a bioproject ?

Hello, After reading this paper: I would like to get all the miRNAs they found, and it seems there is a BioProject with data containing a list of miRNAs: Data Availability … And miRNA…

Longitudinal detection of circulating tumor DNA

Analysis of Roche KAPA Target Enrichment kit experimental data obtained on an Illumina sequencing system is most frequently performed using a variety of publicly available, open-source analysis tools. The typical variant calling analysis workflow consists of sequencing read quality assessment, read filtering, mapping against the reference genome, duplicate removal, coverage…

Blastn DB issue

Hello ! I’m currently trying to develop a local core-genome MLST tool using a combination of a huge genes database and blastn but I came into outputs I can’t explain. Here’s what I have: My genome: genome.fasta My database, comprised of ~1000 genes and n alleles: db/GENE01.fasta: 1500 sequences. db/GENE02.fasta:…

ESRP1 controls biogenesis and function of a large abundant multiexon circRNA | Nucleic Acids Research

Abstract While the majority of circRNAs are formed from infrequent back-splicing of exons from protein coding genes, some can be produced at quite high level and in a regulated manner. We describe the regulation, biogenesis and function of circDOCK1(2–27), a large, abundant circular RNA that is highly regulated during epithelial-mesenchymal…

Viral genes not showing up in combined mouse+virus alignment

I created a combined MHV-A59 and mm10 FASTA and GTF file using the Linux cat command. The last two entries of the mm10 and the first two of the A59 in the combined GTF look like this: I then made a…

Bcftools consensus when reference is a deletion

Hello, I am trying to call a consensus on a VCF file like so: bcftools consensus species.vcf.gz -f Reference.fasta --absent N > Consensus.fasta Error: The site SUPER_1:173197 overlaps with another variant, skipping… I looked at this site and included the previous site…

The difference blastn output when using subject and db options

I have BLASTed a candidate transposable element against my genome. When I use command 1 with the db option (details below), the output is small and different from command 2, which uses the subject parameter with the same query. If anybody has…

How can I map coordinates between two strains of yeast?

I have two FASTA files of genome sequences from two different strains of yeast. I’m looking for a way to map coordinates between these two strains. The reason is that I want to download and use some data from…

I made an error when using metawrap to binning

My code: metawrap binning -o bin_out -t 24 -m 200 -a all_contig/all_merge.fasta --metabat2 --maxbin2 --concoct all_fastq/*fastq The error reported is as follows: sorting the SRR10492802 alignment file [bam_sort_core] merging from 24 files and 24 in-memory blocks… [E::sam_hdr_sanitise] Malformed SAM header at line…

Ancient diversity in host-parasite interaction genes in a model parasitic nematode

Van Valen, L. A new evolutionary law. Evol. Theory 1, 1–30 (1973). Woolhouse, M. E. J., Webster, J. P., Domingo, E., Charlesworth, B. & Levin, B. R. Biological and biomedical implications of the co-evolution of pathogens and their hosts. Nat. Genet. 32, 569–577 (2002)…

Continue Reading Ancient diversity in host-parasite interaction genes in a model parasitic nematode

Species coverage in the NCBI protein NR database ?

Hi Biostars, I am currently trying to build a Eukaryote version of the NCBI NR database and I am not really sure that I fully understand how the NR is implemented. Here is the code that I’m using to do so: #!/usr/bin/bash ############## # DOWNLOAD FULL NR ############## baseURL="https://ftp.ncbi.nlm.nih.gov/blast/db/"…

Continue Reading Species coverage in the NCBI protein NR database ?

Python Tools for Genomic Data Analysis: From Sequences to Structures | by Bao Tram Duong | Nov, 2023

Analyzing genomic data, from sequences to structures, is a critical aspect of bioinformatics. Python has a rich ecosystem of tools and libraries specifically designed for genomic data analysis. Here’s an overview of key tools and libraries for various stages of genomic data analysis: Description: Biopython is a comprehensive open-source collection…

Continue Reading Python Tools for Genomic Data Analysis: From Sequences to Structures | by Bao Tram Duong | Nov, 2023

Bioinformatics Programming with Biopython: Advanced Biopython Techniques for Computational Biology | by Bao Tram Duong | Nov, 2023

Biopython is an open-source collection of Python tools for computational biology and bioinformatics. It provides modules and classes to work with biological data such as DNA, RNA, protein sequences, structures, and more. Biopython aims to make it easy for developers to access and manipulate biological data in a programmatic way….

Continue Reading Bioinformatics Programming with Biopython: Advanced Biopython Techniques for Computational Biology | by Bao Tram Duong | Nov, 2023

Metagenome-assembled genomes reveal greatly expanded taxonomic and functional diversification of the abundant marine Roseobacter RCA cluster | Microbiome

Diversity of the RCA cluster and genome characteristics. The phylogenomic analysis yielded three major clades within the RCA cluster (Fig. 1). Genomes of the three clades were relatively distinct, with approximately <70% average nucleotide identity (ANI), resulting in the proposal of three genera, the known genus Planktomarina, and two new genera without…

Continue Reading Metagenome-assembled genomes reveal greatly expanded taxonomic and functional diversification of the abundant marine Roseobacter RCA cluster | Microbiome

issue in RNA -seq analysis

Hello all. I am working on RNA-seq analysis and would like to know the following things: first, I downloaded the genome FASTA file for non-coding RNA from Ensembl and got the GTF file for hg38 from there as well. I ran HISAT2 and got 17% alignment for…

Continue Reading issue in RNA -seq analysis

Pre.cluster in unaligned, or cluster split? – Commands in mothur

Hi Pat, I am confused about aligned/unaligned in Pre.cluster, where I see you answering that it needs aligned, since in the pre.cluster: “### align When using unaligned sequences, the pre.cluster command allows you to select between two alignment methods – gotoh and needleman – needleman is the default setting: *…

Continue Reading Pre.cluster in unaligned, or cluster split? – Commands in mothur

a desktop tool for processing FASTA files containing DNA and protein sequences

Dear community members, we present SEDA (SEquence DAtaset builder), an open source application for processing FASTA files containing DNA and protein sequences. The source code is available at GitHub and a complete user manual is available…

Continue Reading a desktop tool for processing FASTA files containing DNA and protein sequences

Functional filter for whole-genome sequencing data identifies HHT and stress-associated non-coding SMAD4 polyadenylation site variants >5 kb from coding DNA

Summary Despite whole-genome sequencing (WGS), many cases of single-gene disorders remain unsolved, impeding diagnosis and preventative care for people whose disease-causing variants escape detection. Since early WGS data analytic steps prioritize protein-coding sequences, to simultaneously prioritize variants in non-coding regions rich in transcribed and critical regulatory sequences, we developed GROFFFY,…

Continue Reading Functional filter for whole-genome sequencing data identifies HHT and stress-associated non-coding SMAD4 polyadenylation site variants >5 kb from coding DNA

Analysis of nucleoporin 107 overexpression

Introduction Lung cancer is one of the most common types of cancer worldwide and the leading cause of cancer death.1 The main category of lung cancer is non-small cell lung cancer, accounting for about 85%, and lung adenocarcinoma, as a kind of non-small cell lung cancer, is the most frequently…

Continue Reading Analysis of nucleoporin 107 overexpression

How to filter .fasta file based on conditional statement

Hi all, I have a .fasta file resulting from vsearch clustering. The sequences in the .fasta file look like: >centroid=211650b5-4541-47e4-a7a4-3659962f9818;seqs=2236 GAGATGATGATGATATAATT The “seqs” parameter in the sequence header reflects the number of reads of that cluster consensus that was present…

Continue Reading How to filter .fasta file based on conditional statement
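A minimal sketch of one way to filter such a file, assuming the goal is to keep only the centroids whose `seqs` value reaches some minimum. The exact condition is cut off in the excerpt above, so the threshold is an assumption, as are the helper names `read_fasta` and `filter_by_seqs` and the second example record. Only the Python standard library is used:

```python
import re

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

# Matches the "seqs=<number>" field of a vsearch-style header.
SEQS_RE = re.compile(r"(?:^|;)seqs=(\d+)")

def filter_by_seqs(records, min_seqs):
    """Keep records whose header has seqs >= min_seqs (assumed condition)."""
    for header, seq in records:
        match = SEQS_RE.search(header)
        if match and int(match.group(1)) >= min_seqs:
            yield header, seq

# First record is from the excerpt; the second is a made-up low-count cluster.
example = """>centroid=211650b5-4541-47e4-a7a4-3659962f9818;seqs=2236
GAGATGATGATGATATAATT
>centroid=hypothetical-low-count;seqs=3
ACGTACGT
""".splitlines()

kept = list(filter_by_seqs(read_fasta(example), min_seqs=10))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

For a real file, the `example` lines could be replaced by an open file handle, since `read_fasta` accepts any iterable of lines.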

Phenotypic drug-susceptibility profiles and genetic analysis based on whole-genome sequencing of Mycobacterium avium complex isolates in Thailand

Abstract Mycobacterium avium complex (MAC) infections are a significant clinical challenge. Determining drug-susceptibility profiles and the genetic basis of drug resistance is crucial for guiding effective treatment strategies. This study aimed to determine the drug-susceptibility profiles of MAC clinical isolates and to investigate the genetic basis conferring drug resistance using…

Continue Reading Phenotypic drug-susceptibility profiles and genetic analysis based on whole-genome sequencing of Mycobacterium avium complex isolates in Thailand

Creating a Variant containing FASTA for proteomics search from VCF and genomic FASTA

Dear Biostar Community, I’m currently trying to generate a protein FASTA containing all known variants from HeLa (from Cosmic CellLinesProject) for variant detection in proteomics measurements. For this, I’ve downloaded the variants file (VCF) and the…

Continue Reading Creating a Variant containing FASTA for proteomics search from VCF and genomic FASTA

Is there a good software to generate test genomics data?

For example, if I input a reference genome FASTA, can I get simulated FASTQ files for ONT or PacBio sequencing runs that could have produced that data? I’m trying to migrate over from Snakemake to Nextflow but from…

Continue Reading Is there a good software to generate test genomics data?

did pilon improve my genome?

My doubt is whether my sequence has actually improved. refseq is the reference sequence, polished is the output from pilon, and contigs.fasta is my file generated from spades. Please help. I used the command java -Xmx2048m -jar pilon-1.24.jar --genome refseq.fasta --frags sorted bam.bam --output…

Continue Reading did pilon improve my genome?

How to align a genome fasta file from NCBI to a reference genome?

Hi, I want to align a genome published on NCBI (scaffold level) to a reference genome (chromosome level). I thought bwa mem could work, but it is stuck, running for too long without any error message….

Continue Reading How to align a genome fasta file from NCBI to a reference genome?

how to sort fasta file according to a header file

Hi! I have two files: one is a protein FASTA file (a.fa) and the other is header.txt. I want to get my sequences in the same order as the header file. How can I do this?…

Continue Reading how to sort fasta file according to a header file
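One way to approach this reordering, sketched with the Python standard library only. It assumes header.txt holds one sequence ID per line and that each ID matches the first whitespace-delimited token of a FASTA header; the excerpt does not confirm either format. The helper names `read_fasta` and `order_by_headers` and the toy records are hypothetical:

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def order_by_headers(records, header_lines):
    """Return records in the order given by header_lines.

    IDs missing from the FASTA are skipped; a leading '>' in the
    header file is tolerated (assumption about its format).
    """
    by_id = {header.split()[0]: (header, seq)
             for header, seq in records if header.split()}
    ordered = []
    for line in header_lines:
        tokens = line.strip().lstrip(">").split()
        if tokens and tokens[0] in by_id:
            ordered.append(by_id[tokens[0]])
    return ordered

# Toy stand-ins for a.fa and header.txt (made-up records).
fasta_lines = """>protB second protein
MKV
>protA first protein
MVL
LV
""".splitlines()
header_lines = ["protA", "protB", "protC"]

ordered = order_by_headers(read_fasta(fasta_lines), header_lines)
for header, seq in ordered:
    print(f">{header}\n{seq}")
```

With real files, `fasta_lines` and `header_lines` could be replaced by `open("a.fa")` and `open("header.txt")`. The whole FASTA is held in memory, which is usually acceptable for protein sets but may not be for very large files.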

Salmon (or other pseudo-mappers) for multi-species RNAseq read filtering

Hello all, Background: I’ve inherited a new RNAseq data set and am thinking about updating my approaches (last time I did this I was using HISAT and Cuffdiff). I’d like some opinions on best strategies to disentangle/filter out parasite microbe reads from infected host reads before performing a differential gene…

Continue Reading Salmon (or other pseudo-mappers) for multi-species RNAseq read filtering

Is there a way to query Ensembl to get all 3’UTRs from all species?

I am trying to obtain stats on how many 3’UTRs are annotated in Ensembl. I would really like to download as many annotated 3’UTRs as possible from as many species as possible and find…

Continue Reading Is there a way to query Ensembl to get all 3’UTRs from all species?

Ancestral Allele FASTA sequence aligned with Candidate Gene Region

Hello, I have a .fa file of the entire chromosome 15 of the human ancestor and a candidate gene region located on the same chromosome. I was wondering if it is possible to filter the excess gene region so that I…

Continue Reading Ancestral Allele FASTA sequence aligned with Candidate Gene Region

fasta – Get a certain gene sequence from bam/vcf and reference

I need to get a fasta sequence of a certain gene for a certain worm strain that is different from reference. I have a reference genome, BAM for the strain of interest, and coordinates of the gene. I know that vcftools can convert bam to fasta, but I do not…

Continue Reading fasta – Get a certain gene sequence from bam/vcf and reference

How to correct the position of my primers

I have been having issues getting my primers to match the expected output. For instance, the bases of the first primer starting on the left are off by 4 base pairs to the right of the correct position. This is what I am supposed to get: ttggcagttgggaccgttta This is what…

Continue Reading How to correct the position of my primers

Using Primer3 with python to genotype a SNP at a particular position

I’m trying to figure out how to genotype an SNP at a particular position. Honestly I’m not sure what that means, but here are the instructions: “The idea is that the user wants to genotype a SNP at a particular position. To do this they will need to amplify a…

Continue Reading Using Primer3 with python to genotype a SNP at a particular position

[main_samview] fail to read the header from “human_g1k_v37.annotate.fasta”.

Hi, I tried to add the prefix “chr” to the chromosome names in a FASTA file like this: sed 's/^>/>chr/' human_g1k_v37.fasta > human_g1k_v37.annotate.fasta However, after that, I failed to view the header of the new FASTA file: samtools view -H human_g1k_v37.annotate.fasta >>> [main_samview] fail to read…

Continue Reading [main_samview] fail to read the header from “human_g1k_v37.annotate.fasta”.
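As background, `samtools view` reads alignment files (SAM/BAM/CRAM), so a FASTA file is not valid input for it. A minimal Python sketch of the same renaming step, plus a way to list the resulting sequence names, is given here under stated assumptions: the function names `add_chr_prefix` and `sequence_names` are hypothetical, and the toy records are made up.

```python
def add_chr_prefix(lines, prefix="chr"):
    """Yield FASTA lines with `prefix` inserted after '>' on header lines.

    Mirrors the effect of: sed 's/^>/>chr/'
    """
    for line in lines:
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        else:
            yield line

def sequence_names(lines):
    """Return the first token of every non-empty FASTA header line."""
    names = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.append(tokens[0])
    return names

# Toy FASTA lines (made up); a real file would be read line by line.
toy = [">1 example description", "ACGT", ">MT", "GGCC"]

renamed = list(add_chr_prefix(toy))
names = sequence_names(renamed)
print(names)
```

Sequence lines pass through unchanged, so the approach streams well over large genome files when given an open file handle instead of a list.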

Saponin treatment for eukaryotic DNA depletion alters the microbial DNA profiles by reducing the abundance of Gram-negative bacteria in metagenomics analyses

INTRODUCTION Microbiome research, especially the detection of microorganisms by molecular techniques, has become a fundamental tool for investigating host-associated bacteria, such as those harbored by veterinary or human clinical samples[1,2]. Next-generation sequencing (NGS) approaches now enable the identification of slow-growing, non-cultivable, or non-viable bacteria contained in clinical specimens without relying…

Continue Reading Saponin treatment for eukaryotic DNA depletion alters the microbial DNA profiles by reducing the abundance of Gram-negative bacteria in metagenomics analyses

standalone blastx of queries without NR annotations

I have to identify long non-coding RNA transcripts in Arabidopsis thaliana using expressed sequence tags. For that I have downloaded ESTs from dbEST NCBI. Next I need to blastx the ESTs against the nr database, but this cannot be done through the online database due…

Continue Reading standalone blastx of queries without NR annotations

Add braces to the DNA sequence for requested primer position.

Good afternoon. Currently I’m working on this bit of code, which takes input DNA and attempts to isolate a specific DNA region based on its location upstream and downstream. The issue I am having is with…

Continue Reading Add braces to the DNA sequence for requested primer position.

AVX error with Deepvariant caller

I installed DeepVariant 1.6.0 on a Dell PowerEdge R910 with 128 GB of RAM and 36 cores. This machine has Windows 10 LTSC installed on it. Also installed on it are: VirtualBox 6.0, Ubuntu 20.04 LTS, Controller ID: Vboxguestaddtions_6.06. DeepVariant was installed on Ubuntu 20.04 LTS using Docker: github.com/google/deepvariant…

Continue Reading AVX error with Deepvariant caller

Appropriate genome reference for converting TCGA VCF files to MAF

I have a directory of MAF files obtained from TCGA and I want to convert it to VCF format. Reference: GRCh38.d1.vd1 Reference Sequence Source: gdc.cancer.gov/about-data/gdc-data-processing/gdc-reference-files maf2vcf.pl --input-maf maf/* --output-dir VCF -ref-fasta /home/melchua/.vep/homo_sapiens/GRCh38/GRCh38.d1.vd1.fa.tar.gz Traceback: Use of uninitialized value $lines in…

Continue Reading Appropriate genome reference for converting TCGA VCF files to MAF

Converting Bam To Fastq

Any suggestions on good programs or scripts to convert a BAM file back to a FASTQ? I have found some scripts but wanted to ask for advice before I go too far down the wrong path. Use SamToFastq. UPDATE 2023:…

Continue Reading Converting Bam To Fastq

Python Bioinformatics Libraries. | kandi

Researchers in molecular biology and computational biology need bioinformatics libraries. They offer many tools to understand biological information. Scientists can focus on their research because these libraries simplify complicated computational tasks. Let’s delve into the world of bioinformatics libraries and explore their significance. A bioinformatics library is a collection of…

Continue Reading Python Bioinformatics Libraries. | kandi

format error, unexpected A at line 1

bcftools mpileup error: format error, unexpected A at line 1. I had a problem using bcftools. After running the command line (below), there was an error in my results. The error message stated: “Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid [E::fai_build_core] Format error, unexpected…

Continue Reading format error, unexpected A at line 1