Tag: PHRED

Different relatedness estimates by PLINK and VCFTOOLS despite same method

According to the vcftools manual, specifying the “--relatedness2” flag allows calculating relatedness statistics using the method by Manichaikul et al., BIOINFORMATICS 2010 (doi:10.1093/bioinformatics/btq559). That is, it is based on KING. According to the PLINK manual, PLINK uses the same method to calculate relatedness when specifying the flag “--make-king-table”. So, although both PLINK…

invalid deflate data (invalid code lengths set)

I am trying to trim paired end reads using Trim-Galore. I have made sure that the files match based on the total reads processed in the output txt file from trim-galore. One of the files trimmed correctly but when I try some of the others the total written and quality…

Metagenomic analysis of Mesolithic chewed pitch reveals poor oral health among stone age individuals

The specific environmental/history/collection context The Huseby Klev materials were unearthed and collected by archaeologists (including two of the co-authors of this article) during the excavation of this coastal hunter-fisher-gatherer site in the 90s [50]. The material assemblage was rich and well preserved: human bones, animal bones, plant remains and pieces of…

MetaSPAdes genome assembly (shotgun metagenome singleend) – usegalaxy.org support

Busrak, December 11, 2023, 3:31pm: Hi friends, I processed metagenomic single-end raw data with Cutadapt using these parameters: maximum error rate 0.3; match times 1; minimum overlap length 3; minimum length 15; max N 0.3; max expected errors 30. Then I aligned the reads to the host genome (Gallus gallus) with the BBMap tool. I want to assemble the unmapped reads. However, MetaSPAdes…

Resolving over clustered NGS with Q-scores

I have just received data from an NGS run that I suspect was over-clustered. Read 1 is a 24 bp barcode of the following pattern: YSKRYSKRYSKRYSKRYSKRYSKR. Following the 24 bp barcode, the sequence should be the same for every read. Read 2…

Integrative taxonomy of Metastrongylus spp. in wild boars from Brazil | Parasites & Vectors

Study areas The samples were collected from wild boars hunted in rural properties from the municipalities of São Simão, Monte Azul, Paraíso, Colina, Matão, Bebedouro and Monte Alto (São Paulo), Ipiranga (Paraná), and Santo Antônio das Missões (Rio Grande do Sul) (Fig. 1). Fig. 1 Sampling collection sites of wild boars…

Eigen_phred_coding values interpretation

I have a problem understanding how to interpret the Eigen_phred score. I understand the main goal of this tool and know that the score value looks similar to a Phred quality score: $Q = -10 \log_{10} P$. But I haven’t been able to find the P value. Only phred scaled…
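
As a rough illustration of the Phred relation quoted above, the sketch below converts between a Phred-scaled score Q and the underlying quantity P. The function names are made up for this example. Whether Eigen's phred-scaled value corresponds to a per-variant probability or to some other proportion is not stated in the excerpt, so the recovered P should only be read as "the quantity that was phred-scaled".

```python
import math


def phred_to_p(q: float) -> float:
    """Invert Q = -10 * log10(P) to recover P from a Phred-scaled score."""
    return 10 ** (-q / 10)


def p_to_phred(p: float) -> float:
    """Apply Q = -10 * log10(P); P must lie in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Print the P that corresponds to a few common Phred-scaled values.
    for q in (10, 20, 30):
        print(q, phred_to_p(q))
```

Under this relation, every increase of 10 in Q corresponds to a tenfold decrease in P.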

Validation of expressions given input sections – CWL Questions

mvdbeek November 10, 2023, 9:13am 1 Is it possible to catch expressions that access (potentially) undefined variables before runtime? Take as an example the following diff as applied to github.com/common-workflow-library/bio-cwl-tools/blob/91c42fb809ce18eafe16155cca0abf362270c0fe/fastp/fastp.cwl: diff --git a/fastp/fastp.cwl b/fastp/fastp.cwl index 575c91f..b6be4eb 100755 --- a/fastp/fastp.cwl +++ b/fastp/fastp.cwl @@ -19,7 +19,7 @@ baseCommand: fastp arguments: -…

Spades Log not updating – ran overnight for 6 MB file, still running

I am trying to run SPAdes on a multi-FASTQ file of short sequences, but it seems to have been stuck in one spot the whole night. This is what the SPAdes log has shown the entire night,…

Obtain phred scores for each read

Hello, I have some nanopore reads that have a complex insertion. I would like to extract the phred quality scores for each base within the read to assess the quality of the insertion sequence. I have only found tools that can obtain the…

Inferring bacterial transmission dynamics using deep sequencing genomic surveillance data

Study design Experiments were performed in accordance with the New Zealand Animal Welfare Act (1999) and institutional guidelines provided by the University of Auckland Animal Ethics Committee, which reviewed and approved these experiments under application R1003. We did not use any specific randomisation process to allocate animals to a particular…

No samples in .vcf file.

I am trying to convert my vcf file into a BED format file. When I use this command: plink --vcf merge.bacteria.vcf.gz --make-bed --out merge.bacteria.vcf.bed I get the following error stating: PLINK v1.90b6.21 64-bit (19 Oct 2020) www.cog-genomics.org/plink/1.9/ (C) 2005-2020 Shaun Purcell, Christopher Chang GNU General Public License…

Mycobacterium tuberculosis Sub Lineage 4.2.2/SIT149 as DR

Introduction Antimicrobial resistance is a hidden global pandemic that shattered over 4.9 million people in 2019 alone, and the burden is highest, mainly in low-resource settings [1]. Drug-resistant tuberculosis (DR-TB) caused by Mycobacterium tuberculosis (Mtb) complex (MTBC), which is resistant to one or more anti-TB drugs, is a leading global public…

How to split a folder full of pod5 by phred score?

Hi, I’m working on ONT data as pod5 files. Usually, the sequencing device divides the pod5 files into two output folders, pod5_pass and pod5_fail. However, this time something failed on that step so I was left with all…

Solved Next Generation Sequencing Questions(a) single-end

Next Generation Sequencing Questions (a) A single-end sequencing run is performed using 100 cycles. In the resulting FASTQ files, how many characters will be present in the fourth line of each file? (b) A Phred quality score of 40 implies a base call accuracy of __? a. 1 error in 10…

Metagenome sequencing and recovery of 444 metagenome-assembled genomes from the biofloc aquaculture system

Uncultured microorganisms constitute a significant proportion of microbial populations in an ecosystem and play a vital role in its functioning [1]. The challenges associated with cultivating these microbes have constrained access to the vast phylogenetic and functional diversity they possess. However, recent advancements in metagenomics have opened a new window to…

R: Import samtools ‘pileup’ files.

readPileup {Rsamtools} R Documentation Import samtools ‘pileup’ files. Description Import files created by evaluation of samtools’ pileup -cv command. Usage readPileup(file, ...) ## S4 method for signature 'connection' readPileup(file, ..., variant=c("SNP", "indel", "all")) Arguments file The file name, or connection, of the pileup output file…

Most sensible way to find private SNPs from a multisamples vcf with bcftools

Hello, this question is somehow complementary to what I asked yesterday here: Using bcftools to find unique alt homozygous sites Now let’s say I want to find the SNPs 0/1 unique to the sample D3A350g_bcftools2 (see below) I know I can use bcftools view -s D3A350g_bcftools2.bcf -x all_bcftools2_merged.vcf But there…

Prediction tools summary – zero values

Hi, I am summarizing 15 prediction tools for my filtered variants into one overall result to check the pathogenicity calls of those tools. I have 5 numerical prediction scores and 10 with letters as their values under ACMG recommendation. The 5 numerical prediction…

Using bcftools to find unique alt homozygous sites

Hello, I have a vcf with 20 samples. I want to find for each sample the sites that are 1/1, only in that sample (so other samples must have genotypes 0/1 or 0/0). I know I can use filters such as GT="aa". However, how do I say GT="aa" for sample…
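
To make the selection rule in this question concrete, here is a small sketch of the per-site logic in plain Python. It is not a bcftools command. The function name and the dictionary of genotype strings are hypothetical, and the genotypes are assumed to have been extracted from the VCF already. A site qualifies when exactly one sample is 1/1 and every other sample is 0/0 or 0/1.

```python
from typing import Dict, Optional


def private_hom_alt_sample(genotypes: Dict[str, str]) -> Optional[str]:
    """Return the only sample that is 1/1 at this site, or None.

    Other samples must be 0/0 or 0/1; missing calls (./.) disqualify the site.
    Phased genotypes such as 1|1 are treated like their unphased form.
    """
    normalized = {name: gt.replace("|", "/") for name, gt in genotypes.items()}
    hom_alt = [name for name, gt in normalized.items() if gt == "1/1"]
    if len(hom_alt) != 1:
        return None
    carrier = hom_alt[0]
    allowed = {"0/0", "0/1", "1/0"}
    for name, gt in normalized.items():
        if name != carrier and gt not in allowed:
            return None
    return carrier


if __name__ == "__main__":
    site = {"sampleA": "1/1", "sampleB": "0/1", "sampleC": "0/0"}
    print(private_hom_alt_sample(site))
```

Treating missing genotypes as disqualifying is a design choice made for this sketch; a real analysis may prefer to tolerate them.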

Assembly and phylogeographical analysis of novel Taenia solium mitochondrial genomes suggest stratification within the African-American genotype | Parasites & Vectors

Genome assembly and annotation Each genome was assembled following a reference-based strategy. To that end, the reads from each sample were trimmed by quality. Phred quality scores before trimming were greater than Q30 (Additional file 1: Table S1) for all samples. Quality-trimmed reads from the Peruvian and Mexican isolates were…

Viruses | Free Full-Text | Optimizing the Conditions for Whole-Genome Sequencing of Avian Reoviruses

1. Introduction Avian orthoreoviruses (avian reoviruses, ARVs) are a persistent challenge to poultry producers in the United States and globally. Infection with ARVs has been associated with a variety of symptoms and syndromes in commercial poultry, including tenosynovitis/viral arthritis, enteric symptoms such as watery diarrhea, respiratory symptoms, myocarditis, viral hepatitis,…

Data Import Issue detectRUNS R

I am running into an issue when importing data with detectRUNS in R. The following commands to import PLINK files have not been successful, and result in blank data frames. genotypeFilePath <- system.file("extdata", "genome.ped", package="detectRUNS") mapFilePath <- system.file("extdata", "genome.map", package="detectRUNS") head(genotypeFilePath) [1] "" The PLINK data are correctly formatted. OR I…

Ubuntu Manpage: samtools phase – call and phase heterozygous SNPS

Provided by: samtools_1.10-3_amd64 NAME samtools phase – call and phase heterozygous SNPS SYNOPSIS samtools phase [-AF] [-k len] [-b prefix] [-q minLOD] [-Q minBaseQ] in.bam DESCRIPTION Call and phase heterozygous SNPs. OPTIONS -A Drop reads with ambiguous phase. -b STR Prefix of BAM output. When this option is in use,…

Annovar doesn’t output CADD scores

Hi, I followed the Annovar tutorial with the default dataset (avsnp147, ExAC and dbnsfp30a). The tutorial can be found here: annovar.openbioinformatics.org/en/latest/user-guide/startup/ The resulting vcf contained all the expected format and data, including CADD scores. Then, I decided to repeat this using gnomad211_exome,avsnp150, and dbnsfp42c datasets instead of those above, but…

How to Add Mutations to the sequence

I have a list of BDQ resistance mutations and I want to add them to the MTB genome sequence or resistance gene. Some are nucleotide mutations, some are protein mutations. I am not really sure how to add them and make a BDQ…

Confusion about transcript ablation

I’m analyzing the WES data of a patient, after calling variants by GATK, I use Ensembl Variant Effect Predictor (VEP) to annotate my vcf file. Here is one record from the output file: #Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation Extra chr11_64341844_GTTGTGGTCTGAGGTCTTGGGCCATCAGTGATGTCACAACCAGATGGCCCAAGACCCCAGACCACAACCCCATGTCTGGT/- chr11:64341844-64341923- ENSG00000278359…

Genome-wide analysis and characterization of the LRR-RLK gene family provides insights into anthracnose resistance in common bean

Identification of PvLRR-RLK genes From the kinome of P. vulgaris [30], 1203 PKs were identified. Of these, only the proteins endowed with the transmembrane kinase and LRR domains were retained (Supplementary Table S1). All PvLRR-RLKs obtained were analyzed for redundancy following the criterion of maintaining the largest variants in the case…

Metagenomes Assembles Genomes from cultivated freshwater bacterial communities

This dataset represents 122 Metagenome-Assembled Genomes (MAGs) that were reconstructed from 20 individual microcosms in the context of understanding microbial community assembly processes. The cultivation media consisted of Artificial Lake Water (ALW) enriched with glucose and cellobiose (see details in Le Moigne et al., 2023, Ecology). The microcosms (200 mL)…

Sequencing 101: SBB sequencing – PacBio

Since the inception of next generation sequencing (NGS) more than a decade ago, short-read sequencing accuracy has seen only marginal improvement. Having achieved a level of precision thought to be “good enough” for most applications, much of NGS development has been focused on optimizing for cost and throughput. As a…

Liftedover vcf header/contig compatibility

I have a collaborator that has lifted over their hg19 files to hg38 using Crossmap. The first step in the workflow they need to run is a simple bcftools filter for variant quality. They are getting an unknown file type error. Are there any obvious problems with this header that…

Evolutionary histories of breast cancer and related clones

Data reporting No statistical methods were used to determine the sample size. The experiments were not randomized. Pathologists were blinded to the genetic alterations in each sample during histopathological evaluation. Participants and materials We enrolled 207 female patients with breast cancer who underwent surgery at the Kyoto University Hospital and…

Microorganisms | Free Full-Text | Whole-Genome Sequencing of Mycobacterium tuberculosis Isolates from Diabetic and Non-Diabetic Patients with Pulmonary Tuberculosis

1. Introduction Globally, about 10 million people are estimated to have developed tuberculosis (TB) in 2020 with eight countries accounting for two-thirds of the global total, with India reporting the largest proportion at 26% [1]. Type 2 diabetes mellitus (DM) is becoming a major public health problem globally, especially in…

Upcycling rice yield trial data using a weather-driven crop growth model

Phenotype data We obtained yield datasets for rice (Oryza sativa L.) from 207,331 trials with 8524 cultivars during the 38 years from 1980 to 2017. The data were obtained from field trials at 110 public agricultural experimental stations in Japan conducted by the Institute of Crop Science of the National…

sequence analysis – Can files with different R1 and R2 lengths be trusted?

I received paired-end amplicon sequences from a lab with different lengths for R1 (320) and R2 (280). Should I trust this lab to sequence other samples? Also, I had to trim R2 at 220 (because the Phred score was below 30) in the qiime2/DADA2 pipeline. Do you think…

Ancient dolphin genomes reveal rapid repeated adaptation to coastal waters

Ethics We confirm our research complies with all relevant ethical regulations and was approved by the animal ethics committee of the School of Biology at the University of St Andrews on 26 July 2018 www.st-andrews.ac.uk/research/environment/committees/awerb/. The three new contemporary dolphin samples analysed in this study were collected under the relevant…

Converting string to numerical in bcftools

Hi everyone, I am using bcftools to filter variants from a VCF file. The variants from this VCF file have been annotated using ANNOVAR. I would like to filter variants having a CADD score > 20 in a field named “CADD_phred” which has…

CIMB | Free Full-Text | A Metagenome from a Steam Vent in Los Azufres Geothermal Field Shows an Abundance of Thermoplasmatales archaea and Bacteria from the Phyla Actinomycetota and Pseudomonadota

1. Introduction One characteristic of many geothermal fields is the presence of steam vents, i.e., fumaroles, that consist of permanent emissions of steam and gases from the subsoil due to magmatic activity or groundwater geothermal heating [1]. Consequently, the steam has temperatures above 70 °C, wet conditions, and a concentration…

SARS-CoV-2 Viral Sample Alignment and Variant Visualization

There is a growing need for undergraduate students to learn cutting-edge concepts in genomics data science, including performing analysis on the cloud instead of a personal computer. This lesson aims to introduce a mutant detection bioinformatics pipeline based on a publicly available genetic sample of SARS-CoV-2. Students will be introduced…

FASTQ Phred33 average base quality score

I have a FASTQ dataset where I’m trying to find the average base quality score. I found this old link that helped somewhat (www.biostars.org/p/47751/). Here is my script (I’m trying to stick to awk, bioawk or python): bioawk -c fastx '{print ">"$name; print…
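
Since the poster mentions Python as an option, the following is a minimal sketch of one way to average base qualities. It is not the poster's truncated bioawk script. It assumes Phred+33 encoding and strict four-line FASTQ records (no wrapped sequence or quality lines), and the function name and toy record are invented for illustration.

```python
from typing import Iterable


def fastq_mean_base_quality(lines: Iterable[str], offset: int = 33) -> float:
    """Average Phred score over every base in a four-line-per-record FASTQ."""
    total = 0
    bases = 0
    for index, line in enumerate(lines):
        if index % 4 == 3:  # the fourth line of each record holds the qualities
            quality = line.rstrip("\r\n")
            total += sum(ord(char) - offset for char in quality)
            bases += len(quality)
    if bases == 0:
        raise ValueError("no quality characters found")
    return total / bases


if __name__ == "__main__":
    # Toy record: 'I' -> 40, '?' -> 30, '5' -> 20 under Phred+33.
    toy_record = ["@read1", "ACGT", "+", "II?5"]
    print(fastq_mean_base_quality(toy_record))  # 32.5
```

For a real file, the same function can be passed an open file handle, for example `fastq_mean_base_quality(open("reads.fastq"))`, where the file name is a placeholder. Note that this is the arithmetic mean of the scores, which is not the same as the Phred score of the mean error probability.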

variant filtering

Hello, how and with what scripts can I apply the following filters to a file that includes all variants of the genome? Please explain in detail. I want to remove variants with phred-scaled scores below 20 and variants with genotype qualities (GQ) of less than 20, SNPs within…

BBDuk Guide – DOE Joint Genome Institute

“Duk” stands for Decontamination Using Kmers. BBDuk was made to combine many common data-quality-related trimming, filtering, and masking actions into a single high-performance tool. It is capable of quality-trimming or filtering, adapter-trimming, contaminant-filtering via kmer matching, sequence masking, GC-filtering, length filtering, entropy-filtering, format conversion, histogram generation, subsampling, quality-score recalibration, kmer…

A pipeline for sample tagging of whole genome bisulfite sequencing data using genotypes of whole genome sequencing | BMC Genomics

Sample collection and WGS DNA samples were obtained from the CNSR-III [17], a nationwide prospective registry for patients presented to hospitals with acute ischaemic cerebrovascular events between August 2015 and March 2018 in China. Written informed consent was obtained from all patients or legally authorized representatives before entering the study….

differential splicing between two groups for 5 genes

I have an experimental group where I observed the presence of a handful of genes (5 genes) that I was not expecting to see. To assess the validity, I look at the read alignments in IGV and the read phred scores…

All question mark quality scores on several studies

I’ve stumbled upon several shotgun studies where all sample bases are ? (so 63 in phred+33). The first time I thought they might have been tampered with, but having just downloaded samples from 7 studies and 5 of them end up…

Next-Generation Sequencing (NGS)- Definition, Types, Applications, Limitations

What is Next-Generation Sequencing (NGS)? Next-Generation Sequencing (NGS), also known as high-throughput sequencing, has revolutionized the field of genomics and molecular biology by allowing the sequencing of thousands to millions of DNA molecules simultaneously. It encompasses a range of different sequencing technologies, all aimed at producing large amounts of sequence…

Reads with highest MAPQ values from SAM files are showing mismatches to reference sequence and IGV classified them as supplementary reads

Hi all, I am expressing a GFP synonymous variant library in human cells and sequencing its RNA on the nanopore and I am having some trouble analysing the data. Initially, I basecalled all the fast5 files using the super accuracy model in the guppy basecaller, then I discarded the reads…

Whole-genome sequencing of Listeria monocytogenes isolated from the first listeriosis foodborne outbreak in South Korea

Introduction Although globalization has provided opportunities for consumers to enjoy a wide range of products and expanded global food trade, the complexity of the international food supply has contributed to an increase in foodborne outbreaks (Quested et al., 2010; Hussain and Dawson, 2013). Worldwide efforts have ensured food safety by…

convert fasta to fastq without quality score input file

Here’s another beginner BioPython question from me… I’m running some genome assemblies for someone who has some new Illumina sequence data and also had done some sequencing a few years ago. They have some Sanger and 454 sequences (a couple thousand sequences with a couple thousand base pairs for each)…

Babraham Bioinformatics – FastQC A Quality Control tool for High Throughput Sequence Data

FastQC Function A quality control tool for high throughput sequence data. Language Java Requirements A suitable Java Runtime Environment The Picard BAM/SAM Libraries (included in download) Code Maturity Stable. Mature code, but feedback is appreciated. Code Released Yes, under GPL v3 or later. Initial Contact Simon Andrews Download Now Views…

VEP/ CADD error – ERROR: Assembly is GRCh38 but CADD file does not contain GRCh38 in header.

Dear Biostars, I am having a confusing issue with my CADD plugin. This is confusing because when I run VEP for my whole trio – all the plugins work fine. However when I try to run CADD for individual – pivoted files – it no longer does and I get…

How to trim reads for Chip Seq analysis

Hi, I am doing a ChIP-seq analysis. How do you trim based on per-base sequence content in the final part of the reads, where some bases tend to go down and some go up, on the Galaxy platform? Is it necessary to continue the…

PinAPL.py – – Antibody Capture and CRISPR Guide Capture Analysis -Software …

Enter a project name for your analysis run. This name will help you identify your results in case you do multiple runs in a row. Providing an email address is optional, but will let you safely close the browser during the analysis and receive a notification upon completion. Upload…

Prior metabolite extraction fully preserves RNAseq quality and enables integrative multi-‘omics analysis of the liver metabolic response to viral infection

Introduction The metabolome is an incredibly diverse collection of small molecules (<1,500 Da) in biological systems involved in virtually every cellular process, including cellular energy production, macromolecule synthesis, epigenetic modifications, cell signalling and more (for recent reviews see [1–6]). It responds rapidly (in seconds) to both internal (signalling, allostery) and external…

phred encoding issue in public dataset

Hello, I’d like to use a public dataset from SRA, this is one of the runs. I’ll put here some sample data, the first two reads in R1: @ERR2204072.1 HWI-ST1450:172:C6H19ANXX:7:2315:16228:9537/1 ATTACCATCAGAATTGTACTGTTCTGTATCCCACCAGCAATGTCTAGGAATGCCTGTTTCTCCACAAAGTGTTTAC + %%$%%())))&)'))))))))))())()())))))))))()&&&)#)))))))))')))))))))))()&&%&))) @ERR2204072.2 HWI-ST1450:172:C6H19ANXX:7:1104:8419:82653/1 GTTTAAACGAGATTGCCAGCACCGGGTATCATTCACCATTTTTCTTTTTGTTAACTTGCCGTCAGCCTTTTCTTTG + %%&&&))))))))))))))))())))))))))))))()))))))))&)&))%))))))))(%)))))))))))))! A quick look would rule out phred64; but if those were actual phred33-encoded…

What filters do I use on my variant calls (vcf.gz) file for imputation?

Hi! After about 2 full days of research and reading so many papers, I am still super stuck on this question: What site filters do I need to use on my vcf file to prepare it…

Tools to merge overlapping paired-end reads

Introduction In very simple terms, current sequencing technology begins by breaking up long pieces of DNA into lots more short pieces of DNA. The resultant set of DNA is called a “library” and the short pieces are called “fragments”. Each of the fragments in the library are then sequenced individually…

Bioinformatics Analysis of Small RNA Sequencing

Small RNAs are important functional molecules in organisms, which have three main categories: microRNA (miRNA), small interfering RNA (siRNA), and piwi-interacting RNA (piRNA). They are less than 200 nt in length and are often not translated into proteins. Small RNA generally accomplishes RNA interference (RNAi) by forming the core of…

MapSplice2 gives error if the thread count (-p value) is greater than 2

Hello! I get a multi-threading error while using MapSplice2. All the reference fasta files and index files were generated accordingly, as described on the website (www.netlab.uky.edu/p/bioinfo/MapSplice2UserGuide). I get an error after it evaluates given files/parameters but…

Ubuntu Manpage: samtools-phase – call and phase heterozygous SNPs

Provided by: samtools_1.16.1-1_amd64 NAME samtools-phase – call and phase heterozygous SNPs SYNOPSIS samtools phase [-AF] [-k len] [-b prefix] [-q minLOD] [-Q minBaseQ] in.bam DESCRIPTION Call and phase heterozygous SNPs. OPTIONS -A Drop reads with ambiguous phase. -b STR Prefix of BAM output. When this option is in use, phase-0…

wrong quality plots in fastqc output

wrong quality plots in fastqc output 1 Good morning, I simulated reads based on the reference genome using samtools wgsim wgsim -N 30000000 -1 151 -2 151 -r 0 -R 0 -X 0 -e 0 genome.fasta Sample_R1.fastq Sample_R2.fastq and obtained fastq files with such content: @DQ898156.1_36602_37076_0:0:0_0:0:0_0/1 CTGTAGTCTGGCACTGCAAAAACAGGATACAGGTGTATATATGATATATATATATGTGTGGACATGTTGTGTATAAAGAACGAAAAAATGCGGATATGGTCGAATGGTAAAATTTCTCTTTGCCAAGGAGAAGATGCGGGTTCGATTCCCG + IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII @DQ898156.1_147753_148277_0:0:0_0:0:0_1/1…

Genetic association analysis of 77,539 genomes reveals rare disease etiologies

Motivation for developing a sparse RDB Computational approaches for discovering the etiologies of rare diseases typically depend on the analysis of a heterogeneous set of files, each of which can be very large and follow a distinct convention. Genotypes, for example, are ordinarily stored in VCFs containing data for one…

Tool To Find Out If Fastq Is In Sanger Or Phred64 Encoding?

Is there a simple tool I can use to quickly find out if a FASTQ file is in Sanger or Phred64 encoding? Ideally something that tells me ‘Encoding XX’ somewhere in the terminal output. fastq tools • 46k…
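
One common heuristic for this kind of check looks only at the range of ASCII codes in the quality lines. The sketch below is an illustration of that idea, not an existing tool. The function name and return labels are invented, and the cut-offs (59 as the lowest code a 64-offset file can contain, 74 as a ceiling for typical Phred+33 data) are assumptions that newer instruments with higher maximum qualities may violate.

```python
from typing import Iterable


def guess_fastq_encoding(quality_lines: Iterable[str]) -> str:
    """Guess the quality offset from the range of characters observed."""
    codes = [ord(char) for line in quality_lines for char in line.strip()]
    if not codes:
        return "unknown (no quality characters)"
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Characters below ';' cannot occur with a 64-based offset.
        return "Phred+33 (Sanger / Illumina 1.8+)"
    if highest > 74:
        # Characters above 'J' are unusual for Phred+33 data (assumed ceiling).
        return "64-offset (Phred+64 or Solexa+64)"
    return "ambiguous"


if __name__ == "__main__":
    print(guess_fastq_encoding(["II?5", "IIII"]))
    print(guess_fastq_encoding(["hhhh@@"]))
    print(guess_fastq_encoding(["IIII"]))
```

A file whose qualities all fall between the two cut-offs cannot be classified this way, so the sketch reports it as ambiguous rather than guessing.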

Genome- and transcriptome-wide splicing associations with alcohol use disorder

Samples RNA-seq We used the same publicly available data source of human post-mortem brain samples as Van Booven et al. [7], which were collected from the New South Wales Brain Tissue Resource Center. Van Booven et al. [7] also performed differential splicing, but they used different methods, included individuals from disparate ancestral…

To Q40 and Beyond: Sequencing’s Accuracy Revolution is Happening Now

NEW YORK – During beta testing for Element Biosciences’ new sequencer last year, one of the customers quickly ran into a problem when trying it out with 10x Genomics’ single-cell assays. 10x’s Cell Ranger software, used for single-cell sequencing data analysis, was aborting runs and spitting out error messages. The reason?…

Navigating the Bioinformatics Workflow for Whole Exome Sequencing: A Step-by-Step Guide

Next-generation sequencing (NGS), which makes millions to billions of sequence reads at a fast rate, has greatly sped up genomics research. At the moment, Illumina, Ion Torrent/Life Technologies, 454/Roche, Pacific Bioscience, Nanopore, and GenapSys are all NGS platforms that can be used. They can produce reads of 100–10,000 bp in…

A heterophil/lymphocyte-selected population reveals the phosphatase PTPRJ is associated with immune defense in chickens

Ethics statement and animals All animals and experimental protocols used in this study were approved by the Beijing Institute of Animal Science, Chinese Academy of Agricultural Sciences (the scientific research department responsible for animal welfare issues) (No.: IASCAAS-AE20140615). In this study, experimental chickens (JXH) were selected on H/L, with the…

Illumina Novaseq 6000 base quality values

How does one interpret the quality scores in the FASTQ (or BAM) results coming out of the Illumina NovaSeq 6000 sequencer and DRAGEN pipeline? Any ideas or pointers?

Occur   ASCII   ASC-to-Num     PHRED Q value?
82      *       (42-33) or 9   Q10? Q0?
65      5       20
152     7       22
37377   :       (58-33)…
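The arithmetic in the table above can be checked with a few lines of Python. This is a minimal sketch assuming the standard Phred+33 offset; the function name `decode_quality` is made up for illustration.

```python
def decode_quality(char, offset=33):
    """Convert one FASTQ quality character to its Phred score (Phred+33 assumed)."""
    return ord(char) - offset


# The four symbols that appear in the question's table.
for symbol in "*57:":
    print(repr(symbol), ord(symbol), decode_quality(symbol))
```

Under that assumption, `*` decodes to Q9, `5` to Q20, `7` to Q22 and `:` to Q25.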

How to separate the VEP INFO column into separate columns

I have a VCF file like the one below:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT treatmentSample
chr1 857100 . C T 1756.06 PASS AC=2;AF=1;AN=2;DP=60;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=29.27;SOR=1.812;CSQ=chr1:857100|T|SNV|ENSG00000228794|ENST00000445118|LINC01128||1|MODIFIER|non_coding_transcript_exon_variant||||5/5|||||||||||||||||| GT:AD:DP:GQ:PL 1/1:0,60:60:99:1770,180,0

Does anyone know how to separate the INFO column into different columns? And also how to separate the treatmentSample column following the FORMAT order? I…
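As a rough illustration of the splitting asked about above, here is a minimal sketch in plain Python. The helper names `parse_info` and `parse_sample` are hypothetical, and the CSQ sub-field names are not handled because they are defined in the VCF header, which is not shown in the excerpt.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flags without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_sample(format_column, sample_column):
    """Pair each FORMAT key with the matching value of one sample column."""
    return dict(zip(format_column.split(":"), sample_column.split(":")))


# Values copied from the record in the question (INFO shortened).
info = parse_info("AC=2;AF=1;AN=2;DP=60")
sample = parse_sample("GT:AD:DP:GQ:PL", "1/1:0,60:60:99:1770,180,0")
print(info)
print(sample)
```

A CSQ value could then be split further with `value.split("|")`, with the column names taken from the `Format:` description in the header's CSQ line.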

A 10-year microbiological study of Pseudomonas aeruginosa strains revealed the circulation of populations resistant to both carbapenems and quaternary ammonium compounds

P. aeruginosa bacterial strains Reference strains Four well-described and genome-available reference strains were used in the present study, ATCC27853 and ATCC15442, obtained from the American Type Culture Collection (ATCC), and PAO1 and PA14, from the collection of Institut Pasteur (Paris, France). Strain ATCC15442 is recommended for disinfectant susceptibility testing44, strain…

AGBT Sessions Shine Spotlight on Long-Read Sequencing

This story includes reporting by Huanjia Zhang. NEW YORK – As the reigning Nature Methods method of the year, long-read sequencing featured prominently in many of the talks at this year’s Advances in Genome Biology and Technology meeting, held in Hollywood, Florida, last week. Pacific Biosciences presented new data from…

Issue with VCF format while using Pharmcat

Hello everybody, I am using the PharmCAT tool’s preprocessor feature to preprocess my VCF file using the command

> python3 pharmcat_vcf_preprocessor.py -vcf sample.vcf

But I think there is some issue with my VCF file, as this command outputs an error:

> Reading samples from sample.vcf … Saving output to . > >…

Wild deer (Pudu puda) from Chile harbor a novel ecotype of Anaplasma phagocytophilum | Parasites & Vectors

Rar V, Tkachev S, Tikunova N. Genetic diversity of Anaplasma bacteria: twenty years later. Infect Genet Evol. 2021;91:104833.
Atif FA. Alpha proteobacteria of genus Anaplasma (Rickettsiales: Anaplasmataceae): epidemiology and characteristics of Anaplasma species related to veterinary and public health importance. Parasitology. 2016;143:659–85.
Battilani M, de Arcangeli…

How to Calculate Allele Frequency from a VCF File?

I have a VCF file with 200 samples (mitochondrial genome of Plasmodium falciparum). Here are a few relevant lines from the actual file:

##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT…
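The AF value described in the header is the ALT allele count divided by the number of called alleles. A minimal sketch of that calculation from per-sample GT strings follows; the function name `alt_allele_frequency` is hypothetical, and missing alleles (`.`) are assumed to be excluded from the denominator.

```python
import re


def alt_allele_frequency(genotypes, alt_index=1):
    """Return ALT allele count / called allele count for a list of GT strings.

    Handles diploid ("0/1", "1|1"), haploid ("1") and missing (".", "./.") calls.
    Returns None when no allele was called.
    """
    alt_count = 0
    called = 0
    for genotype in genotypes:
        for allele in re.split(r"[/|]", genotype):
            if allele == ".":
                continue  # missing call, not counted
            called += 1
            if int(allele) == alt_index:
                alt_count += 1
    if called == 0:
        return None
    return alt_count / called


# Made-up genotypes for illustration: 3 ALT alleles out of 6 called alleles.
print(alt_allele_frequency(["0/0", "0/1", "1|1", "./."]))
```

Haploid calls, as might be expected for a mitochondrial genome, go through the same code path because each GT string is simply split into however many alleles it contains.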

Hypersaline Lake Urmia: a potential hotspot for microbial genomic variation

Physico-chemical features of Lake Urmia Sampling was performed during the period of lowest rainfall and input volume in the year when the lake water reached the highest salt concentration (locations shown in Fig. 1, Supplementary Table S1). The measured ionic composition of the collected brine showed the typical composition of halite-dominated…

Pregap4 – Table of Contents

Organisation of the Pregap4 Manual Introduction Summary of the Files used and the Processing Steps Introduction to the Pregap4 User Interface Introduction to the Files to Process Window Introduction to the Configure Modules Window Introduction to the Textual Output Window Introduction to Running Pregap4 Pregap4 Menus Pregap4 File menu Pregap4…

Annotating with CADD, gnomad, Clinvar & dbNSFP on UKB RAP – Feature Requests

dint May 9, 2022, 1:33pm #1 I’m just wondering whether you can specify CADD, gnomAD, ClinVar and dbNSFP options when annotating with Hail on dxjupyterlab_spark_cluster on the UKB RAP. From the Hail website, the following command can be used on your matrix file to annotate with these features: db =…

Frontiers | Divergence With Gene Flow and Contrasting Population Size Blur the Species Boundary in Cycas Sect. Asiorientales, as Inferred From Morphology and RAD-Seq Data

Introduction Incipient species are critical for evolutionary biologists to study speciation, but they also challenge taxonomy due to gene flow or ancestral polymorphism. The former and contrasting population size lead to larger intraspecific than interspecific variations, a phenomenon called the species-definition anomaly zone (Jiao and Yang, 2021). The latter results…

Help me understand the Nanopore fastqc results

Hi, I have got my first Nanopore sequencing data, and the first step was to see if the data is good. Does anyone have experience with this kind of data and can tell me how to interpret the results? The whole…

(ERR): bowtie2-align exited with value 13

I am trying to run bowtie2, but the following error occurs every time:

bowtie2 --very-fast-local -x bowtie -q -1 R1.fastq -2 R2.fastq -s aligned.sam
Saw ASCII character 10 but expected 33-based Phred qual.
terminate called after throwing an instance of ‘int’
Aborted…
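ASCII character 10 is a newline, so one possible reading of the message above is that the aligner reached the end of a line where it expected a quality character, for example because a quality line is empty or shorter than its sequence. That is an assumption, since the post is truncated. A minimal sketch of a structural check that would surface such records (the function name `find_fastq_problems` is hypothetical):

```python
def find_fastq_problems(lines):
    """Return a list of human-readable problems found in 4-line FASTQ records."""
    problems = []
    lines = [line.rstrip("\r\n") for line in lines]
    if len(lines) % 4 != 0:
        problems.append("line count is not a multiple of 4")
    # Walk over every complete 4-line record.
    for start in range(0, len(lines) - 3, 4):
        header, sequence, plus, quality = lines[start:start + 4]
        record = start // 4 + 1
        if not header.startswith("@"):
            problems.append(f"record {record}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {record}: third line does not start with '+'")
        if len(sequence) != len(quality):
            problems.append(
                f"record {record}: sequence and quality lengths differ "
                f"({len(sequence)} vs {len(quality)})"
            )
        if any(ord(ch) < 33 for ch in quality):
            problems.append(f"record {record}: quality character below ASCII 33")
    return problems


# Made-up record with an empty quality line.
print(find_fastq_problems(["@r1", "ACGT", "+", ""]))
```

Passing an open file handle instead of a list works as well, because the function only iterates over lines.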

Should I trim adapter sequences and filter by phred score, before alignment by salmon? : bioinformatics

First, trimming adapters is definitely necessary as they are essentially a form of contamination. For quality trimming and filtering I would highly recommend reading the following: Trimming of sequence reads alters RNA-Seq gene expression estimates Essentially they show that aggressive trimming is a problem. To quote from the Conclusions: The…

Understanding signatures of positive natural selection in human zinc transporter genes

Datasets and populations We first compiled whole-genome sequencing data to analyze the patterns of variation in ZTGs on two geographical levels. Thus, we explored a worldwide dataset of 2,328 unrelated individuals representing 24 populations across Africa (AFR), Europe (EUR), East Asia (EAS), South Asia (SAS) and America (AMR), denoted as…

High-Throughput Transcriptome Analysis for Investigating Host-Pathogen Interactions

The protocol presented here describes a complete pipeline to analyze RNA-sequencing transcriptome data from raw reads to functional analysis, including quality control and preprocessing steps to advanced statistical analytical approaches. Welcome to the protocol of high-throughput transcriptome analysis for investigating host-pathogen interactions. This protocol is divided in the following steps….

Analyzing and slicing FASTQ file entries using Python

I have the code pasted below for running on FASTQ file entries in order to compare specific parts and remove the redundancy of the same sequences (based on the miRNA + umi_seq combination). I save the entry IDs and then make…

Vertical stratification of the air microbiome in the lower troposphere

Significance Large-scale meteorological and biological data demonstrate the vertical stratification of airborne biomass. The previously described diel cycle of airborne microorganisms is shown to disappear at height. Atmospheric turbulence and stratification are shown to be defining factors for the scale and boundaries, dynamics, and natural variability of airborne biomass, resulting…

Ensembl VEP gnomAD annotated allele frequencies different from gnomAD browser

I’ve annotated some variants using VEP, and was looking at the minor allele frequencies. Some of the variants had very different MAFs in the annotation than I expected (I expected MAF < 1%, whereas some annotated MAFs were >50%). I looked up the same variants on the gnomAD v3 browser,…

SeqIO object get cleared away after being accessed

I’m using Biopython to parse a FASTQ file, and I found that the SeqIO object gets cleared once I have accessed it.

from Bio import SeqIO
record_fastqIO = SeqIO.parse('SRR835775_1.first1000.fastq', 'fastq')
for record in record_fastqIO:
    print(record.id)

This script works perfectly. But if I add one line to the script: from Bio import…
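`SeqIO.parse` returns an iterator, and an iterator can only be consumed once, which is a likely cause of the behaviour described above (the excerpt is cut off before the modified script, so this is an assumption). The sketch below reproduces the effect with a plain generator standing in for `SeqIO.parse`; `iter_record_ids` and the sample lines are made up for illustration.

```python
def iter_record_ids(lines):
    """Yield record IDs from 4-line FASTQ records (stand-in for SeqIO.parse)."""
    for index, line in enumerate(lines):
        if index % 4 == 0:
            # Header line: drop the leading '@' and keep the first token.
            yield line[1:].split()[0]


fastq_lines = ["@read1", "ACGT", "+", "IIII", "@read2", "TTGA", "+", "IIII"]

records = iter_record_ids(fastq_lines)
first_pass = list(records)    # consumes the iterator
second_pass = list(records)   # nothing left to yield

# Two ways around it: keep the materialised list, or create a new iterator.
records_again = list(iter_record_ids(fastq_lines))

print(first_pass, second_pass, records_again)
```

With Biopython itself, the equivalent would be to wrap the parser in `list(...)` once when the file fits in memory, or to call `SeqIO.parse` again for each pass.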

Issue with fastq after converting phred 64 to phred 33 quality scores

Hello, I ran

seqtk seq -VQ64 read1.fastq.gz > read1_phred33.fastq

to convert my Phred+64 reads to Phred+33. However, when I attempted to run them through TopHat alignment, I got this error:

Saw ASCII character 4 but expected 33-based Phred qual.
terminate called after…
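A 64-to-33 conversion subtracts 31 from every quality character. If the input were already Phred+33, a character such as `#` (ASCII 35) would end up as ASCII 4, which matches the error above; this is only one possible explanation, since the rest of the post is not shown. A minimal sketch of a conversion that refuses such input (the function name is hypothetical, and Solexa+64 scores below ASCII 64 are deliberately not supported):

```python
def phred64_to_phred33(quality):
    """Shift a Phred+64 quality string to Phred+33, rejecting implausible input."""
    codes = [ord(ch) for ch in quality]
    if any(code < 64 for code in codes):
        # A character below '@' (ASCII 64) cannot be a Phred+64 score.
        raise ValueError("quality string does not look like Phred+64")
    return "".join(chr(code - 31) for code in codes)


print(phred64_to_phred33("hhhh"))

try:
    phred64_to_phred33("#III")  # looks like Phred+33 already
except ValueError as error:
    print("refused:", error)
```

Checking the encoding of the original file before converting would show whether the shift was needed at all.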

plotting roh from bcftools

Heys, I am following this small tutorial on how to calculate ROHs from a VCF file using bcftools (samtools.github.io/bcftools/howtos/roh-calling.html), and I am getting this txt file:

# This file was produced by: bcftools roh(1.10.2+htslib-1.10.2-3)
# The command line was: bcftools roh -G30 --AF-dflt 0.4 my_file.vcf…

How can I get PHRED score?

Hi, all. I am trying to get the assembly stats (Table S1) according to the following paper about de novo assembly: www.ncbi.nlm.nih.gov/pmc/articles/PMC7266049/ In the table, there is an item “Mean read PHRED score after filtering and trimming”. How can I get this? Is there…
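How the paper computed its mean read PHRED score is not stated in the excerpt. One common convention, assumed here, is the arithmetic mean of all per-base Phred scores in the filtered and trimmed reads; averaging error probabilities instead would give a different, usually lower, value. A minimal sketch with a hypothetical function name and a Phred+33 offset:

```python
def mean_phred(quality_strings, offset=33):
    """Arithmetic mean of per-base Phred scores over all given quality strings."""
    total = 0
    count = 0
    for quality in quality_strings:
        for ch in quality:
            total += ord(ch) - offset
            count += 1
    if count == 0:
        raise ValueError("no quality characters given")
    return total / count


# Made-up quality strings: scores 40, 40, 20 and 0.
print(mean_phred(["II", "5!"]))
```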

The sardine run in southeastern Africa is a mass migration into an ecological trap

INTRODUCTION Large-scale annual migrations occur in an extraordinary range of animals, from insects to the great whales. While the driving mechanisms of these migrations are varied and sometimes poorly understood, they often represent a way of optimizing conditions for breeding and adult fitness when these are in conflict. Often, populations…

Trimmomatic error

Hi everyone. I’m trying to trim some read data but I’m getting an error message. This is my input:

trimmomatic PE -threads 24 -phred 33 /home/tbeckett/lustre/practice/output_data/ Filtered2S1_L3_R1.fastq.gz /home/tbeckett/lustre/practice/output_data/ Filtered2S1_L3_R2.fastq.gz /home/tbeckett/lustre/practice/output_data/trimmed/ TrimmedFiltered2S1_L3_R1_p.fastq /home/tbeckett/lustre/practice/output_data/trimmed/ TrimmedFiltered2S1_L3_R1_un.fastq /home/tbeckett/lustre/practice/output_data/trimmed/ TrimmedFiltered2S1_L3_R2_p.fastq /home/tbeckett/lustre/practice/output_data/trimmed/ TrimmedFiltered2S1_L3_R2_un.fastq ILLUMINACLIP:NexteraPE-PE.fa LEADING:20 TRAILING:20 MINLEN:60

This is the error I’m getting:…

Illumina Q score

Hi all, I have Illumina sequencing results of a bacterial genome and a quality score of 35.89 is associated with these data. I know that a quality score of 30 corresponds to 99.9% base-calling accuracy based on this, but what about the meaning of 35.89?…
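Under the standard Phred definition, Q = -10 · log10(P), where P is the probability that a base call is wrong, so Q30 corresponds to an error probability of 1 in 1,000 (99.9% accuracy). The same formula applies to non-integer values such as 35.89. A minimal sketch, with function names chosen for illustration:

```python
def error_probability(q):
    """Probability of an incorrect base call for Phred score q."""
    return 10 ** (-q / 10)


def accuracy_percent(q):
    """Base-calling accuracy in percent for Phred score q."""
    return 100 * (1 - error_probability(q))


for q in (20, 30, 35.89, 40):
    print(q, error_probability(q), accuracy_percent(q))
```

By this formula a score of 35.89 works out to roughly 99.97% accuracy, or about one expected error in 3,900 bases. If 35.89 is an average over a run, it summarises many per-base scores and does not describe any single base.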

Oncogene Concatenated Enriched Amplicon Nanopore Sequencing for rapid, accurate, and affordable somatic mutation detection | Genome Biology

Stochastic Amplicon Ligation. DNA samples for oncology sequencing are typically extracted from FFPE tissues and can have average lengths of less than 500 nt due to accumulated chemical damage [18]. We developed the Stochastic Amplicon Ligation (SAL) method to enzymatically concatenate many short DNA molecules together to utilize the long-read…

Rsubread align maximum nthreads

Hi Experts, I am using Rsubread align with the following command:

align(index="my_index", readfile1="SRR123456_1.fastq", readfile2="SRR123456_2.fastq", type="rna", input_format="FASTQ", minFragLength=35, maxFragLength=151, useAnnotation="TRUE", nthreads=64, annot.ext="my_annotation.gtf.gz", isGTF="TRUE", sortReadsByCoordinates="TRUE", output_format="BAM")

Here I have assigned 64 threads, but in the console I see only 40 threads. I don’t…

Output of samtools view, what does the third column actually represent?

The samtools view command outputs information from SAM and BAM files in SAM format. You can find a description of the SAM format here: samtools.github.io/hts-specs/SAMv1.pdf Section 1.4 deals with the meaning of each of the mandatory columns. It includes the following table:

| Col | Field | Type | Regexp/Range | Brief description |
|---|---|---|---|---|
| 1 | QNAME…
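The SAM specification defines eleven mandatory tab-separated columns, the third of which is RNAME, the name of the reference sequence the read is aligned to. A minimal sketch that labels the columns of one alignment line follows; the example line and the `parse_sam_line` helper are made up for illustration.

```python
SAM_FIELDS = [
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
]


def parse_sam_line(line):
    """Map the 11 mandatory SAM columns to their names; extra columns go to TAGS."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line needs at least 11 columns")
    record = dict(zip(SAM_FIELDS, columns))
    record["TAGS"] = columns[len(SAM_FIELDS):]
    return record


# Hypothetical alignment line, not taken from a real file.
example = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
record = parse_sam_line(example)
print(record["RNAME"], record["POS"])
```

Header lines, which start with `@`, are not alignment records and would need to be skipped before calling the function.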

Convert a VCF-file in a user specific Format

Hello everyone, I am curious whether it is possible to convert a VCF file (with multiple samples) into a format with 5 columns. The columns should be: Sample ID; Position on the chromosome; Genotype; Number of reads covering site; QUAL phred-scaled quality…
