Tag: BBDuk

Java class error message when using BBDuk

I am trying to run BBDuk to quality trim and filter my Illumina whole genome sequences. I have used other trimming scripts before and have not had a problem, although this is my first time preprocessing sequencing data from QuantSeq samples. I…

Yes .. BBMap can do that!

NOTE: This collection was originally posted at SeqAnswers.com. Creating a copy here to preserve the information. Part I is available here: Yes .. BBMap can do that! – Part I : bbmap (aligner), bbduk (scan/trim), repair (fix PE reads) and reformat (format conversions). Part II is available here: Yes .. BBMap can…

Using metagenome assembly and binning to identify and mitigate contamination in a genome

Hi everyone, This may be a silly question, but I would like to know whether metagenome assembly and binning is a valid method for determining if a sample contains a mixture of species. Similarly, can metagenomics be used to identify and remove contamination from a single genome? For some background, I…

Problematic fastq files…How can we trust them?

Hello fellas, A week ago I made another post regarding an error I was getting while trying to run BBDuk on a number of fastq files. In that case, there were records missing the “+” character. After looking a…

Discrepancy in total number of bases in trimmed read1 and read2 files after BBDuk

Hi all, After performing adapter trimming with bbduk.sh, I found that the total number of bases in the read1 file differs from that in the read2 file, according to the FastQC quality check. Below was the code…

BBDuk error: with these 4 lines….

Hello there, I am trying to trim some samples with BBDuk and I am getting this error: Error in /home/projects/Raw_data/sample1_L001_R2_001.fastq.gz , line 120314999, with these 4 lines: + FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFFF:,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF:FFFFFTTACTGAGCAAATAAATAGACTCTATATTGTCTCCG ATGGCATAAAAATGTGTTTGTGGAAAAGCAATCCTTAAATTGAGAAAACGTTTTATATTAGGGCCAATGATAGGATAAGCAAGTAATACATCTGTAGCA + FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFFF:,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF:FFFFFTTACTGAGCAAATAAATAGACTCTATATTGTCTCCG ATGGCATAAAAATGTGTTTGTGGAAAAGCAATCCTTAAATTGAGAAAACGTTTTATATTAGGGCCAATGATAGGATAAGCAAGTAATACATCTGTAGCA at stream.FASTQ.quadToRead_slow(FASTQ.java:708) at stream.FASTQ.toReadList(FASTQ.java:659) at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:107) at stream.FastqReadInputStream.nextList(FastqReadInputStream.java:93) at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:681) at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:657) Processing time: 1085.136 seconds. Input:…

Bulk RNAseq Salmon index building which transcriptome to use

Hi all, I am new to the platform. I was wondering what the common/best practice is regarding building a Salmon index for bulk RNAseq analysis of human cells. The tutorial for Salmon/Alevin is using the complete transcriptome from GENCODE (gencode.vM23.transcripts.fa.gz,…

Small RNA sequencing using Illumina 2 channel SBS: how to deal with Gs?

I’m working on a small RNA sequencing experiment (150 PE on NovaSeq 6000), and many reads look like this when the fragment size is smaller than 150 bp, with Gs completing the sequence up to 150:…

Viral positive and negative strand with paired sequencing and bowtie

You should expect the same number of forward and reverse strand reads because read 1 and read 2 are on opposite strands. What would be more interesting here is to first split the mapped file into r1 and r2, then split THOSE files into forward and reverse, then combine R1…

How to trim bases with bbduk.sh

Hi, I am using the following BBMap command to trim adapter sequences from the fastq files: bbduk.sh -Xmx1g \ in1=1_R1_001.fastq.gz \ in2=1_R2_001.fastq.gz \ out1=1_R1_001-trimmed.fastq.gz \ out2=1_R2_001-trimmed.fastq.gz \ literal=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \ qtrim=rl \ trimq=20 \ ktrim=r \ k=16 \ filterpolyg=5 \ tbo tpe Can anyone let…

MetaCC allows scalable and integrative analyses of both long-read and short-read metagenomic Hi-C data

Real metaHi-C datasets In this study, we leveraged several publicly available metagenomic Hi-C datasets, consisting of two short-read metaHi-C datasets and two long-read metaHi-C datasets. The specific sizes of raw datasets were shown in Supplementary Table 6. Two short-read metaHi-C datasets were generated from different microbial ecosystems, including human gut (BioProject:…

Integrating chromatin conformation information in a self-supervised learning model improves metagenome binning [PeerJ]

Introduction The rapid proliferation of high-throughput sequencing in metagenomics, combined with the advancement of scalable computational tools, has allowed scientists to digitally isolate tens of thousands of microbial genomes from large collections of metagenomic datasets (Nayfach et al., 2021). Although these genomes are only the tip of the iceberg of…

How to install bbmap on Ubuntu 20.04 (Focal Fossa)?

Quick installation of bbmap (version 38.79+dfsg-1, architecture: all). Step 1: Update the system: sudo apt-get update. Step 2: Install bbmap. After updating the OS, run the following command to install the package: sudo apt-get install bbmap. Package details: Package: bbmap; Architecture: all; Version: 38.79+dfsg-1; Maintainer: Ubuntu Developers; Home page: https://sourceforge.net/projects/bbmap/…

PolyA and PolyG sequences in FastQC/MultiQC report

Hello, We sent our samples off for RNA exome sequencing. We normally do mRNA sequencing but our RNA was degraded so RNA exome sequencing was recommended. We used the TruSeq RNA Exome kit. I am encountering issues I haven’t experienced before when trimming reads. Below is the adapter content from…

Transient naive reprogramming corrects hiPS cells functionally and epigenetically

Cell culture All cell lines used and derived by different approaches in this study are listed in Supplementary Table 1. Detailed information about the experimental design, materials and reagents is presented in the Reporting Summary. Primary human adult dermal fibroblasts (HDFa) from three different female donors were obtained from Gibco…

Decreased left heart flow in fetal lambs causes left heart hypoplasia and pro-fibrotic tissue remodeling

Coil implantation in fetal lambs We have complied with all relevant ethical regulations for animal testing. All procedures followed the Canadian Council on Animal Care guidelines and were approved by the University of Western Ontario Council on Animal Care (protocol 2010-257). Time-dated pregnant Dorset × Rideau Arcott ewes (gestational age 76 days,…

barcodes not show up in overrepresented sequences in FASTQC

So, I have a Ribo-seq experiment with multiple samples and for each sample there are two barcodes, supplied by the sequencing lab, like this one: TACTCATA+GCCACAGG I just ran the FASTQC on that bugger and it didn’t pop up as…

ResR/McdR-regulated protein translation machinery contributes to drug resilience in Mycobacterium tuberculosis

Bacterial strains and culture conditions Escherichia coli strain DH5α (Thermo Fisher) was used for the propagation of plasmids, whereas E. coli BL21 DE3 (Novagen) was used for the expression and purification of ResR/McdR protein. Mtb Erdman was obtained from Dr. Ramandeep Singh at THSTI, India, and Mtb H37Rv mc2 790242…

BBDuk Guide – DOE Joint Genome Institute

“Duk” stands for Decontamination Using Kmers. BBDuk was made to combine many common data-quality-related trimming, filtering, and masking actions into a single high-performance tool. It is capable of quality-trimming or filtering, adapter-trimming, contaminant-filtering via kmer matching, sequence masking, GC-filtering, length filtering, entropy-filtering, format conversion, histogram generation, subsampling, quality-score recalibration, kmer…

Genomic screening of 16 UK native bat species through conservationist networks uncovers coronaviruses with zoonotic potential

Sample collection Sampling kits were sent out to various bat rehabilitators in the UK, as described previously56, for the collection of faeces from bats. These faecal samples (0.02–1 g) were immediately stored in 5 ml of RNAlater solution to prevent degradation of RNA. The geographical locations and collection dates for all samples…

RCAC – Knowledge Base: Biocontainers: bbtools

bbtools Introduction: BBTools is a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data. Docker hub: https://hub.docker.com/r/staphb/bbtools Home page: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/ Versions: 39.00 Commands: Xcalcmem.sh a_sample_mt.sh addadapters.sh addssu.sh…

bbduk.sh trimmer with multiple input files

Hi all, I am using bbduk.sh and was wondering if there’s an efficient way to process multiple sets of reads with it? E.g. if you have read 1 as 4 separate files and read 2 as 4 separate files, typical mappers like bowtie2…

Genome sequencing and multifaceted taxonomic analysis of novel strains of violacein-producing bacteria and non-violacein-producing close relatives

Abstract Violacein is a water-insoluble violet pigment produced by various Gram-negative bacteria. The compound and the bacteria that produce it have been gaining attention due to the antimicrobial and proposed antitumour properties of violacein and the possibility that strains producing it may have broad industrial uses. Bacteria that produce violacein…

Answer: Help with understanding BBduk's behavior

> My input “reference” kmer has a “C” at the 9th position, while the reported matching kmers both have a “T” at that position.

The `maskmiddle` option is true by default; as a result, you see that base being “matched”. maskmiddle=t (mm) Treat the middle base of a kmer as…
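The behavior described in this answer can be illustrated with a minimal Python sketch. It is not BBDuk's implementation: the function names are hypothetical, the two 18-mers are made up, and it assumes the masked middle base is simply ignored during comparison, with the masked index chosen as `(k - 1) // 2` so that it lands on the 9th position of an 18-mer as in the question.

```python
def mask_middle(kmer: str) -> str:
    """Replace the middle base with 'N' so it is ignored when comparing.

    Assumption for illustration: the masked index is (k - 1) // 2, which is
    the 9th base (1-based) of an 18-mer, matching the case in the question.
    """
    mid = (len(kmer) - 1) // 2
    return kmer[:mid] + "N" + kmer[mid + 1:]


def kmers_match(ref_kmer: str, read_kmer: str, maskmiddle: bool = True) -> bool:
    """Compare two kmers, optionally ignoring the middle base."""
    if len(ref_kmer) != len(read_kmer):
        return False
    if maskmiddle:
        return mask_middle(ref_kmer) == mask_middle(read_kmer)
    return ref_kmer == read_kmer


# Hypothetical 18-mers that differ only at the 9th position (C vs T).
ref = "ACGTACGTCACGTACGTA"
read = "ACGTACGTTACGTACGTA"

print(kmers_match(ref, read))                    # middle base ignored
print(kmers_match(ref, read, maskmiddle=False))  # exact comparison
```

With masking on, the two kmers compare equal even though the 9th base differs; with masking off, they do not. For exact matching in BBDuk itself, the corresponding setting would be `maskmiddle=f`.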

Help with understanding BBduk's behavior

Hi All, I’m trying to use BBduk to find and filter exact 18-mer matches in a contaminant fasta file from a set of input reads. BBduk reports that matches exist, however when I examine the reads that are supposed to include one or more kmers, I cannot find the matching…

BBduk reading fastq from S3 directly

Is it possible? Hello to all, I am not from the bioinformatics field, but I have no trouble running the bbduk trimming command 🙂 I was wondering whether it is possible to load paired fastq reads directly from an S3 bucket? Even better, load…

bbduk command line options for EM-seq (NEB)

Hi, I would like to use bbduk to replicate the command line flags used by trim_galore for EM-Seq (NEB). The bismark user guide recommends this for trim_galore: --clip_R1 10 --clip_R2 10 --three_prime_clip_R1 10 --three_prime_clip_R2 10 Explanation of the trim_galore parameters: --clip_R1 <int> Instructs Trim Galore to remove <int> bp from…

BBduk log and stats appear to be inconsistent

Hi All, I’m using BBduk to filter out reads where there is a kmer match to a specific set of contaminant sequences. Here is an example command: bbduk.sh \ in1=${FQ1} \ in2=${FQ2} \ ref=${contaminant_fasta} \ k=21 \ hdist=1 \ stats=${OUTPUT}/${SAMPLE}_stats.txt \…

SituSeq: an offline protocol for rapid and remote Nanopore 16S rRNA amplicon sequence analysis

For the initial test of this protocol, DNA extraction, PCR, Nanopore library preparation, Nanopore sequencing, and subsequent data analysis were conducted at sea aboard the R/V Atlantic Condor in August 2021 [26]. This investigational effort resulted in the sequencing and analysis of deep sea sediment samples within hours of their…

error bbduk

Can anyone help me? I try to run this command and it gives the error below; how can I fix it? java -ea -Xmx-78m -Xms-78m -cp /home/qiime2/Documents/bbmap/current/ jgi.BBDuk in= /Home/Desktop/Documents/limpeza/S2500_1.fastq out= /Home/Desktop/Documents/limpeza/S2500_1.fastq.clean.fastq trimq=20 minlen=75 Invalid maximum heap size: -Xmx-78m Error: Could not create the Java Virtual Machine. Error: A…

ATAC-seq fragment size distribution – huge spike at 150 bp

Hi, I am working on some ATAC-seq data. We have performed paired-end sequencing with a read length of 150 bp on a total of 24 samples (8 conditions in triplicate). To begin with, I will just briefly describe some of the main analysis steps. I have trimmed the read to remove…

How To Install bbmap on Ubuntu 20.04

In this tutorial we learn how to install bbmap on Ubuntu 20.04. bbmap is a short read aligner and other bioinformatic tools. What is bbmap? bbmap is: BBMap: Short read aligner for DNA and RNA-seq data. Capable…

rRNA decontamination of RNA-Seq reads (tool choice, introns etc)

Greetings! I need some help with performing my rRNA decontamination step properly, which is part of pre-processing pipeline for my Illumina RNA-Seq reads, before mapping to the reference genome (a plant species). SOME RELEVANT LINKS AND MY CRUDE INFERENCES: Download Rrna Sequences – SILVA database is useful for human research,…

Determinants of associations between codon and amino acid usage patterns of microbial communities and the environment inferred based on a cross-biome metagenomic analysis

Data collection Metagenomic project information was collected from the MGnify metagenomic database31. Currently (September 2021), microbiome data (sequence, taxonomic, and functional information, etc.) of 325,323 environmental samples can be found in this database. Often, microbes from similar ecological communities have been studied by different groups at different times and locations….

Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment

Sample preparation We ordered the GIAB samples from the Coriell Institute (NA24385, NIST ID HG002; NA24149, NIST-ID HG003 and NA24143, NIST-ID HG004). DNA concentration was measured by Qubit. The library was constructed according to Illumina TruSeq DNA PCR Free Library Prep protocol HT (Illumina Inc., San Diego, CA, USA) for…

BBmap bbduk.sh for filtering reads

I’m looking to filter reads that contain a stretch of A’s, I found these posts looking for polyA tails, meaning this should work all the same (Identify RNA-seq reads containing polyA sequence, Identifying RNA-seq reads containing polyA stretch). However, I cannot get it to work. Given just these two reads,…

Should I trim adapter sequences and filter by phred score, before alignment by salmon? : bioinformatics

First, trimming adapters is definitely necessary as they are essentially a form of contamination. For quality trimming and filtering I would highly recommend reading the following: Trimming of sequence reads alters RNA-Seq gene expression estimates Essentially they show that aggressive trimming is a problem. To quote from the Conclusions: The…

nf-core/circrna

circRNA quantification, differential expression analysis and miRNA target prediction of RNA-Seq data Introduction nf-core/circrna is a best-practice analysis pipeline for the quantification, miRNA target prediction and differential expression analysis of circular RNAs in paired-end RNA sequencing data. The pipeline is built using Nextflow, a workflow tool to run tasks across…

Butterfly eyespots evolved via cooption of an ancestral gene-regulatory network that also patterns antennae, legs, and wings

Although the hypothesis of gene-regulatory network (GRN) cooption is a plausible model to explain the origin of morphological novelties (1), there has been limited empirical evidence to show that this mechanism led to the origin of any novel trait. Several hypotheses have been proposed for the origin of butterfly eyespots,…

Trying to trim last bp of several samples with BBduk at once

Hi, I am trying to use BBduk to trim back my 151bp sequences to 150bp. I tried to create a loop for this so I could do one entire pool at a time, but I do not…

bbduk can’t read file

Hi all, When trying to filter reads using bbduk, I get the following error message: maskMiddle was disabled because useShortKmers=true Exception in thread “main” java.lang.RuntimeException: Can’t read file ‘/home/bioinf/TrainingData/SRR6197336/SRR6197336_1.fastq’ at shared.Tools.testInputFiles(Tools.java:185) at jgi.BBDuk.<init>(BBDuk.java:912) at jgi.BBDuk.main(BBDuk.java:78) This is my code: ~/Downloads/bbmap/bbduk.sh in1=~/TrainingData/SRR6197336/SRR6197336_1.fastq in2=~/TrainingData/SRR6197336/SRR6197336_2.fastq out1=~/TrainingData/reads/bbduk/SRR6197336_1_bbduk.fastq out2=~/TrainingData/reads/bbduk/SRR6197336_2_bbduk.fastq ktrim=r…

Primers trimming from Illumina paired-end reads by BBDuk software

Hi, I’m dealing with Illumina amplicon data. I’d like to apply BBDuk to remove primers from the 5’ end of my forward and reverse reads, but have a couple of questions. Could you help me please? How may I indicate in…

ENHANCED GRAVITROPISM 2 encodes a STERILE ALPHA MOTIF–containing protein that controls root growth angle in barley and wheat

Significance To date, the potential of utilizing root traits in plant breeding remains largely untapped. In this study, we cloned and characterized the ENHANCED GRAVITROPISM2 (EGT2) gene of barley that encodes a STERILE ALPHA MOTIF domain–containing protein. We demonstrated that EGT2 is a key gene of root growth…
