Tag: BBDuk
Java class error message when using BBDuk
I am trying to run BBDuk to quality trim and filter my Illumina whole genome sequences. I have used other trimming scripts before and have not had a problem, although this is my first time preprocessing sequencing data from Quantseq samples. I…
Yes .. BBMap can do that!
NOTE: This collection was originally posted at SeqAnswers.com. Creating a copy here to preserve the information. Part I is available here: Yes .. BBMap can do that! – Part I : bbmap (aligner), bbduk (scan/trim), repair (fix PE reads) and reformat (format conversions). Part II is available here: Yes .. BBMap can…
Using metagenome assembly and binning to identify and mitigate contamination in a genome
Hi everyone, This may be a silly question, but I am interested if using metagenome assembly and binning is a valid method of determining if a sample contains a mixture of species. Similarly, can metagenomics be used to identify and remove contamination from a single genome? For some background, I…
Problematic fastq files…How can we trust them?
Hello fellas, A week ago I made another post regarding an error I was getting while trying to run BBDuk on a number of fastq files. In that case, there were lines missing the “+” character. After looking a…
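A record whose third line is not “+” can be located before BBDuk is run. The following is a minimal sketch in Python, assuming strict four-line FASTQ records. The names `open_fastq` and `find_bad_records` and the three checks are illustrative assumptions, not part of BBTools.

```python
import gzip
import io
from typing import Iterator, TextIO, Tuple


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def find_bad_records(handle: TextIO) -> Iterator[Tuple[int, str]]:
    """Yield (line number of record start, reason) for malformed 4-line records."""
    start = 1
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            return  # clean end of input
        if not lines[3]:
            yield start, "truncated record (fewer than 4 lines)"
            return
        header, seq, plus, qual = (line.rstrip("\r\n") for line in lines)
        if not header.startswith("@"):
            yield start, "header does not start with '@'"
        elif not plus.startswith("+"):
            yield start, "third line does not start with '+'"
        elif len(seq) != len(qual):
            yield start, "sequence and quality lengths differ"
        start += 4


if __name__ == "__main__":
    # Two records; the second one lacks its '+' separator line.
    demo = io.StringIO("@r1\nACGT\n+\nFFFF\n@r2\nACGT\nFFFF\nACGT\n")
    for line_no, reason in find_bad_records(demo):
        print(f"record starting at line {line_no}: {reason}")
```

Because the check is positional, a single missing line shifts every later record out of frame, so the first reported record is the one worth inspecting.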
Discrepancy in total number of bases in trimmed read1 and read2 files after BBDuk
Hi all, After performing adapter trimming with bbduk.sh, I found that the total number of bases in the read1 file differs from that in the read2 file according to the FastQC quality check. Below was the code…
BBDuk error: with these 4 lines….
Hello there, I am trying to trim some samples with BBDuk and I am getting this error: Error in /home/projects/Raw_data/sample1_L001_R2_001.fastq.gz , line 120314999, with these 4 lines: + FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFFF:,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF:FFFFFTTACTGAGCAAATAAATAGACTCTATATTGTCTCCG ATGGCATAAAAATGTGTTTGTGGAAAAGCAATCCTTAAATTGAGAAAACGTTTTATATTAGGGCCAATGATAGGATAAGCAAGTAATACATCTGTAGCA + FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFFF:,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF:FFFFFTTACTGAGCAAATAAATAGACTCTATATTGTCTCCG ATGGCATAAAAATGTGTTTGTGGAAAAGCAATCCTTAAATTGAGAAAACGTTTTATATTAGGGCCAATGATAGGATAAGCAAGTAATACATCTGTAGCA at stream.FASTQ.quadToRead_slow(FASTQ.java:708) at stream.FASTQ.toReadList(FASTQ.java:659) at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:107) at stream.FastqReadInputStream.nextList(FastqReadInputStream.java:93) at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:681) at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:657) Processing time: 1085.136 seconds. Input:…
Bulk RNAseq Salmon index building which transcriptome to use
Hi all, I am new to the platform. I was wondering what the common/best practice is regarding building a Salmon index for bulk RNAseq analysis of human cells. The tutorial for Salmon/Alevin is using the complete transcriptome from GENCODE (gencode.vM23.transcripts.fa.gz,…
Small RNA sequencing using Illumina 2 channel SBS: how to deal with Gs?
I’m working on a small RNA sequencing experiment (150 PE on NovaSeq 6000), and many reads look like this when the fragment size is smaller than 150 bp, with Gs completing the sequence up to 150:…
Viral positive and negative strand with paired sequencing and bowtie
You should expect the same number of forward and reverse strand reads because read 1 and read 2 are on opposite strands. What would be more interesting here is to first split the mapped file into r1 and r2, then split THOSE files into forward and reverse, then combine R1…
How to trim bases with bbduk.sh
Hi, I am using the following BBMap command to trim adapter sequences from the fastq files. bbduk.sh -Xmx1g \ in1=1_R1_001.fastq.gz \ in2=1_R2_001.fastq.gz \ out1=1_R1_001-trimmed.fastq.gz \ out2=1_R2_001-trimmed.fastq.gz \ literal=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \ qtrim=rl \ trimq=20 \ ktrim=r \ k=16 \ filterpolyg=5 \ tbo tpe Can anyone let…
MetaCC allows scalable and integrative analyses of both long-read and short-read metagenomic Hi-C data
Real metaHi-C datasets In this study, we leveraged several publicly available metagenomic Hi-C datasets, consisting of two short-read metaHi-C datasets and two long-read metaHi-C datasets. The specific sizes of raw datasets were shown in Supplementary Table 6. Two short-read metaHi-C datasets were generated from different microbial ecosystems, including human gut (BioProject:…
Integrating chromatin conformation information in a self-supervised learning model improves metagenome binning [PeerJ]
Introduction The rapid proliferation of high-throughput sequencing in metagenomics, combined with the advancement of scalable computational tools, has allowed scientists to digitally isolate tens of thousands of microbial genomes from large collections of metagenomic datasets (Nayfach et al., 2021). Although these genomes are only the tip of the iceberg of…
How to install bbmap on Ubuntu 20.04 (Focal Fossa)?
Quick installation of bbmap. Step 1: Update system: sudo apt-get update. Step 2: Install bbmap. After updating the OS, run the following command to install the package: sudo apt-get install bbmap. Package Details: Package: bbmap, Architecture: all, Version: 38.79+dfsg-1, Maintainer: Ubuntu Developers, Home page: sourceforge.net/projects/bbmap/…
PolyA and PolyG sequences in FastQC/MultiQC report
Hello, We sent our samples off for RNA exome sequencing. We normally do mRNA sequencing but our RNA was degraded so RNA exome sequencing was recommended. We used the TruSeq RNA Exome kit. I am encountering issues I haven’t experienced before when trimming reads. Below is the adapter content from…
Transient naive reprogramming corrects hiPS cells functionally and epigenetically
Cell culture All cell lines used and derived by different approaches in this study are listed in Supplementary Table 1. Detailed information about the experimental design, materials and reagents is presented in the Reporting Summary. Primary human adult dermal fibroblasts (HDFa) from three different female donors were obtained from Gibco…
Decreased left heart flow in fetal lambs causes left heart hypoplasia and pro-fibrotic tissue remodeling
Coil implantation in fetal lambs We have complied with all relevant ethical regulations for animal testing. All procedures followed the Canadian Council on Animal Care guidelines and were approved by the University of Western Ontario Council on Animal Care (protocol 2010-257). Time-dated pregnant Dorset × Rideau Arcott ewes (gestational age 76 days,…
barcodes not show up in overrepresented sequences in FASTQC
So, I have a Ribo-seq experiment with multiple samples and for each sample there are two barcodes, supplied by the sequencing lab, like this one: TACTCATA+GCCACAGG I just ran the FASTQC on that bugger and it didn’t pop up as…
ResR/McdR-regulated protein translation machinery contributes to drug resilience in Mycobacterium tuberculosis
Bacterial strains and culture conditions Escherichia coli strain DH5α (Thermo Fisher) was used for the propagation of plasmids, whereas E. coli BL21 DE3 (Novagen) was used for the expression and purification of ResR/McdR protein. Mtb Erdman was obtained from Dr. Ramandeep Singh at THSTI, India, and Mtb H37Rv mc2 790242…
BBDuk Guide – DOE Joint Genome Institute
“Duk” stands for Decontamination Using Kmers. BBDuk was made to combine many common data-quality-related trimming, filtering, and masking actions into a single high-performance tool. It is capable of quality-trimming or filtering, adapter-trimming, contaminant-filtering via kmer matching, sequence masking, GC-filtering, length filtering, entropy-filtering, format conversion, histogram generation, subsampling, quality-score recalibration, kmer…
Genomic screening of 16 UK native bat species through conservationist networks uncovers coronaviruses with zoonotic potential
Sample collection Sampling kits were sent out to various bat rehabilitators in the UK, as described previously [56], for the collection of faeces from bats. These faecal samples (0.02–1 g) were immediately stored in 5 ml of RNAlater solution to prevent degradation of RNA. The geographical locations and collection dates for all samples…
RCAC – Knowledge Base: Biocontainers: bbtools
bbtools Introduction BBTools is a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data. Docker hub: hub.docker.com/r/staphb/bbtools Home page: jgi.doe.gov/data-and-tools/software-tools/bbtools/ Versions 39.00 Commands Xcalcmem.sh a_sample_mt.sh addadapters.sh addssu.sh…
bbduk.sh trimmer with multiple input files
Hi all, I am using bbduk.sh and was wondering if there’s an efficient way to process multiple sets of reads with it? E.g. if you have read 1 as 4 separate files and read 2 as 4 separate files, typical mappers like bowtie2…
Genome sequencing and multifaceted taxonomic analysis of novel strains of violacein-producing bacteria and non-violacein-producing close relatives
Abstract Violacein is a water-insoluble violet pigment produced by various Gram-negative bacteria. The compound and the bacteria that produce it have been gaining attention due to the antimicrobial and proposed antitumour properties of violacein and the possibility that strains producing it may have broad industrial uses. Bacteria that produce violacein…
Answer: Help with understanding BBduk's behavior
> My input “reference” kmer has a “C” at the 9th position, while the reported matching kmers both have a “T” at that position.
The `maskmiddle` option is true by default; as a result, you see that base being “matched”. maskmiddle=t (mm) Treat the middle base of a kmer as…
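The effect described in this answer can be illustrated with a small Python sketch. It is a simplified model, not BBDuk's implementation: it ignores reverse complements, and the function names and the `width` parameter are hypothetical. Which bases are masked for even k may also differ between BBTools versions, so the choice of position here is an assumption.

```python
def mask_middle(kmer: str, width: int = 1) -> str:
    """Replace the central `width` bases of a k-mer with 'N' wildcards."""
    start = (len(kmer) - width) // 2
    return kmer[:start] + "N" * width + kmer[start + width:]


def matches_with_masked_middle(ref_kmer: str, read_kmer: str, width: int = 1) -> bool:
    """True if the two k-mers agree everywhere except the masked middle."""
    if len(ref_kmer) != len(read_kmer):
        return False
    return mask_middle(ref_kmer, width) == mask_middle(read_kmer, width)


if __name__ == "__main__":
    ref = "ACGTACGTCACGTACGTA"   # 18-mer with "C" at the 9th position
    read = "ACGTACGTTACGTACGTA"  # same 18-mer with "T" at the 9th position
    print(matches_with_masked_middle(ref, read))
```

Under this model, an 18-mer that differs from the reference only at the 9th position still counts as a match, which is consistent with the behaviour reported in the question. If exact matching is required, the option can be turned off with `maskmiddle=f`.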
Help with understanding BBduk's behavior
Hi All, I’m trying to use BBduk to find and filter exact 18-mer matches in a contaminant fasta file from a set of input reads. BBduk reports that matches exist, however when I examine the reads that are supposed to include one or more kmers, I cannot find the matching…
BBduk reading fastq from S3 directly
Is it possible? Hello to all, I am not from the bioinformatics field, but I have no issue running the bbduk trimming command 🙂 I was wondering, is it possible to load paired fastq reads directly from an S3 bucket? Even better, load…
bbduk command line options for EM-seq (NEB)
Hi, I would like to use bbduk to replicate the command line flags used by trim_galore for EM-Seq (NEB). The bismark user guide recommends this for trim_galore: --clip_R1 10 --clip_R2 10 --three_prime_clip_R1 10 --three_prime_clip_R2 10 Explanation of the trim_galore parameters: --clip_R1 <int> Instructs Trim Galore to remove <int> bp from…
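One possible mapping uses BBDuk's `ftl` (forcetrimleft) and `ftr2` (forcetrimright2) options, which remove a fixed number of bases from the left and right ends of every read. The Python sketch below only assembles the argument list. The helper name and the file names are hypothetical, and whether the result is identical to Trim Galore's output is an untested assumption, since the two tools may apply fixed clipping and adapter trimming in a different order.

```python
from typing import List


def bbduk_clip_args(in1: str, in2: str, out1: str, out2: str,
                    clip_5prime: int = 10, clip_3prime: int = 10) -> List[str]:
    """Build a bbduk.sh argument list that clips fixed lengths from both read ends."""
    args = ["bbduk.sh", f"in1={in1}", f"in2={in2}", f"out1={out1}", f"out2={out2}"]
    if clip_5prime > 0:
        # ftl=N trims the bases left of 0-based position N, i.e. the first N bases.
        args.append(f"ftl={clip_5prime}")
    if clip_3prime > 0:
        # ftr2=N trims N bases from the right end of each read.
        args.append(f"ftr2={clip_3prime}")
    return args


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(bbduk_clip_args("s_R1.fastq.gz", "s_R2.fastq.gz",
                                   "s_R1.clip.fastq.gz", "s_R2.clip.fastq.gz")))
```

In paired mode these options apply to both reads, so a single pair of values covers the four Trim Galore flags when R1 and R2 are clipped by the same amount.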
BBduk log and stats appear to be inconsistent
Hi All, I’m using BBduk to filter out reads where there is a kmer match to a specific set of contaminant sequences. Here is an example command: bbduk.sh \ in1=${FQ1} \ in2=${FQ2} \ ref=${contaminant_fasta} \ k=21 \ hdist=1 \ stats=${OUTPUT}/${SAMPLE}_stats.txt \…
SituSeq: an offline protocol for rapid and remote Nanopore 16S rRNA amplicon sequence analysis
For the initial test of this protocol, DNA extraction, PCR, Nanopore library preparation, Nanopore sequencing, and subsequent data analysis were conducted at sea aboard the R/V Atlantic Condor in August 2021 [26]. This investigational effort resulted in the sequencing and analysis of deep sea sediment samples within hours of their…
error bbduk
Can anyone help me? I tried to run the command and it gives this error; how can I fix it? java -ea -Xmx-78m -Xms-78m -cp /home/qiime2/Documents/bbmap/current/ jgi.BBDuk in= /Home/Desktop/Documents/limpeza/S2500_1.fastq out= /Home/Desktop/Documents/limpeza/S2500_1.fastq.clean.fastq trimq=20 minlen=75 Invalid maximum heap size: -Xmx-78m Error: Could not create the Java Virtual Machine. Error: A…
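The JVM rejects `-Xmx-78m` because a heap size must be a positive number followed by an optional unit suffix. The Python sketch below checks only that syntax. `valid_heap_flag` is a hypothetical helper, and the regular expression does not enforce the JVM's minimum-size rules.

```python
import re

# -Xmx or -Xms, a positive integer, then an optional k/m/g/t unit suffix.
_HEAP_FLAG = re.compile(r"-Xm[sx]([1-9]\d*)([kKmMgGtT]?)")


def valid_heap_flag(flag: str) -> bool:
    """Return True if `flag` looks like a syntactically valid JVM heap option."""
    return _HEAP_FLAG.fullmatch(flag) is not None


if __name__ == "__main__":
    for flag in ("-Xmx-78m", "-Xmx1g"):
        print(flag, valid_heap_flag(flag))
```

A negative value like the one in the error usually means the heap size was computed rather than typed. Passing an explicit value, as in the `bbduk.sh -Xmx1g` command quoted elsewhere on this page, avoids relying on that computation.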
ATAC-seq fragment size distribution – huge spike at 150 bp
Hi, I am working on some ATAC-seq data. We have performed paired-end sequencing with a read length of 150 bp on a total of 24 samples (8 conditions in triplicate). To begin with, I will just briefly describe some of the main analysis steps. I have trimmed the read to remove…
How To Install bbmap on Ubuntu 20.04
In this tutorial we learn how to install bbmap on Ubuntu 20.04. bbmap is a short read aligner and other bioinformatic tools. What is bbmap? bbmap is: BBMap: Short read aligner for DNA and RNA-seq data. Capable…
rRNA decontamination of RNA-Seq reads (tool choice, introns etc)
Greetings! I need some help with performing my rRNA decontamination step properly, which is part of pre-processing pipeline for my Illumina RNA-Seq reads, before mapping to the reference genome (a plant species). SOME RELEVANT LINKS AND MY CRUDE INFERENCES: Download Rrna Sequences – SILVA database is useful for human research,…
Determinants of associations between codon and amino acid usage patterns of microbial communities and the environment inferred based on a cross-biome metagenomic analysis
Data collection Metagenomic project information was collected from the MGnify metagenomic database [31]. Currently (September 2021), microbiome data (sequence, taxonomic, and functional information, etc.) of 325,323 environmental samples can be found in this database. Often, microbes from similar ecological communities have been studied by different groups at different times and locations….
Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment
Sample preparation We ordered the GIAB samples from the Coriell Institute (NA24385, NIST ID HG002; NA24149, NIST-ID HG003 and NA24143, NIST-ID HG004). DNA concentration was measured by Qubit. The library was constructed according to Illumina TruSeq DNA PCR Free Library Prep protocol HT (Illumina Inc., San Diego, CA, USA) for…
BBmap bbduk.sh for filtering reads
I’m looking to filter reads that contain a stretch of A’s, I found these posts looking for polyA tails, meaning this should work all the same (Identify RNA-seq reads containing polyA sequence, Identifying RNA-seq reads containing polyA stretch). However, I cannot get it to work. Given just these two reads,…
Should I trim adapter sequences and filter by phred score, before alignment by salmon? : bioinformatics
First, trimming adapters is definitely necessary as they are essentially a form of contamination. For quality trimming and filtering I would highly recommend reading the following: Trimming of sequence reads alters RNA-Seq gene expression estimates Essentially they show that aggressive trimming is a problem. To quote from the Conclusions: The…
nf-core/circrna
circRNA quantification, differential expression analysis and miRNA target prediction of RNA-Seq data Introduction nf-core/circrna is a best-practice analysis pipeline for the quantification, miRNA target prediction and differential expression analysis of circular RNAs in paired-end RNA sequencing data. The pipeline is built using Nextflow, a workflow tool to run tasks across…
Butterfly eyespots evolved via cooption of an ancestral gene-regulatory network that also patterns antennae, legs, and wings
Although the hypothesis of gene-regulatory network (GRN) cooption is a plausible model to explain the origin of morphological novelties (1), there has been limited empirical evidence to show that this mechanism led to the origin of any novel trait. Several hypotheses have been proposed for the origin of butterfly eyespots,…
Trying to trim last bp of several samples with BBduk at once
Hi, I am trying to use BBduk to trim back my 151bp sequences to 150bp. I tried to create a loop for this so I could do one entire pool at a time, but I do not…
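A loop of this kind can be sketched in Python by building one bbduk.sh command per sample. The sketch assumes paired files named with `_R1` and `_R2`, which is a hypothetical naming scheme, and uses BBDuk's `ftr` (forcetrimright) option, which trims everything to the right of a 0-based position, so `ftr=149` keeps the first 150 bases.

```python
from pathlib import Path
from typing import Iterable, List


def trim_to_length_commands(r1_files: Iterable[str], out_dir: str,
                            keep: int = 150) -> List[List[str]]:
    """Build one bbduk.sh command per R1/R2 pair, keeping the first `keep` bases."""
    commands = []
    for r1 in sorted(r1_files):
        r1_path = Path(r1)
        # Assumed naming scheme: the mate file differs only by _R1 -> _R2.
        r2_path = r1_path.with_name(r1_path.name.replace("_R1", "_R2", 1))
        commands.append([
            "bbduk.sh",
            f"in1={r1_path}",
            f"in2={r2_path}",
            f"out1={Path(out_dir) / r1_path.name}",
            f"out2={Path(out_dir) / r2_path.name}",
            f"ftr={keep - 1}",  # 0-based position of the last base to keep
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical sample names, for illustration only.
    for cmd in trim_to_length_commands(["pool/s2_R1.fastq.gz", "pool/s1_R1.fastq.gz"], "trimmed"):
        print(" ".join(cmd))
```

Each command list can then be passed to `subprocess.run(cmd, check=True)` so that the loop stops at the first sample that fails.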
bbduk can’t read file
Hi all, When trying to filter reads using bbduk, I get the following error message: maskMiddle was disabled because useShortKmers=true Exception in thread “main” java.lang.RuntimeException: Can’t read file ‘/home/bioinf/TrainingData/SRR6197336/SRR6197336_1.fastq’ at shared.Tools.testInputFiles(Tools.java:185) at jgi.BBDuk.<init>(BBDuk.java:912) at jgi.BBDuk.main(BBDuk.java:78) This is my code: ~/Downloads/bbmap/bbduk.sh in1=~/TrainingData/SRR6197336/SRR6197336_1.fastq in2=~/TrainingData/SRR6197336/SRR6197336_2.fastq out1=~/TrainingData/reads/bbduk/SRR6197336_1_bbduk.fastq out2=~/TrainingData/reads/bbduk/SRR6197336_2_bbduk.fastq ktrim=r…
Primers trimming from Illumina paired-end reads by BBDuk software
Hi, I’m dealing with Illumina amplicon data. I’d like to apply BBDuk to remove primers from the 5’ end of my forward and reverse reads, but have a couple of questions. Could you help me please? How may I indicate in…
ENHANCED GRAVITROPISM 2 encodes a STERILE ALPHA MOTIF–containing protein that controls root growth angle in barley and wheat
Significance To date, the potential of utilizing root traits in plant breeding remains largely untapped. In this study, we cloned and characterized the ENHANCED GRAVITROPISM2 (EGT2) gene of barley that encodes a STERILE ALPHA MOTIF domain–containing protein. We demonstrated that EGT2 is a key gene of root growth…