Tag: fastq

Detection of candidate gene LsACOS5 and development of InDel marker for male sterility by ddRAD-seq and resequencing analysis in lettuce

Ryder, E. J. Lettuce, Endive and Chicory (CABI Publishing, 1999). Seki, K. et al. A CIN-like TCP transcription factor (LsTCP4) having retrotransposon insertion associates with a shift from Salinas type to Empire type in crisphead lettuce (Lactuca sativa L.). Hortic. Res. 7, 1–14 (2020). Odland,…

[W::bgzf_read_block] EOF marker is absent in BBMAP

Hello, I’m asking about an issue encountered in bbmap. I was using bbmap to remove host contaminants from my microbiome data. The command is as simple as below (ref folder already generated in the last step): bbmap.sh -Xmx42g in=R1.fastq.gz in2=R2.fastq.gz outu=cleaned.interleaved.fastq.gz threads=12 overwrite=t unpigz=t…

Strange Per base sequence content of fastqc

Hi, all! I downloaded the fastq.gz files of GSE162708 from ENA, which has only 2 files for each sample (usually scRNA-seq has 3 files: I1, R1 and R2). Then I ran fastp as follows. Then I got the QC report, but I can’t understand why Per base sequence content of…

tReasure: R-based GUI package analyzing tRNA expression profiles from small RNA sequencing data | BMC Bioinformatics

tReasure (tRNA Expression Analysis Software Utilizing R for Easy use) is a graphical user interface (GUI) tool for the analysis of tRNA expression profiles from deep-sequencing data of small RNAs (small RNA-seq) using R packages. The whole analysis workflow, including the uploading of FASTQ files of small RNA-seq, quantification of…

FastQ_7 April 2022(1) – Copy.pptx – What is the FASTA format? The FASTA format is the “workhorse” of bioinformatics. It is used to represent sequence

The FASTA format is not “officially” defined – even though it carries the majority of data information on living systems. Its origins go back to a software tool called Fasta written by David Lipman (a scientist that later became, and still is, the director of NCBI) and William R. Pearson of the University of Virginia. The tool itself has (to some…

Reference-based alignment using MUSKET

I’m running MUSKET on my dataset trimmed_data.tar.gz using 1000 threads, 2000 threads, and 4000 threads on an HPC. I’ve been unable to obtain any results because the software seems to be running for a long time. ./../musket-1.1/musket -k 90 600000000 -p 1000 -zlib 9 -ino…

(ERR): bowtie2-align exited with value 13

I am trying to run bowtie2, but the following errors occur every time: bowtie2 --very-fast-local -x bowtie -q -1 R1.fastq -2 R2.fastq -s aligned.sam Saw ASCII character 10 but expected 33-based Phred qual. terminate called after throwing an instance of ‘int’ Aborted…
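
ASCII character 10 is the newline character, so one plausible cause of this message is a record whose quality line is missing or shorter than its sequence line. Below is a minimal Python sketch, assuming plain four-line FASTQ records (no wrapped sequences), that flags such records. The helper name find_bad_records is hypothetical and is not part of bowtie2.

```python
def find_bad_records(lines):
    """Return 1-based numbers of records whose four-line layout looks wrong."""
    stripped = [line.rstrip("\r\n") for line in lines]
    bad = []
    for number, start in enumerate(range(0, len(stripped), 4), start=1):
        record = stripped[start:start + 4]
        ok = (
            len(record) == 4
            and record[0].startswith("@")              # header line
            and record[2].startswith("+")              # separator line
            and len(record[1]) == len(record[3])       # sequence vs. quality length
        )
        if not ok:
            bad.append(number)
    return bad


# Hypothetical in-memory example: the second record has a truncated quality line.
example = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "II"]
print(find_bad_records(example))  # [2]
```

For a real file, the same function could be called on an open text handle, for example find_bad_records(open("R1.fastq")). If a line is missing altogether, every later record will also be reported, so the first number returned is the one worth inspecting.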

Postdoc / Research Scientist in Bioinformatics and Computational Genomics

Job Description Are you a computer geek with a strong interest in genomics? Do you want to use your computational skills to solve human diseases? At the Department of Neurology at Harvard Medical School and Brigham & Women’s Hospital, we have two vacant positions: postdoctoral fellow and research scientist in…

Qiime2 Exclude Seqs with FASTQ as query data.

Hello, I am working with FASTQ files and I want to filter them based on the alignment with reference sequences in FASTA format. I decided to use QIIME2 for this. So I imported both FASTA and FASTQ files to the required…

FastQC per base sequence content

I’m running FastQC on some paired-end fastq files. I have a warning on per-base sequence content, as the first 5 to 6 bases show significant bias towards T and G, as shown below. I was wondering what the sequence in the first 5 or…

Validate RNAseq salmon quantification pipeline

Hi, I’ve written a pipeline to perform quantification from RNAseq data with salmon. I’m trying to find a way to evaluate the quality of my results. I was thinking of running the pipeline on an available public dataset and comparing my output with another analysis….

can`t find a path for to file

I simply need to run Trimmomatic, but it doesn't see the input files. Maybe you know how to deal with it? #creating variables INPUT_DIR="path/folderinput" OUTPUT_DIR="path/folderoutput" APPENDIX=".fastq.gz" APPENDIX1="_R1.fastq.gz" APPENDIX2="_R2.fastq.gz" TRIMMOMATIC="java -jar /home/path/trimmomatic-0.36.jar" #creating a loop for i in $INPUT_DIR/*$APPENDIX1 do FORWARD=$(basename…

Mapping back 3 sets of reads/sample with minimap2

I used FaQC to QC my raw fastqs before assembling. That program (and perhaps others) outputs properly paired forward and reverse fastqs, as well as an unpaired fastq file for each sample. I used all 3 for each single-sample assembly. Since minimap2 only allows for 2 query files,…

Sam file is not written

Dear all, it writes the following in the log file: [08-02 01:26:25] Running Step 2: BWA … bwa_wrap /work/pathology/s206442/dbet_project/hg19/hg19.fa Output3/out_1.valid.fastq 6 Output3/out_1.valid.sam 0 Running BWA on trimmed reads … bwa mem -t 6 /work/pathology/s206442/dbet_project/hg19/hg19.fa Output3/out_1.valid.fastq | samtools view -h -F 2048 - > Output3/out_1.valid.sam However, the sam file size is…

Mapped reference id is not an id of the genome file genome_nowhitespace.fa

Hi everyone, I’m trying to run the nf-co.re/smrnaseq pipeline and I’m having a problem with mirdeep2. Command: nextflow run nf-core/smrnaseq -profile ijcluster --input /home/794_both.fastq.gz --outdir /home/results --genome GRCh38 --protocol qiaseq --mature mirbase.org/ftp/CURRENT/mature.fa.gz --hairpin mirbase.org/ftp/CURRENT/hairpin.fa.gz Error message: Command…

Separate exogenous from endogenous transcripts using Salmon RNAseq DTU

Dear friends, We are trying to use Salmon for DTU analysis. We want to separate exogenous from endogenous transcripts by following this post www.biostars.org/p/443701/ and this paper f1000research.com/articles/7-952 We are focusing on a gene called ASCL1 (endo-ASCL1). We transduced cells with lentiviral vector containing ASCL1 ORF only (Lenti-ASCL1). There should…

Phylogenomic analysis of Syngnathidae reveals novel relationships, origins of endemic diversity and variable diversification rates | BMC Biology

Stölting KN, Wilson AB. Male pregnancy in seahorses and pipefish: beyond the mammalian model. Bioessays. 2007;29:884–96. Whittington CM, Friesen CR. The evolution and physiology of male pregnancy in syngnathid fishes. Biol Rev Camb Philos Soc. 2020;95:1252–72. Rosenqvist G, Berglund A. Sexual signals and mating…

BioInformatics Product Manager at Helix (remote)

You + Helix Helix is a place where innovators and doers gather in order to drive significant progress in population genomics. We have come together to work at the intersection of clinical care, research, and genomics.   If you’re excited by the idea of making a meaningful impact and joining a…

Trimmomatic/ linux system

Hi all, I am trying to remove adapters and clean my RNA-seq .gz files using Trimmomatic, loaded on a Linux system (supercomputer server). I am following the steps for paired-end reads explained in the manual (www.usadellab.org/cms/?page=trimmomatic): java -jar trimmomatic-0.39.jar PE input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True LEADING:3…

Sequence Duplication Levels failed FastQC Report

Hi all, I’m checking the quality of my RNA-Seq data with FastQC, and all my fastq files failed on “Per base sequence content” and “Sequence Duplication Levels”, besides a warning on “Overrepresented sequences” only for read 1 files (it’s paired-end; the sequences match between samples). Below is…

Why did I achieve shorter than initial reads subset after aligned reads extraction.

Hello dear colleagues! I have recently faced a problem. I have worked with long WGS reads. First I filtered the longest subset of reads and aligned them to the custom sequence with several structural variants…

How to check Fasta file ASCII characters and fix encoding errors?

I tried building a diamond database but got this error: Error: Error reading input stream at line 180825: Invalid character (ASCII 0) in sequence How can I fix it? Is there a tool that checks for this and…
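
One way to locate such characters before handing a file to another tool is to scan it byte by byte. The following is a minimal Python sketch; the helper name find_invalid_bytes is hypothetical, and the allowed set (printable ASCII plus tab, LF and CR) is an assumption made for illustration, not DIAMOND's own validation rule.

```python
def find_invalid_bytes(data):
    """Return (line_number, byte_value) for every byte outside the allowed set."""
    # Assumption: printable ASCII plus tab, LF and CR are acceptable.
    allowed = set(range(32, 127)) | {9, 10, 13}
    hits = []
    for line_number, line in enumerate(data.split(b"\n"), start=1):
        for value in line:          # iterating bytes yields integers
            if value not in allowed:
                hits.append((line_number, value))
    return hits


# Hypothetical example: a NUL byte (ASCII 0) inside the sequence on line 2.
sample = b">seq1\nAC\x00GT\n"
print(find_invalid_bytes(sample))  # [(2, 0)]
```

For a file on disk, the contents could be read in binary mode, for example find_invalid_bytes(open("proteins.fasta", "rb").read()), where the file name is a placeholder. This reads the whole file into memory, so a very large file may call for a line-by-line variant.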

Mitogenome of a stink worm (Annelida: Travisiidae) includes degenerate group II intron that is also found in five congeneric species

Tan, M. H. et al. Comparative mitogenomics of the Decapoda reveals evolutionary heterogeneity in architecture and composition. Sci. Rep. 9, 1–16 (2019). Zhang, Y. et al. Phylogeny, evolution and mitochondrial gene order rearrangement in scale worms (Aphroditiformia, Annelida). Mol. Phylogenet. Evol. 125, 220–231 (2018)…

Feature count is very low using htseq-count

Hello all, I ran bbmap on my paired RNA-seq data using the following command: bbmap.sh in1=J2_R1.fastq in2=J2_R2.fastq out=output_J2.sam ref=im4.fasta nodisk The header of the generated SAM file is: @HD VN:1.4 SO:unsorted @SQ SN:k141_1006 LN:2503 @SQ SN:k141_5512 LN:5393 @SQ SN:k141_4772 LN:4387 @SQ SN:k141_3267 LN:4531…

Minimap2 options for Nanopore cDNA direct seq

Hello, I’m working with ONT RNA-seq data, and I used cDNA direct sequencing. I want to look for long deletions in mRNAs that are not spliced. For this, I want to use the splice option of…

Fastp file merge append | Develop Paper

Interpretation of fastq file format: www.jianshu.com/p/39115d21ee17 Sometimes, the sequencing results of a species will return two double ended fastps. r1.fq.gz l1.fq.gz r2.fq.gz l2.fq.gz The content of sequencing data is actually one piece, but it is divided into two parts during transmission. When we use it, we are used to merging it into a double ended…

BTG2 gene predicts poor outcome in PT-DLBCL

Introduction Primary testicular diffuse large B-cell lymphoma (PT-DLBCL) is a rare and aggressive form of mature B-cell lymphoma.1–3 PT-DLBCL was the most common type of testicular tumor in men aged over 60 and characterized by painless uni- or bilateral testicular masses with infrequent constitutional symptoms.4–6 PT-DLBCL shows significant extranodal tropism,…

High-Throughput Transcriptome Analysis for Investigating Host-Pathogen Interactions

The protocol presented here describes a complete pipeline to analyze RNA-sequencing transcriptome data from raw reads to functional analysis, including quality control and preprocessing steps to advanced statistical analytical approaches. Welcome to the protocol of high-throughput transcriptome analysis for investigating host-pathogen interactions. This protocol is divided in the following steps….

BBTools – BioGrids Consortium – Supported Software

BBTools Description: a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data. BBTools can handle common sequencing file formats such as fastq, fasta, sam, scarf, fasta+qual, compressed or raw, with autodetection of quality encoding and interleaving. Installation: Use the following command to…

sorting – indexing sorted alignment file with samtools index gives “Exec format error”

I am struggling with samtools index. I already did the alignment using “bwa mem reference.fa seq.fastq > alg.sam”. The resulting sam file was converted to bam format using “samtools view -S -h -b alg.sam > alg.bam”. Next, the files were sorted by using “sort -h alg.bam >sorted.bam”. And now we…

METASnake: a Snakemake workflow to facilitate…

Introduction As sequencing technology has become cheaper and more readily accessible, the need for the increased computational capacity to process these data has become apparent. In particular, high-throughput sequencing has been particularly useful when applied to the field of metagenomics. Substantial effort has been devoted to developing software and computational…

bedtools sample with fastq input and fewer input records than requested

I’m using bedtools sample to sample reads from fastq files. I’d like to submit two feature requests: If the number of requested records is larger than the input I get ERROR: Input file has fewer records than the requested number of output records. I guess this is intentional and not…

Extracellular circulating miRNAs as stress-related signature to search and rescue dogs

Study approval was provided by the Research Ethics Committee of the University of Perugia (report n.2018-21 of 11/12/2018) according to Italian Ministry of Health legislation18. All methods were carried out following relevant guidelines and regulations and the study was carried out in compliance with the ARRIVE guidelines. Informed consent is…

Per base sequence quality – fastqc

Hi everyone, I am new to bioinformatics and am asking a very basic question here. I have paired-end fastq data, I ran FastQC, and in the per base sequence quality plot a few reads are in the red region, and there is no adapter and…

Genomic variation from an extinct species is retained in the extant radiation following speciation reversal

Vamosi, J. C., Magallon, S., Mayrose, I., Otto, S. P. & Sauquet, H. Macroevolutionary patterns of flowering plant speciation and extinction. Annu. Rev. Plant Biol. 69, 685–706 (2018). Rhymer, J. M. & Simberloff, D. Extinction by hybridization and introgression. Annu. Rev. Ecol. Syst. 27, 83–109 (1996)….

Analyzing and slicing FASTQ file entries using Python

I have the code pasted below for running on FASTQ file entries in order to compare specific parts and remove the redundancy of the same sequences (based on the miRNA + umi_seq combination). I save the entry IDs and then make…

nf-core/circrna

circRNA quantification, differential expression analysis and miRNA target prediction of RNA-Seq data Introduction nf-core/circrna is a best-practice analysis pipeline for the quantification, miRNA target prediction and differential expression analysis of circular RNAs in paired-end RNA sequencing data. The pipeline is built using Nextflow, a workflow tool to run tasks across…

Using AnnoTree to Get More Assignments, Faster, in DIAMOND+MEGAN Microbiome Analysis

INTRODUCTION Next-generation sequencing (NGS) has revolutionized many areas of biological research (1, 2), providing ever-more data at an ever-decreasing cost. One such area is microbiome research, the study of microbes in their theater of activity using metagenomic sequencing (3). Here, deep short-read sequencing, and improving performance of long-read sequencing, are…

Kallisto mapping paired end

Hello everyone, I am new to bioinformatics and I am trying to use kallisto to map paired-end data. However, I got an error when running the command. Does anyone know what I did wrong here? Thank you! Here is my command: kallisto…

FastQC for paired end data

Hi, I have 36 fastq files of paired-end RNA-seq, so I was wondering if anyone knows how to run FastQC on paired-end data, and how it differs from FastQC on single-end data? I have done it with single-end data before but…

Processing two lists of files with snakemake

I want to use snakemake to do bowtie2 mapping of split read files to a reference genome, and I’d like that rule to be integrated in the general workflow. For that purpose, I first defined a rule to create a bowtie index: rule build_bowtie_index: input: referenceGenomeFasta output: expand("{name}.{index}.bt2", index=range(1,5), name…

RNA-Seq Data Analysis Software – Isogen Lifescience

BlueBee Genomics The BlueBee platform is a production-ready, robust infrastructure that is easy to use for any researcher. It can be used for analysing data from QuantSeq, CORALL, and SLAMseq experiments. There is no prior bioinformatic experience required. Each purchased QuantSeq and CORALL kit includes a code for free data…

Find Transposon Element insertions using long reads (nanopore), by alignment directly. (minimap2)

find_te_ins is designed to find Transposon Element (TE) insertions using long reads (nanopore), by alignment directly. (minimap2) Install: $ git clone github.com/bakerwm/find_te_ins.git $ cd find_te_ins Change the following variables according to your setup: genome_fa and te_fa in line-10 and line-11; $ bash run_pipe.sh run_pipe.sh Prerequisite: minimap2 – 2.17-r974-dirty, align long…

Cell Strain-Derived Induced Pluripotent Stem Cells as an Isogenic Approach To Investigate Age-Related Host Response to Flaviviral Infection

INTRODUCTION Dengue is the most common mosquito-borne viral disease globally (1). This acute disease, which can be life-threatening, is caused by four different dengue viruses (DENVs) (DENV-1, DENV-2, DENV-3, and DENV-4). An estimated 390 million people are infected with these DENVs annually (2), and populations throughout the tropics face frequent…

Error in Rsubread featureCounts

Hi there, Excellent package! I am using it to do RNA-seq. But I encountered a small problem when using featureCounts(). The code is as follows: featureCounts( "A1.raw_1.fastq.gz.subjunc.BAM", annot.inbuilt = NULL, annot.ext = "GCF_015227675.2_mRatBN7.2_genomic.gtf", isGTFAnnotationFile=TRUE, isPairedEnd=TRUE, nthreads = 8 ) And it returns this: ========== _____ _ _ ____ _____ ______…

Postdoctoral position in bioinformatics – focused on single-cell immune transcriptomics – Karolinska Institute – job portal

Postdoctoral position in bioinformatics – focused on single-cell immune transcriptomics Login and apply Do you want to contribute to improving human health? We are looking for an ambitious postdoctoral fellow with solid genome-wide bioinformatics and computational biology skills to join our highly accomplished team. We offer a stimulating environment in…

Merging compressed fastq files based on a conditions defined in a csv file

Hello everybody, I have a somewhat different question about a similar topic addressed in another post. I tried Paul’s bash script from that post (fastq_lane_merging.sh), adapting it to my filename organization: #!/bin/bash for i in $(find ./ -type f -name "*.fastq.gz" | while read F; do basename…

Mapping to multiple references using bbmap

So my question comes in two parts: First of all is what I’m trying to do within reason given the tools I am using? I am investigating the shuffling effects of a recombinase on a known reporter sequence which subsequently generates libraries of unique sequences. By simulating all of the…

bwa , 2 files fastq to 1 sam

I have this problem, please help me. I’m trying it from Mac OS Catalina. I am creating a SAM file from 2 fastq files using bwa. I apply the following command: bwa mem -t 2 GRCh38.primary_assembly.genome.fa.gz V350019555_L03_B5GHUMqcnrRAABA-556_1.fq.gz V350019555_L03_B5GHUMqcnrRAABA-556_2.fq.gz > V350019555_L03_B5GHUMqcnrRAABA-556.sam…

SeqIO object get cleared away after being accessed

I’m using Biopython to parse a fastq file, and I found that the SeqIO object gets cleared away once I access it. from Bio import SeqIO record_fastqIO = SeqIO.parse('SRR835775_1.first1000.fastq','fastq') for record in record_fastqIO: print(record.id) This script works perfectly. But if I add one line to the script: from Bio import…
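
Bio.SeqIO.parse returns an iterator, and an iterator can be consumed only once, which would explain an object that looks empty on second access. The sketch below mimics that single-pass behaviour with a hand-written generator rather than Biopython itself. The helper iter_fastq_ids and the in-memory data are hypothetical, and four-line FASTQ records are assumed.

```python
import io


def iter_fastq_ids(handle):
    """Yield record IDs from a FASTQ handle, assuming four lines per record."""
    for index, line in enumerate(handle):
        if index % 4 == 0:                 # header line of each record
            yield line[1:].split()[0]      # drop the leading "@"


data = "@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n"
records = iter_fastq_ids(io.StringIO(data))

first_pass = list(records)    # consumes the iterator
second_pass = list(records)   # nothing is left to yield

print(first_pass)    # ['r1', 'r2']
print(second_pass)   # []
```

If the records are needed more than once, a common approach is to materialize them a single time, for example records = list(SeqIO.parse(path, "fastq")), at the cost of holding every record in memory.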

identify and remove adapter sequence

Hi all, I am trying to identify the adapter sequences of my ATAC-seq data. The way I tried to achieve this was to send the fastq file to FastQC, hoping the sequence would be picked up and shown in the report. In the report, there…

Petabase-scale sequence alignment catalyses viral discovery

Serratus alignment architecture Serratus (v0.3.0) (github.com/ababaian/serratus) is an open-source cloud-infrastructure designed for ultra-high-throughput sequence alignment against a query sequence or pangenome (Extended Data Fig. 1). Serratus compute costs are dependent on search parameters (expanded discussion available: github.com/ababaian/serratus/wiki/pangenome_design). The nucleotide vertebrate viral pangenome search (bowtie2, database size: 79.8 MB) reached processing rates…

Samtools flagstat confusing result of a merged bam file

Hi, I am a bioinformatics student and I am struggling with an issue. I had paired-end fastq files for one sample with some low-quality bases at the end and adapter contamination, so I trimmed my reads with Trimmomatic. It gave me 4 files that I used for…

R and sra toolkit – odd system() behavior ( R, System )

Problem : ( Scroll to solution ) In order to extract some fastq data from NCBI’s sequence read archive I’ve downloaded and installed the sra toolkit for Windows. In order to test if it is setup correctly, I opened cmd, navigated to the directory and typed in the command fasterq-dump…

The role of ATXR6 expression in modulating genome stability and transposable element repression in Arabidopsis

Significance The plant-specific H3K27me1 methyltransferases ATXR5 and ATXR6 play integral roles connecting epigenetic silencing with genomic stability. However, how H3K27me1 relates to these processes is poorly understood. In this study, we performed a comprehensive transcriptome analysis of tissue- and ploidy-specific expression in a hypomorphic atxr5/6 mutant and revealed that the…

Any alternatives to BBMap’s clumpify.sh program to optimize gzip compression?

I’ve had some difficulties implementing this in pipelines because it randomly fails sometimes. Are there any other programs that can be used in its stead?

ChaoXianSen/TrimGalore – Giters

Trim Galore is a wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data. Installation: Trim Galore is a Perl wrapper around two tools: Cutadapt and FastQC. To use, ensure that these two pieces of software are available…

Mle Application With Gekko In Python

The true power of the state space model is to allow the creation and estimation of custom models. This notebook shows various statespace models that subclass sm. That means your MAGeCK python module is installed in /home/john/.pyenv/versions/2.7.13/lib/python2.7/site-packages. I use conda to install the latest version of. This two-volume set Diseases and Pathology…

[lh3/minimap2] Memory leak when using Python and threads

The program align.py uses mappy to align reads in Python using multiple worker threads. After loading the index the memory usage jumps up quickly to >20Gb and then continues to climb steadily through 40Gb and beyond. This issue was first discovered in bonito and isolated to mappy. The data flow…

Bwa on multiple processor

Hi guys, when I try to run bwa mem on multiple processors, I get the following error: > mpirun -np 16 bwa mem hg19-agilent.fasta R1.fastq R2.fastq | samtools sort -o aln.bam [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read…

python – Missing input files after defining them in function

I am trying to do QC on RNAseq data that is tarballed. I am using Snakemake as a workflow manager and am aware that Snakemake does not like one-to-many rules. I thought defining a checkpoint would fix the problem, but when I run the script I get this error message…

Aligning multiple single and paired-end reads from multiple files (lanes)

Hello, I am new to bioinformatics and looking for some help. I have 27 files from an Illumina output. There are 4 paired-end and 23 single-read files. I am trying to align them using Rsubread in…

RedChIP identifies noncoding RNAs associated with genomic sites occupied by Polycomb and CTCF proteins

Abstract Nuclear noncoding RNAs (ncRNAs) are key regulators of gene expression and chromatin organization. The progress in studying nuclear ncRNAs depends on the ability to identify the genome-wide spectrum of contacts of ncRNAs with chromatin. To address this question, a panel of RNA–DNA proximity ligation techniques has been developed. However,…

tranfering sam file easy and fast way

Hi everyone, I tried to align my fastq files with hisat2 but could not finish because my computer has 4 GB of RAM and I got the error "Killed". So I ran the process on my friend's computer, but now I should solve…

Alignment report

Hi guys, I aligned R1 and R2 fastq files to the reference genome using bwa mem and got a BAM file. Now I want to check whether the alignment was done correctly, along with the alignment percentage, coverage, etc. I ran the following command: bwa mem hg19.fasta R1.fastq R2.fastq | samtools…

how to align paired and unpaired fastq files of a sample using STAR?

Hi all, I’m new to using the STAR aligner. I have PE sequencing fastq files with forward and reverse paired reads and forward and reverse unpaired reads (4 files). In the manual of this tool, it seems…

sequence alignment – Help with MinION sequencing data species identification

Hi I’m new to bioinformatics and have just completed my first run on the MinION (long read sequencing Oxford Nanopore Technologies). I was hoping someone could direct me towards R packages, workflow, tutorials or guides that will help me identify species that are present in my sample mainly for fungi…

best platform to analysis chip-seq data using R

Hello guys, I am wondering if you could share your experience with the best platform for analyzing ChIP-seq data from FASTQ files? I found several packages, but I am wondering which one is the most straightforward…

error reading fastq-files with readDNAStringset

I am trying to read a FASTQ file with readDNAStringSet and am having quite some trouble doing so. I need the names as well as the quality scores. Right now I am using: readDNAStringSet(myFastqFile, format="fastq", use.names=TRUE, with.qualities=TRUE) But here I get the error: "@" expected at beginning of line 1 I…

sequence alignment – MarkDuplicatesSpark failing with cryptic error message. MarkDuplicates succeeds

I have been trying to follow the GATK Best Practice Workflow for ‘Data pre-processing for variant discovery’ (gatk.broadinstitute.org/hc/en-us/articles/360035535912). This has all been run on Windows Subsystem for Linux 2 in the Bash shell. I started off with FASTQ files from IGSR (www.internationalgenome.org/data-portal) and performed alignment with Bowtie2 (instead of…

A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches

BLEND is a mechanism that can efficiently find fuzzy seed matches between sequences to significantly improve the performance and accuracy, while reducing the memory usage, of two important applications: 1) finding overlapping reads and 2) read mapping. Finding fuzzy seed matches enables BLEND to find both 1) exact-matching seeds…

Average Read length

Hello everyone! Is there a standard tool commonly used to calculate the average read length of FASTQ files? If so, please mention it here, because I want to know the average read length of my FASTQ files so that I can decide the cutoff for…

Issue with fastq after converting phred 64 to phred 33 quality scores

Hello, I ran seqtk seq -VQ64 read1.fastq.gz > read1_phred33.fastq to convert my Phred+64 quality scores to Phred+33. However, when I attempted to run the reads through TopHat alignment, I got this error: Saw ASCII character 4 but expected 33-based Phred qual. terminate called after…

Benchmarking the NVIDIA Clara Parabricks germline pipeline on AWS

This blog post was contributed by Ankit Sethia, PhD, and Timothy Harkins, PhD, at NVIDIA Parabricks, and Olivia Choudhury, PhD, Sujaya Srinivasan, and Aniket Deshpande at AWS. This blog provides an overview of NVIDIA’s Clara Parabricks along with a guide on how to use Parabricks within the AWS Marketplace. It…

Index of /~psgendb/doc/bioLegato/blreads

Name                         Last modified      Size   Description
Parent Directory                                -
SOAPdenovo2.hints.html       2019-05-04 15:52   3.9K
Trimmomatic.hints.html       2019-05-20 13:32   6.3K
Trinity.hints.html           2019-04-23 11:39   2.4K
adaptercheck.hints.html      2021-05-13 12:27   8.0K
adaptercheck.html            2021-05-12 17:45   4.9K
adaptercheck_output.png      2021-05-12 17:17   51K
fastq_pair.hints.html        2019-04-05 13:16   3.4K
gffcompare.hints.html        2018-07-18 14:05   3.2K
…

Different FastQC results after name-sorting BAM file, sequence duplication increases

Okay, so what I did might have been stupid, but I was determined to examine a lot of things on my own and to experiment a bit with tools. At one point I decided to do this: I had a BAM file…

Single-cell delineation of lineage and genetic identity in the mouse brain

STICR lentiviral library preparation and validation: We synthesized a high-complexity lentivirus barcode library that encodes approximately 60–70 million distinct oligonucleotide RNA sequences (STICR barcodes). STICR barcodes comprised three distinct oligonucleotide fragments cloned sequentially into a multicloning site within the 3′ UTR of an enhanced green fluorescent protein (eGFP) transgene under…

rust-bio-tools 0.35.0 – Docs.rs

rust-bio-tools-0.35.0 is not a library. A set of ultra fast and robust command line utilities for bioinformatics tasks based on Rust-Bio. Rust-Bio-Tools provides a command rbt, which currently supports the following operations: a linear time implementation for fuzzy matching of two vcf/bcf files (rbt vcf-match) a vcf/bcf to txt converter,…

Import problem: Not a(n) QIIME1DemuxFormat file – Technical Support

Hi @emiliomastriani, did you download the sequences from SRA? This previous question may give you some help: Hi there, I am familiar with QIIME1 but relatively new to QIIME2. In the past I have gotten my raw file from a facility in the CASAVA paired-end demultiplexed format and I had…

Attempting to generate a bam.bai file but the output is not readable

Hi, I am new to exome sequencing and have tried to follow tutorials on the subject. I am stuck at the samtools index stage because the output files are in a non-human-readable format and I believe…

hisat2-align died with signal 6 (ABRT) (core dumped)

Hi, I encountered an error while running hisat2. hisat2-build -p 10 ~/public_data/genome/Pt_V1.0.fa genome 1>hisat2-build.log 2>&1; ~/software/hisat2-2.2.0/hisat2 -x genome -1 ~/data/clean/NC1_5_clean_R1.fastq.gz -2 ~/data/clean/NC1_5_clean_R2.fastq.gz -S NC1_5.sam 1>NC1_5.log 2>&1; cat NC1_5.log terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc (ERR): hisat2-align died…

Find right adapter sequence for trimming

Hello everyone, I have recently started working on RNA-seq analysis. I am trying to clean single-end read data according to the FastQC result. The result looks like the one for SRR309133, for example. I tried the Illumina Adapter Sequences document to find the adapter there. But after trimming, the result…

BBSplit ambiguous dataset analysis

I have used bbsplit to split a metagenomic dataset into reads mapping to three genomes a, b, c. bbsplit.sh in1={fastq_1} in2={fastq_2} ref={ref_str} ambiguous2=split basename={out_path}out_split_%.sam If I want to identify which ambiguous reads align to ‘a’ and any other genome, is this only ‘ambiguous_a’? or…

Single end read adapter trimming via flexbar

Hi guys, I am a beginner in sequence analysis. I am trying to trim adapters because, according to the MultiQC result, my data has adapters at the ends of the sequences. The data are single-end reads, and I have tried the command below…

Error with file guillaumeKUnitigsAtLeast32bases_all.fasta, kUnitigLengths.txt is of size 0, must be at least of size 1.

Hello, I am trying to run an assembly with MaSuRCA but am getting an error at the step “Computing super reads from PE”. Here’s the output with the error: [xxxx@vic Bovidae]$ cd Assembly_test/ [xxxx@vic Assembly_test]$ ls assemble.sh guillaumeKUnitigsAtLeast32bases_all.fasta.tmp masurca_assembly.o4302352 meanAndStdevByPrefix.pe.txt pe_data.tmp quorum_mer_db.jf work1 environment.sh guillaumeKUnitigsAtLeast32bases_all.jump.fasta masurca_config.txt pe.cor.fa pe.renamed.fastq super1.err ESTIMATED_GENOME_SIZE.txt masurca_assembly.e4302352…

Why single cell R2 fastq have no read identified by bowtie2 ?

When we input R2 fastq.gz into bowtie2, human sequence was not removed (${base}_host_removed is zero). for i in $(find ./ -type f -name ".fastq.gz" | while read F; do basename $F | rev | cut -c…

qiime2-import data from non-working directory – User Support

Hello, qiime2 users community! I have the following set-up: a huge collection of .fastq files which I would like to process with the dada2 pipeline; a remote cluster of servers with separate storage where those files are stored, and the working machines for computing. Question: Is it possible to…

16s rRNA Sequencing Meta-analysis Reconstruction Tool (using mothur).

16SMaRT is a bioinformatics analysis pipeline for 16s rRNA gene sequencing data. 16SMaRT is a “one-click” solution towards performing microbial community analysis of amplicon sequencing data. 16SMaRT aims to be your go-to solution for your next microbiome/metagenomics project. The primary objective of 16SMaRT analysis is to determine what genes are…

Genome Bioinformatics Analyst – Pittsburgh

Description: UPMC Presbyterian is hiring a Genome Bioinformatics Analyst to join the Molecular and Genomic Pathology Laboratory (MGP) team! This role will work a daylight schedule Monday through Friday. No weekends or holidays are required! The Molecular and Genomic Pathology Laboratory (MGP) is a dynamic state-of-the-art clinical laboratory that prides…

Problem with using flagstat after bowtie2 alignment

I’m running bowtie2 to align multiple samples to one reference genome, and then running samtools flagstat to output the results. All but two samples have aligned and I’ve managed to run flagstat on them. For those two samples, when I run flagstat, I first get: [W::bam_hdr_read] EOF marker is absent….

how to generate reference genome from paired-end reads

Hi, I have a bacterial sample and I want to align it to a reference genome. However, a reference genome for this particular strain is not available. So I need to generate a reference genome from my 2x150 paired-end reads…

get rRNA FASTA file for a particular bacteria

Hey all, I was trying to find a way to get all rRNA (5S, 16S and 23S) FASTA sequences for a particular bacterium (B. thetaiotaomicron VPI-5482, which is the type strain). I wanted this file so that I could use something…

bash script not a valid identifier

I am trying to run a bash script, but it gives this error (`$fastq': not a valid identifier). #!/bin/bash database="kraken2_database" fastq="fastq_dir" for $fastq in $(ls *_R1.fastq.gz | sed 's/_R1.fastq.gz//') do kraken2 --db $database --threads 8 --memory-mapping --use-names --confidence 0.1 --report taxonomy_reads/${fastq}_kraken2.tax --paired ${fastq}_R1.fastq.gz…

Bash script to help with print Name of reads that only have query subsequence or its verse complement and position of first occurance of this subsequence in read and output of all this in tab separat

Create a bash script that receives the name of a fastq file and a query subsequence…

Cell hashing vs barcoding

Hi everyone, I can’t seem to differentiate between cell hashing (for example cite-seq.com/cell-hashing/, or CellPlex by 10x) and cell barcoding (used by 10x, for example). What is it about cell hashing that makes it different? Cell barcoding means…

Trimming DNAStringSet

Hello, I am currently dealing with the problem of reading in a FASTQ file with “readDNAStringSet”, trimming the sequences and then writing them into a new FASTQ file. Reading the FASTQ file with “readDNAStringSet” is working just fine. I am then trying to trim a fixed length…

Adding new taxa to a Kraken2 db

Hi, can someone please check whether the following steps are correct? I am trying to add to my plant kraken2 db (“plant_original”) a few taxa genomes that I have downloaded from the NCBI website (alnus_glutinosa_GCA_003254965.1.fna, carpinus_fangiana_GCA_006937295.1.fna, etc.). for file in *.fna; do kraken2-build --add-to-library $file --db PATH/kraken/plant_original; done Masking low-complexity regions…

Using comm to make a list of files that haven’t yet been processed

I’m using comm to work out which files have already been processed and which are still to do. The input and output filenames are a little different, so I’ve used basename and sed to strip away…

UMI extraction from 10X visium spatial transcriptome data

Hello everyone, I have to analyse Visium spatial transcriptome (ST) sequencing data (2 x 150 bp). I want to extract the spatial barcode and UMI from Read 1 in order to reduce the Read 1 length from 150 bp to 28 bp (16 bp Spatial…

How to handle VCFs from the same sample but using different aligners and variant callers?

Hi, I’m using whole-exome sequencing (WES) for somatic variant calling. During the process, I tried to follow the approach described here: pubmed.ncbi.nlm.nih.gov/28420412/ Basically my workflow is as follows: FASTQ preprocessing: Using 2 aligners (BWA-MEM, Bowtie2); BAM calibration; Variant calling: Using 3 software (Mutect2, Strelka2, Lancet); Variant filtering: I keep just…
