Tag: fastq
Detection of candidate gene LsACOS5 and development of InDel marker for male sterility by ddRAD-seq and resequencing analysis in lettuce
Ryder, E. J. Lettuce, Endive and Chicory (CABI Publishing, 1999). Seki, K. et al. A CIN-like TCP transcription factor (LsTCP4) having retrotransposon insertion associates with a shift from Salinas type to Empire type in crisphead lettuce (Lactuca sativa L.). Hortic. Res. 7, 1–14 (2020). Odland,…
[W::bgzf_read_block] EOF marker is absent in BBMAP
Hello, I’m asking about an issue I encountered in bbmap. I was using bbmap to remove host contaminants from my microbiome data. The command is simple, as below (the ref folder was already generated in the previous step): bbmap.sh -Xmx42g in=R1.fastq.gz in2=R2.fastq.gz outu=cleaned.interleaved.fastq.gz threads=12 overwrite=t unpigz=t…
Strange Per base sequence content of fastqc
Hi, all! I downloaded the fastq.gz files of GSE162708 from ENA, which has only 2 files for each sample (scRNA-seq usually has 3 files: I1, R1 and R2). Then I ran fastp as follows. Then I got the QC report, but I can’t understand why Per base sequence content of…
tReasure: R-based GUI package analyzing tRNA expression profiles from small RNA sequencing data | BMC Bioinformatics
tReasure (tRNA Expression Analysis Software Utilizing R for Easy use) is a graphical user interface (GUI) tool for the analysis of tRNA expression profiles from deep-sequencing data of small RNAs (small RNA-seq) using R packages. The whole analysis workflow, including the uploading of FASTQ files of small RNA-seq, quantification of…
FastQ_7 April 2022(1) – Copy.pptx – What is the FASTA format? The FASTA format is the “workhorse” of bioinformatics. It is used to represent sequence
the FASTA format is not “officially” defined, even though it carries the majority of data information on living systems. Its origins go back to a software tool called Fasta written by David Lipman (a scientist that later became, and still is, the director of NCBI) and William R. Pearson of the University of Virginia. The tool itself has (to some…
Reference-based alignment using MUSKET
I’m running MUSKET on my dataset trimmed_data.tar.gz using 1000 threads, 2000 threads, and 4000 threads on an HPC. I’ve been unable to obtain any results because the software seems to run for a very long time. ./../musket-1.1/musket -k 90 600000000 -p 1000 -zlib 9 -ino…
(ERR): bowtie2-align exited with value 13
I am trying to run bowtie2, but the following error occurs every time: bowtie2 --very-fast-local -x bowtie -q -1 R1.fastq -2 R2.fastq -s aligned.sam Saw ASCII character 10 but expected 33-based Phred qual. terminate called after throwing an instance of 'int' Aborted…
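ASCII character 10 is the newline character, so one plausible reading of the error above (an assumption, since the post is truncated) is that some record has an empty or shortened quality line. A minimal Python sketch for locating such records before alignment follows; the function name find_bad_records is hypothetical, and it assumes the simple four-line FASTQ layout with no wrapped sequence lines.

```python
import io

def find_bad_records(handle):
    """Yield (record_number, reason) for malformed four-line FASTQ records."""
    record = 0
    while True:
        header = handle.readline()
        if not header:
            break  # clean end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        record += 1
        if not header.startswith("@"):
            yield record, "header does not start with '@'"
        elif not plus.startswith("+"):
            yield record, "separator line does not start with '+'"
        elif len(seq) != len(qual):
            yield record, "sequence and quality lengths differ"
        elif any(ord(c) < 33 or ord(c) > 126 for c in qual):
            yield record, "quality character outside ASCII 33-126"

# Second record has an empty quality line, which an aligner would read as '\n'
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n\n")
print(list(find_bad_records(demo)))
```

Running the same check on both R1 and R2 files would show whether the problem is confined to one mate file.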
Postdoc / Research Scientist in Bioinformatics and Computational Genomics
Job Description Are you a computer geek with a strong interest in genomics? Do you want to use your computational skills to solve human diseases? At the Department of Neurology at Harvard Medical School and Brigham & Women’s Hospital, we have two vacant positions: postdoctoral fellow and research scientist in…
Qiime2 Exclude Seqs with FASTQ as query data.
Hello, I am working with FASTQ files and I want to filter them based on their alignment to reference sequences in FASTA format. I decided to use QIIME2 for this. So I imported both FASTA and FASTQ files to the required…
FastQC per base sequence content
I’m running FastQC on some paired-end fastq files. I have a warning on per-base sequence content, as the first 5 to 6 bases show significant bias towards T and G, as shown below. I was wondering what the sequence in the first 5 or…
Validate RNAseq salmon quantification pipeline
Hi, I’ve written a pipeline to perform quantification from RNAseq data with salmon. I’m trying to find a way to evaluate the quality of my results. I was thinking of running the pipeline on an available public dataset and comparing my output with another analysis….
can't find a path to the file
I simply need to run Trimmomatic, but it doesn't see the input files. Maybe you know how to deal with it?
#creating variables
INPUT_DIR="path/folderinput"
OUTPUT_DIR="path/folderoutput"
APPENDIX=".fastq.gz"
APPENDIX1="_R1.fastq.gz"
APPENDIX2="_R2.fastq.gz"
TRIMMOMATIC="java -jar /home/path/trimmomatic-0.36.jar"
#creating a loop
for i in $INPUT_DIR/*$APPENDIX1
do
FORWARD=$(basename…
Mapping back 3 sets of reads/sample with minimap2
I used FaQC to QC my raw fastqs before assembling. That program (and perhaps others) outputs properly paired forward and reverse fastqs, as well as an unpaired fastq file for each sample. I used all 3 for each single-sample assembly. Since minimap2 only allows for 2 query files,…
Sam file is not written
Dear all, It writes the following in the log file: [08-02 01:26:25] Running Step 2: BWA … bwa_wrap /work/pathology/s206442/dbet_project/hg19/hg19.fa Output3/out_1.valid.fastq 6 Output3/out_1.valid.sam 0 Running BWA on trimmed reads … bwa mem -t 6 /work/pathology/s206442/dbet_project/hg19/hg19.fa Output3/out_1.valid.fastq | samtools view -h -F 2048 - > Output3/out_1.valid.sam However, the sam file size is…
Mapped reference id is not an id of the genome file genome_nowhitespace.fa
Hi everyone, I’m trying to run the nf-co.re/smrnaseq pipeline and I’m having a problem with mirdeep2. Command: nextflow run nf-core/smrnaseq -profile ijcluster --input /home/794_both.fastq.gz --outdir /home/results --genome GRCh38 --protocol qiaseq --mature mirbase.org/ftp/CURRENT/mature.fa.gz --hairpin mirbase.org/ftp/CURRENT/hairpin.fa.gz Error message: Command…
Separate exogenous from endogenous transcripts using Salmon RNAseq DTU
Dear friends, We are trying to use Salmon for DTU analysis. We want to separate exogenous from endogenous transcripts by following this post www.biostars.org/p/443701/ and this paper f1000research.com/articles/7-952 We are focusing on a gene called ASCL1 (endo-ASCL1). We transduced cells with lentiviral vector containing ASCL1 ORF only (Lenti-ASCL1). There should…
Phylogenomic analysis of Syngnathidae reveals novel relationships, origins of endemic diversity and variable diversification rates | BMC Biology
Stölting KN, Wilson AB. Male pregnancy in seahorses and pipefish: beyond the mammalian model. Bioessays. 2007;29:884–96. Whittington CM, Friesen CR. The evolution and physiology of male pregnancy in syngnathid fishes. Biol Rev Camb Philos Soc. 2020;95:1252–72. Rosenqvist G, Berglund A. Sexual signals and mating…
BioInformatics Product Manager at Helix (remote)
You + Helix Helix is a place where innovators and doers gather in order to drive significant progress in population genomics. We have come together to work at the intersection of clinical care, research, and genomics. If you’re excited by the idea of making a meaningful impact and joining a…
Trimmomatic/ linux system
Hi all, I am trying to remove adapters and clean my RNA-seq .gz files using Trimmomatic, installed on a Linux system (supercomputer server), following the steps for paired-end reads explained in the manual (www.usadellab.org/cms/?page=trimmomatic): java -jar trimmomatic-0.39.jar PE input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True LEADING:3…
Sequence Duplication Levels failed FastQC Report
Hi all, I’m checking the quality of my RNA-Seq data with FastQC, and all my fastq files failed on “Per base sequence content” and “Sequence Duplication Levels”, with an additional warning on “Overrepresented sequences” only for read 1 files (it’s paired-end; the sequences match between samples). Below is…
Why did I achieve shorter than initial reads subset after aligned reads extraction.
Hello dear colleagues! I have recently run into a problem. I have been working with long WGS reads. First I filtered the longest subset of reads and aligned them to the custom sequence with several structural variants…
How to check Fasta file ASCII characters and fix encoding errors?
I tried building a diamond database but got this error. Error: Error reading input stream at line 180825: Invalid character (ASCII 0) in sequence How can I fix it? Is there a tool that checks for this and…
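A check of this kind can be sketched in a few lines of Python. The helper names find_invalid_bytes and strip_nul_bytes are hypothetical, and the set of allowed bytes (printable ASCII plus tab and line endings) is an assumption about what a FASTA file should contain. NUL bytes often point to a truncated or corrupted download, so re-fetching the file may be a better fix than stripping them.

```python
import os
import tempfile

def find_invalid_bytes(path):
    """Return a list of (line_number, byte_value) for unexpected bytes."""
    allowed = set(range(32, 127)) | {9, 10, 13}  # printable ASCII, tab, LF, CR
    hits = []
    with open(path, "rb") as fh:
        for line_no, line in enumerate(fh, start=1):
            for byte in line:
                if byte not in allowed:
                    hits.append((line_no, byte))
    return hits

def strip_nul_bytes(src, dst):
    """Copy src to dst, dropping NUL (ASCII 0) bytes only."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in fin:
            fout.write(line.replace(b"\x00", b""))

# Small demonstration on a temporary file containing one NUL byte
tmpdir = tempfile.mkdtemp()
raw = os.path.join(tmpdir, "raw.fasta")
clean = os.path.join(tmpdir, "clean.fasta")
with open(raw, "wb") as fh:
    fh.write(b">seq1\nAC\x00GT\n")

print(find_invalid_bytes(raw))    # reports line 2, byte value 0
strip_nul_bytes(raw, clean)
print(find_invalid_bytes(clean))  # empty list after cleaning
```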
Mitogenome of a stink worm (Annelida: Travisiidae) includes degenerate group II intron that is also found in five congeneric species
Tan, M. H. et al. Comparative mitogenomics of the Decapoda reveals evolutionary heterogeneity in architecture and composition. Sci. Rep. 9, 1–16 (2019). Zhang, Y. et al. Phylogeny, evolution and mitochondrial gene order rearrangement in scale worms (Aphroditiformia, Annelida). Mol. Phylogenet. Evol. 125, 220–231 (2018).…
Feature count is very low using htseq-count
Hello all, I ran bbmap on my paired RNA-seq data using the following command: bbmap.sh in1=J2_R1.fastq in2=J2_R2.fastq out=output_J2.sam ref=im4.fasta nodisk The header of the generated SAM file is: @HD VN:1.4 SO:unsorted @SQ SN:k141_1006 LN:2503 @SQ SN:k141_5512 LN:5393 @SQ SN:k141_4772 LN:4387 @SQ SN:k141_3267 LN:4531…
Minimap2 options for Nanopore cDNA direct seq
Hello, I’m working with ONT RNA-seq data, and I used the cDNA direct seq protocol for the sequencing. I want to look for long deletions in mRNAs that are not spliced; for this, I want to use the splice option of…
Fastp file merge append | Develop Paper
Interpretation of fastq file format: www.jianshu.com/p/39115d21ee17 Sometimes, the sequencing results of a species will return two sets of paired-end fastq files (r1.fq.gz, l1.fq.gz, r2.fq.gz, l2.fq.gz). The content of the sequencing data is actually one piece, but it is divided into two parts during transmission. When we use it, we are used to merging it into a double ended…
BTG2 gene predicts poor outcome in PT-DLBCL
Introduction Primary testicular diffuse large B-cell lymphoma (PT-DLBCL) is a rare and aggressive form of mature B-cell lymphoma.1–3 PT-DLBCL was the most common type of testicular tumor in men aged over 60 and characterized by painless uni- or bilateral testicular masses with infrequent constitutional symptoms.4–6 PT-DLBCL shows significant extranodal tropism,…
High-Throughput Transcriptome Analysis for Investigating Host-Pathogen Interactions
The protocol presented here describes a complete pipeline to analyze RNA-sequencing transcriptome data from raw reads to functional analysis, including quality control and preprocessing steps to advanced statistical analytical approaches. Welcome to the protocol of high-throughput transcriptome analysis for investigating host-pathogen interactions. This protocol is divided in the following steps….
BBTools – BioGrids Consortium – Supported Software
BBTools. Description: a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data. BBTools can handle common sequencing file formats such as fastq, fasta, sam, scarf, fasta+qual, compressed or raw, with autodetection of quality encoding and interleaving. Installation: Use the following command to…
sorting – indexing sorted alignment file with samtools index gives “Exec format error”
I am struggling with samtools index. I already did the alignment using “bwa mem reference.fa seq.fastq > alg.sam”. The resulting sam file was converted to bam format using “samtools view -S -h -b alg.sam > alg.bam”. Next, the files were sorted by using “sort -h alg.bam >sorted.bam”. And now we…
METASnake: a Snakemake workflow to facilitate…
Introduction As sequencing technology has become cheaper and more readily accessible, the need for the increased computational capacity to process these data has become apparent. In particular, high-throughput sequencing has been particularly useful when applied to the field of metagenomics. Substantial effort has been devoted to developing software and computational…
bedtools sample with fastq input and fewer input records than requested
I’m using bedtools sample to sample reads from fastq files. I’d like to submit two feature requests: If the number of requested records is larger than the input I get ERROR: Input file has fewer records than the requested number of output records. I guess this is intentional and not…
Extracellular circulating miRNAs as stress-related signature to search and rescue dogs
Study approval was provided by the Research Ethics Committee of the University of Perugia (report n.2018-21 of 11/12/2018) according to Italian Ministry of Health legislation18. All methods were carried out following relevant guidelines and regulations and the study was carried out in compliance with the ARRIVE guidelines. Informed consent is…
Per base sequence quality – fastqc
Hi everyone, I am new to bioinformatics and am asking a very basic question here. I have paired-end fastq data and ran FastQC on it. In the per base sequence quality plot, a few reads are in the red region, and there is no adapter and…
Genomic variation from an extinct species is retained in the extant radiation following speciation reversal
Vamosi, J. C., Magallon, S., Mayrose, I., Otto, S. P. & Sauquet, H. Macroevolutionary patterns of flowering plant speciation and extinction. Annu. Rev. Plant Biol. 69, 685–706 (2018). Rhymer, J. M. & Simberloff, D. Extinction by hybridization and introgression. Annu. Rev. Ecol. Syst. 27, 83–109 (1996)….
Analyzing and slicing FASTQ file entries using Python
I have the code pasted below for running on FASTQ file entries in order to compare specific parts and remove redundant copies of the same sequences (based on the miRNA + umi_seq combination). I save the entry IDs and then make…
nf-core/circrna
circRNA quantification, differential expression analysis and miRNA target prediction of RNA-Seq data Introduction nf-core/circrna is a best-practice analysis pipeline for the quantification, miRNA target prediction and differential expression analysis of circular RNAs in paired-end RNA sequencing data. The pipeline is built using Nextflow, a workflow tool to run tasks across…
Using AnnoTree to Get More Assignments, Faster, in DIAMOND+MEGAN Microbiome Analysis
INTRODUCTION Next-generation sequencing (NGS) has revolutionized many areas of biological research (1, 2), providing ever-more data at an ever-decreasing cost. One such area is microbiome research, the study of microbes in their theater of activity using metagenomic sequencing (3). Here, deep short-read sequencing, and improving performance of long-read sequencing, are…
Kallisto mapping paired end
Hello everyone, I am new to bioinformatics and I am trying to use kallisto to map paired-end data. However, I got an error when running the command. Does anyone know what I did wrong here? Thank you! Here is my command: kallisto…
FastQC for paired end data
Hi, I have 36 fastq files of paired-end RNA-seq, so I was wondering if anyone knows how to run FastQC on paired-end data, and how it differs from FastQC on single-end data? I have done this with single-end data before but…
Processing two lists of files with snakemake
I want to use snakemake to do bowtie2 mapping of split read files to a reference genome, and I’d like that rule to be integrated in the general workflow. For that purpose, I first defined a rule to create a bowtie index:
rule build_bowtie_index:
    input: referenceGenomeFasta
    output: expand("{name}.{index}.bt2", index=range(1,5), name…
RNA-Seq Data Analysis Software – Isogen Lifescience
BlueBee Genomics The BlueBee platform is a production-ready, robust infrastructure that is easy to use for any researcher. It can be used for analysing data from QuantSeq, CORALL, and SLAMseq experiments. There is no prior bioinformatic experience required. Each purchased QuantSeq and CORALL kit includes a code for free data…
Find Transposon Element insertions using long reads (nanopore), by alignment directly. (minimap2)
find_te_ins is designed to find Transposon Element (TE) insertions using long reads (nanopore), by alignment directly. (minimap2) Install $ git clone github.com/bakerwm/find_te_ins.git $ cd find_te_ins Change the following variables upon your condition: genome_fa and te_fa in line-10 and line-11; $ bash run_pipe.sh run_pipe.sh Prerequisite minimap2 – 2.17-r974-dirty, align long…
Cell Strain-Derived Induced Pluripotent Stem Cells as an Isogenic Approach To Investigate Age-Related Host Response to Flaviviral Infection
INTRODUCTION Dengue is the most common mosquito-borne viral disease globally (1). This acute disease, which can be life-threatening, is caused by four different dengue viruses (DENVs) (DENV-1, DENV-2, DENV-3, and DENV-4). An estimated 390 million people are infected with these DENVs annually (2), and populations throughout the tropics face frequent…
Error in Rsubread featureCounts
Hi there, Excellent package! I am using it to do RNA-seq. But I encountered a small problem when using featureCounts(). The code is as follows: featureCounts( “A1.raw_1.fastq.gz.subjunc.BAM”, annot.inbuilt = NULL, annot.ext = “GCF_015227675.2_mRatBN7.2_genomic.gtf”, isGTFAnnotationFile=TRUE, isPairedEnd=TRUE, nthreads = 8 ) And it returns this: ========== _____ _ _ ____ _____ ______…
Postdoctoral position in bioinformatics – focused on single-cell immune transcriptomics – Karolinska Institute – job portal
Postdoctoral position in bioinformatics – focused on single-cell immune transcriptomics Login and apply Do you want to contribute to improving human health? We are looking for an ambitious postdoctoral fellow with solid genome-wide bioinformatics and computational biology skills to join our highly accomplished team. We offer a stimulating environment in…
Merging compressed fastq files based on a conditions defined in a csv file
Hello everybody, I have a question that is somewhat different from a similar topic addressed in another post. I tried Paul’s bash script from that post (fastq_lane_merging.sh), adapting it to my filename organization: #!/bin/bash for i in $(find ./ -type f -name "*.fastq.gz" | while read F; do basename…
Mapping to multiple references using bbmap
So my question comes in two parts: First of all is what I’m trying to do within reason given the tools I am using? I am investigating the shuffling effects of a recombinase on a known reporter sequence which subsequently generates libraries of unique sequences. By simulating all of the…
bwa , 2 files fastq to 1 sam
I have this problem, please help me. I’m also trying it on macOS Catalina. I am creating a SAM file from 2 fastq files using bwa. I apply the following command: bwa mem -t 2 GRCh38.primary_assembly.genome.fa.gz V350019555_L03_B5GHUMqcnrRAABA-556_1.fq.gz V350019555_L03_B5GHUMqcnrRAABA-556_2.fq.gz > V350019555_L03_B5GHUMqcnrRAABA-556.sam…
SeqIO object get cleared away after being accessed
I’m using Biopython to parse a fastq file, and I found that the SeqIO object gets cleared away once I have accessed it. from Bio import SeqIO record_fastqIO = SeqIO.parse('SRR835775_1.first1000.fastq','fastq') for record in record_fastqIO: print(record.id) This script works perfectly. But if I add one line to the script: from Bio import…
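The post is cut off before the added line, so the exact cause cannot be confirmed, but SeqIO.parse returns an iterator, and an iterator can only be walked once. The sketch below reproduces that behaviour with a standard-library stand-in (parse_fastq_ids is a hypothetical helper, not part of Biopython) so that it runs without any installation; with Biopython itself the usual remedy is the same, namely list(SeqIO.parse(path, "fastq")) when the records fit in memory.

```python
import io

def parse_fastq_ids(handle):
    """Stand-in for a one-pass parser: yields read IDs lazily, like an iterator."""
    while True:
        header = handle.readline()
        if not header:
            return
        handle.readline()  # sequence line
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield header[1:].split()[0]

data = "@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n"

records = parse_fastq_ids(io.StringIO(data))
first_pass = list(records)   # consumes the iterator
second_pass = list(records)  # nothing left to yield

# Materialising the records once allows repeated access
cached = list(parse_fastq_ids(io.StringIO(data)))

print(first_pass)   # ['read1', 'read2']
print(second_pass)  # []
print(len(cached))  # 2
```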
identify and remove adapter sequence
Hi all, I am trying to identify the adapter sequences in my ATAC-seq data. The way I tried to achieve this was to run the fastq file through FastQC, hoping the sequence would be picked up and shown in the report. In the report, there…
Petabase-scale sequence alignment catalyses viral discovery
Serratus alignment architecture Serratus (v0.3.0) (github.com/ababaian/serratus) is an open-source cloud-infrastructure designed for ultra-high-throughput sequence alignment against a query sequence or pangenome (Extended Data Fig. 1). Serratus compute costs are dependent on search parameters (expanded discussion available: github.com/ababaian/serratus/wiki/pangenome_design). The nucleotide vertebrate viral pangenome search (bowtie2, database size: 79.8 MB) reached processing rates…
Samtools flagstat confusing result of a merged bam file
Hi, I am a bioinformatics student and I am struggling with an issue, I had paired-end fastq files for one sample with some low-quality bases at the end and adapter contamination, so I went and I trimmed my reads with trimmomatic, it gave me 4 files that I used for…
R and sra toolkit – odd system() behavior ( R, System )
Problem : ( Scroll to solution ) In order to extract some fastq data from NCBI’s sequence read archive I’ve downloaded and installed the sra toolkit for Windows. In order to test if it is setup correctly, I opened cmd, navigated to the directory and typed in the command fasterq-dump…
The role of ATXR6 expression in modulating genome stability and transposable element repression in Arabidopsis
Significance The plant-specific H3K27me1 methyltransferases ATXR5 and ATXR6 play integral roles connecting epigenetic silencing with genomic stability. However, how H3K27me1 relates to these processes is poorly understood. In this study, we performed a comprehensive transcriptome analysis of tissue- and ploidy-specific expression in a hypomorphic atxr5/6 mutant and revealed that the…
Any alternatives to BBMap’s clumpify.sh program to optimize gzip compression?
I’ve had some difficulties implementing this in pipelines because it randomly fails sometimes. Are there any other programs that can be used in its stead?
ChaoXianSen/TrimGalore – Giters
Trim Galore is a wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data. Installation Trim Galore is a Perl wrapper around two tools: Cutadapt and FastQC. To use, ensure that these two pieces of software are available…
Mle Application With Gekko In Python
The true power of the state space model is to allow the creation and estimation of custom models. This notebook shows various statespace models that subclass sm. That means your MAGeCK python module is installed in /home/john/.pyenv/versions/2.7.13/lib/python2.7/site-packages. I use conda to install the latest version of. This two-volume set Diseases and Pathology…
[lh3/minimap2] Memory leak when using Python and threads
The program align.py uses mappy to align reads in Python using multiple worker threads. After loading the index the memory usage jumps up quickly to >20 GB and then continues to climb steadily through 40 GB and beyond. This issue was first discovered in bonito and isolated to mappy. The data flow…
Bwa on multiple processor
Hi guys, when I try to run bwa mem on multiple processors, I get the following error: > mpirun -np 16 bwa mem hg19-agilent.fasta R1.fastq R2.fastq | samtools sort -o aln.bam [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read…
python – Missing input files after defining them in function
I am trying to do QC on RNAseq data that is tarballed. I am using Snakemake as a workflow manager and am aware that Snakemake does not like one-to-many rules. I thought defining a checkpoint would fix the problem, but when I run the script I get this error message…
Aligning multiple single and paired-end reads from multiple files (lanes)
Hello, I am new to bioinformatics and looking for some help. I have 27 files from an Illumina output. There are 4 paired-end and 23 single-read files. I am trying to align them using Rsubread in…
RedChIP identifies noncoding RNAs associated with genomic sites occupied by Polycomb and CTCF proteins
Abstract Nuclear noncoding RNAs (ncRNAs) are key regulators of gene expression and chromatin organization. The progress in studying nuclear ncRNAs depends on the ability to identify the genome-wide spectrum of contacts of ncRNAs with chromatin. To address this question, a panel of RNA–DNA proximity ligation techniques has been developed. However,…
Transferring a SAM file the easy and fast way
Hi everyone, I tried to align my fastq files with hisat2 but I was not able to finish because my computer has 4 GB of RAM and I get the error "Killed". So I ran the process on my friend's computer, but now I should solve…
Alignment report
Hi guys, I aligned R1 and R2 fastq files to the reference genome using bwa mem and got a BAM file. Now I want to check whether the alignment was done correctly, along with the alignment percentage, coverage, etc. I ran the following command: bwa mem hg19.fasta R1.fastq R2.fastq | samtools…
how to align paired and unpaired fastq files of a sample using STAR?
Hi all, I’m new to using the STAR aligner. I have PE sequencing fastq files which contain forward and reverse paired reads and forward and reverse unpaired reads (4 files). In the manual of this tool, it seems…
sequence alignment – Help with MinION sequencing data species identification
Hi I’m new to bioinformatics and have just completed my first run on the MinION (long read sequencing Oxford Nanopore Technologies). I was hoping someone could direct me towards R packages, workflow, tutorials or guides that will help me identify species that are present in my sample mainly for fungi…
Best platform to analyze ChIP-seq data using R
Hello guys, I am wondering if you could share your experience with the best platform for analyzing ChIP-seq data from fastq files? I found several packages but I am just wondering which one is more straightforward.
error reading fastq-files with readDNAStringset
I am trying to read a fastq file with readDNAStringSet and having quite some trouble doing so. I need the names as well as the quality scores. Right now I am using: readDNAStringSet(myFastqFile, format="fastq", use.names=TRUE, with.qualities=TRUE) But here I get the error: “@” expected at beginning of line 1 I…
sequence alignment – MarkDuplicatesSpark failing with cryptic error message. MarkDuplicates succeeds
I have been trying to follow the GATK Best Practice Workflow for ‘Data pre-processing for variant discovery’ (gatk.broadinstitute.org/hc/en-us/articles/360035535912). This has all been run on Windows Subsystem for Linux 2 on the Bash shell. I started off with FASTQ files from IGSR (www.internationalgenome.org/data-portal) and performed alignment with Bowtie2 (instead of…
A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches
BLEND is a mechanism that can efficiently find fuzzy seed matches between sequences to significantly improve the performance and accuracy while reducing the memory space usage of two important applications: 1) finding overlapping reads and 2) read mapping. Finding fuzzy seed matches enables BLEND to find both 1) exact-matching seeds…
Average Read length
Hello everyone! Is there a standard tool commonly used to calculate the average read length of fastq files? If yes, please mention it here, because I want to know the average read length of my fastq files so that I can decide the cutoff for…
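Besides dedicated tools, the mean read length is simple enough to compute directly. The Python sketch below assumes four-line FASTQ records (no wrapped sequences); the function names are hypothetical, and gzip-compressed input is detected only by the .gz file extension.

```python
import gzip
import io

def mean_read_length(handle):
    """Mean length of the sequence lines (every 4th line, starting at the 2nd)."""
    total_bases = 0
    n_reads = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # sequence line of each record
            total_bases += len(line.strip())
            n_reads += 1
    return total_bases / n_reads if n_reads else 0.0

def mean_read_length_from_path(path):
    """Open plain or gzip-compressed FASTQ by extension and compute the mean."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return mean_read_length(fh)

demo = io.StringIO("@a\nACGT\n+\nIIII\n@b\nACGTAC\n+\nIIIIII\n")
print(mean_read_length(demo))  # 5.0
```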
Issue with fastq after converting phred 64 to phred 33 quality scores
Hello, I ran seqtk seq -VQ64 read1.fastq.gz > read1_phred33.fastq to convert my Phred+64 reads to Phred+33. However, when I attempted to run them through TopHat alignment I got this error: Saw ASCII character 4 but expected 33-based Phred qual. terminate called after…
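Converting Phred+64 to Phred+33 lowers every quality character by 31. If the input was in fact already Phred+33, a character such as '#' (ASCII 35) would become ASCII 4, which matches the error shown; this is only a possible explanation, since the post is truncated. The Python sketch below checks the likely offset before converting. The function names and the threshold of 59 (the lowest character in the older Solexa +64 scheme) are assumptions of this example, and the guess is a heuristic that can be wrong for files containing only high qualities.

```python
def guess_offset(quality_lines):
    """Heuristically guess the Phred offset from non-empty quality strings."""
    lowest = min(ord(c) for line in quality_lines for c in line)
    # Characters below ';' (59) do not occur in the +64 encodings
    return 33 if lowest < 59 else 64

def phred64_to_phred33(qual):
    """Shift each quality character down by 31 (64 - 33)."""
    return "".join(chr(ord(c) - 31) for c in qual)

print(guess_offset(["IIII#"]))     # 33, so no conversion is needed
print(guess_offset(["hhhh"]))      # 64
print(phred64_to_phred33("hhhh"))  # IIII
```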
Benchmarking the NVIDIA Clara Parabricks germline pipeline on AWS
This blog post was contributed by Ankit Sethia, PhD, and Timothy Harkins, PhD, at NVIDIA Parabricks, and Olivia Choudhury, PhD, Sujaya Srinivasan, and Aniket Deshpande at AWS. This blog provides an overview of NVIDIA’s Clara Parabricks along with a guide on how to use Parabricks within the AWS Marketplace. It…
Index of /~psgendb/doc/bioLegato/blreads
Name                          Last modified      Size
Parent Directory                                 -
SOAPdenovo2.hints.html        2019-05-04 15:52   3.9K
Trimmomatic.hints.html        2019-05-20 13:32   6.3K
Trinity.hints.html            2019-04-23 11:39   2.4K
adaptercheck.hints.html       2021-05-13 12:27   8.0K
adaptercheck.html             2021-05-12 17:45   4.9K
adaptercheck_output.png       2021-05-12 17:17   51K
fastq_pair.hints.html         2019-04-05 13:16   3.4K
gffcompare.hints.html         2018-07-18 14:05   3.2K
…
Different FastQC results after name-sorting BAM file, sequence duplication increases
Okay, so what I did might have been stupid, but I was determined to examine a lot of things on my own and experiment a bit with tools. At one point I decided to do this: I had a BAM file…
Single-cell delineation of lineage and genetic identity in the mouse brain
STICR lentiviral library preparation and validation We synthesized a high-complexity lentivirus barcode library that encodes approximately 60–70 million distinct oligonucleotide RNA sequences (STICR barcodes). STICR barcodes comprised three distinct oligonucleotide fragments cloned sequentially into a multicloning site within the 3′ UTR of an enhanced green fluorescent protein (eGFP) transgene under…
rust-bio-tools 0.35.0 – Docs.rs
A set of ultra fast and robust command line utilities for bioinformatics tasks based on Rust-Bio. Rust-Bio-Tools provides a command rbt, which currently supports the following operations: a linear time implementation for fuzzy matching of two vcf/bcf files (rbt vcf-match); a vcf/bcf to txt converter,…
Import problem: Not a(n) QIIME1DemuxFormat file – Technical Support
Hi @emiliomastriani, Did you download the sequences from SRA? This previous question may give you some help: Hi there, I am familiar with QIIME1 but relatively new to QIIME2. I have gotten my raw files in the past from a facility in the CASAVA paired-end demultiplexed format and I had…
Attempting to generate a bam.bai file but the output is not readable
Hi, I am new to exome sequencing, and have tried to follow tutorials on the subject. I am stuck at the samtools index stage because the output files are in a non-human-readable format and I believe…
hisat2-align died with signal 6 (ABRT) (core dumped)
Hi, I ran hisat2 and encountered an error. hisat2-build -p 10 ~/public_data/genome/Pt_V1.0.fa genome 1>hisat2-build.log 2>&1 ~/software/hisat2-2.2.0/hisat2 -x genome -1 ~/data/clean/NC1_5_clean_R1.fastq.gz -2 ~/data/clean/NC1_5_clean_R2.fastq.gz -S NC1_5.sam 1>NC1_5.log 2>&1 cat NC1_5.log terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc (ERR): hisat2-align died…
Find right adapter sequence for trimming
Hello everyone, I have just started working on RNA-seq analysis. I am trying to clean single-end read data according to the FastQC result. The result looked like the example SRR309133. I tried looking in the Illumina Adapter Sequences document to find it there. But after trimming result…
BBSplit ambiguous dataset analysis
I have used bbsplit to split a metagenomic dataset into reads mapping to three genomes a, b, c. bbsplit.sh in1={fastq_1} in2={fastq_2} ref={ref_str} ambiguous2=split basename={out_path}out_split_%.sam If I want to identify which ambiguous reads align to 'a' and any other genome, is this only 'ambiguous_a'? or…
Single end read adapter trimming via flexbar
Hi guys, I am a beginner in sequence analysis. I am trying to trim adapters because I have seen, according to the MultiQC result, that my data has adapters at the ends of the sequences. These are single-end reads, and I have tried the command below…
Error with file guillaumeKUnitigsAtLeast32bases_all.fasta, kUnitigLengths.txt is of size 0, must be at least of size 1.
Hello, I am trying to run an assembly with MaSuRCA but am getting an error at the step: "Computing super reads from PE". Here's the output with the error: [xxxx@vic Bovidae]$ cd Assembly_test/ [xxxx@vic Assembly_test]$ ls assemble.sh guillaumeKUnitigsAtLeast32bases_all.fasta.tmp masurca_assembly.o4302352 meanAndStdevByPrefix.pe.txt pe_data.tmp quorum_mer_db.jf work1 environment.sh guillaumeKUnitigsAtLeast32bases_all.jump.fasta masurca_config.txt pe.cor.fa pe.renamed.fastq super1.err ESTIMATED_GENOME_SIZE.txt masurca_assembly.e4302352…
Why single cell R2 fastq have no read identified by bowtie2 ?
When we input R2 fastq.gz into bowtie2, human sequence was not removed (${base}_host_removed is zero). for i in $(find ./ -type f -name ".fastq.gz" | while read F; do basename $F | rev | cut -c…
qiime2-import data from non-working directory – User Support
Hello, qiime2 users community! I have the following set-up: a huge collection of .fastq files which I would like to process with the dada2 pipeline a remote cluster of servers with the separated storage where those files are stored, and the working machines for computing. Question: Is it possible to…
16s rRNA Sequencing Meta-analysis Reconstruction Tool (using mothur).
16SMaRT is a bioinformatics analysis pipeline for 16s rRNA gene sequencing data. 16SMaRT is a “one-click” solution towards performing microbial community analysis of amplicon sequencing data. 16SMaRT aims to be your go-to solution for your next microbiome/metagenomics project. The primary objective of 16SMaRT analysis is to determine what genes are…
Genome Bioinformatics Analyst – Pittsburgh
**Description** UPMC Presbyterian is hiring a Genome Bioinformatics Analyst to join the Molecular and Genomic Pathology Laboratory (MGP) team! This role will work a daylight schedule Monday through Friday. No weekends or holidays are required! The Molecular and Genomic Pathology Laboratory (MGP) is a dynamic state-of-the-art clinical laboratory that prides…
Problem with using flagstat after bowtie2 alignment
I’m running bowtie2 to align multiple samples to one reference genome, and then run samtools flagstats to output the results. All but two samples have aligned and I’ve managed to run flagstat on them. For those two samples, when I run flagstat, I first get: [W::bam_hdr_read] EOF marker is absent….
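The "EOF marker is absent" warning refers to the 28-byte empty BGZF block that htslib-based tools write at the end of a complete BAM or bgzipped file; a file without it is often truncated. As a minimal sketch (the function name `has_bgzf_eof` is made up for illustration, and this is only a quick check, not a replacement for `samtools quickcheck`), the tail of a file can be compared against that block:

```python
import os

# The 28-byte empty BGZF block used as the end-of-file marker (SAM/BAM spec).
BGZF_EOF = bytes([
    0x1F, 0x8B, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xFF,
    0x06, 0x00, 0x42, 0x43, 0x02, 0x00, 0x1B, 0x00, 0x03, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
])


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker.

    Hypothetical helper for illustration: it only inspects the last
    28 bytes and says nothing about the rest of the file.
    """
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

If the check returns False for a BAM that an aligner was still writing, or that was copied incompletely, re-generating or re-copying the file is usually the first thing to try.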
how to generate reference genome from paired-end reads
Hi, I have a bacterial sample and I want to align this sample to a reference genome. However, a reference genome for this particular strain is not available. So, I need to generate a reference genome from my 2×150 paired-end reads…
get rRNA FASTA file for a particular bacteria
Hey all, I was trying to find a way to get all rRNA (5S, 16S and 23S) FASTA sequences for a particular bacterium (B. thetaiotaomicron VPI-5482, which is the type strain). I wanted this file so that I could use something…
bash script not a valid identifier
I am trying to run a bash script, but it gives this error (`$fastq': not a valid identifier). #!/bin/bash database="kraken2_database" fastq="fastq_dir" for $fastq in $(ls *_R1.fastq.gz | sed 's/_R1.fastq.gz//') do kraken2 --db $database --threads 8 --memory-mapping --use-names --confidence 0.1 --report taxonomy_reads/${fastq}_kraken2.tax --paired ${fastq}_R1.fastq.gz…
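In bash, the loop variable in a `for` statement is written without the dollar sign (`for fastq in ...`), which is what the "not a valid identifier" message is pointing at. The sample-prefix logic itself can also be sketched in Python; the helper name `paired_samples` and the `_R2.fastq.gz` suffix are assumptions here, since the excerpt is cut off before the second read file is named:

```python
from pathlib import Path


def paired_samples(directory, r1_suffix="_R1.fastq.gz", r2_suffix="_R2.fastq.gz"):
    """Map each sample prefix to its (R1, R2) paths.

    Hypothetical helper: the R2 suffix is an assumed naming convention.
    Samples whose R2 file is missing are skipped.
    """
    pairs = {}
    for r1 in sorted(Path(directory).glob("*" + r1_suffix)):
        # Strip the R1 suffix to recover the sample prefix.
        sample = r1.name[: -len(r1_suffix)]
        r2 = r1.with_name(sample + r2_suffix)
        if r2.exists():
            pairs[sample] = (r1, r2)
    return pairs
```

Each prefix returned this way could then be substituted into a per-sample command line, in the same way the shell loop uses `${fastq}`.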
Bash script to help with print Name of reads that only have query subsequence or its reverse complement and position of first occurrence of this subsequence in read and output of all this in tab separat
Create a bash script that receives the name of a FASTQ file and a query subsequence…
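The question asks for a bash script, but the underlying logic is easier to see in a short Python sketch. It assumes plain four-line FASTQ records and reports 1-based positions; both choices, and all function names, are assumptions for illustration rather than anything stated in the post:

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fastq(handle):
    """Yield (read_name, sequence) from a text handle of 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield header[1:].split()[0], seq


def find_hits(handle, query):
    """Yield (read_name, position) for reads containing the query or its
    reverse complement; position is the 1-based start of the earliest match."""
    query = query.upper()
    rc = reverse_complement(query)
    for name, seq in read_fastq(handle):
        seq = seq.upper()
        positions = [p for p in (seq.find(query), seq.find(rc)) if p != -1]
        if positions:
            yield name, min(positions) + 1


def format_hits(hits):
    """Render hits as tab-separated lines."""
    return "\n".join(f"{name}\t{pos}" for name, pos in hits)
```

For a gzipped file, the handle could be opened with `gzip.open(path, "rt")` before passing it to `find_hits`.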
Cell hashing vs barcoding
Hi everyone, I can't seem to differentiate between cell hashing (for example cite-seq.com/cell-hashing/, or CellPlex by 10x) and cell barcoding (used by 10x, for example). What is it about cell hashing that makes it different? Cell barcoding means…
Trimming DNAStringSet
Hello, I am currently dealing with the problem of reading in a FASTQ file with "readDNAStringSet", trimming the sequences and then writing them into a new FASTQ file. Reading the FASTQ file with "readDNAStringSet" works just fine. I am then trying to trim a fixed length…
Adding new taxa to a Kraken2 db
Hi, can someone please check whether the following steps are correct? I am trying to add to my plant Kraken2 db ("plant_original") a few genomes that I have downloaded from the NCBI website (alnus_glutinosa_GCA_003254965.1.fna, carpinus_fangiana_GCA_006937295.1.fna etc.). for file in *.fna do kraken2-build --add-to-library $file --db PATH/kraken/plant_original done Masking low-complexity regions…
Using comm to make a list of files that haven’t yet been processed
I'm using comm to work out which files have already been processed and which are still to do. The input and output filenames are a little different, so I've used basename and sed to strip away…
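The same idea as `comm` on two sorted lists of stripped names, namely a set difference between input and output basenames, can be sketched in Python. The suffixes used in the usage note and the helper names are hypothetical, since the post does not show the actual filenames:

```python
from pathlib import Path


def strip_suffix(name, suffix):
    """Remove `suffix` from the end of `name` if present."""
    return name[: -len(suffix)] if suffix and name.endswith(suffix) else name


def unprocessed(inputs, outputs, in_suffix, out_suffix):
    """Return the input paths that have no matching output.

    Inputs and outputs are matched on basename after removing their
    respective suffixes (assumed naming scheme, for illustration only).
    """
    done = {strip_suffix(Path(p).name, out_suffix) for p in outputs}
    return sorted(
        p for p in inputs
        if strip_suffix(Path(p).name, in_suffix) not in done
    )
```

For example, with inputs ending in `.fastq.gz` and outputs ending in `.bam`, calling `unprocessed(inputs, outputs, ".fastq.gz", ".bam")` lists the inputs that still need to be run.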
UMI extraction from 10X visium spatial transcriptome data
Hello everyone, I have to analyse Visium spatial transcriptome (ST) sequencing data (2 × 150 bp). I want to extract the spatial barcode and UMI from Read1 in order to reduce the Read1 length from 150 bp to 28 bp (16 bp spatial…
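A minimal sketch of the slicing described here is shown below. The 16 bp barcode and the 28 bp target length come from the post; treating the remaining 12 bases (28 minus 16) as the UMI is an assumption, because the excerpt is cut off before it states the UMI length. The function name `split_read1` is made up for illustration:

```python
def split_read1(seq, qual, barcode_len=16, keep_len=28):
    """Split a Read1 record into barcode, UMI, and the trimmed read.

    Assumes the barcode occupies the first `barcode_len` bases and the
    UMI fills the rest of the first `keep_len` bases (an assumption).
    Returns (barcode, umi, trimmed_seq, trimmed_qual).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if len(seq) < keep_len:
        raise ValueError("read is shorter than the requested length")
    barcode = seq[:barcode_len]
    umi = seq[barcode_len:keep_len]
    # Quality string is trimmed to the same length as the sequence.
    return barcode, umi, seq[:keep_len], qual[:keep_len]
```

Trimming the quality string together with the sequence keeps each FASTQ record valid after the read is shortened.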
How to handle VCFs from the same sample but using different aligners and variant callers?
Hi, I’m using whole-exome sequencing (WES) for somatic variant calling. During the process, I tried to follow the approach described here: pubmed.ncbi.nlm.nih.gov/28420412/ Basically my workflow is as follows: FASTQ preprocessing: Using 2 aligners (BWA-MEM, Bowtie2) BAM calibration Variant calling: Using 3 software (Mutect2, Strelka2, Lancet) Variant filtering: I keep just…