Tag: fastq

UCL-BLIC/rnaseq – Giters

Introduction UCL-BLIC/rnaseq is a bioinformatics analysis pipeline used for RNA sequencing data, modified to add kallisto. The workflow processes raw data from FastQ inputs (FastQC, Trim Galore!), aligns the reads (STAR or HiSAT2), generates gene counts (featureCounts, StringTie) as well as kallisto abundance files, and performs extensive quality-control on the…

Confusion regarding manual inclusion of read group information from fastq files

I have recently received a collection of paired-end fastq files (WES) from our collaborators. I am following the GATK best practices workflow. I have completed the alignment, sorting&indexing step and generated a list of bam files. However, upon further inspection, I found out that the bam files do not have…

How to assess structural variation in your genome, and identify jumping transposons

Prerequisites. Data: an annotated genome, long reads, repeat annotation. Software: minimap2, samtools, bedtools (for comparisons only), tabix (for visualization only). Installation: /work/gif/remkv6/USDA/04_TEJumper; conda create -n svim_env --channel bioconda svim; source activate svim_env. Map your long reads to your genome with minimap. My directory locale…

vanheeringen-lab/seq2science – Giters

Seq2science is the attempt of the van heeringen lab to generate a collection of generic pipelines/workflows which can be used by complete beginners to bioinformatics and experienced bioinformaticians alike. Please take a look at our docs for help with installation, how to run it, and best practices. Our supported workflows:…

Nebula Genomics Black Friday & Cyber Monday Deal: Save $100!

Nebula Genomics has a Black Friday and Cyber Monday deal! DNA tests make great holiday gifts and the biggest sale of the year is here just in time for Black Friday and Cyber Monday! Save $100 on the 30x Deep Test Kit on 11/26 -11/29. Also now through November 30,…

STAR alignment in a full directory

STAR alignment in a full directory 0 Hi! I’m working with STAR and I would like to align multiple files, but separately. I have file pairs like these two: Dros_01_S48_L001_R1_001.fastq.gz Dros_01_S48_L001_R2_001.fastq.gz etc. Only the S[number] and the R1 and R2 change in the names. I have a code, what I…
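
One way to approach the pairing described in this question, sketched in Python. This is a minimal illustration, not the poster's code: the helper names `pair_fastqs` and `star_commands`, the `genome_index` directory, and the thread count are all hypothetical, and the file-name pattern is assumed from the two example names in the excerpt. The functions only build the STAR argument lists; running them is left to the reader.

```python
import re
from pathlib import Path

# Assumed naming scheme, taken from the example: <sample>_R1_001.fastq.gz / <sample>_R2_001.fastq.gz
PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])_001\.fastq\.gz$")


def pair_fastqs(filenames):
    """Group file names by sample and keep only samples that have both R1 and R2."""
    found = {}
    for name in sorted(filenames):
        m = PATTERN.match(Path(name).name)
        if m:
            found.setdefault(m["sample"], {})[m["read"]] = name
    return {s: (r["1"], r["2"]) for s, r in found.items() if len(r) == 2}


def star_commands(filenames, genome_dir="genome_index", threads=4):
    """Build one STAR argument list per sample; nothing is executed here."""
    cmds = []
    for sample, (r1, r2) in pair_fastqs(filenames).items():
        cmds.append([
            "STAR", "--runThreadN", str(threads),
            "--genomeDir", genome_dir,       # hypothetical index location
            "--readFilesIn", r1, r2,
            "--readFilesCommand", "zcat",    # inputs are gzip-compressed
            "--outFileNamePrefix", f"{sample}_",
        ])
    return cmds
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`, so every pair is aligned in its own STAR call with its own output prefix.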

Bioinformatics Scientist at Infectious Disease Institute

IDI seeks to hire a Bioinformatics Scientist (BS) for the centre. The BS will be a full-time staff member who is familiar with the application of computational and biotechnology capabilities to biomedical and public health problems like genetics, clinical and medical research, as well as other data intensive analyses. By coordinating…

#1000577 – abyss: autopkgtest regression on armhf and i386: Floating point exception

#1000577 – abyss: autopkgtest regression on armhf and i386: Floating point exception – Debian Bug report logs. Report forwarded to debian-bugs-dist@lists.debian.org, debian-ci@lists.debian.org, Debian Med Packaging Team <debian-med-packaging@lists.alioth.debian.org>: Bug#1000577; Package src:abyss. (Thu, 25 Nov 2021 10:24:03 GMT) Acknowledgement sent…

No cell names (colnames) names present in the input matrix

CreateSeuratObject: No cell names (colnames) names present in the input matrix 2 I did scRNA-seq using patient PBMC, 4 biological replicates. Each PBMC was tagged with 4 different types of hashtag oligos and subjected to multiplexing. Using fastq files, cellranger multi was performed and 4 matrices were generated. Using Seurat,…

FastQC Error Problem

FastQC Error Problem 3 I would really like to use FastQC for my project but am getting the following error message when I try to run it on my Ubuntu server 15.04: bio@ubuntu:~$ fastqc & [1] 716 rafay@ubuntu:~$ Exception in thread “main” java.awt.HeadlessException: No X11 DISPLAY variable was set, but this…

empty fastq files created by docker bcl2fastq2 v2.20 OSX

empty fastq files created by docker bcl2fastq2 v2.20 OSX 0 Hi, I installed the following Docker image: REPOSITORY TAG IMAGE ID CREATED SIZE zymoresearch/bcl2fastq latest 037f216c2523 13 months ago 117MB, and ran the following command: docker run -d --name bcl2fastq -v /Volumes/Aura2/bcl_NU/170720_NB501488_0132_AH5V32BGX3:/mnt/run -v /Volumes/Aura2/output:/mnt/out zymoresearch/bcl2fastq:2.20 -R /mnt/run -o /mnt/out/Data/Intensities/BaseCalls/Alignment_1 --barcode-mismatches…

how to remove host reads from other microbe reads

Hi wonderful people, I have been analyzing data from a paper, where I have used GEM3Mapper to get SAM files. Now, I have to remove host reads. I got two pieces of advice from Biostars: $ samtools view -buSh -f 4 x.sam | samtools fastq - | cat ->…

Trimming only custom adapter sequences

If you’ve already made a custom FASTA file for your adapters, can you post it, or a portion of it? Another question: do you get any output at all from the command you supplied? Your command looks OK as far as I can tell. Perhaps it doesn’t like your supplied…

Bowtie2 alignment using a large index (*.bt2l)

Bowtie2 alignment using a large index (*.bt2l) 0 Hello! I have used Bowtie2 successfully in the past to index and align reads in R. I currently have a job that requires using a large index. bowtie2_build returns a successful index, but it has the unique bt2l extension (as mentioned in bowtie2…

Gzip output of fasterq-dump

Gzip output of fasterq-dump 0 Hello everyone, I have always used fastq-dump to download raw data from the SRA, with the caveat that it was very slow. I recently switched to fasterq-dump, which is great in terms of speed, but its inability to gzip the fastq files on the fly…

Retrieve and count variable barcodes from pooled sequencing fastq

Retrieve and count variable barcodes from pooled sequencing fastq 3 I have a large fastq file with 100-base reads from a pooled barcoding experiment. This is not data I generated so I have limited options. The barcodes are 21-mer and there are up to 100,000 different barcodes in the FASTQ…

How could I generate a gi_taxid_nucl.dmp file similar to the one previously hosted by NCBI?

How could I generate a gi_taxid_nucl.dmp file similar to the one previously hosted by NCBI? 0 Background. I’m trying to use a tool called centrifuge to identify potential genus and species in a given set of FASTQ files. It works with their provided indices, but these indices are out of…

How can I produce gene level quantification using Salmon pseudo-aligner?

How can I produce gene level quantification using Salmon pseudo-aligner? 3 Hi! I am using Salmon to perform pseudo-alignment on paired-end RNA-seq data. I want gene-level quantification, but I obtain files with transcript quantification. Command line used: salmon quant -i Transcriptome_GH38_release_92/Homo_sapiens.GRCh38.92.cdna.ncrna.fa_quasi_index/ -l A…

Error: I have no name! occurs when trying to run ~$ sudo docker run --volume – General Discussions

Hi, I’m a new user to Linux and Docker. My base OS is Linux Ubuntu 18.04.6 LTS. I have a file that I want to analyse using Docker with various programs. First I created a Dockerfile using ~$ sudo docker build -t nano_tools_debian (the # comments are just for me giving myself some…

query on filtering/dropping samples based on total reads (yield)

query on filtering/dropping samples based on total reads (yield) 0 Hi, This must be a pretty straightforward question, but just wanted the views of the community. We have sequenced (paired-end) multiple samples in a target enrichment experiment, and the samples have variable number of total reads. What should be the…

Trim 100bp PE sequencing to 50bp reads

Trim 100bp PE sequencing to 50bp reads 2 Hello, we’re doing some QC for future sequencing and want to have an empirical comparison of 100bp SE reads with 50bp PE reads. Starting with 100bp PE reads, how can I trim the fastq file to the first 50 bases? (i.e. retain…
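
A minimal Python sketch of the trimming asked about here, keeping only the first 50 bases (and the matching quality characters) of every read. It assumes plain four-line FASTQ records with no wrapped sequence lines; the function name `trim_fastq` and its parameters are hypothetical, not taken from the thread.

```python
import gzip


def trim_fastq(in_path, out_path, length=50):
    """Copy a FASTQ file, cutting each read and its quality string to `length` bases."""
    # Use gzip transparently when a path ends in .gz
    open_in = gzip.open if str(in_path).endswith(".gz") else open
    open_out = gzip.open if str(out_path).endswith(".gz") else open
    with open_in(in_path, "rt") as src, open_out(out_path, "wt") as dst:
        while True:
            header = src.readline()
            if not header:          # end of file
                break
            seq = src.readline().rstrip("\n")
            plus = src.readline()
            qual = src.readline().rstrip("\n")
            dst.write(header)
            dst.write(seq[:length] + "\n")
            dst.write(plus)
            dst.write(qual[:length] + "\n")
```

Because no reads are dropped, running the function once on the R1 file and once on the R2 file should leave the two outputs in the same read order.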

Pile up reads without reference?

Pile up reads without reference? 1 Hi everyone, I have multiple groups of reads (count < 1000) which are derived from a number of novel junctions, with each read containing a minimum ~10bp overhang on the other side (i.e., exon). Because the number of random links between exon-exon can be quite large…

STAR for mouse RNA-seq alignment, which parameters should I use besides the basic ones in practice?

STAR for mouse RNA-seq alignment, which parameters should I use besides the basic ones in practice? 0 Hi Currently I have pair-end mouse RNA-seq data. 3 Tissues with 17 samples/tissue. Length of the sequence is from 100bp to 150bp. My goal is to perform DEG after the alignment. This is…

Weird output of geneCount for STAR?

Weird output of geneCount for STAR? 1 Hi I’m using STAR to perform alignment and my goal is to do the DEG analysis in the future. The parameters I set up are as follows: Step 1: STAR --runThreadN 12 --runMode genomeGenerate --genomeDir genomedir --genomeFastaFiles ./ref/GRCm39.primary_assembly.genome.fa --sjdbGTFfile ./ref/gencode.vM27.primary_assembly.annotation.gtf --sjdbOverhang 100 Step 2:…

How to prepare HiCUP output as input for HOMER?

HiC analysis: How to prepare HiCUP output as input for HOMER? 0 Hi everyone, I am new to HiC and using HiCUP to analyse HiC data (I will try other pipelines and combination of software eventually). /tools/hicup --zip --bowtie2 /tools/bowtie2-2.3.4.1-linux-x86_64/bowtie2 --index /hg38/hg38 --digest /hg38/Digest_hg38_HindIII_None_15-12-15_27-10-2021.txt R1.fastq R2.fastq For now, I aligned…

Error after STAR mapping

Error after STAR mapping 0 Hi, I’m doing the STAR mapping, but I get the bam files with some problems. When I use the command samtools flagstat SRR7195620_2.fastq.gz_Aligned.sortedByCoord.out.bam to see the details of the bam file, it shows this: 3266075 + 0 in total (QC-passed reads + QC-failed reads) 1044500 + 0…

Using STAR for mouse RNA-seq alignment, which parameters of STAR besides the basic ones would be better to use in practice?

Using STAR for mouse RNA-seq alignment, which parameters of STAR besides the basic ones would be better to use in practice? 0 Hi Currently I have pair-end mouse RNA-seq data. 3 Tissues with 17 samples/tissue. Length of the sequence is from 100bp to 150bp. My goal is to perform DEG…

GATK4 stripping header from .bam???? What the heck? : bioinformatics

Hi all. I have a problem. Code posted below for those who want to take a look. I have a series of 167 .bam files I need to variant call for my project. Aside from them being an absolute nightmare to work with on other grounds, a new problem has…

How to demultiplex a single indexed library on a dual indexed flow cell?

How to demultiplex a single indexed library on a dual indexed flow cell? 0 Hello, I need to analyze data from single cell RNA seq. I normally use cell ranger mkfastq to make the fastq files. However, this time I need to make fastq files for a project that contains…

BWA MEM after trimming

BWA MEM after trimming 0 Hi there, I have WES data. The read length is 35-101 bp. If I trim the sequences, I get sequences of length 15-70. I trimmed and ran the ILLUMINACLIP step with Trimmomatic. The question is as follows: I should get a mean coverage of 100X, (based…

Read coverage for specific coordinates

Read coverage for specific coordinates 0 Hi all, I’ve N samples of Stomach Cancer having good read depth and I have run rMATS and DEXSeq on them. I have got the results successfully. I have a test set (30 samples) for which read depth is not good, it’s around 1…

Velvet – ins_length auto

Hi all, I have obtained two Illumina MiSeq 2×75 paired-end read files, one forward and one reverse: oenopla-reads1.fastq & oenopla-reads2.fastq. Then I performed genome assembly using Velvet. I found that a k-mer of 67 produces the best N50 and maximum length. Since I do not know the insert length, I declared…

Multi-fasta file for gffread

Multi-fasta file for gffread 0 Hey Guys, I’m having a problem trying to extract the transcripts from a merged StringTie .gtf file with gffread. I have downloaded the cDNA fastq file from ENSEMBL and tried to run the following command: gffread -w transcripts.fa -g /path/to/genome.fa transcripts.gtf However I’m getting the…

Empty BAM file in DANPOS3

Error: Empty BAM file in DANPOS3 0 Hi everyone, I am currently trying to run the command in DANPOS3 – which is a software to analyse nucleosome positions and call peaks. This is the command $python3 danpos.py dpos <filename.bam> [optional parameters] The bam file I am using was created using…

samtools – How to analyze IGV alignment

I’m working on a project where I am analyzing the performance of an alignment workflow. My goal is to find regions in the resulting BAM file where there are outstanding discrepancies or anything that indicates my assembly/alignment has “mistakes”. My workflow so far: input paired end FASTQ files (human) into…

Interploidy gene flow involving the sexual-asexual cycle facilitates the diversification of gynogenetic triploid Carassius fish

1. Muller, H. J. The relation of recombination to mutational advance. Mutat. Res. Mol. Mech. Mutagen. 1, 2–9 (1964). 2. Maynard Smith, J. The Evolution of Sex (Cambridge University Press, 1978). 3. Avise, J. C. Clonality (Oxford University Press, 2008). 4. Hamilton, W. D.,…

Add Cigar string and Template Length to Read Name

Add Cigar string and Template Length to Read Name 1 Hi all, I need to convert a BAM file to Fastq format, but I don’t want to lose the Cigar and TLen information. My idea is to edit each read name in the BAM file, by appending both Cigar and…

ONT MinION reads demultiplexing

ONT MinION reads demultiplexing 0 Dear All, Peace be with you. I am a beginner with ONT MinION. We ran an experiment and unfortunately selected the wrong kit ID, which resulted in unclassified data (not separated on the basis of barcodes, though we used them). Now we have some…

Drug resistance mutations & genetic diversity of HIV1 in ART

Introduction At the end of 2018, 19.5 million people living with HIV were accessing ART globally.1 However, virological failure and development of drug resistance are becoming a bottleneck for the success of ART program. A global study involving 36 countries and 1926 patients with treatment failure from 1998–2015 reported that…

How to convert base called fast5 to fastq

How to convert base called fast5 to fastq 0 Hi, I received some basecalled fast5 files from a company, and want to convert them to fastq. I tried poretools, but it doesn’t work. I am currently using guppy to do so with the following command: guppy_basecaller --flowcell FLO-MIN106 --kit SQK-RAD004 -r -i…

Using python FlashText to do pattern matching in nucleotide sequences

Using python FlashText to do pattern matching in nucleotide sequences 0 Hi all, I’m playing with the idea of using FlashText (instead of RegEx) to do some pattern finding in nucleotide sequences. My idea came from the massive speed up seen in the post below: dev.to/vi3k6i5/regex-was-taking-5-days-to-run-so-i-built-a-tool-that-did-it-in-15-minutes-c98?ref=codebldr My basic idea is…

very low mapping rate of polysome profiling seq

very low mapping rate of polysome profiling seq 0 When I use sequencing data from SRR7695423, first I use fastp to trim raw reads: fastp -i r1_fastq.gz -I r2_fastq.gz -o r1_trimmed_fastq.gz -O r2_trimmed_fastq.gz Then I use STAR to align: STAR --runThreadN 20 --genomeDir index --readFilesIn r1_trimmed_fastq.gz r2_trimmed_fastq.gz --readFilesCommand 'zcat' --outFileNamePrefix…

Purge duplicates from hifiasm assembly v1.0 (HiFi genome assembly stage 3)

0. HiFiASM 1o assembly: primary assembly input from hifiasm workflow, in FASTA format. 1. HiFi reads as FASTQ: raw HiFi reads input, in FASTQ format. 2. Step 1. Base-level coverage: toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy3 3. Step 1. Run minimap2 to align pacbio data and generate paf files: toolshed.g2.bx.psu.edu/repos/iuc/minimap2/minimap2/2.20+galaxy2 4. Step 1. Split an…

Trimmomatic failing to cut known adapter

Hi all, Usually get on well with Trimmomatic but am having issues removing adapters from some paired-end RNA-seq data. Here’s my process so far: 1) Put together a PE adapter fasta for the adapters described by the sequencing company; ran trimmomatic; no reads dropped. 2) Ran Trim_galore! to automatically detect…

How to remove nextera with fastp

How to remove nextera with fastp 0 When I run fastp with fastp -i in.R1.fastq.gz -o out.R1.fastq.gz -I in.R2.fastq.gz -O out.R2.fastq.gz --detect_adapter_for_pe --disable_quality_filtering --length_required 15 Fastq still detects Nextera transposase sequences in the output. Is there a way to remove them with fastp?

file size of pair-end sequencing fastq

file size of pair-end sequencing fastq 1 Hello everyone! Do you think I should expect R1 and R2 fastq files to have similar, or even equal, size? I mean the length of the file measured in bytes. Thanks!

3′ bias on polyA RNAseq goes wrong!

Hi! I’m checking the output from gene_bodyCoverage.py using a bed file with housekeeping genes. We sequenced some samples using a polyA capture library (all in the same kit, same sequencing run), so I would expect a 3′ bias. However, the output I got was somewhat mixed – I see some…

Error normalization process Trinity

Error normalization process Trinity 0 Hello, I hope everyone is well. I have a question about this error in the normalization of the data. I use this command: ./Trinity --seqType fq --left /home/danielurrea/Escritorio/Paula_Manuela/SRR7942798_1.fastq --right /home/danielurrea/Escritorio/Paula_Manuela/SRR7942798_2.fastq --CPU 6 --max_memory 8G The error is here. Error, no reads made it to the…

alignment using bowtie2

alignment using bowtie2 1 Hello everyone, Just a quick question on alignment using the current version of bowtie2. I am doing alignment for the sequence reads indicated below (30.fastq). However, I am getting the warning below and could not figure out what it means. Does that mean I should not give name…

why do i get this gnu parallel warning (Trimmomatic/Bowtie2)

why do i get this gnu parallel warning (Trimmomatic/Bowtie2) 0 I am trying to use Trimmomatic and Bowtie2 on a sample (metagenomic) to do some quality filtering and contaminant screening, the sample has been stitched. when I try to run the parallel command on my stitched read I receive this…

Why does featurecounts give me an output file with only 0s?

Why does featurecounts give me an output file with only 0s? 0 Hello, I’m trying to run featurecounts on my .bam files, but the resulting file yields only 0s in every row and column. Here are the steps I have taken so far: (de novo) Assembled 40 transcripts from RNASeq…

Malformed walker argument using MarkDuplicatesSpark

Malformed walker argument using MarkDuplicatesSpark 1 I am creating my own NGS pipeline from illumina-fastq file to vcf. This is for pure learning purposes. When I run the following code everything is ok: java -Xmx4000m "$javatmp" -jar "$picardpath" SortSam INPUT=/home/mdb1c20/my_onw_NGS_pipeline/files/sam/1.sam OUTPUT=/home/mdb1c20/my_onw_NGS_pipeline/files/bam/1_sorted.bam SORT_ORDER=coordinate COMPRESSION_LEVEL=5 java -Xmx4000m "$javatmp" -jar "$picardpath" MarkDuplicates INPUT=/home/mdb1c20/my_onw_NGS_pipeline/files/bam/1_sorted.bam…

How does the DNA-seq machine know which strand that is forward/reverse?

How does the DNA-seq machine know which strand that is forward/reverse? 1 Hi! As the title says, how does the sequencing machine know which strand is the forward/reverse? Let’s say I have a fastq file of a WGS, is that presented in a single strand with the forward strand followed…

In the NGS pipeline, why read are sorted before marking duplicates?

In the NGS pipeline, why read are sorted before marking duplicates? 0 I am creating my own NGS pipeline (from Illumina fastq to vcf file). I am using best practices GATK and the pipeline already created in the clinical lab I am working. I have seen that the fastq is…

Single cell RNA-seq analysis

Single cell RNA-seq analysis 2 Hi guys, I am new to the biological field and to single cell RNA-seq analysis, and I have no idea how to start. I am going to analyse single cell RNA-seq data, which is in fastq format, to do clustering and get gene expression levels,…

how to determine a fastq is phred+33 or phred+64

how to determine a fastq is phred+33 or phred+64 1 Hello everyone! here is a simple question: If I got a fastq from public database, is it possible to determine whether its quality score is phred + 33 or phred + 64 based on the file itself? Thanks! Pei RNA-seq…
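
A common heuristic for this question looks at the range of ASCII codes on the quality lines: characters below `;` (code 59) cannot occur in the +64 encodings, while a file with nothing below `@` (code 64) and some characters above `J` (code 74) is probably Phred+64. The sketch below applies that rule. It is only a guess from the data, since some Phred+33 files also contain high quality characters and a narrow range fits both encodings; the function name `guess_phred_offset` is hypothetical.

```python
def guess_phred_offset(quality_lines):
    """Return 33, 64, or None (ambiguous) from the range of quality characters seen."""
    lo, hi = 255, 0
    for line in quality_lines:
        codes = [ord(c) for c in line.strip()]
        if codes:
            lo = min(lo, min(codes))
            hi = max(hi, max(codes))
    if hi == 0:
        return None   # no quality characters were seen
    if lo < 59:
        return 33     # characters below ';' are impossible in +64 encodings
    if lo >= 64 and hi > 74:
        return 64     # nothing below '@' and values above 'J': likely +64
    return None       # the observed range is compatible with both
```

In practice the function would be fed every fourth line of the FASTQ file, for a few thousand reads, so that the full range of scores has a chance to show up.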

a primer in overrepresented sequences of RNAseq data

a primer in overrepresented sequences of RNAseq data 0 Hello I got pair-end RNA sequence data in Novaseq 6000, using SMARTer Stranded RNA library by outsourcing. I checked the quality by fastQC, then found below overrepresented sequence in only reverse fastq. GCCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTACGCGTTAGTGTAG Illumina Single End PCR Primer 1 (97% over…

Sending Fasta files : bioinformatics

What is the source of your fasta file? If it’s your personal data from sequencing I would not give it out without ensuring they are keeping anonymity and proper ethical considerations, like who else will this file be shared with and how. Many publications have GEO datasets or SRA archives…

convert multiple files simultaneously using samtools

convert multiple files simultaneously using samtools 1 Hello everyone, I am new to bioinformatics. I have several files in BAM format and I want to convert them to fastq using samtools. Is there any way to convert them all at once? I tried : samtools fastq * .bam > *…
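
The attempted command cannot work as written, because `samtools fastq` is being given a wildcard on both the input and the output side. One possible workaround, sketched in Python under the assumption that each BAM should become one FASTQ file of the same base name: loop over the files and call samtools once per file. The helper names `plan_conversions` and `run_conversions` are hypothetical.

```python
import subprocess
from pathlib import Path


def plan_conversions(bam_dir, out_dir):
    """Return (command, output_path) pairs, one per *.bam file found in bam_dir."""
    out_dir = Path(out_dir)
    jobs = []
    for bam in sorted(Path(bam_dir).glob("*.bam")):
        out = out_dir / (bam.stem + ".fastq")
        jobs.append((["samtools", "fastq", str(bam)], out))
    return jobs


def run_conversions(bam_dir, out_dir):
    """Run samtools once per BAM, redirecting its standard output to the FASTQ file."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd, out in plan_conversions(bam_dir, out_dir):
        with open(out, "wb") as fh:
            subprocess.run(cmd, stdout=fh, check=True)
```

For paired-end data it may be preferable to write the two mates to separate files with the `-1` and `-2` options of `samtools fastq` rather than relying on standard output.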

Snakemake Megahit error

Snakemake Megahit error 0 Hello everyone! A few days ago I started using Snakemake for the first time. I am having an issue when I am trying to run the megahit rule in my pipeline. It gives me the following error “Outputs of incorrect type (directories when expecting files or…

Developing my own NGS pipeline

Developing my own NGS pipeline 1 I am a trainee bioinformatician working in a genomics lab. For learning purposes I want to develop my own NGS pipeline (from fastq file to VCF file). It would be great if someone could please pass me links where I can step by step…

rna seq – Snakemake Fastqc: “SyntaxError in line [#] of [working directory]/Snakefile: Multiple run or shell keywords in rule run_fastqc.”

I am trying to check the quality of RNA-Seq data from Illumina using fastqc in snakemake in a conda environment. I get the error “Multiple run or shell keywords in rule run_fastqc”. Snakefile: rule run_fastqc: input: "/[RNA fastq file path]/well01_S001_L001_R1_001.fastq.gz" output: "./[out path]/well01_S001_L001_R1_001.fastqc.html", "./[out path]/well01_S001_L001_R1_001.fastqc.zip" wrapper: "0.79.0/bio/fastqc" shell: "fastqc {input}…

Error showing Unable to determine input files in Trimmomatic

Error showing Unable to determine input files in Trimmomatic 0 Hello all, I am trying to run Trimmomatic on paired fastq files. Can anyone please explain why it is unable to determine the input files? Fastq files: SO_5492_LR_60A_BR_01_R1.fastq SO_5492_LR_60A_BR_01_R2.fastq This is the command I used: Trimmomatic PE -threads 30 -basein SO_5492_LR_60A_BR_01_R1.fastq…

Continue Reading Error showing Unable to determine input files in Trimmomatic

Help understanding “sed” command in a loop

Hi everyone, I have 2 questions: 1) I have found this script online to run Kraken2 in a loop on paired ends. Although I know it works well, because I have compared the results with another loop I have, I am not…

MinKNOW (guppy) not having the permission to basecall or demultiplex from a different location

Hello everyone. I am still quite new at bioinformatics and I am sorry if this seems like a silly question. I used Nanopore MinION to sequence a genome, and now I am trying to sequence another genome and I launched MinKNOW for the first time since March. As a little…

Picard vs Samtools converting CRAM to FASTQ

I need to convert my CRAM files to FASTQ to complete an analysis. I have been trying to do this via GATK and Picard, but I have repeatedly been getting an “out of memory” error even as I have increased allocated memory…

SPAdes did not assemble the genome completely

Hi everyone. My goal is to assemble the SARS-CoV-2 genome from forward and reverse FASTQ reads. I have used the SPAdes tool, and the best result I managed to get is a FASTA with a bunch of scaffolds, namely 38 pieces. What…

How can I load numerous files from a config file in Snakemake? Is it worth it ?

Hello there! A few days ago I started using Snakemake for the first time. Mainly I want to use fasterq-dump to download a large number of files from NCBI and I do…

Data QC

Data QC step, can run alone or as part of a combined workflow for large genome assembly. What it does: Reports statistics from sequencing reads. Inputs: long reads (fastq.gz format), short reads (R1 and R2) (fastq.gz format). Outputs: For long reads: a nanoplot report (the HTML report summarizes all the…

Demultiplexing nanopore reads

Hey everyone! My problem is as follows: I got many FAST5 files from a MinION, then I ran guppy to basecall them (with default parameters on GPU). As a result I got FastQ files. Then I read that there are a lot of ways to demultiplex my reads. And I…

Trim and filter reads – fastp

Trim and filter reads; can run alone or as part of a combined workflow for large genome assembly. What it does: Trims and filters raw sequence reads according to specified settings. Inputs: Long reads (format fastq); Short reads R1 and R2 (format fastq) Outputs: Trimmed and filtered reads: fastp_filtered_long_reads.fastq.gz (But…

From sample to appropriate tool

Hello guys, my question will probably sound strange, but it is important to me. Suppose you are making or applying a metagenomic pipeline to identify the taxonomy of the samples you have, in my case (usually) viral samples. Of course, knowing which technique has…

running trinity align_and_estimate_abundance.pl on multiple files

Hello, I am fairly new to comp bio. This is a novice question, and I’d appreciate any advice (or points in the right direction to get the info I need). I am attempting to run Trinity’s align_and_estimate_abundance on multiple libraries. I have paired-end…

Salmon quants command keeps on getting killed

Hi all! I’m a wet-lab guy quite new to data analysis and would appreciate some help if possible! Currently I’m trying to run Salmon quants on my desktop, using the Ubuntu terminal for this, and it seems that my computer is unable…

Could you please tell me how to download genome file from ncbi?

Hi all. I would like to use this genome file of Aphelenchus avenae, so I tried to download it from here with reference to this article. But it seems that I can’t download it from this page….

HISAT2 no properly paired alignments

Hi All! I’m a wet-lab guy quite new to data analysis and would appreciate some help if possible! Slowly I’m getting into the command line and understanding some of the workflow behind analysis but I’ve hit a bit of a wall. Following hisat2-build on the human…

Dryad Data — A revised classification of Glossopetalon (Crossosomataceae) based on restriction site-associated DNA sequencing

Glossopetalon inhabits arid regions in the American west and northern Mexico on limestone substrates. The genus comprises four species: G. clokeyi; G. pungens; G. texense; and G. spinescens. Three of the species are narrow endemics. The fourth, G. spinescens, is a widespread species with six…

Different output when generating step by step airway dataset

rnaseqGene: I have been following the workflow ‘RNA-seq workflow: gene-level exploratory analysis and differential expression’ and I have a problem: I tried to generate step by step the airway dataset, downloading the fastq…

CUT&RUN pipeline – zsh: command not found: sbatch (MacOS terminal)

Dear all, I apologise in advance if this issue has been addressed before and I just did not find it in here. Also, I am not a bioinformatician/programmer/computer person at all… I have been…

Failed to create symbolic link. Trinity

This error appears if my output file is at /mnt/…(drive). But if the output file is in the home directory, Trinity runs well. Another strange thing: when I use “sudo”, the program cannot find salmon, but without it, it runs well….

How to append two fastq files ?

Dear Biostars, How can I append sequences of two fastq files? Suppose we have two fastq files:

file1.fastq

    @HEADER
    CTCAGNTTGG
    +
    AAAAA#EEEE
    @HEADER
    GTGAGTTTAG
    +
    AA<AA#EE<E

file2.fastq

    @HEADER
    CTTTA
    +
    #EEEA
    @HEADER
    GTGAG
    +
    A#E<E

result.fastq = append file2.fastq to file1.fastq…
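
The expected result is cut off in the excerpt, so the following is only a sketch under one possible reading: that "append" means joining the records pairwise, so that read N of file2 is glued onto the end of read N of file1 (sequence and quality alike). The function names are hypothetical, and the parser assumes strict four-line FASTQ records. If plain file concatenation is what is wanted, writing one file after the other is enough.

```python
def read_fastq(lines):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it)
        next(it)  # skip the '+' separator line
        qual = next(it)
        yield header, seq, qual


def append_records(first, second):
    """Join record N of `second` onto record N of `first`, keeping the first header."""
    out = []
    for (h1, s1, q1), (_h2, s2, q2) in zip(read_fastq(first), read_fastq(second)):
        out.extend([h1, s1 + s2, "+", q1 + q2])
    return out
```

Both arguments can be open file handles or lists of lines. Pairing stops at the shorter input, so files with different record counts would need an explicit check.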

Is there any reason to discard singletons after de-hosting metagenomic reads?

I am double de-hosting metagenomic reads and am wondering if there is any reason to discard singletons after both alignment steps? The workflow is as follows: Map to host 1 Convert to fastq and repair.sh to resort Map…

Assigning variables programmatically for bwa-mem

I have the following script: bwa mem -t 10 -R "@RG\tID:xxx\tSM:xxxx\tLB:LB-1\tPU:xxx\tPL:ILLUMINA" ref_genome.fa sample_1_1.fastq sample_1_2.fastq | samtools view -@ 10 -b - | samtools sort -@ 10 -o sample_1.bam I also have a spreadsheet with a column for the forward reads (sample 1, sample…
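
One way to fill in the per-sample values is to export the spreadsheet as CSV and generate one command line per row. The sketch below is a rough illustration, not the poster's solution: the column names ("sample", "forward", "reverse"), the helper names, and the choice to reuse the sample name for the ID, SM and PU fields are all assumptions. The read group is built with the escaped "\t" form that bwa expects on the command line.

```python
import csv
import shlex


def build_pipeline(sample, fwd, rev, ref="ref_genome.fa", threads=10):
    """Return the bwa mem | samtools view | samtools sort command for one sample."""
    # Assumption: the sample name doubles as ID, SM and PU; adjust to your data.
    rg = "@RG\\tID:{0}\\tSM:{0}\\tLB:LB-1\\tPU:{0}\\tPL:ILLUMINA".format(sample)
    return (
        f"bwa mem -t {threads} -R {shlex.quote(rg)} "
        f"{shlex.quote(ref)} {shlex.quote(fwd)} {shlex.quote(rev)} "
        f"| samtools view -@ {threads} -b - "
        f"| samtools sort -@ {threads} -o {shlex.quote(sample + '.bam')}"
    )


def commands_from_sheet(handle):
    """Read a CSV with (hypothetical) sample/forward/reverse columns."""
    return [
        build_pipeline(row["sample"], row["forward"], row["reverse"])
        for row in csv.DictReader(handle)
    ]
```

The resulting strings can be written to a shell script for review before anything is run, which keeps the generated commands easy to inspect.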

Edit the Fastq headers

Hello, I’m using a tool that produces a fastq file for every barcode, where each fastq file name contains the barcode number. I need to add the barcode number to all headers in the matching file. I used the command below but I…
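
Since the original command is cut off, here is a minimal sketch of one way to tag headers, under stated assumptions: the file names are assumed to contain "barcode" followed by digits (a hypothetical pattern), the records are assumed to be strict four-line FASTQ, and the " barcode=NN" suffix is just one possible header format. Selecting headers by position rather than by a leading "@" avoids touching quality lines that happen to start with "@".

```python
import re


def barcode_from_filename(name):
    """Extract the digits after 'barcode' from a file name, or None if absent."""
    match = re.search(r"barcode(\d+)", name)
    return match.group(1) if match else None


def tag_headers(lines, barcode):
    """Append ' barcode=<barcode>' to every header (first line of each 4-line record)."""
    out = []
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        if index % 4 == 0:
            line = f"{line} barcode={barcode}"
        out.append(line)
    return out
```

Applied per file, this would take the barcode from each file name and rewrite that file's headers, leaving sequence and quality lines unchanged.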

Compare genotype genome sequences at basepair level

I have recently explored various alternatives to a similar problem and came away with the following potential solutions: Solution 1 The “easiest” way to do this would be to generate a VCF variant file with a SNP calling tool, then transform that variant file into a tabular file with bcftools view….

STAR producing an empty BAM file

I’m trying to run STAR but I am getting an empty BAM file. Does anyone know why this is happening and how to fix it? iCount mapstar demultiplexed/demux_NNNGGCGNN.fastq.gz hs88 mapping_NNNGGCGNN > --annotation homo_sapiens.88.gtf.gz #for context, mapstar needs the following arguments reads, genome_index, out_dir…

Does Haplotypecaller of GATK find all the mutations?

Hi, I have some assembled sequences and know that some genes with specific mutations are present in them. However, when I go from fastq to bam format and then apply the HaplotypeCaller tool of GATK, very few of these genes are missing….

Should your variant calling pipeline change depending on whether using WES or WGS?

Hi, This might be an obvious question, but can you use the same pipeline for WES and WGS? They should all be in FASTQ files, but I just wanted to ask in case I was missing some…

troubleshooting benchmarking small variants: hap.py and rtg

Hi! I tried to do what other posts reported and I have a problem that I do not fully understand why … 1) I downloaded the fastq files from Garvan (ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/) with the bed file. I had to convert the bed file to hg38 (my_regions) … as I understand it…

Ncbi fastq-dump with multiple lanes and cellranger count

I’m currently trying to get fastq or fastq.gz files as input for cellranger count. I found those 7 SRA sites. For example, in www.ncbi.nlm.nih.gov/sra/SRX12574893%5Baccn], the original fastq files look like they contain multiple lanes and are paired-end (16 files): patient_A-1_S1_L003_R1_001.fastq.gz.1 patient_A-1_S1_L003_R2_001.fastq.gz.1 patient_A-2_S1_L004_R1_001.fastq.gz.1 patient_A-2_S1_L004_R2_001.fastq.gz.1 ……