Tag: fq.gz

invalid deflate data (invalid code lengths set)

I am trying to trim paired end reads using Trim-Galore. I have made sure that the files match based on the total reads processed in the output txt file from trim-galore. One of the files trimmed correctly but when I try some of the others the total written and quality…


Snakemake rule error

I have the following rule in snakemake: rule low_coverage_contig_reads: input: bam="data/processed/bam_files/bam/{sample}_{fraction}.bam.bai", output: r1="data/processed/clean_reads/low_cov/low_cov_{sample}_{fraction}_R1.fq.gz", r2="data/processed/clean_reads/low_cov/low_cov_{sample}_{fraction}_R2.fq.gz" threads: 8 params: bam="data/processed/bam_files/bam/{sample}_{fraction}.bam" log: log1="logs/{sample}_{fraction}_low_coverage_reads.log", shell: """ (samtools coverage {params.bam} | awk 'NR > 1 && $7 < 10 {{print $1}}' | tr '\\n' ' ' | samtools view -u {params.bam}…

r – Fst calculation from VCF files

I have four vcf files, SNPs_s1.vcf, SNPs_s2.vcf, SNPs_s3.vcf, and SNPs_s4.vcf, which contain information about SNPs. These vcf files were obtained by using the following methods: the initial input files were short-paired reads I did mapping with minimap2 ./minimap2 -ax sr ref.fa read1.fq.gz read2.fq.gz > aln.sam converted to bam file samtools…


Yes .. BBMap can do that!

NOTE: This collection was originally posted at SeqAnswers.com. Creating a copy here to preserve the information. Part I is available here: Yes .. BBMap can do that! – Part I : bbmap (aligner), bbduk (scan/trim), repair (fix PE reads) and reformat (format conversions). Part II is available here: Yes .. BBMap can…

bwa mem hangs after a few thousand reads

I am trying to align a bunch of paired sample fastq files using bwa mem. My original command was: bwa mem -t 8 hg38.fa sample_read1.fq.gz sample_read2.fq.gz > sample_paired.sam I am running this on a HPC cluster. These files have approx. 25 million reads, so I initially anticipated that they might…


low rate of ‘Successfully assigned alignments’

Hello everybody. I’m a newbie in RNA-seq Analysis, and I have this situation that I don’t really understand. While working with featureCounts for RNA-seq read quantification, I came across an intriguing issue. The rate of successfully assigned alignments turned out to be unexpectedly low, totalling just 15463270 (7.6%). This was…


not best k value found

I am running kmergenie to estimate the best k, but I end up with this error. DeprecationWarning: pkg_resources is deprecated as an API. See setuptools.pypa.io/en/latest/pkg_resources.html import pkg_resources running histogram estimation list of reads: /media/gyanlab/D/shailesh/Frogs/X201SC22082726-Z01-F002/01.RawData/Meg3/error_corrected/Meg3_read1.fq.gz /media/gyanlab/D/shailesh/Frogs/X201SC22082726-Z01-F002/01.RawData/Meg3/error_corrected/Meg3_read2.fq.gz Setting maximum kmer length to: 150 bp computing histograms…

jellyfish histo is empty

I ran Jellyfish > zcat V*_R1.fastp.fq.gz | jellyfish count -t 2 -C -m 21 -s 5G –quality-start=33 -o reads.jf > cat reads.jf 000001535{“alignment”:8,”canonical”:true,”cmdline”:[“count”,”-t”,”2″,”-C”,”-m”,”21″,”-s”,”5G”,”–quality-start=33″,”-o”,”reads.jf”],”counter_len”:4,”exe_path”:”/mnt/hpccs01/work/x/miniconda2/envs/jellyfish/bin/jellyfish”,”format”:”binary/sorted”,”hostname”:”cl5n014″,”key_len”:42,”matrix1″:{“c”:42,”columns”:[7114389000,2192333556,7375600293,3957177567,1753639420,5791419737,7660672264,3955008356,2531272710,4865888693,6095688361,6974692332,2128614574,2258809230,885814170,1703365744,5945426963,375951490,7217126865,6145592432,7418978358,2071372677,3810558711,7150470371,5456096994,5475478,7949231332,5100019741,6318414384,7141154869,3543742516,6024410566,5605745954,575704984,8124532068,1053004640,2675922405,1674046033,5535382637,1853889752,1425936082,3907867046],”identity”:false,”r”:33},”max_reprobe”:126,”pwd”:”/mnt/hpccs01/scratch/x/y/fastpDNA”,”reprobes”:[1,1,3,6,10,15,21,28,36,45,55,66,78,91,105,120,136,153,171,190,210,231,253,276,300,325,351,378,406,435,465,496,528,561,595,630,666,703,741,780,820,861,903,946,990,1035,1081,1128,1176,1225,1275,1326,1378,1431,1485,1540,1596,1653,1711,1770,1830,1891,1953,2016,2080,2145,2211,2278,2346,2415,2485,2556,2628,2701,2775,2850,2926,3003,3081,3160,3240,3321,3403,3486,3570,3655,3741,3828,3916,4005,4095,4186,4278,4371,4465,4560,4656,4753,4851,4950,5050,5151,5253,5356,5460,5565,5671,5778,5886,5995,6105,6216,6328,6441,6555,6670,6786,6903,7021,7140,7260,7381,7503,7626,7750,7875,8001],”size”:8589934592,”time”:”Wed Nov 22 14:46:48 2023″,”val_len”:7} However, when I ran jellyfish histo -t 2 reads.jf > reads.histo. The output is empty. What did I miss? Read more here: Source…


Jellyfish problem with Failed to open input file ‘reads.jf’

Hi, I can’t get jellyfish running with either of two methods: Method 1 jellyfish count -t 10 -C -m 21 -s 10G --quality-start=33 -o reads.jf <(zcat V350181330_L03_R1.fastp.fq.gz) <(zcat V350181330_L04_R1.fastp.fq.gz) 26897 Killed jellyfish count -t 10 -C -m 21 -s 10G --quality-start=33…

Efficient Bulk Data Retrieval from NCBI BioProject

Hello, A month ago, I utilized the SRA Toolkit Pipeline to download Fastq files from a BioProject accession. Following the recommended steps, I generated a list of SRR Names, used prefetch, and then employed fasterq-dump (using parallel-fastq-dump) to obtain the data locally,…

featureCount Error “No paired-end reads were detected in paired-end read library”

I created a combined mm39 and MHV-A59 (Viral) reference genome and aligned my paired end reads using STAR with the following input commands: STAR --runMode alignReads --runThreadN 16 --genomeDir /genomeDir --readFilesIn /FastqDir/1.fq.gz, /FastqDir/2.fq.gz --readFilesCommand gunzip -c --outReadsUnmapped Fastx --outSAMtype BAM SortedByCoordinate It seems that everything went fine. Here is a…

Htseq Count

Hello, I have run htseq-count numerous times and continue to get the same error: NONE of my genes are counted, as seen here. ZXDC 0 ZYG11B 0 ZYX 0 ZZEF1 0 ZZZ3 0 __no_feature 70257177 __ambiguous 0 __too_low_aQual 1509790 __not_aligned 3970775 __alignment_not_unique 4277765 However, I have a very high…

Wildcards in Snakemake

I wish to input the names of my samples in a table with the corresponding files of reads forward and reverse to use them in a Snakemake workflow: sample fq1 fq2 1 ../reads/110627_0240_AC0254ABXX_2_SA-PE-001.1.fq.gz ../reads/110627_0240_AC0254ABXX_2_SA-PE-001.2.fq.gz 2 ../reads/110627_0240_AC0254ABXX_2_SA-PE-002.1.fq.gz ../reads/110627_0240_AC0254ABXX_2_SA-PE-002.2.fq.gz 22 ../reads/110802_0249_AD0CM0ABXX_3_SA-PE-022.1.fq.gz ../reads/110802_0249_AD0CM0ABXX_3_SA-PE-022.2.fq.gz Unfortunately, the management of the wildcards is more complex and…


Viral positive and negative strand with paired sequencing and bowtie

You should expect the same number of forward and reverse strand reads because read 1 and read 2 are on opposite strands. What would be more interesting here is to first split the mapped file into r1 and r2, then split THOSE files into forward and reverse, then combine R1…
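The first two splitting steps described above (by mate, then by strand) can be sketched in Python using only the FLAG column of a SAM file. This is a minimal illustration, not the original poster's method: the bit values come from the SAM specification, while the function names `classify` and `count_sam` are hypothetical, and the input is assumed to be uncompressed SAM text (for example from `samtools view -h`).

```python
# SAM FLAG bits (from the SAM specification)
UNMAPPED = 0x4
REVERSE = 0x10
READ1 = 0x40
READ2 = 0x80


def classify(flag):
    """Return 'R1_forward', 'R1_reverse', 'R2_forward' or 'R2_reverse'.

    Returns None for unmapped reads and for reads that carry neither
    the first-in-pair nor the second-in-pair bit.
    """
    if flag & UNMAPPED:
        return None
    if flag & READ1:
        mate = "R1"
    elif flag & READ2:
        mate = "R2"
    else:
        return None
    strand = "reverse" if flag & REVERSE else "forward"
    return f"{mate}_{strand}"


def count_sam(lines):
    """Count alignments per mate/strand class from SAM text lines."""
    counts = {}
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        key = classify(flag)
        if key is not None:
            counts[key] = counts.get(key, 0) + 1
    return counts
```

For a properly paired fragment, the two mates typically fall into opposite strand classes (for example flags 99 and 147), which is why the overall forward and reverse counts come out roughly equal.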


Redirecting the output of a command related to another command

I want to get a .txt file with the performance statistics that the gtime command reports for another command: bwa mem. Bwa is a genome aligner tool. Below is the code I’ve used: { gtime -v bwa mem -t 4 -R…

Mapping of paired-end ddRADseq results in 0.00% of reads pairing

Hey all, I’m trying to map my RADseq to a reference genome, and none of my paired-end reads are being paired. Forward and reverse reads are both mapping separately, but not pairing. This problem is consistent for all of my samples. Any help would be much appreciated!! I also viewed…


Error when running HUMAnN – HUMAnN

Hi, I keep getting an error when running humann v3.7. humann v3.7 was installed with conda: conda install humann -c biobakery --solver=libmamba Before installing, the environment was created and channels were configured as per instructions. I downloaded the databases as follows: humann_databases --download chocophlan full /home/swijegun/humann/databases/ --update-config yes humann_databases --download…

more reads in metagenomic samples after ‘removing host reads’

The title says it all – this is driving me nuts. I have Illumina shotgun data from cow manure and I am trying to remove the cow reads. I have downloaded the latest cow genome and used it as reference…

what is the mean of the file “*._f1.fq.gz” and “*._r2.fa.gz”

Hello, I downloaded the files from: bigd.big.ac.cn/gsa/ The file names end with: “*._f1.fq.gz” and “*._r2.fa.gz”. Is it single-end or paired-end sequencing? If it is paired-end sequencing, the files should be: “*._r1.fq.gz” and “*._r2.fa.gz”, not “…

How to run TRUST4 for BCR/TCR detection using 10X data?

I’m trying to follow this tutorial here: github.com/liulab-dfci/TRUST4#10x-genomics-data-and-barcode-based-single-cell-data However, I’m not sure how to adapt my data. I have R1 reads that look like this, where the reads are 28 bp long: @A00588:95:H2H5KDRX3:1:1101:1163:1000 1:N:0:GTAACATGCG+AGGTAACACT AGCTATCTACTTCTGGTACAACCCACTN + FFFF,FFFFF:FFFFFFF:FF:FFFFF# @A00588:95:H2H5KDRX3:1:1101:1904:1000 1:N:0:GTAACATGCG+AGGTAACACT My R2 reads look like this and they are 90 bp long:…

GATK AnnotateVcfWithBamDepth returns zero DP for all variants in VCF

Dear all, I am using GATK (v4.1.9.0) AnnotateVcfWithBamDepth to get the DP for all variants in ClinVar VCF in a retina RNA-seq BAM file. However, the tool returns zero depth for all variants in the VCF, even though I checked multiple variants in IGV and I saw that they are…


Expected number of raw reads in fq.gz files in PE sequencing

Hi guys, I have a ridiculously fundamental question so can someone please help me out? Background: An external company did RNA-seq for me. I ordered 20M read depth, paired-end 150 bp seq. Question: Am I supposed to expect…

Trimming of reads in miRNA-Seq data

Dear All, I have been trying to filter out reads from Fastq files from miRNA-Seq that we received. The read structure looks like the one shown in the figure below. I can use Cutadapt to filter out the adapter (we have the adapter…

Check Strandedness

I need to figure out the strandedness for the -s flag for regtools junctions extract used for Leafcutter. I get a peculiar error when using how_are_we_stranded_here. Command Run: check_strandedness --gtf path/to/Danio_rerio.GRCz11.110.chr.gtf --transcripts /path/to/Danio_rerio.GRCz11.dna_sm.primary_assembly.fa --reads_1 Sample_1_R1.fq.gz --reads_2 Sample_1_R2.fq.gz Gives the output: Results stored in: stranded_test_Sample_1_R1 converting gtf to…

Can you convert ASCII text to fastq?

Hi smarty pants peeps, I got sequence files from collaborators and their original file types look like this: File: MG430_L4_2.fq.gz, Type: application/x-gzip File: MG431_L4_1.fq.gz, Type: application/x-gzip File: MG431_L4_2.fq.gz, Type: application/x-gzip I ran fastqc, trimmomatic (removing adapters) and then fastqc again. I then started…

No module named ‘readfq’ arises during the execution of kmergenie

$ cat kmer_list_files D31-10A_1.qc.fq.gz D31-10A_2.qc.fq.gz $ kmergenie kmer_list_files --diploid /home/cdbi1/miniconda3/envs/gensize/bin/kmergenie:29: DeprecationWarning: pkg_resources is deprecated as an API. See setuptools.pypa.io/en/latest/pkg_resources.html import pkg_resources running histogram estimation list of reads: D31-10A_1.qc.fq.gz D31-10A_2.qc.fq.gz Traceback (most recent call last): File "/home/cdbi1/miniconda3/envs/gensize/bin/kmergenie", line 303, in…

Filtering parasite reads from host reads

Hi all, I have RNA-seq PE fastq reads for control and infected (at different time points) samples. The host species is lacking a reference genome. Thus, I will be doing a de novo transcriptome assembly and later differential gene expression analysis. My questions:…

HISAT2 HLA genotyping errors

Hi, I’m trying to follow the tutorial for HLA typing and assembly in HISAT2 as described at ccb.jhu.edu/hisat-genotype/index.php/Type:HLA using my own RNAseq data. I do everything as described. I have samtools 1.7 installed. I am able to extract my HLA reads. But when I get to…

Salmon_index.json doesn’t seem to exist

Good evening, thanks to a previous post I was able to solve one problem in my pipeline. Now I am dealing with another one. I have to quantify some rna sequence with salmon and I downloaded a pre-computed index from the link suggested in salmon documentation. The link is this,…


Mapping to mtDNA and then align the unmapped

Hello all, I have aligned my samples against the mitochondrion genome of the species I work with. My idea was that after this I would keep the unmapped ones (which would be the nuclear reads), and then align these against the…

bbsplit running slow or out of memory?

Hello, I have Illumina fastq files from some RNA-seq, ATAC-seq and WES that originated as PDX samples. I am looking to filter out contaminating mouse reads from the human reads in these datasets. I have used Xenome before but wanted to try…

Use bbmap reformat.sh to convert from paired fq files to a bam file

As outlined here, I was able to create paired-end fastq files with the help of GenoMax. Now I wonder how I can use reformat.sh from bbmap to convert these files to a valid bam file…

Removing overrepresented sequences in paired end RNA-seq

After trimming and QC’ing RNAseq for adapters with trim_galore, should I remove overrepresented sequences that FastQC identifies as possible adapter/primer source? If so, is there a way to do so automatically? For what it’s worth, both of the identified sequences start with…

Use bbmap reformat.sh to convert from paired fq files to a valid abam file

As outlined here, I was able to create paired-end fastq files with the help of GenoMax. Now I wonder how I can use reformat.sh from bbmap to convert these files to a valid bam…

FastQC command line usage

Hi, I am supposed to run FastQC on a Linux server (I don’t have the chance to use the graphical user interface). How can I use the command line to run FastQC on my sequences? Thanks in advance. After installing FastQC, you…

How to extract sequences from multiple fastq files based on a certain sequence ?

Dear all, I would like to know if it is possible to extract/retain sequences from multiple fastq files based on certain input sequences, and get new fastq files containing only sequences sharing the input…

samtools collate

Hi all, I am using samtools collate to convert my bam files to paired end fastq files. Here is the command that I am using: samtools view -h -T mm10.fa {input.bam} | samtools collate -O -u -@ {threads} - | samtools fastq -1 output_paired1.fq.gz -2 output_paired2.fq.gz -0…

Very few snp and indels variation were identified using PAV variation input file base on vg call

Hi all, We want to find the SNP and indel variants in the result vcf file BS_graph_call.vcf produced by the pan-genome analysis software vg. There are only fewer than 20 snp and…

SNP analysis with an assembly

Hi there, I am new in SNP analyses so before starting doing anything I would like to check if my pipeline is correct. What I have now is : RNA-seq samples (.fq.gz) + Trinity assembly (from those reads). My model organism has not an assembled genome, that’s the way I…


python – Snakemake wrappers suddenly stopped working

I have this wrapper in my snakemake file: rule fastqc: input: "reads/{sample}_trimmed.fq.gz" output: html="qc/fastqc/{sample}.html", zip="qc/fastqc/{sample}_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc to find the file params: extra = "--quiet" log: "logs/fastqc/{sample}.log" threads: config["resources"]["fastqc"]["cpu"] conda: "envs/qc.yaml" wrapper: "v1.31.1/bio/fastqc" qc.yaml: name: qc channels: - bioconda dependencies: - python - fastqc…

Cutadapt error: too many parameters.

Hi biostars community! I am having issues looping cutadapt over gunzipped samples. This is the script I am using: #!/bin/bash #SBATCH --account GRINFISH #SBATCH -c 8 #SBATCH --mem 96g #SBATCH --output logfile.out #SBATCH --error logfile.err # This script performs trimming for PE sequences…

Trimmomatic run error

Hello, I have a problem when running: input_dir="$HOME/workdir/group" output_dir="$HOME/workdir/group/fqdata_trimmed" adap="$CONDA_PREFIX/share/trimmomatic-0.39-1/adapters" f1="$HOME/workdir/group/P4.R1.fq.gz" f2="$HOME/workdir/group/P4.R2.fq.gz" newf1="$HOME/workdir/group/P4.R1.pe.trim.fq.gz" newf2="$HOME/workdir/group/P4.R2.pe.trim.fq.gz" newf1u="$HOME/workdir/group/P4.R1.se.trim.fq.gz" newf2u="$HOME/workdir/group/P4.R2.se.trim.fq.gz" mismatch_values=(1 2 3 4 5) for mismatch_value in "${mismatch_values[@]}" do trimmomatic PE -threads 1 -phred33 -trimlog trimLogFile -summary statsSummaryFile \ $f1 $f2 $newf1 $newf1U $newf2 $newf2U \ ILLUMINACLIP:$adap/TruSeq3-PE-2.fa:${mismatch_value}:30:10:1 \ SLIDINGWINDOW:4:15…

No differentially expressed genes after multiple testing correction in mice

Hi all, I am working with the RNA-seq data on mice (group A N=3 vs group B N=3). Mice are littermates, of which group A overexpresses a human transgene which I verified. I have had .cram files from mouse…

High number of duplicates and low percentage properly paired

I have some paired end sequencing data that I have trimmed using cutadapt. It was sequenced on an Illumina NovaSeq 6000 and is low coverage RADseq data (2-3x). My cutadapt script used forward and reverse adapters from Illumina: cutadapt…

How to split a fastq file to multiples fastq files

Dear all, I have a fastq.gz file that has more than 100 million reads. My aim is to divide this fastq file into three separate fastq files, ensuring that all reads from the original fastq file are distributed and…
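One way to approach the split described in this question is a round-robin assignment of 4-line FASTQ records to the output parts, so that every read lands in exactly one part. The sketch below is only an illustration under stated assumptions: the function name `split_round_robin` is hypothetical, the input is assumed to be plain 4-line FASTQ (no wrapped sequence lines), and the records are held in memory rather than streamed.

```python
def split_round_robin(lines, n_parts=3):
    """Distribute 4-line FASTQ records across n_parts lists, round-robin.

    `lines` is any iterable of text lines; every record is assumed to
    span exactly four lines (header, sequence, '+', qualities).
    """
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 4 != 0:
        raise ValueError("line count is not a multiple of 4")
    parts = [[] for _ in range(n_parts)]
    for start in range(0, len(lines), 4):
        record_index = start // 4
        # Record 0 goes to part 0, record 1 to part 1, and so on, wrapping around.
        parts[record_index % n_parts].append(lines[start:start + 4])
    return parts
```

For a gzipped input, the lines could be read with `gzip.open(path, "rt")` from the standard library; for a file with more than 100 million reads, writing each record straight to its output handle instead of collecting lists would keep memory use flat.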


Error, fewer reads in file specified with -1 than in file specified with -2

Hi all, This is my first time attempting to align sequences to a reference index. I am using bowtie2 with the -1 and -2 arguments and have gotten the following error message: Error, fewer…

Snakemake workflow for trimmomatic

Hello everybody! I’m a novice in Snakemake. I want to create a workflow for Illumina data analysis. I’m currently writing the trimmomatic rule and I’m facing an issue. This is the code: SAMPLES = ["1G_S15", "7G_S13"] rule trimmomatic_pe: input: adaptaters ="Illumina/adaptaters/TruSeq2-PE.fa", forward = expand("HHV8/fastq_raw_/fastq_H8/{sample}_R1.fastq.gz", sample…

Correct script for featurecounts in Rsubread

I am new to R and RStudio but have been trying to work through different examples using Rsubread for my data. I have tried reading vignettes and manuals prior to posting here but I am stuck and could really use some advice. I have 7 paired-end, fastq files from Illumina…


Hard clip fastq

I hope this is not a silly question. I have 2x 200bp fastqs generated from MGI G400 sequencer. I would like to do a comparison with Illumina but these only come as 2x 150bp fastqs. Is it possible to hard clip the 2x 200bp fastqs down…

Nextflow memory issues custom config -c

Hi all, I am trying to run nextflow on my laptop: nextflow run nf-core/rnaseq \ --input samplesheet.csv \ --genome mm10 \ -profile docker I am having issues with memory: Error executing process > 'NFCORE_RNASEQ:RNASEQ:FASTQC_UMITOOLS_TRIMGALORE:FASTQC (KO_3)' Caused by: Process requirement exceed available memory --…

How to select or subset process outputs in Nextflow DSL2?

I have a DSL2 Nextflow workflow. I would like to use just the outputs named "paired.fq.gz" (index 0 and 2 in the tuple) in downstream processes. Is there a way to filter or select a subset of the…

Tool To Find Out If Fastq Is In Sanger Or Phred64 Encoding?

Is there a simple tool I can use to quickly find out if a FASTQ file is in Sanger or Phred64 encoding? Ideally something that tells me ‘Encoding XX’ somewhere in the terminal output.…
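A common heuristic for this question looks at the range of ASCII codes in the quality lines: Phred+33 (Sanger, Illumina 1.8+) qualities start at `!` (33), while Phred+64 qualities start at `@` (64), or `;` (59) for the old Solexa variant. The sketch below is an assumption-laden illustration, not an existing tool: the function name `guess_phred_offset` and the cut-offs 59 and 74 are choices made here, and a small sample of high-quality reads can legitimately be ambiguous.

```python
def guess_phred_offset(quality_lines):
    """Guess the quality offset (33 or 64) from FASTQ quality strings.

    Returns 33 if any character is below ';' (ASCII 59), 64 if no such
    character exists but some character is above 'J' (ASCII 74), and
    None if the observed range fits both encodings.
    """
    lowest, highest = None, None
    for quality in quality_lines:
        for char in quality.strip():
            code = ord(char)
            lowest = code if lowest is None else min(lowest, code)
            highest = code if highest is None else max(highest, code)
    if lowest is None:
        raise ValueError("no quality characters found")
    if lowest < 59:
        return 33  # characters this low cannot occur in Phred+64 or Solexa+64
    if highest > 74:
        return 64  # characters this high are not expected in Phred+33 Illumina output
    return None  # ambiguous: range is valid under both offsets
```

The quality strings would be every fourth line of the FASTQ file; scanning the first few thousand reads is usually enough to see a low-quality base and resolve the ambiguity.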


10x 3′ library creates R1 and R2 fastq files with the same read length

Let me show you an example: trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR16093385&display=metadata This data contains two reads, R1 and R2. The read lengths of R1 and R2 are the same, 150 bp. However, this experiment was performed following the 10x 3′ library protocol. In the method section, it is described as below: The scRNA-seq libraries were generated using the…

trimmomatic on scRNA seq data

Hello, I’m struggling with an scRNA pipeline. I downloaded data from the 10x Genomics database: support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_1k_v3 When I checked the size of the files I found this: -rw-r--r-- 1 5062 5000 753851810 Nov 2 2018 pbmc_1k_v3_S1_L001_R1_001.fastq.gz -rw-r--r-- 1 5062 5000 1772725195 Nov…

Trimmomatic generated two (reverse-forward) paired-files with different number of reads

Hi all, In the RNA-seq analysis workflow on Linux, Trimmomatic generates 4 output files: forward-paired.fq.gz, reverse-paired.fq.gz, and the 2 unpaired files. As I read in several threads, Trimmomatic is expected to: Remove the adapters and the low-quality reads. generates…

Can’t add read group correctly to minimap2 sam alignmnet

Hello, I am running minimap2 in a pipeline with GATK that needs read group data @RG with sample information. minimap2 -ax sr -t 20 -I 100G -R @RG\\tID:A00253_251_HTN2JDSXY.2\\tPL:ILLUMINA\tLB:LB1\\tSM:TA90 ref.mmi reads_1.fq.gz reads_2.fq.gz | samtools view -bh -F 260 -T ref.fa >out.bam…

snakemake wildcard in shell

I’m a newbie at snakemake. I’m trying to implement the GATK FastqToSam in a rule. I have the following and it works if I hard code the samplename into the shell part but I was wanting to get the samplename from the config file. I…

forcing read error correction using SPAdes

Given that this is my code below, why is SPAdes giving me the following message?: Mode: ONLY assembling (without read error correction) Debug mode is turned OFF I would like for the assembly to complete the read error correction step if possible. Based…

XenoCell fq.gz output files

Hi, I followed the tutorial of XenoCell (see below) and extracted the graft barcodes by using hgmm_5k_v3 example dataset. I got 3 output files (cellular_barcodes.txt, fq_barcode.fq.gz, and fq_transcript.fq.gz) under graft folder. Does anyone know how to convert these files and feed them into 10x genomic…

STAR is running but .sam file size does not increase after hours mapping

Hi there, I’m using STAR with a small genome. My samples are paired. The commands are: For genome indexes STAR --runThreadN 20 --runMode genomeGenerate --genomeDir /path/to/folder/Analyses/STAR/ --genomeFastaFiles /path/to/genome_reference/genome.fna --readFilesCommand zcat path/to/folder/with/giz_samples/R1.fq.gz R2.fq.gz --sjdbGTFfile path/to/genome_reference/genome.gff --genomeSAindexNbases 11…

RNAseq for DE purpose

Hi all, I am totally new in the bioinformatic analysis. I am working on a project that looks at DGE among different time treatments. Besides, there is no reference genome (meaning that I need a de novo assembly step). So far, after struggling and navigating…

Randomize Read Order In Multigbp Fastq File?

Is there any method to randomize the read order in a multi-Gbp fastq file? Assuming you are talking about a single-end file, you can use awk to put each 4-line fastq entry on a single line. You then…
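The underlying idea of the answer (treat each 4-line entry as one unit, then reorder the units) can also be sketched in Python. This is a swapped-in in-memory variant rather than the awk pipeline the answer describes: the function name `shuffle_fastq` and the `seed` parameter are hypothetical, the file is assumed to be single-end with strict 4-line records, and all records must fit in RAM, which may not hold for a multi-Gbp file.

```python
import random


def shuffle_fastq(lines, seed=None):
    """Return the FASTQ records from `lines` in random order.

    Each record is a tuple of four lines (header, sequence, '+',
    qualities). Passing a seed makes the shuffle reproducible.
    """
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 4 != 0:
        raise ValueError("line count is not a multiple of 4")
    # Group the flat list of lines into 4-line records.
    records = [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]
    random.Random(seed).shuffle(records)
    return records
```

For paired-end data, the same seed and the same record order in both files would be needed so that mates stay in sync; that case is outside this sketch.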


Error 134 while aligning using hisat2

Hello, I am using the below command to align the reads and get bam file: hisat2 -x /hisat/grch38/genome -1 /fastq/output_forward_paired.fq.gz -2 /fastq/output_reverse_paired.fq.gz | samtools sort -o /bams/outout.bam This was running perfectly ok for the last try, however, for the new try I got…

Error in trimmomatic

Hi! Trust you are well. I am trying to run this program but I get the following error, and I neither understand it nor know how to fix it. Could you help me please? java -jar trimmomatic-0.36.jar PE -phred33 white_replicate1.R1paired.fq.gz white_replicate1.R2paired.fq.gz white_replicate1.R1paired.fq.gz white_replicate1.R1unpaired.fq.gz white_replicate1.R2paired.fq.gz white_replicate1.R2unpaired.fq.gz ILLUMINACLIP:/mnt/g/poolseq_tutorial_1/poolseq_tutorial/adapters/TruSeq3- PE.fa:2:20:10:1:true…

cutadapt installed via conda igzip error for some fastq files

Only very recently (~2 weeks ago), cutadapt installed via conda has the following error: This is cutadapt 3.2 with Python 3.8.6 Command line parameters: -j 4 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC in2438_3_CKDL210000739-2a-AK5142-AK6697_HVHF2DSXY_L2_1.fq.gz Processing reads on 4 cores in single-end mode … [———>8 ] 00:00:26 5,536,084 reads @…


htslib/c what is the correct way to use bgzf_thread_pool ?

I try to split fastq files into ‘N’ chunks using a simple CC program and htslib-C. It works fine: ./split2file -o TMP S1.R1.fq.gz S1.R2.fq.gz -n 10 but when I use a thread pool ( As far as I…

Getting information on CRAM files from headers inside the files

Hello. I wish to know if one can find the following information in CRAM files’ headers: 1) Whether or not sequencing data in CRAM files is from WGS or WES, and if so, where? and 2) In case one…

Converting Bam file to Fasta (Zipped)

I would like to convert .bam files to fq.gz (zipped fasta files) for paired reads. bedtools bamtofastq seems to be a commonly recommended method, I have also seen samtools fastq as a possible alternative. bedtools bamtofastq -i inputfile.bam -fq outputR1.fq -fq2 outputR2.fq samtools…

error when fastp filters data

When filtering with fastp, the error "sequence and quality have different length" appears. fastp -i CK-2_R1.fq.gz -o CK-2_R1.clean.fq.gz -I CK-2_R2.fq.gz -O CK-2_R2.clean.fq.gz After filtering the data for a while, the output is no longer updated. [pengliang@fat01 CK-2]$ ERROR: sequence and quality have different length: WARNNIG: different read numbers of the 4852 packRead1…
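
An error like this usually means at least one record in the input has a sequence line and a quality line of different lengths. One way to locate such records before rerunning fastp is sketched below in Python. It assumes strict four-line FASTQ records (no wrapped sequences); the function name find_length_mismatches and its limit parameter are hypothetical and not part of fastp.

```python
import gzip


def find_length_mismatches(path, limit=10):
    """Return up to `limit` tuples (record_number, read_id, seq_len, qual_len)
    for records whose sequence and quality strings differ in length.

    Assumes four lines per record: header, sequence, '+', quality.
    """
    problems = []
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        record = 0
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            seq = handle.readline().rstrip("\r\n")
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().rstrip("\r\n")
            record += 1
            if len(seq) != len(qual):
                problems.append((record, header.strip(), len(seq), len(qual)))
                if len(problems) >= limit:
                    break
    return problems
```

If the gzip stream itself is truncated or corrupt, reading it will raise an exception rather than return a list, which is also useful information about the input file.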

Continue Reading error when fastp filters data

BBmap bbduk.sh for filtering reads

I’m looking to filter reads that contain a stretch of A’s, I found these posts looking for polyA tails, meaning this should work all the same (Identify RNA-seq reads containing polyA sequence, Identifying RNA-seq reads containing polyA stretch). However, I cannot get it to work. Given just these two reads,…

Continue Reading BBmap bbduk.sh for filtering reads

Hisat2 – stringtie – deseq2 pipeline for bulk RNA seq

Official software websites: HISAT2: Manual | HISAT2; StringTie: StringTie. Article: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown | Nature Protocols. Recommended step-by-step tutorials: 1. RNA-seq: Hisat2+Stringtie+DESeq2 – Hengnuo Xinzhi 2. RNA-seq analysis using hisat2, stringtie and DESeq2 – Jianshu. Basic usage…

Continue Reading Hisat2 – stringtie – deseq2 pipeline for bulk RNA seq

Detailed differences between sambamba and samtools

In March, my first post in the newcomers' group, "Do false-positive mutations appear because not enough duplicates are marked?", described the problem that supplementary reads are not marked as duplicates by GATK MarkDuplicates. Afterwards, in response to this question, I began…

Continue Reading Detailed differences between sambamba and samtools

The low successful assignment ratio of FeatureCounts

Hello, I would like to confirm if the low assignment ratio (54%) is normal, and please check the possible reason I found. I used Hisat2 to assign paired-end strand-specific transcriptomic sequences (rRNA removed) to a reference genome. Because I filtered out the unmapped sequences in advance, the overall assignment ratio…

Continue Reading The low successful assignment ratio of FeatureCounts

Trimmomatic/ linux system

Hi all, I am trying to remove adapters and clean my RNA-seq .gz files using Trimmomatic, loaded on a Linux system (supercomputer server). I am following the steps for paired-end reads explained in the manual (www.usadellab.org/cms/?page=trimmomatic): java -jar trimmomatic-0.39.jar PE input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True LEADING:3…

Continue Reading Trimmomatic/ linux system

Fastp file merge append | Develop Paper

Interpretation of the FASTQ file format: www.jianshu.com/p/39115d21ee17 Sometimes the sequencing results for one sample come back as two sets of paired-end FASTQ files: r1.fq.gz l1.fq.gz r2.fq.gz l2.fq.gz. The sequencing data is really a single dataset, but it was split into two parts during transfer. When we use it, we usually merge it into one paired-end…
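
One common way to merge such split files (not necessarily the method the linked article goes on to describe) relies on the fact that gzip files can be concatenated byte-for-byte: the result is a valid multi-member gzip file that decompresses to the concatenated contents. A minimal Python sketch follows; the function name concat_gzip and the file names in the usage note are hypothetical.

```python
import shutil


def concat_gzip(parts, merged):
    """Append gzip files byte-for-byte into `merged`.

    The output is a multi-member gzip file; no decompression or
    recompression takes place.
    """
    with open(merged, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```

For paired-end data the forward and reverse files have to be merged in the same part order, for example concat_gzip(["part1_R1.fq.gz", "part2_R1.fq.gz"], "merged_R1.fq.gz") followed by the matching call for R2, so that read pairs stay in sync.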

Continue Reading Fastp file merge append | Develop Paper

Snakemake using multi inputs – Stackify

You need to define target output files using rule all. SAMPLES = ['1', '2', '3', '4'] rule all: input: expand("sample{sample}.R{read_no}.fq.gz.out", sample=SAMPLES, read_no=['1', '2']) rule fastp: input: reads1="sample{sample}.R1.fq.gz", reads2="sample{sample}.R2.fq.gz" output: reads1out="sample{sample}.R1.fq.gz.out", reads2out="sample{sample}.R2.fq.gz.out" shell: "fastp -i {input.reads1} -I {input.reads2} -o {output.reads1out} -O {output.reads2out}" This is the output of command snakemake -np, with…
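
To see which target files rule all requests, it can help to enumerate what the expand() call produces. The sketch below is a plain-Python stand-in that builds every combination of the wildcard values; expand_targets is a hypothetical helper written for illustration, not Snakemake's own implementation, and the order of the results may differ from Snakemake's.

```python
from itertools import product


def expand_targets(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


SAMPLES = ["1", "2", "3", "4"]
targets = expand_targets(
    "sample{sample}.R{read_no}.fq.gz.out",
    sample=SAMPLES,
    read_no=["1", "2"],
)
for name in targets:
    print(name)
```

With four samples and two read numbers this lists eight target files, and each sample's pair of targets (R1 and R2) is what one run of the fastp rule writes.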

Continue Reading Snakemake using multi inputs – Stackify

bwa , 2 files fastq to 1 sam

I have this problem, please help me. I am also trying this on macOS Catalina. I am creating a SAM file from two FASTQ files using bwa. I apply the following command: bwa mem -t 2 GRCh38.primary_assembly.genome.fa.gz V350019555_L03_B5GHUMqcnrRAABA-556_1.fq.gz V350019555_L03_B5GHUMqcnrRAABA-556_2.fq.gz > V350019555_L03_B5GHUMqcnrRAABA-556.sam…

Continue Reading bwa , 2 files fastq to 1 sam

Secret BBMAP helper page – HRGV/Marmics_Metagenomics Wiki

How to map to the assembled scaffolds.fasta: bbmap is a powerful and highly flexible read mapper (jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbmap-guide/). For the upcoming analysis you are not interested in the typical mapping output but in coverage statistics for every scaffold, which you can get with scaffstats. We want to be specific…

Continue Reading Secret BBMAP helper page – HRGV/Marmics_Metagenomics Wiki

Trimmomatic parameters

$java -jar /apps/eb/Trimmomatic/0.39-Java-1.8.0_144/trimmomatic-0.39.jar PE -phred33 seq1_L2_1.fq.gz seq1_L2_2.fq.gz _L2_r1_paired_fq.gz seq1_L2_r1_unpaired.fq.gz seq_L2_r2_paired.fq.gz Seq1_L2_r2_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:5 ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:5
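
The four output file names in the command above do not follow one pattern (different prefixes, and one name ends in _fq.gz rather than .fq.gz), which may or may not be intended. One way to avoid hand-typing six file names is to derive them from a single sample prefix. The Python sketch below only builds the argument list, in the PE order used by these commands (forward input, reverse input, forward paired, forward unpaired, reverse paired, reverse unpaired); the function name and the output naming scheme are hypothetical.

```python
def trimmomatic_pe_args(jar, prefix, steps, phred="-phred33"):
    """Build a Trimmomatic PE argument list from one sample prefix.

    The input and output naming scheme here is an assumption chosen
    for illustration; adjust it to match the real file names.
    """
    inputs = [f"{prefix}_1.fq.gz", f"{prefix}_2.fq.gz"]
    outputs = [
        f"{prefix}_r1_paired.fq.gz",
        f"{prefix}_r1_unpaired.fq.gz",
        f"{prefix}_r2_paired.fq.gz",
        f"{prefix}_r2_unpaired.fq.gz",
    ]
    return ["java", "-jar", jar, "PE", phred, *inputs, *outputs, *steps]


args = trimmomatic_pe_args(
    "trimmomatic-0.39.jar",
    "seq1_L2",
    ["ILLUMINACLIP:TruSeq3-PE.fa:2:30:10", "LEADING:3", "TRAILING:3",
     "SLIDINGWINDOW:4:15", "MINLEN:5"],
)
print(" ".join(args))
```

The resulting list can be passed to subprocess.run(args, check=True) once the jar path and adapter file are correct for the system in question.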

Continue Reading Trimmomatic parameters

Using STAR SJ.out.tab file to identify novel ncRNAs

Hi All, I am attempting to identify novel ncRNAs from a circadian RNAseq dataset. Specifically, I have a ribo-depleted RNAseq timecourse with 31 samples (one sample every 2 hours for 60 hrs). I have run STAR (code below). I am trying to follow…

Continue Reading Using STAR SJ.out.tab file to identify novel ncRNAs

Mapping multiples

Hi, I am coming to you for help. I am mapping short- and long-read files with BWA and MINIMAP2. My problem is that I want to write an if statement that would allow me to choose either BWA if I work with short…

Continue Reading Mapping multiples

STAR+RSEM pipeline without gtf

Dear all, I have a question. I mapped reads to CDS sequences with STAR. I don't have a GTF file and want to calculate read counts using RSEM, but I am stuck on the error "RSEM error: RSEM currently does not support gapped alignments" as I don't have…

Continue Reading STAR+RSEM pipeline without gtf

BBMerge / Tadpole error correction

I’ve been using BBMerge recently to address a very specific problem: I am sequencing pooled short DNA molecules (< 400 bp) using paired-end reads (average length ~230 bp post trimming). Each molecule can be assumed to be different (i.e. contains sequence differences – substitutions & indels – with respect…

Continue Reading BBMerge / Tadpole error correction

How to pass custom software specific variables to nf-core/sarek nextflow pipeline?

I’m attempting to call whole-genome variants using the nf-core/sarek Nextflow pipeline. In the QC step there is an option that invokes trim_galore quality trimming, but I don’t know how to pass my custom adapters to be cut as well…

Continue Reading How to pass custom software specific variables to nf-core/sarek nextflow pipeline?

STAR align multiple files

Hi everybody, I am aligning 36 PE samples using STAR. To make the task a little easier, I wrote a bash loop to align them all with the same command. Here is my loop: for i in $(ls raw_data); do STAR --genomeDir index.150…

Continue Reading STAR align multiple files

Biostar Systems

Comment: STAR vs Novoalign IGV Browser visualization by chasem ▴ 10 That is good to know that it isn’t just my set of reads…still concerning, though. Comment: STAR vs Novoalign IGV Browser visualization by chasem ▴ 10 I was not expecting this; not sure what to make of it…

Continue Reading Biostar Systems

question about running CIRI-full

I’m using CIRI-full to calculate the full-length sequences of circRNAs. I can run the test data set successfully, but I can’t run my own data. Running the test data set: java -jar ../CIRI-full.jar Pipeline -1 test_1.fq.gz -2 test_2.fq.gz -a test_anno.gtf -r test_ref.fa -d test_output/…

Continue Reading question about running CIRI-full

I am converting the fq.gz. files (which are the results of the mgi study) to bam files to view on igv.

Hey everyone, before I start, apologies for any inconvenience caused by my wrong or inappropriate use of terms. I have had some failures with bwa mem lately. As I…

Continue Reading I am converting the fq.gz. files (which are the results of the mgi study) to bam files to view on igv.