Tag: fq.gz
python – Snakemake wrappers suddenly stopped working
I have these wrappers in my Snakemake file: rule fastqc: input: "reads/{sample}_trimmed.fq.gz" output: html="qc/fastqc/{sample}.html", zip="qc/fastqc/{sample}_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc to find the file params: extra = "--quiet" log: "logs/fastqc/{sample}.log" threads: config["resources"]["fastqc"]["cpu"] conda: "envs/qc.yaml" wrapper: "v1.31.1/bio/fastqc" qc.yaml: name: qc channels: - bioconda dependencies: - python - fastqc…
Cutadapt error: too many parameters.
Hi Biostars community! I am having trouble looping cutadapt over gunzipped samples. This is the script I am using: #!/bin/bash #SBATCH --account GRINFISH #SBATCH -c 8 #SBATCH --mem 96g #SBATCH --output logfile.out #SBATCH --error logfile.err # This script performs trimming for PE sequences…
Trimmomatic run error
Hello, I have a problem when running: input_dir="$HOME/workdir/group" output_dir="$HOME/workdir/group/fqdata_trimmed" adap="$CONDA_PREFIX/share/trimmomatic-0.39-1/adapters" f1="$HOME/workdir/group/P4.R1.fq.gz" f2="$HOME/workdir/group/P4.R2.fq.gz" newf1="$HOME/workdir/group/P4.R1.pe.trim.fq.gz" newf2="$HOME/workdir/group/P4.R2.pe.trim.fq.gz" newf1u="$HOME/workdir/group/P4.R1.se.trim.fq.gz" newf2u="$HOME/workdir/group/P4.R2.se.trim.fq.gz" mismatch_values=(1 2 3 4 5) for mismatch_value in "${mismatch_values[@]}" do trimmomatic PE -threads 1 -phred33 -trimlog trimLogFile -summary statsSummaryFile \ $f1 $f2 $newf1 $newf1U $newf2 $newf2U \ ILLUMINACLIP:$adap/TruSeq3-PE-2.fa:${mismatch_value}:30:10:1 \ SLIDINGWINDOW:4:15…
No differentially expressed genes after multiple testing correction in mice
Hi all, I am working with RNA-seq data from mice (group A N=3 vs group B N=3). The mice are littermates; group A overexpresses a human transgene, which I verified. I have had .cram files from mouse…
High number of duplicates and low percentage properly paired
I have some paired-end sequencing data that I have trimmed using cutadapt. It was sequenced on an Illumina NovaSeq 6000 and is low-coverage RADseq data (2-3x). My cutadapt script used forward and reverse adapters from Illumina: cutadapt…
How to split a fastq file into multiple fastq files
Dear all, I have a fastq.gz file that has more than 100 million reads. My aim is to divide this fastq file into three separate fastq files, ensuring that all reads from the original fastq file are distributed and…
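One way to approach a split like the one asked about above is a round-robin pass over the 4-line records. The following is only a minimal sketch, not the answer given in the thread: the function name split_fastq and the round-robin policy are assumptions made for illustration, and it assumes unwrapped FASTQ (exactly four lines per read).

```python
import gzip
from itertools import islice


def split_fastq(in_path, out_paths):
    """Distribute 4-line FASTQ records from a gzipped file across out_paths in turn.

    Hypothetical helper: returns the number of reads written to each output.
    """
    outs = [gzip.open(p, "wt") for p in out_paths]
    counts = [0] * len(outs)
    try:
        with gzip.open(in_path, "rt") as fin:
            i = 0
            while True:
                record = list(islice(fin, 4))  # header, sequence, '+', quality
                if not record:
                    break
                if len(record) != 4:
                    raise ValueError("truncated FASTQ record at end of file")
                target = i % len(outs)
                outs[target].writelines(record)
                counts[target] += 1
                i += 1
    finally:
        for handle in outs:
            handle.close()
    return counts
```

Every read goes to exactly one output, so the per-file counts sum to the number of reads in the input. For paired-end data, the same call would have to be run on both mate files with the same number of outputs so that mates stay in matching chunks.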
Error, fewer reads in file specified with -1 than in file specified with -2
Hi all, This is my first time attempting to align sequences to a reference index. I am using bowtie2 with the -1 and -2 arguments and have gotten the following error message: Error, fewer…
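An error like this usually means the two mate files do not contain the same number of reads, which can be checked before aligning. The sketch below is an illustration under assumptions: count_reads and mates_match are hypothetical helper names, and the files are assumed to hold unwrapped 4-line FASTQ records.

```python
import gzip


def count_reads(path):
    """Count reads in a FASTQ file (plain or .gz), assuming 4 lines per read."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        n_lines = sum(1 for _ in fh)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def mates_match(r1_path, r2_path):
    """Return True when both mate files hold the same number of reads."""
    return count_reads(r1_path) == count_reads(r2_path)
```

If mates_match returns False, the pair has most likely been filtered or truncated independently, and the files need to be re-paired or re-downloaded before alignment.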
Snakemake workflow for trimmomatic
Hello everybody! I’m a novice in Snakemake. I want to create a workflow for Illumina data analysis. I’m currently writing the trimmomatic rule and I’m facing an issue. This is the code: SAMPLES = ["1G_S15", "7G_S13"] rule trimmomatic_pe: input: adaptaters = "Illumina/adaptaters/TruSeq2-PE.fa", forward = expand("HHV8/fastq_raw_/fastq_H8/{sample}_R1.fastq.gz", sample…
Correct script for featurecounts in Rsubread
I am new to R and RStudio but have been trying to work through different examples using Rsubread for my data. I have tried reading vignettes and manuals prior to posting here but I am stuck and could really use some advice. I have 7 paired-end, fastq files from Illumina…
Hard clip fastq
I hope this is not a silly question. I have 2x 200bp fastqs generated from an MGI G400 sequencer. I would like to do a comparison with Illumina but these only come as 2x 150bp fastqs. Is it possible to hard clip the 2x 200bp fastqs down…
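Hard clipping in the sense asked about above amounts to truncating the sequence and quality lines of every record to the same length. A minimal sketch follows, assuming records are handled as (header, sequence, plus, quality) tuples; the function name hard_clip_record and the default of 150 are illustrative choices, not something stated in the thread.

```python
def hard_clip_record(record, max_len=150):
    """Truncate one FASTQ record to at most max_len bases.

    record is a (header, sequence, plus, quality) tuple of strings without
    trailing newlines; sequence and quality are clipped to the same length.
    """
    header, seq, plus, qual = record
    if len(seq) != len(qual):
        raise ValueError("sequence and quality have different lengths")
    return (header, seq[:max_len], plus, qual[:max_len])
```

Reads shorter than max_len pass through unchanged, so the result is "at most 150 bp" rather than "exactly 150 bp".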
Nextflow memory issues custom config -c
Hi all, I am trying to run nextflow on my laptop: nextflow run nf-core/rnaseq \ --input samplesheet.csv \ --genome mm10 \ -profile docker I am having issues with memory: Error executing process > 'NFCORE_RNASEQ:RNASEQ:FASTQC_UMITOOLS_TRIMGALORE:FASTQC (KO_3)' Caused by: Process requirement exceed available memory --…
How to select or subset process outputs in Nextflow DSL2?
I have a DSL2 Nextflow workflow. I would like to use just the outputs named "paired.fq.gz" (index 0 and 2 in the tuple) in downstream processes. Is there a way to filter or select a subset of the…
Tool To Find Out If Fastq Is In Sanger Or Phred64 Encoding?
Is there a simple tool I can use to quickly find out if a FASTQ file is in Sanger or Phred64 encoding? Ideally something that tells me 'Encoding XX' somewhere in the terminal output.
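A common heuristic for this question is to look at the range of ASCII codes in the quality lines: characters below 59 can only occur with an offset of 33, while characters above 74 point to an offset of 64. The sketch below is an illustration of that heuristic only, not a tool named in the thread; guess_quality_encoding and its return labels are hypothetical, and the thresholds assume Phred+33 scores do not exceed 41, which does not hold for every platform.

```python
def guess_quality_encoding(quality_lines):
    """Guess the quality offset from an iterable of FASTQ quality strings.

    Returns "phred33", "phred64" or "ambiguous". Heuristic only: the result
    depends on the observed score range in the sampled reads.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        raise ValueError("no quality characters supplied")
    lowest, highest = min(codes), max(codes)
    if lowest < 33 or highest > 126:
        raise ValueError("quality characters outside the printable FASTQ range")
    if lowest < 59:
        # Codes 33-58 cannot be produced with an offset of 64 (Solexa starts at 59).
        return "phred33"
    if highest > 74:
        # Codes above 74 would mean scores over 41 with an offset of 33.
        return "phred64"
    return "ambiguous"
```

Sampling the quality line of the first few thousand reads is normally enough; an "ambiguous" result means every observed character is valid under both offsets.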
10x 3′ library creates R1 and R2 fastq files with the same read length
Let me show you an example: trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR16093385&display=metadata This data contains two reads, R1 and R2. The read lengths of R1 and R2 are the same, 150 bp. However, this experiment was performed following the 10x 3’ library protocol. In the methods section, it is described as below: The scRNA-seq libraries were generated using the…
trimmomatic on scRNA seq data
Hello, I’m struggling with an scRNA pipeline. I downloaded data from the 10x Genomics database: support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_1k_v3 When I checked the size of the files I found this: -rw-r--r-- 1 5062 5000 753851810 Nov 2 2018 pbmc_1k_v3_S1_L001_R1_001.fastq.gz -rw-r--r-- 1 5062 5000 1772725195 Nov…
Trimmomatic generated two (reverse-forward) paired-files with different number of reads
Hi all, In the RNA-seq analysis workflow on Linux, Trimmomatic generates 4 output files: forward-paired.fq.gz, reverse-paired.fq.gz, and the 2 unpaired files. As I read in several threads, Trimmomatic is expected to: remove the adapters and the low-quality reads; generate…
Can’t add read group correctly to minimap2 sam alignment
Hello, I am running minimap2 in a pipeline with GATK that needs read group data @RG with sample information. minimap2 -ax sr -t 20 -I 100G -R @RG\\tID:A00253_251_HTN2JDSXY.2\\tPL:ILLUMINA\tLB:LB1\\tSM:TA90 ref.mmi reads_1.fq.gz reads_2.fq.gz | samtools view -bh -F 260 -T ref.fa >out.bam…
snakemake wildcard in shell
I’m a newbie at Snakemake. I’m trying to implement the GATK FastqToSam in a rule. I have the following and it works if I hard-code the sample name into the shell part, but I want to get the sample name from the config file. I…
forcing read error correction using SPAdes
Given that this is my code below, why is SPAdes giving me the following message?: Mode: ONLY assembling (without read error correction) Debug mode is turned OFF I would like for the assembly to complete the read error correction step if possible. Based…
XenoCell fq.gz output files
Hi, I followed the tutorial of XenoCell (see below) and extracted the graft barcodes by using the hgmm_5k_v3 example dataset. I got 3 output files (cellular_barcodes.txt, fq_barcode.fq.gz, and fq_transcript.fq.gz) under the graft folder. Does anyone know how to convert these files and feed them into 10x genomic…
STAR is running but .sam file size does not increase after hours mapping
Hi there, I’m using STAR with a small genome. My samples are paired. The commands are: For genome indexes STAR --runThreadN 20 --runMode genomeGenerate --genomeDir /path/to/folder/Analyses/STAR/ --genomeFastaFiles /path/to/genome_reference/genome.fna --readFilesCommand zcat path/to/folder/with/giz_samples/R1.fq.gz R2.fq.gz --sjdbGTFfile path/to/genome_reference/genome.gff --genomeSAindexNbases 11…
RNAseq for DE purpose
Hi all, I am totally new to bioinformatic analysis. I am working on a project that looks at DGE among different time treatments. In addition, there is no reference genome (meaning that I need a de novo assembly step). So far, after struggling and navigating…
Randomize Read Order In Multigbp Fastq File?
Is there any method to randomize the read order in a multi-Gbp fastq file? Assuming you are talking about a single-end file, you can use awk to put each 4-line fastq entry on a single line. You then…
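The idea described above, treating each 4-line entry as one unit and then reordering the units, can be sketched in Python instead of awk. This is an illustration under assumptions rather than the answer's own code: shuffle_fastq is a hypothetical name, the input is assumed to be an uncompressed single-end file with unwrapped records, and all records are held in memory, which may not be practical for a multi-Gbp file.

```python
import random
from itertools import islice


def shuffle_fastq(in_path, out_path, seed=None):
    """Write the reads of in_path to out_path in random order.

    Each 4-line record is kept together as one string, so only the order of
    whole reads changes. Returns the number of reads written.
    """
    records = []
    with open(in_path) as fin:
        while True:
            lines = list(islice(fin, 4))
            if not lines:
                break
            if len(lines) != 4:
                raise ValueError("truncated FASTQ record at end of file")
            text = "".join(lines)
            if not text.endswith("\n"):
                text += "\n"  # keep records separated if the file lacks a final newline
            records.append(text)
    random.Random(seed).shuffle(records)
    with open(out_path, "w") as fout:
        fout.writelines(records)
    return len(records)
```

Passing a fixed seed makes the order reproducible, which matters for paired-end data: shuffling both mate files with the same seed and the same number of records applies the same permutation to each.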
Error 134 while aligning using hisat2
Hello, I am using the command below to align the reads and get a bam file: hisat2 -x /hisat/grch38/genome -1 /fastq/output_forward_paired.fq.gz -2 /fastq/output_reverse_paired.fq.gz | samtools sort -o /bams/outout.bam This ran perfectly OK on the last try; however, on the new try I got…
Error in trimmomatic
Hi! Trust you are well. I am trying to run this program but I get the following error and I don’t know how to fix it, nor do I understand it. Could you help me please? java -jar trimmomatic-0.36.jar PE -phred33 white_replicate1.R1paired.fq.gz white_replicate1.R2paired.fq.gz white_replicate1.R1paired.fq.gz white_replicate1.R1unpaired.fq.gz white_replicate1.R2paired.fq.gz white_replicate1.R2unpaired.fq.gz ILLUMINACLIP:/mnt/g/poolseq_tutorial_1/poolseq_tutorial/adapters/TruSeq3-PE.fa:2:20:10:1:true…
cutadapt installed via conda igzip error for some fastq files
Only very recently (~2 weeks ago), cutadapt installed via conda has the following error: This is cutadapt 3.2 with Python 3.8.6 Command line parameters: -j 4 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC in2438_3_CKDL210000739-2a-AK5142-AK6697_HVHF2DSXY_L2_1.fq.gz Processing reads on 4 cores in single-end mode … [———>8 ] 00:00:26 5,536,084 reads @…
htslib/c what is the correct way to use bgzf_thread_pool ?
I try to split fastq files into 'N' chunks using a simple CC program and htslib-C. It works fine: ./split2file -o TMP S1.R1.fq.gz S1.R2.fq.gz -n 10 but when I use a thread pool (As far as I…
Getting information on CRAM files from headers inside the files
Hello. I wish to know if one can find the following information in CRAM files’ headers: 1) Whether or not sequencing data in CRAM files is from WGS or WES, and if so, where? and 2) In case one…
Converting Bam file to Fasta (Zipped)
I would like to convert .bam files to fq.gz (gzipped FASTQ files) for paired reads. bedtools bamtofastq seems to be a commonly recommended method; I have also seen samtools fastq as a possible alternative. bedtools bamtofastq -i inputfile.bam -fq outputR1.fq -fq2 outputR2.fq samtools…
error when fastp filters data
When filtering with fastp, the error "sequence and quality have different length" appears: fastp -i CK-2_R1.fq.gz -o CK-2_R1.clean.fq.gz -I CK-2_R2.fq.gz -O CK-2_R2.clean.fq.gz After filtering the data for a while, the output is no longer updated. [pengliang@fat01 CK-2]$ ERROR: sequence and quality have different length: WARNNIG: different read numbers of the 4852 packRead1…
BBmap bbduk.sh for filtering reads
I’m looking to filter reads that contain a stretch of A’s, I found these posts looking for polyA tails, meaning this should work all the same (Identify RNA-seq reads containing polyA sequence, Identifying RNA-seq reads containing polyA stretch). However, I cannot get it to work. Given just these two reads,…
Hisat2 – stringtie – deseq2 pipeline for bulk RNA seq
Official software websites: Hisat2: Manual | HISAT2. StringTie: StringTie. Article: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown | Nature Protocols. Recommended step-by-step tutorials: 1. RNA-seq: Hisat2+Stringtie+DESeq2 – Hengnuo Xinzhi 2. RNA-seq analysis using hisat2, stringtie, DESeq2 – Simple books. Basic usage…
Detailed differences between sambamba and samtools
In March, my first post in the newcomers' group, "Do false-positive mutations appear because duplicates are not marked sufficiently?", described the problem that supplementary reads are not marked as duplicates by GATK MarkDuplicates. Afterwards, in response to this question, I began…
The low successful assignment ratio of FeatureCounts
Hello, I would like to confirm if the low assignment ratio (54%) is normal, and please check the possible reason I found. I used Hisat2 to assign paired-end strand-specific transcriptomic sequences (rRNA removed) to a reference genome. Because I filtered out the unmapped sequences in advance, the overall assignment ratio…
Trimmomatic/ linux system
Hi all, I am trying to remove adapters and clean my RNA-seq.gz files using Trimmomatic, loaded on a Linux system (supercomputer server), following the steps for paired-end reads explained in the manual (www.usadellab.org/cms/?page=trimmomatic): java -jar trimmomatic-0.39.jar PE input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True LEADING:3…
Fastp file merge append | Develop Paper
Interpretation of fastq file format: www.jianshu.com/p/39115d21ee17 Sometimes, the sequencing results of a species will return two double ended fastps.r1.fq.gz l1.fq.gz r2.fq.gz l2.fq.gz. The content of the sequencing data is actually one piece, but it is divided into two parts during transmission. When we use it, we are used to merging it into a double ended…
Snakemake using multi inputs – Stackify
You need to define target output files using rule all. SAMPLES = ['1', '2', '3', '4'] rule all: input: expand("sample{sample}.R{read_no}.fq.gz.out", sample=SAMPLES, read_no=['1', '2']) rule fastp: input: reads1="sample{sample}.R1.fq.gz", reads2="sample{sample}.R2.fq.gz" output: reads1out="sample{sample}.R1.fq.gz.out", reads2out="sample{sample}.R2.fq.gz.out" shell: "fastp -i {input.reads1} -I {input.reads2} -o {output.reads1out} -O {output.reads2out}" This is the output of command snakemake -np, with…
bwa , 2 files fastq to 1 sam
I have this problem, please help me. I’m also trying it from macOS Catalina. I am creating a SAM file from 2 fastq files using bwa. I apply the following command: bwa mem -t 2 GRCh38.primary_assembly.genome.fa.gz V350019555_L03_B5GHUMqcnrRAABA-556_1.fq.gz V350019555_L03_B5GHUMqcnrRAABA-556_2.fq.gz > V350019555_L03_B5GHUMqcnrRAABA-556.sam…
Secret BBMAP helper page – HRGV/Marmics_Metagenomics Wiki
How to map to the assembled scaffolds.fasta: bbmap is a powerful and highly flexible read mapper (jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbmap-guide/). For the upcoming analysis you are not interested in the typical mapping output but in coverage statistics for every scaffold; you can get them with scaffstats. We want to be specific…
Trimmomatic parameters
$java -jar /apps/eb/Trimmomatic/0.39-Java-1.8.0_144/trimmomatic-0.39.jar PE -phred33 seq1_L2_1.fq.gz seq1_L2_2.fq.gz _L2_r1_paired_fq.gz seq1_L2_r1_unpaired.fq.gz seq_L2_r2_paired.fq.gz Seq1_L2_r2_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:5 ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:5
Using STAR SJ.out.tab file to identify novel ncRNAs
Hi All, I am attempting to identify novel ncRNAs from a circadian RNAseq dataset. Specifically I have a ribo-depleted RNAseq timecourse with 31 samples (sample every 2 hours for 60hrs). I have run STAR (code below). I am trying to follow…
Mapping multiples
Hi, I am coming to you for help. I am mapping short and long read files with BWA and MINIMAP2. My problem is that I want to write an if condition that would allow me to choose either BWA if I work with short…
STAR+RSEM pipeline without gtf
Dear all, I have a question. I mapped reads to CDS sequences with STAR. I don’t have a gtf file and want to calculate read counts using RSEM, but I am stuck on the error "RSEM error: RSEM currently does not support gapped alignments" as I don’t have…
BBMerge / Tadpole error correction
I’ve been using BBMerge recently to address a very specific problem: I am sequencing pooled short DNA molecules (< 400bps) using paired end reads (average length ~ 230 bps post trimming) Each molecule can be assumed to be different (i.e. contains sequence differences – substitutions & indels – with respect…
How to pass custom software specific variables to nf-core/sarek nextflow pipeline?
I’m attempting to call whole genome variants using the nf-core/sarek nextflow pipeline. In the QC step there is an option that invokes trim_galore quality trimming, but I don’t know how to pass my custom adapters to be cut as well…
STAR align multiple files
Hi everybody, I am aligning 36 PE samples using STAR. To make the task a little easier, I wrote a bash loop to align them all with the same command. Here is my loop: for i in $(ls raw_data); do STAR --genomeDir index.150…
Biostar Systems
Comment: STAR vs Novoalign IGV Browser visualization by chasem: That is good to know that it isn’t just my set of reads…still concerning, though. Comment: STAR vs Novoalign IGV Browser visualization by chasem: I was not expecting this, not sure what to make of it…
question about running CIRI-full
I’m using CIRI-full to calculate the full-length sequence of circRNAs, and I can run the test data set successfully, but I can’t run my own data. Running the test data set: java -jar ../CIRI-full.jar Pipeline -1 test_1.fq.gz -2 test_2.fq.gz -a test_anno.gtf -r test_ref.fa -d test_output/…
I am converting the fq.gz files (which are the results of the MGI study) to bam files to view on IGV.
Hey everyone, before I start, apologies for any inconvenience caused by my wrong or inappropriate use of terms. I have had some bwa mem failures lately. As I…