Tag: fq.gz
invalid deflate data (invalid code lengths set)
I am trying to trim paired end reads using Trim-Galore. I have made sure that the files match based on the total reads processed in the output txt file from trim-galore. One of the files trimmed correctly but when I try some of the others the total written and quality…
Snakemake rule error
I have the following rule in snakemake: rule low_coverage_contig_reads: input: bam="data/processed/bam_files/bam/{sample}_{fraction}.bam.bai", output: r1="data/processed/clean_reads/low_cov/low_cov_{sample}_{fraction}_R1.fq.gz", r2="data/processed/clean_reads/low_cov/low_cov_{sample}_{fraction}_R2.fq.gz" threads: 8 params: bam="data/processed/bam_files/bam/{sample}_{fraction}.bam" log: log1="logs/{sample}_{fraction}_low_coverage_reads.log", shell: """ (samtools coverage {params.bam} | awk 'NR > 1 && $7 < 10 {{print $1}}' | tr '\\n' ' ' | samtools view -u {params.bam}…
r – Fst calculation from VCF files
I have four VCF files, SNPs_s1.vcf, SNPs_s2.vcf, SNPs_s3.vcf, and SNPs_s4.vcf, which contain information about SNPs. These VCF files were obtained as follows: the initial input files were short paired reads; I did the mapping with minimap2 (./minimap2 -ax sr ref.fa read1.fq.gz read2.fq.gz > aln.sam); converted to bam file samtools…
Yes .. BBMap can do that!
NOTE: This collection was originally posted at SeqAnswers.com. Creating a copy here to preserve the information. Part I is available here: Yes .. BBMap can do that! – Part I : bbmap (aligner), bbduk (scan/trim), repair (fix PE reads) and reformat (format conversions). Part II is available here: Yes .. BBMap can…
bwa mem hangs after a few thousand reads
I am trying to align a bunch of paired sample fastq files using bwa mem. My original command was: bwa mem -t 8 hg38.fa sample_read1.fq.gz sample_read2.fq.gz > sample_paired.sam I am running this on an HPC cluster. These files have approx. 25 million reads, so I initially anticipated that they might…
low rate of ‘Successfully assigned alignments’
Hello everybody. I’m a newbie in RNA-seq Analysis, and I have this situation that I don’t really understand. While working with featureCounts for RNA-seq read quantification, I came across an intriguing issue. The rate of successfully assigned alignments turned out to be unexpectedly low, totalling just 15463270 (7.6%). This was…
not best k value found
I am running kmergenie to estimate the best k; however, I am ending up with this error. DeprecationWarning: pkg_resources is deprecated as an API. See setuptools.pypa.io/en/latest/pkg_resources.html import pkg_resources running histogram estimation list of reads: /media/gyanlab/D/shailesh/Frogs/X201SC22082726-Z01-F002/01.RawData/Meg3/error_corrected/Meg3_read1.fq.gz /media/gyanlab/D/shailesh/Frogs/X201SC22082726-Z01-F002/01.RawData/Meg3/error_corrected/Meg3_read2.fq.gz Setting maximum kmer length to: 150 bp computing histograms…
jellyfish histo is empty
I ran Jellyfish > zcat V*_R1.fastp.fq.gz | jellyfish count -t 2 -C -m 21 -s 5G --quality-start=33 -o reads.jf > cat reads.jf 000001535{"alignment":8,"canonical":true,"cmdline":["count","-t","2","-C","-m","21","-s","5G","--quality-start=33","-o","reads.jf"],"counter_len":4,"exe_path":"/mnt/hpccs01/work/x/miniconda2/envs/jellyfish/bin/jellyfish","format":"binary/sorted","hostname":"cl5n014","key_len":42,"matrix1":{"c":42,"columns":[7114389000,2192333556,7375600293,3957177567,1753639420,5791419737,7660672264,3955008356,2531272710,4865888693,6095688361,6974692332,2128614574,2258809230,885814170,1703365744,5945426963,375951490,7217126865,6145592432,7418978358,2071372677,3810558711,7150470371,5456096994,5475478,7949231332,5100019741,6318414384,7141154869,3543742516,6024410566,5605745954,575704984,8124532068,1053004640,2675922405,1674046033,5535382637,1853889752,1425936082,3907867046],"identity":false,"r":33},"max_reprobe":126,"pwd":"/mnt/hpccs01/scratch/x/y/fastpDNA","reprobes":[1,1,3,6,10,15,21,28,36,45,55,66,78,91,105,120,136,153,171,190,210,231,253,276,300,325,351,378,406,435,465,496,528,561,595,630,666,703,741,780,820,861,903,946,990,1035,1081,1128,1176,1225,1275,1326,1378,1431,1485,1540,1596,1653,1711,1770,1830,1891,1953,2016,2080,2145,2211,2278,2346,2415,2485,2556,2628,2701,2775,2850,2926,3003,3081,3160,3240,3321,3403,3486,3570,3655,3741,3828,3916,4005,4095,4186,4278,4371,4465,4560,4656,4753,4851,4950,5050,5151,5253,5356,5460,5565,5671,5778,5886,5995,6105,6216,6328,6441,6555,6670,6786,6903,7021,7140,7260,7381,7503,7626,7750,7875,8001],"size":8589934592,"time":"Wed Nov 22 14:46:48 2023","val_len":7} However, when I ran jellyfish histo -t 2 reads.jf > reads.histo, the output is empty. What did I miss?
Jellyfish problem with Failed to open input file ‘reads.jf’
Hi, I can’t get jellyfish running with two different ways: Method 1 jellyfish count -t 10 -C -m 21 -s 10G --quality-start=33 -o reads.jf <(zcat V350181330_L03_R1.fastp.fq.gz) <(zcat V350181330_L04_R1.fastp.fq.gz) 26897 Killed jellyfish count -t 10 -C -m 21 -s 10G --quality-start=33…
Efficient Bulk Data Retrieval from NCBI BioProject
Hello, A month ago, I utilized the SRA Toolkit Pipeline to download Fastq files from a BioProject accession. Following the recommended steps, I generated a list of SRR Names, used prefetch, and then employed fasterq-dump (using parallel-fastq-dump) to obtain the data locally,…
featureCount Error “No paired-end reads were detected in paired-end read library”
I created a combined mm39 and MHV-A59 (Viral) reference genome and aligned my paired end reads using STAR with the following input commands: STAR --runMode alignReads --runThreadN 16 --genomeDir /genomeDir --readFilesIn /FastqDir/1.fq.gz, /FastqDir/2.fq.gz --readFilesCommand gunzip -c --outReadsUnmapped Fastx --outSAMtype BAM SortedByCoordinate It seems that everything went fine. Here is a…
Htseq Count
Hello, I have run htseq-count numerous times and continue to get the same error: NONE of my genes are counted, as seen here. ZXDC 0 ZYG11B 0 ZYX 0 ZZEF1 0 ZZZ3 0 __no_feature 70257177 __ambiguous 0 __too_low_aQual 1509790 __not_aligned 3970775 __alignment_not_unique 4277765 However, I have a very high…
Wildcards in Snakemake
I wish to input the names of my samples in a table with the corresponding files of reads forward and reverse to use them in a Snakemake workflow: sample fq1 fq2 1 ../reads/110627_0240_AC0254ABXX_2_SA-PE-001.1.fq.gz ../reads/110627_0240_AC0254ABXX_2_SA-PE-001.2.fq.gz 2 ../reads/110627_0240_AC0254ABXX_2_SA-PE-002.1.fq.gz ../reads/110627_0240_AC0254ABXX_2_SA-PE-002.2.fq.gz 22 ../reads/110802_0249_AD0CM0ABXX_3_SA-PE-022.1.fq.gz ../reads/110802_0249_AD0CM0ABXX_3_SA-PE-022.2.fq.gz Unfortunately, the management of the wildcards is more complex and…
Viral positive and negative strand with paired sequencing and bowtie
You should expect the same number of forward and reverse strand reads because read 1 and read 2 are on opposite strands. What would be more interesting here is to first split the mapped file into r1 and r2, then split THOSE files into forward and reverse, then combine R1…
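The split described above (first by mate, then by strand) can be expressed in terms of SAM flag bits. The following is only a minimal Python sketch, assuming the standard SAM flag definitions (0x40 first in pair, 0x80 second in pair, 0x10 read on the reverse strand). The function name `classify_flag` is hypothetical, and the truncated step about recombining the groups is not reproduced.

```python
# SAM flag bits as defined in the SAM specification.
FLAG_REVERSE = 0x10  # read is aligned to the reverse strand
FLAG_READ1 = 0x40    # read is the first mate of the pair
FLAG_READ2 = 0x80    # read is the second mate of the pair


def classify_flag(flag):
    """Return (mate, strand) for a SAM flag, e.g. ("R1", "forward")."""
    if flag & FLAG_READ1:
        mate = "R1"
    elif flag & FLAG_READ2:
        mate = "R2"
    else:
        mate = "unpaired"
    strand = "reverse" if flag & FLAG_REVERSE else "forward"
    return mate, strand


if __name__ == "__main__":
    # 99/147 and 83/163 are the flag pairs of typical properly paired reads.
    for flag in (99, 147, 83, 163):
        print(flag, classify_flag(flag))
```

Counting the four (mate, strand) groups separately shows whether R1 and R2 mirror each other, as expected for reads on opposite strands.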
Redirecting the output of a command related to another command
I want to get a .txt file showing the computer’s performance of gtime command related to another command: bwa mem. Bwa is a genome aligner tool. Below is the code I’ve used: { gtime -v bwa mem -t 4 -R…
Mapping of paired-end ddRADseq results in 0.00% of reads pairing
Hey all, I’m trying to map my RADseq to a reference genome, and none of my paired-end reads are being paired. Forward and reverse reads are both mapping separately, but not pairing. This problem is consistent for all of my samples. Any help would be much appreciated!! I also viewed…
Error when running HUMAnN – HUMAnN
Hi, I keep getting an error when running humann v3.7. humann v3.7 was installed with conda: conda install humann -c biobakery --solver=libmamba Before installing, the environment was created and channels were configured as per instructions. I downloaded the databases as follows: humann_databases --download chocophlan full /home/swijegun/humann/databases/ --update-config yes humann_databases --download…
more reads in metagenomic samples after ‘removing host reads’
The title says it all – this is driving me nuts. I have Illumina shotgun data from cow manure and I am trying to remove the cow reads. I have downloaded the latest cow genome and used it as reference…
what is the mean of the file “*._f1.fq.gz” and “*._r2.fa.gz”
Hello, I downloaded the files from: bigd.big.ac.cn/gsa/ The file names end with "*._f1.fq.gz" and "*._r2.fa.gz". Is it single-end or paired-end sequencing? If it is paired-end sequencing, the files should be "*._r1.fq.gz" and "*._r2.fa.gz", not "…
How to run TRUST4 for BCR/TCR detection using 10X data?
I’m trying to follow this tutorial here: github.com/liulab-dfci/TRUST4#10x-genomics-data-and-barcode-based-single-cell-data However, I’m not sure how to adapt my data. I have R1 reads that look like this where the reads are 28 bp long: @A00588:95:H2H5KDRX3:1:1101:1163:1000 1:N:0:GTAACATGCG+AGGTAACACT AGCTATCTACTTCTGGTACAACCCACTN + FFFF,FFFFF:FFFFFFF:FF:FFFFF# @A00588:95:H2H5KDRX3:1:1101:1904:1000 1:N:0:GTAACATGCG+AGGTAACACT My R2 reads look like this and they are 90 bp long:…
GATK AnnotateVcfWithBamDepth returns zero DP for all variants in VCF
Dear all, I am using GATK (v4.1.9.0) AnnotateVcfWithBamDepth to get the DP for all variants in ClinVar VCF in a retina RNA-seq BAM file. However, the tool returns zero depth for all variants in the VCF, even though I checked multiple variants in IGV and I saw that they are…
Expected number of raw reads in fq.gz files in PE sequencing
Hi guys, I have a ridiculously fundamental question so can someone please help me out? Background: An external company did RNA-seq for me. I ordered 20M read depth, paired-end 150 bp seq. Question: Am I supposed to expect…
Trimming of reads in miRNA-Seq data
Dear All, I have been trying to filter out reads from Fastq files from miRNA-Seq that we received. The read structure looks like the one shown in the figure below. I can use Cutadapt to filter out the adapter (we have the adapter…
Check Strandedness
I need to figure out the strandedness for the -s flag for regtools junctions extract used for Leafcutter. I get a peculiar error when using how_are_we_stranded_here. Command Run: check_strandedness --gtf path/to/Danio_rerio.GRCz11.110.chr.gtf --transcripts /path/to/Danio_rerio.GRCz11.dna_sm.primary_assembly.fa --reads_1 Sample_1_R1.fq.gz --reads_2 Sample_1_R2.fq.gz Gives the output: Results stored in: stranded_test_Sample_1_R1 converting gtf to…
Can you convert ASCII text to fastq?
Hi smarty pants peeps, I got sequence files from collaborators and their original file types look like this: File: MG430_L4_2.fq.gz, Type: application/x-gzip File: MG431_L4_1.fq.gz, Type: application/x-gzip File: MG431_L4_2.fq.gz, Type: application/x-gzip I ran fastqc, trimmomatic (removing adapters) and then fastqc again. I then started…
No module named ‘readfq’ arises during the execution of kmergenie
$ cat kmer_list_files D31-10A_1.qc.fq.gz D31-10A_2.qc.fq.gz $ kmergenie kmer_list_files --diploid /home/cdbi1/miniconda3/envs/gensize/bin/kmergenie:29: DeprecationWarning: pkg_resources is deprecated as an API. See setuptools.pypa.io/en/latest/pkg_resources.html import pkg_resources running histogram estimation list of reads: D31-10A_1.qc.fq.gz D31-10A_2.qc.fq.gz Traceback (most recent call last): File "/home/cdbi1/miniconda3/envs/gensize/bin/kmergenie", line 303, in…
Filtering parasite reads from host reads
Hi all, I have RNA-seq PE fastq reads for control and infected (at different time points) samples. The host species is lacking a reference genome. Thus, I will be doing a de novo transcriptome assembly and later differential gene expression analysis. My questions:…
HISAT2 HLA genotyping errors
Hi, I’m trying to follow the tutorial for HLA typing and assembly in HISAT2 as described at ccb.jhu.edu/hisat-genotype/index.php/Type:HLA using my own RNAseq data. I do everything as described. I have samtools 1.7 installed. I am able to extract my HLA reads. But when I get to…
Salmon_index.json doesn’t seem to exist
Good evening, thanks to a previous post I was able to solve one problem in my pipeline. Now I am dealing with another one. I have to quantify some rna sequence with salmon and I downloaded a pre-computed index from the link suggested in salmon documentation. The link is this,…
Mapping to mtDNA and then align the unmapped
Hello all, I have aligned my samples against the mitochondrion genome of the species I work with. My idea was that after this I would keep the unmapped ones (which would be the nuclear reads), and then align these against the…
bbsplit running slow or out of memory?
Hello, I have Illumina fastq files from some RNA-seq, ATAC-seq and WES that originated as PDX samples. I am looking to filter out contaminating mouse reads from the human reads in these datasets. I have used Xenome before but wanted to try…
Use bbmap reformat.sh to convert from paired fq files to a bam file
As outlined here I was able to create paired-end fastq files with the help of GenoMax. Now I wonder how I can use reformat.sh from bbmap to convert these files to a valid bam file…
Removing overrepresented sequences in paired end RNA-seq
After trimming and QC’ing RNAseq for adapters with trim_galore, should I remove overrepresented sequences that FastQC identifies as possible adapter/primer source? If so, is there a way to do so automatically? For what it’s worth, both of the identified sequences start with…
FastQC command line usage
Hi, I am supposed to run FastQC on a Linux server (I don’t have the chance to use a graphical user interface). How can I use the command line to run FastQC on my sequences? Thanks in advance. After installing FastQC, you…
How to extract sequences from multiple fastq files based on a certain sequence ?
Dear all, I would like to know if it is possible to extract/retain sequences from multiple fastq files based on a certain input sequence, and get new fastq files containing only sequences sharing the input…
samtools collate
Hi all, I am using samtools collate to convert my bam files to paired end fastq files. Here is the command that I am using: samtools view -h -T mm10.fa {input.bam} | samtools collate -O -u -@ {threads} - | samtools fastq -1 output_paired1.fq.gz -2 output_paired2.fq.gz -0…
Very few snp and indels variation were identified using PAV variation input file base on vg call
Hi all, We want to find the snp and indels variation from the result vcf file BS_graph_call.vcf by using the pan_genome vg analysis software. There are only fewer than 20 snp and…
SNP analysis with an assembly
Hi there, I am new in SNP analyses so before starting doing anything I would like to check if my pipeline is correct. What I have now is : RNA-seq samples (.fq.gz) + Trinity assembly (from those reads). My model organism has not an assembled genome, that’s the way I…
python – Snakemake wrappers suddenly stopped working
I have these wrappers in my snakemake file: rule fastqc: input: "reads/{sample}_trimmed.fq.gz" output: html="qc/fastqc/{sample}.html", zip="qc/fastqc/{sample}_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc to find the file params: extra = "--quiet" log: "logs/fastqc/{sample}.log" threads: config["resources"]["fastqc"]["cpu"] conda: "envs/qc.yaml" wrapper: "v1.31.1/bio/fastqc" qc.yaml: name: qc channels: - bioconda dependencies: - python - fastqc…
Cutadapt error: too many parameters.
Hi biostars community! I am having issues looping cutadapt over gunzipped samples. This is the script I am using: #!/bin/bash #SBATCH --account GRINFISH #SBATCH -c 8 #SBATCH --mem 96g #SBATCH --output logfile.out #SBATCH --error logfile.err # This script performs trimming for PE sequences…
Trimmomatic run error
Hello, I have a problem when running: input_dir="$HOME/workdir/group" output_dir="$HOME/workdir/group/fqdata_trimmed" adap="$CONDA_PREFIX/share/trimmomatic-0.39-1/adapters" f1="$HOME/workdir/group/P4.R1.fq.gz" f2="$HOME/workdir/group/P4.R2.fq.gz" newf1="$HOME/workdir/group/P4.R1.pe.trim.fq.gz" newf2="$HOME/workdir/group/P4.R2.pe.trim.fq.gz" newf1u="$HOME/workdir/group/P4.R1.se.trim.fq.gz" newf2u="$HOME/workdir/group/P4.R2.se.trim.fq.gz" mismatch_values=(1 2 3 4 5) for mismatch_value in "${mismatch_values[@]}" do trimmomatic PE -threads 1 -phred33 -trimlog trimLogFile -summary statsSummaryFile \ $f1 $f2 $newf1 $newf1U $newf2 $newf2U \ ILLUMINACLIP:$adap/TruSeq3-PE-2.fa:${mismatch_value}:30:10:1 \ SLIDINGWINDOW:4:15…
No differentially expressed genes after multiple testing correction in mice
Hi all, I am working with the RNA-seq data on mice (group A N=3 vs group B N=3). Mice are littermates, of which group A overexpresses a human transgene which I verified. I have had .cram files from mouse…
High number of duplicates and low percentage properly paired
I have some paired end sequencing data that I have trimmed using cutadapt. It was sequenced on an illumina novaseq 6000 and is low coverage RADseq data (2-3x). My cutadapt script used forward and reverse adapters from illumina : cutadapt…
How to split a fastq file to multiples fastq files
Dear all, I have a fastq.gz file that has more than 100 million reads. My aim is to divide this fastq file into three separate fastq files, ensuring that all reads from the original fastq file are distributed and…
Error, fewer reads in file specified with -1 than in file specified with -2
Hi all, This is my first time attempting to align sequences to a reference index. I am using bowtie2 with the -1 and -2 arguments and have gotten the following error message: Error, fewer…
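An error of this kind usually means the two mate files hold different numbers of records, which can be checked before aligning. The following is only a minimal Python sketch, assuming plain 4-line FASTQ records in gzip-compressed files. The function names `count_fastq_records` and `pair_counts_match` are hypothetical and are not part of bowtie2.

```python
import gzip


def count_fastq_records(path):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A partial record often points to a truncated download or copy.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def pair_counts_match(path_r1, path_r2):
    """Return True when both mate files contain the same number of records."""
    return count_fastq_records(path_r1) == count_fastq_records(path_r2)
```

Calling `pair_counts_match("reads_1.fq.gz", "reads_2.fq.gz")` with the two files given to -1 and -2 (file names here are placeholders) returns False when the counts differ. Equal counts do not prove that the read names are still in the same order.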
Snakemake workflow for trimmomatic
Hello everybody! I’m a novice in Snakemake. I want to create a workflow for Illumina data analysis. I’m currently programming the trimmomatic rule and I’m facing an issue. This is the code: SAMPLES = ["1G_S15", "7G_S13"] rule trimmomatic_pe: input: adaptaters ="Illumina/adaptaters/TruSeq2-PE.fa", forward = expand("HHV8/fastq_raw_/fastq_H8/{sample}_R1.fastq.gz", sample…
Correct script for featurecounts in Rsubread
I am new to R and RStudio but have been trying to work through different examples using Rsubread for my data. I have tried reading vignettes and manuals prior to posting here but I am stuck and could really use some advice. I have 7 paired-end, fastq files from Illumina…
Hard clip fastq
I hope this is not a silly question. I have 2x 200bp fastqs generated from MGI G400 sequencer. I would like to do a comparison with Illumina but these only come as 2x 150bp fastqs. Is it possible to hard clip the 2x 200bp fastqs down…
Nextflow memory issues custom config -c
Hi all, I am trying to run nextflow on my laptop: nextflow run nf-core/rnaseq \ --input samplesheet.csv \ --genome mm10 \ -profile docker I am having issues with memory: Error executing process > 'NFCORE_RNASEQ:RNASEQ:FASTQC_UMITOOLS_TRIMGALORE:FASTQC (KO_3)' Caused by: Process requirement exceed available memory –…
How to select or subset process outputs in Nextflow DSL2?
I have a DSL2 Nextflow workflow. I would like to use just the outputs named "paired.fq.gz" (index 0 and 2 in the tuple) in downstream processes. Is there a way to filter or select a subset of the…
Tool To Find Out If Fastq Is In Sanger Or Phred64 Encoding?
Is there a simple tool I can use to quickly find out if a FASTQ file is in Sanger or Phred64 encoding? Ideally something that tells me ‘Encoding XX’ somewhere in the terminal output.
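One common way to approach this question is to look at the range of ASCII codes in the quality lines. The following is only a heuristic Python sketch, not an existing tool. It assumes that characters below ASCII 59 can only occur with a +33 offset, and that Phred+33 qualities rarely exceed 'J' (ASCII 74), which does not hold for every platform. The function name `guess_fastq_encoding` is hypothetical.

```python
def guess_fastq_encoding(quality_lines):
    """Guess the quality offset from an iterable of FASTQ quality strings."""
    codes = [ord(char) for line in quality_lines for char in line.strip()]
    if not codes:
        raise ValueError("no quality characters given")
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Neither +64 variant can produce characters this low.
        return "Phred+33"
    if highest > 74:
        # Assumption: +33 data seldom goes above 'J' on Illumina-style reads.
        return "Phred+64"
    return "ambiguous"


if __name__ == "__main__":
    print(guess_fastq_encoding(["IIII#", "FFFF,"]))
    print(guess_fastq_encoding(["hhhhgggg"]))
```

Sampling the quality lines of the first few thousand records is normally enough, since a single low character settles the +33 case.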
10x 3′ library creates R1 and R2 fastq files with the same read length
Let me show you an example: trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR16093385&display=metadata This data contains two reads, R1 and R2. The read lengths of R1 and R2 are the same, 150bp. However, this experiment was performed following the 10x 3’ library protocol. In the method section, it is described as below: The scRNA-seq libraries were generated using the…
trimmomatic on scRNA seq data
Hello, I’m struggling with an scRNA pipeline. I downloaded data from the 10x Genomics database: support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_1k_v3 When I checked the size of the files I found this: -rw-r--r-- 1 5062 5000 753851810 Nov 2 2018 pbmc_1k_v3_S1_L001_R1_001.fastq.gz -rw-r--r-- 1 5062 5000 1772725195 Nov…
Trimmomatic generated two (reverse-forward) paired-files with different number of reads
Hi all, In the RNA-seq analysis workflow on Linux, Trimmomatic generates 4 output files: forward-paired.fq.gz, reverse-paired.fq.gz, and the 2 unpaired files. As I read in several threads, Trimmomatic is expected to: Remove the adapters and the low-quality reads. generates…
Can’t add read group correctly to minimap2 sam alignmnet
Hello, I am running minimap2 in a pipeline with GATK that needs read group data @RG with sample information. minimap2 -ax sr -t 20 -I 100G -R @RG\\tID:A00253_251_HTN2JDSXY.2\\tPL:ILLUMINA\tLB:LB1\\tSM:TA90 ref.mmi reads_1.fq.gz reads_2.fq.gz | samtools view -bh -F 260 -T ref.fa >out.bam…
snakemake wildcard in shell
I’m a newbie at snakemake. I’m trying to implement the GATK FastqToSam in a rule. I have the following and it works if I hard code the samplename into the shell part but I was wanting to get the samplename from the config file. I…
forcing read error correction using SPAdes
Given that this is my code below, why is SPAdes giving me the following message?: Mode: ONLY assembling (without read error correction) Debug mode is turned OFF I would like for the assembly to complete the read error correction step if possible. Based…
XenoCell fq.gz output files
Hi, I followed the tutorial of XenoCell (see below) and extracted the graft barcodes by using hgmm_5k_v3 example dataset. I got 3 output files (cellular_barcodes.txt, fq_barcode.fq.gz, and fq_transcript.fq.gz) under graft folder. Does anyone know how to convert these files and feed them into 10x genomic…
STAR is running but .sam file size does not increase after hours mapping
Hi there, I’m using STAR with a small genome. My samples are paired. The commands are: For genome indexes STAR --runThreadN 20 --runMode genomeGenerate --genomeDir /path/to/folder/Analyses/STAR/ --genomeFastaFiles /path/to/genome_reference/genome.fna --readFilesCommand zcat path/to/folder/with/giz_samples/R1.fq.gz R2.fq.gz --sjdbGTFfile path/to/genome_reference/genome.gff --genomeSAindexNbases 11…
RNAseq for DE purpose
Hi all, I am totally new to bioinformatic analysis. I am working on a project that looks at DGE among different time treatments. Besides, there is no reference genome (meaning that I need a de novo assembly step). So far, after struggling and navigating…
Randomize Read Order In Multigbp Fastq File?
Is there any method to randomize the read order in a multi-Gbp fastq file? Assuming you are talking about a single-end file, you can use awk to put each 4-line fastq entry on a single line. You then…
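The idea in the answer above is to treat each 4-line FASTQ entry as one unit before reordering. The following is only a minimal Python sketch of that idea, not the awk pipeline from the answer (whose remaining steps are truncated here). It keeps all records in memory, so it assumes the file fits in RAM, which may not be true for a multi-Gbp file. The function name `shuffle_fastq_records` is hypothetical.

```python
import random


def shuffle_fastq_records(lines, seed=None):
    """Return FASTQ lines with the 4-line records in random order."""
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of 4 lines")
    # Group header, sequence, separator and quality into one record.
    records = [lines[i:i + 4] for i in range(0, len(lines), 4)]
    # A dedicated Random instance makes the order reproducible for a given seed.
    random.Random(seed).shuffle(records)
    return [line for record in records for line in record]


if __name__ == "__main__":
    demo = ["@a", "ACGT", "+", "IIII", "@b", "TTTT", "+", "FFFF"]
    print(shuffle_fastq_records(demo, seed=0))
```

For paired-end data, reusing the same seed on both mate files keeps the pairs in step, provided both files hold the same number of records in the same order.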
Error 134 while aligning using hisat2
Hello, I am using the below command to align the reads and get bam file: hisat2 -x /hisat/grch38/genome -1 /fastq/output_forward_paired.fq.gz -2 /fastq/output_reverse_paired.fq.gz | samtools sort -o /bams/outout.bam This was running perfectly ok for the last try, however, for the new try I got…
Error in trimmomatic
Hi! Trust you are well. I am trying to run this program but I get the following error and I don’t know how to fix it, nor do I understand it. Could you help me please? java -jar trimmomatic-0.36.jar PE -phred33 white_replicate1.R1paired.fq.gz white_replicate1.R2paired.fq.gz white_replicate1.R1paired.fq.gz white_replicate1.R1unpaired.fq.gz white_replicate1.R2paired.fq.gz white_replicate1.R2unpaired.fq.gz ILLUMINACLIP:/mnt/g/poolseq_tutorial_1/poolseq_tutorial/adapters/TruSeq3- PE.fa:2:20:10:1:true…
cutadapt installed via conda igzip error for some fastq files
Only very recently (~2 weeks ago), cutadapt installed via conda has the following error: This is cutadapt 3.2 with Python 3.8.6 Command line parameters: -j 4 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC in2438_3_CKDL210000739-2a-AK5142-AK6697_HVHF2DSXY_L2_1.fq.gz Processing reads on 4 cores in single-end mode … [———>8 ] 00:00:26 5,536,084 reads @…
htslib/c what is the correct way to use bgzf_thread_pool ?
I try to split fastq files into ‘N’ chunks using a simple CC program and htslib-C . It works fine: ./split2file -o TMP S1.R1.fq.gz S1.R2.fq.gz -n 10 but when I use a thread pool ( As far as I…
Getting information on CRAM files from headers inside the files
Hello. I wish to know if one can find the following information in CRAM files’ headers: 1) Whether or not sequencing data in CRAM files is from WGS or WES, and if so, where? and 2) In case one…
Converting Bam file to Fasta (Zipped)
I would like to convert .bam files to fq.gz (gzipped FASTQ files) for paired reads. bedtools bamtofastq seems to be a commonly recommended method; I have also seen samtools fastq as a possible alternative. bedtools bamtofastq -i inputfile.bam -fq outputR1.fq -fq2 outputR2.fq samtools…
error when fastp filters data
When filtering with fastp, the error ‘sequence and quality have different length’ appears. fastp -i CK-2_R1.fq.gz -o CK-2_R1.clean.fq.gz -I CK-2_R2.fq.gz -O CK-2_R2.clean.fq.gz After filtering the data for a while, it stops updating. [pengliang@fat01 CK-2]$ ERROR: sequence and quality have different length: WARNNIG: different read numbers of the 4852 packRead1…
BBmap bbduk.sh for filtering reads
I’m looking to filter reads that contain a stretch of A’s, I found these posts looking for polyA tails, meaning this should work all the same (Identify RNA-seq reads containing polyA sequence, Identifying RNA-seq reads containing polyA stretch). However, I cannot get it to work. Given just these two reads,…
Hisat2 – stringtie – deseq2 pipeline for bulk RNA seq
Official software websites: Hisat2: Manual | HISAT2. StringTie: StringTie. Article: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown | Nature Protocols. Recommended step-by-step tutorials: 1. RNA-seq: Hisat2+Stringtie+DESeq2 – Hengnuo Xinzhi 2. RNA-seq analysis using hisat2, stringtie, DESeq2 – Simple books. Basic usage…
Detailed differences between sambamba and samtools
Three months ago, my first post in the new student group, "Do false-positive mutations appear because duplicates are not marked thoroughly enough?", described the problem that supplementary reads are not marked as duplicates by GATK MarkDuplicates. Afterwards, in response to this question, I began…
The low successful assignment ratio of FeatureCounts
Hello, I would like to confirm if the low assignment ratio (54%) is normal, and please check the possible reason I found. I used Hisat2 to assign paired-end strand-specific transcriptomic sequences (rRNA removed) to a reference genome. Because I filtered out the unmapped sequences in advance, the overall assignment ratio…
Trimmomatic/ linux system
Hi all, I am trying to remove adapters and clean my RNA-seq.gz files using Trimmomatic, loaded on a Linux system (supercomputer server), following the steps for paired-end reads explained in the manual (www.usadellab.org/cms/?page=trimmomatic): java -jar trimmomatic-0.39.jar PE input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True LEADING:3…
Fastp file merge append | Develop Paper
Interpretation of the fastq file format: www.jianshu.com/p/39115d21ee17 Sometimes the sequencing results for one species are returned as two sets of paired-end fastq files: r1.fq.gz l1.fq.gz r2.fq.gz l2.fq.gz. The sequencing data is actually a single dataset, but it was split into two parts during transmission. When we use it, we are used to merging it into a double ended…
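Merging split deliveries like this relies on the fact that gzip files can be appended to one another: the result is a valid multi-member gzip stream. The following is only a minimal Python sketch under that assumption. The function name `concat_gzip` and any file names passed to it are hypothetical, and it is not the method from the linked article.

```python
import shutil


def concat_gzip(parts, merged_path):
    """Append gzip files byte for byte into one multi-member gzip file."""
    with open(merged_path, "wb") as merged:
        for part in parts:
            with open(part, "rb") as source:
                # Raw bytes are copied; nothing is decompressed or recompressed.
                shutil.copyfileobj(source, merged)
```

The parts for read 1 and read 2 have to be appended in the same order, otherwise the merged files no longer list the mates in matching positions.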
Snakemake using multi inputs – Stackify
You need to define target output files using rule all. SAMPLES = ['1', '2', '3', '4'] rule all: input: expand("sample{sample}.R{read_no}.fq.gz.out", sample=SAMPLES, read_no=['1', '2']) rule fastp: input: reads1="sample{sample}.R1.fq.gz", reads2="sample{sample}.R2.fq.gz" output: reads1out="sample{sample}.R1.fq.gz.out", reads2out="sample{sample}.R2.fq.gz.out" shell: "fastp -i {input.reads1} -I {input.reads2} -o {output.reads1out} -O {output.reads2out}" This is the output of command snakemake -np, with…
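The `expand` call in rule all asks for one target file per combination of sample and read number. The following is only a plain-Python imitation of that behaviour, written to show which file names result. It is not Snakemake's own implementation, and the helper name `expand_pattern` is hypothetical.

```python
from itertools import product


def expand_pattern(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


SAMPLES = ["1", "2", "3", "4"]
targets = expand_pattern(
    "sample{sample}.R{read_no}.fq.gz.out", sample=SAMPLES, read_no=["1", "2"]
)
print(len(targets))  # 8 target files: 4 samples x 2 read numbers
print(targets[0])    # sample1.R1.fq.gz.out
```

Each of these eight names matches one of the two output patterns of rule fastp, which is how the wildcard `sample` gets its value for every job.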
bwa , 2 files fastq to 1 sam
I have this problem, please help me. I’m trying this on Mac OS Catalina. I am creating a sam file from 2 fastq files using bwa. I apply the following command: bwa mem -t 2 GRCh38.primary_assembly.genome.fa.gz V350019555_L03_B5GHUMqcnrRAABA-556_1.fq.gz V350019555_L03_B5GHUMqcnrRAABA-556_2.fq.gz > V350019555_L03_B5GHUMqcnrRAABA-556.sam…
Secret BBMAP helper page – HRGV/Marmics_Metagenomics Wiki
#How to map to the assembled scaffolds.fasta bbmap is a powerful and highly flexible read mapper jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbmap-guide/. For the upcoming analysis you are not interested in the typical mapping output but in statistics on the coverage on every scaffold, you can get them with scaffstats. We want to be specific…
Trimmomatic parameters
$java -jar /apps/eb/Trimmomatic/0.39-Java-1.8.0_144/trimmomatic-0.39.jar PE -phred33 seq1_L2_1.fq.gz seq1_L2_2.fq.gz _L2_r1_paired_fq.gz seq1_L2_r1_unpaired.fq.gz seq_L2_r2_paired.fq.gz Seq1_L2_r2_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:5 ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:5
Using STAR SJ.out.tab file to identify novel ncRNAs
Hi All, I am attempting to identify novel ncRNAs from a circadian RNAseq dataset. Specifically I have a ribo-depleted RNAseq timecourse with 31 samples (sample every 2 hours for 60hrs). I have run STAR (code below). I am trying to follow…
Mapping multiples
Hi, I am coming to you for help. I am doing a mapping on short and long read files with BWA and MINIMAP2. My problem is that I want to make an if loop that would allow me to choose either BWA if I work with short…
STAR+RSEM pippline without gtf
Dear all, I have a question. I mapped reads on cds sequence through STAR. I don’t have a gtf file and want to calculate read counts using RSEM, but I am stuck on the error “RSEM error: RSEM currently does not support gapped alignments” as I don’t have…
BBMerge / Tadpole error correction
I’ve been using BBMerge recently to address a very specific problem: I am sequencing pooled short DNA molecules (< 400bps) using paired end reads (average length ~ 230 bps post trimming) Each molecule can be assumed to be different (i.e. contains sequence differences – substitutions & indels – with respect…
How to pass custom software specific variables to nf-core/sarek nextflow pipeline?
I’m attempting to call whole genome variants using nf-core/sarek nextflow pipeline. In QC step there is an option that invokes trim_galore quality trimming, but i don’t know how to pass my custom adapters to be cut as well….
STAR align multiple files
Hi everybody, I am doing alignment of 36 PE samples using STAR. To make the task a little easier I wrote a bash loop to align them all with the same command. Here is my loop: for i in $(ls raw_data); do STAR --genomeDir index.150…
Biostar Systems
Comment: STAR vs Novoalign IGV Browser visualization by chasem ▴ 10 That is good to know that it isn’t just my set of reads…still concerning, though. Comment: STAR vs Novoalign IGV Browser visualization by chasem ▴ 10 I was not expecting this — not sure what to make of it…
question about running CIRI-full
I’m using ciri-full to calculate the full length sequence of circRNAs, and I can run the test data set successfully, but I can’t run my own data. Running the test data set: java -jar ../CIRI-full.jar Pipeline -1 test_1.fq.gz -2 test_2.fq.gz -a test_anno.gtf -r test_ref.fa -d test_output/…
I am converting the fq.gz. files (which are the results of the mgi study) to bam files to view on igv.
Hey everyone, before I start, apologies for any inconvenience caused by my wrong or inappropriate use of terms. I have had some failures with bwa mem lately. As i…