Tag: picard

Efficiently merge two BAM files while retaining reads from only one file in overlapping regions

I have a WGS BAM file that is fairly large (>150 GB) and a smaller BAM file (<5 GB) with reads in a small 10 Mbp region. I want to (efficiently) merge the two BAM files while…


sequence alignment – MarkDuplicatesSpark failing with cryptic error message. MarkDuplicates succeeds

I have been trying to follow the GATK Best Practice Workflow for ‘Data pre-processing for variant discovery’ (gatk.broadinstitute.org/hc/en-us/articles/360035535912). This has all been run on Windows Subsystem for Linux 2 on the Bash shell. I started off with FASTQ files from IGSR (www.internationalgenome.org/data-portal) and performed alignment with Bowtie2 (instead of…


is BBMap/Qualimap affected by log4j vulnerability

No, unless the tools are used as a library in a web server. It’s worth noting that picard.jar and abra.jar are affected (even though, as Pierre L says, these are unlikely to be attacked on most systems). If you’re responsible for systems, esp. web…


Padding out a GVCF file with 1000G exomes to get gatk VariantRecalibrator working with a small sample

I’ve got sequencing data for a small 500 bp amplicon from a few samples. The GATK Best Practices suggest running VariantRecalibrator on the GVCF files I generate. I’m trying to get this working, but I get an error about “Found annotations with zero variances”. Reading the GATK manual and other posts…


[moiexpositoalonsolab/grenepipe] freebayes causes early error about number of threads

Hi Lucas, got a weird one for you. If I change the caller from haplotypecaller to freebayes, I get the error below. It’s doubly strange because it seems to occur well before freebayes would be used in the pipeline. [Sat Dec 11 11:13:02 2021] rule samtools_stats: input: dedup/111D03-1.bam output: qc/samtools-stats/111D03-1.txt…


Trouble running vcf2bam jvarkit tool

I am trying to use the tool called vcf2bam from jvarkit on a server and I have the following 2 files: GRCh38_latest_genomic.fna – the file is of format FASTQ, and 00-common_all.vcf. I used samtools faidx and also picard CreateSequenceDictionary, but when I try…


converting Bam to fastq while removing clipping(hard/soft clip bases)

Hello, I want to do some analysis, and my raw data is paired-end read FASTQ files. So far, I have used BWA mem to produce a SAM file, then used samtools to convert it to a BAM file. My next step is…


Liftover nonmodel VCF

Hi all, I have a FASTA genome assembly and a VCF for my (nonmodel) study species. Now I want to liftover the VCF to the Zebra Finch genome (www.ncbi.nlm.nih.gov/assembly/GCF_003957565.1). I’ve found Picard LiftOver GATK and CrossMap, but both require a UCSC chain file, which apparently can…


Troubleshooting Tips – bcl2fastq creates duplicate reads

Hi, I have seen a few times where bcl2fastq (v2.20) will produce duplicate FASTQ entries in sequencing read IDs, raw sequences, & quality scores. This causes issues with downstream tools like Picard MarkDuplicates (e.g. Exception in thread “main” htsjdk.samtools.SAMException: Value was put…


Error in merged bam files

Hello, I am trying to merge unmapped and mapped BAM files. I merged the BAM files using the Picard tool (gatk.broadinstitute.org/hc/en-us/articles/360036883871-MergeBamAlignment-Picard). I checked the merged BAM using the ValidateSamFile command (gatk.broadinstitute.org/hc/en-us/articles/360036854731-ValidateSamFile-Picard-) and it showed the errors below (error type, count): ERROR:MATES_ARE_SAME_END, 5496; ERROR:MISMATCH_FLAG_MATE_NEG_STRAND, 5478; ERROR:MISMATCH_MATE_CIGAR_STRING…


High frequency of an otherwise rare phenotype in a small and isolated tiger population

Significance: Small and isolated populations have low genetic variation due to founding bottlenecks and genetic drift. Few empirical studies demonstrate visible phenotypic change associated with drift using genetic data in endangered species. We used genomic analyses of a captive tiger pedigree to identify the genetic basis for a rare trait,…


Picard CalculateHsMetrics perTargetCoverage for Novaseq bams

Hello, I would like to use Picard’s CalculateHsMetrics to calculate per-target coverage for NovaSeq BAM files. It seems that the tool is not able to calculate mean/normalized coverage for NovaSeq BAMs but works well with HiSeq BAMs. NovaSeq BAMs report quality scores…


allele balance gatk

Hi, I am trying to calculate allele balance for both heterozygous (0.40 to 0.60) and homozygous calls from a VCF file. Please let me know how to achieve this with GATK. I tried using the FilterVCF (Picard) command as follows: –I inputFile.vcf –MIN_AB -O outputfile. I would…


Paired-end reads reported without mates: how to play matchmaker?

Hi Everyone, I am currently looking at Acute Myeloid Leukemia (AML) paired-end WGS samples from the TARGET data (ocg.cancer.gov/programs/target/target-methods#3241). A bioinformatician in our group remapped the samples from hg19 to hg38. Unfortunately, we do not have any copies of the hg19 version anymore. However, when I try to run anything…


Fastqc user manual – vodosp.ru

FASTQ format – Wikipedia 06 September 2021 – by TC Collin · 2020 · Cited by 3 — Be accompanied by a step-by-step user-friendly manual, If the user performs FastQC prior to the removal of adapters (step 3), the length Both programs can be used on Linux/MacOS X machines and quite…


Twist Bioscience hiring Bioinformatics Scientist, Production Bioinformatics in South San Francisco, California, United States

Twist is looking for a Bioinformatics Scientist to join our Production Bioinformatics Team. You will work alongside research scientists, software engineers and data scientists to further deliver on our mission to expand access to best-in-class synthetic biology and next-generation sequencing applications. You will be developing and engineering tools to better…


Snakemake-Alignment using BWA-MEM2

Hello, I have started using Snakemake 6.5.2 to align FASTQ files against a reference file. I have pasted the error below in this question. How do I allocate memory in the Snakefile and read the header from samfile ‘-’? This is the Snakefile (wrapper for running alignment): rule bwa_mem2_mem: input: reads=[“/scicore/home/cichon/GROUP/test_workflow/samples/{sample}.1.fq”, “/scicore/home/cichon/GROUP/test_workflow/samples/{sample}.2.fq”]…


Mapping reads and quantifying genes

Hello, I am using the following metagenomic workshop tutorial to analyse my own metagenomic data: metagenomics-workshop.readthedocs.io/en/latest/annotation/quantification.html. I performed the following steps: mapped reads with bowtie2 and generated a .bam file with samtools sort; removed duplicates with Picard; extracted gene information from prokka…


Can I sort my bam files with Picard MergeSamFiles?

Hi! I noticed this in the Picard MergeSamFiles help: --SORT_ORDER,-SO:SortOrder Sort order of output file. Default value: coordinate. Possible values: {unsorted, queryname, coordinate, duplicate, unknown}. Does this mean that it is unnecessary to use Picard SortSam beforehand? Can MergeSamFiles do…


Missing read group in BAM files

Hello everyone, I have processed PE reads through the pipeline HybPiper to align them to a reference genome with GATK. But inspecting the output BAM files with the GATK tool ValidateSamFile, I found a very common error in the error report: WARNING::RECORD_MISSING_READ_GROUP…


Looking for a tool which provides mapping quality score distributions from BAM files

Hello BioStars, Is there a tool which generates mapping quality score distributions from BAM files? I know I could potentially do this myself, but I am looking for something which would essentially do the work for…


So many variants detected.

Dear All, I have done variant calling on germline data that has a single sample for each individual and two genes. I followed the steps below, but after checking the results I found too many variants. After HaplotypeCaller (step 6) I found 140900 known variants, and the…


CROP-seq data analysis

Hi, I am a newbie to single-cell sequencing analysis. I have to analyze CROP-seq data, and I am going through the following paper: www.nature.com/articles/nmeth.4177. I have to use Cell Ranger (instead of the DROP-seq software) as the first step to process single-cell data. I wanted…
