Tag: picard
PCR duplicates in FFPE RNASeq
Dear all, I am working on 100 RNASeq samples generated with a stranded protocol on a NovaSeq run. I need to perform variant calling on these samples, but I am running into some problems. I do not have access to DNA, so exome/targeted amplification is not…
Fast way to sort bam file by queryname similar to picard SortSam SORT_ORDER=queryname?
When sorting by queryname with Samtools (samtools sort -n), Samtools does a natural sort by colon-delimited subfield. On the other hand, when sorting by queryname with Picard (picard SortSam SORT_ORDER=queryname), Picard does not sort by colon-delimited subfield,…
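To make the difference between the two orderings concrete, here is a minimal Python sketch. It is only an illustration: `natural_key` is a hypothetical helper that approximates a natural sort (digit runs compared as numbers), and plain `sorted()` stands in for a character-by-character string comparison. Neither is the actual comparator used by Samtools or Picard, and the read names are made up.

```python
import re

def natural_key(name):
    """Split a read name into digit and non-digit chunks.

    Digit runs become integers so that 9 sorts before 10. The leading
    0/1 tag keeps integers and strings from being compared directly.
    """
    return [(0, int(tok)) if tok.isdigit() else (1, tok)
            for tok in re.findall(r"\d+|\D+", name)]

# Hypothetical read names that differ only in one colon-delimited subfield.
names = ["A:1:10:5", "A:1:9:5", "A:1:100:5"]

natural = sorted(names, key=natural_key)   # numeric comparison of subfields
lexicographic = sorted(names)              # plain character-by-character order

print(natural)
print(lexicographic)
```

The two lists come out in different orders, which is why a file sorted by one tool may be reported as unsorted by a tool that expects the other ordering.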
SMARCE1 deficiency generates a targetable mSWI/SNF dependency in clear cell meningioma
Clapier, C. R., Iwasa, J., Cairns, B. R. & Peterson, C. L. Mechanisms of action and regulation of ATP-dependent chromatin-remodelling complexes. Nat. Rev. Mol. Cell Biol. 18, 407–422 (2017). Mashtalir, N. et al. Modular organization and assembly of SWI/SNF family chromatin remodeling complexes….
Extract R1 and R2 from sam file generated by bowtie2
Hi everyone, how can I extract R1 and R2 from a SAM file generated by bowtie2?
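One way to approach this is through the SAM FLAG field, where bit 0x40 marks the first read of a pair and bit 0x80 marks the second. The Python sketch below is a rough illustration working on plain SAM text lines; `split_mates` and the example records are hypothetical, and skipping secondary and supplementary alignments is an assumption about what is wanted, not something stated in the question.

```python
FIRST_IN_PAIR = 0x40    # SAM FLAG bit: first read of the pair (R1)
SECOND_IN_PAIR = 0x80   # SAM FLAG bit: second read of the pair (R2)
SECONDARY = 0x100       # secondary alignment
SUPPLEMENTARY = 0x800   # supplementary alignment

def split_mates(sam_lines):
    """Return (r1_names, r2_names) from an iterable of SAM text lines."""
    r1, r2 = [], []
    for line in sam_lines:
        if line.startswith("@"):          # header line, no read record
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue                      # assumption: keep primary records only
        if flag & FIRST_IN_PAIR:
            r1.append(name)
        elif flag & SECOND_IN_PAIR:
            r2.append(name)
    return r1, r2

# Made-up records with only the columns needed for the illustration.
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "readA\t99\tchr1\t100\t42\t50M",
    "readA\t147\tchr1\t300\t42\t50M",
    "readA\t355\tchr2\t500\t0\t50M",
]
r1_names, r2_names = split_mates(example)
print(r1_names, r2_names)
```

On a real file the same bit tests are usually applied by an existing tool rather than by hand; the sketch only shows which flag bits distinguish the two mates.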
linux merge multiple files in picard
Why not use samtools? for folder in my_bam_folders/*; do samtools merge "$folder.bam" "$folder"/*.bam; done In general, samtools merge can merge all the BAM files in a given directory like this: samtools merge merged.bam *.bam EDIT: If samtools isn’t an option and you have to use Picard, what about something like…
Problem with the output of Deeptools PlotProfile: a strange pattern of repetitive summits
Hi! I am trying to plot DNA binding profiles of my ChIP-seq bw files using Deeptools plotProfile. I generated the matrix using computeMatrix reference-point. I used some publicly available bed files as my regions of…
Detailed differences between sambamba and samtools
Three months ago, in my first post in the new student group, "Do false-positive mutations appear because duplicate marking is not enough?", I described the problem that supplementary reads are not marked as duplicates by GATK MarkDuplicates. Afterwards, in response to this question, I began…
A genome-scale screen for synthetic drivers of T cell proliferation
Abramson, J. S. et al. Transcend NHL 001: immunotherapy with the CD19-directed CAR T-cell product JCAR017 results in high complete response rates in relapsed or refractory B-cell non-Hodgkin lymphoma. Blood 128, 4192–4192 (2016). Shifrut, E. et al. Genome-wide CRISPR screens in primary human T cells reveal key regulators…
Low transcript quantification with Salmon using GRCm39 annotations
Hi everyone, first time working with mouse samples and unfortunately, there are fewer resources available for the latest mouse Ensembl genome than I was expecting. What I’ve done: I performed rRNA depletion on total RNA extracted from mouse tissue and created Illumina libraries using a cDNA synthesis kit with random…
Population genomics of Escherichia coli in livestock-keeping households across a rapidly developing urban landscape
Karesh, W. B. et al. Ecology of zoonoses: natural and unnatural histories. Lancet 380, 1936–1945 (2012). Wolfe, N. D., Dunavan, C. P. & Diamond, J. Origins of major human infectious diseases. Nature 447, 279–283 (2007). Allen, T. et al….
HRJOB7442 Bioinformatics Scientist 2 (Various Locations) in Nether Alderley, Macclesfield (SK10) | Almac Group (Uk) Ltd
Bioinformatics Scientist 2 Hours: 37.5 hours per week Salary: Competitive Ref No: HRJOB7442 Business Unit: Diagnostic Services Location: Craigavon or Manchester Open To: Internal and External Applicants The Company Almac Diagnostic Services is a leading stratified medicine business, specialising in biomarker-driven clinical trials. We are incredibly proud to be involved…
Genomic analysis on Galaxy using Azure CycleCloud
Cloud computing and digital transformation have been powerful enablers for genomics. Genomics is expected to be an exabase-scale big data domain by 2025, posing data acquisition and storage challenges on par with other major generators of big data. Embracing digital transformation offers a practically limitless ability to meet the genomic…
python – Packages Not Found Error: Not available from current channel- Bioconda
Using a Mac with M1 chip, I’m trying to install the following Bioconda packages: cutadapt, trim-galore, samtools, bedtools, htseq, bowtie2, deeptools, macs2. I’ve been able to install picard and fastqc with no issues, but all the others produce one of two error messages: PackagesNotFoundError: The following packages are not available from current channels: or Found conflicts! Looking…
java – GATK: HaplotypeCaller IntelPairHmm only detecting 1 thread
I can’t seem to get GATK to recognise the number of available threads. I am running GATK (4.2.4.1) in a conda environment which is part of a nextflow (v20.10.0) pipeline I’m writing. For whatever reason, I cannot get GATK to see there is more than one thread. I’ve tried different…
Efficiently merge two BAM files while retaining reads from only one file in overlapping regions
I have a WGS BAM file that is fairly large (>150GB) and a smaller BAM file (<5GB) with reads in a small 10Mbp region. I want to (efficiently) merge the two BAM files while…
sequence alignment – MarkDuplicatesSpark failing with cryptic error message. MarkDuplicates succeeds
I have been trying to follow the GATK Best Practice Workflow for ‘Data pre-processing for variant discovery’ (gatk.broadinstitute.org/hc/en-us/articles/360035535912). This has all been run on Windows Subsystem for Linux 2 on the Bash shell. I started off with FASTQ files from IGSR (www.internationalgenome.org/data-portal) and performed alignment with Bowtie2 (instead of…
is BBMap/Qualimap affected by log4j vulnerability
No, unless the tools are used as a library in a web server. It’s worth noting that picard.jar and abra.jar are affected (even though, as Pierre L says, these are unlikely to be attacked on most systems). If you’re responsible for systems, esp web…
Padding out a GVCF file with 1000G exomes to get gatk VariantRecalibrator working with a small sample
I’ve got sequencing data for a small 500 bp amplicon from a few samples. GATK best principles suggest running VariantRecalibrator on the GVCF files I generate. I’m trying to get this working, but I get an error about “Found annotations with zero variances”. Reading the gatk manual and other posts…
[moiexpositoalonsolab/grenepipe] freebayes causes early error about number of threads
Hi Lucas, got a weird one for you. If I change the caller from haplotypecaller to freebayes, I get the error below. It’s doubly strange because it seems to occur well before freebayes would be used in the pipeline. [Sat Dec 11 11:13:02 2021] rule samtools_stats: input: dedup/111D03-1.bam output: qc/samtools-stats/111D03-1.txt…
Strange speed up in GATK LeftAlignIndels
Hi! I noticed a strange thing. I have been running a DNA-seq pipeline like this: reads -> bwa-mem2 -> picard SortSam -> picard MergeSamFiles -> picard MarkDuplicates -> gatk LeftAlignIndels … gatk LeftAlignIndels has always taken around 4 hours to complete with the…
Trouble running vcf2bam jvarkit tool
I am trying to use the tool called vcf2bam from jvarkit on a server and I have the following 2 files: GRCh38_latest_genomic.fna – the file is of format FASTQ, and 00-common_all.vcf. I used samtools faidx and also picard CreateSequenceDictionary, but when I try…
converting Bam to fastq while removing clipping(hard/soft clip bases)
Hello, I want to do some analysis and my raw data is paired-end FASTQ files. So far, I used BWA mem to align them into a SAM file, then used samtools to convert it to a BAM file. My next step is…
Liftover nonmodel VCF
Hi all, I have a FASTA genome assembly and a VCF for my (nonmodel) study species. Now I want to liftover the VCF to the Zebra Finch genome (www.ncbi.nlm.nih.gov/assembly/GCF_003957565.1). I’ve found Picard LiftOver (GATK) and CrossMap, but both require a UCSC chain file, which apparently can…
Troubleshooting Tips – bcl2fastq creates duplicate reads
Hi, I have seen a few times where bcl2fastq (v2.20) will produce duplicate FASTQ entries in sequencing read IDs, raw sequences, & quality scores. This causes issues with downstream tools like Picard MarkDuplicates (e.g. Exception in thread “main” htsjdk.samtools.SAMException: Value was put…
Error in merged bam files
Hello, I am trying to merge unmapped and mapped BAM files. I merged the BAM files using the Picard tool (gatk.broadinstitute.org/hc/en-us/articles/360036883871-MergeBamAlignment-Picard). I checked the merged BAM using the ValidateSamFile command (gatk.broadinstitute.org/hc/en-us/articles/360036854731-ValidateSamFile-Picard-) and it showed the errors below: Error Type Count ERROR:MATES_ARE_SAME_END 5496 ERROR:MISMATCH_FLAG_MATE_NEG_STRAND 5478 ERROR:MISMATCH_MATE_CIGAR_STRING…
High frequency of an otherwise rare phenotype in a small and isolated tiger population
Significance Small and isolated populations have low genetic variation due to founding bottlenecks and genetic drift. Few empirical studies demonstrate visible phenotypic change associated with drift using genetic data in endangered species. We used genomic analyses of a captive tiger pedigree to identify the genetic basis for a rare trait,…
Picard CalculateHsMetrics perTargetCoverage for Novaseq bams
Hello, I would like to use Picard’s CalculateHsMetrics to calculate per target coverage for Novaseq bam files. It seems that the tool is not able to calculate mean/normalized coverage for Novaseq bams but works well with Hiseq bams. Novaseq bams report quality scores…
allele balance gatk
Hi, I am trying to calculate allele balance for both heterozygous (0.40 to 0.60) and homozygous calls from a VCF file. Please let me know how to achieve this with GATK. I tried using the FilterVcf (Picard) command as follows: –I inputFile.vcf –MIN_AB -O outputfile. I would…
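For reference, allele balance at a biallelic site is commonly taken as the fraction of reads supporting the alternate allele, computed from the AD (allelic depths) FORMAT field that GATK writes as "ref,alt". The Python sketch below assumes that definition; `allele_balance` and `is_balanced_het` are hypothetical helpers, the 0.40 to 0.60 window comes from the question, and multi-allelic sites are deliberately not handled.

```python
def allele_balance(ad_field):
    """Return alt / (ref + alt) from an AD string such as "12,8".

    Returns None when the depths are missing or sum to zero.
    Assumes a biallelic site (exactly two comma-separated depths).
    """
    parts = ad_field.split(",")
    if len(parts) != 2 or "." in parts:
        return None
    ref_depth, alt_depth = int(parts[0]), int(parts[1])
    total = ref_depth + alt_depth
    if total == 0:
        return None
    return alt_depth / total

def is_balanced_het(ad_field, low=0.40, high=0.60):
    """True when the allele balance falls inside the [low, high] window."""
    ab = allele_balance(ad_field)
    return ab is not None and low <= ab <= high

print(allele_balance("12,8"))     # 8 alt reads out of 20
print(is_balanced_het("12,8"))
print(is_balanced_het("19,1"))
```

For homozygous alternate calls the same ratio is expected to be close to 1, so a separate threshold would be needed for those genotypes.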
Paired-end reads reported without mates: how to play matchmaker?
Hi Everyone, I am currently looking at Acute Myeloid Leukemia (AML) paired-end WGS samples from the TARGET data ocg.cancer.gov/programs/target/target-methods#3241. A bioinformatician in our group remapped the samples from hg19 to hg38. Unfortunately, we do not have any copies of the hg19 version anymore. However, when I try to run anything…
Fastqc user manual – vodosp.ru
FASTQ format – Wikipedia 06 September 2021 – by TC Collin · 2020 · Cited by 3 — Be accompanied by a step-by-step user-friendly manual, If the user performs FastQC prior to the removal of adapters (step 3), the length Both programs can be used on Linux/MacOS X machines and quite…
Twist Bioscience hiring Bioinformatics Scientist, Production Bioinformatics in South San Francisco, California, United States
Twist is looking for a Bioinformatics Scientist to join our Production Bioinformatics Team. You will work alongside research scientists, software engineers and data scientists to further deliver on our mission to expand access to best-in-class synthetic biology and next-generation sequencing applications. You will be developing and engineering tools to better…
Snakemake-Alignment using BWA-MEM2
Hello I have started using snakemake 6.5.2 to align fastq files with reference file. I have pasted the error below in this question. How to allocate memory in the snakefile and read the header from samfile, ‘-‘. This is the snakefile (wrapper for running alignment): rule bwa_mem2_mem: input: reads=[“/scicore/home/cichon/GROUP/test_workflow/samples/{sample}.1.fq”, “/scicore/home/cichon/GROUP/test_workflow/samples/{sample}.2.fq”]…
Mapping reads and quantifying genes
Hello, I am using the following metagenomic workshop tutorial to analyse my own metagenomic data: metagenomics-workshop.readthedocs.io/en/latest/annotation/quantification.html. I performed the following steps: mapped reads with bowtie2 and generated a .bam file with samtools sort; removed duplicates with picard; extracted gene information from prokka…
Can I sort my bam files with Picard MergeSamFiles?
Hi! I noticed this in the picard MergeSamFiles help: –SORT_ORDER,-SO:SortOrder Sort order of output file Default value: coordinate. Possible values: {unsorted, queryname, coordinate, duplicate, unknown} Does this mean that it is unnecessary to use picard SortSam beforehand? Can MergeSamFiles do…
Missing read group in BAM files
Hello everyone, I have processed PE reads through the pipeline HybPiper to align them to a reference genome with GATK. But inspecting the output BAM files with the GATK tool ValidateSamFile, I found a very common error in the error report: WARNING::RECORD_MISSING_READ_GROUP…
Looking for a tool which provides mapping quality score distributions from BAM files
Hello BioStars, Is there a tool which generates mapping quality score distributions from bam files? I know I could potentially do this myself, but I am looking for something which would essentially do the work for…
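As a rough idea of what doing this by hand involves: in SAM text, the mapping quality is the fifth tab-separated column (MAPQ), so a distribution is just a tally of that column. The Python sketch below assumes the BAM has already been converted to SAM text lines by some other tool; `mapq_distribution` and the example records are hypothetical.

```python
from collections import Counter

def mapq_distribution(sam_lines):
    """Count how many alignment records carry each MAPQ value."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):      # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        counts[int(fields[4])] += 1   # column 5 of a SAM record is MAPQ
    return counts

# Made-up records with only the leading mandatory columns.
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t50M",
    "r2\t0\tchr1\t200\t60\t50M",
    "r3\t16\tchr1\t300\t0\t50M",
]
dist = mapq_distribution(example)
for mapq in sorted(dist):
    print(mapq, dist[mapq])
```

The resulting counts can be passed to any plotting or summary routine to obtain the histogram.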
So many variants detected.
Dear All, I have done variant calling on germline data that has a single sample for each individual and two genes. I followed the steps below, but after checking the results I found too many variants. After HaplotypeCaller (step 6) I found 140900 known variants, and the…
CROP-seq data analysis
Hi, I am a newbie to single-cell sequencing analysis. I have to analyze CROP-seq data, and I am going through the following paper: www.nature.com/articles/nmeth.4177. I have to use Cell Ranger (instead of the DROP-seq software) as the first step to process the single-cell data. I wanted…