Tag: BAM

UMI workflow resulting in bams with empty reads

Hello all, in my NGS workflow for UMI-based reads, I first tried identifying and removing sequence adapters using bbmerge and cutadapt: BBMERGE -Xmx1g -ignorejunk in1=SAMPLE_R1 in2=SAMPLE_R2 outa= adapters.fa itn CUTADAPT -a forward_adapter -A reverse_adapter -o s_2_1_sequence_trimmed_UN.fastq.gz -p s_2_2_sequence_trimmed_UN.fastq.gz SAMPLE_R1 SAMPLE_R2 Then, I converted the trimmed fastq files to an…

A Benchmark of Genetic Variant Calling Pipelines Using Metagenomic Short-Read Sequencing

Introduction Short-read metagenomic sequencing is the technique most widely used to explore the natural habitat of millions of bacteria. In comparison with 16S rRNA sequencing, shotgun metagenomic sequencing (MGS) provides sequence information of the whole genomes, which can be used to identify different genes present in an individual bacterium and…

Bioconductor – genomation

DOI: 10.18129/B9.bioc.genomation     Summary, annotation and visualization of genomic data Bioconductor version: Release (3.6) A package for summary and annotation of genomic intervals. Users can visualize and quantify genomic intervals over pre-defined functional regions, such as promoters, exons, introns, etc. The genomic intervals represent regions with a defined chromosome…

Ubuntu Manpage: samtools-quickcheck – a rapid sanity check on input files

Provided by: samtools_1.19-1_amd64 NAME samtools-quickcheck – a rapid sanity check on input files SYNOPSIS samtools quickcheck [options] in.sam|in.bam|in.cram [ … ] DESCRIPTION Quickly check that input files appear to be intact. Checks that beginning of the file contains a valid header (all formats) containing at least one target sequence and…
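As a rough illustration of what such a sanity check involves for BAM, the sketch below tests two things: that the file starts with the gzip magic bytes and that it ends with the 28-byte empty BGZF block defined in the SAM/BAM specification. The function name `looks_intact_bam` is hypothetical, and this is not the samtools implementation (it does not parse the header or count target sequences); it is only a minimal approximation.

```python
import os

# 28-byte empty BGZF block that marks the end of an intact BAM file
# (byte values taken from the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def looks_intact_bam(path):
    """Return True if the file starts with gzip magic and ends with the BGZF EOF block.

    Hypothetical helper: a cheap truncation check only, not a full validation.
    """
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        magic = handle.read(2)
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        tail = handle.read()
    return magic == b"\x1f\x8b" and tail == BGZF_EOF
```

A file that fails this check was most likely truncated while being written or copied; a file that passes can still be malformed in other ways.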

Remote Software Quality Engineer III – Bioinformatics Job at Natera

JOB TITLE: Software Quality Engineer III – Bioinformatics LOCATION: Remote, USA PRIMARY RESPONSIBILITIES: Perform software verification, define and execute test cases and scenarios required for software quality assurance and regulatory compliance. Perform system analysis, assess risk, and develop strong test strategies by analyzing product design and technical specifications, and by…

Ubuntu Manpage: FastQC – high throughput sequence QC analysis tool

Provided by: fastqc_0.11.9+dfsg-5_all NAME FastQC – high throughput sequence QC analysis tool SYNOPSIS fastqc seqfile1 seqfile2 .. seqfileN fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam] [-c contaminant file] seqfile1 .. seqfileN DESCRIPTION FastQC reads a set of sequence files and produces from each one a quality control report consisting of…

Insight Global hiring Bioinformatics Software Engineer in Tennessee, United States

Job Description: 1) Use Nextflow to build bioinformatics pipelines that take FASTQ or BAM files as input and process them using bioinformatic tools. 2) Write Python/R scripts to process, summarize, and visualize outputs created by other tools. 3) Ensure that the pipeline is modular and flexible, with the ability to…

sam file error

Hi, I was converting my SAM file (after alignment with bowtie2) to BAM format and encountered this error: [E::sam_parse1] invalid QUAL character [W::sam_read1_sam] Parse error at line 11129453 Command: samtools view -S -b -o input.bam ../alignment/input.sam Alignment works fine. This is the output: 22504890 reads; of…
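One way to locate a record like this is to scan the SAM text for quality strings containing characters outside the printable range the SAM specification allows ('!' to '~'). The sketch below is an assumption about how one might investigate, not part of the original post; `first_bad_qual_line` is a hypothetical helper name. It also reports records with fewer than 11 fields, since a truncated line is a common cause of this kind of parse error.

```python
def first_bad_qual_line(lines):
    """Return (line_number, read_name) for the first suspicious record, else None.

    A record is suspicious if it has fewer than the 11 mandatory SAM fields
    or if its QUAL field contains a character outside '!'..'~'.
    """
    for number, line in enumerate(lines, start=1):
        if line.startswith("@"):  # header lines carry no QUAL field
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            return number, fields[0]
        qual = fields[10]
        if qual == "*":  # '*' means qualities were not stored
            continue
        if any(not ("!" <= ch <= "~") for ch in qual):
            return number, fields[0]
    return None
```

Typical use would be `first_bad_qual_line(open("input.sam"))`; the file name here is only the one quoted in the post.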

java -jar picard.jar manual | BioQueue Encyclopedia

Category Sam/Bam Manipulation Usage java -jar picard.jar SetNmMDAndUqTags I=sorted.bam O=fixed.bam \ Manual INPUT (File)    The BAM or SAM file to fix. Required. OUTPUT (File)    The fixed BAM or SAM output file. Required. IS_BISULFITE_SEQUENCE (Boolean)    Whether the file contains bisulfite sequence (used when calculating the NM tag). Default value: false. This option can be…

Trying to understand STAR fastqLog.final.out File

Hello, I am analyzing ribo-seq data and am trying to understand if my interpretation of STAR’s log file is correct. I do not have extensive bioinformatics/computational experience, so it’s been a bit difficult trying to understand how to proceed (the guides online are…

My paired end data became single end data after mapping

Dear community, something weird happened to me: my public dataset is obviously paired-end data (stated in the ‘metadata’ part of the ENA database, and there are two separate fastq files (R1 & R2) and an index file (I1) per sequencing run). After…

‘Resources’ object has no attribute ‘tmpdir’

Snakemake error: AttributeError: ‘Resources’ object has no attribute ‘tmpdir’. I have built a Snakemake pipeline designed for paired-end reads. I ran a trial with single-end reads and got this error. I am not sure it is related to the change of read design, and to…

Error in schicexplorer’s hicbuildmatrix

I use schicexplorer’s hicbuildmatrix; it complains that the two sam files do not have the same read order. My best guess at the moment is that you need R1 and R2 bam files sorted by read name, not the default…

Genomic hypomethylation in cell-free DNA predicts responses to checkpoint blockade in lung and breast cancer

Lung cancer ICB cohort Advanced non-small cell lung carcinoma patients who were treated with anti-PD-1/PD-L1 monotherapy at Samsung Medical Center, Seoul, Republic of Korea were enrolled for this study. The present study has been reviewed and approved by the Institutional Review Board (IRB) of the Samsung Medical Center (IRB no….

bwa-mem reproducibility

I have a set of paired-end fastq files, and I ran bwa-mem (v0.7.17-r1188) on them with exactly the same parameters, including the same number of threads, on two different computing clusters. I compared the BAM files produced via samtools stats, and the outputs are different…

Enrichment profiles from counts

Hello everyone, I’m currently working with single-cell DamID datasets, focusing on studying protein-DNA interactions. In my dataset for each cell, I have aligned BAM files, counts files in HDF5 format, and count files binned at 100kb intervals. These count files contain the number of unique…

DE Jobs – UPMC Bioinformatics Scientist in Pittsburgh, Pennsylvania, United States

UPMC Presbyterian is hiring a full-time Bioinformatics Scientist to support the Molecular & Genomic Pathology Lab! This role will be scheduled for daylight shifts, Monday-Friday. The Molecular & Genomic Pathology Laboratory is a dynamic, state-of-the-art clinical laboratory that prides itself on delivering the highest quality of patient care through cutting-edge…

FeatureCounts Invalid Parameter Error

Hello! I’m trying to use featureCounts, and it keeps on giving me this error: ERROR: invalid parameter: ‘SRR11860547.bam’ I’m pretty new at using featureCounts, so I have no clue what is wrong. I’ve tried changing the directory and location of the file, but it keeps…

Variant calling using HaplotypeCaller does not show #FILTER information

Hi all, I would like to ask about variant calling using HaplotypeCaller. After running HaplotypeCaller, the #FILTER column in the gVCF files is supposed to show ‘PASS/LowQ’; however, in my case, the output #FILTER only shows ‘.’ without…

Conserved and divergent gene regulatory programs of the mammalian neocortex

Nucleus preparation from frozen brain tissue for Chromium single-cell multiome ATAC and gene expression analysis M1 tissue was obtained from three human donors (male, aged 42, 29 and 58 years), three macaque donors (male, aged 6 (Macaca mulatta), 6 (M. mulatta) and 14 (Macaca fascicularis) years), three marmoset (Callithrix jacchus)…

haplotypecaller – NVIDIA Docs

Run a GPU-accelerated haplotypecaller. This tool applies an accelerated GATK CollectMultipleMetrics for assessing the metrics of a BAM file, such as including alignment success, quality score distributions, GC bias, and sequencing artifacts. This functions as a ‘meta-metrics’ tool, and can run any combination of the available metrics tools in GATK…

convert VCF to gVCF

Your question is not completely clear, but since the most sensible ways to understand it have the same answer, I’m gonna go with that. I have the exact reference fasta used for generating the VCFs TLDR: You don’t have enough information to do this with just VCFs and reference fasta….

Single-cell DNA methylome and 3D multi-omic atlas of the adult mouse brain

Mouse brain tissues All experimental procedures using live animals were approved by the Salk Institute Animal Care and Use Committee under protocol number 18-00006. Adult (P56) C57BL/6J male mice were purchased from the Jackson Laboratory at 7 weeks of age and maintained in the Salk animal barrier facility on 12-h dark–light…

deeptools.plotCoverage Example

API (Occurrences) deeptools.writeBedGraph_bam_and_bw.writeBedGraph(1) deeptools.writeBedGraph.openBam(1) deeptools.writeBedGraph.bedGraphToBigWig(1) deeptools.writeBedGraph.WriteBedGraph(4) deeptools.utilities.toBytes(2) deeptools.utilities.tbitToBamChrName(2) deeptools.utilities.mungeChromosome(2) deeptools.utilities.gtfOptions(1) deeptools.utilities.getTempFileName(8) deeptools.utilities.getGC_content(3) deeptools.utilities.getCommonChrNames(4) deeptools.utilities.bam_total_reads(2) deeptools.plotProfile.main(5) deeptools.plotHeatmap.main(6) deeptools.plotCoverage.main(1) deeptools.parserCommon.heatmapperMatrixArgs(2) deeptools.parserCommon.getParentArgParse(10) deeptools.parserCommon.deepBlueOptionalArgs(1) deeptools.parserCommon.check_float_0_1(1) deeptools.multiBigwigSummary.main(4) deeptools.multiBamSummary.main(2) deeptools.mapReduce.mapReduce(8) deeptools.mapReduce.getUserRegion(3) deeptools.mapReduce.blSubtract(1) deeptools.heatmapper_utilities.plot_single(2) deeptools.heatmapper_utilities.getProfileTicks(2) deeptools.heatmapper.heatmapper(6) deeptools.getScorePerBigWigBin.getScorePerBin(1) deeptools.getScaleFactor.get_scale_factor(1) deeptools.getScaleFactor.get_num_kept_reads(2) deeptools.getFragmentAndReadSize.get_read_and_fragment_length(7) deeptools.deepBlue.isDeepBlue(3) deeptools.countReadsPerBin.CountReadsPerBin(6) deeptools.correlation.Correlation(2) deeptools.computeMatrixOperations.sortMatrix(1) deeptools.computeMatrixOperations.main(5) deeptools.computeMatrix.main(10) deeptools.bigwigCompare.main(2) deeptools.bamHandler.openBam(22) deeptools.bamCoverage.process_args(1) deeptools.bamCoverage.main(9) deeptools.bamCompare.main(7) deeptools.SES_scaleFactor.estimateScaleFactor(2) deeptools.countReadsPerBin.cr.is_proper_pair(1) deeptools.config.config.get(2) deeptools.cfg.config.get(3)

Merge overlapping paired end reads from BAM file.

Hi everyone, Using Trimmomatic and then HISAT2, I have aligned 300 RNA fastq samples (NovaSeq6000, RNA sequencing, paired-end, 150bp sequencing). I have found a percentage of overlapping paired end reads (read through) in the 300 .bam files. I found the overlaps…

Panel-based RNA fusion sequencing improves diagnostics of pediatric acute myeloid leukemia

Rasche M, Zimmermann M, Borschel L, Bourquin J, Dworzak M, Klingebiel T, et al. Successes and challenges in the treatment of pediatric acute myeloid leukemia: a retrospective analysis of the AML-BFM trials from 1987 to 2012. Leukemia. 2018;32:2167–77. Article  PubMed  PubMed Central  Google Scholar  Manola KN. Cytogenetics of pediatric acute…

Variant missing in WGS sample

Hi, I have processed a WGS sample including alignment (bwa-mem2), variant calling (GATK HaplotypeCaller) and annotation (ANNOVAR). In the annotated file, a variant fitting the phenotype was identified. However, on visualizing the bam in IGV, this variant was not there. What could be the…

overlapping duplicate dispersed_repeat feature in stringtie

Hi. I got the following error when I used stringtie with a repeatmasker annotation gff file and RNA-seq bam files that were already sorted with samtools: GFF Error: overlapping duplicate dispersed_repeat feature (ID=461) GFF Error: overlapping duplicate dispersed_repeat feature (ID=712) GFF Error: overlapping…

Thyroid hormone-regulated chromatin landscape and transcriptional sensitivity of the pituitary gland

Mouse genetic models The ThrbHAB allele expresses TRβ proteins (TRβ1 and TRβ2) fused to a peptide with a hemagglutinin (HAx2) tag and a site for biotinylation by prokaryotic BirA ligase, modified from a published tag30. The tag was inserted at the endogenous Thrb gene by homologous recombination in W9.5 (129/Sv)…

Snakemake rule error

I have the following rule in snakemake: rule low_coverage_contig_reads: input: bam="data/processed/bam_files/bam/{sample}_{fraction}.bam.bai", output: r1="data/processed/clean_reads/low_cov/low_cov_{sample}_{fraction}_R1.fq.gz", r2="data/processed/clean_reads/low_cov/low_cov_{sample}_{fraction}_R2.fq.gz" threads: 8 params: bam="data/processed/bam_files/bam/{sample}_{fraction}.bam" log: log1="logs/{sample}_{fraction}_low_coverage_reads.log", shell: """ (samtools coverage {params.bam} | awk 'NR > 1 && $7 < 10 {{print $1}}' | tr '\\n' ' ' | samtools view -u {params.bam}…

Help with gatk BaseRecalibrator

Hi Biostars, I am trying to do variant calling and got an error at this step. Would you please have a suggestion? Thank you so much. gatk BaseRecalibrator -I ${aligned_reads}/SRR062634_sorted_dedup_reads.bam -R ${ref} --known-sites ${known_sites} -O ${data}/recal_data.table Invalid argument ‘/recal_data.table…

The Biostar Herald for Monday, December 11, 2023

The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Istvan Albert, cmdcolin, and was edited by…

Calculate Jukes-Cantor, Kimura, Tamura-Nei etc distances from BAM files

Hi, Does anyone know if there’s an existing tool to calculate genetic distances between query and subject sequences in BAMs/SAMs? I’m reasonably sure it’s possible to identify transitions and transversions using data from the MD flag and the sequence, and…
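For reference, once the proportions of differing sites have been counted, the two simplest distances named in the title follow directly from their standard textbook formulas. The sketch below assumes the caller has already derived those proportions (for example from the alignment); the function names are illustrative, and Tamura-Nei is left out because it additionally needs base frequencies.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance from the proportion p of differing sites.

    d = -(3/4) * ln(1 - 4p/3), defined only for 0 <= p < 0.75.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(transitions, transversions):
    """Kimura two-parameter distance.

    transitions (P) and transversions (Q) are proportions of sites.
    d = -(1/2) * ln((1 - 2P - Q) * sqrt(1 - 2Q))
    """
    a = 1 - 2 * transitions - transversions
    b = 1 - 2 * transversions
    if a <= 0 or b <= 0:
        raise ValueError("proportions too large for the Kimura model")
    return -0.5 * math.log(a * math.sqrt(b))
```

Both functions return 0 for identical sequences and a value slightly larger than the raw mismatch proportion otherwise, reflecting the correction for multiple substitutions at the same site.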

PacBio subreads.fastq files?

I have downloaded PacBio isoseq data in subreads.fastq format from NCBI. Most of the isoseq analysis tools require a PacBio .bam file as input, which is unavailable from NCBI. I want to perform differential gene expression analysis and alternative splicing analysis. I am confused regarding the nature…

Insert Size For Illumina Gaiix Paired-End Library From Sam/Bam File

From the fastq data (read 1 and read 2) from the Illumina GAIIx platform (paired-end library), I created the SAM and BAM files using BWA. I got the statistics of the number of uniquely-paired reads and total reads mapped to…

r – Fst calculation from VCF files

I have four vcf files, SNPs_s1.vcf, SNPs_s2.vcf, SNPs_s3.vcf, and SNPs_s4.vcf, which contain information about SNPs. These vcf files were obtained as follows: the initial input files were short paired reads; I did mapping with minimap2: ./minimap2 -ax sr ref.fa read1.fq.gz read2.fq.gz > aln.sam; converted to a bam file: samtools…

How to create interval list from reference fasta or dict file?

I am using the GATK pipeline on WGS data. My BAM files are aligned to GRCh38 from GENCODE. So I want to create an interval file for this GRCh38 instead of downloading one from the GATK bundle, because some of their contigs have…

Generating high-quality plant and fish reference genomes from field-collected specimens by optimizing preservation

Sample collection A total of nine species of marine fish were collected across three different sampling days (September 7th, 9th, and 12th 2022) under IACUC Animal Use Protocol S12219 (Supplementary Data 1). Six species were collected using a speargun donated by a local fisher. Fish were transported back to shore, euthanized,…

Transcript Assembly for Multiple Species Using StringTie and Orthogroup Discovery using OrthoFinder

Hi all, I am running a workflow to identify single copy orthogroups from RNAseq data including 9 species in a family of non-model organisms. All 9 species are closely related enough that they can be aligned to…

CIGAR and query sequence lengths differ

I am developing a program that softclips reads. When I run samtools view the_new_bam_created_with_my_softclipped_read.bam I get this error message MN01972:51:000H5KYKL:1:11101:10749:1220 0 chr2 208248363 60 24S77M25S * 0 0 CAAAATCACATTATTGCCAACATGACTTACTTGATCCCCATAAGCATGACGACCTATGATGATAGGTTTTACCCATCCACTCACAAGCCGGGGGATATTTTTGCAGATAATGGCTTCTCTGAAGAC AFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF[E::bam_read1] CIGAR and query sequence lengths differ for MN01972:51:000H5KYKL:1:11101:10753:13456 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFF MD:Z:126 RG:Z:230824_MN01972_0051_A000H5KYKL_RACP1_4poolv7anddirect_A NM:i:0 UQ:i:0 AS:i:126 MN01972:51:000H5KYKL:1:11101:10753:13456 0 chr2 208248339 60 111M2D13M…
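The check behind this error can be reproduced by summing the CIGAR operations that consume query bases (M, I, S, = and X in the SAM specification) and comparing the total with the length of SEQ. The sketch below is a minimal illustration with a hypothetical helper name; for the first record above, 24S77M25S gives 24 + 77 + 25 = 126, so SEQ and QUAL must each be 126 characters long after soft-clipping.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume bases of the read (query) per the SAM specification.
QUERY_CONSUMING = set("MIS=X")


def cigar_query_length(cigar):
    """Return the number of read bases a CIGAR string accounts for."""
    ops = CIGAR_OP.findall(cigar)
    # Reject strings with characters the pattern did not cover.
    if not ops or "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(n) for n, op in ops if op in QUERY_CONSUMING)


def cigar_matches_sequence(cigar, seq):
    """True if the CIGAR-implied query length equals len(seq)."""
    return cigar_query_length(cigar) == len(seq)
```

When rewriting a CIGAR to add soft clips, the clipped bases stay in SEQ and QUAL, so the total must not change; hard clips (H) are the case where the bases are removed from SEQ.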

Comparison of DNA sequencing services

This page lists the different DNA sequencing services. 2 main types can be distinguished: Whole exome sequencing is the middle ground between these two types, where a large amount of genes are sequenced, but only those that produce meaningful differences important for practical purposes, which is only 1% of the…

sam – Discrepancy in Read Counts Between FastQ and BAM Files in Adapter-Trimmed Pipeline

In a FastQ to BAM pipeline where only adapter trimming is performed, I’ve noticed a potential discrepancy in read counts between the initial FastQ files and their resulting BAM file. Specifically, I’m seeking clarification on whether the following statement holds true: “Total number of reads in R1 and R2 FastQ…

Read count vs Depth

Hi! I have RNA-seq short-read sequencing data for 112 dengue samples. I need to know what percentage of the transcriptome is covered by our sequencing reads. I found bedtools to be an appropriate tool for this; however, I am unable to understand two different outputs from this tool…

Very low successfully assigned alignments with feature counts

Hello everyone, I am stuck trying to analyze some single-end RNAseq data from human tissue. My issue is that the alignment with HISAT2 went very well: 94.95% overall alignment rate. However, when I use featureCounts, I get: 5.7% when I set the strandSpecific parameter to 1. 5.3% when I…

ASEReadCounter output wrong number of coverage

Hi, I am using ASEReadCounter to count the number of reads per variant in a BAM file. For some positions, it reports 1 read covered (1 refCount or 1 altCount), while there is no read covering those positions after checking it in…

megablast taxonomy assign in blobtools

I made a taxonomy assignment file using megablast and ran blobtools create, view, plot. However, I couldn’t get any taxonomy assignment in the plot; there is only undefined. How can I get bacterial information? $blastn -task megablast -db ${nrdb} -query scaffold$i.fa -outfmt ‘6 qseqid…

How To Install bedtools on Debian 11

In this tutorial we learn how to install bedtools on Debian 11. bedtools is a suite of utilities for comparing genomic features. What is bedtools? The BEDTools utilities allow one to address common genomics tasks such…

Sorted bam files are empty after sorting them from bam

Hi, I have been working with all my DNA analysis files in parallel but I got to a point where I had about 15 files get stuck on one step. Specifically, I notice something is wrong because the files…

Downsampling ATAC-seq BAM files

Hi all, I have two ATAC datasets sequenced on two sequencers with different read depths. I would like to downsample the one with higher read depth to match that of the other dataset as we are observing batch effects following integration. To my understanding, I…
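One common way to do this kind of depth matching is random subsampling with samtools view -s, whose argument packs a random seed into the integer part and the fraction of reads to keep into the decimal part. The post is cut off before the author states their own approach, so the sketch below is only an assumption about one possible route; `subsample_arg` is a hypothetical helper and the read counts are expected to come from something like samtools view -c.

```python
def subsample_arg(reads_deep, reads_shallow, seed=42):
    """Build the SEED.FRACTION string used by `samtools view -s`.

    reads_deep: read count of the dataset to be downsampled.
    reads_shallow: read count of the dataset whose depth should be matched.
    """
    if reads_deep <= 0 or reads_shallow <= 0:
        raise ValueError("read counts must be positive")
    fraction = round(reads_shallow / reads_deep, 4)
    if not 0 < fraction < 1:
        raise ValueError("nothing to downsample: fraction is not below 1")
    # Integer part is the seed, the four decimals are the kept fraction.
    return f"{seed}.{int(round(fraction * 10000)):04d}"
```

For example, matching a 40-million-read file to a 10-million-read file with the default seed gives "42.2500", which would be passed as the value of -s. Keeping the seed fixed makes the subsample reproducible.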

Fetching subsets with slow5curl and samtools

Documentation page docs/slow5curl.md from the GenTechGp/gtgseq repository on GitHub. Sections listed in its table of contents: Installing necessary tools; Example: Fetching a subset of reads; Example: Fetching and basecalling a subset of…

The role of APOBEC3B in lung tumor evolution and targeted cancer therapy resistance

Cell line and growth assays Cell lines were grown in Roswell Park Memorial Institute-1640 medium (RPMI-1640) with 1% penicillin–streptomycin (10,000 U ml−1) and 10% FBS or in Iscove’s modified Dulbecco’s medium (IMDM) with 1% penicillin–streptomycin (10,000 U ml−1), l-glutamine (200 mM) and 10% FBS in a humidified incubator with 5% CO2 maintained at 37 °C. Drugs…

Annotation GTF/GFF Arabidopsis thaliana

Hello, this is my first time working with Arabidopsis and I am quantifying with featureCounts as follows: featureCounts -p --countReadPairs -t exon -g gene_id -a ../genome_arabidopsis/Arabidopsis_thaliana.TAIR10.57.gtf -o SRR14059988.txt ../alignment_hisat2/SRR14059988_sorted.bam However, in my counts I am getting counts associated with long non-coding, ribosomal, mitochondrial and…

Filling gaps in BAM file

Hi! I have BAM files that contain fairly large numbers of gaps due to the fact they are aDNA data. The BAM files have EOFs and look like this (only a snippet shown below): 11:57001065-57004724 SN7001204_0523_AHJLV3BCXX_R_PEdi_L5727_37_1:1:1106:9922:91821c 16 11 57000970 37 113M * 0 0…

How To Separate Illumina Based Strand Specific Rna-Seq Alignments By Strand

Today we have run into the task of having to split strand specific RNA-Seq data by strand and we had to make an effort to get it right (hopefully it is right). Maybe there is even an easier way to do it. The Illumina strand specific protocol is such that…

sequencing data from different samples in the Integrative Genome Viewer (IGV)

Greetings, I need to carry out an activity for my master’s degree in Biostatistics and Bioinformatics that consists of viewing sequencing data from different samples in the Integrative Genome Viewer (IGV) in order to analyze alignments and variants…

Bam files generated with STAR cause a segmentation fault core dump error when used with another tool

I am mapping RNA-Seq data using STAR, using multi-sample two-pass mapping. I first mapped all samples with one-pass then concatenated their SJOut files and filtered junctions. I launched the second mapping by using this SJOut file. I used this command to generate the genome: ` /home/STAR-2.7.10b/bin/Linux_x86_64/STAR \ --runThreadN 10 \…

Are 10x cellranger-arc ATAC bam files deduplicated?

I am working with some atac files from cellranger-arc v.2.0. I was wondering whether the atac_possorted_bam.bam produced as the output was deduplicated? I believe the fragment files that are generated detect duplicate reads (as represented by reads with the fifth column >=…

Common analysis of direct RNA sequencinG CUrrently leads to misidentification of m5C at GCU motifs

Introduction Oxford Nanopore Technologies (ONT) direct RNA sequencing (Fig 1A) enables detection of RNA modifications. A modified base produces an altered electrical current and/or dwell time relative to a canonical base that can be detected with algorithms (Garalde et al, 2018; Smith et al, 2019; Workman et al, 2019). Figure…

MSL2 ensures biallelic gene expression in mammals

Materials Animals All of the mice were kept in the animal facility of the Max Planck Institute of Immunobiology and Epigenetics. The mice were maintained under specific-pathogen-free conditions, with 2 to 5 mice housed in individually ventilated cages (Techniplast). The cages were equipped with bedding material, nesting material, a paper…

Issue softclipping reads when they belong and don’t belong to a common amplicon

I need to soft-clip the primers of my amplicon reads and I have the following problem. In scenario 1, I have forward and reverse reads in the same coordinates and I only want to soft-clip the first bases of each read (as shown in the example below) because the reads…

Continue Reading Issue softclipping reads when they belong and don’t belong to a common amplicon

Extracting only soft/hard clipped reads from a bam file

Hello all! I am working on some data but need a little bit of help with a bit of an unusual task. We are looking at where lentiviral DNA has inserted itself in our host genome, and to do this…

Continue Reading Extracting only soft/hard clipped reads from a bam file

BBtools bug in reporting the number of substitutions in the console output, it seems to report insanely high rates of heterozygosity

Hello, I know Brian is sometimes around, but here is my command: while read p; do callvariants.sh in=${p}.recal.bam ploidy=2 vcf=${p}.20score.vcf useidentity=f overwrite=true ref=ref.fsa -Xmx50g ; done <ID java -ea -Xmx50g -Xms50g -cp /home/alessandro/software/bbmap/current/ var2.CallVariants in=ancestor.recal.bam ploidy=2 vcf=ancestor.20score.vcf useidentity=f overwrite=true ref=ref.fsa -Xmx50g Executing var2.CallVariants [in=ancestor.recal.bam, ploidy=2, vcf=ancestor.20score.vcf, useidentity=f, overwrite=true, ref=Adineta_vaga.fsa,…

Continue Reading BBtools bug in reporting the number of substitutions in the console output, it seems to report insanely high rates of heterozygosity

Where is the index command?

I unpacked Samtools in Ubuntu using apt install make. The directory is listed below and includes folders and files. There is no index function. I have a bam file and am trying to make a bai file, but I am not sure what to do next.…

Continue Reading Where is the index command?

filtering SAM/BAM to remove hits spanning short combined alignment lengths and low counts

Hi folks, apologies if this has been answered elsewhere. I’m using read mapping to quantitate the abundance of viral metagenome assembled genomes (MAGs) across samples and I’d like to do a bit of data cleaning that’s…

Continue Reading filtering SAM/BAM to remove hits spanning short combined alignment lengths and low counts

Extracting chimeric reads from mapping

Hello, I am struggling to process and analyse bam files (from a bwa alignment) in order to extract the chimeric read alignments. I am aligning human cell line RNA-seq data (paired end) to a virus, aiming to find the viral integration sites in the genome. For that, after reading a bit here from following…

Continue Reading Extracting chimeric reads from mapping

Longitudinal detection of circulating tumor DNA

Analysis of Roche KAPA Target Enrichment kit experimental data obtained on an Illumina sequencing system is most frequently performed using a variety of publicly available, open-source analysis tools. The typical variant calling analysis workflow consists of sequencing read quality assessment, read filtering, mapping against the reference genome, duplicate removal, coverage…

Continue Reading Longitudinal detection of circulating tumor DNA

ESRP1 controls biogenesis and function of a large abundant multiexon circRNA | Nucleic Acids Research

Abstract While the majority of circRNAs are formed from infrequent back-splicing of exons from protein coding genes, some can be produced at quite high level and in a regulated manner. We describe the regulation, biogenesis and function of circDOCK1(2–27), a large, abundant circular RNA that is highly regulated during epithelial-mesenchymal…

Continue Reading ESRP1 controls biogenesis and function of a large abundant multiexon circRNA | Nucleic Acids Research

Viral genes not showing up in combined mouse+virus alignment

I created a combined MHV-A59 and mm10 fasta and GTF file using the linux cat command. The last two entries of the mm10 and the first two of the A59 in the combined GTF look like this: I then made a…

Continue Reading Viral genes not showing up in combined mouse+virus alignment

How to get unaligned reads and aligned reads into separate files from SAM/BAM?

I have long reads aligned with MiniMap2 in the form of a SAM file. I want to get my unmapped reads into a file called unmapped.fastq.gz and my aligned reads into a file called mapped.fastq.gz. How can…

Continue Reading How to get unaligned reads and aligned reads into separate files from SAM/BAM?

Senior Bioinformatics Software Engineer – Land A Remote Job From Top Employers

The Center for Applied Bioinformatics (CAB) at the St. Jude Children’s Research Hospital (SJCRH) is seeking a creative Software Engineer with a strong background in bioinformatics to join our development team to create and maintain our vital analytical infrastructure. The new hire will work closely with a team of computer…

Continue Reading Senior Bioinformatics Software Engineer – Land A Remote Job From Top Employers

Generate Read counts from bam file

Currently I am working on a project related to LHON disease (a rare mitochondrial disorder which leads to progressive visual loss). I have 9 RNA-seq fastq files, of which 3 are for carriers, 3 for affected and 3 for control. Data downloaded is…

Continue Reading Generate Read counts from bam file

bam or VCF files from GSE75010

Hi all, I’m planning to run a variant calling analysis using Microarray data GSE75010 that contains GSE75010_RAW.tar and GSE75010_complete_dataset.csv.gz. I used to download the .fastq files using SRA Run numbers through Ubuntu/Linux to get .bam and VCF files. However, this is not the…

Continue Reading bam or VCF files from GSE75010

low rate of ‘Successfully assigned alignments’

Hello everybody. I’m a newbie in RNA-seq Analysis, and I have this situation that I don’t really understand. While working with featureCounts for RNA-seq read quantification, I came across an intriguing issue. The rate of successfully assigned alignments turned out to be unexpectedly low, totalling just 15463270 (7.6%). This was…

Continue Reading low rate of ‘Successfully assigned alignments’

I made an error when using metawrap to binning

My code: metawrap binning -o bin_out -t 24 -m 200 -a all_contig/all_merge.fasta --metabat2 --maxbin2 --concoct all_fastq/*fastq The error reported is as follows: sorting the SRR10492802 alignment file [bam_sort_core] merging from 24 files and 24 in-memory blocks… [E::sam_hdr_sanitise] Malformed SAM header at line…

Continue Reading I made an error when using metawrap to binning

Visualize and explore eventalign data against reference

Hi all, Is anyone aware of a tool (GUI or python/R/bash package) to explore eventalign nanopore data (or fast5 raw data) with the corresponding alignment to a reference genome? Kind of like viewing how reads in a bam file align against a…

Continue Reading Visualize and explore eventalign data against reference

Running STAR on fastq file generated from a RNA-seq experiment

Hi, I am new to bioinformatics, especially on the command line. I am trying to run STAR alignment on pairs of fastq.gz files from several samples generated as part of an RNAseq experiment. My goal is to perform splice…

Continue Reading Running STAR on fastq file generated from a RNA-seq experiment

H101 for cervical cancer | DDDT

Introduction Patients with persistent, recurrent, or metastatic (P/R/M) cervical carcinoma respond poorly to treatment despite the best available therapeutic regimens, with a 5-year survival of 17%.1 Most of them are heavily pretreated with chemotherapy and/or radiotherapy, and many patients experience complications related to treatment or advanced disease, which exclude them…

Continue Reading H101 for cervical cancer | DDDT

HTseq reports missing attribute name

Hello, I am running this htseq command: htseq-count -r pos -t gene -i gene -s yes -f bam \ /Volumes/cachannel/ZebraFinchBrain/CB-4a_genomemapping/sorted_alignmentcb4a.bam \ /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/Taeniopygia_guttata.bTaeGut1_v1.p.110.chr.gff3 > \ /Volumes/cachannel/ZebraFinchBrain/HTSEQ_withautomate/output_counts.txt However, I get this error: Error processing GFF file (line 75 of file /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/Taeniopygia_guttata.bTaeGut1_v1.p.110.chr.gff3): Feature gene:ENSTGUG00000013637 does not contain…

Continue Reading HTseq reports missing attribute name

Best practices for unstranded sequences in featureCounts

Hi everyone, I’m using featureCounts to analyze some RNA-Seq data, but I have several doubts about its use with unstranded libraries. First, when I analyze some SRA sequences or when I don’t know the library type, I use Salmon to determine it with the following command: salmon quant -p 32…

Continue Reading Best practices for unstranded sequences in featureCounts

fragments.tsv.gz file in ATAC seq

Hi all, I looked at some ATAC-seq tutorials. They use fragments.tsv.gz at the beginning of the analysis. For my ATAC-seq data, I have fastq, bam and bw files but no fragments file. So the fragments files will be created from fastq files,…

Continue Reading fragments.tsv.gz file in ATAC seq

Python Tools for Genomic Data Analysis: From Sequences to Structures | by Bao Tram Duong | Nov, 2023

Analyzing genomic data, from sequences to structures, is a critical aspect of bioinformatics. Python has a rich ecosystem of tools and libraries specifically designed for genomic data analysis. Here’s an overview of key tools and libraries for various stages of genomic data analysis: Description: Biopython is a comprehensive open-source collection…

Continue Reading Python Tools for Genomic Data Analysis: From Sequences to Structures | by Bao Tram Duong | Nov, 2023

subset a bam file

I possess numerous sorted BAM files; however, for my project, I am required to randomly select a subset of reads (1e5) from them. I have explored the option of converting a pysam object to a list, but encountered issues with substantial memory usage and slow…
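The memory problem described above comes from materialising every read at once. One alternative, sketched here as an illustration and not as the answer given in the thread, is reservoir sampling, which keeps only k records in memory while streaming through the input once. The function name `reservoir_sample` and its parameters are hypothetical; the input can be any iterable of records, which in practice might be an open `pysam.AlignmentFile`, although that wiring is an assumption and is not shown.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k records chosen uniformly at random from a stream.

    Only k records are held in memory at any time (Algorithm R).
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, rec in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(rec)
        else:
            # Pick a slot in [0, i]; the new record survives with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = rec
    return reservoir


if __name__ == "__main__":
    # Stand-in for a stream of reads: integers instead of alignment records.
    picked = reservoir_sample(range(1_000_000), k=5, seed=0)
    print(picked)
```

Note that the sampled records come back in reservoir order, not in input order, and that each record is sampled independently, so mates of paired-end reads are not kept together by this sketch.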

Continue Reading subset a bam file

issue in RNA -seq analysis

Hello all. I am working on RNA-seq analysis and would like to know the following things. First, I downloaded the genome fasta file for non-coding RNA from Ensembl and got the GTF file for hg38 from there as well. I ran HISAT2 and got 17% alignment for…

Continue Reading issue in RNA -seq analysis

Functional filter for whole-genome sequencing data identifies HHT and stress-associated non-coding SMAD4 polyadenylation site variants >5 kb from coding DNA

Summary Despite whole-genome sequencing (WGS), many cases of single-gene disorders remain unsolved, impeding diagnosis and preventative care for people whose disease-causing variants escape detection. Since early WGS data analytic steps prioritize protein-coding sequences, to simultaneously prioritize variants in non-coding regions rich in transcribed and critical regulatory sequences, we developed GROFFFY,…

Continue Reading Functional filter for whole-genome sequencing data identifies HHT and stress-associated non-coding SMAD4 polyadenylation site variants >5 kb from coding DNA

Chromatin priming elements direct tissue-specific gene activity before hematopoietic specification

Introduction The development of multicellular organisms requires the activation of different gene batteries which specify the identity of each individual cell type. Such shifts in cellular identity are driven by shifts in the gene regulatory network (GRN) consisting of transcription factors (TFs) binding to the enhancers and promoters of their…

Continue Reading Chromatin priming elements direct tissue-specific gene activity before hematopoietic specification

Use of IDR after running MACS3 for ATAC-seq data

Cross-posted from github (a little worried about the inactivity there; please let me know if this is not good practice). I am analyzing ATAC-seq data from human cells. I am planning to perform the following as part of my pipeline:…

Continue Reading Use of IDR after running MACS3 for ATAC-seq data

TF Footprinting using HINT ATAC module from RGT

I was successful in running the first three commands of HINT ATAC as shown below following this tutorial github.com/sufyazi/sufyazi.github.io/wiki/TF-Footprinting-Tutorial-using-HINT-ATAC-module-from-RGT-toolbox rgt-hint footprinting --atac-seq --paired-end --organism=mm10 --output-location=/XXX --output-prefix=Afootprints A.mRp.clN.sorted.bam A.mRp.clN_peaks.narrowPeak rgt-hint footprinting --atac-seq --paired-end --organism=mm10 --output-location=/XXX --output-prefix=Afootprints B.mRp.clN.sorted.bam B.mRp.clN_peaks.narrowPeak rgt-motifanalysis matching --organism=mm10 --input-files Afootprints.bed Bfootprints.bed But when I ran the command below…

Continue Reading TF Footprinting using HINT ATAC module from RGT

Phenotypic drug-susceptibility profiles and genetic analysis based on whole-genome sequencing of Mycobacterium avium complex isolates in Thailand

Abstract Mycobacterium avium complex (MAC) infections are a significant clinical challenge. Determining drug-susceptibility profiles and the genetic basis of drug resistance is crucial for guiding effective treatment strategies. This study aimed to determine the drug-susceptibility profiles of MAC clinical isolates and to investigate the genetic basis conferring drug resistance using…

Continue Reading Phenotypic drug-susceptibility profiles and genetic analysis based on whole-genome sequencing of Mycobacterium avium complex isolates in Thailand

Find all locations of aligned reads with MQ (Mapping Quality) = 0.

I have white reads in a BAM file with MQ = 0. I know that this means the reads align to multiple locations. I know the location of one alignment, but can I find the other location(s) in IGV? If…
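As a rough illustration of what "MQ = 0" refers to at the file level, the sketch below scans SAM-formatted text lines and reports the reference name and position of every mapped record whose MAPQ field (the fifth column) is 0. The function name `mapq_zero_positions` and the toy records are hypothetical. It only lists where each MAPQ 0 record was placed; whether the alternative locations are recorded anywhere depends on the aligner and its optional tags, which this sketch does not attempt to parse.

```python
from typing import Iterable, List, Tuple


def mapq_zero_positions(sam_lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (read name, reference name, 1-based position) for mapped records with MAPQ 0."""
    hits: List[Tuple[str, str, int]] = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x4:
            continue  # unmapped read: MAPQ carries no meaning here
        if int(fields[4]) == 0:
            hits.append((fields[0], fields[2], int(fields[3])))
    return hits


if __name__ == "__main__":
    seq, qual = "A" * 10, "I" * 10
    toy = [
        "@SQ\tSN:chr1\tLN:1000",
        f"r1\t0\tchr1\t100\t0\t10M\t*\t0\t0\t{seq}\t{qual}",
        f"r2\t16\tchr1\t200\t60\t10M\t*\t0\t0\t{seq}\t{qual}",
        f"r3\t4\t*\t0\t0\t*\t*\t0\t0\t{seq}\t{qual}",
    ]
    for name, ref, pos in mapq_zero_positions(toy):
        print(name, ref, pos)
```

To run this on a BAM file, the records would first need to be rendered as SAM text; the parsing above assumes plain tab-separated SAM lines.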

Continue Reading Find all locations of aligned reads with MQ (Mapping Quality) = 0.

did pilon improve my genome?

My doubt is whether my sequence has actually improved. refseq is the reference sequence, polished is the output from pilon, and contigs.fasta is the file generated from spades. Please help. I used the command java -Xmx2048m -jar pilon-1.24.jar --genome refseq.fasta --frags sorted bam.bam --output…

Continue Reading did pilon improve my genome?

Salmon (or other pseudo-mappers) for multi-species RNAseq read filtering

Hello all, Background: I’ve inherited a new RNAseq data set and am thinking about updating my approaches (last time I did this I was using HISAT and Cuffdiff). I’d like some opinions on best strategies to disentangle/filter out parasite microbe reads from infected host reads before performing a differential gene…

Continue Reading Salmon (or other pseudo-mappers) for multi-species RNAseq read filtering

Problem aligning target capture sequencing of a few hundred regions to the human reference genome

Hello, We have been trying to develop a target capture sequencing of around one hundred regions in the human genome. The probes capture a region of approximately 500bp around some genetic variants of interest. Our bioinformatics pipeline uses bwa-mem for aligning the captured reads to a custom genome reference. These…

Continue Reading Problem aligning target capture sequencing of a few hundred regions to the human reference genome

A single pseudouridine on rRNA regulates ribosome structure and function in the mammalian parasite Trypanosoma brucei

Cell growth and transfections Procyclic form (PCF) T. brucei, strain 29-1354, which carries integrated genes for the T7 polymerase and the tetracycline repressor, was grown in SDM-79 medium supplemented with 10% fetal calf serum, in the presence of 50 μg/ml hygromycin. Cells were grown in the presence of 15 μg/ml G418 for…

Continue Reading A single pseudouridine on rRNA regulates ribosome structure and function in the mammalian parasite Trypanosoma brucei

Filter out ALT contigs from CRAM

Dear community members, I got a CRAM aligned to a very customised reference with weird (not even “canonical” alt) contigs. They are not covered except for several accidental reads and I can safely filter them out. Is there a way to do it for…

Continue Reading Filter out ALT contigs from CRAM

Intrinsic deletion at 10q23.31, including the PTEN gene locus, is aggravated upon CRISPR-Cas9-mediated genome engineering in HAP1 cells mimicking cancer profiles

Introduction The CRISPR-Cas system is a widely used genome engineering technology because of its simple programmability, versatile scalability, and targeting efficiency (Wang & Doudna, 2023). Although researchers are rapidly developing CRISPR-Cas9 tools, the biggest challenge remains to overcome undesired on- and off-targeting outcomes. Previous studies have reported unintended genomic alterations,…

Continue Reading Intrinsic deletion at 10q23.31, including the PTEN gene locus, is aggravated upon CRISPR-Cas9-mediated genome engineering in HAP1 cells mimicking cancer profiles

The Biostar Herald for Monday, November 20, 2023

The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Istvan Albert, and was edited by Istvan…

Continue Reading The Biostar Herald for Monday, November 20, 2023

STAR alignment speed

Hello, I am trying to align RNA sequencing data from the NCBI SRA database to the Apis mellifera genome with STAR. The alignment worked fine. However, the mapping step of the alignment seems to be a bit slow. Furthermore, increasing the number of available threads does…

Continue Reading STAR alignment speed

LncRNA INHEG promotes glioma stem cell maintenance and tumorigenicity through regulating rRNA 2’-O-methylation

Ethics statement All mice procedures in this study were performed under an animal protocol approved by the Institutional Animal Care and Use Committee guidelines of Westlake University. The procedures and protocols for glioma patients were approved by the institutional review board of Beijing Tiantan Hospital. Informed consent was obtained from…

Continue Reading LncRNA INHEG promotes glioma stem cell maintenance and tumorigenicity through regulating rRNA 2’-O-methylation

BaseRecalibrator takes forever to run. Any suggestions?

Hello, I am trying to run the BaseRecalibrator tool from the GATK package and it takes forever (more than 4 days per one bam file). The command I’m using is: gatk BaseRecalibrator -I NG-01_1_S1_dedup_bwa.bam -R /rumi/shams/genomes/hg38/hg38.fa --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --known-sites 1000G_phase1.snps.high_confidence.hg38.vcf.gz --known-sites Homo_sapiens_assembly38.dbsnp138.vcf -O NG-01_1_S1_dedup_bwa_BSQR.table…

Continue Reading BaseRecalibrator takes forever to run. Any suggestions?

Qualimap bamqc v2.2.2 Cannot invoke “org.bioinfo.ngs.qc.qualimap.beans.XYVector.getXVector()” because “this.data” is null

That solves the problem! It is panel data, so I thought it would be expected not to have regions outside the given intervals. Maybe just for completeness, what are the implications of not supplying a --feature-file? Otherwise, the problem is resolved: QualiMap v.2.2.2-dev Built on 2019-11-11 14:05 Selected…

Continue Reading Qualimap bamqc v2.2.2 Cannot invoke “org.bioinfo.ngs.qc.qualimap.beans.XYVector.getXVector()” because “this.data” is null