Tag: cram

Ubuntu Manpage: samtools-quickcheck – a rapid sanity check on input files

Provided by: samtools_1.19-1_amd64 NAME samtools-quickcheck – a rapid sanity check on input files SYNOPSIS samtools quickcheck [options] in.sam|in.bam|in.cram [ … ] DESCRIPTION Quickly check that input files appear to be intact. Checks that beginning of the file contains a valid header (all formats) containing at least one target sequence and…

Continue Reading Ubuntu Manpage: samtools-quickcheck – a rapid sanity check on input files
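A minimal Python sketch for scripting the check over many files. It assumes `samtools` is on `PATH` and relies only on two documented behaviours: `-v` prints the names of files that fail, and any failure gives a non-zero exit status. The helper names are hypothetical.

```python
import subprocess
from typing import List, Sequence


def quickcheck_cmd(paths: Sequence[str]) -> List[str]:
    """Build the command; -v makes quickcheck print the names of files that fail."""
    return ["samtools", "quickcheck", "-v", *paths]


def parse_failures(stdout: str) -> List[str]:
    """With -v, each failing input file name is printed on its own line."""
    return [line.strip() for line in stdout.splitlines() if line.strip()]


def find_broken(paths: Sequence[str]) -> List[str]:
    """Run quickcheck (samtools must be on PATH) and return the files that failed."""
    result = subprocess.run(quickcheck_cmd(paths), capture_output=True, text=True)
    # Exit status 0 means every file passed; otherwise stdout lists the failures.
    if result.returncode == 0:
        return []
    return parse_failures(result.stdout)
```

For example, `find_broken(["a.cram", "b.bam"])` would return only the files that failed the check.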

The Biostar Herald for Monday, December 11, 2023

The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Istvan Albert, cmdcolin, and was edited by…

Continue Reading The Biostar Herald for Monday, December 11, 2023

Filter out ALT contigs from CRAM

Dear community members, I got a CRAM aligned to a very customised reference with weird (not even “canonical” alt) contigs. They are not covered except by several accidental reads, and I can safely filter them out. Is there a way to do it for…

Continue Reading Filter out ALT contigs from CRAM
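One possible approach, sketched in Python under stated assumptions: read the contig names from `samtools view -H`, drop those matching a pattern you choose (the `_alt$` pattern in the usage below is only an example), and pass the rest as region arguments to `samtools view`. Region queries need an index (`.crai`) and skip unplaced unmapped reads, and the output header still lists the dropped contigs, so treat this as a starting point rather than a complete solution.

```python
import re
from typing import List


def kept_contigs(header_text: str, exclude_pattern: str) -> List[str]:
    """Return @SQ names (SN: field) that do NOT match exclude_pattern."""
    drop = re.compile(exclude_pattern)
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are TAG:VALUE pairs separated by tabs.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        name = fields.get("SN")
        if name and not drop.search(name):
            names.append(name)
    return names


def filter_cmd(in_cram: str, out_cram: str, ref: str, contigs: List[str]) -> List[str]:
    """samtools view with whole-contig regions; in_cram must be indexed."""
    return ["samtools", "view", "-C", "-T", ref, "-o", out_cram, in_cram, *contigs]
```

Usage: `filter_cmd("in.cram", "out.cram", "ref.fa", kept_contigs(header, r"_alt$"))`, where `header` is the text printed by `samtools view -H in.cram`.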

ILIAD: a suite of automated Snakemake workflows for processing genomic data for downstream applications | BMC Bioinformatics

Pipeline architecture and configuration file Genomic data processing poses a challenge for genetic research studies because it involves multiple program dependency installations, vast numbers of samples with raw data from various next-generation sequencing (NGS) platforms, and inconsistent genetic variant ID and/or positions among datasets. The Iliad suite of genomic data…

Continue Reading ILIAD: a suite of automated Snakemake workflows for processing genomic data for downstream applications | BMC Bioinformatics

How to slice a CRAM file into the 50kb regions padded with 1kb?

Hello, I am working on whole genome sequencing CRAM files and I want to follow the GATK best practices. Before that, I want to slice each CRAM into smaller chunks, 50kb regions with 1kb padding, and avoid…

Continue Reading How to slice a CRAM file into the 50kb regions padded with 1kb?
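A small Python sketch of the window arithmetic. It assumes contig lengths come from the reference `.fai` or the `@SQ LN:` fields, and it yields 1-based, inclusive `chr:start-end` strings of the kind `samtools view` accepts as regions. Each padded region could then be passed to something like `samtools view -C -T ref.fa -o chunk.cram in.cram REGION`. Region queries return every read overlapping the region, so reads near a boundary land in more than one chunk.

```python
from typing import Dict, Iterator, Tuple


def padded_windows(contig_lengths: Dict[str, int], size: int = 50_000,
                   pad: int = 1_000) -> Iterator[Tuple[str, str]]:
    """Yield (core_region, padded_region) as 1-based inclusive region strings."""
    for name, length in contig_lengths.items():
        for start in range(1, length + 1, size):
            end = min(start + size - 1, length)
            # Padding is clipped so it never runs past either end of the contig.
            p_start = max(1, start - pad)
            p_end = min(length, end + pad)
            yield f"{name}:{start}-{end}", f"{name}:{p_start}-{p_end}"
```

Keeping the unpadded core next to the padded region makes it easy to discard calls that fall in the padding when results are stitched back together.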

Samtools filtering based on PNEXT

Hi all, I was wondering if I’m missing something obvious: samtools can filter your BAM file based on many criteria (such as flags, tags, qlen etc) – but what is the correct way to get rid of the chimeric mappings (at least the type…

Continue Reading Samtools filtering based on PNEXT
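One way to express such a filter, as a stand-alone Python sketch over `samtools view -h` text output. It assumes "chimeric" means a paired read whose mate is mapped to another contig (RNEXT is neither `=` nor the read's own RNAME) or more than a chosen distance away (compared via PNEXT). The 100 kb default is arbitrary.

```python
from typing import Iterable, Iterator


def is_distant_pair(sam_line: str, max_dist: int = 100_000) -> bool:
    """True if a mapped, paired read has its mapped mate on another contig or far away."""
    f = sam_line.rstrip("\n").split("\t")
    flag, rname, pos, rnext, pnext = int(f[1]), f[2], int(f[3]), f[6], int(f[7])
    if not flag & 0x1 or flag & 0x4 or flag & 0x8:
        return False  # unpaired, or read/mate unmapped: not a chimeric pair
    if rnext not in ("=", rname):
        return True  # mate placed on a different contig
    return abs(pnext - pos) > max_dist


def filter_sam(lines: Iterable[str], max_dist: int = 100_000) -> Iterator[str]:
    """Pass header lines through; drop records flagged by is_distant_pair."""
    for line in lines:
        if line.startswith("@") or not is_distant_pair(line, max_dist):
            yield line
```

Fed line by line from `samtools view -h in.bam`, the kept lines can be written back to BAM with `samtools view -b -o out.bam -`.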

Different number of reads when converting data from FASTQ to BAM and CRAM to FASTQ

This post follows up on another question, FASTQ to BAM to CRAM to FASTQ. I have developed an NGS pipeline for calling variants from amplicon data. Regarding the backup, we want to…

Continue Reading Different number of reads when converting data from FASTQ to BAM and CRAM to FASTQ
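One plausible source of the discrepancy (not confirmed by the post) is that `samtools fastq` skips secondary and supplementary records by default (its `-F` default is `0x900`), whereas a raw count of BAM records includes them. This Python sketch tallies records by category from `samtools view` text output, so the primary count can be compared with the number of FASTQ reads.

```python
from collections import Counter
from typing import Iterable

SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def record_census(sam_lines: Iterable[str]) -> Counter:
    """Count alignment records as primary, secondary or supplementary."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        flag = int(line.split("\t", 2)[1])
        if flag & SECONDARY:
            counts["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        else:
            counts["primary"] += 1
    return counts


def count_fastq_reads(fastq_text: str) -> int:
    """Four lines per FASTQ record (assumes no wrapped sequence lines)."""
    return len([line for line in fastq_text.splitlines() if line]) // 4
```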

From 9 patients undergoing hip joint replacement surgery for osteoarthritis, we collected 3 cartilage samples each: a low-grade sample (no obvious evidence of damage or fibrillation); a high-grade sample (damaged and fibrillated cartilage); an osteophytic sample (overlaid bony protrusions mainly around the margins of the articular surface). Multiplexed libraries were sequenced on Illumina HiSeq 2000 (75bp paired-end read length) and a cram file was produced for each sample. This dataset contains all the data available for this study on 2017-06-09.


Continue Reading From 9 patients undergoing hip joint replacement surgery for osteoarthritis, we collected 3 cartilage samples each: a low-grade sample (no obvious evidence of damage or fibrillation); a high-grade sample (damaged and fibrillated cartilage); an osteophytic sample (overlaid bony protrusions mainly around the margins of the articular surface). Multiplexed libraries were sequenced on Illumina HiSeq 2000 (75bp paired-end read length) and a cram file was produced for each sample. This dataset contains all the data available for this study on 2017-06-09.

NGS one-liner to call variants

This is a tutorial about creating a pipeline for sequence analysis in a single line. It is made with capture/amplicon short-read sequencing of human DNA in mind and tested with the reference exome sequencing data described here. I share the process and debugging steps…

Continue Reading NGS one-liner to call variants

Question on samtools view with --fast option

Hi, I had a question on samtools view with the --fast option. I was trying to find any relevant docs and/or blogs detailing its usage and how best to use it. I could not find any, so I thought I would ask the…

Continue Reading Question on samtools view with --fast option

NGS oneliner

This is a tutorial about creating a pipeline for sequence analysis in a single line. I share the process and debugging steps I went through while putting it together. Source is available at: github.com/barslmn/ngsoneliner/ I couldn’t make a longer post; the complete version of this post is at: omics.sbs/blog/NGSoneliner/NGSoneliner.html Pipeline # fastp --in1 "$R1"…

Continue Reading NGS oneliner

FASTQ to BAM to CRAM to FASTQ

My NGS bioinformatics analysis starts with an amplicon FASTQ file (only R1). In my workflow, I finally create a BAM file. Then I convert this BAM into CRAM for backup: apptainer exec --bind "$ref_folder":"$ref_folder" "$samtools" samtools view -C -T $bwarefgenomepath -o ART03_FINAL.cram ART03_FINAL.bam We will back up…

Continue Reading FASTQ to BAM to CRAM to FASTQ
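A simple way to verify the round trip, sketched in Python: compare the original FASTQ with the one produced by `samtools fastq` as unordered collections of (name, sequence, quality) records. It assumes read names match once any header comment and a trailing `/1` or `/2` are dropped. If qualities were altered during processing (for example by recalibration), compare names and sequences only.

```python
from collections import Counter
from typing import Iterator, Tuple


def fastq_records(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, seq, qual) from 4-line FASTQ text."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        name = header[1:].split()[0]  # drop the leading "@" and any comment
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name, seq, qual


def same_reads(fastq_a: str, fastq_b: str) -> bool:
    """True if both files hold exactly the same records, ignoring order."""
    return Counter(fastq_records(fastq_a)) == Counter(fastq_records(fastq_b))
```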

Does GATK SetNmMdAndUqTags reduce the size of a CRAM?

I performed GATK SetNmMdAndUqTags on a whole genome sequencing CRAM file after completing the MarkDuplicates step. The initial size of the CRAM file was 19GB, and after the SetNmMdAndUqTags operation its size dropped to 8GB. The following is…

Continue Reading Does GATK SetNmMdAndUqTags reduce the size of a CRAM?
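The post does not say why the file shrank, and several factors can affect CRAM size, including which aux tags are stored and the writer's version and compression settings. To check the tag side, here is a Python sketch that counts how many records carry each aux tag in a sample of `samtools view` output. Running it on the file before and after the step and comparing the two tallies is one way to see what changed.

```python
from collections import Counter
from typing import Iterable


def aux_tag_counts(sam_lines: Iterable[str]) -> Counter:
    """Count how many alignment records carry each optional (aux) tag."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        # Columns 12 onwards are aux fields of the form TAG:TYPE:VALUE.
        for field in line.rstrip("\n").split("\t")[11:]:
            counts[field[:2]] += 1
    return counts
```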

ILIAD: A suite of automated Snakemake workflows for processing genomic data for downstream applications

Abstract Background: Processing raw genomic data for downstream applications such as imputation, association studies, and modeling requires numerous third-party bioinformatics software tools. It is highly time-consuming and resource-intensive with computational demands and storage limitations that pose significant challenges that increase cost. The use of software tools independent of one another,…

Continue Reading ILIAD: A suite of automated Snakemake workflows for processing genomic data for downstream applications

Quantify gene expression from CRAM file

Hello Biostars folks, do you know any tools that can quantify gene expression from aligned CRAM files in RNA-seq? In the past I used featureCounts, but it doesn’t accept CRAM files. I am trying to quantify gene expression from the CRAM files downloaded…

Continue Reading Quantify gene expression from CRAM file

Genotyping, sequencing and analysis of 140,000 adults from Mexico City

Recruitment of study participants The MCPS was established in the late 1990s following discussions between Mexican scientists at the National Autonomous University of Mexico (UNAM) and British scientists at the University of Oxford about how best to measure the changing health effects of tobacco in Mexico. These discussions evolved into…

Continue Reading Genotyping, sequencing and analysis of 140,000 adults from Mexico City

Ralph Cram – File 770

(1) WAYS IN WHICH PRATCHETT IS STILL WITH US. Sam Jordison discusses “Pratchett power: from lost stories to new adaptations, how the late Discworld author lives on” in the Guardian. “Of all the dead authors in the world, Terry Pratchett is the most alive,” said John Lloyd at the author’s memorial in…

Continue Reading Ralph Cram – File 770

Issue VEP installation MacOS

Hi, I’m trying to install VEP on Mac. I’ve tried with the Anaconda Navigator, but I couldn’t install it. I also tried through the terminal, but that didn’t work either. The last error that I’ve got is: User cram/cram_io.c:61:10: fatal error: 'lzma.h' file…

Continue Reading Issue VEP installation MacOS

Building mosdepth on macOS

This is just a tiny tutorial on how to build mosdepth on Mac. There is currently no version for Mac available at conda (hope that changes soon, edit (3/2021): it did change, see anaconda.org/bioconda/mosdepth), and from what I’ve read building from source was a pain so far, still these simple…

Continue Reading Building mosdepth on macOS

Find reference genome regions spanned by only mapping quality 0 reads in multiple WGS samples

For the parallelization of multi-sample variant calling I am looking for reference genome regions to split on. With the T2T reference genomes, there are not that many polyN regions left to split on. I…

Continue Reading Find reference genome regions spanned by only mapping quality 0 reads in multiple WGS samples
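Once per-sample intervals covered only by MAPQ 0 reads are available (however they are derived, for example by comparing depth computed with and without a MAPQ threshold), the regions shared by all samples are their intersection. Here is a Python sketch of that step. It assumes each sample's intervals for one contig are sorted, non-overlapping, 0-based and half-open.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def intersect_two(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted, non-overlapping interval lists with a two-pointer sweep."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def common_regions(per_sample: List[List[Interval]]) -> List[Interval]:
    """Fold the pairwise intersection over all samples."""
    result = per_sample[0]
    for intervals in per_sample[1:]:
        result = intersect_two(result, intervals)
    return result
```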

Sarek did not perform variant calling?

I’m trying to check for mutations from whole exome sequencing of two samples from the same patient, and was recommended to use the nextflow sarek pipeline. I assembled the fastq files I needed, made the csv file describing the patient sample information (patient, sample, lane, fastq_1, fastq_2), and entered the…

Continue Reading Sarek did not perform variant calling?

File Format Archives | The Golden Helix Blog

Unlocking the Potential of CRAM Files: The New VarSeq 2.3.0 Release for Enhanced Plotting, Coverage Analysis, and CNV Detection The CRAM (Compressed Reference-oriented Alignment Map) file format was conceived in 2011 as a more space-efficient way to store alignment…

Continue Reading File Format Archives | The Golden Helix Blog

How to identify that BQSR is performed on CRAM file

Hi, I have a bunch of WGS CRAM files, and I want to check whether Base Quality Score Recalibration (BQSR) has been done on them or not. Can anyone help me with how to check it?

Continue Reading How to identify that BQSR is performed on CRAM file
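Two hints can be checked, sketched here in Python: `@PG` header lines that mention a recalibration tool, and the `OQ` aux tag, in which GATK can keep the original base qualities. Both are only evidence. `@PG` lines can be lost when files are rewritten, and `OQ` is written only when the pipeline asked for it, so finding neither does not prove BQSR was skipped. The tool-name hints below are assumptions.

```python
from typing import Dict, Iterable, List

BQSR_HINTS = ("BaseRecalibrator", "BQSR")  # assumed substrings of GATK tool names


def pg_records(header_text: str) -> List[Dict[str, str]]:
    """Parse @PG lines into dicts of their TAG:VALUE fields."""
    records = []
    for line in header_text.splitlines():
        if line.startswith("@PG"):
            records.append(dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f))
    return records


def bqsr_evidence(header_text: str, sam_lines: Iterable[str]) -> Dict[str, bool]:
    """Report whether @PG lines mention BQSR and whether reads carry an OQ tag."""
    in_header = any(
        any(h in rec.get("ID", "") + rec.get("PN", "") + rec.get("CL", "") for h in BQSR_HINTS)
        for rec in pg_records(header_text)
    )
    has_oq = any("\tOQ:Z:" in line for line in sam_lines if not line.startswith("@"))
    return {"pg_mentions_bqsr": in_header, "reads_have_OQ": has_oq}
```

In practice the header comes from `samtools view -H file.cram` and the reads from a few thousand lines of `samtools view file.cram`.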

refget v2.0 links the hidden dictionaries of

How refget works. Credit: Stephanie Li / GA4GH. A widely-used tool that finds the exact references needed to pinpoint differences in our DNA just got a refresh. On 17 July, the Standards Steering Committee of the Global Alliance for Genomics and Health (GA4GH) voted to release refget v2.0…

Continue Reading refget v2.0 links the hidden dictionaries of
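refget identifies a sequence by a digest of its bases. Here is a Python sketch of the two digests most often seen, assuming the input is a plain sequence string. The first is the MD5, which is also the value stored in CRAM `@SQ M5:` fields. The second is a SHA-512 truncated to 24 bytes and base64url-encoded (the GA4GH `sha512t24u` scheme). Normalisation here is simplified to uppercasing and removing whitespace; the exact rules are in the refget specification.

```python
import base64
import hashlib


def normalise(seq: str) -> bytes:
    """Uppercase and drop whitespace (e.g. FASTA line breaks) before hashing."""
    return "".join(seq.split()).upper().encode("ascii")


def md5_digest(seq: str) -> str:
    """Hex MD5 of the normalised sequence."""
    return hashlib.md5(normalise(seq)).hexdigest()


def sha512t24u(seq: str) -> str:
    """SHA-512, truncated to the first 24 bytes, base64url-encoded (32 characters)."""
    digest = hashlib.sha512(normalise(seq)).digest()[:24]
    return base64.urlsafe_b64encode(digest).decode("ascii")
```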

refget v2.0 links the hidden dictionaries of DNA

How refget works. Credit: Stephanie Li / GA4GH A widely-used tool that finds the exact references needed to pinpoint differences in our DNA just got a refresh. On 17 July, the Standards Steering Committee of the Global Alliance for Genomics and Health (GA4GH) voted to release refget v2.0. With better…

Continue Reading refget v2.0 links the hidden dictionaries of DNA

Spatially resolved multiomics of human cardiac niches

Research ethics for donor tissues All heart tissue samples were obtained from transplant donors after Research Ethics Committee approval and written informed consent from donor families as previously described². The following ethics approvals for donors of additional heart tissue were obtained: D8 and A61 (REC reference 15/EE/0152, East of England…

Continue Reading Spatially resolved multiomics of human cardiac niches

131releng-armv7-quarterly][biology/htslib] Failed for htslib-1.17 in build

You are receiving this mail as a port that you maintain is failing to build on the FreeBSD package build server. Please investigate the failure and submit a PR to fix build. Maintainer: j…@freebsd.org Log URL: pkg-status.freebsd.org/ampere1/data/131releng-armv7-quarterly/bee14067723b/logs/htslib-1.17.log Build URL: pkg-status.freebsd.org/ampere1/build.html?mastername=131releng-armv7-quarterly&build=bee14067723b Log: =>> Building biology/htslib build started at Mon Jul 10…

Continue Reading 131releng-armv7-quarterly][biology/htslib] Failed for htslib-1.17 in build

rust-htslib 0.44.1 – Docs.rs

This library provides HTSlib bindings and a high level Rust API for reading and writing BAM files. To clone this repository, issue $ git clone --recursive github.com/rust-bio/rust-htslib.git ensuring that the HTSlib submodule is fetched, too. If you only want to use the library, there is no need to clone the…

Continue Reading rust-htslib 0.44.1 – Docs.rs

How can there be numerous high quality heterozygous Y chromosome alleles not within pseudoautosomal regions across chrY in WGS data?

Sorry if this seems ignorant, but that is why one asks questions: to learn. While investigating a WGS sequence in IGV, I see numerous heterozygous Y alleles across the full Y chromosome. How can this occur in general? How common is this? At what point is it not common, i.e.…

Continue Reading How can there be numerous high quality heterozygous Y chromosome alleles not within pseudoautosomal regions across chrY in WGS data?

cram file to fastq conversion

Hi all, I received some CRAM files from the 1000 Genomes data. I am trying to convert them back to FASTQ files, but can’t seem to figure out how to do this. I’ve tried using samtools fastq -1 out.R1.fastq -2 out.R2.fastq input.cram but…

Continue Reading cram file to fastq conversion
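A common pattern is to group mates with `samtools collate` before `samtools fastq`, so that `-1`/`-2` receive properly paired reads. The Python sketch below builds and runs that pipeline. It assumes `samtools` is on `PATH` and that `ref` is the FASTA the CRAMs were encoded against. `-0` and `-s` send reads without a proper pair to `/dev/null` here; you may prefer to keep them. The function names are hypothetical.

```python
import subprocess
from typing import List


def cram_to_fastq_pipeline(cram: str, ref: str, prefix: str, threads: int = 4) -> List[List[str]]:
    """Return the two commands of a `collate | fastq` pipeline."""
    # -u: uncompressed output, -O: write to stdout so it can be piped.
    collate = ["samtools", "collate", "-u", "-O", "--reference", ref, "-@", str(threads), cram]
    # -n keeps read names unchanged; ".gz" output names trigger compression.
    fastq = ["samtools", "fastq", "-n", "-@", str(threads),
             "-1", f"{prefix}_R1.fastq.gz", "-2", f"{prefix}_R2.fastq.gz",
             "-0", "/dev/null", "-s", "/dev/null", "-"]
    return [collate, fastq]


def run_pipeline(cmds: List[List[str]]) -> None:
    """Connect the two commands with a pipe and fail loudly on errors."""
    upstream = subprocess.Popen(cmds[0], stdout=subprocess.PIPE)
    downstream = subprocess.Popen(cmds[1], stdin=upstream.stdout)
    upstream.stdout.close()  # let collate receive SIGPIPE if fastq exits early
    downstream.wait()
    upstream.wait()
    if upstream.returncode or downstream.returncode:
        raise RuntimeError("samtools pipeline failed")
```

Usage: `run_pipeline(cram_to_fastq_pipeline("input.cram", "ref.fa", "sample1"))`.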

how to pass Bam and Bam index as Input Channel?

I would like to pass BAM files pair_id.sorted.bam and their corresponding index files pair_id.sorted.bam.csi into a Nextflow workflow. However, I am having trouble passing in the files, with errors being thrown for def indexFile = new File("${it.getPath()}.bai")…

Continue Reading how to pass Bam and Bam index as Input Channel?
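The excerpt points at one likely mismatch: the index files end in `.csi`, but the code builds a `.bai` name. Below is a Python sketch of matching logic that accepts whichever index actually exists. In Nextflow the same idea would be expressed by emitting (alignment, index) tuples in a channel. The suffix table is an assumption covering the common cases.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Tuple

# Assumed index suffixes per alignment format, in order of preference.
INDEX_SUFFIXES = {".bam": (".bai", ".csi"), ".cram": (".crai",)}


def find_index(aln: str, existing: Iterable[str]) -> Optional[str]:
    """Return the first index file name that exists for aln, or None."""
    available = set(existing)
    suffix = Path(aln).suffix
    for ext in INDEX_SUFFIXES.get(suffix, ()):
        # Try both "x.bam.bai" and "x.bai" naming conventions.
        for candidate in (aln + ext, str(Path(aln).with_suffix(ext))):
            if candidate in available:
                return candidate
    return None


def pair_with_index(files: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each alignment file in a listing with its index (None if missing)."""
    aligned = [f for f in files if Path(f).suffix in INDEX_SUFFIXES]
    return [(f, find_index(f, files)) for f in sorted(aligned)]
```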

Haplotypecaller batch mode – Parabricks

When HaplotypeCaller runs in batch mode, it fails with the errors below: singularity exec --nv clara-parabricks_4.0.1-1.sif pbrun haplotypecaller --batch --ref ref.fa --in-bam /data/bam/ --out-variants /date/gvcf/ --gvcf Please visit NVIDIA Clara – NVIDIA Docs for detailed documentation [E::hts_hopen] Failed to open file /data/bam/ [E::hts_open_format] Failed to open file "/data/bam/" : Is a directory samtools view:…

Continue Reading Haplotypecaller batch mode – Parabricks
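The log shows htslib being handed the directory `/data/bam/` as if it were a file. Without claiming how `--batch` is meant to be used, one workaround is to issue one call per BAM. The Python sketch below builds those commands using only the flags in the post. The `.g.vcf` output name is an assumption to check against the Parabricks documentation.

```python
from pathlib import PurePosixPath
from typing import List


def per_sample_commands(bams: List[str], ref: str, out_dir: str, sif: str) -> List[List[str]]:
    """One non-batch pbrun haplotypecaller call per input BAM."""
    cmds = []
    for bam in bams:
        stem = PurePosixPath(bam).name
        for ext in (".bam", ".cram"):
            if stem.endswith(ext):
                stem = stem[: -len(ext)]
        out = str(PurePosixPath(out_dir) / f"{stem}.g.vcf")  # assumed naming
        cmds.append(["singularity", "exec", "--nv", sif, "pbrun", "haplotypecaller",
                     "--ref", ref, "--in-bam", bam, "--out-variants", out, "--gvcf"])
    return cmds
```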

Merged CRAM output

Hi there, I recently merged a bunch of CRAM files with samtools. One thing I noticed is that for each of them the .log output reported the following: [W::cram_populate_ref] Creating reference cache directory /home/<user>/.cache/hts-ref This may become large; see the samtools(1) manual page REF_CACHE discussion…

Continue Reading Merged CRAM output
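The warning concerns htslib's local reference cache, which is keyed on each reference sequence's MD5. Below is a Python sketch of how a cache path template is expanded. It assumes the `%2s/%2s/%s` style described in the samtools manual, where `%Ns` takes the next N characters of the MD5 and a bare `%s` takes the remainder. Pointing the `REF_CACHE` environment variable at shared storage is the usual way to keep the cache out of the home directory.

```python
import re

TOKEN = re.compile(r"%(\d*)s")


def expand_ref_template(template: str, md5: str) -> str:
    """Replace each %Ns with the next N characters of the MD5; a bare %s takes the rest."""
    out = []
    pos = 0   # how much of the MD5 has been consumed
    last = 0  # end of the previous token in the template
    for match in TOKEN.finditer(template):
        out.append(template[last:match.start()])
        width = match.group(1)
        if width:
            out.append(md5[pos:pos + int(width)])
            pos += int(width)
        else:
            out.append(md5[pos:])
            pos = len(md5)
        last = match.end()
    out.append(template[last:])
    return "".join(out)
```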

Genozip 15 with co-compression of BAM and FASTQ

I am excited to announce the launch of our new version of Genozip – Genozip 15 – a genomic compressor for FASTQ, BAM, VCF and many other genomic formats. The key new capability in version 15 is our patent-pending method…

Continue Reading Genozip 15 with co-compression of BAM and FASTQ

Merging CRAM files

Hi there, I’m facing the task of merging the CRAM files for 25 human samples. Each one is divided into 12-13 CRAM files (322 individual CRAMs in total), for which I have set a sample identifier and number as follows: code_number, where the code refers to…

Continue Reading Merging CRAM files
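Below is a Python sketch that groups the parts per sample and builds one merge per group. It assumes file names of the form `<code>_<number>.cram`, as the post describes, and coordinate-sorted inputs. The option spellings (`-O CRAM`, `--reference`) should be checked against `samtools merge` on your installation.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List

# Assumed naming convention: <code>_<number>.cram
NAME = re.compile(r"^(?P<code>.+)_(?P<number>\d+)\.cram$")


def group_by_sample(paths: List[str]) -> Dict[str, List[str]]:
    """Map each sample code to its parts, sorted numerically by part number."""
    groups = defaultdict(list)
    for path in paths:
        m = NAME.match(PurePosixPath(path).name)
        if m:
            groups[m.group("code")].append((int(m.group("number")), path))
    return {code: [p for _, p in sorted(items)] for code, items in groups.items()}


def merge_cmd(code: str, parts: List[str], ref: str) -> List[str]:
    """samtools merge [options] out.cram in1.cram in2.cram ..."""
    return ["samtools", "merge", "-O", "CRAM", "--reference", ref,
            f"{code}.merged.cram", *parts]
```

Sorting by the integer part number keeps `_10` after `_2`, which plain string sorting would not.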

samtools collate

Hi all, I am using samtools collate to convert my BAM files to paired-end FASTQ files. Here is the command that I am using: samtools view -h -T mm10.fa {input.bam} | samtools collate -O -u -@ {threads} - | samtools fastq -1 output_paired1.fq.gz -2 output_paired2.fq.gz -0…

Continue Reading samtools collate

ftbfs and test failure against htslib 1.17

Source: samtools Version: 1.16.1-1 Severity: important Tags: ftbfs Hi, When samtools is tested against htslib 1.17 now available in experimental, I witness the following error, either from build time checks or from autopkgtest: The command failed [256]: /tmp/autopkgtest.PsRbbX/autopkgtest_tmp/samtools view -e 'pos<1000||pos>1200' -O cram,embed_ref=1 -T test/dat/mpileup.ref.fa -o /tmp/autopkgtest.PsRbbX/autopkgtest_tmp/test/reference/mpileup.1.tmp.cram test/dat/mpileup.1.sam out: err:[E::validate_md5]…

Continue Reading ftbfs and test failure against htslib 1.17

No @HD header returned in SAM file when running bwa mem

Hello, I produced SAM files with the command below: bwa mem -M -t 10 IndexedReference ${sample}_R1.fastq.gz ${sample}_R2.fastq.gz 2> ${sample}_bwa.err > ${sample}.sam The resulting SAM file doesn’t have an @HD header. Example output of samtools view: samtools view -H…

Continue Reading No @HD header returned in SAM file when running bwa mem
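`@HD` is optional in SAM, and the post shows bwa mem output arriving without one; `samtools sort` typically writes an `@HD` line recording the sort order. If a header needs one explicitly, here is a Python sketch that prepends it. The SAM specification requires `@HD`, when present, to be the first header line. `VN:1.6` is an assumed version value. The edited header could then be applied with `samtools reheader`.

```python
from typing import List


def ensure_hd(header_lines: List[str], sort_order: str = "unsorted") -> List[str]:
    """Return header lines with an @HD line first, adding one if absent."""
    if header_lines and header_lines[0].startswith("@HD"):
        return list(header_lines)
    hd = f"@HD\tVN:1.6\tSO:{sort_order}"
    # Drop any misplaced @HD so exactly one ends up at the top.
    return [hd] + [line for line in header_lines if not line.startswith("@HD")]
```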

Epigenetic dysregulation from chromosomal transit in micronuclei

Cell culture Cell lines (MDA-MB-231, 4T1 and RPE-1) were purchased from the American Type Culture Collection (ATCC). TP53-knockout MCF10A, TP53-knockout RPE-1 and Trex1 knockout 4T1 cells were gifts from the Maciejowski laboratory at the Memorial Sloan Kettering Cancer Center (MSKCC). OVCAR-3 cells were a gift from J. D. Gonzales. All…

Continue Reading Epigenetic dysregulation from chromosomal transit in micronuclei

Giants’ Brian Daboll treating OTAs like a ‘teaching camp’

This week, New York Giants head coach Brian Daboll began his second round of organized team activities (OTAs) with the team. There’s a big difference in the air from this time last year when everything and almost everyone was new to one another and their surroundings. The Giants entered Phase…

Continue Reading Giants’ Brian Daboll treating OTAs like a ‘teaching camp’

How to Split 3000 WGS CRAM files into 1Mbp length chunks

Hello, I have 3000 WGS CRAM files and I want to split them into 1Mbp chunks. I want to split with exact genomic coordinate locations, e.g. starting from 1 to 1000000bp, 1000001bp to 2000000bp, 2000001bp to 3000000bp, etc.…

Continue Reading How to Split 3000 WGS CRAM files into 1Mbp length chunks
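Here is a Python sketch that derives fixed 1 Mbp windows from the reference `.fai`, which holds the contig name and length in its first two columns. It emits BED-style 0-based, half-open windows and converts them to the 1-based, inclusive regions the post asks for (`chr1:1000001-2000000`, and so on). Reads crossing a window boundary are returned by both neighbouring region queries.

```python
from typing import Iterator, Tuple


def fai_windows(fai_text: str, size: int = 1_000_000) -> Iterator[Tuple[str, int, int]]:
    """Yield BED-style (contig, start0, end) windows: 0-based, half-open."""
    for line in fai_text.splitlines():
        if not line.strip():
            continue
        name, length_str = line.split("\t")[:2]
        length = int(length_str)
        for start in range(0, length, size):
            yield name, start, min(start + size, length)


def to_region(name: str, start0: int, end: int) -> str:
    """Convert a BED window to a 1-based, inclusive samtools region string."""
    return f"{name}:{start0 + 1}-{end}"
```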

Answer: Estimate sizes of repeats in a specific gene

Tell me if I’m on the right track. I have the CRAM file and the respective CRAI (index). So I ran samtools like this, clipping my area of interest: samtools view -b NG1PSZ7BE9.mm2.sortdup.bqsr.cram "chrX:147912050-147912110" > result.bam Then I indexed the .bam file: samtools index result.bam…

Continue Reading Answer: Estimate sizes of repeats in a specific gene

Estimate sizes of repeats in a specific gene

Amateur problem here: We know that it is possible to use the ExpansionHunter tool to estimate sizes of such repeats by performing a targeted search through a BAM/CRAM file for reads that span, flank, and are fully contained in each repeat…

Continue Reading Estimate sizes of repeats in a specific gene

Predictive network analysis identifies JMJD6 and other potential key drivers in Alzheimer’s disease

Cerejeira, J., Lagarto, L. & Mukaetova-Ladinska, E. B. Behavioral and psychological symptoms of dementia. Front. Neurol. 3, 73 (2012). Murphy, M. P. & LeVine, H. III Alzheimer’s disease and the amyloid-beta peptide. J. Alzheimers Dis. 19, 311–323 (2010). …

Continue Reading Predictive network analysis identifies JMJD6 and other potential key drivers in Alzheimer’s disease

A draft human pangenome reference

Sample selection We identified parent–child trios from the 1KG in which the child cell line banked within the NHGRI Sample Repository for Human Genetic Research at the Coriell Institute for Medical Research was listed as having zero expansions and two or fewer passages, and rank-ordered representative individuals as follows. Loci…

Continue Reading A draft human pangenome reference

No differentially expressed genes after multiple testing correction in mice

Hi all, I am working with RNA-seq data from mice (group A N=3 vs group B N=3). The mice are littermates; group A overexpresses a human transgene, which I verified. I have had .cram files from mouse…

Continue Reading No differentially expressed genes after multiple testing correction in mice

Missing columns in meta table from SRA Selector

Unfortunately, there is no enforced standard for what metadata must make it into the SRA; it is very frustrating actually and makes reproducing any analysis needlessly complicated. You can look at what EBI fields are there, and sometimes they produce more fields than SRA: pip install bio then look at the…

Continue Reading Missing columns in meta table from SRA Selector

Best Practices for CRAM <-> BAM

Hi, I am looking for advice about transitioning from bam/bai to cram for archival purposes. General advice is appreciated, but I’m specifically looking for answers to these two questions – Does samtools offer the best performance for converting to and from CRAMs? Do…

Continue Reading Best Practices for CRAM <-> BAM
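For the conversion itself, here is a Python sketch of the standard `samtools view` calls in both directions; the thread count is an arbitrary choice. The key archival point is that decoding a CRAM requires the same reference sequences used to encode it, so the reference (or a way to retrieve it by MD5) must be kept alongside the archive.

```python
from typing import List


def bam_to_cram(bam: str, cram: str, ref: str, threads: int = 4) -> List[str]:
    """-C writes CRAM; -T names the reference FASTA used for encoding."""
    return ["samtools", "view", "-@", str(threads), "-C", "-T", ref, "-o", cram, bam]


def cram_to_bam(cram: str, bam: str, ref: str, threads: int = 4) -> List[str]:
    """-b writes BAM; the same reference is needed to decode the CRAM."""
    return ["samtools", "view", "-@", str(threads), "-b", "-T", ref, "-o", bam, cram]
```

Each list can be passed to `subprocess.run(..., check=True)`.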

Comparing Alignment Files (CRAM)

Hello all, I just checked different forums and, generally, I see that it would be useful to use samtools or picard-tools for comparing alignment files. Here I want to compare the aligned output files from two different alignment algorithms. In this case, I had some general…

Continue Reading Comparing Alignment Files (CRAM)
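Here is a Python sketch of one simple comparison, assuming both outputs are available as `samtools view` text. It keys primary records by read name and mate (READ1/READ2 bits), then counts how many sit at the same contig and position. It ignores CIGAR, MAPQ and secondary alignments, so it is only a coarse first look.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, int]  # (QNAME, READ1/READ2 flag bits)


def primary_positions(sam_lines: Iterable[str]) -> Dict[Key, Tuple[str, int]]:
    """Map each primary record to its (RNAME, POS)."""
    out = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue
        f = line.split("\t")
        flag = int(f[1])
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800)
            continue
        out[(f[0], flag & 0xC0)] = (f[2], int(f[3]))
    return out


def concordance(a_lines: Iterable[str], b_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (reads placed identically, reads present in both outputs)."""
    a, b = primary_positions(a_lines), primary_positions(b_lines)
    shared = a.keys() & b.keys()
    same = sum(1 for k in shared if a[k] == b[k])
    return same, len(shared)
```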

Issue With CRAM -> BAM -> FASTQ Conversion

Please help! I am trying to obtain FASTQ files from the GDSC; all we have in the lab is CRAM files. Unfortunately, the reference genome seems not to exist when pulled from an online source. I have attempted to use the…

Continue Reading Issue With CRAM -> BAM -> FASTQ Conversion
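A useful first step is to read which reference the CRAM was made against. Here is a Python sketch that pulls the `M5` (MD5) and `UR` fields from `@SQ` header lines, as printed by `samtools view -H`. Where present, the MD5 values identify the exact sequences needed, which htslib can look up through its reference path settings.

```python
from typing import Dict


def sq_md5s(header_text: str) -> Dict[str, Dict[str, str]]:
    """Map each @SQ name to its M5 (MD5) and UR (URI) fields, when present."""
    result = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # split(":", 1) keeps colons inside values such as UR:file:/path.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        result[fields["SN"]] = {k: fields[k] for k in ("M5", "UR") if k in fields}
    return result
```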

Supported Tools – MultiQC

Removes adapter sequences and trims low quality bases from the 3′ end of reads. Overlapping paired-ended reads can be merged into consensus sequences and adapter sequence can be found for paired-ended data if not known. Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data…

Continue Reading Supported Tools – MultiQC

storage – Good / recommended way to archive fastq and bam files?

The only free and open source tool I know that can help is zstd. Their github repository’s README describes it as: Zstandard, or zstd as short version, is a fast lossless compression algorithm, targeting real-time compression scenarios at zlib-level and better compression ratios. It’s backed by a very fast entropy…

Continue Reading storage – Good / recommended way to archive fastq and bam files?

The Biostar Herald for Monday, April 03, 2023

The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Istvan Albert, and was edited by Istvan…

Continue Reading The Biostar Herald for Monday, April 03, 2023

Manta and alignment name collision

Dear community members, I received hundreds of CRAM files which I have to run through Manta SV calling and they fail due to “Unexpected alignment name collision” – this file contains tens (out of millions) of reads which were multi-mapped, so they have 2…

Continue Reading Manta and alignment name collision
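Here is a Python sketch that lists the offending records. It assumes the duplicated alignments are primary records (neither the secondary nor the supplementary flag set) that share a read name and the same READ1/READ2 bits. How to resolve them, by dropping one copy or flagging it as secondary, is a separate decision.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def name_collisions(sam_lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (QNAME, mate bits, count) for primary records seen more than once."""
    seen: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        qname, flag_str = line.split("\t", 2)[:2]
        flag = int(flag_str)
        if flag & 0x900:
            continue  # secondary/supplementary records legitimately repeat a name
        seen[(qname, flag & 0xC0)] += 1
    return [(q, bits, n) for (q, bits), n in sorted(seen.items()) if n > 1]
```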

Running accurate, comprehensive, and efficient genomics workflows on AWS using Illumina DRAGEN v4.0

Introduction The reduced cost of DNA sequencing technology has led to an exponential growth of raw sequencing data. To keep pace with this development, secondary analysis tools that can provide fast and accurate results in a cost-effective manner are needed to extract actionable genomic insights. Illumina’s DRAGEN™ (Dynamic Read Analysis for GENomics) addresses…

Continue Reading Running accurate, comprehensive, and efficient genomics workflows on AWS using Illumina DRAGEN v4.0

bwa-mem2 vs htslib – compare differences and reviews?

What are some alternatives? When comparing bwa-mem2 and htslib you can also consider the following projects: minimap2 – A versatile pairwise aligner for genomic and spliced nucleotide sequences bowtie2 – A fast and sensitive gapped read aligner genozip – A modern compressor for genomic files (FASTQ, SAM/BAM/CRAM, VCF, FASTA, GFF/GTF/GVF,…

Continue Reading bwa-mem2 vs htslib – compare differences and reviews?

converting cram to ubam

How can I convert a CRAM to an unmapped BAM file with samtools? Is samtools view -b -T ref.fasta input.cram > output.bam correct? One reply: samtools collate -O -u input.cram | samtools reset -O BAM -o out.bam

Continue Reading converting cram to ubam
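The question's `samtools view -b -T ref.fasta input.cram` produces a BAM that still carries the alignments, not an unmapped BAM. To show what "unmapping" involves, here is a conceptual Python sketch for a single SAM record. It is not the implementation of `samtools reset` (the command in the reply, available in recent samtools releases), and it drops all aux tags for simplicity.

```python
from typing import Optional

COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")

PAIRED, UNMAPPED, MATE_UNMAPPED, REVERSE = 0x1, 0x4, 0x8, 0x10
READ1, READ2, QCFAIL = 0x40, 0x80, 0x200
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def unmap_record(sam_line: str) -> Optional[str]:
    """Return an unmapped version of a primary record, or None for secondary/supplementary."""
    f = sam_line.rstrip("\n").split("\t")
    flag = int(f[1])
    if flag & (SECONDARY | SUPPLEMENTARY):
        return None
    seq, qual = f[9], f[10]
    if flag & REVERSE:  # restore the read's original orientation
        seq = seq.translate(COMP)[::-1]
        qual = qual[::-1]
    # Keep pairing, mate-number and QC-fail bits; mark read and mate unmapped.
    new_flag = UNMAPPED | (flag & (PAIRED | READ1 | READ2 | QCFAIL))
    if flag & PAIRED:
        new_flag |= MATE_UNMAPPED
    # Alignment fields reset to their "unavailable" values.
    return "\t".join([f[0], str(new_flag), "*", "0", "0", "*", "*", "0", "0", seq, qual])
```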

Unravelling microalgal-bacterial interactions in aquatic ecosystems through 16S rRNA gene-based co-occurrence networks

Croft, M. T., Lawrence, A. D., Raux-Deery, E., Warren, M. J. & Smith, A. G. Algae acquire vitamin B12 through a symbiotic relationship with bacteria. Nature doi.org/10.1038/nature04056 (2005). Kazamia, E. et al. Mutualistic interactions between vitamin B12-dependent algae and heterotrophic bacteria exhibit regulation. Environ. Microbiol. doi.org/10.1111/j.1462-2920.2012.02733.x…

Continue Reading Unravelling microalgal-bacterial interactions in aquatic ecosystems through 16S rRNA gene-based co-occurrence networks

Everything You need to know about the CRAM Format

This tutorial teaches everything you need to know about the CRAM format, the BAM-to-CRAM compression ratio, cramtools, etc. 1. What are the BAM, SAM, and CRAM formats? BAM, SAM, and CRAM are file formats used to store and exchange alignment data in bioinformatics. BAM (Binary Alignment/Map) format is a…

Continue Reading Everything You need to know about the CRAM Format

SAMtools – PACE Cluster Documentation

Updated 2023-01-06 Overview SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM, and CRAM formats. This guide will cover how to run SAMtools on the Cluster. This is the link to the SAMtools Homepage. Summary SAMtools has a set…

Continue Reading SAMtools – PACE Cluster Documentation

CYP2D6: Phase I Oxidative Metabolism Enzyme – 474 Words

CYP2D6 is a Phase I oxidative metabolism enzyme that is clinically important because about 20-25% of clinically used drugs are metabolized by the CYP2D6 enzyme. CYP2D6 substrates are typically lipophilic and…

Continue Reading CYP2D6: Phase I Oxidative Metabolism Enzyme – 474 Words

Ubuntu Manpage: samtools index – indexes SAM/BAM/CRAM files

Provided by: samtools_1.10-3_amd64 NAME samtools index – indexes SAM/BAM/CRAM files SYNOPSIS samtools index [-bc] [-m INT] aln.bam|aln.cram [out.index] DESCRIPTION Index a coordinate-sorted BGZIP-compressed SAM, BAM or CRAM file for fast random access. (Note that this does not work with uncompressed SAM files.) This index is needed when region arguments are…

Continue Reading Ubuntu Manpage: samtools index – indexes SAM/BAM/CRAM files
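Here is a Python sketch of which index file each case is expected to produce, following the options in the synopsis: a BAI by default, a CSI with `-c`, and a `.crai` for CRAM. The function name is hypothetical.

```python
from typing import List, Tuple


def index_plan(path: str, csi: bool = False) -> Tuple[List[str], str]:
    """Return the samtools index command and the index file it is expected to create."""
    if path.endswith(".cram"):
        return ["samtools", "index", path], path + ".crai"
    if path.endswith(".bam"):
        if csi:
            return ["samtools", "index", "-c", path], path + ".csi"
        return ["samtools", "index", path], path + ".bai"
    raise ValueError(f"expected a .bam or .cram file, got {path!r}")
```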

Cost-effective and accurate genomics analysis with Sentieon on AWS

This blog post was contributed by Don Freed, Senior Bioinformatics Scientist, and Brendan Gallagher, Head of Business Development at Sentieon; and Olivia Choudhury, PhD, Senior Partner Solutions Architect, Sujaya Srinivasan, Genomics Solutions Architect, and Aniket Deshpande, Senior Specialist, HPC HCLS at AWS. The year 2022 was an exciting one for genomics…

Continue Reading Cost-effective and accurate genomics analysis with Sentieon on AWS

Plink duplicate ID

Hi, I’ve converted the Reich dataset to PLINK format, along with the VCF file from my full genome. I merged the two together, which led to an error and produced two files. The two files it output were .fam and .missnp; now it tried…

Continue Reading Plink duplicate ID
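PLINK matches variants by ID when merging, so repeated IDs cause trouble; VCF-derived `.` IDs are a common culprit. Here is a Python sketch that lists repeated IDs from column 2 of a `.bim` file. The result could be written to a file and passed to PLINK's `--exclude`.

```python
from collections import Counter
from typing import List


def duplicate_bim_ids(bim_text: str) -> List[str]:
    """Return variant IDs (column 2 of a PLINK .bim file) that occur more than once."""
    ids = [line.split()[1] for line in bim_text.splitlines() if line.strip()]
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if n > 1)
```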

find tandem repeats in DNA

I want to find tandem repeats in DNA. I have access to CRAM file and the VCF file. I initially tried to get the insertions from the VCF file, but I am not sure if the…

Continue Reading find tandem repeats in DNA
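Dedicated tools such as Tandem Repeats Finder, or ExpansionHunter for known loci, are the usual route. Purely to illustrate the idea, here is a naive Python sketch that scans a sequence (for example a reference segment fetched with `samtools faidx`) for perfect tandem runs. The unit-length and copy-number thresholds are arbitrary assumptions.

```python
from typing import List, Tuple


def find_tandem_repeats(seq: str, max_unit: int = 6,
                        min_copies: int = 3) -> List[Tuple[int, str, int]]:
    """Naive scan: return (0-based start, unit, copies) for perfect tandem runs."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for this unit length
            copies = 1
            while seq[i + copies * k: i + (copies + 1) * k] == unit:
                copies += 1
            # Prefer the unit whose run covers the most bases (ties keep the shorter unit).
            if copies >= min_copies and (best is None or copies * k > best[2] * len(best[1])):
                best = (i, unit, copies)
        if best:
            hits.append(best)
            i += best[2] * len(best[1])  # skip past the reported run
        else:
            i += 1
    return hits
```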

Remote Visualization of Local Genome Alignments Aids Pathogenic Variant Evaluation for Rare Disease

CHICAGO – A group at Spain’s National Center for Genomic Analysis-Center for Genomic Regulation (CNAG-CRG) in Barcelona has harnessed a protocol for accessing sequencing and variant data to help assess potentially pathogenic genetic variants within the context of a European Union-funded program to improve diagnosis of rare diseases. The CNAG-CRG…

Continue Reading Remote Visualization of Local Genome Alignments Aids Pathogenic Variant Evaluation for Rare Disease

find tandem repeats in DNA from CRAM/VCF file

I want to find tandem repeats in DNA. I have access to CRAM file and the VCF file. I initially tried to get the insertions from the VCF file, but I am not sure if the variant caller has included all…

Continue Reading find tandem repeats in DNA from CRAM/VCF file

Standards, Regulation, Funding Move Bioinformatics in 2022, But Hurdles to Precision Medicine Remain

CHICAGO – Although the US Food and Drug Administration (FDA) provided some long-sought clarity in 2022 on how it would regulate clinical decision support and in vitro diagnostic software, technology developers and healthcare organizations still struggled with how to integrate genomics data into clinical practice. It will likely take more…

Continue Reading Standards, Regulation, Funding Move Bioinformatics in 2022, But Hurdles to Precision Medicine Remain

Compressing BAM, SAM, CRAM | Genozip

How good is Genozip at compressing BAM files? See Benchmarks. Compressing a BAM, SAM or CRAM file: In the rest of this page we will give examples of BAM files. Genozip is also capable of compressing SAM files, and with some limitations, CRAM files as well. …

Continue Reading Compressing BAM, SAM, CRAM | Genozip

Getting information on CRAM files from headers inside the files

Hello. I wish to know if one can find the following information in CRAM files’ headers: 1) Whether or not sequencing data in CRAM files is from WGS or WES, and if so, where? and 2) In case one…

Continue Reading Getting information on CRAM files from headers inside the files
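There is no standard header field for WGS versus WES, so any answer depends on what the file's producer wrote. Here is a Python sketch that parses `@RG` lines from `samtools view -H` output, so fields such as `DS`, `LB` and `PL` can be inspected for hints.

```python
from typing import Dict, List


def read_groups(header_text: str) -> List[Dict[str, str]]:
    """Parse @RG lines into dicts of their TAG:VALUE fields (ID, SM, LB, PL, DS, ...)."""
    groups = []
    for line in header_text.splitlines():
        if line.startswith("@RG"):
            groups.append(dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f))
    return groups
```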

Samtools Convert Sam To Bam With Code Examples

In this session, we’ll solve the problem of converting SAM to BAM with samtools. The command that follows illustrates this (in current samtools releases the -S flag is ignored, since the input format is detected automatically). # Basic syntax: samtools view -S -b sam_file.sam > bam_file.bam # Where:…

Continue Reading Samtools Convert Sam To Bam With Code Examples

Index of /~psgendb/doc/pkg/samtools-1.7/htslib-1.7/cram

Name | Last modified | Size
Parent Directory | | –
cram.h | 2015-06-24 11:00 | 2.4K
cram_codecs.c | 2017-09-26 09:28 | 50K
cram_codecs.h | 2016-03-17 07:48 | 6.0K
cram_codecs.o | 2018-03-04 16:57 | 175K
cram_decode.c | 2018-01-26 05:33 | 84K
cram_decode.h | 2013-10-16 06:15 | 3.4K
cram_decode.o | 2018-03-04 16:57 | 236K
cram_encode.c | 2017-07-03 16:45 | 87K
…

Continue Reading Index of /~psgendb/doc/pkg/samtools-1.7/htslib-1.7/cram

CNV Pipeline Options

The following are the top-level options that are shared with the DRAGEN Host Software to control the CNV pipeline. You can input a BAM or CRAM file into the CNV pipeline. If you are using the DRAGEN mapper and aligner, you can use FASTQ files. …

Continue Reading CNV Pipeline Options

How to trim the length of reads in a CRAM file?

I have a CRAM file with paired reads which looks like this:
im13@node-13-21:~/scratch_im13_projects/im13_basespace_runs$ samtools view ./walkup_194_repeat/CRAM/A01_FR_KAPA_25x_1ug_SR_1ngx4rxns_S1.cram | head
D00586:937:HVCWGBCX3:1:1101:1485:1803 77 * 0 0 * * 0 0 NCAGAGGAAGCGGAACGCATGTTTC #<GGGIIGIGGGIIGIGIIGGG.<<
D00586:937:HVCWGBCX3:1:1101:1485:1803 141 * 0 0 * * 0 0…
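
The two records shown have FLAG 77 (1 + 4 + 8 + 64: paired, read unmapped, mate unmapped, first in pair) and 141 (1 + 4 + 8 + 128: the same, second in pair), with `*` as CIGAR. For unmapped records like these, trimming only has to shorten SEQ and QUAL. The minimal sketch below assumes SAM text such as the output of `samtools view -h`; the function name and the choice to reject mapped reads are assumptions of this sketch. Mapped reads would also need CIGAR and POS updated, and any tags holding per-base data would need adjusting too.

```python
def trim_unmapped_sam_line(line: str, max_len: int) -> str:
    """Trim SEQ and QUAL of an unmapped SAM record to at most max_len bases.

    Hypothetical helper: header lines pass through unchanged, and mapped
    records are rejected because their CIGAR and POS would also need editing.
    The trailing newline, if any, is dropped.
    """
    line = line.rstrip("\n")
    if line.startswith("@"):
        return line
    fields = line.split("\t")
    flag = int(fields[1])
    if not flag & 0x4:  # 0x4 = segment unmapped
        raise ValueError("mapped record: CIGAR and POS would need updating too")
    # Columns 10 and 11 (0-based 9 and 10) are SEQ and QUAL; "*" means absent.
    for i in (9, 10):
        if fields[i] != "*":
            fields[i] = fields[i][:max_len]
    return "\t".join(fields)
```

One could stream `samtools view -h` output through such a filter line by line and pipe the result back into `samtools view` to re-encode it.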

Continue Reading How to trim the length of reads in a CRAM file?

Index of /~psgendb/birchhomedir/public_html/doc/pkg/samtools-1.7/htslib-1.7/htslib

Name | Last modified | Size
Parent Directory | | –
bgzf.h | 2018-01-10 07:45 | 14K
cram.h | 2015-09-25 05:36 | 15K
faidx.h | 2017-02-07 11:06 | 5.6K
hfile.h | 2018-01-26 05:33 | 9.6K
hts.h | 2017-11-24 09:46 | 29K
hts_defs.h | 2017-08-10 11:07 | 3.3K
hts_endian.h | 2017-09-27 10:40 | 11K
hts_log.h | 2017-06-03 15:45 | 3.8K
…

Continue Reading Index of /~psgendb/birchhomedir/public_html/doc/pkg/samtools-1.7/htslib-1.7/htslib

How To Install libhts-dev on Kali Linux

In this tutorial we learn how to install libhts-dev on Kali Linux. libhts-dev provides the development files for HTSlib. What is libhts-dev? HTSlib is an implementation of a unified C library for accessing common file formats, such…

Continue Reading How To Install libhts-dev on Kali Linux

Samtools Htslib Issues

Issue Title | State | Comments | Created Date | Updated Date
How to get a specific chromosome | open | 1 | 2022-07-14 | 2022-07-18
tabix returns row from VCF file multiple times | open | 4 | 2022-07-11 | 2022-07-18
Modified base parsing failure | closed | 0 | 2022-07-01 | 2022-07-18
extract genotype information | open | 1 | 2022-06-24 | 2022-07-18
sam_hdr_remove_lines is inefficient if…

Continue Reading Samtools Htslib Issues

Ubuntu Manpage: alleleCounts.pl – Generate tab separated file with allelic counts and depth for each

Provided by: liballelecount-perl_4.2.1-1_all NAME alleleCounts.pl – Generate tab separated file with allelic counts and depth for each specified locus. SYNOPSIS Where possible use the C version for large data (it’s also more configurable). alleleCounts.pl Required: -bam -b BAM/CRAM file (expects co-located index) – if CRAM see ‘-ref’ -output -o Output…

Continue Reading Ubuntu Manpage: alleleCounts.pl – Generate tab separated file with allelic counts and depth for each

Ubuntu Manpage: bamfillquery – fill query sequences into BAM files

Provided by: biobambam2_2.0.179+ds-1_amd64 NAME bamfillquery – fill query sequences into BAM files SYNOPSIS bamfillquery [options] <in.bam queries.fasta >out.bam DESCRIPTION bamfillquery reads a SAM/BAM/CRAM file and a FastA file, copies the sequences found in the FastA file into the query sequence field of the SAM/BAM/CRAM file and writes the resulting data…
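
To illustrate what filling the query sequence means at the SAM level, here is a minimal sketch that reads a FASTA file into a dictionary and copies each sequence into the SEQ column of the matching record. It is not how bamfillquery is implemented. It assumes each read name matches exactly one FASTA entry (the first word of its header), that SEQ is `*` in the input, and that reads are not hard-clipped; paired mates sharing a name would receive the same sequence. Because SAM stores SEQ on the forward reference strand, records with FLAG bit 0x10 receive the reverse complement. All helper names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map the first word of each FASTA header to its joined sequence."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None and line:
            chunks[name].append(line)
    return {k: "".join(v) for k, v in chunks.items()}


def fill_query(sam_lines: Iterable[str], queries: Dict[str, str]) -> Iterator[str]:
    """Hypothetical helper: copy FASTA sequences into empty SEQ fields."""
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            yield line
            continue
        fields = line.split("\t")
        name, flag = fields[0], int(fields[1])
        if fields[9] == "*" and name in queries:
            seq = queries[name]
            # SAM stores SEQ on the forward reference strand, so reverse-strand
            # records (FLAG 0x10) get the reverse complement.
            fields[9] = reverse_complement(seq) if flag & 0x10 else seq
        yield "\t".join(fields)
```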

Continue Reading Ubuntu Manpage: bamfillquery – fill query sequences into BAM files

[SpotBugs] htsjdk.samtools.cram.structure.CramHeader defines clone() but doesn’t implement Cloneable

Cloneable is not used very much so maybe deprecate and remove the clone() method? /cc @jmthibault79, @cmnbroad See spotbugs.readthedocs.io/en/stable/bugDescriptions.html#cn-class-defines-clone-but-doesn-t-implement-cloneable-cn-implements-clone-but-not-cloneable Part of #1267 Report: In class htsjdk.samtools.cram.structure.CramHeader In method htsjdk.samtools.cram.structure.CramHeader.clone() At CramHeader.java:[lines 80-85]

Continue Reading [SpotBugs] htsjdk.samtools.cram.structure.CramHeader defines clone() but doesn’t implement Cloneable

bioconductor – Trouble installing Rhtslib in R/R studio

I’m using RStudio on Ubuntu 18 and I’m trying to install the Rhtslib package from the Bioconductor repo, but I’m stuck now. This is what I get:
* installing *source* package ‘Rhtslib’ …
** using non-staged installation via StagedInstall field
** libs
cd "htslib-1.7" && make -f "/usr/lib/R/etc/Makeconf" -f "Makefile.Rhtslib"…

Continue Reading bioconductor – Trouble installing Rhtslib in R/R studio

Read bam/cram file with IGV from aws s3

Hi all, We store our alignment files on aws s3. I would like to be able to open them with IGV without needing to download them completely, but I can’t find an optimal solution. If I get a pre-signed url it works but it’s not convenient. I try to follow…

Continue Reading Read bam/cram file with IGV from aws s3

Ubuntu Manpage: samtools reheader – replaces the header in the input file

Provided by: samtools_1.13-2_amd64 NAME samtools reheader – replaces the header in the input file SYNOPSIS samtools reheader [-iP] [-c CMD | in.header.sam ] in.bam DESCRIPTION Replace the header in in.bam with the header in in.header.sam. This command is much faster than replacing the header with a BAM→SAM→BAM conversion. By default…
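
A common use of `samtools reheader` is renaming contigs, for example `1` to `chr1`. In BAM, each record refers to its contig by index into the header's @SQ list, so rewriting the `SN:` values while keeping their order and `LN:` unchanged renames the contigs without touching the reads. A minimal sketch, assuming the old header was exported with `samtools view -H in.bam` and the result is then passed to `samtools reheader new_header.sam in.bam > out.bam`; `rename_contigs` and the example mapping are hypothetical.

```python
from typing import Dict


def rename_contigs(header: str, mapping: Dict[str, str]) -> str:
    """Hypothetical helper: rewrite SN: in @SQ lines, keeping order and LN."""
    out = []
    for line in header.splitlines():
        if line.startswith("@SQ\t"):
            fields = line.split("\t")
            for i, field in enumerate(fields):
                if field.startswith("SN:"):
                    name = field[3:]
                    # Names missing from the mapping are left unchanged.
                    fields[i] = "SN:" + mapping.get(name, name)
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```

Reordering or dropping @SQ lines this way would silently reassign reads to the wrong contigs, which is why the sketch only renames.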

Continue Reading Ubuntu Manpage: samtools reheader – replaces the header in the input file

The Biostar Herald for Tuesday, September 21, 2021

The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Istvan Albert, and was edited by Istvan…

Continue Reading The Biostar Herald for Tuesday, September 21, 2021

Bedtools: Merging Many Bed Files

I am using the algorithm CookHLA for my research. As part of its preparation, I need to feed it a bed file representing at least 100 of my samples. I have made the bed files for 500 samples using samtools and bedtools in a…
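
If what CookHLA needs is the union of regions across samples (an assumption here), the command-line route is to concatenate the per-sample BED files, sort them, and run `bedtools merge`, which combines overlapping and book-ended intervals. The same logic as a minimal in-memory sketch; the function names are hypothetical and only the first three BED columns are kept.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def read_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse the first three BED columns, skipping blank, comment and track lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or book-ended intervals after sorting by (chrom, start)."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous interval: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def merge_bed_files(paths: Iterable[str]) -> List[Interval]:
    """Hypothetical convenience wrapper: read every file, then merge once."""
    intervals: List[Interval] = []
    for path in paths:
        with open(path) as fh:
            intervals.extend(read_bed(fh))
    return merge_intervals(intervals)
```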

Continue Reading Bedtools: Merging Many Bed Files

Best Omic file compressor?

Our team has been having storage space issues; we predicted that we would not have enough disk space to store the files generated by our pipelines. Standard file compressors (gzip, bzip2, 7zip) weren’t cutting it, and I started experimenting with file-specific compressors. This is where…

Continue Reading Best Omic file compressor?

[main_samview] fail to read the header from "-".

I am attempting to run a file through an algorithm I have been using, HLA*LA. On running the samtools command within the algorithm, I have unfortunately been getting this error. After trying to debug this following other guides, I am seeking…
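
This error means samtools could not recognize a SAM, BAM, or CRAM header on its standard input (`-`). One frequent cause is that the upstream command in the pipe produced no output, or wrote something other than alignment data. A minimal sketch for checking an intermediate file saved from that pipe by its leading bytes: CRAM files start with `CRAM`, BAM files are BGZF (gzip-compatible) and decompress to `BAM\1`, and SAM with a header starts with `@`. The function name and the returned labels are hypothetical.

```python
import gzip


def sniff_alignment_format(path: str) -> str:
    """Guess SAM/BAM/CRAM from leading bytes (hypothetical helper and labels)."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    if not head:
        return "empty"
    if head == b"CRAM":
        return "CRAM"
    if head[:2] == b"\x1f\x8b":
        # gzip magic: BAM is BGZF, whose decompressed data begins with b"BAM\x01".
        with gzip.open(path, "rb") as gz:
            inner = gz.read(4)
        return "BAM" if inner == b"BAM\x01" else "gzip (not BAM)"
    if head[:1] == b"@":
        return "SAM (with header)"
    return "unknown"
```

A result of "empty" points at the upstream step rather than at samtools.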

Continue Reading [main_samview] fail to read the header from "-".

How to extract all sequences mapped to a transcript from Kallisto output

I ran Kallisto with the --pseudobam option. How do I extract all the short reads that are mapped to a single transcript (e.g. ENST00000367969.8)? As a person without any previous SAM/BAM experience, I tried the following things…
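
In a kallisto pseudobam the reference sequences are transcripts, so the transcript ID appears in the RNAME column (column 3). With a sorted and indexed BAM, `samtools view file.bam ENST00000367969.8` returns the records on that transcript. For plain SAM text, a minimal sketch that collects unique read names is shown below. De-duplication matters because paired-end mates share a read name, and a read may also be reported more than once. The function name is hypothetical.

```python
from typing import Iterable, List, Set


def reads_on_transcript(sam_lines: Iterable[str], transcript: str) -> List[str]:
    """Hypothetical helper: unique read names whose RNAME equals transcript."""
    seen: Set[str] = set()
    names: List[str] = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2 and fields[2] == transcript and fields[0] not in seen:
            seen.add(fields[0])
            names.append(fields[0])  # keep first-seen order
    return names
```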

Continue Reading How to extract all sequences mapped to a transcript from Kallisto output

install GenomicFeatures fail

Running BiocManager::install("GenomicFeatures") shows:
‘getOption("repos")’ replaces Bioconductor standard repositories, see ‘?repositories’ for details
replacement repositories:
    CRAN: mirrors.tuna.tsinghua.edu.cn/CRAN/
Bioconductor version 3.14 (BiocManager 1.30.16), R 4.1.0 (2021-05-18)
Installing package(s) ‘GenomicFeatures’
also installing the dependencies ‘Rhtslib’, ‘Rsamtools’, ‘GenomicAlignments’, ‘rtracklayer’
Packages which are only…

Continue Reading install GenomicFeatures fail

.tar.gz = same size as before?

I tried to compress 5 BAM files using:
tar -czvf original_bams.tar.gz *.bam
The resulting file sizes ("ll --block-size=M") are:
8067M file1.bam
6962M file2.bam
10662M file3.bam
7794M file4.bam
7346M file5.bam
40828M original_bams.tar.gz
There’s a difference of 3MB between the archive and the…
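
The near-zero saving is expected: BAM files are already compressed with BGZF, a blocked gzip format, so running gzip over them again finds almost nothing left to remove. The numbers in the question bear this out: the five files sum to 40,831 M against a 40,828 M archive, a saving of about 0.007%. The sketch below repeats that arithmetic and shows the same effect on a stand-in for compressed data (gzip output of random DNA text, an assumption chosen only because it is easy to generate); `recompression_ratio` is a hypothetical helper.

```python
import gzip
import random

# Sizes in MiB as reported in the question (ll --block-size=M).
bam_sizes = [8067, 6962, 10662, 7794, 7346]
archive_size = 40828

total = sum(bam_sizes)            # combined size of the BAM files
saved = total - archive_size      # what tar + gzip actually removed


def recompression_ratio(n_bases: int = 200_000, seed: int = 1) -> float:
    """Hypothetical helper: size after a second gzip pass / size after the first.

    Random DNA text stands in for data that has already been compressed once.
    """
    rng = random.Random(seed)
    raw = "".join(rng.choice("ACGT") for _ in range(n_bases)).encode()
    once = gzip.compress(raw)
    twice = gzip.compress(once)
    return len(twice) / len(once)
```

Real space savings come from format-aware approaches instead, such as converting BAM to reference-based CRAM.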

Continue Reading .tar.gz = same size as before?