Tag: BAM
UMI workflow resulting in bams with empty reads
Hello all, In my NGS workflow for UMI based reads, I first tried identifying and removing sequence adapters using bbmerge and cutadapt: BBMERGE -Xmx1g -ignorejunk in1=SAMPLE_R1 in2=SAMPLE_R2 outa= adapters.fa itn CUTADAPT -a forward_adapter -A reverse_adapter -o s_2_1_sequence_trimmed_UN.fastq.gz -p s_2_2_sequence_trimmed_UN.fastq.gz SAMPLE_R1 SAMPLE_R2 Then, I converted the trimmed fastq files to an…
A Benchmark of Genetic Variant Calling Pipelines Using Metagenomic Short-Read Sequencing
Introduction Short-read metagenomic sequencing is the technique most widely used to explore the natural habitat of millions of bacteria. In comparison with 16S rRNA sequencing, shotgun metagenomic sequencing (MGS) provides sequence information of the whole genomes, which can be used to identify different genes present in an individual bacterium and…
Bioconductor – genomation
DOI: 10.18129/B9.bioc.genomation Summary, annotation and visualization of genomic data Bioconductor version: Release (3.6) A package for summary and annotation of genomic intervals. Users can visualize and quantify genomic intervals over pre-defined functional regions, such as promoters, exons, introns, etc. The genomic intervals represent regions with a defined chromosome…
Ubuntu Manpage: samtools-quickcheck – a rapid sanity check on input files
Provided by: samtools_1.19-1_amd64 NAME samtools-quickcheck – a rapid sanity check on input files SYNOPSIS samtools quickcheck [options] in.sam|in.bam|in.cram [ … ] DESCRIPTION Quickly check that input files appear to be intact. Checks that beginning of the file contains a valid header (all formats) containing at least one target sequence and…
Remote Software Quality Engineer III – Bioinformatics Job at Natera
JOB TITLE: Software Quality Engineer III – Bioinformatics LOCATION: Remote, USA PRIMARY RESPONSIBILITIES: Perform software verification, define and execute test cases and scenarios required for software quality assurance and regulatory compliance. Perform system analysis, assess risk, and develop strong test strategies by analyzing product design and technical specifications, and by…
Ubuntu Manpage: FastQC – high throughput sequence QC analysis tool
Provided by: fastqc_0.11.9+dfsg-5_all NAME FastQC – high throughput sequence QC analysis tool SYNOPSIS fastqc seqfile1 seqfile2 .. seqfileN fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam] [-c contaminant file] seqfile1 .. seqfileN DESCRIPTION FastQC reads a set of sequence files and produces from each one a quality control report consisting of…
Insight Global hiring Bioinformatics Software Engineer in Tennessee, United States
Job Description: 1) Use Nextflow to build bioinformatics pipelines that take FASTQ or BAM files as input and process them using bioinformatic tools. 2) Write Python/R scripts to process, summarize, and visualize outputs created by other tools. 3) Ensure that the pipeline is modular and flexible, with the ability to…
sam file error
Hi, I was converting my sam file (after alignment with bowtie2) to bam format. I encountered the error: [E::sam_parse1] invalid QUAL character [W::sam_read1_sam] Parse error at line 11129453 command: samtools view -S -b -o input.bam ../alignment/input.sam Alignment works fine. This is the output: 22504890 reads; of…
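As an illustration of what the `invalid QUAL character` message refers to: in the SAM format the QUAL column is either `*` or a string of printable characters from `!` to `~`, with the same length as SEQ. The following is a minimal sketch in plain Python of a scan for records that break that rule. The function name `find_bad_qual` is hypothetical, and the sketch does not reproduce what samtools itself does internally.

```python
def find_bad_qual(sam_lines):
    """Yield (line_number, reason) for records whose QUAL field looks invalid.

    Hypothetical helper: it only checks the mandatory-field count, the
    allowed QUAL character range ('!'..'~') and SEQ/QUAL length agreement.
    """
    for lineno, line in enumerate(sam_lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and header lines (which start with '@').
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            # A truncated record is a frequent cause of parse errors.
            yield lineno, "fewer than 11 mandatory fields"
            continue
        seq, qual = fields[9], fields[10]
        if qual == "*":
            continue
        if any(not ("!" <= ch <= "~") for ch in qual):
            yield lineno, "QUAL contains a character outside '!'..'~'"
        elif seq != "*" and len(seq) != len(qual):
            yield lineno, "SEQ and QUAL lengths differ"
```

Running it over an open file handle (for example `for n, why in find_bad_qual(open("input.sam")): print(n, why)`) lists the suspicious line numbers, which can then be compared with the line number in the parse error.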
java -jar picard.jar manual | BioQueue Encyclopedia
Category Sam/Bam Manipulation Usage java -jar picard.jar SetNmMDAndUqTags I=sorted.bam O=fixed.bam \ Manual INPUT (File) The BAM or SAM file to fix. Required. OUTPUT (File) The fixed BAM or SAM output file. Required. IS_BISULFITE_SEQUENCE (Boolean) Whether the file contains bisulfite sequence (used when calculating the NM tag). Default value: false. This option can be…
Trying to understand STAR fastqLog.final.out File
Hello, I am analyzing ribo-seq data and am trying to understand if my interpretation of STAR’s log file is correct. I do not have extensive bioinformatics/computational experience, so it’s been a bit difficult trying to understand how to proceed (the guides online are…
My paired end data became single end data after mapping
Dear community, Something weird happened to me: my public dataset is obviously paired-end data (as stated in the ‘metadata’ part of the ENA database, and there are two separate fastq files (R1 & R2) and an index file (I1) per sequencing run). After…
‘Resources’ object has no attribute ‘tmpdir’
Snakemake error AttributeError: ‘Resources’ object has no attribute ‘tmpdir’. I have built a Snakemake pipeline which has been designed for paired-end reads. I ran a trial with single-end reads and got this error. I am not sure it is related to the change of reads design, and to…
Error in schicexplorer’s hicbuildmatrix
I use schicexplorer’s hicbuildmatrix code; it complains that the two sam files do not have the same read order. My best guess at the moment is that you need R1 and R2 bam files sorted by read name, not the default…
Genomic hypomethylation in cell-free DNA predicts responses to checkpoint blockade in lung and breast cancer
Lung cancer ICB cohort Advanced non-small cell lung carcinoma patients who were treated with anti-PD-1/PD-L1 monotherapy at Samsung Medical Center, Seoul, Republic of Korea were enrolled for this study. The present study has been reviewed and approved by the Institutional Review Board (IRB) of the Samsung Medical Center (IRB no….
bwa-mem reproducibility
I have a set of paired end fastq files, and I run bwa-mem (v0.7.17-r1188) on the files with the same exact parameters, including the same number of threads, in two different computing clusters. I compare the BAM files produced via samtools stats, and the outputs are different…
Enrichment profiles from counts
Hello everyone, I’m currently working with single-cell DamID datasets, focusing on studying protein-DNA interactions. In my dataset for each cell, I have aligned BAM files, counts files in HDF5 format, and count files binned at 100kb intervals. These count files contain the number of unique…
DE Jobs – UPMC Bioinformatics Scientist in Pittsburgh, Pennsylvania, United States
UPMC Presbyterian is hiring a full-time Bioinformatics Scientist to support the Molecular & Genomic Pathology Lab! This role will be scheduled for daylight shifts, Monday-Friday. The Molecular & Genomic Pathology Laboratory is a dynamic, state-of-the-art clinical laboratory that prides itself on delivering the highest quality of patient care through cutting-edge…
FeatureCounts Invalid Parameter Error
Hello! I’m trying to use featureCounts, and it keeps on giving me this error: ERROR: invalid parameter: ‘SRR11860547.bam’ I’m pretty new at using featureCounts, so I have no clue what is wrong. I’ve tried changing the directory and location of the file, but it keeps…
Variant calling using HaplotypeCaller does not show #FILTER information
Hi All, I have a question about variant calling using HaplotypeCaller. After running HaplotypeCaller, the #FILTER column in the gvcf files is supposed to show ‘PASS/LowQ’; however, in my case, the output #FILTER only shows ‘.’ without…
Conserved and divergent gene regulatory programs of the mammalian neocortex
Nucleus preparation from frozen brain tissue for Chromium single-cell multiome ATAC and gene expression analysis M1 tissue was obtained from three human donors (male, aged 42, 29 and 58 years), three macaque donors (male, aged 6 (Macaca mulatta), 6 (M. mulatta) and 14 (Macaca fascicularis) years), three marmoset (Callithrix jacchus)…
haplotypecaller – NVIDIA Docs
Run a GPU-accelerated haplotypecaller. This tool applies an accelerated GATK CollectMultipleMetrics for assessing the metrics of a BAM file, such as including alignment success, quality score distributions, GC bias, and sequencing artifacts. This functions as a ‘meta-metrics’ tool, and can run any combination of the available metrics tools in GATK…
convert VCF to gVCF
Your question is not completely clear, but since the most sensible ways to understand it have the same answer, I’m gonna go with that. I have the exact reference fasta used for generating the VCFs TLDR: You don’t have enough information to do this with just VCFs and reference fasta….
Single-cell DNA methylome and 3D multi-omic atlas of the adult mouse brain
Mouse brain tissues All experimental procedures using live animals were approved by the Salk Institute Animal Care and Use Committee under protocol number 18-00006. Adult (P56) C57BL/6J male mice were purchased from the Jackson Laboratory at 7 weeks of age and maintained in the Salk animal barrier facility on 12-h dark–light…
deeptools.plotCoverage Example
API (Occurrences) deeptools.writeBedGraph_bam_and_bw.writeBedGraph(1) deeptools.writeBedGraph.openBam(1) deeptools.writeBedGraph.bedGraphToBigWig(1) deeptools.writeBedGraph.WriteBedGraph(4) deeptools.utilities.toBytes(2) deeptools.utilities.tbitToBamChrName(2) deeptools.utilities.mungeChromosome(2) deeptools.utilities.gtfOptions(1) deeptools.utilities.getTempFileName(8) deeptools.utilities.getGC_content(3) deeptools.utilities.getCommonChrNames(4) deeptools.utilities.bam_total_reads(2) deeptools.plotProfile.main(5) deeptools.plotHeatmap.main(6) deeptools.plotCoverage.main(1) deeptools.parserCommon.heatmapperMatrixArgs(2) deeptools.parserCommon.getParentArgParse(10) deeptools.parserCommon.deepBlueOptionalArgs(1) deeptools.parserCommon.check_float_0_1(1) deeptools.multiBigwigSummary.main(4) deeptools.multiBamSummary.main(2) deeptools.mapReduce.mapReduce(8) deeptools.mapReduce.getUserRegion(3) deeptools.mapReduce.blSubtract(1) deeptools.heatmapper_utilities.plot_single(2) deeptools.heatmapper_utilities.getProfileTicks(2) deeptools.heatmapper.heatmapper(6) deeptools.getScorePerBigWigBin.getScorePerBin(1) deeptools.getScaleFactor.get_scale_factor(1) deeptools.getScaleFactor.get_num_kept_reads(2) deeptools.getFragmentAndReadSize.get_read_and_fragment_length(7) deeptools.deepBlue.isDeepBlue(3) deeptools.countReadsPerBin.CountReadsPerBin(6) deeptools.correlation.Correlation(2) deeptools.computeMatrixOperations.sortMatrix(1) deeptools.computeMatrixOperations.main(5) deeptools.computeMatrix.main(10) deeptools.bigwigCompare.main(2) deeptools.bamHandler.openBam(22) deeptools.bamCoverage.process_args(1) deeptools.bamCoverage.main(9) deeptools.bamCompare.main(7) deeptools.SES_scaleFactor.estimateScaleFactor(2) deeptools.countReadsPerBin.cr.is_proper_pair(1) deeptools.config.config.get(2) deeptools.cfg.config.get(3)
Merge overlapping paired end reads from BAM file.
Hi everyone, Using Trimmomatic and then HISAT2, I have aligned 300 RNA fastq samples (NovaSeq6000, RNA sequencing, paired-end, 150bp sequencing). I have found a percentage of overlapping paired end reads (read through) in the 300 .bam files. I found the overlaps…
Panel-based RNA fusion sequencing improves diagnostics of pediatric acute myeloid leukemia
Rasche M, Zimmermann M, Borschel L, Bourquin J, Dworzak M, Klingebiel T, et al. Successes and challenges in the treatment of pediatric acute myeloid leukemia: a retrospective analysis of the AML-BFM trials from 1987 to 2012. Leukemia. 2018;32:2167–77. Manola KN. Cytogenetics of pediatric acute…
Variant missing in WGS sample
Hi, I have processed a WGS sample including alignment (bwa-mem2), variant calling (GATK HaplotypeCaller) and annotation (ANNOVAR). In the annotated file, a variant fitting the phenotype was identified. However, on visualizing the bam in IGV, this variant was not there. What could be the…
overlapping duplicate dispersed_repeat feature in stringtie
Hi. I got the following error when I use stringtie with a repeatmasker annotation gff file and RNA-seq bam files which are already sorted with samtools. GFF Error: overlapping duplicate dispersed_repeat feature (ID=461) GFF Error: overlapping duplicate dispersed_repeat feature (ID=712) GFF Error: overlapping…
Thyroid hormone-regulated chromatin landscape and transcriptional sensitivity of the pituitary gland
Mouse genetic models The ThrbHAB allele expresses TRβ proteins (TRβ1 and TRβ2) fused to a peptide with a hemagglutinin (HAx2) tag and a site for biotinylation by prokaryotic BirA ligase, modified from a published tag30. The tag was inserted at the endogenous Thrb gene by homologous recombination in W9.5 (129/Sv)…
Snakemake rule error
I have the following rule in snakemake: rule low_coverage_contig_reads: input: bam="data/processed/bam_files/bam/{sample}_{fraction}.bam.bai", output: r1="data/processed/clean_reads/low_cov/low_cov_{sample}_{fraction}_R1.fq.gz", r2="data/processed/clean_reads/low_cov/low_cov_{sample}_{fraction}_R2.fq.gz" threads: 8 params: bam="data/processed/bam_files/bam/{sample}_{fraction}.bam" log: log1="logs/{sample}_{fraction}_low_coverage_reads.log", shell: """ (samtools coverage {params.bam} | awk 'NR > 1 && $7 < 10 {{print $1}}' | tr '\\n' ' ' | samtools view -u {params.bam}…
Help with gatk BaseRecalibrator
Hi Biostars, I am trying to do variant calling and got an error at this step. Would you please have a suggestion? Thank you so much. gatk BaseRecalibrator -I ${aligned_reads}/SRR062634_sorted_dedup_reads.bam -R ${ref} --known-sites ${known_sites} -O ${data}/recal_data.table Invalid argument ‘/recal_data.table…
The Biostar Herald for Monday, December 11, 2023
The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Istvan Albert, cmdcolin, and was edited by…
Calculate Jukes-Cantor, Kimura, Tamura-Nei etc distances from BAM files
Hi, Does anyone know if there’s an existing tool to calculate genetic distances between query and subject sequences in BAMs/SAMs? I’m reasonably sure it’s possible to identify transitions and transversions using data from the MD tag and the sequence, and…
PacBio subreads.fastq files?
I have downloaded PacBio isoseq data in subreads.fastq format from NCBI. Most of the isoseq analysis tools require input as a PacBio .bam file, which is unavailable from NCBI. I want to perform differential gene expression analysis and alternative splicing analysis. I am confused about the nature…
Insert Size For Illumina Gaiix Paired-End Library From Sam/Bam File
From the fastq data (read 1 and read 2) from the Illumina GAIIx platform (paired-end library), I created the SAM and BAM files using BWA. I got the statistics of number of uniquely-paired reads and total reads mapped to…
r – Fst calculation from VCF files
I have four vcf files, SNPs_s1.vcf, SNPs_s2.vcf, SNPs_s3.vcf, and SNPs_s4.vcf, which contain information about SNPs. These vcf files were obtained by using the following methods: the initial input files were short paired reads; I did mapping with minimap2: ./minimap2 -ax sr ref.fa read1.fq.gz read2.fq.gz > aln.sam; converted to bam file: samtools…
How to create interval list from reference fasta or dict file?
I am using the GATK pipeline on WGS data. My BAM files are aligned to GRCh38 from GENCODE. So I want to create an interval file for this GRCh38 instead of downloading it from the GATK bundle, because some of their contigs have…
Generating high-quality plant and fish reference genomes from field-collected specimens by optimizing preservation
Sample collection A total of nine species of marine fish were collected across three different sampling days (September 7th, 9th, and 12th 2022) under IACUC Animal Use Protocol S12219 (Supplementary Data 1). Six species were collected using a speargun donated by a local fisher. Fish were transported back to shore, euthanized,…
Transcript Assembly for Multiple Species Using StringTie and Orthogroup Discovery using OrthoFinder
Hi all, I am running a workflow to identify single copy orthogroups from RNAseq data including 9 species in a family of non-model organisms. All 9 species are closely related enough that they can be aligned to…
CIGAR and query sequence lengths differ
I am developing a program that softclips reads. When I run samtools view the_new_bam_created_with_my_softclipped_read.bam I get this error message MN01972:51:000H5KYKL:1:11101:10749:1220 0 chr2 208248363 60 24S77M25S * 0 0 CAAAATCACATTATTGCCAACATGACTTACTTGATCCCCATAAGCATGACGACCTATGATGATAGGTTTTACCCATCCACTCACAAGCCGGGGGATATTTTTGCAGATAATGGCTTCTCTGAAGAC AFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF[E::bam_read1] CIGAR and query sequence lengths differ for MN01972:51:000H5KYKL:1:11101:10753:13456 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFF MD:Z:126 RG:Z:230824_MN01972_0051_A000H5KYKL_RACP1_4poolv7anddirect_A NM:i:0 UQ:i:0 AS:i:126 MN01972:51:000H5KYKL:1:11101:10753:13456 0 chr2 208248339 60 111M2D13M…
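The message above is reported when the read length implied by the CIGAR string disagrees with the length of SEQ. Below is a minimal sketch of that consistency check in plain Python, assuming the SAM specification's rule that the operations M, I, S, = and X consume query bases. The function names `cigar_query_length` and `cigar_matches_seq` are hypothetical and are not part of samtools.

```python
import re

# CIGAR operations that consume bases of the query (read) sequence,
# per the SAM specification: M, I, S, = and X.
QUERY_CONSUMING = set("MIS=X")

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar: str) -> int:
    """Return the read length implied by a CIGAR string ('*' gives 0)."""
    if cigar == "*":
        return 0
    ops = _CIGAR_OP.findall(cigar)
    # Every character must belong to some <length><op> pair.
    if sum(len(n) + 1 for n, _ in ops) != len(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(n) for n, op in ops if op in QUERY_CONSUMING)


def cigar_matches_seq(cigar: str, seq: str) -> bool:
    """True when CIGAR and SEQ agree in length (or either is '*')."""
    if cigar == "*" or seq == "*":
        return True
    return cigar_query_length(cigar) == len(seq)
```

For the two CIGAR strings quoted in the post, `cigar_query_length("24S77M25S")` gives 126 and `cigar_query_length("111M2D13M")` gives 124, so a soft-clipping program has to keep SEQ and QUAL at exactly those lengths after rewriting the CIGAR.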
Comparison of DNA sequencing services
This page lists the different DNA sequencing services. 2 main types can be distinguished: Whole exome sequencing is the middle ground between these two types, where a large amount of genes are sequenced, but only those that produce meaningful differences important for practical purposes, which is only 1% of the…
sam – Discrepancy in Read Counts Between FastQ and BAM Files in Adapter-Trimmed Pipeline
In a FastQ to BAM pipeline where only adapter trimming is performed, I’ve noticed a potential discrepancy in read counts between the initial FastQ files and their resulting BAM file. Specifically, I’m seeking clarification on whether the following statement holds true: “Total number of reads in R1 and R2 FastQ…
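One thing that can make the number of records in a BAM differ from the number of FastQ reads is that aligners may emit secondary (FLAG bit 0x100) and supplementary (FLAG bit 0x800) records for the same read. The sketch below, in plain Python over SAM text lines, counts only the records with neither bit set. The function name `count_primary_records` is hypothetical, and this is an illustration of the flag logic rather than an answer to the specific pipeline in the question.

```python
SECONDARY = 0x100       # FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800   # FLAG bit: supplementary alignment


def count_primary_records(sam_lines):
    """Count SAM records that are neither secondary nor supplementary.

    Unmapped reads (bit 0x4) are still counted, because they are still
    reads that came from the FastQ input.
    """
    n = 0
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t", 2)[1])
        if flag & (SECONDARY | SUPPLEMENTARY) == 0:
            n += 1
    return n
```

If this count matches the combined R1 and R2 read count while the raw record count does not, the difference comes from extra alignment records rather than from lost or duplicated reads.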
Read count vs Depth
Hi! I have RNA-seq short-read sequencing data for 112 dengue samples. I need to know what percentage of the transcriptome is covered by our sequencing reads. I found Bedtools to be an appropriate tool for this. However, I am unable to understand two different outputs from this tool…
Very low successfully assigned alignments with feature counts
Hello everyone, I am stuck trying to analyze some single-end RNAseq data from human tissue. My issue is that the alignment with HISAT2 went very well: 94.95% overall alignment rate. However, when I use featureCounts, I get: 5.7% when I set the strandSpecific parameter to 1. 5.3% when I…
ASEReadCounter output wrong number of coverage
Hi, I am using ASEReadCounter to count the number of reads per variant in a BAM file. For some positions, it reports 1 covering read (1 refCount or 1 altCount) although no read covers those positions when checking it in…
megablast taxonomy assign in blobtools
I made a taxonomy assignment file using megablast and ran blobtools create, view, plot. However, I couldn’t get any taxonomy assignment in the plot; there is only undefined. How can I get bacterial information? $blastn -task megablast -db ${nrdb} -query scaffold$i.fa -outfmt '6 qseqid…
How To Install bedtools on Debian 11
In this tutorial we learn how to install bedtools on Debian 11. bedtools is a suite of utilities for comparing genomic features. What is bedtools: The BEDTools utilities allow one to address common genomics tasks such…
Sorted bam files are empty after sorting them from bam
Hi, I have been working with all my DNA analysis files in parallel but I got to a point where about 15 files got stuck on one step. Specifically, I notice something is wrong because the files…
Downsampling ATAC-seq BAM files
Hi all, I have two ATAC datasets sequenced on two sequencers with different read depths. I would like to downsample the one with higher read depth to match that of the other dataset as we are observing batch effects following integration. To my understanding, I…
Fetching subsets with slow5curl and samtools
The role of APOBEC3B in lung tumor evolution and targeted cancer therapy resistance
Cell line and growth assays Cell lines were grown in Roswell Park Memorial Institute-1640 medium (RPMI-1640) with 1% penicillin–streptomycin (10,000 U ml−1) and 10% FBS or in Iscove’s modified Dulbecco’s medium (IMDM) with 1% penicillin–streptomycin (10,000 U ml−1), l-glutamine (200 mM) and 10% FBS in a humidified incubator with 5% CO2 maintained at 37 °C. Drugs…
Annotation GTF/GFF Arabidopsis thaliana
Hello, this is my first time working with Arabidopsis and I am quantifying with featureCounts as follows: featureCounts -p --countReadPairs -t exon -g gene_id -a ../genome_arabidopsis/Arabidopsis_thaliana.TAIR10.57.gtf -o SRR14059988.txt ../alignment_hisat2/SRR14059988_sorted.bam However, in my counts I am getting counts associated with long non-coding, ribosomal, mitochondrial and…
Filling gaps in BAM file
Hi! I have BAM files that contain fairly large numbers of gaps due to the fact they are aDNA data. The BAM files have EOFs and look like this (only a snippet shown below): 11:57001065-57004724 SN7001204_0523_AHJLV3BCXX_R_PEdi_L5727_37_1:1:1106:9922:91821c 16 11 57000970 37 113M * 0 0…
How To Separate Illumina Based Strand Specific Rna-Seq Alignments By Strand
Today we have run into the task of having to split strand specific RNA-Seq data by strand and we had to make an effort to get it right (hopefully it is right). Maybe there is even an easier way to do it. The Illumina strand specific protocol is such that…
sequencing data from different samples in the Integrative Genome Viewer (IGV)
Greetings, I need to carry out an activity for my master’s degree in Biostatistics and Bioinformatics that consists of viewing sequencing data from different samples in the Integrative Genome Viewer (IGV) in order to analyze alignments and variants….
Bam files generated with STAR cause a segmentation fault core dump error when used with another tool
I am mapping RNA-Seq data using STAR, using multi-sample two-pass mapping. I first mapped all samples with one-pass then concatenated their SJOut files and filtered junctions. I launched the second mapping by using this SJOut file. I used this command to generate the genome: ` /home/STAR-2.7.10b/bin/Linux_x86_64/STAR \ --runThreadN 10 \…
Are 10x cellranger-arc ATAC bam files deduplicated?
I am working with some atac files from cellranger-arc v.2.0. I was wondering whether the atac_possorted_bam.bam produced as the output was deduplicated? I believe the fragment files that are generated detect duplicate reads (as represented by reads with the fifth column >=…
Common analysis of direct RNA sequencinG CUrrently leads to misidentification of m5C at GCU motifs
Introduction Oxford Nanopore Technologies (ONT) direct RNA sequencing (Fig 1A) enables detection of RNA modifications. A modified base produces an altered electrical current and/or dwell time relative to a canonical base that can be detected with algorithms (Garalde et al, 2018; Smith et al, 2019; Workman et al, 2019). Figure…
MSL2 ensures biallelic gene expression in mammals
Materials Animals All of the mice were kept in the animal facility of the Max Planck Institute of Immunobiology and Epigenetics. The mice were maintained under specific-pathogen-free conditions, with 2 to 5 mice housed in individually ventilated cages (Techniplast). The cages were equipped with bedding material, nesting material, a paper…
Issue softclipping reads when they belong and don’t belong to a common amplicon
I need to soft-clip the primers of my amplicon reads and I have the following problem. In scenario 1, I have forward and reverse reads in the same coordinates and I only want to soft-clip the first bases of each read (as shown in the example below) because the reads…
Extracting only soft/hard clipped reads from a bam file
Hello all! I am working on some data but need a little bit of help with a bit of an unusual task. We are looking at where lentiviral DNA has inserted itself in our host genome, and to do this…
BBtools bug in reporting the number of substitutions in the console output, it seems to report insanely high rates of heterozygosity
Hello, I know Brian is sometimes around, but here is my command: while read p; do callvariants.sh in=${p}.recal.bam ploidy=2 vcf=${p}.20score.vcf useidentity=f overwrite=true ref=ref.fsa -Xmx50g ; done <ID java -ea -Xmx50g -Xms50g -cp /home/alessandro/software/bbmap/current/ var2.CallVariants in=ancestor.recal.bam ploidy=2 vcf=ancestor.20score.vcf useidentity=f overwrite=true ref=ref.fsa -Xmx50g Executing var2.CallVariants [in=ancestor.recal.bam, ploidy=2, vcf=ancestor.20score.vcf, useidentity=f, overwrite=true, ref=Adineta_vaga.fsa,…
Where is the index command?
I unpacked Samtools in Ubuntu using apt install make. The directory is listed below and includes folders and files. There is no index function. I have a bam file and am trying to make a bai file, not sure what to do next?…
filtering SAM/BAM to remove hits spanning short combined alignment lengths and low counts
Hi folks, apologies if this has been answered elsewhere. I’m using read mapping to quantitate the abundance of viral metagenome assembled genomes (MAGs) across samples and I’d like to do a bit of data cleaning that’s…
Extracting chimeric reads from mapping
Hello, I am struggling to process and analyse bam files (from bwa alignment) to extract the chimeric read alignments. I am aligning human cell line RNA-seq data (paired end) to a virus, aiming to find the viral integration sites in the genome. For that, after reading a bit here from following…
Longitudinal detection of circulating tumor DNA
Analysis of Roche KAPA Target Enrichment kit experimental data obtained on an Illumina sequencing system is most frequently performed using a variety of publicly available, open-source analysis tools. The typical variant calling analysis workflow consists of sequencing read quality assessment, read filtering, mapping against the reference genome, duplicate removal, coverage…
ESRP1 controls biogenesis and function of a large abundant multiexon circRNA | Nucleic Acids Research
Abstract While the majority of circRNAs are formed from infrequent back-splicing of exons from protein coding genes, some can be produced at quite high level and in a regulated manner. We describe the regulation, biogenesis and function of circDOCK1(2–27), a large, abundant circular RNA that is highly regulated during epithelial-mesenchymal…
Viral genes not showing up in combined mouse+virus alignment
I created a combined MHV-A59 and mm10 fasta and GTF file using the Linux cat command. The last two entries of the mm10 and first two of the A59 of the combined GTF look like this: I then made a…
How to get unaligned reads and aligned reads into separate files from SAM/BAM?
I have long reads aligned with MiniMap2 in the form of SAM file. I want to get my unmapped reads into a file called unmapped.fastq.gz and my aligned reads into a file called mapped.fastq.gz. How can…
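The distinction asked about here is carried by FLAG bit 0x4 ("segment unmapped") in each SAM record. As a minimal sketch of that logic in plain Python, the hypothetical function `split_by_mapping` below separates SAM text lines into mapped and unmapped records. It does not write FASTQ, and it is an illustration rather than the recommended tool for the job.

```python
UNMAPPED = 0x4  # FLAG bit: segment unmapped


def split_by_mapping(sam_lines):
    """Return (mapped, unmapped) lists of SAM records, header lines skipped."""
    mapped, unmapped = [], []
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t", 2)[1])
        target = unmapped if flag & UNMAPPED else mapped
        target.append(line.rstrip("\n"))
    return mapped, unmapped
```

In practice the same bit can be selected with the `-f 4` (require) and `-F 4` (exclude) options of `samtools view`, and the resulting records can be converted with `samtools fastq`. Note that a mapped read may appear in several records (secondary or supplementary alignments), so those may need to be excluded before conversion.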
Senior Bioinformatics Software Engineer – Land A Remote Job From Top Employers
The Center for Applied Bioinformatics (CAB) at the St. Jude Children’s Research Hospital (SJCRH) is seeking a creative Software Engineer with a strong background in bioinformatics to join our development team to create and maintain our vital analytical infrastructure. The new hire will work closely with a team of computer…
Generate Read counts from bam file
Currently I am working on a project related to LHON disease (a rare mitochondrial disorder which leads to progressive visual loss). I have 9 RNA-seq fastq files out of which 3 are for carriers, 3 for affected and 3 for control. Data downloaded is…
bam or VCF files from GSE75010
Hi all I’m planning to run a variant calling analysis using Microarray data GSE75010 that contains GSE75010_RAW.tar and GSE75010_complete_dataset.csv.gz. I used to download the .fastq files using SRA Run numbers through Ubuntu/Linux to get .bam and VCF files. However, this is not the…
low rate of ‘Successfully assigned alignments’
Hello everybody. I’m a newbie in RNA-seq Analysis, and I have this situation that I don’t really understand. While working with featureCounts for RNA-seq read quantification, I came across an intriguing issue. The rate of successfully assigned alignments turned out to be unexpectedly low, totalling just 15463270 (7.6%). This was…
I made an error when using metawrap to binning
My code: metawrap binning -o bin_out -t 24 -m 200 -a all_contig/all_merge.fasta --metabat2 --maxbin2 --concoct all_fastq/*fastq The error reported is as follows: sorting the SRR10492802 alignment file [bam_sort_core] merging from 24 files and 24 in-memory blocks… [E::sam_hdr_sanitise] Malformed SAM header at line…
Visualize and explore eventalign data against reference
Hi all, Is anyone aware of a tool (GUI or python/R/bash package) to explore eventalign nanopore data (or fast5 raw data) with the corresponding alignment to a reference genome? Kind of like viewing how reads in a bam file align against a…
Running STAR on fastq file generated from a RNA-seq experiment
Hi, I am new to bioinformatics, especially on the command line. I am trying to run STAR alignment on pairs of fastq.gz files from several samples generated as part of an RNAseq experiment. My goal is to perform splice…
H101 for cervical cancer | DDDT
Introduction Patients with persistent, recurrent, or metastatic (P/R/M) cervical carcinoma respond poorly to treatment despite the best available therapeutic regimens, with a 5-year survival of 17%.1 Most of them are heavily pretreated with chemotherapy and/or radiotherapy, and many patients experience complications related to treatment or advanced disease, which exclude them…
HTseq reports missing attribute name
Hello, I am running this htseq command htseq-count -r pos -t gene -i gene -s yes -f bam \ /Volumes/cachannel/ZebraFinchBrain/CB-4a_genomemapping/sorted_alignmentcb4a.bam \ /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/Taeniopygia_guttata.bTaeGut1_v1.p.110.chr.gff3 > \ /Volumes/cachannel/ZebraFinchBrain/HTSEQ_withautomate/output_counts.txt However I get this error: Error processing GFF file (line 75 of file /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/Taeniopygia_guttata.bTaeGut1_v1.p.110.chr.gff3): Feature gene:ENSTGUG00000013637 does not contain…
Best practices for unstranded sequences in featureCounts
Hi everyone, I’m using featureCounts to analyze some RNA-Seq data, but I have several questions about using it with unstranded libraries. First, when I analyze SRA sequences or when I don’t know the library type, I use Salmon to infer it with the following command: salmon quant -p 32…
fragments.tsv.gz file in ATAC seq
Hi all, I looked at some ATAC-seq tutorials. They use fragments.tsv.gz at the beginning of the analysis. For my ATAC-seq data, I have fastq, bam and bw files but no fragments file. So the fragments files will be created from fastq files,…
Python Tools for Genomic Data Analysis: From Sequences to Structures | by Bao Tram Duong | Nov, 2023
Analyzing genomic data, from sequences to structures, is a critical aspect of bioinformatics. Python has a rich ecosystem of tools and libraries specifically designed for genomic data analysis. Here’s an overview of key tools and libraries for various stages of genomic data analysis: Description: Biopython is a comprehensive open-source collection…
subset a bam file
I possess numerous sorted BAM files; however, for my project, I am required to randomly select a subset of reads (1e5) from them. I have explored the option of converting a pysam object to a list, but encountered issues with substantial memory usage and slow…
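A common way to avoid the memory cost described here is reservoir sampling, which draws a uniform random sample of k items from a stream in a single pass while holding only k items in memory. The sketch below is a generic illustration and not the solution from the thread. The function name `reservoir_sample` is hypothetical, and `records` can be any iterable, for example the read iterator of an open alignment file.

```python
import random


def reservoir_sample(records, k, seed=None):
    """Return k records chosen uniformly at random from an iterable.

    Only k records are kept in memory at any time (Algorithm R).
    If the stream holds fewer than k records, all of them are returned.
    """
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Record i replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = record
    return sample
```

The returned records are not in their original order, so a sample taken from a coordinate-sorted BAM would need to be sorted again before it is indexed.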
issue in RNA -seq analysis
Hello all. I am working on RNA-seq analysis and would like to know the following: first, I downloaded the genome fasta file for non-coding RNA from Ensembl and got the GTF file for hg38 from there as well. I ran HISAT2 and got 17% alignment for…
Functional filter for whole-genome sequencing data identifies HHT and stress-associated non-coding SMAD4 polyadenylation site variants >5 kb from coding DNA
Summary Despite whole-genome sequencing (WGS), many cases of single-gene disorders remain unsolved, impeding diagnosis and preventative care for people whose disease-causing variants escape detection. Since early WGS data analytic steps prioritize protein-coding sequences, to simultaneously prioritize variants in non-coding regions rich in transcribed and critical regulatory sequences, we developed GROFFFY,…
Chromatin priming elements direct tissue-specific gene activity before hematopoietic specification
Introduction The development of multicellular organisms requires the activation of different gene batteries which specify the identity of each individual cell type. Such shifts in cellular identity are driven by shifts in the gene regulatory network (GRN) consisting of transcription factors (TFs) binding to the enhancers and promoters of their…
Use of IDR after running MACS3 for ATAC-seq data
Cross-posted from github (a little worried about the inactivity there; please let me know if this is not good practice) I am analyzing ATAC-seq data from human cells. I am planning to perform the following as part of my pipeline:…
TF Footprinting using HINT ATAC module from RGT
I was successful in running the first three commands of HINT ATAC as shown below following this tutorial github.com/sufyazi/sufyazi.github.io/wiki/TF-Footprinting-Tutorial-using-HINT-ATAC-module-from-RGT-toolbox rgt-hint footprinting --atac-seq --paired-end --organism=mm10 --output-location=/XXX --output-prefix=Afootprints A.mRp.clN.sorted.bam A.mRp.clN_peaks.narrowPeak rgt-hint footprinting --atac-seq --paired-end --organism=mm10 --output-location=/XXX --output-prefix=Afootprints B.mRp.clN.sorted.bam B.mRp.clN_peaks.narrowPeak rgt-motifanalysis matching --organism=mm10 --input-files Afootprints.bed Bfootprints.bed But when I ran the command below…
Phenotypic drug-susceptibility profiles and genetic analysis based on whole-genome sequencing of Mycobacterium avium complex isolates in Thailand
Abstract Mycobacterium avium complex (MAC) infections are a significant clinical challenge. Determining drug-susceptibility profiles and the genetic basis of drug resistance is crucial for guiding effective treatment strategies. This study aimed to determine the drug-susceptibility profiles of MAC clinical isolates and to investigate the genetic basis conferring drug resistance using…
Find all locations of aligned reads with MQ (Mapping Quality) = 0.
I have white reads in a BAM file with MQ = 0. I know this means that the reads align to multiple locations. I know the location of one alignment, but can I find the other location(s) in IGV? If…
did pilon improve my genome?
My question is whether my sequence has actually improved. refseq is the reference sequence, polished is the output from pilon, and contigs.fasta is the file generated by SPAdes. Please help. I used the command java -Xmx2048m -jar pilon-1.24.jar --genome refseq.fasta --frags sorted bam.bam --output…
Salmon (or other pseudo-mappers) for multi-species RNAseq read filtering
Hello all, Background: I’ve inherited a new RNAseq data set and am thinking about updating my approaches (last time I did this I was using HISAT and Cuffdiff). I’d like some opinions on best strategies to disentangle/filter out parasite microbe reads from infected host reads before performing a differential gene…
Problem aligning target capture sequencing of a few hundred regions to the human reference genome
Hello, We have been trying to develop a target capture sequencing of around one hundred regions in the human genome. The probes capture a region of approximately 500bp around some genetic variants of interest. Our bioinformatics pipeline uses bwa-mem for aligning the captured reads to a custom genome reference. These…
A single pseudouridine on rRNA regulates ribosome structure and function in the mammalian parasite Trypanosoma brucei
Cell growth and transfections Procyclic form (PCF) T. brucei, strain 29-1354, which carries integrated genes for the T7 polymerase and the tetracycline repressor, was grown in SDM-79 medium supplemented with 10% fetal calf serum, in the presence of 50 μg/ml hygromycin. Cells were grown in the presence of 15 μg/ml G418 for…
Filter out ALT contigs from CRAM
Dear community members, I got a CRAM aligned to a very customised reference with weird (not even “canonical” alt) contigs. They are not covered except several accidental reads and I can safely filter them out. Is there a way to do it for…
Intrinsic deletion at 10q23.31, including the PTEN gene locus, is aggravated upon CRISPR-Cas9-mediated genome engineering in HAP1 cells mimicking cancer profiles
Introduction The CRISPR-Cas system is a widely used genome engineering technology because of its simple programmability, versatile scalability, and targeting efficiency (Wang & Doudna, 2023). Although researchers are rapidly developing CRISPR-Cas9 tools, the biggest challenge remains to overcome undesired on- and off-targeting outcomes. Previous studies have reported unintended genomic alterations,…
The Biostar Herald for Monday, November 20, 2023
The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Istvan Albert, and was edited by Istvan…
STAR alignment speed
Hello, I am trying to align RNA sequencing data from the NCBI SRA database to the Apis mellifera genome with STAR. The alignment worked fine. However, the mapping step of the alignment seems to be a bit slow. Furthermore, increasing the number of available threads does…
LncRNA INHEG promotes glioma stem cell maintenance and tumorigenicity through regulating rRNA 2’-O-methylation
Ethics statement All mice procedures in this study were performed under an animal protocol approved by the Institutional Animal Care and Use Committee guidelines of Westlake University. The procedures and protocols for glioma patients were approved by the institutional review board of Beijing Tiantan Hospital. Informed consent was obtained from…
BaseRecalibrator takes forever to run. Any suggestions?
Hello, I am trying to run the BaseRecalibrator tool from the GATK package and it takes forever (more than 4 days per BAM file). The command I’m using is: gatk BaseRecalibrator -I NG-01_1_S1_dedup_bwa.bam -R /rumi/shams/genomes/hg38/hg38.fa --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --known-sites 1000G_phase1.snps.high_confidence.hg38.vcf.gz --known-sites Homo_sapiens_assembly38.dbsnp138.vcf -O NG-01_1_S1_dedup_bwa_BSQR.table…
Qualimap bamqc v2.2.2 Cannot invoke “org.bioinfo.ngs.qc.qualimap.beans.XYVector.getXVector()” because “this.data” is null
That solves the problem! It is panel data, so I thought it would be expected not to have regions outside the given intervals. Maybe just for completeness, what are the implications of not supplying a --feature-file? Otherwise, the problem is resolved: QualiMap v.2.2.2-dev Built on 2019-11-11 14:05 Selected…