Tag: BAM

Possible bugs in Rsubread/stand-alone featureCounts options fracOverlap and largestOverlap with fractional counts

Hi, running Rsubread 2.8.2/2.12.0 or featureCounts 2.0.3/2.0.1, I stumbled over two issues when allowing ambiguous read assignment (-O/allowMultiOverlap): 1) regarding assignment via minimum fractional overlap (--fracOverlap) using the featureCounts stand-alone binary; 2) when combined with --/largestOverlap and --/fraction using the Rsubread featureCounts function or the stand-alone binary. To 1): Assume a read…

Datasets | TogoVar

Variant frequencies for which you can apply for use of individual-level data (*1) to the NBDC human databases (*2). Click the links at the Included controlled-access datasets to apply for use of individual-level data. *1: fastq/bam/cel files and/or lists of genotype data etc. *2: Japanese Genotype-phenotype Archive (JGA) / AMED Genome group sharing Database (AGD)…

Scatter Gather principle by chromosome on Gatk

Hi all, On a quest to optimize my GATK pipeline, I came across the scatter-gather principle, so I did the following: pids= for chr in chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20…

Bioinformatics Analyst II (Remote) Position In North Chicago, IL

Job Description To discuss more about this job opportunity, please reach out to Chitrank Rastogi (LinkedIn URL – www.linkedin.com/in/chitrank-rastogi-55119a102/), email your updated resume at chitrank.rastogi@collabera.com or give me a call at (425) 523-1648. Thank you! Job Description:Job Roles & Responsibilities: We have an exciting contract opportunity for a Bioinformatics analyst…

How can I visualize the output of BAM metrics files generated by samtools stats?

I have a few dozen BAM files and I used samtools stats to generate .txt files containing the outputted BAM metrics for each of my BAMs. I would now like to visualize this output to…

normalization of ChIP-seq data by using the spike-ins or by using total library sizes

Dear all, This question may have been asked before, I have searched the mailing list and I can not find an answer. The question is about the correct way of setting the SizeFactors() in DESeq2 in 3 situations. I would like to double check with you. Although the R code…

Rsubread featureCounts outputs dozens of temp files, no counts

Hello, I am having trouble getting an output file in Rsubread using featureCounts. I want to set up my data to run differential expression analysis in edgeR. I’m running about 40…

smallRNA profiling using HTSeq error

Hello, I want to create a “count” file using HTSeq. I have both a BAM file and a GTF file: htseq-count -f bam -s no -i AK1a_clean_Aligned.sortedByCoord.out.bam gencode.v42.chr_patch_hapl_scaff.annotation.gtf >> AK1a_counts.txt It still gives an error: htseq-count: error: the following arguments are required: featuresfilename Can someone please…

ANGSD | FSU Research Computing Center

Introduction ANGSD is a software for analyzing next generation sequencing data intended for use with mapped reads to imputed genotype probabilities. ANGSD can work with BAM files but is not meant for manipulating them. SAMTools is best for that. ANGSD is ideal for use with low to medium depth genomic…

Samtools Convert Sam To Bam With Code Examples

In this session, we’ll try our hand at solving the Samtools Convert Sam To Bam puzzle by using the computer language. The code that follows serves to illustrate this point. # Basic syntax: samtools view -S -b sam_file.sam > bam_file.bam # Where:…

Phenotypic plasticity and genetic control in colorectal cancer evolution

Sample preparation and sequencing The method of sample collection and processing is described in a companion article (ref. 23). Sequencing and basic bioinformatic processing of DNA-, RNA- and ATAC-seq data are included there as well. Gene expression normalization and filtering The number of non-ribosomal protein-coding genes on the 23 canonical chromosome pairs…

Converting Bam file to Fasta (Zipped)

I would like to convert .bam files to fq.gz (gzipped FASTQ files) for paired reads. bedtools bamtofastq seems to be a commonly recommended method; I have also seen samtools fastq as a possible alternative. bedtools bamtofastq -i inputfile.bam -fq outputR1.fq -fq2 outputR2.fq samtools…

Detecting de novo SNV with vcftools

Hi, all. I have raw whole-genome sequence data for a fish trio: father, mother and offspring. I would like to know how many SNV loci there are in the child but not in the parents (i.e. de novo SNV…

Bedtools Bam To Bed With Code Examples

With this article, we’ll look at some examples of how to address the Bedtools Bam To Bed problem. bedtools bamtobed [OPTIONS] -i <BAM> As we have seen, a large number of examples were utilised in order to solve the Bedtools Bam To…

How do I get separate ADT / CITE-seq fastq’s from single SRA / BAM files? (originally generated from cellranger)

Hello all. I am trying to pre-process some single cell RNA and ADT (Totalseq-C) data from a GEO SRA, but having some issues getting separate fastq’s for the “CITE-seq” (ADT)…

Job – Principal Biostatistician/Bioinformatics job at Kenya Medical Research

Vacancy title: Principal Biostatistician/Bioinformatics [Type: FULL TIME, Industry: Research, Category: Research] Jobs at: Kenya Medical Research – KEMRI Deadline of this Job: 06 October 2022 Duty Station: Within Kenya, Kisumu, East Africa Summary Date Posted: Tuesday, September 20, 2022, Base Salary: Not Disclosed…

Comment: How to resolve a ValueError: Multiple 'HD' Lines are not permitted when I run Ci

I tried your suggestion **samtools view -H qname_unknown_circle.bam** and the output result is like this: yu@root:~$ samtools view -H qname_unknown_circle.bam @HD VN:1.5 SO:queryname @SQ SN:chr1 LN:248956422 @SQ SN:chr10 LN:133797422 @SQ SN:chr11 LN:135086622 ………(Many lines like this ‘@SQ SN:chrxx LN:xxxx’ are omitted) @SQ SN:chrY_KI270740v1_random LN:37240 @HD VN:1.5 SO:unsorted GO:query @PG ID:bwa…

CNV Pipeline Options

The following are the top-level options that are shared with the DRAGEN Host Software to control the CNV pipeline. You can input a BAM or CRAM file into the CNV pipeline. If you are using the DRAGEN mapper and aligner, you can use FASTQ files. …

Bioinformatics Scientist in Pittsburgh, PA

Description Purpose:The scientist works independently using a robust math toolbox to discover solutions for a diverse portfolio of interesting and challenging problems. The scientist develops, implements, and monitors advanced analytic, medical informatics, and predictive modeling tools for health care programs at the UPMC. The scientist normally works Monday through Friday…

Filteration of uniquely mapped reads

Hello, I have a BAM-full file with reads mapped to a “human and mouse” chromosome file. Now I would like to extract reads mapped only to “mouse” (meaning not mapped to human chromosomes). This is the protocol I am using: From BAM-full, extract reads mapped…

Understanding bam tracks

Sorry, I am having trouble understanding this concept. For example, when I view a BAM file after alignment in IGV, I see that there are different tracks that form. How are these tracks formed/why do some aligned sequences belong together or are part of the same…

A7993 – YFull YTree Info

R-A7993 – YFull YTree Info SNPs currently defining R-A7993 A7993     Sample ID Country / Language Info Ref File Testing company Statistics Status YF063745 —— R-A7993 R-A7993*, R-FGC59783* Hg38 .BAM FTDNA (Y700) 30X, 18.6 Mbp, 151 bp YF015291 Germany (Rheinland-Pfalz) R-A7993 R-A7993*, R-FGC59783* Hg38 .BAM FTDNA (Y500) 28X, 12.1 Mbp,…

BLAST unmapped reads from BAM

Hey, I have sequenced a recombinant plasmid and I am trying to characterize the insert sequences that have not aligned to the reference. I have generated a BAM file with the unmapped sequences and now I am trying to BLAST these in various databases…

Live-seq enables temporal transcriptomic recording of single cells

Biological materials RAW264.7, 293T and HeLa cells were obtained from ATCC. RAW264.7 cells with Tnf-mCherry reporter and relA-GFP fusion protein (RAW-G9 clone) were kindly provided by I.D.C. Fraser (National Institutes of Health). The IBA cell line derived from the stromal vascular fraction of interscapular brown adipose tissue of young male…

using gatk haplotypecaller for variants extraction

Hi, I have RNA-seq data from COVID patients. I am using hisat2 to align the reads to the reference, so the resulting BAM files, after indexing, are now ready. I would like to use GATK HaplotypeCaller to extract variants from my BAM files. First,…

mapping – STAR error in snakemake pipeline: “EXITING because of FATAL ERROR: could not open genome file”

I’m trying to use a 2 pass STAR mapping strategy (also explained here informatics.fas.harvard.edu/rsem-example-on-odyssey.html), but I’m getting an error. I’ve read through this page [https://github.com/alexdobin/STAR/issues/181] and I have a similar issue, but the discussed solutions don’t seem to help. Perhaps this is more a snakemake issue rather than a STAR…

Bootcamp02_04_FileFormats.pptx – Bioinformatics File Formats WV-INBRE Bioinformatics Bootcamp 2022 Marshall University Joan C. Edwards School of

Bioinformatics file formats • There are many file formats defined by bioinformaticians • fasta files: define and name sequences • can be DNA, RNA, or protein sequences • fastq files: sequences, typically generated by a sequencer, with “quality” information associated with each sequence • sam/bam files: “sequence alignment mapping” define the results of aligning a set of sequences to another set of…

Genomic architecture of adaptive radiation and hybridization in Alpine whitefish

Sampling the radiation To understand the phylogenetic relationships between Alpine whitefish, we carried out whole-genome resequencing on 96 previously collected whitefish (with associated phenotypic measurements including standard length and gill-raker counts; collected in accordance with permits issued by the cantons of Zurich (ZH128/15), Bern (BE68/15), and Lucerne (LU04/14); these fish…

How To Install libhts-dev on Kali Linux

In this tutorial we learn how to install libhts-dev on Kali Linux. libhts-dev is development files for the HTSlib. What is libhts-dev: HTSlib is an implementation of a unified C library for accessing common file formats, such…

Freebayes-parallel with large bam file – individual threads running for >6 days

Context: I’m trying to call variants on a sequencing project using pooled genotyping-by-sequencing. Pools consist of 94 samples each, alongside a number of individuals. Sequence data was demultiplexed and then aligned to a reference genome using hisat2, and the resultant bams were merged with samtools merge. The problem bam is…

Samtools Htslib Issues

Issue Title | State | Comments | Created Date | Updated Date
How to get a specific chromosome | open | 1 | 2022-07-14 | 2022-07-18
tabix returns row from VCF file multiple times | open | 4 | 2022-07-11 | 2022-07-18
Modified base parsing failure failure | closed | 0 | 2022-07-01 | 2022-07-18
extract genotype information | open | 1 | 2022-06-24 | 2022-07-18
sam_hdr_remove_lines is inefficient if…

Senior Scientist Applied Bioinformatics Job In San Francisco, CA 94103| TechCareers

At Bristol Myers Squibb, we are inspired by a single vision – transforming patients’ lives through science. In oncology, hematology, immunology and cardiovascular disease – and one of the most diverse and promising pipelines in the industry – each of our passionate colleagues contribute to innovations that drive meaningful change….

BWA alignment/Samtools; Fail to read the header

Hello, I have an issue with my alignment. This is an error in my log file: fail to read the header from "-". Here is my script: bwa mem -t 8 -R "@RG\tID:$2\tSM:$3" ~/scratch/pt6/pt6.fa ${1}_1.fastq.gz ${1}_2.fastq.gz 2>log.bwa_new.$1 |samtools view -S -h -b…

PeerJ expertRxiv – Postdoctoral Associate in Bioinformatics

Job description Location: Boca Raton, Florida Job Description: The College of Medicine of Florida Atlantic University, the 5th public university in Florida, is seeking a Bioinformatics Postdoctoral Associate with experience in bioinformatics pipeline development and genomics data analysis for a Bioinformatics and Computational Genomics laboratory which focuses on high-throughput…

Hisat2 – stringtie – deseq2 pipeline for bulk RNA seq

Official software websites: HISAT2: Manual | HISAT2; StringTie: StringTie. Article: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown | Nature Protocols. Recommended step-by-step tutorials: 1. RNA-seq: Hisat2+Stringtie+DESeq2 – Hengnuo Xinzhi 2. RNA-seq analysis using hisat2, stringtie, DESeq2 – Simple books. Basic usage…

Differing number of reads for read 1 and 2 in fastq’s from a subset bam file

I’m working with paired-end wgs data I downloaded from TCGA. I’m trying to extract the reads that align to a specific region, extract only those reads to two fastq files, one for each pair. Unfortunately, I am getting a different number of reads in both fastq files because some of…

Rsubread featurecounts

Hi there, I seem to be getting this error when reading in a BAM file which was generated by pbmm2 align on PacBio data. I have tried to google the error message but there are no results. I wonder if anyone has ideas on what the error…

Ubuntu Manpage: alleleCounts.pl – Generate tab seperated file with allelic counts and depth for each

Provided by: liballelecount-perl_4.2.1-1_all NAME alleleCounts.pl – Generate tab seperated file with allelic counts and depth for each specified locus. SYNOPSIS Where possible use the C version for large data (it’s also more configurable). alleleCounts.pl Required: -bam -b BAM/CRAM file (expects co-located index) – if CRAM see ‘-ref’ -output -o Output…

Getting the best of RNA-Seq

This is not a banal discussion. I am facing some problems with the analysis of DE genes in mouse. Most methods of analysis of DE genes must face two considerations or challenges. The first needs to take into consideration the existence and the different…

Ubuntu Manpage: bamfillquery – fill query sequences into BAM files

Provided by: biobambam2_2.0.179+ds-1_amd64 NAME bamfillquery – fill query sequences into BAM files SYNOPSIS bamfillquery [options] <in.bam queries.fasta >out.bam DESCRIPTION bamfillquery reads a SAM/BAM/CRAM file and a FastA file, copies the sequences found in the FastA file into the query sequence field of the SAM/BAM/CRAM file and writes the resulting data…

Ubuntu Manpage: samtools targetcut – cut fosmid regions (for fosmid pool only)

Provided by: samtools_1.13-2_amd64 NAME samtools targetcut – cut fosmid regions (for fosmid pool only) SYNOPSIS samtools targetcut [-Q minBaseQ] [-i inPenalty] [-0 em0] [-1 em1] [-2 em2] [-f ref] in.bam DESCRIPTION This command identifies target regions by examining the continuity of read depth, computes haploid consensus sequences of targets and…

Fast way to sort bam file by queryname similar to picard SortSam SORT_ORDER=queryname?

When sorting by queryname with Samtools (samtools sort -n), Samtools does a natural sort by colon-delimited subfield. On the other hand, when sorting by queryname with Picard (picard SortSam SORT_ORDER=queryname), Picard does not sort by colon-delimited subfield,…

YP5260 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status I7021 Mongolia (Bulgan) C-F15910 C-F15910*, C-Y507 Hg19 .BAM Ancient 3X, 20.2 Mbp, 40 bp NEO249 Russia (Chukotskiy avtonomnyy okrug) C-F15910* —— Hg19 .BAM Ancient 1X, 7.2 Mbp, 81 bp I11696 Mongolia (Bulgan) C-Y507 —— Hg19 .BAM Ancient 2X,…

BAM – Job openings – Data Scientist for additive manufacturing (m/f/d) in the field of business informatics, computer science, software development, bioinformatics, engineering, data management, physics, data engineering or comparable

Section S.3 – eScience To strengthen our team in the division “eScience” in Berlin-Steglitz, starting as soon as possible, we are looking for a Data Scientist for additive manufacturing (m/f/d) in the field of business informatics, computer science, software development, bioinformatics, engineering, data management, physics, data engineering or comparable Salary…

BY3 – YFull YTree Info

J-BY3 – YFull YTree Info SNPs currently defining J-BY3 BY3 / FGC15184     Sample ID Country / Language Info Ref File Testing company Statistics Status YF016315 —— J-FGC15174 J-FGC15174*, J-FGC15168*, J-FT258574 Hg38 .BAM FTDNA (Y500) 23X, 12.0 Mbp, 151 bp YF068400 Sudan (Janūb Kurdufān) J-FGC38453* —— Hg38 .BAM FTDNA (Y700)…

Allelic expression imbalance of PIK3CA mutations is frequent in breast cancer and prognostically significant

Subjects Normal breast and tumor samples were obtained with the written informed consent from donors and appropriate approval from local ethical committees, with the detailed information described in the respective original publications: normal tissue (ref. 9), METABRIC (ref. 14), TCGA (ref. 35). Differential allelic expression analysis DNA and total RNA from 64 samples of normal breast…

can gff2 reference used in htseq-count?

Dear all We are recently working with E.coli plasmid and tried to summarize the gene counts from our RNA-Seq samples. The short reads were mapped to E.coli plasmid using tophat which generated bam files accordingly. However, we were unable to obtain a gff3 version of our target plasmid genome, the…

Extract R1 and R2 from sam file generated by bowtie2

Hi everyone, how can I extract R1 and R2 from a SAM file generated by bowtie2?

YP3952 – YFull YTree Info

Q-YP3952 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF073154 Russia (Chechenskaya Respublika) / Chechen Q-YP3952* —— Hg38 .BAM FTDNA (Y700) 33X, 18.2 Mbp, 151 bp YF092378 Russia (Chechenskaya Respublika) / Chechen Q-BZ87 —— Hg38 .BAM FTDNA (Y700) 55X, 18.5 Mbp, 151…

linux merge multiple files in picard

Why not use samtools? for folder in my_bam_folders/*; do samtools merge $folder.bam $folder/*.bam done In general, samtools merge can merge all the bam files in a given directory like this: samtools merge merged.bam *.bam EDIT: If samtools isn’t an option and you have to use Picard, what about something like…

a strange pattern of repetitive summits

Problem with the output of Deeptools plotProfile: a strange pattern of repetitive summits. Hi! I am trying to plot DNA binding profiles of my ChIP-seq bw files using Deeptools plotProfile. I generated the matrix using the computeMatrix reference-point. I used some publicly available bed files as my regions of…

GeneActivity without Fragments file in Seurat for Integrating scRNA-seq and scATAC-seq

Hi all, I am new to R and Seurat, and I am following Seurat tutorials to find anchors between RNA-seq and ATAC-seq data according to: Combining the two tutorials is difficult for a cell line data set I am using for SNARE-seq Human here. I managed to run the following…

Read counts an order of magnitude higher on one chromosome

Hi, I am having an issue with a sequencing run: when demultiplexed, aligned, and filtered, each individual has 1-2 million reads, but these reads are predominantly on one chromosome. For background these are Oncorhynchus mykiss and O. clarki…

Detailed differences between sambamba and samtools

In March, my first post in the new student group, “Do false-positive mutations appear because duplicate marking is insufficient?”, described the problem that supplementary reads are not marked as duplicates by GATK MarkDuplicates. Afterwards, in response to this question, I began…

BAM file and no RNAME or POS information? : bioinformatics

Newbie here. Please, play nice. I got possession of a set of 4 .bam files that stores the exome of an individual, around 400 MB each. I used samtools to generate a 2.4 GB .sam file out of one of the .bam files, and I found it contains lines with…

How to regress out age and sex using limma removeBatchEffect

I have a protein expression data frame with a metadata data frame which includes age and sex: nph_csf_metadata = age sex bam tau 70 f 5 2 75 m 6 1 72 m 4 1 71 f 4 2…

Using featureCounts and downloading Rsubread

I am trying to perform a count per gene analysis using featureCounts in R. I have downloaded the gtf file and edited it within R to only contain the gene ID, chr, start, end, and strand,…

Parse a file of strings in python separated by newline into a json array

I don’t see where you’re actually reading from the file in the first place. You have to actually read your path_text.txt before you can format it correctly, right? with open('path_text.txt', 'r', encoding='utf-8') as myfile: content = myfile.read().splitlines() Which will give you ['/gp/oi/eu/gatk/inputs/NA12878_24RG_med.hg38.bam', '/gp/oi/eu/gatk/inputs/NA12878_24RG_small.hg38.bam'] in content. Now if you want to write this…
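The read-then-serialise step described in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `lines_to_json_array` and `convert_file` are hypothetical and do not come from the original answer, and skipping blank lines is a choice made here, not something the excerpt specifies.

```python
import json
from pathlib import Path


def lines_to_json_array(text: str) -> str:
    """Turn newline-separated strings into a JSON array (hypothetical helper)."""
    # splitlines() removes the line terminators; blank lines are skipped
    # (an assumption made for this sketch).
    items = [line.strip() for line in text.splitlines() if line.strip()]
    return json.dumps(items, indent=2)


def convert_file(src: str, dst: str) -> None:
    """Read src (one string per line) and write a JSON array to dst."""
    content = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(lines_to_json_array(content), encoding="utf-8")


if __name__ == "__main__":
    # Sample input using the two paths quoted in the excerpt.
    sample = (
        "/gp/oi/eu/gatk/inputs/NA12878_24RG_med.hg38.bam\n"
        "/gp/oi/eu/gatk/inputs/NA12878_24RG_small.hg38.bam\n"
    )
    print(lines_to_json_array(sample))
```

Calling `convert_file('path_text.txt', 'paths.json')` would apply the same conversion to a file on disk; the output filename is an assumption for illustration.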

Is it correct to use Tophat2 directly followed by Cuffquant to only align to the reference transcriptomes without wishing to assemble new transcripts?

Hi, friends. I only want to perform differential expression analysis on the annotated transcripts of my existing reference genome. I use tophat2 for alignment with --no-novel-juncs…

Z697 – YFull YTree Info

R-Z697 – YFull YTree Info SNPs currently defining R-Z697 Z697     Sample ID Country / Language Info Ref File Testing company Statistics Status YF009397 Sweden (Västra Götalands län) R-Z697* —— Hg19 .BAM FTDNA (Y500) 81X, 14.4 Mbp, 165 bp YF084333 Italy (Chieti) R-FT285492 —— Hg38 .BAM Dante Labs 14X, 23.4…

Y140591 – YFull YTree Info

R-Y140591 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF067865 Germany R-Y140591* —— Hg38 .BAM FTDNA (Y700) 52X, 18.7 Mbp, 151 bp YF076495 Germany R-FT167842 —— Hg38 .BAM FTDNA (Y700) 49X, 18.3 Mbp, 151 bp YF067633 Germany R-FT167842 —— Hg38 .BAM FTDNA…

Annotated file with gene ID (instead of gene symbol)

Hello, I am using “featureCounts” in the Rsubread package for analyzing bulk RNA-seq of Drosophila. Since there is no inbuilt annotation for Drosophila, I am using a gtf file in the homepage of…

sequencing – Interpreting ‘samtools mpileup’ output for multiple inputs

I would like to calculate sequencing coverage for a WGS project. Both long and short reads. I’ve used samtools as following: samtools mpileup -Q 1 -aa illumina_sorted.bam nanopore_sorted.bam > depth.txt Previously, when I used samtools depth instead, I only had the columns I was interested in (chromosome name / base…

CTS1346 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status HGDP01351 China, People’s Republic of O-F3607* —— Hg38 .BAM Scientific 16X, 23.6 Mbp, 151 bp YF079316 —— O-Y224790 —— Hg19 .BAM 23mofang 58X, 21.3 Mbp, 150 bp HG00583 China, People’s Republic of O-Y224790 —— Hg19 .BAM Scientific ——…

Split merged Bam file without replacement

Hi guys, I have 5 BAM files (ChIP-seq PE data sorted by position) that came from 5 different murine cortexes (mice that belong to the same group, so biological replicates); however, I have a lot of group variability. I’m thinking to merge all…

snp – Reference variant detected as altered one in bam file

I received (from manufacturer) several .bam files and I used four callers (samtools, freebayes, haplotypecaller, deepvariant) to find some sequence variants. In obtained .vcf files, I took a closer look to some calls. I found interesting, homozygous one rs477033 (C/G Ref/Alt) with flag ‘COMMON=0’ and very low MAF. I also…

A114 – YFull YTree Info

R-A114 – YFull YTree Info SNPs currently defining R-A114 FGC78244     A114(H)     H Sample ID Country / Language Info Ref File Testing company Statistics Status YF067576 France (Ille-et-Vilaine) R-A114* —— Hg19 .BAM Dante Labs 12X, 23.0 Mbp, 151 bp YF088360 United States (Virginia) R-CTS4466* —— Hg38 .BAM FTDNA (Y700)…

HTseq-Count: Long processing time

Hi everyone, I’m processing BAM files using htseq-count and it takes a very long time to produce the counts for each file. These are paired-end reads (around 50 million sequences each). It takes 75 minutes to count this pair; is that normal? Thanks. htseq-count --max-reads-in-buffer=24000000000…

The low successful assignment ratio of FeatureCounts

Hello, I would like to confirm if the low assignment ratio (54%) is normal, and please check the possible reason I found. I used Hisat2 to assign paired-end strand-specific transcriptomic sequences (rRNA removed) to a reference genome. Because I filtered out the unmapped sequences in advance, the overall assignment ratio…

long run-time and low CPU usage

Pindel: long run-time and low CPU usage. I’m trying to run Pindel on some 30x Illumina WGS data. I aligned reads with BWA-MEM, then sorted by co-ordinates and indexed them with Samtools. I also tried filtering the bam files with samtools -F 0x800 as suggested by another post. I…

Continue Reading long run-time and low CPU usage

different result using minimap2 and pbmm2

Hi all! I am analysing CCS PacBio data and each sample came from a different run; in particular, I have three files for each sample. I tested both pbmm2 and minimap2 to align my long reads, after getting the consensus sequences. This is the command I used to run minimap2: minimap2…

Continue Reading different result using minimap2 and pbmm2

pjotrp/sambamba – sambamba – Genenetwork

Continue Reading pjotrp/sambamba – sambamba – Genenetwork

Filtering bam file based on depth determined through samtools depth

Hi All, I have a bam file and I calculated read depth using samtools depth. I now want to filter the bam file to keep only the contigs whose depth falls within a certain range. I was…

Continue Reading Filtering bam file based on depth determined through samtools depth
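
As a rough illustration of the depth-based filtering asked about in the excerpt above, here is a minimal Python sketch. It assumes the usual three-column, tab-separated output of samtools depth (contig, position, depth). The function names, the thresholds, and the choice of "mean depth over reported positions" as the filter criterion are all assumptions made for illustration, not something the original post specifies.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def mean_depth_per_contig(depth_lines: Iterable[str]) -> Dict[str, float]:
    """Average the per-position depths reported for each contig.

    Each line is expected to look like "contig<TAB>position<TAB>depth".
    Only positions present in the input are averaged.
    """
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for line in depth_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        contig, _pos, depth = line.split("\t")[:3]
        totals[contig] += int(depth)
        counts[contig] += 1
    return {contig: totals[contig] / counts[contig] for contig in totals}


def contigs_in_range(means: Dict[str, float], low: float, high: float) -> List[str]:
    """Return the contigs whose mean depth lies in [low, high], sorted by name."""
    return sorted(c for c, m in means.items() if low <= m <= high)


# Hypothetical depth records standing in for real samtools depth output.
example = [
    "contigA\t1\t10",
    "contigA\t2\t20",
    "contigB\t1\t100",
    "contigB\t2\t120",
    "contigC\t1\t2",
]
means = mean_depth_per_contig(example)
kept = contigs_in_range(means, low=5, high=50)
print(kept)
```

Because only reported positions are averaged, the mean depends on whether zero-coverage positions were included when the depth file was produced (for example with the -a option of samtools depth). The resulting contig names could then be used as regions when subsetting an indexed BAM file.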

Use RSEM and Bowtie2 to align paired-end sequences

I want to use rsem-calculate-expression and the bowtie2 aligner to align paired-end sequences under the following conditions: 2 processors; generate BAM file; very fast bowtie2 sensitivity; append gene/transcript name. My code: rsem-refseq-extract-primary-assembly GCF_000001405.31_GRCh38.p5_genomic.fna GCF_000001405.31_GRCh38.p5_genomic.primary_assembly.fna rsem-prepare-reference --gff3 GCF_000001405.31_GRCh38.p5_genomic.gff --bowtie2 --bowtie2-path /bowtie2-2.4.5-py39hd2f7db1_2 --trusted-sources…

Continue Reading Use RSEM and Bowtie2 to align paired-end sequences

Mapping back 3 sets of reads/sample with minimap2

I used FaQC to QC my raw fastqs before assembling. That program (and perhaps others) outputs properly paired forward and reverse fastqs, as well as an unpaired fastq file for each sample. I used all 3 for each single-sample assembly. Since minimap2 only allows for 2 query files,…

Continue Reading Mapping back 3 sets of reads/sample with minimap2

F13864 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status ERS5240131 Singapore C-F13864* —— Hg19 .BAM Scientific 7X, 22.9 Mbp, 150 bp YF076683 China, People’s Republic of (Shandong) C-F13864* —— Hg19 .BAM 23mofang 57X, 21.2 Mbp, 150 bp YF071813 —— C-F13864* —— Hg19 .BAM 23mofang 21X, 21.8 Mbp,…

Continue Reading F13864 – YFull YTree Info

L1193 – YFull YTree Info

I-L1193 – YFull YTree Info SNPs currently defining I-L1193 L1193     FGC87558     Y72031     Sample ID Country / Language Info Ref File Testing company Statistics Status ASH1 Ireland (Tipperary) I-L1193* —— Hg19 .BAM Ancient 1X, 10.5 Mbp, 101 bp PB581 Ireland (Clare) I-L1193* —— Hg19 .BAM Ancient 2X, 15.8…

Continue Reading L1193 – YFull YTree Info

3 -tag XM” failed! when running rsem-calculate-expression

Dear sir, When I ran “rsem-calculate-expression --paired-end --alignments -p 8 input.bam gencodev22 ./out”, I got this error message: rsem-parse-alignments ../bowtie2/hg38 ./rsem-out.temp/rsem-out ./rsem-out.stat/rsem-out /NGS_Storage/Debbie/RNA-seq/variant_calling_20210602/RNA-leukemia002A-906.para.bam 3 -tag XM Read A00355:209:H3KTLDSX2:2:2606:24677:17425: The adjacent two lines do not represent the two mates of a paired-end read! (RSEM assumes the two mates of a paired-end read should…

Continue Reading 3 -tag XM” failed! when running rsem-calculate-expression

How to edit a SAM file using pysam

Dear all – I have a template SAM file and I want to change one of the columns (template_length) and replace it with a new value. The new value is the result of a quick mathematical operation. Template SAM file: @HD VN:1.0 SO:unsorted @SQ…

Continue Reading How to edit a SAM file using pysam
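
To make the column edit described in the excerpt above concrete, here is a minimal sketch that rewrites the template-length field of SAM text records. It deliberately uses plain string handling rather than pysam, so it is a stand-in for the approach asked about, not the pysam answer itself. The only format fact relied on is that TLEN is the ninth mandatory tab-separated SAM field. The `rewrite_tlen` helper, the sample record, and the "+ 10" operation are hypothetical, since the excerpt does not say which calculation is wanted.

```python
from typing import Callable

TLEN_INDEX = 8  # TLEN is the 9th mandatory SAM field (0-based index 8)


def rewrite_tlen(line: str, transform: Callable[[int], int]) -> str:
    """Return the SAM line with its TLEN field replaced by transform(TLEN).

    Header lines (starting with "@") are returned unchanged.
    """
    line = line.rstrip("\n")
    if line.startswith("@"):
        return line
    fields = line.split("\t")
    fields[TLEN_INDEX] = str(transform(int(fields[TLEN_INDEX])))
    return "\t".join(fields)


# Hypothetical records: one header line and one alignment line with TLEN = 54.
records = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t99\tchr1\t100\t60\t4M\t=\t150\t54\tACGT\tIIII",
]

# Placeholder operation (add 10); substitute the real calculation here.
edited = [rewrite_tlen(record, lambda tlen: tlen + 10) for record in records]
print("\n".join(edited))
```

Working on text keeps the example dependency-free, but it performs no validation of the other fields; a library that understands the SAM/BAM data model is the safer choice for real files.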

Y18411 – YFull YTree Info

J-Y18411 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF072520 Albania J-BY111710 —— Hg19 .BAM Dante Labs 10X, 22.8 Mbp, 151 bp YF067307 Palestine (Nablus) J-BY111710 —— Hg38 .BAM FTDNA (Y700) 34X, 18.7 Mbp, 151 bp NA20827 Italy (Firenze) J-CTS3330 —— Hg19…

Continue Reading Y18411 – YFull YTree Info

htseq-count error

Hi, htseq-count -f bam -s yes ~/htseq-trial/SRR13826419_Aligned.sortedByName.out.bam ~refgen/gencode.v39.primary_assembly.annotation.gtf > counts.txt I am trying to run htseq-count with the command above, but the err file contains: [E::idx_find_and_load] Could not retrieve index file for ‘~/htseq-trial/SRR13826419_Aligned.sortedByName.out.bam’ 100000 GFF lines processed. 200000 GFF lines processed. 300000 GFF lines processed. 400000 GFF lines…

Continue Reading htseq-count error

BioInformatics Product Manager at Helix (remote)

You + Helix Helix is a place where innovators and doers gather in order to drive significant progress in population genomics. We have come together to work at the intersection of clinical care, research, and genomics.   If you’re excited by the idea of making a meaningful impact and joining a…

Continue Reading BioInformatics Product Manager at Helix (remote)

rna seq – RNAseq SNP discovery: deciding upon filters and dealing with allele expression bias

I am working with non-model plant RNA samples which have been deep sequenced and analysed using the STAR aligner under default parameters. Aim: We would like to conduct SNP discovery on these samples. Objective: Our ultimate goal with this genotypic data is to search for variants (both SNPs and indels)…

Continue Reading rna seq – RNAseq SNP discovery: deciding upon filters and dealing with allele expression bias

human genome files

Hi all, I have two questions: What is the main difference between the two genome files (Homo_sapiens.GRCh38.dna.primary_assembly.fa and Homo_sapiens.GRCh38.dna.fa) located in the Ensembl database? Which one should I use for whole-exome sequence alignment? I used Homo_sapiens.GRCh38.dna.fa for the alignment, and later on,…

Continue Reading human genome files

Why did I achieve shorter than initial reads subset after aligned reads extraction.

Hello dear colleagues! I have recently run into a problem. I have been working with long WGS reads. First I filtered the longest subset of reads and aligned them to a custom sequence with several structural variants…

Continue Reading Why did I achieve shorter than initial reads subset after aligned reads extraction.

Low transcript quantification with Salmon using GRCm39 annotations

Hi everyone, first time working with mouse samples and unfortunately, there are fewer resources available for the latest mouse Ensembl genome than I was expecting. What I’ve done: I performed rRNA depletion on total RNA extracted from mouse tissue and created Illumina libraries using a cDNA synthesis kit with random…

Continue Reading Low transcript quantification with Salmon using GRCm39 annotations

How can I find genes located in the same region (overlapping) of the chromosome ?

I take the BAM file as input and perform RNA-Seq. The program prints out a list of genes to which the reads match. Some of the genes in the list overlapping in the same…

Continue Reading How can I find genes located in the same region (overlapping) of the chromosome ?

M8498 – YFull YTree Info

B-M8498 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF004283 Saudi Arabia B-M8498* —— Hg19 .BAM FTDNA (Y500) 43X, 13.7 Mbp, 165 bp HGDP00992 Namibia B-M7650* —— Hg38 .BAM Scientific 18X, 23.5 Mbp, 151 bp YF013963 —— B-Y82361 —— Hg38 .BAM FTDNA…

Continue Reading M8498 – YFull YTree Info

Ubuntu Manpage: sambamba-view – tool for extracting information from SAM/BAM files

Provided by: sambamba_0.8.2+dfsg-2_amd64. NAME: sambamba-view – tool for extracting information from SAM/BAM files. SYNOPSIS: sambamba view OPTIONS <input.bam | input.sam> [region1 […]]. DESCRIPTION: sambamba view allows efficient filtering of SAM/BAM files for alignments satisfying various conditions, as well as access to the SAM header and information about reference sequences. In order…

Continue Reading Ubuntu Manpage: sambamba-view – tool for extracting information from SAM/BAM files

FGC15109 – YFull YTree Info

I-FGC15109 – YFull YTree Info SNPs currently defining I-FGC15109 FGC15109     Sample ID Country / Language Info Ref File Testing company Statistics Status SZ43 Hungary (Somogy) I-BY138* —— Hg19 .BAM Ancient 8X, 22.8 Mbp, 32 bp YF010533 —— I-BY138* —— Hg19 .BAM FTDNA (Y500) 73X, 14.9 Mbp, 165 bp YF019250…

Continue Reading FGC15109 – YFull YTree Info

Does anyone know how to get the headers for a bam.tdf file converted to a bedgraph file?

I followed this thread: “Conversion from tdf to bed format”. I converted the file like this: igvtools tdftobedgraph file.tdf file.bedgraph. Now I have a bedgraph without headers, but I have no idea what the last…

Continue Reading Does anyone know how to get the headers for a bam.tdf file converted to a bedgraph file?

bam – Detect mutation context in a read of a sam file

That kind of custom fiddling with reads and variants is very cumbersome, non-standard and also error-prone. Run a standard variant calling pipeline and then filter for the mutations that you want. Then extract the variant position (that is, the coordinates) and get the variant context from the reference genome. Using individual…

Continue Reading bam – Detect mutation context in a read of a sam file
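
The last step suggested in the excerpt above, looking up the variant context in the reference at a called position, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reference is held as an in-memory string, the `variant_context` and `vcf_to_zero_based` helpers are hypothetical names, and the default of one flanking base on each side (a trinucleotide context) is an arbitrary choice. The one format fact used is that VCF positions are 1-based.

```python
def vcf_to_zero_based(vcf_pos: int) -> int:
    """Convert a 1-based VCF position to a 0-based string index."""
    return vcf_pos - 1


def variant_context(reference: str, pos: int, flank: int = 1) -> str:
    """Return the reference bases around a 0-based position.

    The window spans `flank` bases on each side and is clipped at the
    ends of the sequence. Bases are upper-cased so soft-masked
    (lower-case) reference regions compare equal.
    """
    if not 0 <= pos < len(reference):
        raise IndexError(f"position {pos} is outside the reference")
    start = max(0, pos - flank)
    end = min(len(reference), pos + flank + 1)
    return reference[start:end].upper()


# Hypothetical 10-base reference; a variant called at VCF position 5.
ref = "ttACGTAcgg"
context = variant_context(ref, vcf_to_zero_based(5))
print(context)
```

For a real genome the sequence would come from an indexed FASTA rather than a Python string, but the coordinate handling stays the same.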

BTG2 gene predicts poor outcome in PT-DLBCL

Introduction Primary testicular diffuse large B-cell lymphoma (PT-DLBCL) is a rare and aggressive form of mature B-cell lymphoma.1–3 PT-DLBCL was the most common type of testicular tumor in men aged over 60 and characterized by painless uni- or bilateral testicular masses with infrequent constitutional symptoms.4–6 PT-DLBCL shows significant extranodal tropism,…

Continue Reading BTG2 gene predicts poor outcome in PT-DLBCL

samtools – Potential side effects of replacing read group tags in BAM file

I have a set of BAM files where the read group tags have some (default?) values, i.e.: @RG ID:RG0 LB:LB0 PU:PU0 SM:SM0 This creates issues in my downstream analyses, where multiple BAM files with the same SM tag are used. Samtools provides a command to replace the read group tag….

Continue Reading samtools – Potential side effects of replacing read group tags in BAM file

Htseq is giving me 0 counts using the GFF3 of miRBase

Hello! I am trying to annotate a miRNA-seq so that it gives me mature miRNAs where I already have 5p and 3p. For this, I have used the index mm10.fa and the miRBase mmu.gff3. I have aligned with HISAT2 and am trying to count with HTSeq, however I get 0…

Continue Reading Htseq is giving me 0 counts using the GFF3 of miRBase

sorting – indexing sorted alignment file with samtools index gives “Exec format error”

I am struggling with samtools index. I already did the alignment using “bwa mem reference.fa seq.fastq > alg.sam”. The resulting sam file was converted to bam format using “samtools view -S -h -b alg.sam > alg.bam”. Next, the files were sorted by using “sort -h alg.bam >sorted.bam”. And now we…

Continue Reading sorting – indexing sorted alignment file with samtools index gives “Exec format error”

FGC19851 – YFull YTree Info

R-FGC19851 – YFull YTree Info SNPs currently defining R-FGC19851 FGC19851     Sample ID Country / Language Info Ref File Testing company Statistics Status YF072967 United States (Georgia) R-FGC19851* —— Hg38 .BAM FTDNA (Y700) 34X, 18.7 Mbp, 151 bp YF009427 —— R-FGC65264* —— Hg19 .BAM FTDNA (Y500) 38X, 12.8 Mbp, 165…

Continue Reading FGC19851 – YFull YTree Info

FGC35106 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status YF016938 Saudi Arabia (Ar Riyāḍ) J-FGC35106 YF081770 | J-FGC35106*, J-FGC58682* Hg38 .BAM FTDNA (Y500) 30X, 11.5 Mbp, 151 bp YF016937 Saudi Arabia (Ar Riyāḍ) J-FGC35106 YF081769 | J-FGC35106*, J-FGC58682* Hg38 .BAM FTDNA (Y500) 37X, 12.5 Mbp, 151 bp…

Continue Reading FGC35106 – YFull YTree Info

bam – samtools view command not found error

When I tried to use samtools to split a bam file based on different chromosomes, I used this command: samtools view input.bam -b chr21 | chr21.bam However, I get error messages like this: -bash: chr21.bam: command not found [W::hts_idx_load3] The index file is older than the data file: input.bam.bai How…

Continue Reading bam – samtools view command not found error

YP4024 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status ERS2478532 Turkmenistan Q-YP4024* —— Hg19 .BAM Scientific 17X, 16.7 Mbp, 151 bp YF006625 Russia (Tomskaya oblast’) / Selkup Q-YP4024* —— Hg19 .BAM FTDNA (Y500) 67X, 14.8 Mbp, 165 bp DA162 Russia (Severnaya Osetiya-Alaniya, Respublika) Q-BZ5214* —— Hg19 .BAM…

Continue Reading YP4024 – YFull YTree Info