Tag: BAM

Ubuntu Manpage: alleleCounts.pl – Generate tab separated file with allelic counts and depth for each

Provided by: liballelecount-perl_4.2.1-1_all NAME alleleCounts.pl – Generate tab separated file with allelic counts and depth for each specified locus. SYNOPSIS Where possible use the C version for large data (it’s also more configurable). alleleCounts.pl Required: -bam -b BAM/CRAM file (expects co-located index) – if CRAM see ‘-ref’ -output -o Output…

Getting the best of RNA-Seq

This is not a banal discussion. I am facing some problems with the analysis of DE genes in mouse. Most methods of analysis of DE genes must face two considerations or challenges. The first needs to take into consideration the existence and the different…

Ubuntu Manpage: bamfillquery – fill query sequences into BAM files

Provided by: biobambam2_2.0.179+ds-1_amd64 NAME bamfillquery – fill query sequences into BAM files SYNOPSIS bamfillquery [options] <in.bam queries.fasta >out.bam DESCRIPTION bamfillquery reads a SAM/BAM/CRAM file and a FastA file, copies the sequences found in the FastA file into the query sequence field of the SAM/BAM/CRAM file and writes the resulting data…

Ubuntu Manpage: samtools targetcut – cut fosmid regions (for fosmid pool only)

Provided by: samtools_1.13-2_amd64 NAME samtools targetcut – cut fosmid regions (for fosmid pool only) SYNOPSIS samtools targetcut [-Q minBaseQ] [-i inPenalty] [-0 em0] [-1 em1] [-2 em2] [-f ref] in.bam DESCRIPTION This command identifies target regions by examining the continuity of read depth, computes haploid consensus sequences of targets and…

Fast way to sort bam file by queryname similar to picard SortSam SORT_ORDER=queryname?

When sorting by queryname with Samtools (samtools sort -n), Samtools does a natural sort by colon-delimited subfield. On the other hand, when sorting by queryname with Picard (picard SortSam SORT_ORDER=queryname), Picard does not sort by colon-delimited subfield,…
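
The difference between the two orderings can be sketched in Python. This is only an illustration: `natural_key` is a hypothetical helper that approximates a natural sort by comparing digit runs as integers, and plain string sorting is used as an assumed stand-in for the non-natural ordering. Neither is the actual comparator of Samtools or Picard.

```python
import re

def natural_key(qname):
    """Split a read name into digit and non-digit runs.

    Digit runs become integers so that 9 sorts before 10; the leading
    tag (0 or 1) keeps numbers and text from being compared directly.
    """
    parts = re.split(r"([0-9]+)", qname)
    key = []
    for part in parts:
        if part == "":
            continue
        if part.isascii() and part.isdigit():
            key.append((0, int(part), ""))
        else:
            key.append((1, 0, part))
    return key

# Hypothetical read names that differ only in the last colon-delimited field.
names = ["r:1:10", "r:1:9", "r:1:100"]

lexicographic = sorted(names)               # character-by-character order
natural = sorted(names, key=natural_key)    # numeric fields compared as numbers
```

With these names, the lexicographic order puts `r:1:10` and `r:1:100` ahead of `r:1:9`, while the natural order puts `r:1:9` first.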

YP5260 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status I7021 Mongolia (Bulgan) C-F15910 C-F15910*, C-Y507 Hg19 .BAM Ancient 3X, 20.2 Mbp, 40 bp NEO249 Russia (Chukotskiy avtonomnyy okrug) C-F15910* —— Hg19 .BAM Ancient 1X, 7.2 Mbp, 81 bp I11696 Mongolia (Bulgan) C-Y507 —— Hg19 .BAM Ancient 2X,…

BAM – Job openings – Data Scientist for additive manufacturing (m/f/d) in the field of business informatics, computer science, software development, bioinformatics, engineering, data management, physics, data engineering or comparable

Section S.3 – eScience To strengthen our team in the division “eScience” in Berlin-Steglitz, starting as soon as possible, we are looking for a Data Scientist for additive manufacturing (m/f/d) in the field of business informatics, computer science, software development, bioinformatics, engineering, data management, physics, data engineering or comparable Salary…

BY3 – YFull YTree Info

J-BY3 – YFull YTree Info SNPs currently defining J-BY3 BY3 / FGC15184     Sample ID Country / Language Info Ref File Testing company Statistics Status YF016315 —— J-FGC15174 J-FGC15174*, J-FGC15168*, J-FT258574 Hg38 .BAM FTDNA (Y500) 23X, 12.0 Mbp, 151 bp YF068400 Sudan (Janūb Kurdufān) J-FGC38453* —— Hg38 .BAM FTDNA (Y700)…

Allelic expression imbalance of PIK3CA mutations is frequent in breast cancer and prognostically significant

Subjects Normal breast and tumor samples were obtained with the written informed consent from donors and appropriate approval from local ethical committees, with the detailed information described in the respective original publications: normal tissue [9], METABRIC [14], TCGA [35]. Differential allelic expression analysis DNA and total RNA from 64 samples of normal breast…

can gff2 reference used in htseq-count?

Dear all We are recently working with E.coli plasmid and tried to summarize the gene counts from our RNA-Seq samples. The short reads were mapped to E.coli plasmid using tophat which generated bam files accordingly. However, we were unable to obtain a gff3 version of our target plasmid genome, the…

Extract R1 and R2 from sam file generated by bowtie2

Hi everyone, how can I extract R1 and R2 from a SAM file generated by bowtie2?

YP3952 – YFull YTree Info

Q-YP3952 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF073154 Russia (Chechenskaya Respublika) / Chechen Q-YP3952* —— Hg38 .BAM FTDNA (Y700) 33X, 18.2 Mbp, 151 bp YF092378 Russia (Chechenskaya Respublika) / Chechen Q-BZ87 —— Hg38 .BAM FTDNA (Y700) 55X, 18.5 Mbp, 151…

linux merge multiple files in picard

Why not use samtools? for folder in my_bam_folders/*; do samtools merge $folder.bam $folder/*.bam; done In general, samtools merge can merge all the bam files in a given directory like this: samtools merge merged.bam *.bam EDIT: If samtools isn’t an option and you have to use Picard, what about something like…
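
The per-folder loop above could also be driven from Python. The sketch below only builds the `samtools merge` argument lists and does not run them; `build_merge_commands` is a hypothetical helper name, and it assumes each subfolder holds the BAM files to be merged into `<folder>.bam`.

```python
from pathlib import Path

def build_merge_commands(root):
    """Return one 'samtools merge' argument list per subfolder of root.

    The output file is named after the folder (folder + ".bam") and the
    inputs are all *.bam files directly inside that folder.
    """
    commands = []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        inputs = sorted(str(path) for path in folder.glob("*.bam"))
        if inputs:  # skip folders without BAM files
            commands.append(["samtools", "merge", f"{folder}.bam", *inputs])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the commands look right, for example after printing them for `build_merge_commands("my_bam_folders")`.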

a strange pattern of repetitive summits

Problem with the output of Deeptools PlotProfile: a strange pattern of repetitive summits. Hi! I am trying to plot DNA binding profiles of my ChIP-seq bw files using Deeptools plotProfile. I generated the matrix using the computeMatrix reference-point. I used some publicly available bed files as my regions of…

GeneActivity without Fragments file in Seurat for Integrating scRNA-seq and scATAC-seq

Hi all, I am new to R and Seurat, and I am following Seurat tutorials to find anchors between RNA-seq and ATAC-seq data according to: Combining the two tutorials is difficult for a cell line data set I am using for SNARE-seq Human here. I managed to run the following…

Read counts an order of magnitude higher on one chromosome

Hi, I am having an issue with a sequencing run: once demultiplexed, aligned, and filtered, each individual has 1-2 million reads, but these reads are predominantly on one chromosome. For background, these are Oncorhynchus mykiss and O. clarki…

Detailed differences between sambamba and samtools

Three months ago, in my first post in the new student group, “Do false-positive mutations appear because duplicates are not marked thoroughly enough?”, I described the problem that supplementary reads are not marked as duplicates by GATK MarkDuplicates. Afterwards, in response to this question, I began…

BAM file and no RNAME or POS information? : bioinformatics

Newbie here. Please, play nice. I got possession of a set of 4 .bam files that store the exome of an individual, around 400 MB each. I used samtools to generate a 2.4 GB .sam file out of one of the .bam files, and I found it contains lines with…

How to regress out age and sex using limma removeBatchEffect

I have a protein expression data frame with a metadata data frame which includes age and sex:
nph_csf_metadata =
age  sex  bam  tau
70   f    5    2
75   m    6    1
72   m    4    1
71   f    4    2
…

Using featureCounts and downloading Rsubread

I am trying to perform a count per gene analysis using featureCounts in R. I have downloaded the gtf file and edited it within R to only contain the gene ID, chr, start, end, and strand,…

Parse a file of strings in python separated by newline into a json array

I don’t see where you’re actually reading from the file in the first place. You have to actually read your path_text.txt before you can format it correctly, right? with open('path_text.txt', 'r', encoding='utf-8') as myfile: content = myfile.read().splitlines() Which will give you ['/gp/oi/eu/gatk/inputs/NA12878_24RG_med.hg38.bam', '/gp/oi/eu/gatk/inputs/NA12878_24RG_small.hg38.bam'] in content. Now if you want to write this…
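
The read-then-write flow described above might look like the following minimal sketch. The function name `lines_to_json_array` and the output file name are assumptions made for illustration; the original answer is truncated before its own write step, so this is not necessarily what it proposed.

```python
import json

def lines_to_json_array(text_path, json_path):
    """Read newline-separated strings and write them out as a JSON array."""
    with open(text_path, "r", encoding="utf-8") as src:
        # splitlines() drops the trailing newlines; blank lines are skipped.
        paths = [line for line in src.read().splitlines() if line.strip()]
    with open(json_path, "w", encoding="utf-8") as dst:
        json.dump(paths, dst, indent=2)
    return paths
```

Calling `lines_to_json_array("path_text.txt", "paths.json")` would write the list of BAM paths as a JSON array and return the same list.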

Is it correct to use Tophat2 directly followed by Cuffquant to only align to the reference transcriptomes without wishing to assemble new transcripts?

Hi, friends. I only want to perform differential expression analysis on the annotated transcripts of my existing reference genome. I use tophat2 for alignment with --no-novel-juncs…

Z697 – YFull YTree Info

R-Z697 – YFull YTree Info SNPs currently defining R-Z697 Z697     Sample ID Country / Language Info Ref File Testing company Statistics Status YF009397 Sweden (Västra Götalands län) R-Z697* —— Hg19 .BAM FTDNA (Y500) 81X, 14.4 Mbp, 165 bp YF084333 Italy (Chieti) R-FT285492 —— Hg38 .BAM Dante Labs 14X, 23.4…

Y140591 – YFull YTree Info

R-Y140591 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF067865 Germany R-Y140591* —— Hg38 .BAM FTDNA (Y700) 52X, 18.7 Mbp, 151 bp YF076495 Germany R-FT167842 —— Hg38 .BAM FTDNA (Y700) 49X, 18.3 Mbp, 151 bp YF067633 Germany R-FT167842 —— Hg38 .BAM FTDNA…

Annotated file with gene ID (instead of gene symbol)

Hello, I am using “featureCounts” in the Rsubread package for analyzing bulk RNA-seq of Drosophila. Since there are no inbuilt annotations for Drosophila, I am using a GTF file from the homepage of…

sequencing – Interpreting ‘samtools mpileup’ output for multiple inputs

I would like to calculate sequencing coverage for a WGS project with both long and short reads. I’ve used samtools as follows: samtools mpileup -Q 1 -aa illumina_sorted.bam nanopore_sorted.bam > depth.txt Previously, when I used samtools depth instead, I only had the columns I was interested in (chromosome name / base…

CTS1346 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status HGDP01351 China, People’s Republic of O-F3607* —— Hg38 .BAM Scientific 16X, 23.6 Mbp, 151 bp YF079316 —— O-Y224790 —— Hg19 .BAM 23mofang 58X, 21.3 Mbp, 150 bp HG00583 China, People’s Republic of O-Y224790 —— Hg19 .BAM Scientific ——…

Split merged Bam file without replacement

Hi guys, I have 5 bam (ChIPseq PE data sorted by position) files that came from 5 different murine cortexes (mice that belong to the same group, so biological replicates), however I have a lot of group variability. I’m thinking to merge all…

snp – Reference variant detected as altered one in bam file

I received several .bam files from the manufacturer and used four callers (samtools, freebayes, haplotypecaller, deepvariant) to find sequence variants. In the resulting .vcf files, I took a closer look at some calls. I found an interesting homozygous one, rs477033 (C/G Ref/Alt), with flag ‘COMMON=0’ and very low MAF. I also…

A114 – YFull YTree Info

R-A114 – YFull YTree Info SNPs currently defining R-A114 FGC78244     A114(H)     H Sample ID Country / Language Info Ref File Testing company Statistics Status YF067576 France (Ille-et-Vilaine) R-A114* —— Hg19 .BAM Dante Labs 12X, 23.0 Mbp, 151 bp YF088360 United States (Virginia) R-CTS4466* —— Hg38 .BAM FTDNA (Y700)…

HTseq-Count: Long processing time

Hi everyone, I’m processing BAM files using htseq-count and it takes a very long time to produce the counts for each file. The data are paired-end reads (around 50 million sequences each). It takes 75 minutes to count this pair; is that normal? Thanks. htseq-count --max-reads-in-buffer=24000000000…

The low successful assignment ratio of FeatureCounts

Hello, I would like to confirm if the low assignment ratio (54%) is normal, and please check the possible reason I found. I used Hisat2 to assign paired-end strand-specific transcriptomic sequences (rRNA removed) to a reference genome. Because I filtered out the unmapped sequences in advance, the overall assignment ratio…

long run-time and low CPU usage

I’m trying to run Pindel on some 30x Illumina WGS data. I aligned reads with BWA-MEM, then sorted by co-ordinates and indexed them with Samtools. I also tried filtering the bam files with samtools -F 0x800 as suggested by another post. I…

different result using minimap2 and pbmm2

Hi all! I am analysing CCS PacBio data and each sample came from a different run; in particular, I have three files for each sample. I tested both pbmm2 and minimap2 to align my long reads, after getting the consensus sequences. This is the command I used to run minimap2: minimap2…

pjotrp/sambamba – sambamba – Genenetwork

Filtering bam file based on depth determined through samtools depth

Hi All, I have a bam file and I calculated read depth using samtools depth and I now want to filter the bam file to have only the contigs that have a depth between a certain value. I was…

Use RSEM and Bowtie2 to align paired-end sequences

I want to use rsem-calculate-expression and the bowtie2 aligner to align paired-end sequences under the following conditions: 2 processors; generate BAM file; very fast bowtie2 sensitivity; append gene/transcript name. My code: rsem-refseq-extract-primary-assembly GCF_000001405.31_GRCh38.p5_genomic.fna GCF_000001405.31_GRCh38.p5_genomic.primary_assembly.fna rsem-prepare-reference --gff3 GCF_000001405.31_GRCh38.p5_genomic.gff --bowtie2 --bowtie2-path /bowtie2-2.4.5-py39hd2f7db1_2 --trusted-sources…

Mapping back 3 sets of reads/sample with minimap2

I used FaQC to qc my raw fastqs before assembling. That program (and perhaps others) outputs properly paired Forward and Reverse fastqs, as well as an unpaired fastq file for each sample. I used all 3 for each single sample assembly. Since minimap2 only allows for 2 query files,…

F13864 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status ERS5240131 Singapore C-F13864* —— Hg19 .BAM Scientific 7X, 22.9 Mbp, 150 bp YF076683 China, People’s Republic of (Shandong) C-F13864* —— Hg19 .BAM 23mofang 57X, 21.2 Mbp, 150 bp YF071813 —— C-F13864* —— Hg19 .BAM 23mofang 21X, 21.8 Mbp,…

L1193 – YFull YTree Info

I-L1193 – YFull YTree Info SNPs currently defining I-L1193 L1193     FGC87558     Y72031     Sample ID Country / Language Info Ref File Testing company Statistics Status ASH1 Ireland (Tipperary) I-L1193* —— Hg19 .BAM Ancient 1X, 10.5 Mbp, 101 bp PB581 Ireland (Clare) I-L1193* —— Hg19 .BAM Ancient 2X, 15.8…

3 -tag XM” failed! when running rsem-calculate-expression

Dear sir, When I ran “rsem-calculate-expression --paired-end --alignments -p 8input.bam” gencodev22 ./out. I got the error message rsem-parse-alignments ../bowtie2/hg38 ./rsem-out.temp/rsem-out ./rsem-out.stat/rsem-out /NGS_Storage/Debbie/RNA-seq/variant_calling_20210602/RNA-leukemia002A-906.para.bam 3 -tag XM Read A00355:209:H3KTLDSX2:2:2606:24677:17425: The adjacent two lines do not represent the two mates of a paired-end read! (RSEM assumes the two mates of a paired-end read should…

How to edit a SAM file using pysam

Dear all – I have a template sam file and I want to change one of the columns (template_length) and replace it with a new value. The new value is a quick mathematical operation. template sam file: @HD VN:1.0 SO:unsorted @SQ…

Y18411 – YFull YTree Info

J-Y18411 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF072520 Albania J-BY111710 —— Hg19 .BAM Dante Labs 10X, 22.8 Mbp, 151 bp YF067307 Palestine (Nablus) J-BY111710 —— Hg38 .BAM FTDNA (Y700) 34X, 18.7 Mbp, 151 bp NA20827 Italy (Firenze) J-CTS3330 —— Hg19…

htseq-count error

Hi, htseq-count -f bam -s yes ~/htseq-trial/SRR13826419_Aligned.sortedByName.out.bam ~refgen/gencode.v39.primary_assembly.annotation.gtf > counts.txt I am trying to run htseq-count with command above but in the err file [E::idx_find_and_load] Could not retrieve index file for ‘~/htseq-trial/SRR13826419_Aligned.sortedByName.out.bam’ 100000 GFF lines processed. 200000 GFF lines processed. 300000 GFF lines processed. 400000 GFF lines…

BioInformatics Product Manager at Helix (remote)

You + Helix Helix is a place where innovators and doers gather in order to drive significant progress in population genomics. We have come together to work at the intersection of clinical care, research, and genomics.   If you’re excited by the idea of making a meaningful impact and joining a…

rna seq – RNAseq SNP discovery: deciding upon filters and dealing with allele expression bias

I am working with non-model plant RNA samples which have been deep sequenced and analysed using STAR aligner under default parameters. Aim We would like to conduct SNP discovery of these samples. Objective Our ultimate goal with this genotypic data is to search for variants (both SNPs and indels)…

human genome files

Hi all, I have two questions. What is the main difference between the two genome files (Homo_sapiens.GRCh38.dna.primary_assembly.fa and Homo_sapiens.GRCh38.dna.fa) located in the Ensembl database? Which one should I use for whole-exome sequence alignment? I used Homo_sapiens.GRCh38.dna.fa for the alignment, and later on,…

Why did I achieve shorter than initial reads subset after aligned reads extraction.

Hello dear colleagues! I have recently faced a problem. I have worked with long WGS reads. First I filtered the longest subset of reads, and aligned them to the custom sequence with several structural variants…

Low transcript quantification with Salmon using GRCm39 annotations

Hi everyone, first time working with mouse samples and unfortunately, there are fewer resources available for the latest mouse Ensembl genome than I was expecting. What I’ve done: I performed rRNA depletion on total RNA extracted from mouse tissue and created Illumina libraries using a cDNA synthesis kit with random…

How can I find genes located in the same region (overlapping) of the chromosome ?

I take the BAM file as input and perform RNA-Seq. The program prints out a list of genes to which the reads match. Some of the genes in the list overlap in the same…

M8498 – YFull YTree Info

B-M8498 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF004283 Saudi Arabia B-M8498* —— Hg19 .BAM FTDNA (Y500) 43X, 13.7 Mbp, 165 bp HGDP00992 Namibia B-M7650* —— Hg38 .BAM Scientific 18X, 23.5 Mbp, 151 bp YF013963 —— B-Y82361 —— Hg38 .BAM FTDNA…

Ubuntu Manpage: sambamba-view – tool for extracting information from SAM/BAM files

Provided by: sambamba_0.8.2+dfsg-2_amd64 NAME sambamba-view – tool for extracting information from SAM/BAM files SYNOPSIS sambamba view OPTIONS <input.bam | input.sam> [region1 […]] DESCRIPTION sambamba view allows to efficiently filter SAM/BAM files for alignments satisfying various conditions, as well as access its SAM header and information about reference sequences. In order…

FGC15109 – YFull YTree Info

I-FGC15109 – YFull YTree Info SNPs currently defining I-FGC15109 FGC15109     Sample ID Country / Language Info Ref File Testing company Statistics Status SZ43 Hungary (Somogy) I-BY138* —— Hg19 .BAM Ancient 8X, 22.8 Mbp, 32 bp YF010533 —— I-BY138* —— Hg19 .BAM FTDNA (Y500) 73X, 14.9 Mbp, 165 bp YF019250…

Does anyone know how to get the headers for a bam.tdf file converted to a bedgraph file?

I followed this thread: Conversion from tdf to bed format Converted like this: igvtools tdftobedgraph file.tdf file.bedgraph Now I have a bedgraph without headers but I have no idea what the last…

bam – Detect mutation context in a read of a sam file

That kind of custom fiddling with reads and variants is very cumbersome, non-standard and also error-prone. Do a standard variant calling pipeline and then filter for the mutations that you want. Then extract the variant position (so the coordinates) and get the variant context from the reference genome. Using individual…

BTG2 gene predicts poor outcome in PT-DLBCL

Introduction Primary testicular diffuse large B-cell lymphoma (PT-DLBCL) is a rare and aggressive form of mature B-cell lymphoma [1–3]. PT-DLBCL was the most common type of testicular tumor in men aged over 60 and characterized by painless uni- or bilateral testicular masses with infrequent constitutional symptoms [4–6]. PT-DLBCL shows significant extranodal tropism,…

samtools – Potential side effects of replacing read group tags in BAM file

I have a set of BAM files where the read group tags have some (default?) values, i.e.: @RG ID:RG0 LB:LB0 PU:PU0 SM:SM0 This creates issues in my downstream analyses, where multiple BAM files with the same SM tag are used. Samtools provides a command to replace the read group tag….

Htseq is giving me 0 counts using the GFF3 of miRBase

Hello! I am trying to annotate a miRNA-seq so that it gives me mature miRNAs where I already have 5p and 3p. For this, I have used the index mm10.fa and the miRBase mmu.gff3. I have aligned with HISAT2 and am trying to count with HTSeq, however I get 0…

sorting – indexing sorted alignment file with samtools index gives “Exec format error”

I am struggling with samtools index. I already did the alignment using “bwa mem reference.fa seq.fastq > alg.sam”. The resulting sam file was converted to bam format using “samtools view -S -h -b alg.sam > alg.bam”. Next, the files were sorted by using “sort -h alg.bam >sorted.bam”. And now we…

FGC19851 – YFull YTree Info

R-FGC19851 – YFull YTree Info SNPs currently defining R-FGC19851 FGC19851     Sample ID Country / Language Info Ref File Testing company Statistics Status YF072967 United States (Georgia) R-FGC19851* —— Hg38 .BAM FTDNA (Y700) 34X, 18.7 Mbp, 151 bp YF009427 —— R-FGC65264* —— Hg19 .BAM FTDNA (Y500) 38X, 12.8 Mbp, 165…

FGC35106 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status YF016938 Saudi Arabia (Ar Riyāḍ) J-FGC35106 YF081770 | J-FGC35106*, J-FGC58682* Hg38 .BAM FTDNA (Y500) 30X, 11.5 Mbp, 151 bp YF016937 Saudi Arabia (Ar Riyāḍ) J-FGC35106 YF081769 | J-FGC35106*, J-FGC58682* Hg38 .BAM FTDNA (Y500) 37X, 12.5 Mbp, 151 bp…

bam – samtools view command not found error

When I tried to use samtools to split a bam file based on different chromosomes, I used this command: samtools view input.bam -b chr21 | chr21.bam However, I get error messages like this: -bash: chr21.bam: command not found [W::hts_idx_load3] The index file is older than the data file: input.bam.bai How…

YP4024 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status ERS2478532 Turkmenistan Q-YP4024* —— Hg19 .BAM Scientific 17X, 16.7 Mbp, 151 bp YF006625 Russia (Tomskaya oblast’) / Selkup Q-YP4024* —— Hg19 .BAM FTDNA (Y500) 67X, 14.8 Mbp, 165 bp DA162 Russia (Severnaya Osetiya-Alaniya, Respublika) Q-BZ5214* —— Hg19 .BAM…

samtools markdup

I’m deduplicating reads on a merged bam file, and I get this error. What is going on? What is the solution? (base) javier@iMac-de-JAVIER BWA % samtools markdup -r -S 1merged.bam 2merged.bam [tmp_file] Error: tmp file write data failed. [markdup] error: writing temp output failed. [E::bgzf_close] File…

Butterfly eyespots evolved via cooption of an ancestral gene-regulatory network that also patterns antennae, legs, and wings

Although the hypothesis of gene-regulatory network (GRN) cooption is a plausible model to explain the origin of morphological novelties (1), there has been limited empirical evidence to show that this mechanism led to the origin of any novel trait. Several hypotheses have been proposed for the origin of butterfly eyespots,…

Y570 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status AF2 —— Q-Y570 Q-Y570*, Q-F746* Hg19 .BAM Ancient 1X, 1.3 Mbp, 94 bp YF093124 —— Q-M120* —— Hg38 .BAM Nebula Genomics 57X, 23.6 Mbp, 150 bp Kolyma1 Russia (Sakha, Respublika [Yakutiya]) Q-Y222276* —— Hg19 .BAM Ancient 7X, 13.4…

Samtools sort creates many BAM and bugs terminal : bioinformatics

Hello, when entering the command : > samtools sort input.bam -o input_sorted.bam The terminal looks like it is busy so I let it run. Coming back several hours later, the terminal is now displaying random shifting characters like something is still going on, but visibly not right : Bugged terminal…

sam – Use Htslib to create auxiliary tags in bam file C++

I am creating a threaded C++ file where I generate in silico bam files, using header, DNA sequence and read information. First I use bam_init1() to create the bam1_t structure just named “b”. Then I use bam_set1 to create the actual sequence entry in the bam file bam_set1(b,read_id_length,READ_ID,flag,chr_idx,min_beg,mapq,n_cigar,cigar,-1,-1,0,strlen(DNAsequence),DNAsequence,quality_string,l_aux) And finally…

Processing two lists of files with snakemake

I want to use snakemake to do bowtie2 mapping of split read files to a reference genome, and I’d like that rule to be integrated in the general workflow. For that purpose, I first defined a rule to create a bowtie index: rule build_bowtie_index: input: referenceGenomeFasta output: expand("{name}.{index}.bt2", index=range(1,5), name…

PF6747 – YFull YTree Info

E-PF6747 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF010216 Azerbaijan (Qəbələ) E-PF6747* —— Hg19 .BAM FTDNA (Y500) 50X, 13.7 Mbp, 165 bp YF064736 Egypt (Al Minūfīyah) E-FT97857* —— Hg38 .BAM FTDNA (Y700) 35X, 18.5 Mbp, 151 bp YF093064 Yemen (Tā’izz) E-Y280593…

java – GATK: HaplotypeCaller IntelPairHmm only detecting 1 thread

I can’t seem to get GATK to recognise the number of available threads. I am running GATK (4.2.4.1) in a conda environment which is part of a nextflow (v20.10.0) pipeline I’m writing. For whatever reason, I cannot get GATK to see there is more than one thread. I’ve tried different…

Find Transposon Element insertions using long reads (nanopore), by alignment directly. (minimap2)

find_te_ins is designed to find Transposon Element (TE) insertions using long reads (nanopore), by alignment directly. (minimap2) Install $ git clone github.com/bakerwm/find_te_ins.git $ cd find_te_ins Change the following variables upon your condition: genome_fa and te_fa in line-10 and line-11; $ bash run_pipe.sh run_pipe.sh Prerequisite minimap2 – 2.17-r974-dirty, align long…

Continue Reading Find Transposon Element insertions using long reads (nanopore), by alignment directly. (minimap2)

[MonashBioinformaticsPlatform/RSeQC] junction_saturation not suit for bam/sam file generated by minimap or pbmm2

The CIGAR strings in BAM/SAM files generated by minimap2 contain “=”, which represents a sequence match with the reference, and “X”, which represents a sequence mismatch. However, bam_cigar.py in ./lib/qcmodule/bam_cigar.py only suits BAM/SAM files generated by aligners such as BWA/bowtie, whose CIGAR strings contain only “M”, which represents either a match or a mismatch. So I modified the bam_cigar.py 77…
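As a rough illustration of the incompatibility described above, the sketch below rewrites a CIGAR string so that “=” and “X” operations become “M” and neighbouring “M” runs are merged. It is not the modification made in the post (which is truncated here); the function name `collapse_match_ops` is hypothetical and the code works on plain CIGAR strings only.

```python
import re

# One CIGAR operation: a length followed by a single operation character.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def collapse_match_ops(cigar):
    """Return `cigar` with '=' and 'X' rewritten as 'M' and adjacent 'M' runs merged.

    Hypothetical helper for illustration; '*' (no CIGAR available) is returned unchanged.
    """
    if cigar == "*":
        return cigar
    merged = []  # list of [length, op] pairs
    for length, op in _CIGAR_OP.findall(cigar):
        if op in "=X":
            op = "M"  # both sequence match and mismatch are alignment matches
        if merged and op == "M" and merged[-1][1] == "M":
            merged[-1][0] += int(length)  # extend the previous M run
        else:
            merged.append([int(length), op])
    return "".join(f"{n}{op}" for n, op in merged)
```

For example, `collapse_match_ops("5=1X4=2I3=")` gives `"10M2I3M"`, a form that code written for “M”-only CIGAR strings can read.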

Continue Reading [MonashBioinformaticsPlatform/RSeQC] junction_saturation not suit for bam/sam file generated by minimap or pbmm2

Error in Rsubread featureCounts

Hi there, Excellent package! I am using it to do RNA-seq. But I encountered a small problem when using featureCounts(). The code is as follows: featureCounts( "A1.raw_1.fastq.gz.subjunc.BAM", annot.inbuilt = NULL, annot.ext = "GCF_015227675.2_mRatBN7.2_genomic.gtf", isGTFAnnotationFile=TRUE, isPairedEnd=TRUE, nthreads = 8 ) And it returns this: ========== _____ _ _ ____ _____ ______…

Continue Reading Error in Rsubread featureCounts

Z2039 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status YF003382 Finland (Länsi-Suomen lääni) I-Z2040* —— Hg19 .BAM FTDNA (Y500) 47X, 13.3 Mbp, 165 bp YF067917 Ireland I-FGC69701* —— Hg19 .BAM Dante Labs 9X, 22.9 Mbp, 151 bp YF078735 Belarus (Vicebskaja voblasc’) / Polish I-FGC69702 —— Hg38 .VCF…

Continue Reading Z2039 – YFull YTree Info

How to separate true positive alignments from a given SAM file

Hi @FadelBerakdar, Indeed, you can get true positive and false positive alignments in output. You have to specify the files where this information will be stored under the files section of a given software output. The output format is SAM files without headers. The name given in parameter is just…

Continue Reading How to separate true positive alignments from a given SAM file

BY7447 – YFull YTree Info

E-BY7447 – YFull YTree Info SNPs currently defining E-BY7447 BY7447     Sample ID Country / Language Info Ref File Testing company Statistics Status YF075635 Yemen (Al Bayḑā’) E-FT183181 —— Hg38 .BAM FTDNA (Y700) 39X, 18.2 Mbp, 151 bp YF067501 Yemen (Şan’ā’) E-FT183181 —— Hg38 .BAM FTDNA (Y700) 44X, 18.8 Mbp,…

Continue Reading BY7447 – YFull YTree Info

samtools – How to Sort and Index a SAM file without converting it to BAM?

Not only will you save disk space by converting to BAM, but BAM files are faster to manipulate than SAM. Source: Dave Tang’s SAMTools wiki. sort supports uncompressed SAM format from a file or stdin, though index requires BGZIP-compressed SAM or BAM. I don’t think you can get around this….
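Since the excerpt notes that indexing needs BGZIP-compressed input, a quick way to see whether a file qualifies is to inspect its first bytes. The sketch below checks for a gzip header that carries the “BC” extra subfield used by BGZF blocks; the function name `is_bgzf` is hypothetical, and the check only looks at the first block.

```python
import struct


def is_bgzf(path):
    """Return True if the file at `path` starts with a BGZF block header.

    Hypothetical helper: a BGZF block is a gzip member with the FEXTRA flag
    set and an extra subfield identified by the bytes 'B', 'C' of length 2.
    """
    with open(path, "rb") as fh:
        header = fh.read(12)
        if len(header) < 12:
            return False
        id1, id2, method, flags = header[0], header[1], header[2], header[3]
        # gzip magic (1f 8b), deflate method (8), FEXTRA flag bit (4) set
        if (id1, id2, method) != (0x1F, 0x8B, 8) or not flags & 4:
            return False
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = fh.read(xlen)
    pos = 0
    while pos + 4 <= len(extra):  # walk the extra subfields
        si1, si2 = extra[pos], extra[pos + 1]
        (slen,) = struct.unpack("<H", extra[pos + 2:pos + 4])
        if (si1, si2) == (ord("B"), ord("C")) and slen == 2:
            return True
        pos += 4 + slen
    return False
```

A plain-text SAM file or an ordinary gzip file returns `False` here, which matches the point made above: such a file has to be converted or recompressed before it can be indexed.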

Continue Reading samtools – How to Sort and Index a SAM file without converting it to BAM?

Profiling and functional characterization of maternal mRNA translation during mouse maternal-to-zygotic transition

INTRODUCTION Mammalian life starts with the fusion of two terminally differentiated gametes, sperm and oocyte, resulting in a totipotent zygote. After going through preimplantation development, the zygote reaches blastocyst before implantation. The two most important events taking place during preimplantation development are zygotic genome activation (ZGA) and the first cell…

Continue Reading Profiling and functional characterization of maternal mRNA translation during mouse maternal-to-zygotic transition

Bioconductor on Microsoft Azure – Microsoft Tech Community

Co-authored by: Nitesh Turaga – Scientist at Dana Farber/Harvard, Bioconductor Core Team Erdal Cosgun – Sr. Data Scientist at Microsoft Biomedical Platforms and Genomics team Vincent Carey – Professor at Harvard Medical School, Bioconductor Core Team   Introduction   The Bioconductor project promotes the statistical analysis and comprehension of current and emerging…

Continue Reading Bioconductor on Microsoft Azure – Microsoft Tech Community

DF109 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status YF016926 Ireland R-DF109 R-DF109*, R-A18726* Hg38 .BAM FTDNA (Y500) 27X, 12.7 Mbp, 165 bp YF016394 United States (Ohio) R-DF109 R-DF109*, R-A18726* Hg38 .BAM FTDNA (Y500) 34X, 11.9 Mbp, 151 bp YF011566 Ireland (Mayo) R-DF109 R-DF109*, R-A18726*, R-FGC23742* Hg38…

Continue Reading DF109 – YFull YTree Info

GATK HaplotypeCaller with interval list

I am trying to use the -L option of GATK HaplotypeCaller to call SNPs and short InDels within an interval list. The content of my interval list file (top8snp.interval_list) is as follows: 12 33029845 33030845 + rs24767598 13 40586682 40587682 + rs24748362 18 24373857 24374857 + rs8856159 21 50381146 50382146 +…

Continue Reading GATK HaplotypeCaller with interval list

UMItools dedup deduplication taking too much time + RAM

I have some RNAseq data from miRNAs that I have processed with Bowtie2 (aligning to miRBase). Now, when doing the deduplication with umi_tools dedup I find that some of the files take a lot of time+RAM to finish (some files take around 3-4 minutes and 4-5GB of RAM and some…

Continue Reading UMItools dedup deduplication taking too much time + RAM

ZP77 – YFull YTree Info

R-ZP77 – YFull YTree Info SNPs currently defining R-ZP77 ZP77 / FGC6562     Sample ID Country / Language Info Ref File Testing company Statistics Status YF008362 —— R-ZP77* —— Hg19 .BAM FTDNA (Y500) 41X, 13.8 Mbp, 165 bp YF067652 Unknown R-BY40744 —— Hg38 .BAM FTDNA (Y700) 36X, 18.7 Mbp, 151…

Continue Reading ZP77 – YFull YTree Info

Petabase-scale sequence alignment catalyses viral discovery

Serratus alignment architecture Serratus (v0.3.0) (github.com/ababaian/serratus) is an open-source cloud-infrastructure designed for ultra-high-throughput sequence alignment against a query sequence or pangenome (Extended Data Fig. 1). Serratus compute costs are dependent on search parameters (expanded discussion available: github.com/ababaian/serratus/wiki/pangenome_design). The nucleotide vertebrate viral pangenome search (bowtie2, database size: 79.8 MB) reached processing rates…

Continue Reading Petabase-scale sequence alignment catalyses viral discovery

Efficiently merge two BAM files while retaining reads from only one file in overlapping regions

I have a WGS BAM file that is fairly large (>150GB) and a smaller BAM file (<5GB) with reads in a small 10Mbp region. I want to (efficiently) merge the two BAM files while…

Continue Reading Efficiently merge two BAM files while retaining reads from only one file in overlapping regions

variant – Error running gatk HaplotypeCaller with allele specific annotations

I’ve got HaplotypeCaller working nicely in standard mode, like so: # Run HaplotypeCaller gatk --java-options "-Xmx4g" HaplotypeCaller --intervals "$INTERVALS" -R "$REF" -I "$OUT"/results/alignment/${SN}_sorted_marked_recalibrated.bam -O "$OUT"/results/variants/${SN}_g.vcf.gz -ERC GVCF But when I try in allele-specific mode, I get the following error. All I’ve done is add the -G annotations at the end,…

Continue Reading variant – Error running gatk HaplotypeCaller with allele specific annotations

Read bam/cram file with IGV from aws s3

Hi all, We store our alignment files on aws s3. I would like to be able to open them with IGV without needing to download them completely, but I can’t find an optimal solution. If I get a pre-signed url it works but it’s not convenient. I try to follow…

Continue Reading Read bam/cram file with IGV from aws s3

Samtools flagstat confusing result of a merged bam file

Hi, I am a bioinformatics student and I am struggling with an issue. I had paired-end fastq files for one sample with some low-quality bases at the end and adapter contamination, so I trimmed my reads with trimmomatic. It gave me 4 files that I used for…

Continue Reading Samtools flagstat confusing result of a merged bam file

Ubuntu Manpage: samtools reheader – replaces the header in the input file

Provided by: samtools_1.13-2_amd64 NAME samtools reheader – replaces the header in the input file SYNOPSIS samtools reheader [-iP] [-c CMD | in.header.sam ] in.bam DESCRIPTION Replace the header in in.bam with the header in in.header.sam. This command is much faster than replacing the header with a BAM→SAM→BAM conversion. By default…

Continue Reading Ubuntu Manpage: samtools reheader – replaces the header in the input file

Unable to convert from sam to bam file.

samtools view -S -b BD143_TGACCA_L005.sam -o BD143_TGACCA_L005.bam When I run this command, the following error appears: [main_samview] fail to read the header from "BD143_TGACCA_L005.sam". If anyone knows how to fix this error, thanks. converting File…

Continue Reading Unable to convert from sam to bam file.

samtools sort

I am transforming SAM files to BAM. To facilitate their ordering I use this command: % cd /Volumes/GENOMA/BWA % samtools sort -n -O V350019555_L03_B5GHUMqcnrRAABA-551.sam | samtools fixmate -m -O bam V350019555_L03_B5GHUMqcnrRAABA-551.bam but it gives me the following error: As elsewhere in samtools, use '-' as the filename…

Continue Reading samtools sort

[SOLVED] changing the order of input changes samtools merge output

I realized that this is a stupid mistake I have made. Since samtools does not overwrite files by default, the output that I got from samtools merge output.bam f2.bam f1.bam wasn’t what I thought it was. Below is my original post ++++++++++++++++++++++++++ I’m using samtool/1.9.0 and I’m trying to…

Continue Reading [SOLVED] changing the order of input changes samtools merge output

Estimating individual mtDNA haplotypes in mixed DNA samples by combining MinION and MiSeq

doi: 10.1007/s00414-021-02763-0. Online ahead of print. Affiliations Expand Affiliations 1 Department of Forensic Medicine, Juntendo University School of Medicine, 2-1-1, Hongo, Bunkyo-Ku, Tokyo, 113-8421, Japan. hnakani@juntendo.ac.jp. 2 Department of Forensic Medicine, Saitama Medical University, 38 Morohongo, Moroyama, Saitama, 350-0495, Japan. 3 Department of Forensic Medicine, Juntendo University School of Medicine,…

Continue Reading Estimating individual mtDNA haplotypes in mixed DNA samples by combining MinION and MiSeq

Issue running MACS3

I am having issues running MACS3. I installed MACS3 using: wget github.com/macs3-project/MACS/archive/refs/tags/v3.0.0a6.tar.gz tar -xf v3.0.0a6.tar.gz chmod a+rwx MACS-3.0.0a6/bin/macs3 It appears to be installed correctly because the following code generates the predictd help window: MACS-3.0.0a6/bin/macs3 predictd --help However, when I try running the actual code I get the following error: MACS-3.0.0a6/bin/macs3…

Continue Reading Issue running MACS3

mergue bam itv

I am trying to create a combined BAM file containing all the reads, but it gives me an error when loading. In a Mac text editor, I enter the path of the three files, and save it with the extension bam.list. I introduce HARD…

Continue Reading mergue bam itv

Bwa on multiple processor

Hi Guys, When I try to run bwa mem on multiple processors, I get the following error: > mpirun -np 16 bwa mem hg19-agilent.fasta R1.fastq R2.fastq | samtools sort -o aln.bam [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read…

Continue Reading Bwa on multiple processor

processing in strelka2 with multiples bam file in directory

If I manually tell strelka2 to use these three bam files below, then I get the desired results of 3 individual genome files in results/variants. xxx_00.bam yyy_01.bam zzz_02.bam ${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py --bam xxx_00.bam --bam yyy_01.bam --bam zzz_02 --referenceFasta <fasta> --callRegions <.bed.gz> --runDir…

Continue Reading processing in strelka2 with multiples bam file in directory

Aligning multiple single and paired-end reads from multiple files (lanes)

Hello, I am new to bioinformatics and looking for some help. I have 27 files from an Illumina output. There are 4 paired end and 23 single read files. I am trying to align them using Rsubread in…

Continue Reading Aligning multiple single and paired-end reads from multiple files (lanes)

Samtools flagstat

I aligned my ONT sequencing run with minimap2. Subsequently I filtered the file using samtools view -b -F 256 aln_transcriptome_sorted_6.bam -o filtered_aln_transcriptome_6.bam to end up with primary alignments only. When I run samtools flagstat on the filtered file I get the following output: 3502608 + 0 in…
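The -F 256 option used above drops every record whose FLAG has bit 0x100 (secondary alignment) set. The sketch below shows that bitwise test in Python; the constant and function names are hypothetical and only mirror the exclusion logic, not samtools itself.

```python
# SAM FLAG bits relevant to the filter discussed above.
SECONDARY = 0x100      # 256: secondary alignment
SUPPLEMENTARY = 0x800  # 2048: supplementary alignment


def passes_exclude_filter(flag, exclude):
    """Return True if a record with this FLAG survives an exclusion mask.

    Hypothetical helper mirroring `-F`: a record is dropped when any bit
    of `exclude` is set in `flag`.
    """
    return (flag & exclude) == 0


# A secondary alignment is removed by the mask 256 ...
secondary_kept = passes_exclude_filter(SECONDARY, 256)
# ... but a supplementary alignment is not, because 0x800 is a different bit.
supplementary_kept = passes_exclude_filter(SUPPLEMENTARY, 256)
# Excluding both bits (256 + 2048 = 2304) removes it as well.
supplementary_kept_2304 = passes_exclude_filter(SUPPLEMENTARY, SECONDARY | SUPPLEMENTARY)
```

Here `secondary_kept` is `False`, `supplementary_kept` is `True`, and `supplementary_kept_2304` is `False`: records flagged as supplementary remain in a file filtered only with -F 256.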

Continue Reading Samtools flagstat