Tag: BAM
Ubuntu Manpage: alleleCounts.pl – Generate tab separated file with allelic counts and depth for each
Provided by: liballelecount-perl_4.2.1-1_all NAME alleleCounts.pl – Generate tab separated file with allelic counts and depth for each specified locus. SYNOPSIS Where possible use the C version for large data (it’s also more configurable). alleleCounts.pl Required: -bam -b BAM/CRAM file (expects co-located index) – if CRAM see ‘-ref’ -output -o Output…
Getting the best of RNA-Seq
This is not a banal discussion. I am facing some problems with the analysis of DE genes in mouse. Most methods of analysis of DE genes must face two considerations or challenges. The first needs to take into consideration the existence and the different…
Ubuntu Manpage: bamfillquery – fill query sequences into BAM files
Provided by: biobambam2_2.0.179+ds-1_amd64 NAME bamfillquery – fill query sequences into BAM files SYNOPSIS bamfillquery [options] <in.bam queries.fasta >out.bam DESCRIPTION bamfillquery reads a SAM/BAM/CRAM file and a FastA file, copies the sequences found in the FastA file into the query sequence field of the SAM/BAM/CRAM file and writes the resulting data…
Ubuntu Manpage: samtools targetcut – cut fosmid regions (for fosmid pool only)
Provided by: samtools_1.13-2_amd64 NAME samtools targetcut – cut fosmid regions (for fosmid pool only) SYNOPSIS samtools targetcut [-Q minBaseQ] [-i inPenalty] [-0 em0] [-1 em1] [-2 em2] [-f ref] in.bam DESCRIPTION This command identifies target regions by examining the continuity of read depth, computes haploid consensus sequences of targets and…
Fast way to sort bam file by queryname similar to picard SortSam SORT_ORDER=queryname?
When sorting by queryname with Samtools (samtools sort -n), Samtools does a natural sort by colon-delimited subfield. On the other hand, when sorting by queryname with Picard (picard SortSam SORT_ORDER=queryname), Picard does not sort by colon-delimited subfield,…
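To make the idea of a "natural sort by colon-delimited subfield" concrete, here is a minimal Python sketch. It is an illustration only: the read names are made up, `natural_key` is a hypothetical helper, and this is not samtools' actual comparator. It is contrasted with plain string ordering, without claiming that plain string ordering is what Picard does.

```python
def natural_key(name):
    """Build a sort key that compares digit-only subfields as integers."""
    key = []
    for field in name.split(":"):
        # Tag each subfield so integers and strings are never compared directly:
        # (0, int) always sorts before (1, str).
        key.append((0, int(field)) if field.isdigit() else (1, field))
    return key


# Hypothetical read names that differ only in one numeric subfield.
names = ["run:1:10:5", "run:1:9:5", "run:1:100:5"]

natural = sorted(names, key=natural_key)  # numeric subfields ordered 9 < 10 < 100
lexical = sorted(names)                   # plain character-by-character ordering

print(natural)
print(lexical)
```

The two orderings differ as soon as a numeric subfield changes its number of digits, which is why two tools can disagree on whether a file is "queryname sorted".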
YP5260 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status I7021 Mongolia (Bulgan) C-F15910 C-F15910*, C-Y507 Hg19 .BAM Ancient 3X, 20.2 Mbp, 40 bp NEO249 Russia (Chukotskiy avtonomnyy okrug) C-F15910* —— Hg19 .BAM Ancient 1X, 7.2 Mbp, 81 bp I11696 Mongolia (Bulgan) C-Y507 —— Hg19 .BAM Ancient 2X,…
BAM – Job openings – Data Scientist for additive manufacturing (m/f/d) in the field of business informatics, computer science, software development, bioinformatics, engineering, data management, physics, data engineering or comparable
Section S.3 – eScience To strengthen our team in the division “eScience” in Berlin-Steglitz, starting as soon as possible, we are looking for a Data Scientist for additive manufacturing (m/f/d) in the field of business informatics, computer science, software development, bioinformatics, engineering, data management, physics, data engineering or comparable Salary…
BY3 – YFull YTree Info
J-BY3 – YFull YTree Info SNPs currently defining J-BY3 BY3 / FGC15184 Sample ID Country / Language Info Ref File Testing company Statistics Status YF016315 —— J-FGC15174 J-FGC15174*, J-FGC15168*, J-FT258574 Hg38 .BAM FTDNA (Y500) 23X, 12.0 Mbp, 151 bp YF068400 Sudan (Janūb Kurdufān) J-FGC38453* —— Hg38 .BAM FTDNA (Y700)…
Allelic expression imbalance of PIK3CA mutations is frequent in breast cancer and prognostically significant
Subjects Normal breast and tumor samples were obtained with the written informed consent from donors and appropriate approval from local ethical committees, with the detailed information described in the respective original publications: normal tissue9, METABRIC14, TCGA35. Differential allelic expression analysis DNA and total RNA from 64 samples of normal breast…
can gff2 reference used in htseq-count?
Dear all We are recently working with E.coli plasmid and tried to summarize the gene counts from our RNA-Seq samples. The short reads were mapped to E.coli plasmid using tophat which generated bam files accordingly. However, we were unable to obtain a gff3 version of our target plasmid genome, the…
Extract R1 and R2 from sam file generated by bowtie2
Hi everyone, how do I extract R1 and R2 from a SAM file generated by bowtie2?
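One way to think about this question is in terms of the SAM FLAG field: bit 0x40 marks the first read of a pair (R1) and bit 0x80 marks the last (R2). The sketch below is a plain-Python illustration that only separates read names by those bits; `split_mates` and the two example records are hypothetical, and it does not rebuild FASTQ files.

```python
FIRST_IN_PAIR = 0x40   # SAM FLAG bit: first segment in the template (R1)
SECOND_IN_PAIR = 0x80  # SAM FLAG bit: last segment in the template (R2)


def split_mates(sam_lines):
    """Return (r1_names, r2_names) from an iterable of SAM text lines."""
    r1, r2 = [], []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is FLAG
        if flag & FIRST_IN_PAIR:
            r1.append(fields[0])
        elif flag & SECOND_IN_PAIR:
            r2.append(fields[0])
    return r1, r2


# Made-up records: flag 99 has bit 0x40 set, flag 147 has bit 0x80 set.
sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "readA\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "readA\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII",
]
print(split_mates(sam))
```

For real data, `samtools fastq` with its `-1` and `-2` output options is likely the more practical route, since it also restores the original read orientation.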
YP3952 – YFull YTree Info
Q-YP3952 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF073154 Russia (Chechenskaya Respublika) / Chechen Q-YP3952* —— Hg38 .BAM FTDNA (Y700) 33X, 18.2 Mbp, 151 bp YF092378 Russia (Chechenskaya Respublika) / Chechen Q-BZ87 —— Hg38 .BAM FTDNA (Y700) 55X, 18.5 Mbp, 151…
linux merge multiple files in picard
Why not use samtools? for folder in my_bam_folders/*; do samtools merge $folder.bam $folder/*.bam; done In general, samtools merge can merge all the bam files in a given directory like this: samtools merge merged.bam *.bam EDIT: If samtools isn’t an option and you have to use Picard, what about something like…
a strange pattern of repetitive summits
Problem with the output of Deeptools PlotProfile: a strange pattern of repetitive summits 0 Hi! I am trying to plot DNA binding profiles of my ChIP-seq bw files using Deeptools plotProfile. I generated the matrix using the computeMatrix reference-point. I used some publicly available bed files as my regions of…
GeneActivity without Fragments file in Seurat for Integrating scRNA-seq and scATAC-seq
Hi all, I am new to R and Seurat, and I am following Seurat tutorials to find anchors between RNA-seq and ATAC-seq data according to: Combining the two tutorials is difficult for a cell line data set I am using for SNARE-seq Human here. I managed to run the following…
Read counts an order of magnitude higher on one chromosome
Hi, I am having an issue with a sequencing run: when demultiplexed, aligned, and filtered, each individual has 1-2 million reads, but these reads are predominantly on one chromosome. For background, these are Oncorhynchus mykiss and O. clarki…
Detailed differences between sambamba and samtools
In March, my first post in the new student group, “Do false-positive mutations appear because duplicates are not marked thoroughly enough?”, described the problem that supplementary reads are not marked as duplicates by GATK MarkDuplicates. Afterwards, in response to this question, I began…
BAM file and no RNAME or POS information? : bioinformatics
Newbie here. Please, play nice. I got possession of a set of 4 .bam files that stores the exome of an individual, around 400 MB each. I used samtools to generate a 2.4 GB .sam file out of one of the .bam files, and I found it contains lines with…
How to regress out age and sex using limma removeBatchEffect
I have a protein expression data frame with a metadata data frame which includes age and sex: nph_csf_metadata = age sex bam tau 70 f 5 2 75 m 6 1 72 m 4 1 71 f 4 2…
Using featureCounts and downloading Rsubread
I am trying to perform a count per gene analysis using featureCounts in R. I have downloaded the gtf file and edited it within R to only contain the gene ID, chr, start, end, and strand,…
Parse a file of strings in python separated by newline into a json array
I don’t see where you’re actually reading from the file in the first place. You have to actually read your path_text.txt before you can format it correctly, right? with open('path_text.txt', 'r', encoding='utf-8') as myfile: content = myfile.read().splitlines() Which will give you ['/gp/oi/eu/gatk/inputs/NA12878_24RG_med.hg38.bam', '/gp/oi/eu/gatk/inputs/NA12878_24RG_small.hg38.bam'] in content. Now if you want to write this…
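As a rough sketch of the step this answer is heading towards, the snippet below turns newline-separated text into a JSON array with the standard `json` module. The helper name `lines_to_json_array` and the sample paths are placeholders for illustration, not part of the original answer.

```python
import json


def lines_to_json_array(text):
    """Convert newline-separated text into a JSON array string."""
    # splitlines() drops the newline characters; blank lines are skipped.
    paths = [line.strip() for line in text.splitlines() if line.strip()]
    return json.dumps(paths, indent=2)


# Placeholder content standing in for what myfile.read() would return.
sample = "/data/sample_a.bam\n/data/sample_b.bam\n"
print(lines_to_json_array(sample))
```

Writing the result to disk would then be a matter of opening an output file and writing the returned string, or calling `json.dump` on the list directly.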
Is it correct to use Tophat2 directly followed by Cuffquant to only align to the reference transcriptomes without wishing to assemble new transcripts?
Hi, friends. I only want to perform differential expression analysis on the annotated transcripts of my existing reference genome. I use tophat2 for alignment with --no-novel-juncs…
Z697 – YFull YTree Info
R-Z697 – YFull YTree Info SNPs currently defining R-Z697 Z697 Sample ID Country / Language Info Ref File Testing company Statistics Status YF009397 Sweden (Västra Götalands län) R-Z697* —— Hg19 .BAM FTDNA (Y500) 81X, 14.4 Mbp, 165 bp YF084333 Italy (Chieti) R-FT285492 —— Hg38 .BAM Dante Labs 14X, 23.4…
Y140591 – YFull YTree Info
R-Y140591 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF067865 Germany R-Y140591* —— Hg38 .BAM FTDNA (Y700) 52X, 18.7 Mbp, 151 bp YF076495 Germany R-FT167842 —— Hg38 .BAM FTDNA (Y700) 49X, 18.3 Mbp, 151 bp YF067633 Germany R-FT167842 —— Hg38 .BAM FTDNA…
Annotated file with gene ID (instead of gene symbol)
Hello, I am using “featureCounts” in the Rsubread package for analyzing bulk RNA-seq of Drosophila. Since there are no inbuilt annotations for Drosophila, I am using a gtf file from the homepage of…
sequencing – Interpreting ‘samtools mpileup’ output for multiple inputs
I would like to calculate sequencing coverage for a WGS project. Both long and short reads. I’ve used samtools as following: samtools mpileup -Q 1 -aa illumina_sorted.bam nanopore_sorted.bam > depth.txt Previously, when I used samtools depth instead, I only had the columns I was interested in (chromosome name / base…
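For orientation when reading multi-input `samtools mpileup` output: after the three leading columns (chromosome, position, reference base), each input file contributes a group of three columns (depth, read bases, base qualities). The sketch below pulls out only the per-file depths. It assumes the default column layout with no extra output options enabled, and both `depths_per_input` and the example line are made up for illustration.

```python
def depths_per_input(mpileup_line, n_inputs):
    """Return (chrom, pos, [depth per input file]) from one mpileup line."""
    fields = mpileup_line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    # 0-based indices 3, 6, 9, ... hold the depth column of each input file.
    depths = [int(fields[3 + 3 * i]) for i in range(n_inputs)]
    return chrom, pos, depths


# Made-up line for two inputs: depth 3 in the first file, depth 1 in the second.
line = "chr1\t100\tN\t3\t..,\tIII\t1\t.\tI"
print(depths_per_input(line, 2))
```

Keeping only these columns gives a table shaped like the one `samtools depth` produces for several input files.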
CTS1346 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status HGDP01351 China, People’s Republic of O-F3607* —— Hg38 .BAM Scientific 16X, 23.6 Mbp, 151 bp YF079316 —— O-Y224790 —— Hg19 .BAM 23mofang 58X, 21.3 Mbp, 150 bp HG00583 China, People’s Republic of O-Y224790 —— Hg19 .BAM Scientific ——…
Split merged Bam file without replacement
Hi guys, I have 5 bam (ChIPseq PE data sorted by position) files that came from 5 different murine cortexes (mice that belong to the same group, so biological replicates), however I have a lot of group variability. I’m thinking to merge all…
snp – Reference variant detected as altered one in bam file
I received (from manufacturer) several .bam files and I used four callers (samtools, freebayes, haplotypecaller, deepvariant) to find some sequence variants. In obtained .vcf files, I took a closer look to some calls. I found interesting, homozygous one rs477033 (C/G Ref/Alt) with flag ‘COMMON=0’ and very low MAF. I also…
A114 – YFull YTree Info
R-A114 – YFull YTree Info SNPs currently defining R-A114 FGC78244 A114(H) H Sample ID Country / Language Info Ref File Testing company Statistics Status YF067576 France (Ille-et-Vilaine) R-A114* —— Hg19 .BAM Dante Labs 12X, 23.0 Mbp, 151 bp YF088360 United States (Virginia) R-CTS4466* —— Hg38 .BAM FTDNA (Y700)…
HTseq-Count: Long processing time
Hi everyone, I’m processing BAM files using htseq-count and it takes a very long time to produce the counts for each file. These are paired-end reads (around 50 million sequences each). It takes 75 minutes to count this pair; is that normal? Thanks. htseq-count --max-reads-in-buffer=24000000000…
The low successful assignment ratio of FeatureCounts
Hello, I would like to confirm if the low assignment ratio (54%) is normal, and please check the possible reason I found. I used Hisat2 to assign paired-end strand-specific transcriptomic sequences (rRNA removed) to a reference genome. Because I filtered out the unmapped sequences in advance, the overall assignment ratio…
long run-time and low CPU usage
I’m trying to run Pindel on some 30x Illumina WGS data. I aligned reads with BWA-MEM, then sorted by co-ordinates and indexed them with Samtools. I also tried filtering the bam files with samtools -F 0x800 as suggested by another post. I…
different result using minimap2 and pbmm2
Hi all! I am analysing CCS PacBio data and each sample came from a different run; in particular, I have three files for each sample. I tested both pbmm2 and minimap2 to align my long reads, after getting the consensus sequences. This is the command I used to run minimap2: minimap2…
pjotrp/sambamba – sambamba – Genenetwork
Filtering bam file based on depth determined through samtools depth
Hi All, I have a bam file and I calculated read depth using samtools depth and I now want to filter the bam file to have only the contigs that have a depth within a certain range. I was…
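One possible approach to this kind of filtering is to summarise the `samtools depth` table per contig first and then select contigs by their mean depth. The sketch below assumes single-BAM `samtools depth` output (tab-separated contig, position, depth) and averages only over the positions that are reported. `contigs_in_depth_range` and the example rows are hypothetical.

```python
from collections import defaultdict


def contigs_in_depth_range(depth_lines, min_depth, max_depth):
    """Return sorted contig names whose mean reported depth lies in [min, max]."""
    total = defaultdict(int)   # summed depth per contig
    count = defaultdict(int)   # number of reported positions per contig
    for line in depth_lines:
        if not line.strip():
            continue
        chrom, _pos, depth = line.rstrip("\n").split("\t")[:3]
        total[chrom] += int(depth)
        count[chrom] += 1
    return sorted(
        c for c in total if min_depth <= total[c] / count[c] <= max_depth
    )


# Made-up depth rows: ctgA averages 15, ctgB 100, ctgC 2.
depth_lines = ["ctgA\t1\t10", "ctgA\t2\t20", "ctgB\t1\t100", "ctgC\t1\t2"]
print(contigs_in_depth_range(depth_lines, 5, 50))
```

The selected names could then be passed as regions to `samtools view -b` on an indexed BAM file to write out only those contigs.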
Use RSEM and Bowtie2 to align paired-end sequences
I want to use rsem-calculate-expression and bowtie2 aligner to align paired-end sequence based on the following conditions: 2 processors; generate BAM file; very fast bowtie2 sensitivity; append gene/transcript name. My code: rsem-refseq-extract-primary-assembly GCF_000001405.31_GRCh38.p5_genomic.fna GCF_000001405.31_GRCh38.p5_genomic.primary_assembly.fna rsem-prepare-reference --gff3 GCF_000001405.31_GRCh38.p5_genomic.gff --bowtie2 --bowtie2-path /bowtie2-2.4.5-py39hd2f7db1_2 --trusted-sources…
Mapping back 3 sets of reads/sample with minimap2
I used FaQC to qc my raw fastqs before assembling. That program (and perhaps others) outputs properly paired Forward and Reverse fastqs, as well as an unpaired fastq file for each sample. I used all 3 for each single-sample assembly. Since minimap2 only allows for 2 query files,…
F13864 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status ERS5240131 Singapore C-F13864* —— Hg19 .BAM Scientific 7X, 22.9 Mbp, 150 bp YF076683 China, People’s Republic of (Shandong) C-F13864* —— Hg19 .BAM 23mofang 57X, 21.2 Mbp, 150 bp YF071813 —— C-F13864* —— Hg19 .BAM 23mofang 21X, 21.8 Mbp,…
L1193 – YFull YTree Info
I-L1193 – YFull YTree Info SNPs currently defining I-L1193 L1193 FGC87558 Y72031 Sample ID Country / Language Info Ref File Testing company Statistics Status ASH1 Ireland (Tipperary) I-L1193* —— Hg19 .BAM Ancient 1X, 10.5 Mbp, 101 bp PB581 Ireland (Clare) I-L1193* —— Hg19 .BAM Ancient 2X, 15.8…
3 -tag XM” failed! when running rsem-calculate-expression
Dear sir, When I ran “rsem-calculate-expression --paired-end --alignments -p 8input.bam” gencodev22 ./out. I got the error message rsem-parse-alignments ../bowtie2/hg38 ./rsem-out.temp/rsem-out ./rsem-out.stat/rsem-out /NGS_Storage/Debbie/RNA-seq/variant_calling_20210602/RNA-leukemia002A-906.para.bam 3 -tag XM Read A00355:209:H3KTLDSX2:2:2606:24677:17425: The adjacent two lines do not represent the two mates of a paired-end read! (RSEM assumes the two mates of a paired-end read should…
How to edit a SAM file using pysam
Dear all – I have a template sam file and I want to change one of the columns (template_length) and replace it with a new value. The new value is a quick mathematical operation. template sam file: @HD VN:1.0 SO:unsorted @SQ…
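The question asks about pysam, but the underlying operation can be illustrated without it: in SAM text, the template length (TLEN) is the ninth tab-separated column of each alignment record. The following is a plain-Python sketch rather than a pysam solution; `rewrite_tlen`, the example record, and the doubling operation are all stand-ins for whatever the real calculation is.

```python
def rewrite_tlen(sam_line, transform):
    """Return the SAM line with TLEN (column 9) replaced by transform(TLEN)."""
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line  # header lines are passed through unchanged
    fields = line.split("\t")
    fields[8] = str(transform(int(fields[8])))  # 0-based index 8 is TLEN
    return "\t".join(fields)


# Made-up record with TLEN = 104; the "operation" here simply doubles it.
record = "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII"
print(rewrite_tlen(record, lambda tlen: tlen * 2))
```

With pysam, the same idea would mean assigning to the `template_length` attribute of each read before writing it to the output file.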
Y18411 – YFull YTree Info
J-Y18411 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF072520 Albania J-BY111710 —— Hg19 .BAM Dante Labs 10X, 22.8 Mbp, 151 bp YF067307 Palestine (Nablus) J-BY111710 —— Hg38 .BAM FTDNA (Y700) 34X, 18.7 Mbp, 151 bp NA20827 Italy (Firenze) J-CTS3330 —— Hg19…
htseq-count error
Hi, htseq-count -f bam -s yes ~/htseq-trial/SRR13826419_Aligned.sortedByName.out.bam ~refgen/gencode.v39.primary_assembly.annotation.gtf > counts.txt I am trying to run htseq-count with the command above, but the err file contains: [E::idx_find_and_load] Could not retrieve index file for ‘~/htseq-trial/SRR13826419_Aligned.sortedByName.out.bam’ 100000 GFF lines processed. 200000 GFF lines processed. 300000 GFF lines processed. 400000 GFF lines…
BioInformatics Product Manager at Helix (remote)
You + Helix Helix is a place where innovators and doers gather in order to drive significant progress in population genomics. We have come together to work at the intersection of clinical care, research, and genomics. If you’re excited by the idea of making a meaningful impact and joining a…
rna seq – RNAseq SNP discovery: deciding upon filters and dealing with allele expression bias
I am working with non-model plant RNA samples which have been deep sequenced and analysed using STAR aligner under default parameters. Aim We would like to conduct SNP discovery of these samples. Objective Our ultimate goal with this genotypic data is to search for variants (both SNPs and indels)…
human genome files
Hi all, I have two questions. What is the main difference between the two genome files (Homo_sapiens.GRCh38.dna.primary_assembly.fa and Homo_sapiens.GRCh38.dna.fa) located in the Ensembl database? Which one should I use for whole-exome sequence alignment? I used Homo_sapiens.GRCh38.dna.fa for the alignment, and later on,…
Why did I achieve shorter than initial reads subset after aligned reads extraction.
Hello dear colleagues! I have recently faced a problem. I have worked with long WGS reads. Firstly I have filtered the longest subset of reads, and aligned them to the custom sequence with several structural variants…
Low transcript quantification with Salmon using GRCm39 annotations
Hi everyone, first time working with mouse samples and unfortunately, there are fewer resources available for the latest mouse Ensembl genome than I was expecting. What I’ve done: I performed rRNA depletion on total RNA extracted from mouse tissue and created Illumina libraries using a cDNA synthesis kit with random…
How can I find genes located in the same region (overlapping) of the chromosome ?
I take the BAM file as input and perform RNA-Seq. The program prints out a list of genes to which the reads match. Some of the genes in the list overlapping in the same…
M8498 – YFull YTree Info
B-M8498 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF004283 Saudi Arabia B-M8498* —— Hg19 .BAM FTDNA (Y500) 43X, 13.7 Mbp, 165 bp HGDP00992 Namibia B-M7650* —— Hg38 .BAM Scientific 18X, 23.5 Mbp, 151 bp YF013963 —— B-Y82361 —— Hg38 .BAM FTDNA…
Ubuntu Manpage: sambamba-view – tool for extracting information from SAM/BAM files
Provided by: sambamba_0.8.2+dfsg-2_amd64 NAME sambamba-view – tool for extracting information from SAM/BAM files SYNOPSIS sambamba view OPTIONS <input.bam | input.sam> [region1 […]] DESCRIPTION sambamba view allows to efficiently filter SAM/BAM files for alignments satisfying various conditions, as well as access its SAM header and information about reference sequences. In order…
FGC15109 – YFull YTree Info
I-FGC15109 – YFull YTree Info SNPs currently defining I-FGC15109 FGC15109 Sample ID Country / Language Info Ref File Testing company Statistics Status SZ43 Hungary (Somogy) I-BY138* —— Hg19 .BAM Ancient 8X, 22.8 Mbp, 32 bp YF010533 —— I-BY138* —— Hg19 .BAM FTDNA (Y500) 73X, 14.9 Mbp, 165 bp YF019250…
Does anyone know how to get the headers for a bam.tdf file converted to a bedgraph file?
I followed this thread: “Conversion from tdf to bed format”. Converted like this: igvtools tdftobedgraph file.tdf file.bedgraph. Now I have a bedgraph without headers but I have no idea what the last…
bam – Detect mutation context in a read of a sam file
That kind of custom fiddling with reads and variants is very cumbersome, non-standard and also error-prone. Do a standard variant calling pipeline and then filter for the mutations that you want. Then extract the variant position (so the coordinates) and get the variant context from the reference genome. Using individual…
BTG2 gene predicts poor outcome in PT-DLBCL
Introduction Primary testicular diffuse large B-cell lymphoma (PT-DLBCL) is a rare and aggressive form of mature B-cell lymphoma.1–3 PT-DLBCL was the most common type of testicular tumor in men aged over 60 and characterized by painless uni- or bilateral testicular masses with infrequent constitutional symptoms.4–6 PT-DLBCL shows significant extranodal tropism,…
samtools – Potential side effects of replacing read group tags in BAM file
I have a set of BAM files where the read group tags have some (default?) values, i.e.: @RG ID:RG0 LB:LB0 PU:PU0 SM:SM0 This creates issues in my downstream analyses, where multiple BAM files with the same SM tag are used. Samtools provides a command to replace the read group tag….
Htseq is giving me 0 counts using the GFF3 of miRBase
Hello! I am trying to annotate a miRNA-seq so that it gives me mature miRNAs where I already have 5p and 3p. For this, I have used the index mm10.fa and the miRBase mmu.gff3. I have aligned with HISAT2 and am trying to count with HTSeq, however I get 0…
sorting – indexing sorted alignment file with samtools index gives “Exec format error”
I am struggling with samtools index. I already did the alignment using “bwa mem reference.fa seq.fastq > alg.sam”. The resulting sam file was converted to bam format using “samtools view -S -h -b alg.sam > alg.bam”. Next, the files were sorted by using “sort -h alg.bam >sorted.bam”. And now we…
FGC19851 – YFull YTree Info
R-FGC19851 – YFull YTree Info SNPs currently defining R-FGC19851 FGC19851 Sample ID Country / Language Info Ref File Testing company Statistics Status YF072967 United States (Georgia) R-FGC19851* —— Hg38 .BAM FTDNA (Y700) 34X, 18.7 Mbp, 151 bp YF009427 —— R-FGC65264* —— Hg19 .BAM FTDNA (Y500) 38X, 12.8 Mbp, 165…
FGC35106 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status YF016938 Saudi Arabia (Ar Riyāḍ) J-FGC35106 YF081770 | J-FGC35106*, J-FGC58682* Hg38 .BAM FTDNA (Y500) 30X, 11.5 Mbp, 151 bp YF016937 Saudi Arabia (Ar Riyāḍ) J-FGC35106 YF081769 | J-FGC35106*, J-FGC58682* Hg38 .BAM FTDNA (Y500) 37X, 12.5 Mbp, 151 bp…
bam – samtools view command not found error
When I tried to use samtools to split a bam file based on different chromosomes, I used this command: samtools view input.bam -b chr21 | chr21.bam However, I get error messages like this: -bash: chr21.bam: command not found [W::hts_idx_load3] The index file is older than the data file: input.bam.bai How…
YP4024 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status ERS2478532 Turkmenistan Q-YP4024* —— Hg19 .BAM Scientific 17X, 16.7 Mbp, 151 bp YF006625 Russia (Tomskaya oblast’) / Selkup Q-YP4024* —— Hg19 .BAM FTDNA (Y500) 67X, 14.8 Mbp, 165 bp DA162 Russia (Severnaya Osetiya-Alaniya, Respublika) Q-BZ5214* —— Hg19 .BAM…
samtools markdup
I’m deduplicating reads in a merged bam file, and I get this error. What is going on? What is the solution? (base) javier@iMac-de-JAVIER BWA % samtools markdup -r -S 1merged.bam 2merged.bam [tmp_file] Error: tmp file write data failed. [markdup] error: writing temp output failed. [E::bgzf_close] File…
Butterfly eyespots evolved via cooption of an ancestral gene-regulatory network that also patterns antennae, legs, and wings
Although the hypothesis of gene-regulatory network (GRN) cooption is a plausible model to explain the origin of morphological novelties (1), there has been limited empirical evidence to show that this mechanism led to the origin of any novel trait. Several hypotheses have been proposed for the origin of butterfly eyespots,…
Y570 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status AF2 —— Q-Y570 Q-Y570*, Q-F746* Hg19 .BAM Ancient 1X, 1.3 Mbp, 94 bp YF093124 —— Q-M120* —— Hg38 .BAM Nebula Genomics 57X, 23.6 Mbp, 150 bp Kolyma1 Russia (Sakha, Respublika [Yakutiya]) Q-Y222276* —— Hg19 .BAM Ancient 7X, 13.4…
Samtools sort creates many BAM and bugs terminal : bioinformatics
Hello, when entering the command : > samtools sort input.bam -o input_sorted.bam The terminal looks like it is busy so I let it run. Coming back several hours later, the terminal is now displaying random shifting characters like something is still going on, but visibly not right : Bugged terminal…
sam – Use Htslib to create auxiliary tags in bam file C++
I am creating a threaded C++ file where I generate in silico bam files, using header, DNA sequence and read information. First I use bam_init1() to create the bam1_t structure just named “b”. Then I use bam_set1 to create the actual sequence entry in the bam file: bam_set1(b,read_id_length,READ_ID,flag,chr_idx,min_beg,mapq,n_cigar,cigar,-1,-1,0,strlen(DNAsequence),DNAsequence,quality_string,l_aux). And finally…
Processing two lists of files with snakemake
I want to use snakemake to do bowtie2 mapping of split read files to a reference genome, and I’d like that rule to be integrated in the general workflow. For that purpose, I first defined a rule to create a bowtie index rule build_bowtie_index: input: referenceGenomeFasta output: expand(“{name}.{index}.bt2”, index=range(1,5), name…
PF6747 – YFull YTree Info
E-PF6747 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF010216 Azerbaijan (Qəbələ) E-PF6747* —— Hg19 .BAM FTDNA (Y500) 50X, 13.7 Mbp, 165 bp YF064736 Egypt (Al Minūfīyah) E-FT97857* —— Hg38 .BAM FTDNA (Y700) 35X, 18.5 Mbp, 151 bp YF093064 Yemen (Tā’izz) E-Y280593…
java – GATK: HaplotypeCaller IntelPairHmm only detecting 1 thread
I can’t seem to get GATK to recognise the number of available threads. I am running GATK (4.2.4.1) in a conda environment which is part of a nextflow (v20.10.0) pipeline I’m writing. For whatever reason, I cannot get GATK to see there is more than one thread. I’ve tried different…
Find Transposon Element insertions using long reads (nanopore), by alignment directly. (minimap2)
find_te_ins is designed to find Transposon Element (TE) insertions using long reads (nanopore), by alignment directly. (minimap2) Install $ git clone github.com/bakerwm/find_te_ins.git $ cd find_te_ins Change the following variables upon your condition: genome_fa and te_fa in line-10 and line-11; $ bash run_pipe.sh run_pipe.sh Prerequisite minimap2 – 2.17-r974-dirty, align long…
[MonashBioinformaticsPlatform/RSeQC] junction_saturation not suit for bam/sam file generated by minimap or pbmm2
The CIGAR strings in bam/sam files generated by minimap2 contain “=”, representing a match with the reference, and “X”, representing a mismatch with the reference, whereas bam_cigar.py in ./lib/qcmodule/bam_cigar.py only suits bam/sam files generated by aligners such as BWA/bowtie, whose CIGAR strings contain only “M”, representing either a match or a mismatch. So I modified the bam_cigar.py 77…
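To illustrate the relationship between the extended “=”/“X” operators and the classic “M” operator, here is a small stand-alone sketch that rewrites a CIGAR string so that “=” and “X” become “M” and adjacent identical operations are merged. This is not the modification made to bam_cigar.py in the issue (which is not shown here); `collapse_to_m` is a hypothetical helper.

```python
import re

# One CIGAR operation: a length followed by a single operator character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def collapse_to_m(cigar):
    """Rewrite '=' and 'X' as 'M', merging runs of the same operator."""
    if cigar == "*":
        return cigar  # '*' means the CIGAR is unavailable; leave it alone
    merged = []
    for length, op in CIGAR_OP.findall(cigar):
        if op in "=X":
            op = "M"
        if merged and merged[-1][1] == op:
            merged[-1][0] += int(length)  # extend the previous run
        else:
            merged.append([int(length), op])
    return "".join(f"{n}{op}" for n, op in merged)


print(collapse_to_m("10=1X5=2I3="))
```

Converting the strings up front like this is one option; the alternative is to teach the consuming code to treat “=” and “X” the same way it treats “M”.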
Error in Rsubread featureCounts
Hi there, Excellent package! I am using it to do RNA-seq. But I encountered a small problem when using featureCounts(). The code is as follows: featureCounts( “A1.raw_1.fastq.gz.subjunc.BAM”, annot.inbuilt = NULL, annot.ext = “GCF_015227675.2_mRatBN7.2_genomic.gtf”, isGTFAnnotationFile=TRUE, isPairedEnd=TRUE, nthreads = 8 ) And it returns this: ========== _____ _ _ ____ _____ ______…
Z2039 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status YF003382 Finland (Länsi-Suomen lääni) I-Z2040* —— Hg19 .BAM FTDNA (Y500) 47X, 13.3 Mbp, 165 bp YF067917 Ireland I-FGC69701* —— Hg19 .BAM Dante Labs 9X, 22.9 Mbp, 151 bp YF078735 Belarus (Vicebskaja voblasc’) / Polish I-FGC69702 —— Hg38 .VCF…
How to separate true positive alignments from a given SAM file
Hi @FadelBerakdar, Indeed, you can get true positive and false positive alignments in output. You have to specify the files where this information will be stored under the files section of a given software output. The output format is SAM files without headers. The name given in parameter is just…
BY7447 – YFull YTree Info
E-BY7447 – YFull YTree Info SNPs currently defining E-BY7447 BY7447 Sample ID Country / Language Info Ref File Testing company Statistics Status YF075635 Yemen (Al Bayḑā’) E-FT183181 —— Hg38 .BAM FTDNA (Y700) 39X, 18.2 Mbp, 151 bp YF067501 Yemen (Şan’ā’) E-FT183181 —— Hg38 .BAM FTDNA (Y700) 44X, 18.8 Mbp,…
samtools – How to Sort and Index a SAM file without converting it to BAM?
Not only will you save disk space by converting to BAM, but BAM files are faster to manipulate than SAM. Source: Dave Tang’s SAMTools wiki. sort supports uncompressed SAM format from a file or stdin, though index requires BGZIP-compressed SAM or BAM. I don’t think you can get around this….
Profiling and functional characterization of maternal mRNA translation during mouse maternal-to-zygotic transition
INTRODUCTION Mammalian life starts with the fusion of two terminally differentiated gametes, sperm and oocyte, resulting in a totipotent zygote. After going through preimplantation development, the zygote reaches blastocyst before implantation. The two most important events taking place during preimplantation development are zygotic genome activation (ZGA) and the first cell…
Bioconductor on Microsoft Azure – Microsoft Tech Community
Co-authored by: Nitesh Turaga – Scientist at Dana Farber/Harvard, Bioconductor Core Team Erdal Cosgun – Sr. Data Scientist at Microsoft Biomedical Platforms and Genomics team Vincent Carey – Professor at Harvard Medical School, Bioconductor Core Team Introduction The Bioconductor project promotes the statistical analysis and comprehension of current and emerging…
DF109 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status YF016926 Ireland R-DF109 R-DF109*, R-A18726* Hg38 .BAM FTDNA (Y500) 27X, 12.7 Mbp, 165 bp YF016394 United States (Ohio) R-DF109 R-DF109*, R-A18726* Hg38 .BAM FTDNA (Y500) 34X, 11.9 Mbp, 151 bp YF011566 Ireland (Mayo) R-DF109 R-DF109*, R-A18726*, R-FGC23742* Hg38…
GATK HaplotypeCaller with interval list
I am trying to use the -L option of GATK HaplotypeCaller to call SNPs and short InDels within an interval list. My interval list file (top8snp.interval_list) content is as follows: 12 33029845 33030845 + rs24767598 13 40586682 40587682 + rs24748362 18 24373857 24374857 + rs8856159 21 50381146 50382146 +…
UMItools dedup deduplication taking too much time + RAM
I have some RNAseq data from miRNAs that I have processed with Bowtie2 (aligning to miRBase). Now, when doing the deduplication with umi_tools dedup I find that some of the files take a lot of time+RAM to finish (some files take around 3-4 minutes and 4-5GB of RAM and some…
ZP77 – YFull YTree Info
R-ZP77 – YFull YTree Info SNPs currently defining R-ZP77 ZP77 / FGC6562 Sample ID Country / Language Info Ref File Testing company Statistics Status YF008362 —— R-ZP77* —— Hg19 .BAM FTDNA (Y500) 41X, 13.8 Mbp, 165 bp YF067652 Unknown R-BY40744 —— Hg38 .BAM FTDNA (Y700) 36X, 18.7 Mbp, 151…
Petabase-scale sequence alignment catalyses viral discovery
Serratus alignment architecture Serratus (v0.3.0) (github.com/ababaian/serratus) is an open-source cloud-infrastructure designed for ultra-high-throughput sequence alignment against a query sequence or pangenome (Extended Data Fig. 1). Serratus compute costs are dependent on search parameters (expanded discussion available: github.com/ababaian/serratus/wiki/pangenome_design). The nucleotide vertebrate viral pangenome search (bowtie2, database size: 79.8 MB) reached processing rates…
Efficiently merge two BAM files while retaining reads from only one file in overlapping regions
I have a WGS BAM file that is fairly large (>150GB) and a smaller BAM file (<5GB) with reads in a small 10Mbp region. I want to (efficiently) merge the two BAM files while…
variant – Error running gatk HaplotypeCaller with allele specific annotations
I’ve got HaplotypeCaller working nicely in standard mode, like so: # Run HaplotypeCaller gatk --java-options "-Xmx4g" HaplotypeCaller --intervals "$INTERVALS" -R "$REF" -I "$OUT"/results/alignment/${SN}_sorted_marked_recalibrated.bam -O "$OUT"/results/variants/${SN}_g.vcf.gz -ERC GVCF But when I try in allele-specific mode, I get the following error. All I’ve done is add the -G annotations at the end,…
Read bam/cram file with IGV from aws s3
Hi all, We store our alignment files on AWS S3. I would like to be able to open them with IGV without needing to download them completely, but I can’t find an optimal solution. If I get a pre-signed URL it works, but it’s not convenient. I try to follow…
Samtools flagstat confusing result of a merged bam file
Hi, I am a bioinformatics student and I am struggling with an issue. I had paired-end FASTQ files for one sample with some low-quality bases at the end and adapter contamination, so I trimmed my reads with Trimmomatic. It gave me 4 files that I used for…
Ubuntu Manpage: samtools reheader – replaces the header in the input file
Provided by: samtools_1.13-2_amd64 NAME samtools reheader – replaces the header in the input file SYNOPSIS samtools reheader [-iP] [-c CMD | in.header.sam ] in.bam DESCRIPTION Replace the header in in.bam with the header in in.header.sam. This command is much faster than replacing the header with a BAM→SAM→BAM conversion. By default…
Unable to convert from sam to bam file.
samtools view -S -b BD143_TGACCA_L005.sam -o BD143_TGACCA_L005.bam When I run this command, the following error appears: [main_samview] fail to read the header from "BD143_TGACCA_L005.sam". Does anyone know how to fix this error? Thanks…
samtools sort
I am converting SAM files to BAM. To sort them I use this command: % cd /Volumes/GENOMA/BWA % samtools sort -n -O V350019555_L03_B5GHUMqcnrRAABA-551.sam | samtools fixmate -m -O bam V350019555_L03_B5GHUMqcnrRAABA-551.bam but it gives me the following error: As elsewhere in samtools, use '-' as the filename…
[SOLVED] changing the order of input changes samtools merge ouput
I realized that this is a stupid mistake I have made. Since samtools does not overwrite files by default, the output that I got from samtools merge output.bam f2.bam f1.bam wasn’t what I thought it was. Below is my original post: ++++++++++++++++++++++++++ I’m using samtool/1.9.0 and I’m trying to…
Estimating individual mtDNA haplotypes in mixed DNA samples by combining MinION and MiSeq
doi: 10.1007/s00414-021-02763-0. Online ahead of print. Affiliations 1 Department of Forensic Medicine, Juntendo University School of Medicine, 2-1-1, Hongo, Bunkyo-Ku, Tokyo, 113-8421, Japan. hnakani@juntendo.ac.jp. 2 Department of Forensic Medicine, Saitama Medical University, 38 Morohongo, Moroyama, Saitama, 350-0495, Japan. 3 Department of Forensic Medicine, Juntendo University School of Medicine,…
Issue running MACS3
I am having issues running MACS3. I installed MACS3 using: wget github.com/macs3-project/MACS/archive/refs/tags/v3.0.0a6.tar.gz tar -xf v3.0.0a6.tar.gz chmod a+rwx MACS-3.0.0a6/bin/macs3 It appears to be installed correctly because the following code generates the predictd help window: MACS-3.0.0a6/bin/macs3 predictd --help However, when I try running the actual code I get the following error: MACS-3.0.0a6/bin/macs3…
mergue bam itv
I am trying to create a combined BAM file containing all the reads, but it gives me an error when loading. In a Mac text editor, I enter the paths of the three files and save the file with the extension bam.list. I introduce HARD…
Bwa on multiple processor
Hi Guys, When I try to run bwa mem on multiple processors, I get the following error: > mpirun -np 16 bwa mem hg19-agilent.fasta R1.fastq R2.fastq | samtools sort -o aln.bam [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read…
processing in strelka2 with multiples bam file in directory
If I manually tell strelka2 to use the three BAM files below, then I get the desired result of 3 individual genome files in results/variants. xxx_00.bam yyy_01.bam zzz_02.bam ${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py --bam xxx_00.bam --bam yyy_01.bam --bam zzz_02 --referenceFasta <fasta> --callRegions <.bed.gz> --runDir…
Aligning multiple single and paired-end reads from multiple files (lanes)
Hello, I am new to bioinformatics and looking for some help. I have 27 files from an Illumina output. There are 4 paired end and 23 single read files. I am trying to align them using Rsubread in…
Samtools flagstat
I aligned my ONT sequencing run with minimap2. Subsequently, I filtered the file using samtools view -b -F 256 aln_transcriptome_sorted_6.bam -o filtered_aln_transcriptome_6.bam to end up with primary alignments only. When I run samtools flagstat on the filtered file I get the following output: 3502608 + 0 in…
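One point worth keeping in mind with this kind of filter: `-F 256` excludes only reads whose FLAG has the secondary bit (0x100) set, so supplementary alignments (0x800) are not removed by it. The following is a minimal Python sketch of that bitmask logic, not a call into samtools; the FLAG values in `example_flags` are made up for illustration and `passes_exclude_filter` is a hypothetical helper name.

```python
# Bit values from the FLAG field of the SAM specification.
SECONDARY = 0x100      # 256: secondary alignment
SUPPLEMENTARY = 0x800  # 2048: supplementary alignment


def passes_exclude_filter(flag: int, exclude_mask: int) -> bool:
    """Mimic the -F MASK rule: keep a read only if none of the mask bits are set."""
    return (flag & exclude_mask) == 0


# Hypothetical FLAG values, chosen only to illustrate the filter.
example_flags = {
    "primary_forward": 0,
    "primary_reverse": 16,
    "secondary": 256,
    "supplementary": 2048,
    "supplementary_reverse": 2064,
}

# Equivalent of -F 256: only the secondary bit is excluded.
kept_f256 = [name for name, flag in example_flags.items()
             if passes_exclude_filter(flag, SECONDARY)]

# Equivalent of -F 2304 (256 + 2048): secondary and supplementary are excluded.
kept_f2304 = [name for name, flag in example_flags.items()
              if passes_exclude_filter(flag, SECONDARY | SUPPLEMENTARY)]

print("kept with -F 256: ", kept_f256)
print("kept with -F 2304:", kept_f2304)
```

If the goal is one record per read, combining both bits in the mask (2304) is the usual way to express "neither secondary nor supplementary".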