Tag: BWA
Read counts an order of magnitude higher on one chromosome
Hi, I am having an issue with a sequencing run: after demultiplexing, alignment, and filtering, each individual has 1-2 million reads, but these reads are predominantly on one chromosome. For background, these are Oncorhynchus mykiss and O. clarki…
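A quick way to confirm that reads really pile up on one chromosome is to count mapped reads per reference sequence. For an indexed BAM, samtools idxstats reports per-reference mapped counts. The sketch below does the same arithmetic in plain Python on SAM text, assuming one SAM record per line and counting each read once (unmapped, secondary and supplementary records are skipped). The function names are illustrative, not from any library.

```python
from collections import Counter

# SAM FLAG bits (values from the SAM format specification).
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def reads_per_contig(sam_lines):
    """Count mapped, primary alignments per reference sequence (RNAME column)."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # count each read at most once, and only if it mapped
        counts[rname] += 1
    return counts


def fraction_per_contig(counts):
    """Turn raw counts into fractions of all counted reads."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}
```

For example, passing an open SAM file handle (file name up to you) to reads_per_contig and then to fraction_per_contig gives each chromosome's share of the reads, which makes a single dominant chromosome easy to spot.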
Detailed differences between sambamba and samtools
In March, my first post in the new student group, “False-positive mutations caused by insufficient duplicate marking?”, told the story of supplementary reads not being marked as duplicates by GATK MarkDuplicates. Afterwards, in response to this question, I began…
Genetic and chemotherapeutic influences on germline hypermutation
DNM filtering in 100,000 Genomes Project We analysed DNMs called in 13,949 parent–offspring trios from 12,609 families from the rare disease programme of the 100,000 Genomes Project. The rare disease cohort includes individuals with a wide array of diseases, including neurodevelopmental disorders, cardiovascular disorders, renal and urinary tract disorders, ophthalmological…
On a reference pan-genome model (Part II)
On 12 July 2019, I wrote a blog post on a potential reference pan-genome model. I had more thoughts in mind but didn’t write about them because they were immature. Nonetheless, a few readers raised questions related to these immature thoughts, so I decided to add this “Part II” as…
Postdoctoral Research Fellow in Bioinformatics/Computational Biology
Posted: 27-Apr-22. Location: Boston, Massachusetts. Salary: Open. Categories: Staff/Administrative. Internal Number: 2022-27118. Located in Boston and the surrounding communities, Dana-Farber Cancer Institute brings together world-renowned clinicians, innovative researchers and dedicated professionals, allies in the common mission of conquering cancer, HIV/AIDS and related diseases. Combining extremely talented people with…
Bioinformatics Analyst II – Remote in Danville, PA for Geisinger
Posted: 22-Apr-22. Location: Danville, Pennsylvania. Type: Full Time. Salary: Open. Categories: Operations. Job Summary: Primary accountability is to leverage the organization’s data assets, exome sequencing data (>180,000 individuals) from the MyCode Community Health Initiative, to improve quality and efficiency and to generate knowledge specifically in the field of bioinformatics within health research…
long run-time and low CPU usage
I’m trying to run Pindel on some 30x Illumina WGS data. I aligned reads with BWA-MEM, then sorted them by coordinate and indexed them with Samtools. I also tried filtering the BAM files with samtools view -F 0x800, as suggested by another post. I…
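The -F option of samtools view drops any record that has at least one of the given FLAG bits set, so -F 0x800 (decimal 2048, the same mask as -F 2048 in the “Sam file is not written” post) removes supplementary alignments, the extra records written for the other parts of split or chimeric reads. This minimal sketch decodes FLAG values and mirrors that exclusion test; the bit meanings follow the SAM specification, but the Python names are made up for illustration.

```python
# Meaning of each SAM FLAG bit, following the SAM format specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """List the names of all bits set in a FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def kept_by_exclude_mask(flag, exclude_mask):
    """Mirror the logic of 'samtools view -F MASK': drop a record if ANY masked bit is set."""
    return (flag & exclude_mask) == 0
```

For instance, FLAG 2064 decodes to reverse_strand plus supplementary, so -F 0x800 would remove that record while keeping an ordinary FLAG 99 read.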
FastQC per base sequence content
I’m running FastQC on some paired-end FASTQ files. I have a warning on per-base sequence content, as the first 5 to 6 bases show significant bias towards T and G. I was wondering what the sequence in the first 5 or…
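FastQC’s per-base sequence content plot shows the fraction of each base at each read position. A minimal sketch of that calculation for the first few positions, assuming uncompressed four-line FASTQ records (FastQC itself may bin later positions, so its numbers can differ slightly); the function names are illustrative:

```python
from collections import Counter


def per_base_composition(sequences, n_positions=10):
    """Fraction of A/C/G/T/N at each of the first n_positions read positions."""
    counts = [Counter() for _ in range(n_positions)]
    for seq in sequences:
        for i, base in enumerate(seq[:n_positions].upper()):
            counts[i][base] += 1
    table = []
    for c in counts:
        total = sum(c.values())
        table.append({b: (c[b] / total if total else 0.0) for b in "ACGTN"})
    return table


def read_fastq_sequences(lines):
    """Yield the sequence line (2nd line of every 4-line record) from FASTQ text."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()
```

Printing the table for the first 10 positions of a file makes it easy to see whether the T/G bias stops after position 5 or 6.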
Bioinformatics Pipeline Development Engineer II at Personalis, Inc
Personalis, Inc. is a leader in advanced cancer genomics for enabling the next generation of precision cancer therapies and diagnostics. The Personalis NeXT Platform® is designed to adapt to the complex and evolving understanding of cancer, providing its biopharmaceutical customers and clinicians with information on all of the approximately 20,000 human genes,…
Sam file is not written
Dear all, the log file reports the following: [08-02 01:26:25] Running Step 2: BWA … bwa_wrap /work/pathology/s206442/dbet_project/hg19/hg19.fa Output3/out_1.valid.fastq 6 Output3/out_1.valid.sam 0 Running BWA on trimmed reads … bwa mem -t 6 /work/pathology/s206442/dbet_project/hg19/hg19.fa Output3/out_1.valid.fastq | samtools view -h -F 2048 - > Output3/out_1.valid.sam However, the SAM file size is…
Building custom hg38 – alt contigs
I am exploring modifications of hg38 like these (github.com/mebbert/Dark_and_Camouflaged_genes). Starting from the regular bcbio hg38 data installation, I masked hg38.fa using bedtools maskfasta and generated indexes using bcbio_setup_genome.py for seq and bwa, as described in the manual. The bwa directory then contains bwa/hg38_masked.fa.amb, bwa/hg38_masked.fa.ann, …
Color hiring Software Engineer, Bioinformatics in Remote
About Color Color’s mission is to help people lead the healthiest lives that science and medicine can offer. We launched in April 2015 with a simple, affordable genetic test to help people understand their risk for hereditary cancer. In 2017, we added coverage for hereditary heart conditions. Between them, cancer…
Frontiers | Machine Learning and Deep Learning Applications in Metagenomic Taxonomy and Functional Annotation
Introduction The study of the microbial environments has benefited from the sequencing revolution, where technology improvement decreased the DNA sequencing cost and increased the number of sequenced nucleic bases. For approximately 20 years (depending on how we define the term metagenomics), it has allowed the decryption of the microbial composition…
BTG2 gene predicts poor outcome in PT-DLBCL
Introduction Primary testicular diffuse large B-cell lymphoma (PT-DLBCL) is a rare and aggressive form of mature B-cell lymphoma.1–3 PT-DLBCL is the most common type of testicular tumor in men aged over 60 and is characterized by painless uni- or bilateral testicular masses with infrequent constitutional symptoms.4–6 PT-DLBCL shows significant extranodal tropism,…
sorting – indexing sorted alignment file with samtools index gives “Exec format error”
I am struggling with samtools index. I already did the alignment using “bwa mem reference.fa seq.fastq > alg.sam”. The resulting sam file was converted to bam format using “samtools view -S -h -b alg.sam > alg.bam”. Next, the files were sorted by using “sort -h alg.bam >sorted.bam”. And now we…
bwa-mem2/mm2-fast: Accelerated version of minimap2; up to 1.8x faster
HRJOB7442 Bioinformatics Scientist 2 (Various Locations) in Nether Alderley, Macclesfield (SK10) | Almac Group (Uk) Ltd
Bioinformatics Scientist 2. Hours: 37.5 hours per week. Salary: Competitive. Ref No: HRJOB7442. Business Unit: Diagnostic Services. Location: Craigavon or Manchester. Open To: Internal and External Applicants. The Company: Almac Diagnostic Services is a leading stratified medicine business, specialising in biomarker-driven clinical trials. We are incredibly proud to be involved…
samtools markdup
I’m deduplicating reads in a merged BAM file, and I get this error. What is going on? What is the solution? (base) javier@iMac-de-JAVIER BWA % samtools markdup -r -S 1merged.bam 2merged.bam [tmp_file] Error: tmp file write data failed. [markdup] error: writing temp output failed. [E::bgzf_close] File…
nf-core/circrna
circRNA quantification, differential expression analysis and miRNA target prediction of RNA-Seq data Introduction nf-core/circrna is a best-practice analysis pipeline for the quantification, miRNA target prediction and differential expression analysis of circular RNAs in paired-end RNA sequencing data. The pipeline is built using Nextflow, a workflow tool to run tasks across…
Cell Strain-Derived Induced Pluripotent Stem Cells as an Isogenic Approach To Investigate Age-Related Host Response to Flaviviral Infection
INTRODUCTION Dengue is the most common mosquito-borne viral disease globally (1). This acute disease, which can be life-threatening, is caused by four different dengue viruses (DENVs) (DENV-1, DENV-2, DENV-3, and DENV-4). An estimated 390 million people are infected with these DENVs annually (2), and populations throughout the tropics face frequent…
[MonashBioinformaticsPlatform/RSeQC] junction_saturation not suitable for BAM/SAM files generated by minimap or pbmm2
This is because the CIGAR strings in BAM/SAM files generated by minimap2 contain “=”, meaning a match to the reference, and “X”, meaning a mismatch, while ./lib/qcmodule/bam_cigar.py only handles BAM/SAM files from aligners such as BWA/Bowtie, whose CIGAR strings contain only “M” (match or mismatch). So I modified bam_cigar.py 77…
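The post does not show the author’s patch. A minimal alternative, assuming it is acceptable to lose the match/mismatch distinction, is to rewrite =/X operations as M before the CIGAR reaches M-only code. The helper names below are hypothetical and are not part of RSeQC:

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Make sure the regex consumed the whole string (catches malformed CIGARs).
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def collapse_eq_x(cigar):
    """Rewrite '=' (match) and 'X' (mismatch) as 'M' and merge neighbouring M runs,
    giving the M-only style that BWA/Bowtie-oriented code expects."""
    if cigar == "*":
        return cigar  # CIGAR unavailable (e.g. unmapped read)
    merged = []
    for n, op in parse_cigar(cigar):
        op = "M" if op in "=X" else op
        if merged and op == "M" and merged[-1][1] == "M":
            merged[-1] = (merged[-1][0] + n, "M")
        else:
            merged.append((n, op))
    return "".join(f"{n}{op}" for n, op in merged)
```

For example, 5S10=1X20= becomes 5S31M, which counts the same reference-consuming bases as the original.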
bwa: 2 FASTQ files to 1 SAM
I have this problem, please help me. I’m trying it on macOS Catalina. I am creating a SAM file from 2 FASTQ files using bwa, with the following command: bwa mem -t 2 GRCh38.primary_assembly.genome.fa.gz V350019555_L03_B5GHUMqcnrRAABA-556_1.fq.gz V350019555_L03_B5GHUMqcnrRAABA-556_2.fq.gz > V350019555_L03_B5GHUMqcnrRAABA-556.sam…
Senior Bioinformatics Software Developer – Bethesda
Medical Science & Computing, (MSC), a Dovel company, is seeking skilled Senior Bioinformatics Software Developers to join our team supporting our client, NCBI at the National Institutes of Health, (NIH) in Bethesda, MD. The National Center for Biotechnology Information (NCBI) is part of the National Library of Medicine (NLM) at…
Samtools flagstat confusing result of a merged bam file
Hi, I am a bioinformatics student and I am struggling with an issue. I had paired-end FASTQ files for one sample with some low-quality bases at the ends and adapter contamination, so I trimmed my reads with Trimmomatic. It gave me 4 files that I used for…
Failure to detect mutations in U2AF1 due to changes in the GRCh38 reference sequence
Materials and Methods Genomic data were collected as part of the MDS Natural History Study or The Cancer Genome Atlas project and consented appropriately under those protocols (8)…
samtools sort
I am converting SAM files to BAM; to sort them, I use this command: % cd /Volumes/GENOMA/BWA % samtools sort -n -O V350019555_L03_B5GHUMqcnrRAABA-551.sam | samtools fixmate -m -O bam V350019555_L03_B5GHUMqcnrRAABA-551.bam but it gives me the following error: As elsewhere in samtools, use '-' as the filename…
BWA on multiple processors
Hi guys, when I try to run bwa mem on multiple processors, I get this error: > mpirun -np 16 bwa mem hg19-agilent.fasta R1.fastq R2.fastq | samtools sort -o aln.bam [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read…
Alignment report
Hi guys, I aligned R1 and R2 FASTQ files to a reference genome using bwa mem and got a BAM file. Now I want to check whether the alignment was done correctly, and the alignment percentage, coverage, etc. I ran the following command: bwa mem hg19.fasta R1.fastq R2.fastq | samtools…
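samtools flagstat reports mapping counts directly. The sketch below only illustrates where an alignment percentage comes from: each input read is counted once by skipping secondary (0x100) and supplementary (0x800) records, then the share not flagged unmapped (0x4) is reported. It assumes SAM text input and is not a replacement for samtools flagstat or samtools stats; the function name is illustrative.

```python
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def alignment_summary(sam_lines):
    """Return (primary_records, mapped_primary, percent_mapped) for SAM text lines."""
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # header or blank line
        flag = int(line.split("\t", 2)[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra records for reads that are already counted once
        total += 1
        if not (flag & UNMAPPED):
            mapped += 1
    percent = 100.0 * mapped / total if total else 0.0
    return total, mapped, percent
```

Coverage is a separate question (it needs per-position depth), which tools such as samtools depth or samtools coverage address.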
sequence alignment – MarkDuplicatesSpark failing with cryptic error message. MarkDuplicates succeeds
I have been trying to follow the GATK Best Practice Workflow for ‘Data pre-processing for variant discovery’ (gatk.broadinstitute.org/hc/en-us/articles/360035535912). This has all been run on Windows Subsystem for Linux 2 on the Bash shell. I started off with FASTQ files from IGSR (www.internationalgenome.org/data-portal) and performed alignment with Bowtie2 (instead of…
Systems biology analysis of human genomes points to key pathways conferring spina bifida risk
Significance Genetic investigations of most structural birth defects, including spina bifida (SB), congenital heart disease, and craniofacial anomalies, have been underpowered for genome-wide association studies because of their rarity, genetic heterogeneity, incomplete penetrance, and environmental influences. Our systems biology strategy to investigate SB predisposition controls for population stratification and avoids…
Benchmarking the NVIDIA Clara Parabricks germline pipeline on AWS
This blog post was contributed by Ankit Sethia, PhD, and Timothy Harkins, PhD, at NVIDIA Parabricks, and Olivia Choudhury, PhD, Sujaya Srinivasan, and Aniket Deshpande at AWS. This blog provides an overview of NVIDIA’s Clara Parabricks along with a guide on how to use Parabricks within the AWS Marketplace. It…
Towards the biogeography of prokaryotic genes
Attempting to generate a bam.bai file but the output is not readable
Hi, I am new to exome sequencing and have tried to follow tutorials on the subject. I am stuck at the samtools index stage because the output files are in a non-human-readable format and I believe…
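A .bai index is binary by design, so unreadable output in a text viewer is expected and not by itself a sign of failure. It is read by samtools and genome browsers such as IGV, and samtools idxstats prints a human-readable summary of an indexed BAM. As a minimal check, assuming the magic bytes from the SAM/BAM specification (BAI\1 at the start of a BAI file, the gzip/BGZF header 1f 8b at the start of a BAM), a file can be classified from its first four bytes; the function names are illustrative:

```python
def classify_header(head):
    """Guess a file type from its first bytes (magic numbers per the SAM/BAM specification)."""
    if head[:4] == b"BAI\x01":
        return "BAI index (binary, uncompressed)"
    if head[:2] == b"\x1f\x8b":
        return "gzip/BGZF-compressed (e.g. BAM or .fq.gz)"
    return "not BAI/BGZF (possibly plain text such as SAM)"


def classify_file(path):
    """Read the first four bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return classify_header(fh.read(4))
```

If a sample.bam.bai reports as a BAI index, the indexing step most likely worked.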
Padding out a GVCF file with 1000G exomes to get gatk VariantRecalibrator working with a small sample
I’ve got sequencing data for a small 500 bp amplicon from a few samples. The GATK Best Practices suggest running VariantRecalibrator on the GVCF files I generate. I’m trying to get this working, but I get an error about “Found annotations with zero variances”. Reading the GATK manual and other posts…
Mapping multiples
Hi, I am coming to you for help. I am mapping short- and long-read files with BWA and minimap2. My problem is that I want to write an if statement that would let me choose BWA if I work with short…
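One way to express the choice is an if/else branch that builds the command line. This sketch assumes three hypothetical read-type labels ("short", "ont", "pacbio") and uses bwa mem’s -t option and minimap2’s -ax map-ont / map-pb presets; it only builds the argument list, so you can inspect it before running anything:

```python
def build_alignment_command(reference, reads, read_type, threads=4):
    """Return an argv list: BWA-MEM for short reads, minimap2 for long reads.

    reads: list of FASTQ paths (one file for single-end/long reads, two for paired-end).
    read_type: 'short', 'ont' or 'pacbio' (labels invented for this sketch).
    """
    if read_type == "short":
        return ["bwa", "mem", "-t", str(threads), reference, *reads]
    presets = {"ont": "map-ont", "pacbio": "map-pb"}
    if read_type not in presets:
        raise ValueError(f"unknown read type: {read_type!r}")
    return ["minimap2", "-ax", presets[read_type], "-t", str(threads), reference, *reads]
```

To run it, pass the list to subprocess.run(cmd, stdout=out, check=True), where out is an open SAM output file. How you decide between "short" and "long" (platform metadata, read length) is left to you.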
Strange speed up in GATK LeftAlignIndels
Hi! I noticed a strange thing. I have been running a DNA-seq pipeline like this: reads -> bwa-mem2 -> picard SortSam -> picard MergeSamFiles -> picard MarkDuplicates -> gatk LeftAlignIndels … gatk LeftAlignIndels has always taken around 4 hours to complete with the…
Single-cell DNA and RNA sequencing reveals the dynamics of intra-tumor heterogeneity in a colorectal cancer model | BMC Biology
Organoid culture of small intestinal cells and lentiviral transduction C57BL/6J mice and BALB/cAnu/nu immune-deficient nude mice were purchased from CLEA Japan (Tokyo, Japan). The small intestine was harvested from wild-type male C57BL/6J mice at 3–5 weeks of age (Additional file 1: Figure S9A). Crypts were purified and dissociated into single cells,…
Why are my Nextflow processes not executing in parallel?
I have written a Nextflow script with three processes: the first process takes a pair of FASTQ files and aligns them to the reference genome. It writes the resulting SAM file into the sam channel. The second process takes input from the sam channel and creates a BAM file from it, and writes…
Weird error from BWA and BOWTIE2
Hi community, I recently used BWA and Bowtie2 to align simulated DNA sequencing data to test our sequencing simulator. I got errors from both aligners: BWA: submit.sh: line 48: 6881 Segmentation fault (core dumped) BOWTIE2: terminate called after throwing an instance…
Transposition and duplication of MADS-domain transcription factor genes in annual and perennial Arabis species modulates flowering
Annual and perennial species occur in many plant families. Annual plants and some perennials are monocarpic (flowering once in their life cycle): they flower massively and typically produce many seeds before the whole plant senesces. By contrast, most perennials live for many years, show delayed reproduction, and are…
iCOMIC: a graphical interface-driven bioinformatics pipeline for analyzing cancer omics data
Abstract Despite the tremendous increase in omics data generated by modern sequencing technologies, their analysis can be tricky and often requires substantial expertise in bioinformatics. To address this concern, we have developed a user-friendly pipeline to analyze (cancer) genomic data that takes in raw sequencing data (FASTQ format) as input…
Dissemination of Mycobacterium abscessus via global transmission networks
Dataset construction, cluster identification and definition of DCCs Whole genome sequencing of two collections of isolates from Manchester, UK, and the Netherlands was carried out as previously described2. Briefly, DNA was extracted from colony sweeps of subcultured samples prior to paired-end sequencing on the Illumina HiSeq platform. These samples were…
Genome-wide analysis reveals associations between climate and regional patterns of adaptive divergence and dispersal in American pikas
converting BAM to FASTQ while removing clipping (hard/soft-clipped bases)
Hello, I want to do some analysis, and my raw data are paired-end FASTQ files. So far, I used BWA-MEM to align them into a SAM file, then used samtools to convert it to a BAM file. My next step is…
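If the goal is FASTQ without clipped bases, only soft clips need trimming: hard-clipped bases are never stored in SEQ. A minimal sketch of the per-record trimming, assuming you already have SEQ, QUAL and CIGAR for a record (for example from SAM text); the function name is illustrative. Reads with FLAG 0x10 are stored reverse-complemented, so they would still need reverse-complementing (and QUAL reversing) when written as FASTQ, which samtools fastq does in ordinary conversions.

```python
import re


def strip_soft_clips(seq, qual, cigar):
    """Remove soft-clipped (S) bases from both ends of SEQ and QUAL.

    Hard-clipped (H) bases are never stored in SEQ, so they need no trimming.
    A CIGAR of '*' (no alignment) leaves the read unchanged.
    """
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]
    core = [(n, op) for n, op in ops if op != "H"]  # S ops sit inside any end H ops
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    end = len(seq) - right
    trimmed_qual = qual if qual == "*" else qual[left:end]
    return seq[left:end], trimmed_qual
```

For example, a read AACGTTT with CIGAR 2S3M2S keeps only the three aligned bases, CGT.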
Haplotype divergence supports long-term asexuality in the oribatid mite Oppiella nova
Significance Putatively ancient asexual species pose a challenge to theory because they appear to escape the predicted negative long-term consequences of asexuality. Although long-term asexuality is difficult to demonstrate, specific signatures of haplotype divergence, called the “Meselson effect,” are regarded as strong support for long-term asexuality. Here, we provide evidence…
The sardine run in southeastern Africa is a mass migration into an ecological trap
INTRODUCTION Large-scale annual migrations occur in an extraordinary range of animals, from insects to the great whales. While the driving mechanisms of these migrations are varied and sometimes poorly understood, they often represent a way of optimizing conditions for breeding and adult fitness when these are in conflict. Often, populations…
Align fastq SOLiD data
Hello everyone, I have downloaded some data from the Short Read Archive using the SRA Toolkit. The data are SOLiD data. I have seen people use LifeScope (Life Technologies) to align the reads, as I presume it works for this type of data. But unfortunately,…
bamdst gives error “EOF marker is absent. The input is probably truncated.”
I created a set of BAM files from Pool-seq data using bwa aln, and all of the output files gave the following error when I ran bamdst to get summary statistics on read depth: “EOF marker is…
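The message means the file does not end with the 28-byte empty BGZF block that the SAM/BAM specification uses as an end-of-file marker. This commonly happens when a file was truncated (for example, by a job that stopped early) or was not written as BGZF at all. A minimal sketch to check a file, with the marker bytes taken from the specification; the function name is illustrative:

```python
import os

# The 28-byte empty BGZF block that the SAM/BAM specification defines as the
# end-of-file marker of a BGZF file such as a BAM.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """True if the file ends with the BGZF EOF marker block."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(size - len(BGZF_EOF))
        return fh.read() == BGZF_EOF
```

Files that fail this check are usually best regenerated rather than patched, since the missing tail may also contain alignments.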
High tumor mutation burden and DNA repair gene mutations
Introduction Anaplastic lymphoma kinase (ALK)‑fusion genes represent a small but important part of oncogenic driver mutations in NSCLC, accounting for approximately 3%‑7% of all cases worldwide.1,2 Small molecule tyrosine kinase inhibitors (TKIs) are the standard therapy for ALK-rearranged NSCLC. Crizotinib, a first-generation TKI, is the most widely used targeted drug…
Command-line alternative to Geneious assemble for Sanger sequencing data
I am doing Sanger sequencing of a ~2 kb construct using 4 primer pairs. I get back 4 .ab1 files, each with generally around 1 kb of high-quality sequence, and given the relatively small size of the construct these overlap significantly. The goal is to assemble these 4 sequences into a…
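For a handful of overlapping, high-quality reads, a greedy overlap merge is one simple approach. The sketch below is a toy, not a replacement for a quality-aware assembler: it assumes exact overlaps, reads already trimmed of low-quality ends, and reads from reverse primers already reverse-complemented; min_overlap is an arbitrary illustrative default and the function names are hypothetical.

```python
def overlap_length(left, right, min_overlap=20):
    """Longest n >= min_overlap such that the last n bases of left equal the first n of right."""
    for n in range(min(len(left), len(right)), min_overlap - 1, -1):
        if n > 0 and left.endswith(right[:n]):
            return n
    return 0


def greedy_assemble(reads, min_overlap=20):
    """Repeatedly merge the two sequences with the largest exact suffix/prefix overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap_length(a, b, min_overlap)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining pair overlaps by at least min_overlap
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)] + [merged]
    return contigs
```

A single returned sequence means all reads were joined; several returned sequences point to a gap or to a read that needs reverse-complementing.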
Bioinformatics Support Specialist (Remote) at Agilent Technologies, Inc.
Agilent inspires and supports discoveries that advance the quality of life. We provide life science, diagnostic, and applied market laboratories worldwide with instruments, services, consumables, applications, and expertise. Agilent enables customers to gain the answers and insights they seek so they can do what they do best: improve the world…
High frequency of an otherwise rare phenotype in a small and isolated tiger population
Significance Small and isolated populations have low genetic variation due to founding bottlenecks and genetic drift. Few empirical studies demonstrate visible phenotypic change associated with drift using genetic data in endangered species. We used genomic analyses of a captive tiger pedigree to identify the genetic basis for a rare trait,…
MAPQ (Mapping quality) of 0 for most reads from BWA-MEM2 (with no secondary alignment or other apparent reason)
Hello, I got a very weird output from BWA-MEM2: most of the reads have a mapping quality of 0, even though there is no secondary alignment or anything else suspicious. I received sequencing data as BAM files that had been aligned with Novoalign to hg18. I needed to realign…
Biocept, Inc. hiring Bioinformatics Scientist in San Diego, California, United States
Tasks and Responsibilities Develop and maintain analysis pipelines for next generation sequencing data. Deep dive analysis of targeted and UMI… Required Skills & Experience MS/PhD in bioinformatics, mathematics, computer science, biology, chemistry, or similar, with 3+ years experience in an industrial environment. Experience in clinical diagnostics desired but not required…
Cancer Mutation Detection Depends on Choices at Each Step of Sequencing, Analysis Pipeline
NEW YORK — An international team of researchers has examined how variations in sequencing approaches can influence the ability to accurately detect cancer mutations, providing guidance for the wider community. The team additionally developed a set of reference samples for benchmarking efforts. Next-generation sequencing approaches are increasingly being adopted to…
Assistant Research Professor – Genomics and Bioinformatics job with City of Hope
About City of Hope City of Hope, an innovative biomedical research, treatment and educational institution with over 6000 employees, is dedicated to the prevention and cure of cancer and other life-threatening diseases and guided by a compassionate, patient-centered philosophy. Founded in 1913 and headquartered in Duarte, California, City of Hope…
ENHANCED GRAVITROPISM 2 encodes a STERILE ALPHA MOTIF–containing protein that controls root growth angle in barley and wheat
Significance To date, the potential of utilizing root traits in plant breeding remains largely untapped. In this study, we cloned and characterized the ENHANCED GRAVITROPISM2 (EGT2) gene of barley that encodes a STERILE ALPHA MOTIF domain–containing protein. We demonstrated that EGT2 is a key gene of root growth…
Mapping digested synthetic oligos back to original sequences.
Hi, I have several 70 bp synthetic dsDNAs and I digest them with an enzyme. I am interested in the exact cut site of the enzyme, so I had the products sequenced on a MiSeq. They are single-end reads. What is…
Aligning Multiple paired end files together
Hi all, I have 72 paired-end .fastq files that I need to align using BWA. Since it’s paired-end data, my files are named sam_001_1.fastq, sam_001_2.fastq, sam_002_1.fastq, sam_002_2.fastq and so on. Since it’s paired-end data…
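Pairing files by name is usually the first step. A minimal sketch, assuming the sam_001_1.fastq / sam_001_2.fastq naming from the post (the regular expression also accepts .fq and .gz variants), that groups mates by sample and builds one bwa mem argument list per pair; the function names and the output naming are illustrative:

```python
import re
from collections import defaultdict

# Matches names like sam_001_1.fastq / sam_001_2.fastq (pattern assumed from the post).
PAIR_RE = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.f(ast)?q(\.gz)?$")


def pair_fastqs(filenames):
    """Group _1/_2 files by sample prefix; raise if a mate is missing."""
    groups = defaultdict(dict)
    for name in filenames:
        m = PAIR_RE.match(name)
        if m:
            groups[m.group("sample")][m.group("mate")] = name
    pairs = {}
    for sample, mates in sorted(groups.items()):
        if set(mates) != {"1", "2"}:
            raise ValueError(f"{sample}: expected _1 and _2 files, found {sorted(mates.values())}")
        pairs[sample] = (mates["1"], mates["2"])
    return pairs


def bwa_commands(pairs, reference, threads=8):
    """One (bwa mem argv, SAM output name) tuple per sample."""
    return [
        (["bwa", "mem", "-t", str(threads), reference, r1, r2], f"{sample}.sam")
        for sample, (r1, r2) in pairs.items()
    ]
```

Each tuple can then be run with subprocess.run(argv, stdout=out, check=True) inside a with open(output_name, "w") as out block. For a whole directory, pass sorted(os.listdir(path)) and join the directory back onto the file names.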
Gene mutation analysis in papillary thyroid carcinoma
Introduction Thyroid tumors are the most common malignant tumors of the endocrine system, and their incidence has been increasing in recent decades. Currently, there are some targeted drugs that can effectively treat PTC, and next-generation sequencing (NGS) can be used to guide targeted therapy. In order to make better informed…
pseudogenes and their parent gene common regions
Hi, I have a list of gene names and their corresponding pseudogenes. I want to figure out which regions of a pseudogene and its parent gene are shared. I think one way would be to first extract their sequences, then align them to each…
How to analyze the generated VCF file, what to do if you have multiple VCF file for the same gene?
I submitted 40 tumor samples for NGS analysis and gave the sequencing provider a list of specific genes to sequence only; let’s call that gene…
Twist Bioscience hiring Bioinformatics Scientist, Production Bioinformatics in South San Francisco, California, United States
Twist is looking for a Bioinformatics Scientist to join our Production Bioinformatics Team. You will work alongside research scientists, software engineers and data scientists to further deliver on our mission to expand access to best-in-class synthetic biology and next-generation sequencing applications. You will be developing and engineering tools to better…
Error in pipe output to samblaster from bwa-mem2
Hi, I am trying to upgrade my command from bwa to bwa-mem2. This command usually works: bwa mem -M -R "@RG\tID:id\tSM:sample\tLB:lib" human_g1k_v37.fasta sample.1.fq sample.2.fq | samblaster -M --excludeDups --addMateTags --maxSplitCount 2 --minNonOverlap 20 | samtools view -S -b - > sample.bam…
the Genomic Rearrangement IDentification Software Suite
GRIDSS is typically used for detecting structural variation breakpoints from short read sequencing data but is a modular software suite containing a number of tools useful for the detection of genomic rearrangements including: A structural variant caller. The GRIDSS caller uses break-end…
Bioinformatics Analyst II in Danville, PA for Geisinger
Job Summary: Primary accountability is to leverage the organization’s data assets, exome sequencing data (>180,000 individuals) from the MyCode Community Health Initiative, to improve quality and efficiency and to generate knowledge specifically in the field of bioinformatics within health research. Performs and supervises complex data extraction, transformation, visualization, and summarization to support Research…
How to align and visualize data with .fasta and .gff3 files in IGV?
Hi everyone, I have an issue with aligning and visualizing my data in IGV. As I read in the IGV manual, to align and visualize data I need to prepare .BAM/.SAM or other input format…
Bowtie2 hg19 reference for gatk MuTect
Hello, I understand that the suggested aligner to use with GATK is bwa. If I want to use Bowtie2 as the aligner, which reference file should I be using? The reference in GATK bundle (Homo_sapiens_assembly19.fasta) does not seem to work with Bowtie2 and…
Bwa sampe error 999
25-08-2021 I’m getting the following error message when I try to import the file (using samtools import). [samopen] SAM header I’m using bwa aln to find coordinates and bwa sampe to…
Extracting amino acid substitutions
Good day, I’m trying to develop a pipeline to identify mutations responsible for amino acid changes in genes associated with antibiotic resistance. I have roughly 300 bacterial isolates. My approach so far has not been fruitful; in short, this is what I tried:…
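Once each isolate’s gene sequence is available (for example as a consensus coding sequence), finding amino acid changes is a codon-by-codon comparison. A minimal sketch, assuming in-frame coding sequences of equal length (substitutions only, no indels) and the standard genetic code; bacterial translation table 11 assigns the same amino acids and differs only in its start codons. Names such as aa_substitutions are illustrative:

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def aa_substitutions(ref_cds, alt_cds):
    """List changes like 'S2L' (1-based residue numbers) between two in-frame CDSs."""
    if len(ref_cds) != len(alt_cds):
        raise ValueError("this sketch only handles substitutions, not indels")
    ref_aa, alt_aa = translate(ref_cds), translate(alt_cds)
    return [f"{r}{pos}{a}" for pos, (r, a) in enumerate(zip(ref_aa, alt_aa), start=1) if r != a]
```

Running aa_substitutions against the reference gene for each of the ~300 isolates gives a per-isolate list of changes that can be compared with known resistance mutations.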
Snakemake-Alignment using BWA-MEM2
Hello, I have started using Snakemake 6.5.2 to align FASTQ files to a reference file. I have pasted the error below. How do I allocate memory in the Snakefile and read the header from samfile '-'? This is the Snakefile (a wrapper for running the alignment): rule bwa_mem2_mem: input: reads=["/scicore/home/cichon/GROUP/test_workflow/samples/{sample}.1.fq", "/scicore/home/cichon/GROUP/test_workflow/samples/{sample}.2.fq"]…
Unable to locate package hisat2
Dear all, I need to install the HISAT2 aligner for my study. My Linux version is Ubuntu 16.04 (Xenial Xerus), so I used the command below: sudo apt-get install -y hisat2 but I…
How to calculate the Average Insert Size after mapping the reads to the reference genome using BWA
Hi, having mapped the reads to the reference genome using BWA, I am trying to calculate their average insert size. To do so, I converted the BAM file to a SAM file in order to…
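A common definition of insert size is the absolute TLEN (column 9) of properly paired reads, counted once per pair. A minimal sketch over SAM text lines, assuming TLEN was filled in by the aligner; tools such as Picard CollectInsertSizeMetrics report fuller statistics, and the median is often more informative than the mean because a few very long outliers can pull the mean up. Function names are illustrative:

```python
import statistics


def insert_sizes(sam_lines, min_mapq=0):
    """Collect one TLEN per properly paired fragment (from the mate with TLEN > 0)."""
    sizes = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq, tlen = int(fields[1]), int(fields[4]), int(fields[8])
        if flag & 0x900:       # skip secondary (0x100) and supplementary (0x800)
            continue
        if not (flag & 0x2):   # keep properly paired reads only
            continue
        if tlen > 0 and mapq >= min_mapq:
            sizes.append(tlen)
    return sizes


def summarize(sizes):
    """Count, mean and median of the collected insert sizes."""
    return {"n": len(sizes), "mean": statistics.mean(sizes), "median": statistics.median(sizes)}
```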
Missing read group in BAM files
Hello everyone, I have processed PE reads through the HybPiper pipeline to align them to a reference genome with GATK. But when inspecting the output BAM files with the GATK tool ValidateSamFile, I found a very common error in the error report: WARNING::RECORD_MISSING_READ_GROUP…
MarkDuplicatesSpark: how to speed up?
Hello all, I would like to know if there is any good option to speed up MarkDuplicatesSpark. I work with human genomes with around 900 million reads (151 bp), on a cluster (with Slurm). The command that I used is (with…
What is the difference between GRCh37 and hs37? And hg19?
This is what I have found so far. Please correct me if I am wrong. GRCh37 w/o patches includes the primary assembly (22 autosomes, X, Y, and non-chromosomal supercontigs) and alternate scaffolds, but not a reference mitogenome. Non-chromosomal supercontigs are the unlocalized and unplaced scaffolds. The rCRS reference mitogenome in…
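One practical consequence is that UCSC-style names (chr1, chrM) and GRCh37/1000 Genomes-style names (1, MT) differ. A minimal sketch of a name mapping for the main chromosomes only, as an illustration with a hypothetical function name. Renaming does not make the references identical: hg19’s chrM is not the rCRS sequence used as MT in GRCh37-based references, and unplaced contig names differ as well.

```python
def hg19_to_grch37_name(name):
    """Map UCSC-style primary contig names to GRCh37/1000 Genomes-style names.

    Only chr1-chr22, chrX, chrY and chrM are handled; anything else returns None
    so the caller must decide explicitly what to do with it.
    """
    if not name.startswith("chr"):
        return None
    core = name[3:]
    if core == "M":
        return "MT"  # name only: the underlying sequences differ (see above)
    if core in {"X", "Y"} or (core.isdigit() and 1 <= int(core) <= 22):
        return core
    return None
```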
Base recalibration -Java run time error and no sequence dictionary
Hello, I am stuck at the base recalibration step of my NGS analysis. I used this command for the base recalibration step: gatk BaseRecalibrator -I sample1.bam -R gch38.fa --known-sites GCF_000001405.39 -O recal_data.table I got the following warning: WARN IndexUtils - Feature file…
Vacancy for Bioinformatics Analyst in the USA – OYA Opportunities
Apply for Vacancy for Bioinformatics Analyst at Weill Cornell Medicine in the USA. The deadline for this job is 30th September 2021. About: Weill Cornell Medicine, officially the Joan & Sanford I. Weill Medical College of Cornell University, is the biomedical research unit and medical school of Cornell University, a…
So many variants detected.
Dear all, I have done variant calling on germline data with a single sample per individual, covering two genes. I did the following steps, but after checking the results I found too many variants. After HaplotypeCaller (step 6) I found 140900 known variants, and the…
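When VQSR is not an option (few samples, or a small target such as two genes), GATK’s documentation suggests hard filtering instead. The sketch below applies SNP thresholds in the spirit of that guidance (QUAL < 30, QD < 2, FS > 60, MQ < 40 fail); treat the exact values as assumptions to check against the current GATK docs for your data. It expects uncompressed VCF data lines (skip lines starting with #), treats missing annotations as passing, and uses illustrative function names:

```python
def parse_info(info):
    """Parse a VCF INFO column into a dict; flag entries (no '=') map to True."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value if value else True
    return out


def passes_snp_hard_filter(vcf_line, min_qual=30.0, min_qd=2.0, max_fs=60.0, min_mq=40.0):
    """Return False if QUAL, QD, FS or MQ falls on the failing side of its threshold.
    Annotations absent from a record are treated as passing."""
    cols = vcf_line.rstrip("\n").split("\t")
    qual, info = cols[5], parse_info(cols[7])
    if qual != "." and float(qual) < min_qual:
        return False
    checks = {
        "QD": lambda v: v >= min_qd,
        "FS": lambda v: v <= max_fs,
        "MQ": lambda v: v >= min_mq,
    }
    for key, ok in checks.items():
        if key in info and not ok(float(info[key])):
            return False
    return True
```

Counting how many records fail each check separately is a quick way to see which annotation drives a surprisingly large call set.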
CROP-seq data analysis
Hi, I am a newbie to single-cell sequencing analysis. I have to analyze CROP-seq data, and I am going through the following paper: www.nature.com/articles/nmeth.4177. I have to use Cell Ranger (instead of the Drop-seq software) as the first step to process single-cell data. I wanted…
Alignment using bwa-mem2
Hello, I need help aligning sequences to a reference using bwa-mem2. I used the following command: bwa-mem2 mem -t 8 gch38.fa DE98NGSUKBD117612_1_1.fq DE98NGSUKBD117612_1_2.fq > d3_align.sam I got the following error: ERROR! Unable to open the file: gch38.fa.bwt.2bit.64 There is no gch38.fa.bwt.2bit.64 file. I have the…
align using file.ht2
I downloaded the indexed UCSC hg19 file in my terminal, and when I uncompressed it, I found two files, genome.5.ht2 and genome.8.ht2. Every time I try to align my samples to the indexed file, this error shows up: [e::bwa_idx_load_from_disk] fail to locate the index files…
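Each aligner reads only its own index format: .ht2 files are HISAT2 indexes, bwa looks for files such as <prefix>.bwt, and bwa-mem2 needs its own files such as <prefix>.bwt.2bit.64 built with bwa-mem2 index. A minimal sketch that lists which expected files are missing for a given prefix. The suffix lists reflect typical output of bwa index, bwa-mem2 index and hisat2-build (very large HISAT2 indexes use .ht2l instead) and are assumptions worth checking against your versions:

```python
from pathlib import Path

# Index files each aligner expects next to the reference prefix (assumed lists).
INDEX_SUFFIXES = {
    "bwa": [".amb", ".ann", ".bwt", ".pac", ".sa"],
    "bwa-mem2": [".0123", ".amb", ".ann", ".bwt.2bit.64", ".pac"],
    "hisat2": [f".{i}.ht2" for i in range(1, 9)],
}


def missing_index_files(prefix, aligner):
    """Return the expected index files for `aligner` that do not exist for `prefix`."""
    return [prefix + s for s in INDEX_SUFFIXES[aligner] if not Path(prefix + s).exists()]
```

An empty list for "hisat2" but a full list for "bwa" would confirm that the download is a HISAT2 index, which bwa cannot use.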
I am converting the fq.gz files (the results of the MGI study) to BAM files to view in IGV.
Hey everyone, before I start, apologies for any inconvenience caused by my wrong or inappropriate use of terms. I have had some bwa mem failures lately. As I…
Map Entire Directory of Paired-End Reads at Once
Is there a way to map an entire directory of reads at once? Would I just have to write a script for this specific to my directory structure and data? I’m using BWA MEM to map 49 paired-end reads and have…
Read group info
Hello, I need help getting read group info for an alignment with BWA-MEM2. I read a previous post (bwa mem: Passing a variable to read group) on read group info, where a shell script is used to get the read group info from the FASTQ file. Can someone…
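bwa mem -R (and bwa-mem2 mem -R) takes the whole @RG line with a literal \t between fields, which bwa expands to tabs. A minimal sketch that derives ID and PU from an Illumina CASAVA 1.8+ style FASTQ header. The flowcell.lane ID convention is a common choice rather than a requirement, the sample and library names must come from your own metadata, and the function name is illustrative:

```python
def read_group_from_fastq_header(header, sample, library, platform="ILLUMINA"):
    """Build an '@RG' string for 'bwa mem -R' / 'bwa-mem2 mem -R' from a header like
    '@INSTRUMENT:RUN:FLOWCELL:LANE:TILE:X:Y 1:N:0:BARCODE' (CASAVA 1.8+ layout)."""
    name, _, comment = header.lstrip("@").strip().partition(" ")
    fields = name.split(":")
    if len(fields) < 7:
        raise ValueError("header does not look like CASAVA 1.8+ format")
    flowcell, lane = fields[2], fields[3]
    barcode = comment.split(":")[-1] if comment.count(":") >= 3 else "NA"
    # Fields are joined with the two characters backslash + t, which bwa expands itself.
    return "\\t".join([
        "@RG",
        f"ID:{flowcell}.{lane}",
        f"SM:{sample}",
        f"LB:{library}",
        f"PL:{platform}",
        f"PU:{flowcell}.{lane}.{barcode}",
    ])
```

The first header line of each FASTQ file can be passed in, and the returned string used as the quoted argument to -R.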
VCF Filter On Small Genomes
Hi guys, I am working on NGS data from a yeast species (Candida glabrata) to find mutations related to drug resistance. I am new to bioinformatics, so I am using Galaxy.eu to get used to the algorithms. There is literature about some genes that mutations…