Tag: GATK
Different relatedness estimates by PLINK and VCFTOOLS despite same method
According to the vcftools manual, specifying the “--relatedness2” flag allows calculating relatedness statistics using the method by Manichaikul et al., Bioinformatics 2010 (doi:10.1093/bioinformatics/btq559), that is, the method implemented in KING. According to the PLINK manual, PLINK uses the same method to calculate relatedness when the “--make-king-table” flag is specified. So, although both PLINK…
Search for specific SNPs in VCF files of patients.
I have 490 genomes from 490 patients in VCF format. I created a multi-VCF file from these VCFs. I want to find 2 mutations (Y215C and G325R) in these patients, count the number of patients who have these SNPs…
Multiallelic variants when merging VCFs with GLnexus
I’m attempting to combine around 140 .g.vcf files into a single file using GLnexus on the DNAnexus platform. To examine multiallelic variants, I’m normalizing the files using the bcftools norm -m-any $file command. While merging the original VCF files (generated with GATK)…
Genomic hypomethylation in cell-free DNA predicts responses to checkpoint blockade in lung and breast cancer
Lung cancer ICB cohort Advanced non-small cell lung carcinoma patients who were treated with anti-PD-1/PD-L1 monotherapy at Samsung Medical Center, Seoul, Republic of Korea were enrolled for this study. The present study has been reviewed and approved by the Institutional Review Board (IRB) of the Samsung Medical Center (IRB no….
Diversity and dissemination of viruses in pathogenic protozoa
Wang, A. L. & Wang, C. C. Viruses of the protozoa. Annu. Rev. Microbiol. 45, 251–263 (1991). Banik, G., Stark, D., Rashid, H. & Ellis, J. Recent advances in molecular biology of parasitic viruses. Infect. Disord. – Drug Targets 14, 155–167 (2015). …
DE Jobs – UPMC Bioinformatics Scientist in Pittsburgh, Pennsylvania, United States
UPMC Presbyterian is hiring a full-time Bioinformatics Scientist to support the Molecular & Genomic Pathology Lab! This role will be scheduled for daylight shifts, Monday-Friday. The Molecular & Genomic Pathology Laboratory is a dynamic, state-of-the-art clinical laboratory that prides itself on delivering the highest quality of patient care through cutting-edge…
Variant calling using HaplotypeCaller does not show #FILTER information
Hi All, I would like to ask about variant calling using HaplotypeCaller. After running HaplotypeCaller, the #FILTER column in the gVCF files is supposed to show ‘PASS/LowQ’; however, in my case, the output #FILTER only shows ‘.’ without…
haplotypecaller – NVIDIA Docs
Run a GPU-accelerated haplotypecaller. This tool applies an accelerated GATK CollectMultipleMetrics for assessing the metrics of a BAM file, such as including alignment success, quality score distributions, GC bias, and sequencing artifacts. This functions as a ‘meta-metrics’ tool, and can run any combination of the available metrics tools in GATK…
Indigenous Australian genomes show deep structure and rich novel variation
Inclusion and ethics The DNA samples analysed in this project form part of a collection of biospecimens, including historically collected samples, maintained under Indigenous governance by the NCIG11 at the John Curtin School of Medical Research at the Australian National University (ANU). NCIG, a statutory body within ANU, was founded…
Individual vs. joint call VCFs
Is there any way to figure out and be sure if a VCF file is individually called or jointly called? Is there any line in the VCF header to look at for this?
GATK GenomicsDBImport too slow
Hello, I have 3264 g.VCFs and an interval list for the reference genome that contains 20000 contigs. The interval list looks like the following: utg19_pilon_pilon:1-42237 utg22_pilon_pilon:1-49947 utg24_pilon_pilon:1-61707 utg30_pilon_pilon:1-459006 utg38_pilon_pilon:1-129173 utg40_pilon_pilon:1-101813 utg58_pilon_pilon:1-143918 utg93_pilon_pilon:1-186249 utg100_pilon_pilon:1-87875 utg104_pilon_pilon:1-49315 I am running the GATK GenomicsDBImport command as follows: gatk --java-options…
Variant missing in WGS sample
Hi, I have processed a WGS sample including alignment (bwa-mem2), variant calling (GATK HaplotypeCaller) and annotation (ANNOVAR). In the annotated file, a variant fitting the phenotype was identified. However, on visualizing the BAM in IGV, this variant was not there. What could be the…
Help with gatk BaseRecalibrator
Hi Biostars, I am trying to do variant calling and got an error at this step. Do you have any suggestions? Thank you so much. gatk BaseRecalibrator -I ${aligned_reads}/SRR062634_sorted_dedup_reads.bam -R ${ref} --known-sites ${known_sites} -O ${data}/recal_data.table Invalid argument ‘/recal_data.table’
How to input list into GenomicsDBImport with snakemake?
Hello! I’m currently writing a pipeline with snakemake for exome data. During joint variant calling I need to use GATK’s GenomicsDBImport, although I’m unsure how to input all the samples at once. Here’s the simplified version of the rule I’m using:…
How to create interval list from reference fasta or dict file?
I am using the GATK pipeline on WGS data. My BAM files are aligned to GRCh38 from GENCODE, so I want to create an interval file for this GRCh38 instead of downloading one from the GATK bundle, because some of their contigs have…
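The excerpt above is cut off, so the asker's exact requirement is not known. As a minimal sketch of one possible approach, assuming the reference already has a sequence dictionary (.dict) with standard @SQ records carrying SN (name) and LN (length) fields, whole-contig intervals in the contig:start-end form can be derived as follows. The function name intervals_from_dict is hypothetical and not part of GATK or Picard.

```python
def intervals_from_dict(dict_text):
    """Turn the @SQ records of a sequence dictionary into contig:1-length intervals.

    Hypothetical helper for illustration; it only reads the SN and LN fields.
    """
    intervals = []
    for line in dict_text.splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD and any other header records
        # Every field after the record type is a TAG:value pair; split on the
        # first colon only, because values such as UR:file:/path contain colons.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        intervals.append(f"{fields['SN']}:1-{fields['LN']}")
    return intervals


if __name__ == "__main__":
    example = "@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chrM\tLN:16569\n"
    print("\n".join(intervals_from_dict(example)))
```

Writing the returned strings one per line gives a plain-text list; whether a given tool accepts that file as-is depends on the tool and file extension, so check its documentation before relying on it.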
GetPileupSummaries intervals-list with Targeted Sequencing?
Hi! I am applying GetPileupSummaries for somatic variant calling starting from targeted sequencing .fasta. I aligned the file with the GRCh38 reference, and currently I am at the GetPileupSummaries step. gatk --java-options -Xmx200G GetPileupSummaries \ -I $RECBAM \ -L ???? \ -O $OUTPUT \…
Apply BQSR for Targeted Sequencing
Hi! I am performing variant calling starting from a fasta resulting from targeted sequencing of ~320 cancer genes. I followed the GATK best practices, aligning with the GRCh38 reference. For the Apply Base Quality Score Recalibration step, which files should I use for “--known-sites” given…
How to subtract variants from one VCF file to another?
I have 2 VCF files from running the GATK Joint Genotyping workflow on two different groups of samples. I would like to filter out all the variants that are common to both VCF files and output a new VCF…
gatk SelectVariants is giving duplicate allele error while extracting SNPs out of vcf file
I am trying to extract SNPs out of a merged VCF file using the gatk SelectVariants command, but it is giving the following error: htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 73: Duplicate allele…
ASEReadCounter output wrong number of coverage
Hi, I am using ASEReadCounter to count the number of reads per variant in a BAM file. For some positions, it reports 1 read covered (1 refCount or 1 altCount), while there is no read covering those positions after checking it in…
Building reference dbSNP file using WGS samples
Dear scientific community, I have to call variants from WGS samples of citrus. I used the GATK pipeline for post-processing of aligned reads, but a reference dbSNP file is not available for Citrus sinensis. I am using the bootstrapping method. Removed duplicates and called…
Help with gatk CreateSequenceDictionary
Hi Biostars, I checked my path to the hg38.fa but still don’t know what causes the error. Do you have any suggestions? Thank you so much. gatk CreateSequenceDictionary R=/variant_calling/Desktop/demo/supporting_files/hg38/hg38.fa O=/variant_calling/Desktop/demo/supporting_files/hg38/hg38.dict Invalid argument ‘R=/variant_calling/Desktop/demo/supporting_files/hg38/hg38.fa’
The role of APOBEC3B in lung tumor evolution and targeted cancer therapy resistance
Cell line and growth assays Cell lines were grown in Roswell Park Memorial Institute-1640 medium (RPMI-1640) with 1% penicillin–streptomycin (10,000 U ml−1) and 10% FBS or in Iscove’s modified Dulbecco’s medium (IMDM) with 1% penicillin–streptomycin (10,000 U ml−1), l-glutamine (200 mM) and 10% FBS in a humidified incubator with 5% CO2 maintained at 37 °C. Drugs…
GATK Mutect2 mouse dbSNP vcf files recommendations for mouse whole exome data
Dear all, Is there any best practice for the mouse SNP/indel VCF files when using GATK Mutect2 for mouse whole exome data? For mm10, there seem to be several available; for mm39, it seems the newest is from…
Longitudinal detection of circulating tumor DNA
Analysis of Roche KAPA Target Enrichment kit experimental data obtained on an Illumina sequencing system is most frequently performed using a variety of publicly available, open-source analysis tools. The typical variant calling analysis workflow consists of sequencing read quality assessment, read filtering, mapping against the reference genome, duplicate removal, coverage…
H101 for cervical cancer | DDDT
Introduction Patients with persistent, recurrent, or metastatic (P/R/M) cervical carcinoma respond poorly to treatment despite the best available therapeutic regimens, with a 5-year survival of 17%.1 Most of them are heavily pretreated with chemotherapy and/or radiotherapy, and many patients experience complications related to treatment or advanced disease, which exclude them…
[maftools]Too many multi_hit and missense mutation
Describe the issue: When using maftools to plot mutational summary data, I encountered some issues: I use WES data to generate a filtered VCF file, and then utilize VEP for annotation to obtain an MAF file. The MAF file contains an excessive number…
Analyzing somatic mutations by single-cell whole-genome sequencing
Failla, G. The aging process and cancerogenesis. Ann. N. Y. Acad. Sci. 71, 1124–1140 (1958). Szilard, L. On the nature of the aging process. Proc. Natl Acad. Sci. USA 45, 30–45 (1959). Vijg, J. & Dong, X. Pathogenic…
Merging several vcf files for GWAS?
Hello! I am a medical student without much background in bioinformatics, trying to perform the analysis for my first GWAS and tremendously overwhelmed. It’s a case-control association study with samples from 50 subjects that we sequenced using the Novogene NGS platform. The problem is,…
Phenotypic drug-susceptibility profiles and genetic analysis based on whole-genome sequencing of Mycobacterium avium complex isolates in Thailand
Abstract Mycobacterium avium complex (MAC) infections are a significant clinical challenge. Determining drug-susceptibility profiles and the genetic basis of drug resistance is crucial for guiding effective treatment strategies. This study aimed to determine the drug-susceptibility profiles of MAC clinical isolates and to investigate the genetic basis conferring drug resistance using…
Creating a Variant containing FASTA for proteomics search from VCF and genomic FASTA
Dear Biostar Community, I’m currently trying to generate a protein FASTA containing all known variants from HeLa (from Cosmic CellLinesProject) for variant detection in proteomics measurements. For this, I’ve downloaded the variants file (VCF) and the…
BaseRecalibrator takes forever to run. Any suggestions?
Hello, I am trying to run the BaseRecalibrator tool from the GATK package and it takes forever (more than 4 days per BAM file). The command I’m using is: gatk BaseRecalibrator -I NG-01_1_S1_dedup_bwa.bam -R /rumi/shams/genomes/hg38/hg38.fa --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --known-sites 1000G_phase1.snps.high_confidence.hg38.vcf.gz --known-sites Homo_sapiens_assembly38.dbsnp138.vcf -O NG-01_1_S1_dedup_bwa_BSQR.table…
Primate-specific ZNF808 is essential for pancreatic development in humans
Subjects The study was conducted in accordance with the Declaration of Helsinki and all subjects or their parents/guardian gave informed written consent for genetic testing. DNA testing and storage in the Beta Cell Research Bank was approved by the Wales Research Ethics Committee 5 Bangor (REC 17/WA/0327, IRAS project ID…
SNP calling with many samples using bcftools
Hello, I aim to identify SNPs from approximately 500 BAM files (non-human). I’m opting for bcftools since GATK, even with the Spark addition, takes a substantial 6 hours per sample. My objective is to generate a single VCF file encompassing all SNPs…
Query regarding callsets used as known sites in Variant Calling
Hi, Where can I learn more about the standard VCF files that are used as known sites during the BQSR step in Variant Calling with GATK? The files are: Homo_sapiens_assembly38.dbsnp138.vcf Homo_sapiens_assembly38.known_indels.vcf.gz Mills_and_1000G_gold_standard.indels.hg38.vcf.gz I am aware that these files are…
MemVerge and Sentieon Announce WaveRider for Sentieon to Accelerate Next-Generation Sequencing in the Cloud
Early Customers Realize 10x Increase in Performance and Cloud Cost Savings; Sentieon Software Offered Free in Memory Machine Cloud Subscription MILPITAS, Calif., Nov. 14, 2023 /PRNewswire/ — MemVerge®, pioneers of Big Memory software, and Sentieon®, the market leader in genomics software, today announced a collaboration to accelerate next-generation sequencing (NGS)…
BWA mem -M option for gatk mutect
Hi, everyone! I am new to the GATK pipelines, so I’m not sure whether it is necessary to use the -M option in the bwa mem command before I pass the resulting BAM file to the GATK Mutect pipelines for calling the mutation…
GATK SelectVariants --remove-unused-alternates dropping real INDELs?
I’m using a VCF that is generated by GenotypeGVCFs (so doing calibration based on a larger cohort of samples) and my goal is to only extract variants of interest to one specific sample. The VCF in the subset tends to include some variants that were present in the original joint…
Samtools index not working in Snakemake
I am setting up a Snakemake pipeline for sequencing read alignment and variant calling. But the samtools index rule is not activated, and the subsequent haplotype caller rule fails. I think it is because the samtools index rule is not perceived as necessary to execute the output of rule all…
BAM file for phasing
Hi all, I’m new to bioinformatics, and I’m trying to do phasing and imputation to WGS level. For imputation with Beagle, I would like to make a bref file from a VCF file, and I have to phase the reference panel for that. Is a BAM…
ILIAD: a suite of automated Snakemake workflows for processing genomic data for downstream applications | BMC Bioinformatics
Pipeline architecture and configuration file Genomic data processing poses a challenge for genetic research studies because it involves multiple program dependency installations, vast numbers of samples with raw data from various next-generation sequencing (NGS) platforms, and inconsistent genetic variant ID and/or positions among datasets. The Iliad suite of genomic data…
Need Help Understanding Variant Calling Issues in De Novo Yeast Assembly
We have two samples of a yeast species, control (1 sample) and treatment (1 sample), for which a complete reference genome isn’t yet available for alignment or variant calling. The objective of this project is straightforward, simply wanting to…
variant calling – How to run a GATK Docker Image with local files?
I’m trying to use the HaplotypeCaller from the GATK toolkit but I keep getting an error. I pulled GATK through Docker and am using this command: docker run -v /Users/rimo/ -it broadinstitute/gatk:latest gatk HaplotypeCaller -R /Users/rimo/reference.fasta -I /Users/rimo/sample1.bam -O /Users/rimo/sample1.g.vcf.gz -ERC GVCF /Users/rimo is my home directory it’s where the…
How to slice a CRAM file into the 50kb regions padded with 1kb?
Hello, I am working on whole genome sequencing CRAM files and I want to follow the GATK best practices. Before that, I want to slice each CRAM into smaller chunks, 50kb regions with 1kb padding, and avoid…
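The coordinate arithmetic behind "50kb regions with 1kb padding" can be sketched independently of any CRAM tooling. The snippet below is an illustration under stated assumptions: coordinates are 1-based and inclusive, padding is clipped at the contig ends, and the function name padded_windows is hypothetical. The excerpt is truncated, so whatever the asker wanted to avoid is not addressed here.

```python
def padded_windows(contig_length, size=50_000, pad=1_000):
    """Tile a contig into `size`-bp windows and extend each by `pad` bp per side.

    Returns 1-based inclusive (start, end) pairs, clipped to [1, contig_length].
    """
    windows = []
    for start in range(1, contig_length + 1, size):
        end = min(start + size - 1, contig_length)  # unpadded window end
        windows.append((max(1, start - pad), min(contig_length, end + pad)))
    return windows


if __name__ == "__main__":
    # A hypothetical 120 kb contig yields three overlapping padded windows.
    for start, end in padded_windows(120_000):
        print(f"contig1:{start}-{end}")
```

Because neighbouring padded windows overlap by design, reads or calls in the padding appear in two chunks, so any downstream merge step needs to trim results back to the unpadded window.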
Structural Variants in gnomAD v4
Today, we are thrilled to announce the release of genome-wide structural variants (SVs) for 63,046 unrelated samples with genome sequencing (GS) data. All site-level information for 1,199,117 high-quality SVs discovered in these samples is browsable in the gnomAD browser (gnomAD SV v4) and downloadable from the gnomAD downloads page. For…
Single-nucleus DNA sequencing reveals hidden somatic loss-of-heterozygosity in Cerebral Cavernous Malformations
Ethical statement Our research complies with all relevant ethical regulations, including the Declaration of Helsinki and has been approved by the Institutional Review Boards of University of Chicago, Duke University and the Alliance to Cure Cavernous Malformations. Cerebral cavernous malformation lesions All human CCM tissue specimens have been previously reported18,19…
CombineGVCFs skips a chromosome
Hi! I am having issues for the first time with CombineGVCFs. Specifically, it outputs a combined gvcf without chromosome 8 (SUPER_8) even though this is present in the individual gvcfs that I input in the command. There is no error in the log file, the engine just shuts down after…
Bristol Myers Squibb hiring Summer 2024 – Informatics and Predictive Science Bioinformatics Internship in Lawrence, NJ
Working with Us: Challenging. Meaningful. Life-changing. Those aren’t words that are usually associated with a job. But working at Bristol Myers Squibb is anything but usual. Here, uniquely interesting work happens every day, in every department. From optimizing a production line to the latest breakthroughs in cell therapy, this is work…
Inferring bacterial transmission dynamics using deep sequencing genomic surveillance data
Study design Experiments were performed in accordance with the New Zealand Animal Welfare Act (1999) and institutional guidelines provided by the University of Auckland Animal Ethics Committee, which reviewed and approved these experiments under application R1003. We did not use any specific randomisation process to allocate animals to a particular…
Bioinformatics Specialist I | MGH Cancer Center
GENERAL SUMMARY/OVERVIEW STATEMENT: In Gulhan Lab, we develop statistical and machine learning methods for cancer genomics to improve patient classification and early cancer detection strategies. We aim to decipher the broad spectrum of genomic instabilities that dictate the evolution of cancer genomes, and to understand how best to…
Hey guys, I’m having a problem when using GATK4 BQSR. This dbSNP VCF file has chromosomes notated as 1, 2, … but my reference contigs are chr1, chr2, …: incompatibility in contigs
anilkumar@ak-omen-laptop:~/NGStools/gatk-4.4.0.0$ gatk --java-options "-DGATK_STACKTRACE_ON_USER_EXCEPTION=true" BaseRecalibrator -I "/media/anilkumar/My Passport/CRC/fastq/C_4_mkdp.bam" -R "/media/anilkumar/My Passport/CRC/fastq/hg19.fa" --known-sites "/media/anilkumar/My Passport/CRC/fastq/dbsnp_138.b37.vcf" --known-sites "/media/anilkumar/My Passport/CRC/fastq/Mills_and_1000G_gold_standard.indels.b37.vcf" --known-sites "/media/anilkumar/My Passport/CRC/fastq/1000G_phase1.indels.b37.vcf" -O "/media/anilkumar/My Passport/CRC/fastq/C_4_bqsr.table" Using GATK jar /home/anilkumar/NGStools/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /home/anilkumar/NGStools/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar BaseRecalibrator -I /media/anilkumar/My Passport/CRC/fastq/C_4_mkdp.bam -R /media/anilkumar/My Passport/CRC/fastq/hg19.fa --known-sites /media/anilkumar/My Passport/CRC/fastq/dbsnp_138.b37.vcf --known-sites /media/anilkumar/My Passport/CRC/fastq/Mills_and_1000G_gold_standard.indels.b37.vcf --known-sites…
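The naming mismatch described in this post (known-sites files using 1, 2, … against a reference using chr1, chr2, …) is sometimes worked around by renaming contigs in the VCF. The sketch below only illustrates the renaming itself and rests on loud assumptions: that the two builds share coordinates for the renamed contigs, and that MT should map to chrM, which may not hold because the mitochondrial sequences of the two builds can differ. The function names are hypothetical. Using resource files built for the same reference as the BAM is generally the safer route.

```python
import re

# Assumed name overrides for illustration only; verify against your reference.
DEFAULT_OVERRIDES = {"MT": "chrM"}


def to_ucsc_name(contig, overrides=None):
    """Map a b37-style contig name such as '1' to a UCSC-style name such as 'chr1'."""
    overrides = DEFAULT_OVERRIDES if overrides is None else overrides
    if contig in overrides:
        return overrides[contig]
    return contig if contig.startswith("chr") else "chr" + contig


def rename_vcf_line(line):
    """Rename the contig in a ##contig header line or in a record's CHROM column."""
    if line.startswith("##contig=<ID="):
        # The ID value ends at the next comma or at the closing bracket.
        return re.sub(
            r"^(##contig=<ID=)([^,>]+)",
            lambda m: m.group(1) + to_ucsc_name(m.group(2)),
            line,
        )
    if line.startswith("#"):
        return line  # other header lines are left untouched
    chrom, sep, rest = line.partition("\t")
    return to_ucsc_name(chrom) + sep + rest


if __name__ == "__main__":
    for example in ("##contig=<ID=1,length=249250621>", "2\t10\t.\tC\tT"):
        print(rename_vcf_line(example))
```

A renamed VCF also needs to be re-indexed, and contigs that exist in only one of the two builds still have to be dropped or handled separately.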
Mycobacterium tuberculosis Sub Lineage 4.2.2/SIT149 as DR
Introduction Antimicrobial resistance is a hidden global pandemic that shattered over 4.9 million people in 2019 alone, and the burden is highest, mainly in low-resource settings.1 Drug-resistant tuberculosis (DR-TB) caused by Mycobacterium tuberculosis (Mtb) complex (MTBC), which is resistant to one or more anti-TB drugs, is a leading global public…
Identification of CCZ1 as an essential lysosomal trafficking regulator in Marburg and Ebola virus infections
Cells and viruses Haploid mSCs AN3–12 are a feeder independent clonal derivative of HMSc2 isolated from mice oocyte and maintained at IMBA52. The AN3–12 library and knocked out cells used for the haploid screening was obtained from IMBA (Austria)14,52. AN3–12 cells were validated by STR analysis. Haploid mES cells were…
Invasive Californian death caps develop mushrooms unisexually and bisexually
Mushroom collecting Sporocarps were collected from various herbaria and during three expeditions to Point Reyes National Seashore (PRNS), California in 2004, 2014 and 2015, and in 2015 from three sites in Portugal. A total of 86 sporocarps were collected: 67 Californian sporocarps (one early herbarium sample dates to 1993), 11…
Genentech hiring Principal Bioinformatics Scientist II in Santa Clara, California, United States
The Position Principal Bioinformatics Scientist We are seeking a highly skilled and experienced Principal Bioinformatics Scientist specializing in Next Generation Sequencing (NGS) to join our research team. As a Computational Biologist in NGS, you will lead and contribute to cutting-edge projects focused on analyzing and interpreting NGS data, playing a…
NGS Training | Top NGS Courses | Online Training | RNASeq | Genome Variant Detection
Next Generation Sequencing (NGS), a recently evolved technology, has contributed greatly to research and development. NGS methods are highly parallelized, making it possible to sequence thousands to millions of molecules simultaneously. This technology results in a huge amount of data, which…
Confirming called variants
Hello, I performed whole exome analysis using the GATK pipeline. After annotating the variants using ANNOVAR, I performed these steps: filtered for variants that passed all filters; using Gnomad_exome_all, looked for variants with a frequency below 0.01; then tried to confirm whether these variants are also present in the BAM…
Cube Hub Inc. hiring Bioinformatics Scientist *hybrid*-MS or PhD(Next Gen Sequencing/Phylogenetics/Analysis Tools/Programming)JK in Foster City, California, United States
Job Title: Bioinformatics Scientist Location: Foster City, CA(Hybrid Model) Duration: 6 Months contract-Possible to extend Shift Details: 1st shift Pay Range : $ 44/h to $47/h Description- We are requesting to open a new contract position in Clinical Virology (Foster City). The Clinical Virology Bioinformatics team needs an additional resource…
Pre-imputation checks using 1000G data (hg19) for a hg38 VCF
I’m trying to use the pre-imputation checks here www.well.ox.ac.uk/~wrayner/tools/ to check a VCF (on the hg38 assembly) against the 1000G phase 3 v5 data, which is hg19, before imputing using the MIS. Obviously, very few of the variants in…
Does GATK SetNmMdAndUqTags reduce the size of a CRAM?
I ran GATK SetNmMdAndUqTags on a whole genome sequencing CRAM file after completing the MarkDuplicates step. The initial size of the CRAM file was 19GB, and after the SetNmMdAndUqTags operation, its size was reduced to 8GB. The following is…
Challenges in Variant Calling and Genotyping with Short-Read Data Mapped to a Pangenome Graph: Seeking Guidance
Hello all, We are reaching out since we have some practical questions regarding variant calling and analyses of short-read data mapped to a pangenome graph. We are working on a project aimed to…
Building reference dbSNP file for Citrus sinensis using 80 WGS samples
Dear scientific community, I have to call variants from 80 WGS samples of citrus. I used the GATK pipeline for post-processing of aligned reads, but a reference dbSNP file is not available for Citrus sinensis. I am using the bootstrapping…
Bootstrapping for BQSR of 80 WGS samples
Dear scientific community, I have to call variants from 80 WGS samples of citrus. I used the GATK pipeline for post-processing of aligned reads, but a reference dbSNP file is not available for Citrus sinensis. I am using the bootstrapping method. Removed duplicates and…
How to choose the best tool for variant trio analysis
Hi everyone, I am working on a project about variants in a rare disease in the reproductive field. I have the BAMs and VCFs of a father, a mother, the affected girl, and a brother. I would like to…
Genotyping, sequencing and analysis of 140,000 adults from Mexico City
Recruitment of study participants The MCPS was established in the late 1990s following discussions between Mexican scientists at the National Autonomous University of Mexico (UNAM) and British scientists at the University of Oxford about how best to measure the changing health effects of tobacco in Mexico. These discussions evolved into…
The mutational signature of hypertrophic cardiomyopathy
Introduction Hypertrophic cardiomyopathy (HCM), characterized by asymmetric hypertrophy of the ventricular wall, is a condition where the heart becomes thickened without a distinct inducement.1,2 Epidemiological investigation shows that the estimated prevalence rate of HCM in the general population is 1:500.3,4 The clinical manifestations vary greatly, with no symptoms and mild…
GenomicStar hiring Bioinformatics Data Analyst in Chennai, Tamil Nadu, India
Position: Bioinformatics Data Analyst Company: GenomicStar – A SaaS based Precision Medicine Platform Job Purpose: The Bioinformatics Data Analyst will play a critical role in the development, validation, and implementation of data analysis pipelines for various genetic test panels, incorporating state-of-the-art AI techniques. This role is integral to our commitment…
Determine INDELs number (both classes separately) from reference and graph-based VCF files
Hi there, this is more so of a hint/suggestion post than a real question since I could manage to find some related posts here on Biostars but appreciate a feedback on the procedure/results for the analysis. In principle, I’m trying to compare the bwa-mem_GATK pipeline working on the linear reference…
How to choose LiftOver chain file
I am trying to lift over an hg38 whole-genome-sequenced VCF to an hg19 VCF. I am planning to use GATK Picard for this. However, I am not sure which liftover chain file to use from this path: hg38tohg19…
RNAseq based variant dataset in a black poplar association panel | BMC Research Notes
Dickmann DI, Kuzovkina J. Poplars and willows of the world, with emphasis on silviculturally important species. In: Isebrands JG, Richardson J, editors. Poplars and willows: trees for society and the environment. Wallingford: CABI; 2014. Imbert E, Lefèvre F. Dispersal and gene flow of Populus nigra (Salicaceae) along a…
GenotypeGVCF too many genotypes from pooled samples
Hello, I am trying to create a VCF file using GenotypeGVCFs in GATK4. I have 60 samples and each sample is pooled data. The ploidy per sample is 60. This is due to the biological system I work in. This data has been processed in HaplotypeCaller; below is an example…
Problem with RNAseq MarkDuplicates(Picard)
Hi, I’m trying to use GATK to call variants with RNAseq, but I have some questions about MarkDuplicates. The MarkDuplicates (Picard) documentation mentions: “Set READ_NAME_REGEX to null to skip optical duplicate detection, e.g. for RNA-seq or other data where duplicate sets are extremely large and…
PTEN-induced kinase 1 gene single-nucleotide variants as biomarkers in adjuvant chemotherapy for colorectal cancer: a retrospective study | BMC Gastroenterology
Tissue samples A total of 84 analytic samples from surgical or biopsy specimens were collected from 84 patients who underwent radical surgery for CRC at Saitama Medical University International Medical Center between January and December 2016. One case was excluded because the specimen was too small; therefore, we used a…
Purpose, Types, Applications, Bioinformatics and more
Next-Generation Sequencing (NGS), also known as high-throughput sequencing, is a revolutionary technology used for determining the sequence of DNA or RNA molecules. It has significantly advanced the field of genomics and has numerous applications in various biological and medical fields. Key Points of Next-Generation Sequencing (NGS): Revolutionary Technology: NGS represents…
Effect of recombination on genetic diversity of Caenorhabditis elegans
Strong correlation exists between recombination rate and abundance and proportion of indels Whole-genome sequence data of many C. elegans wild isolates now exist. These include Illumina paired-end data of over 600 wild isolates by CeNDR, which also obtained first-generation PacBio long-read data of 14 wild isolates. Second-generation PacBio HiFi data20…
how to extract unique snps in a vcf file by comparing with multiple vcf files
How can I extract the unique SNPs in a VCF file by comparing it with multiple VCF files and make a file with the unique SNPs? EDIT by Ram: OP created another post a couple of hours later…
filtering variants in a Strelka2 VCF file based on AD and AF
Dear all, I would appreciate having your suggestions on the following. I am working with a VCF file that was produced by Strelka on Tumor-Normal pairs. As it is well known, Strelka2 does not provide Allele Depth (AD) or VAF (variant allele fraction) in the VCF fields. I have used…
University of Washington hiring Bioinformatics Specialist, Software Engineer in Seattle, Washington, United States
Req #: 227105 Department: GENOME SCIENCES Posting Date: 09/20/2023 Closing Info: Open Until Filled Salary: $6,843 to $7,917 per month Shift: First Shift Notes: As a UW employee, you will enjoy generous benefits and work/life programs. For a complete description of our benefits for this position, please visit our website,…
Bioinformatics Scientist – Noralogic Inc
Job Title: Bioinformatics Scientist Job Location: Bethesda, MD (20892) Job Type: Contract (Onsite) General Summary Additional Qualifications – Master’s degree in Bioinformatics or a related discipline – Two years’ experience Field of Study Biochemical Sciences Information Sciences Software MS Office R PowerPoint Unix Image J Outlook Python JAVA C++…
Allele specific binding of histone modifications and a transcription factor does not predict allele specific expression in correlated ChIP-seq peak-exon pairs
ChIP-seq and RNA-seq Tissue sampling and RNA-sequencing for three Holstein dairy cows and two of their foetuses (one male and one female with a shared sire) are described in refs. 17 and 18. ChIP-sequencing for all tissues was as described in ref. 16, with the inclusion of more tissues. Whole genome sequence for each animal…
after gatk VariantAnnotator -V *_com_norm.vcf -A AlleleFraction -O *_norm_AB.vcf There “nan,nan” or “nan” in my vcf file
After I run gatk VariantAnnotator -V *_com_norm.vcf -A AlleleFraction -O *_norm_AB.vcf, there are “nan,nan” or “nan” values in my VCF file. The input file doesn’t have “nan,nan” or “nan”; it (*_com_norm.vcf) comes…
MeiraGTx hiring Bioinformatics Analyst in New York, New York, United States
MeiraGTx is a clinical-stage gene therapy company focused on developing potentially curative, innovative treatments for patients living with serious diseases of significant unmet medical need. We are seeking a motivated, enthusiastic, well-rounded individual with experience in computational biology, multi-omics data integration and/or genomics data analysis to be a part of…
hg38 1kg/GATK is not available in the Lift Genome Annotation tool
Manuel Dominguez Clinical Bioinformatician (trainee) Wessex Genomics Laboratory Service (Salisbury) Salisbury District Hospital, Salisbury, SP2 8BJ. UK Tel (direct line) 01722 336262 (ext 3704) Tel (admin office) 01722 429080 www.wrgl.org.uk Please note…
Is a PON necessary for tumor-normal matched Mutect2?
I’m a bit confused about whether I should include GATK’s public PON (1000g_pon.hg38.vcf.gz, since I aligned with hg38), make my own from my normal samples, or leave the PON out altogether. I am planning on…
Downstream analysis on multi-sample or single-sample VCF files?
Hello, I use the GATK Best Practices in my analysis (mainly the dnaseq pipeline). As suggested, the pipeline calls genotypes on all the samples together and creates an “allSamples.vcf.gz” file at the end. At this stage one approach would be…
Liftover GRCh37 to hg38 1kg/GATK.
I need to lift over a few variants from GRCh37 to hg38 1kg/GATK. UCSC liftOver does not have this reference genome version available. I have tried with the standard hg38, but the conversions are wrong. Where can I find GRCh37 to hg38 1kg/GATK chain files or…
Mismatch repair deficiency is not sufficient to elicit tumor immunogenicity
Mice All animal use was approved by the Department of Comparative Medicine at the Massachusetts Institute of Technology (MIT) and the Institutional Animal Care and Use Committee under protocol no. 0714-076-17. Mice were housed with a 12-h light/12-h dark cycle with temperatures in the range 20–22 °C and 30–70% humidity. KrasLSL-G12D…
sarek: Introduction
Introduction nf-core/sarek is a workflow designed to detect variants on whole genome or targeted sequencing data. Initially designed for Human, and Mouse, it can work on any species with a reference genome. Sarek can also handle tumour / normal pairs and could include additional relapses. The pipeline is built using…
The genomic footprint of whaling and isolation in fin whale populations
Samples and sequencing Tissue samples from 50 fin whales (Balaenoptera physalus) were collected using a standard protocol to obtain skin biopsies from free-ranging cetacean species, which use a small stainless-steel biopsy dart deployed from a crossbow or rifle73,74. These samples were collected throughout the Eastern North Pacific (ENP; N = 30, represented…
Bioinformatics Technician I Job in Vancouver for UBC
Staff – Non Union Job Category Non Union Technicians and Research Assistants Job Profile Non Union Salaried – Research Assistant /Technician 3 Job Title Bioinformatics Technician I Department Data Management | Sequencing and Bioinformatics Consortium | VP Research and Innovation Office Compensation Range $3,982.67 – $4,703.83 CAD Monthly Posting End…
bcftools merge is resulting in a lot of missing data, how do I fix this?
I reckon you have different variant sites in your files. Individual A has SNPs at position 1, 2, 3, after imputation you’ll still have SNPs at position 1, 2, 3. Individual B has SNPs at position 4, 5, 6, after imputation it’s still 4, 5, 6. Once you merge them…
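The effect described above can be sketched with toy data: when two samples share no variant positions, taking the union of sites leaves every site with one missing genotype. The function names and genotype values below are hypothetical, and this only mimics the bookkeeping, not the merge tool itself.

```python
def merge_sites(samples):
    """Union the positions across samples; absent calls become './.'."""
    names = sorted(samples)
    positions = sorted({pos for calls in samples.values() for pos in calls})
    return {
        pos: [samples[name].get(pos, "./.") for name in names]
        for pos in positions
    }


def missing_fraction(merged):
    """Fraction of genotype cells that are missing after the merge."""
    cells = [gt for row in merged.values() for gt in row]
    return sum(gt == "./." for gt in cells) / len(cells) if cells else 0.0


# Toy example: individual A has SNPs at 1, 2, 3; individual B at 4, 5, 6.
samples = {
    "A": {1: "0/1", 2: "1/1", 3: "0/1"},
    "B": {4: "0/1", 5: "0/1", 6: "1/1"},
}
merged = merge_sites(samples)
print(missing_fraction(merged))
```

With no overlapping positions, half of all genotype cells in the merged table are missing, which is why merging per-sample files with different site lists produces so much missing data.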
Issues while running BaseRecalibrator
I am facing this error while using BaseRecalibrator. Thank you for your time and help. Exact command used: java -jar ~/WGS/tool/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar BaseRecalibrator -I dup/oryza.dup -O base/oryza.table -R ref/Oryza-sativa-Japonica-chromosome1.fasta --known-sites oryza_sativa.vcf Entire error log: A USER ERROR has occurred: Input files reference and features have…
Senior Genome Bioinformatics Analyst/Genome Bioinformatics Analyst, Remote Opportunity
APPLY NOW UPMC Magee-Womens Hospital is hiring a full-time Senior Genome Bioinformatics Analyst or Genome Bioinformatics Analyst to join the Genomics laboratory team! This will be a remote position. Applicants will be placed into the appropriate job title and salary based on their individual experience and education. The Genome Bioinformatics Analyst’s…
Senior Bioinformatician Lund
Senior Bioinformatician Lund | kr55000 – kr60000 per month | Permanent | Bioinformatics/Biostatistics/Bioengineers To Apply for this Job Click Here Unleash Your Bioinformatics Expertise in Lund – Join Our Cutting-Edge Team! – Apply Now! Title: Senior Bioinformatician Reporting To: Head of Bioinformatics Location: Lund, Skane Region, Sweden Industry: Life Sciences…
Bioinformatics Research Scientist job in San Mateo, CA at Conflux Systems @ Get.It
Position: Bioinformatics Scientist for Virology Cure Programs We are looking for a talented and driven individual to join our team as a Bioinformatics Scientist for our Virology Cure Programs. Under this role, you will be responsible for collaborating with other research functions to create data analysis tools that investigate viral…
READ GROUP in GATK
My FASTQ files for a sample, with their header lines, looked like this: HHNG7DSX5_19417170_S118_L003_R1_001.fastq.gz @A00428:335:HHNG7DSX5:3:1101:5466:1000 1:N:0:NGATGTTT+NTCAATTG HHNG7DSX5_19417170_S118_L003_R2_001.fastq.gz @A00428:335:HHNG7DSX5:3:1101:5466:1000 2:N:0:NGATGTTT+NTCAATTG HHNG7DSX5_19417170_S118_L004_R1_001.fastq.gz @A00428:335:HHNG7DSX5:4:1101:2302:1000 1:N:0:NGATGTTT+NTCAATTG HHNG7DSX5_19417170_S118_L004_R2_001.fastq.gz @A00428:335:HHNG7DSX5:4:1101:2302:1000 2:N:0:NGATGTTT+NTCAATTG I merged L003_R1 with L004_R1 and L003_R2 with L004_R2. My first question: should I merge the R1 and R2 lanes? I want to…
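For reference, the headers above follow the Illumina layout instrument:run:flowcell:lane:tile:x:y, so the flowcell and lane can be read straight from them. The sketch below builds read-group fields using one common convention (ID as flowcell.lane, PU as flowcell.lane.barcode); the function name is hypothetical, and using "19417170" as the sample name is an assumption taken from the file names.

```python
def read_group_from_header(header, sample):
    """Derive read-group fields from an Illumina FASTQ header line.

    Illustrative helper: ID = flowcell.lane and PU = flowcell.lane.barcode
    is one common convention, not the only valid choice.
    """
    name, _, comment = header.lstrip("@").partition(" ")
    instrument, run, flowcell, lane = name.split(":")[:4]
    # The comment looks like "1:N:0:<index sequence>"; take the last field.
    barcode = comment.split(":")[-1] if comment else ""
    rg_id = f"{flowcell}.{lane}"
    return {
        "ID": rg_id,
        "SM": sample,
        "PL": "ILLUMINA",
        "PU": f"{rg_id}.{barcode}" if barcode else rg_id,
    }


def format_rg_line(fields):
    """Render the fields as a tab-separated @RG header line."""
    return "@RG\t" + "\t".join(f"{key}:{value}" for key, value in fields.items())


rg_l003 = read_group_from_header(
    "@A00428:335:HHNG7DSX5:3:1101:5466:1000 1:N:0:NGATGTTT+NTCAATTG", "19417170"
)
rg_l004 = read_group_from_header(
    "@A00428:335:HHNG7DSX5:4:1101:2302:1000 1:N:0:NGATGTTT+NTCAATTG", "19417170"
)
print(format_rg_line(rg_l003))
print(format_rg_line(rg_l004))
```

Under this convention the two lanes yield different ID and PU values, which is worth keeping in mind when deciding whether to concatenate lanes before alignment.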
Microbiology and Immunology Bioinformatics Analyst
Job Description About the Job The Research Informatics (RI) Bioinformatics group within the University of Minnesota Supercomputing Institute (MSI) is hiring a full-time Bioinformatics Analyst to support research for the Department of Microbiology and Immunology (MI) at the University of Minnesota. The analyst in this position will conduct cutting-edge bioinformatics…
Virtual Variant Detection Workshop: September 11-14, 2023
Virtual Variant Detection Workshop. Dates: September 11-14, 2023 (4 days). Time: 9.00am – 12.00pm. Location: Online. Cost: $400 (UConn affiliates, including UConn Health); $500 (external participants). The workshop will cover an introduction to Linux and high performance computing, an introduction to variant detection,…
Problem while working with sequenza
Hi, I’m trying to work with sequenza in order to calculate the HRD score of a sample using WES data. When I run sequenza, I get a message saying that “chromosomes are out of order”, and I don’t know how…
WES CNV analysis
Hi, I am new to CNV analysis and a beginner in R. I am trying to call germline CNVs from exome data using ExomeDepth. I only have the raw data with the hg38 reference. If you have ExomeDepth scripts that run on the hg38 reference, kindly share…