Tag: VCF
Different relatedness estimates by PLINK and VCFTOOLS despite same method
According to the vcftools manual, specifying the “--relatedness2” flag calculates relatedness statistics using the method of Manichaikul et al., Bioinformatics 2010 (doi:10.1093/bioinformatics/btq559), that is, the KING method. According to the PLINK manual, PLINK uses the same method to calculate relatedness when the “--make-king-table” flag is specified. So, although both PLINK…
PCA from plink2 for SGDP using a pangenome and DeepVariant
Hi there, I’m doing my first experiments with PCA and UMAP as dimensionality reductions to visualize a dataset I’ve been working on. Basically, I used the samples from the SGDP which I then mapped on the human pangenome for, finally, calling small variants with DeepVariant. I moved on with some…
Remote Software Quality Engineer III – Bioinformatics Job at Natera
JOB TITLE: Software Quality Engineer III – Bioinformatics LOCATION: Remote, USA PRIMARY RESPONSIBILITIES: Perform software verification, define and execute test cases and scenarios required for software quality assurance and regulatory compliance. Perform system analysis, assess risk, and develop strong test strategies by analyzing product design and technical specifications, and by…
Imputing missing genotypes in --score
Does plink 2 impute missing genotypes with this pipe? plink2 --threads 1 \ --read-freq freq.afreq \ --vcf tube.vcf \ --score score_file.anno.plink2.tsv ignore-dup-ids \ …
Genomic insights into Plasmodium vivax population structure and diversity in central Africa | Malaria Journal
Hamblin MT, Di Rienzo A. Detection of the signature of natural selection in humans: evidence from the Duffy blood group locus. Am J Hum Genet. 2000;66:1669–79. Hamblin MT, Thompson EE, Di Rienzo A. Complex signatures of natural selection at the Duffy blood group…
VCF heterozygosity
Hello, I want some opinions. I am new to this and need to calculate the heterozygosity by contig. I have my VCF file, and I used six samples. I got the GT of the samples, and I ended in a file that looks like this: SRHA02000001 316…
Error getting the genome on clinvaR
Hi, I am trying to use clinvaR following this vignette (here), but when I try to download and import the 1000 Genomes VCF, I get an error: Cannot open specified tabix file: ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr15.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz Error in read.table(text = paste(output, collapse = "\n"), header =…
Accurate detection of identity-by-descent segments in human ancient DNA
Ethics No new aDNA data were generated for this study and we only analysed previously published and publicly available aDNA data. Identifying biological kin is a standard analysis in the aDNA field. Permission for aDNA work on the archaeological samples was granted by the respective excavators, archaeologists, curators and museum…
The Biostar Herald for Tuesday, December 19, 2023
The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Mensur Dlakic, Istvan Albert, and was edited…
Search for specific SNPs in VCF files of patients.
I have 490 genomes from 490 patients in VCF format. I created a Multi VCF file from these VCFs. I want to find 2 mutations (Y215C and G325R) in these patients, count the number of patients who have these SNPs…
Multiallelic variants when merging VCFs with GLnexus
I’m attempting to combine around 140 .g.vcf files into a single file using GLnexus on the DNAnexus platform. To examine multiallelic variants, I’m normalizing the files using the bcftools norm -m-any $file command. While merging the original VCF files (generated with GATK)…
ftbfs and autopkgtest regression with htslib 1.19
Source: cyvcf2 Version: 0.30.22-1 Severity: important Tags: ftbfs upstream With the introduction of htslib 1.19 in experimental, cyvcf2 is experiencing test failures at package build time and autopkgtest time. The relevant part of the error looks like: cyvcf2/tests/test_reader.py …………………Fatal Python error: Aborted Current thread 0x00007fa7874de040 (most recent call first): File “/<<PKGBUILDDIR>>/.pybuild/cpython3_3.11_cyvcf2/build/cyvcf2/tests/test_reader.py”, line 285…
Require Genotypes in VCF file in order to output as 0/1/2 matrix.
Hi everyone, I have a vcf-file and I’m trying to convert my vcf file into 012 genotype matrix using the following code: vcftools --vcf myfile.vcf --012 --out out_file I didn’t have problems when I run…
Require Genotypes in VCF file in order to output IMPUTE format.
Hello, I am trying to export a VCF file in IMPUTE format and keep getting the same error message: Code: module load htslib/1.17 module load samtools/1.17 module load bcftools/1.17 module load java/17.0.8 module load python3 module load…
Annotate variants with ensembl rest api
I have a variant file (.vcf.gz), and I want to annotate this file using the Ensembl REST API, particularly the VEP REST API. I am new to variant annotation; however, I have seen a couple of code examples from the Ensembl page on…
Convert bed file from hg19 to GRCh38
Hello everyone! I have a list of over 500,000 rs IDs and I would like to obtain the coordinates (BED file) on the GRCh38 reference genome. I am using the UCSC Table Browser tool, but unfortunately, it doesn’t find 90,000 rs IDs, and since…
DE Jobs – UPMC Bioinformatics Scientist in Pittsburgh, Pennsylvania, United States
UPMC Presbyterian is hiring a full-time Bioinformatics Scientist to support the Molecular & Genomic Pathology Lab! This role will be scheduled for daylight shifts, Monday-Friday. The Molecular & Genomic Pathology Laboratory is a dynamic, state-of-the-art clinical laboratory that prides itself on delivering the highest quality of patient care through cutting-edge…
Variant calling using HaplotypeCaller does not show #FILTER information
Hi all, I would like to ask about variant calling using HaplotypeCaller. After running HaplotypeCaller, the #FILTER column in the gVCF files is supposed to show ‘PASS/LowQ’; however, in my case, the output #FILTER column only shows ‘.’ without…
How to compute Hudson’s/Bhatia’s FST in R OR with vcf?
Hi everyone, How can I compute hierarchical Fst with Bhatia’s/Hudson’s estimator using a vcf as input? My data is structured like this: there are individuals within sampling sites, and sampling sites within groups. My vcfs contain SNP data (~1000…
convert VCF to gVCF
Your question is not completely clear, but since the most sensible ways to understand it have the same answer, I’m gonna go with that. I have the exact reference fasta used for generating the VCFs TLDR: You don’t have enough information to do this with just VCFs and reference fasta….
Whole mitochondrial and chloroplast genome sequencing of Tunisian date palm cultivars: diversity and evolutionary relationships | BMC Genomics
Johnson DV, Al-Khayri JM, Jain SM. Introduction: Date Production Status and Prospects in Africa and the Americas. In: Al-Khayri J, Jain S, Johnson D, editors. Date Palm Genetic Resources and Utilization: Volume 1: Africa and the Americas. Springer Netherlands, Dordrecht; 2015. p. 1–18. doi.org/10.1007/978-94-017-9694-1_1 Gros-Balthazard M, Hazzouri KM, Flowers JM. Genomic…
Indigenous Australian genomes show deep structure and rich novel variation
Inclusion and ethics The DNA samples analysed in this project form part of a collection of biospecimens, including historically collected samples, maintained under Indigenous governance by the NCIG11 at the John Curtin School of Medical Research at the Australian National University (ANU). NCIG, a statutory body within ANU, was founded…
Do you have to run separate pca or covariate file for different number of samples?
Hi. I have two different phenotypes, BMI and HDL, for a quantitative GWAS analysis (--glm). As input, I have a phenotype file, a genotype file and a covariate file. The special circumstance here is that I have a different number of participants for each phenotype, meaning that some participants…
max-maf not filtering properly
Hi Chris, I have a vcf file for which I have left aligned and split multi-allelic sites. Then I used plink2 --vcf test.vcf --make-bed --out test1; this gives me the binary files. Then I updated FID and sex (all males, all founders). In plink2: plink2 --bfile test1 --max-maf 0.01 --geno 0.05 --make-bed…
Individual vs. joint call VCFs
Is there any way to figure out and be sure if a VCF file is individually called or jointly called? Is there any line in the VCF header to look at for this?
GATK GenomicsDBImport too slow
Hello, I have 3264 g.VCFs and an interval list for the reference genome that contains 20000 contigs. The interval list looks like the following: utg19_pilon_pilon:1-42237 utg22_pilon_pilon:1-49947 utg24_pilon_pilon:1-61707 utg30_pilon_pilon:1-459006 utg38_pilon_pilon:1-129173 utg40_pilon_pilon:1-101813 utg58_pilon_pilon:1-143918 utg93_pilon_pilon:1-186249 utg100_pilon_pilon:1-87875 utg104_pilon_pilon:1-49315 I am running the GATK GenomicsDBImport command as follows: gatk --java-options…
extract variants from 1000 Genome VCF files
Hi everyone, I have a gVCF containing genetic information from different individuals, and I would like to extract specific SNPs. The SNPs of interest are listed in a BED file with the following structure (the end position represents the real position of…
How to input list into GenomicsDBImport with snakemake?
Hello! I’m currently writing a pipeline with snakemake for exome data. During joint variant calling I need to use GATK’s GenomicsDBImport, although I’m unsure how to input all the samples at once. Here’s the simplified version of the rule I’m using:…
The Biostar Herald for Monday, December 11, 2023
The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Istvan Albert, cmdcolin, and was edited by…
bcftools=1.18 not filtering MAF correctly
Hi, I have encountered some issues when using bcftools v.1.11, v.1.14 or v.1.18. I want to filter MAF<=0.01 & ‘F_MISSING<0.1’ for rare-variant analysis. I have a vcf file mapped to GRCh37, left aligned, and multi-allelic split. bcftools view -q 0.01:minor test1.vcf > test2.vcf…
How to display a VCF/BCF file or stream as a paginated table in a python web framework (e.g. Django)?
Does anyone know how to display a VCF/BCF file or stream as a paginated table in a python web framework (e.g. Django)? Is this possible at all? The number of variants…
r – Fst calculation from VCF files
I have four vcf files, SNPs_s1.vcf, SNPs_s2.vcf, SNPs_s3.vcf, and SNPs_s4.vcf, which contain information about SNPs. These vcf files were obtained by using the following methods: the initial input files were short-paired reads I did mapping with minimap2 ./minimap2 -ax sr ref.fa read1.fq.gz read2.fq.gz > aln.sam converted to bam file samtools…
vcfdist: accurately benchmarking phased small variant calls in human genomes
The affine gap design space for selecting variant representations As demonstrated in Fig. 1, the main issue with a difference-based format such as VCF is that often there are multiple reasonable sets of variant calls that can be used to represent the same final sequence relative to a reference FASTA. Since…
GetPileupSummaries intervals-list with Targeted Sequencing?
Hi! I am applying GetPileupSummaries for somatic variant calling, starting from targeted sequencing .fasta. I aligned the file to the GRCh38 reference, and currently I am at the GetPileupSummaries step. gatk --java-options -Xmx200G GetPileupSummaries \ -I $RECBAM \ -L ???? \ -O $OUTPUT \…
Infer ancestry for RNA-seq data
I generated VCF files with bcftools for 4 patient RNA-seq samples. I was also able to generate bed, bim, and fam files with PLINK for these files. I want some guidance on how to infer ancestry for these RNA-seq samples: How do I find…
How to subtract variants from one VCF file to another?
I have 2 VCF files from running the GATK Joint Genotyping workflow on two different groups of samples. I would like to filter out all the variants that are common to both VCF files and output a new VCF…
How to query 1000 genomes project VCF files for specific regions without downloading whole chromosomes first?
Hi, I am trying to find a way to extract an arbitrary region of human genome from the 1000 genomes project’s VCF files without having to download the genome or individual chromosome files…
Failed to open /ROH/.log. Try changing the --out parameter.
When I used this code in R: system("plink --vcf Pakistan.total.vcf --homozyg --homozyg-window-snp 50 --homozyg-snp 50 --homozyg-window-missing 3 --homozyg-kb 100 --homozyg-density 1000 --allow-extra-chr --out /ROH/plink/n") I got this error: Error: Failed to open /ROH/plink/n.log. Try changing the --out parameter. How…
Comparison of DNA sequencing services
This page lists the different DNA sequencing services. 2 main types can be distinguished: Whole exome sequencing is the middle ground between these two types, where a large amount of genes are sequenced, but only those that produce meaningful differences important for practical purposes, which is only 1% of the…
gatk SelectVariants is giving duplicate allele error while extracting SNPs out of vcf file
I am trying to extract SNPs out of a merged vcf file using the gatk SelectVariants command, but it is giving the following error: htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 73: Duplicate allele…
Convert NCBI Downloaded files to ANNOVAR format
I have been trying to understand from the ANNOVAR documentation and other sites the steps needed to make these files from NCBI available to ANNOVAR. I admit to being new to bioinformatics, but have been a software developer for 30+ years. My…
Genomics England hiring PhD Bioinformatics Intern in London, England, United Kingdom
Company Description: Genomics England partners with the NHS to provide whole genome sequencing diagnostics. We also equip researchers to find the causes of disease and develop new treatments – with patients and participants at the heart of it all. Our mission is to continue refining, scaling, and evolving our ability to…
ASEReadCounter output wrong number of coverage
Hi, I am using ASEReadCounter to count the number of reads per variant in a BAM file. For some positions, it reports 1 read covered (1 refCount or 1 altCount), while there is no read covering those positions after checking it in…
Building reference dbSNP file using WGS samples
Dear scientific community, I have to call variants from WGS samples of citrus. I used the GATK pipeline for post-processing of aligned reads, but a reference dbSNP file is not available for Citrus sinensis. I am using the bootstrapping method. Removed duplicates and called…
How To Install bedtools on Debian 11
In this tutorial we learn how to install bedtools on Debian 11. bedtools is a suite of utilities for comparing genomic features. What is bedtools? The BEDTools utilities allow one to address common genomics tasks such…
Issues with Chromosome Encoding and VCF Annotation in dbSNP Alpha Release
Hello, Biostars Community, I am working on creating a custom database of variants using the VCF from the latest dbSNP alpha release available at ftp.ncbi.nih.gov/snp/population_frequency/latest_release/. I have encountered a couple of issues that I’m hoping someone might help me resolve. Firstly, the chromosome encoding uses RefSeq IDs (e.g., NC_000007.12)…
vcftools’ --weir-fst-pop and R hierfstat package’s varcomp.glob()
I’m getting different Fst values when I use vcftools --weir-fst-pop vs. the varcomp.glob() function from the R hierfstat package. I’m not sure why, because they both calculate Fst based on Weir & Cockerham’s 1984 paper. Can someone please explain why?…
vcftools
Hi, I tried this code but I couldn’t get any output. Please guide me to resolve this issue: bash for i in {1..2} do vcftools --LROH --vcf Pakistan.total.vcf --out ${i} --chr i done
locuszoom error
Hi there, I downloaded locuszoom in its entirety from their GitHub page. I have everything in order, but when I attempt to run the following I get an error: locuszoom_test/bin/locuszoom --metal chr7.snx13.marker.locuszoom.metal --ld chr7.imputed.concat.sorted.nomono.80.recode.maf01.pheno.vcf.2.ld.locuszoom.input --refsnp rs1533245 --prefix chr7.snx13.rs1533245 /share/hennlab/progs/locuszoom_test/bin/../src/m2zfast.py:82: SyntaxWarning: invalid escape sequence '\d' RE_SNP_1000G = re.compile("chr(\d+|[a-zA-z]+):(\d+)$");…
PhD Bioinformatics Intern Job in Greater London, Pharmaceuticals & Life Sciences Career, Intern/Graduate Jobs in Genomics England
Company Description Genomics England partners with the NHS to provide whole genome sequencing diagnostics. We also equip researchers to find the causes of disease and develop new treatments – with patients and participants at the heart of it all. Our mission is to continue refining, scaling, and evolving our…
Association analysis of production traits of Japanese quail (Coturnix japonica) using restriction-site associated DNA sequencing
Tsudzuki, M. Mutations of Japanese quail (Coturnix japonica) and recent advances of molecular genetics for this species. J. Poult. Sci. 45, 159–179 (2008). Recoquillay, J. et al. A medium density genetic map and QTL for behavioral and production traits in Japanese quail. BMC Genom. 16, 10 (2015)….
Which program, tool, or strategy do you use to visualize genomic rearrangements?
Which program, tool, or strategy do you use to visualize genomic rearrangements? In relation to my master thesis I’m working on tools to visualize fusion genes. In that regard I’m interested in any and all strategies and…
Loftee no splice site annotations
Hello! I am using LOFTEE in my VEP pipeline, and after some fights with my code everything works now except the splice site annotations, meaning that I don’t get them. There is no error at all, but my vcf files do not contain a single…
Where do these snpeff annotation come from?
I am annotating a VCF with annotations from snpEff, which I eventually want to parse for predicted loss-of-function variants. I want to understand the annotations better and document how they are produced. I run this command: snpEff "hg38"…
BBtools bug in reporting the number of substitutions in the console output, it seems to report insanely high rates of heterozygosity
Hello, I know Brian is sometimes around, but here is my command: while read p; do callvariants.sh in=${p}.recal.bam ploidy=2 vcf=${p}.20score.vcf useidentity=f overwrite=true ref=ref.fsa -Xmx50g ; done <ID java -ea -Xmx50g -Xms50g -cp /home/alessandro/software/bbmap/current/ var2.CallVariants in=ancestor.recal.bam ploidy=2 vcf=ancestor.20score.vcf useidentity=f overwrite=true ref=ref.fsa -Xmx50g Executing var2.CallVariants [in=ancestor.recal.bam, ploidy=2, vcf=ancestor.20score.vcf, useidentity=f, overwrite=true, ref=Adineta_vaga.fsa,…
calculate nucleotide diversity from whole-genome-sequence data for individual genes
I am trying to calculate per gene nucleotide diversity (pi) for whole-genome-sequence data. I basically have whole genome resequenced data for many hundred individuals with ~1.2 million SNPs and a well annotated species with 36k genes. I was wondering if…
GATK Mutect2 mouse dbSNP vcf files recommendations for mouse whole exome data
Dear all, Is there any best practice for the mouse SNP/indel vcf files when using GATK Mutect2 for mouse whole exome data? For mm10, there seem to be several available; for mm39, it seems the newest is from…
Transformer-based tool recommendation system in Galaxy | BMC Bioinformatics
Kumar A, Rasche H, Grüning B, Backofen R. Tool recommender system in Galaxy using deep learning. GigaScience. 2021. doi.org/10.1093/gigascience/giaa152. The galaxy community: the galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update. Nucleic Acids Res 50(W1):W345-W35104 2022. (2022). doi.org/10.1093/nar/gkac247 Gil Y, Ratnakar…
Bcftools consensus when reference is a deletion
Hello, I am trying to call a consensus on a VCF file like so: bcftools consensus species.vcf.gz -f Reference.fasta --absent N > Consensus.fasta Error: The site SUPER_1:173197 overlaps with another variant, skipping… I looked at this site and included the previous site…
bam or VCF files from GSE75010
Hi all, I’m planning to run a variant calling analysis using microarray data GSE75010 that contains GSE75010_RAW.tar and GSE75010_complete_dataset.csv.gz. I used to download the .fastq files using SRA Run numbers through Ubuntu/Linux to get .bam and VCF files. However, this is not the…
VCF conversion into Treemix
I have a multi-sample vcf file with ~7 million SNPs. Now I want to convert it into the format required by Treemix. I ran it using vcf2treemix.sh along with plink2treemix.py, but plink2treemix.py runs very slowly. So that if I use it, the analysis in…
Global genetic diversity, introgression, and evolutionary adaptation of indicine cattle revealed by whole genome sequencing
Loftus, R. T., MacHugh, D. E., Bradley, D. G., Sharp, P. M. & Cunningham, P. Evidence for two independent domestications of cattle. Proc. Natl Acad. Sci. USA 91, 2757–2761 (1994). Verdugo Marta, P. et al. Ancient cattle genomics, origins, and rapid turnover…
Bedtools intersection
While intersecting a VCF file and bed to obtain the reads that map to a class of genes, bedtools gives the following error: ***** WARNING: File TARGT_First.bed has inconsistent naming convention for record: chr1 45794952 45795134 MUTYH 1 ***** WARNING: File TARGT_First.bed has inconsistent naming convention for…
[maftools] Too many multi_hit and missense mutations
When using maftools to plot mutational summary data, I encountered some issues: I use WES data to generate a filtered VCF file, and then utilize VEP for annotation to obtain an MAF file. The MAF file contains an excessive number…
The number of variations in the pan-genome is reduced compared to the variations in the input VCF file
Does vg filter out some variants during the construction of the pan-genome, and if so, what are the criteria for filtering? The number of variations in the pan-genome is reduced compared…
Annotated file cells show string
Hello, I am new to bioinformatics and trying to get ANNOVAR to work. I was able to download the databases and get ANNOVAR working with the example files. But when I try to use VCF files, the annotated file’s cells have a…
Genotypes in vcf files
Hi all, I have a variety of genotypes outputted in my vcf, these being: 0 1 0/0 0/1 1/1 1/2 Now I can assume these ones: 0 1 0/0 = homozygous for the reference 0/1 = heterozygous for the alternate 1/1 = homozygous for the…
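As a rough illustration of how the genotype strings listed in this question are usually read, here is a minimal Python sketch. It assumes standard VCF conventions (allele index 0 is the reference, indices 1 and up are alternate alleles, `/` and `|` separate alleles, `.` marks a missing call). The function name `classify_gt` and the category labels are hypothetical, not taken from the post or from any library.

```python
import re

def classify_gt(gt: str) -> str:
    """Classify a VCF GT string (hypothetical helper, standard VCF conventions assumed)."""
    # Split on unphased (/) or phased (|) separators
    alleles = re.split(r"[/|]", gt)
    # Any missing allele makes the whole call missing in this sketch
    if any(a == "." for a in alleles):
        return "missing"
    # A single allele index means a haploid call (e.g. chrX in males, mitochondria)
    if len(alleles) == 1:
        return "haploid reference" if alleles[0] == "0" else "haploid alternate"
    # All alleles identical: homozygous for reference or for one alternate allele
    if len(set(alleles)) == 1:
        return "homozygous reference" if alleles[0] == "0" else "homozygous alternate"
    # Otherwise the alleles differ, including two different alternates such as 1/2
    return "heterozygous"

for gt in ["0", "1", "0/0", "0/1", "1/1", "1/2"]:
    print(gt, "->", classify_gt(gt))
```

Under these assumptions, 1/2 comes out as heterozygous with two different alternate alleles, and the bare 0 and 1 come out as haploid calls.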
Pruning with --indep-pairwise with plink 1.9
I’m new to PLINK and I would like to obtain a file with SNPs in approximate linkage equilibrium. Here is my script and the outputs of each step. If someone could tell me if there is an error in the script because at…
selection of reference genome
Hello everyone, I have a vcf file with variants called using hg38 as the reference genome. I wonder what would happen if I used hg19 as the reference genome to annotate these variants. Would it be OK, or would the results be wrong? Thanks!
SNPs of a specific mouse strain
Hi, I wonder how can I get SNPs for a particular mouse strain like C57BL6. I have downloaded a mouse reference vcf from ftp.ebi.ac.uk/pub/databases/mousegenomes/REL-2112-v8-SNPs_Indels/mgp_REL2021_snps.rsID.vcf.gz Its header is #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 129P2_OlaHsd 129S1_SvImJ 129S5SvEvBrd A_J AKR_J B10.RIII BALB_cByJ BALB_cJ BTBR_T+_Itpr3tf_J BUB_BnC3H_HeH C3H_HeJ C57BL_10J C57BL_10SnJ C57BL_6NJ…
normalize not left-normalizing?
I’m running plink2 to convert a vcf to a pgen with pseudobiallelic variants. Calling --normalize does not seem to left-normalize as I would expect, at least when I look at the .pvar. Log PLINK v2.00a6LM AVX2 Intel (21 Nov 2023) www.cog-genomics.org/plink/2.0/ (C) 2005-2023 Shaun Purcell, Christopher Chang …
Help finding the correct file version for dbSNP VCF ID replacement
Tried to use dbSNP version 156 with bcftools to replace the ID field in a reference VCF which originally contains a different position ID format. It seems the bcftools command did not work because of a numeric chromosome format in the #CHROM field, which might not be compatible with bcftools…
Merging several vcf files for GWAS?
Hello! I am a Medical Student without much background in Bioinformatics trying to perform analysis for my first GWAS study, tremendously overwhelmed. It’s a Case Control Association Study with samples from 50 subjects, that we sampled using Novogene NGS platform. The problem is,…
Creating a Variant containing FASTA for proteomics search from VCF and genomic FASTA
Dear Biostar Community, I’m currently trying to generate a protein FASTA containing all known variants from HeLa (from Cosmic CellLinesProject) for variant detection in proteomics measurements. For this, I’ve downloaded the variants file (VCF) and the…
update FMT/GT in VCF file using bcftools annotate
Hi – I am trying to use bcftools to overwrite the existing FMT/GT values in a VCF file, matching by the ID column, in addition to CHROM and POS. I tried creating a .txt.gz file as an annotation file, but got…
How to overlap patient VCF with ClinVar database annotation using bedtools?
Hello, I’m trying to help a colleague who wants to add the ClinVar database’s clinical significance column to the VCF samples that she analysed. More specifically, we are trying to add overlapping/common variant annotation so that if the variant…
All variants in a VCF register as “invalid genotype records in input file”
Hello, I am running into an error with convert2annovar.pl where it is registering all of the variants in my VCF as invalid. My VCF is 1925 variants plus the header with the following format:…
The Biostar Herald for Monday, November 20, 2023
The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Istvan Albert, and was edited by Istvan…
bcftools info and filter error
Hi. I am trying to do a comparative analysis of my vcf file against the vcf files of ExAC. I’m using bcftools isec here. I am getting an error that says: [W::vcf_parse_info] INFO ‘HOM_CONSANGUINEOUS’ is not defined in the header, assuming Type=String [W::vcf_parse_filter]…
BaseRecalibrator takes forever to run. Any suggestions?
Hello, I am trying to run the BaseRecalibrator tool from the GATK package and it takes forever (more than 4 days per bam file). The command I’m using is: gatk BaseRecalibrator -I NG-01_1_S1_dedup_bwa.bam -R /rumi/shams/genomes/hg38/hg38.fa --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --known-sites 1000G_phase1.snps.high_confidence.hg38.vcf.gz --known-sites Homo_sapiens_assembly38.dbsnp138.vcf -O NG-01_1_S1_dedup_bwa_BSQR.table…
fasta – Get a certain gene sequence from bam/vcf and reference
I need to get a fasta sequence of a certain gene for a certain worm strain that is different from reference. I have a reference genome, BAM for the strain of interest, and coordinates of the gene. I know that vcftools can convert bam to fasta, but I do not…
Compare two VCF / BAM / FASTQ files for degree of relativity
Hi, I have recently got reunited with a sibling, but we do not know if we only share the mother, or we have a common father as well. I already have made a full genome sequencing test…
Handling male samples chrX vcf genotype from 1000G high-coverage 30x
Hello, I am working with the vcf files from the 1000G project high-coverage (30x) release. I do not completely understand how the authors handled the genotypes of male individuals in the non-pseudoautosomal chrX regions. The genotypes in the…
SNP calling with many samples using bcftools
Hello, I aim to identify SNPs from approximately 500 BAM files (non-human). I’m opting for bcftools since GATK, even with the Spark addition, takes a substantial 6 hours per sample. My objective is to generate a single VCF file encompassing all SNPs…
set GT values to missing in VCF file for specific sample-variant combinations
Hi – I have a multi-subject vcf file and would like to set specific genotypes (GT) to missing for a set of subjects. However, the subjects that I need to set to missing are different for each variant. For example, suppose I have this: CHROM POS ID FORMAT sub1 sub2…
ImputePipelinePlugin fails when trying to impute SNPs on a gvcf file.
Hello everyone, I hope you’re doing great. I’m trying to impute a gvcf using a PHG database. As far as I can tell from the logs (attached here) of steps 1 and 2 in the PHG Wiki guide, it seems that I have established and populated the PHG…
Query regarding callsets used as known sites in Variant Calling
Hi, where can I learn more about the standard VCF files that are used as known sites during the BQSR step in Variant Calling with GATK? The files are: Homo_sapiens_assembly38.dbsnp138.vcf Homo_sapiens_assembly38.known_indels.vcf.gz Mills_and_1000G_gold_standard.indels.hg38.vcf.gz I am aware that these files are…
Appropriate genome reference for converting TCGA VCF files to MAF
I have a directory of MAF files obtained from TCGA and I want to convert it to VCF format. Reference: GRCh38.d1.vd1 Reference Sequence Source: gdc.cancer.gov/about-data/gdc-data-processing/gdc-reference-files maf2vcf.pl --input-maf maf/* --output-dir VCF -ref-fasta /home/melchua/.vep/homo_sapiens/GRCh38/GRCh38.d1.vd1.fa.tar.gz Traceback: Use of uninitialized value $lines in…
Bgen file not being opened by PRSice
I used the following command to calculate PRS of a sequenced file coming from a collaborator. I imputed the vcf file which gave me separate vcf files for each chromosome. I then converted them to bgen and generated bgi and sample files…
GATK SelectVariants --remove-unused-alternates dropping real INDELs?
I’m using a VCF that is generated by GenotypeGVCFs (so doing calibration based on a larger cohort of samples) and my goal is to only extract variants of interest to one specific sample. The VCF in the subset tends to include some variants that were present in the original joint…
University of Alabama at Birmingham hiring BIOINFORMATICIAN I in Birmingham, Alabama, United States
Position Summary: The primary role is to execute a variety of data management and analysis tasks, ensuring the quality, reproducibility, and efficiency of processes related to high-dimensional data. You will collaborate with study investigators and fellow bioinformatics professionals within the department to contribute to high-quality, reproducible research across various scientific…
Samtools index not working in Snakemake
I am setting up a Snakemake pipeline for sequencing read alignment and variant calling. But the samtools index rule is not activated, and the subsequent haplotype caller rule fails. I think it is because the samtools index rule is not perceived as necessary to produce the output of rule all…
Bioconductor – AnnotationHub
DOI: 10.18129/B9.bioc.AnnotationHub Client to access AnnotationHub resources Bioconductor version: Release (3.6) This package provides a client for the Bioconductor AnnotationHub web resource. The AnnotationHub web resource provides a central location where genomic files (e.g., VCF, bed, wig) and other resources from standard locations (e.g., UCSC, Ensembl) can be…
Bug#1055669: bcftools: test_vcf_merge failures on armhf: Bus error
Source: bcftools Version: 1.18-1 Severity: serious Tags: ftbfs Justification: ftbfs Control: forwarded -1 github.com/samtools/bcftools/issues/2036 Dear Maintainer, bcftools currently ftbfs on armhf due to multiple test_vcf_merge failures with Bus error[1]. I already informed upstream[2]. This bug is mostly to keep track of the issue on Debian side and eventually comment on possible Debian specific…
MELT-SINGLE “priors” list usage
I’m trying to get “priors” working with MELT-SINGLE but nothing I’ve done seems to be making a difference. As a test, I run 1 sample without a priors list java -Xmx6G -jar MELT/MELTv2.2.2.jar Single -bamfile HT-7604-01A-11D-2088.bam -t MELT/me_refs/Hg38/ALU_MELT.zip -h hg38.chrXYM_alts.fa -n MELT/add_bed_files/Hg38/Hg38.genes.bed -w HT-7604-01A-11D-2088-run1/ Command Line:MELT.jar Single -bamfile HT-7604-01A-11D-2088.bam -t…
Divergent mechanisms of reduced growth performance in Betula ermanii saplings from high-altitude and low-latitude range edges
Aizawa M, Yoshimaru H, Saito H, Katsuki T, Kawahara T, Kitamura K et al. (2009) Range‐wide genetic structure in a north‐east Asian spruce (Picea jezoensis) determined using nuclear microsatellite markers. J Biogeogr 36(5):996–1007 Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of ancestry in unrelated…
best annotator for mitochondria and heteroplasmy calculation
Dear all, I am analysing sequencing data on mitochondrial DNA with the aim of finding pathogenic variants. I only have the VCFs. Does anyone have any suggestions as to where I can annotate them? Unfortunately Annovar doesn’t annotate anything for me and…
BAM file for phasing
Hi all, I’m new to bioinformatics, and I’m trying to do phasing and imputation at WGS level. For imputation with Beagle, I would like to make a bref file from a vcf file. And I have to phase the reference panel for that. Is a BAM…
Bioinformatician job with Proclinical Staffing
Proclinical is seeking a Bioinformatician for a digital IT and data science provider located in Cambridge, MA. Must be eligible to work in the US. Job Responsibilities: Deploy and optimize bioinformatic workflows for the integration and analysis of NGS data, including short and long read sequencing data. Interpret results from…
Landscape genomics reveals adaptive genetic differentiation driven by multiple environmental variables in naked barley on the Qinghai-Tibetan Plateau
Abebe TD, Naz AA, Léon J (2015) Landscape genomics reveal signatures of local adaptation in barley (Hordeum vulgare L.). Front Plant Sci 6:813 Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Res 19:1655–1664 …