Tag: VCF

Different relatedness estimates by PLINK and VCFTOOLS despite same method

According to the vcftools manual, specifying the "--relatedness2" flag calculates relatedness statistics using the method of Manichaikul et al., Bioinformatics 2010 (doi:10.1093/bioinformatics/btq559), that is, the KING method. According to the PLINK manual, PLINK uses the same method to calculate relatedness when the "--make-king-table" flag is specified. So, although both PLINK…

PCA from plink2 for SGDP using a pangenome and DeepVariant

Hi there, I’m doing my first experiments with PCA and UMAP as dimensionality reductions to visualize a dataset I’ve been working on. Basically, I used the samples from the SGDP which I then mapped on the human pangenome for, finally, calling small variants with DeepVariant. I moved on with some…

Remote Software Quality Engineer III – Bioinformatics Job at Natera

JOB TITLE: Software Quality Engineer III – Bioinformatics
LOCATION: Remote, USA
PRIMARY RESPONSIBILITIES: Perform software verification, define and execute test cases and scenarios required for software quality assurance and regulatory compliance. Perform system analysis, assess risk, and develop strong test strategies by analyzing product design and technical specifications, and by…

Imputing missing genotypes in --score

Does plink 2 impute missing genotypes with this pipe? plink2 --threads 1 \ --read-freq freq.afreq \ --vcf tube.vcf \ --score score_file.anno.plink2.tsv ignore-dup-ids \ …

Genomic insights into Plasmodium vivax population structure and diversity in central Africa | Malaria Journal

Hamblin MT, Di Rienzo A. Detection of the signature of natural selection in humans: evidence from the Duffy blood group locus. Am J Hum Genet. 2000;66:1669–79. Hamblin MT, Thompson EE, Di Rienzo A. Complex signatures of natural selection at the Duffy blood group…

VCF heterozygosity

Hello, I want some opinions. I am new to this and need to calculate the heterozygosity by contig. I have my VCF file, and I used six samples. I got the GT of the samples, and I ended up with a file that looks like this: SRHA02000001 316…
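
One way to approach the per-contig calculation is sketched below. It is a minimal sketch that assumes diploid GT strings have already been extracted per site, and it defines heterozygosity as the fraction of called genotypes that are heterozygous (other definitions exist). The function name and the second contig ID are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def heterozygosity_by_contig(records):
    """records: iterable of (contig, [GT strings]); returns {contig: het fraction}."""
    het = defaultdict(int)
    called = defaultdict(int)
    for contig, genotypes in records:
        for gt in genotypes:
            # Treat phased (|) and unphased (/) separators the same way.
            alleles = gt.replace("|", "/").split("/")
            if len(alleles) != 2 or "." in alleles:
                continue  # skip missing or non-diploid calls
            called[contig] += 1
            if alleles[0] != alleles[1]:
                het[contig] += 1
    return {contig: het[contig] / called[contig] for contig in called}

# Hypothetical input: six samples, three sites on two contigs.
example = [
    ("SRHA02000001", ["0/1", "0/0", "1/1", "./.", "0|1", "0/0"]),
    ("SRHA02000001", ["0/0", "0/0", "0/1", "0/0", "0/0", "0/0"]),
    ("SRHA02000002", ["1/1", "1/1", "0/0", "0/0", "0/0", "0/0"]),
]
result = heterozygosity_by_contig(example)
```

Missing calls are excluded from the denominator here; whether that is appropriate depends on how missingness is distributed across samples.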

Error getting the genome on clinvaR

Hi, I am trying to use clinvaR following this vignette, but when I try to download and import the 1000 Genomes VCF, I get an error: Cannot open specified tabix file: ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr15.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz Error in read.table(text = paste(output, collapse = "\n"), header =…

Accurate detection of identity-by-descent segments in human ancient DNA

Ethics No new aDNA data were generated for this study and we only analysed previously published and publicly available aDNA data. Identifying biological kin is a standard analysis in the aDNA field. Permission for aDNA work on the archaeological samples was granted by the respective excavators, archaeologists, curators and museum…

The Biostar Herald for Tuesday, December 19, 2023

The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Mensur Dlakic, Istvan Albert, and was edited…

Search for specific SNPs in VCF files of patients.

I have 490 genomes from 490 patients in VCF format. I created a multi-VCF file from these VCFs. I want to find 2 mutations (Y215C and G325R) in these patients, count the number of patients who have these SNPs…

Multiallelic variants when merging VCFs with GLnexus

I’m attempting to combine around 140 .g.vcf files into a single file using GLnexus on the DNAnexus platform. To examine multiallelic variants, I’m normalizing the files using the bcftools norm -m-any $file command. While merging the original VCF files (generated with GATK)…

ftbfs and autopkgtest regression with htslib 1.19

Source: cyvcf2
Version: 0.30.22-1
Severity: important
Tags: ftbfs upstream
With the introduction of htslib 1.19 in experimental, cyvcf2 is experiencing test failures at package build time and autopkgtest time. The relevant part of the error looks like: cyvcf2/tests/test_reader.py …………………Fatal Python error: Aborted Current thread 0x00007fa7874de040 (most recent call first): File "/<<PKGBUILDDIR>>/.pybuild/cpython3_3.11_cyvcf2/build/cyvcf2/tests/test_reader.py", line 285…

Require Genotypes in VCF file in order to output as 0/1/2 matrix.

Hi everyone, I have a VCF file and I’m trying to convert it into a 012 genotype matrix using the following code: vcftools --vcf myfile.vcf --012 --out out_file I didn’t have problems when I run…
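
To make the 0/1/2 encoding concrete, here is a minimal Python sketch of how a GT string maps to a count of non-reference alleles, with -1 for missing calls. It also includes a check for whether a #CHROM header line carries any sample columns at all; a sites-only VCF is one plausible cause of this error, offered here as an assumption rather than a diagnosis. Both function names are hypothetical.

```python
def gt_to_012(gt):
    """Count non-reference alleles in a GT string; return -1 if the call is missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return -1
    return sum(1 for allele in alleles if allele != "0")

def has_genotype_columns(chrom_header_line):
    """True if the #CHROM line has a FORMAT column plus at least one sample column."""
    fields = chrom_header_line.rstrip("\n").split("\t")
    return len(fields) >= 10 and fields[8] == "FORMAT"

# One row per site, one column per sample (hypothetical data).
sites = [["0/0", "0/1", "1/1"], ["./.", "0|1", "1/2"]]
matrix = [[gt_to_012(gt) for gt in row] for row in sites]
```

The matrix above is oriented sites by samples; transpose it if one row per individual is needed.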

Require Genotypes in VCF file in order to output IMPUTE format.

Hello, I am trying to export a VCF file in IMPUTE format and keep getting the same error message: Code: module load htslib/1.17 module load samtools/1.17 module load bcftools/1.17 module load java/17.0.8 module load python3 module load…

Annotate variants with Ensembl REST API

I have a variant file (.vcf.gz), and I want to annotate this file using the Ensembl REST API, particularly the VEP REST API. I am new to variant annotation; however, I have seen a couple of code examples from the Ensembl page on…

Convert bed file from hg19 to GRCh38

Hello everyone! I have a list of over 500,000 rs IDs and I would like to obtain their coordinates (BED file) on the GRCh38 reference genome. I am using the UCSC Table Browser tool, but unfortunately, it doesn’t find 90,000 of them, and since…

DE Jobs – UPMC Bioinformatics Scientist in Pittsburgh, Pennsylvania, United States

UPMC Presbyterian is hiring a full-time Bioinformatics Scientist to support the Molecular & Genomic Pathology Lab! This role will be scheduled for daylight shifts, Monday-Friday. The Molecular & Genomic Pathology Laboratory is a dynamic, state-of-the-art clinical laboratory that prides itself on delivering the highest quality of patient care through cutting-edge…

Variant calling using HaplotypeCaller does not show #FILTER information

Hi all, I would like to ask about variant calling using HaplotypeCaller. After running HaplotypeCaller, the #FILTER column in the gVCF files is supposed to show ‘PASS/LowQ’; however, in my case, the output #FILTER column only shows ‘.’ without…

How to compute Hudson’s/Bhatia’s FST in R OR with vcf?

Hi everyone, How can I compute hierarchical Fst with Bhatia’s/Hudson’s estimator using a VCF as input? My data is structured like this: there are individuals within sampling sites, and sampling sites within groups. My VCFs contain SNP data (~1000…
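
For orientation, the pairwise (not hierarchical) form of Hudson's estimator as described by Bhatia et al. can be sketched in a few lines. This is a minimal sketch: it assumes per-SNP allele frequencies have already been computed from the VCF for two populations, takes n1 and n2 to be the number of sampled allele copies per SNP (an assumption; check the sample-size definition in the reference you follow), and combines SNPs as a ratio of sums. The function name is hypothetical, and the hierarchical structure in the question is not addressed.

```python
def hudson_fst(p1, n1, p2, n2):
    """Pairwise Hudson FST over SNPs, combined as sum(numerators) / sum(denominators).

    p1, p2: per-SNP allele frequencies in populations 1 and 2.
    n1, n2: per-SNP sample sizes (assumed here to be allele copies).
    """
    numerator = 0.0
    denominator = 0.0
    for a, na, b, nb in zip(p1, n1, p2, n2):
        numerator += (a - b) ** 2 - a * (1 - a) / (na - 1) - b * (1 - b) / (nb - 1)
        denominator += a * (1 - b) + b * (1 - a)
    return numerator / denominator

# Hypothetical frequencies for three SNPs in two populations.
fst = hudson_fst([0.10, 0.50, 0.80], [40, 40, 38], [0.30, 0.45, 0.20], [36, 36, 36])
```

Negative values can occur when the two populations have nearly identical frequencies, because the sampling correction terms are subtracted from the numerator.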

convert VCF to gVCF

Your question is not completely clear, but since the most sensible ways to understand it have the same answer, I’m gonna go with that. I have the exact reference fasta used for generating the VCFs TLDR: You don’t have enough information to do this with just VCFs and reference fasta….

Whole mitochondrial and chloroplast genome sequencing of Tunisian date palm cultivars: diversity and evolutionary relationships | BMC Genomics

Johnson DV, Al-Khayri JM, Jain SM. Introduction: Date Production Status and Prospects in Africa and the Americas. In: Al-Khayri J, Jain S, Johnson D, editors. Date Palm Genetic Resources and Utilization: Volume 1: Africa and the Americas. Springer Netherlands, Dordrecht; 2015. p. 1–18. doi.org/10.1007/978-94-017-9694-1_1 Gros-Balthazard M, Hazzouri KM, Flowers JM. Genomic…

Indigenous Australian genomes show deep structure and rich novel variation

Inclusion and ethics The DNA samples analysed in this project form part of a collection of biospecimens, including historically collected samples, maintained under Indigenous governance by the NCIG11 at the John Curtin School of Medical Research at the Australian National University (ANU). NCIG, a statutory body within ANU, was founded…

Do you have to run separate pca or covariate file for different number of samples?

Hi. I have two different phenotypes to run GWAS quantitative analysis (--glm), which are BMI and HDL. As input, I have a phenotype file, a genotype file and a covariate file. The special circumstance here is that I have a different number of participants for each phenotype, meaning that some participants…

max-maf not filtering properly

Hi Chris, I have a VCF file which I have left-aligned and in which I have split multi-allelic sites. Then I used plink2 --vcf test.vcf --make-bed --out test1; this gives me the binary files. Then I updated FID and sex (all males, all founders). In plink2: plink2 --bfile test1 --max-maf 0.01 --geno 0.05 --make-bed…

Individual vs. joint call VCFs

Is there any way to figure out and be sure if a VCF file is individually called or jointly called? Is there any line in the VCF header to look at for this?

GATK GenomicsDBImport too slow

Hello, I have 3264 g.VCFs and an interval list for the reference genome that contains 20000 contigs. The interval list looks like the following: utg19_pilon_pilon:1-42237 utg22_pilon_pilon:1-49947 utg24_pilon_pilon:1-61707 utg30_pilon_pilon:1-459006 utg38_pilon_pilon:1-129173 utg40_pilon_pilon:1-101813 utg58_pilon_pilon:1-143918 utg93_pilon_pilon:1-186249 utg100_pilon_pilon:1-87875 utg104_pilon_pilon:1-49315 I am running the GATK GenomicsDBImport command as follows: gatk --java-options…

extract variants from 1000 Genome VCF files

Hi everyone, I have a gVCF containing genetic information from different individuals, and I would like to extract specific SNPs. The SNPs of interest are listed in a BED file with the following structure (the end position represents the real position of…

How to input list into GenomicsDBImport with snakemake?

Hello! I’m currently writing a pipeline with snakemake for exome data. During joint variant calling I need to use GATK’s GenomicsDBImport, although I’m unsure how to input all the samples at once. Here’s the simplified version of the rule I’m using:…

The Biostar Herald for Monday, December 11, 2023

The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Istvan Albert, cmdcolin, and was edited by…

bcftools=1.18 not filtering MAF correctly

Hi, I have encountered some issues when using bcftools v1.11, v1.14 or v1.18. I want to filter for MAF<=0.01 and F_MISSING<0.1 for rare-variant analysis. I have a VCF file mapped to GRCh37, left-aligned, and with multi-allelic sites split. bcftools view -q 0.01:minor test1.vcf > test2.vcf…
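
The intended filter can be made explicit with a small Python sketch that computes the minor allele frequency and the fraction of missing genotypes for one biallelic site and keeps the site only when MAF <= 0.01 and F_MISSING < 0.1. The function names and thresholds-as-parameters are hypothetical. As a side note, in bcftools view the -q/--min-af option sets a minimum frequency and -Q/--max-af a maximum, so the direction of the threshold in the command above is worth checking against the stated goal.

```python
def site_stats(genotypes):
    """Return (minor allele frequency, fraction of missing genotypes) for one biallelic site."""
    alt_count = 0
    allele_total = 0
    missing = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            missing += 1
            continue
        allele_total += len(alleles)
        alt_count += sum(1 for allele in alleles if allele != "0")
    alt_freq = alt_count / allele_total if allele_total else 0.0
    return min(alt_freq, 1 - alt_freq), missing / len(genotypes)

def keep_rare(genotypes, max_maf=0.01, max_missing=0.1):
    """Keep sites with MAF <= max_maf and missing fraction < max_missing."""
    maf, f_missing = site_stats(genotypes)
    return maf <= max_maf and f_missing < max_missing

rare_site = ["0/0"] * 99 + ["0/1"]      # one ALT allele among 200
common_site = ["0/1"] * 10              # ALT frequency 0.5
sparse_site = ["./."] * 2 + ["0/0"] * 8  # 20% missing genotypes
```

Here the frequency is computed over called alleles only, which is one common convention; other tools may differ in how missing calls enter the denominator.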

How to display a VCF/BCF file or stream as a paginated table in a python web framework (e.g. Django)?

Does anyone know how to display a VCF/BCF file or stream as a paginated table in a Python web framework (e.g. Django)? Is this possible at all? The number of variants…

r – Fst calculation from VCF files

I have four vcf files, SNPs_s1.vcf, SNPs_s2.vcf, SNPs_s3.vcf, and SNPs_s4.vcf, which contain information about SNPs. These vcf files were obtained by using the following methods: the initial input files were short-paired reads I did mapping with minimap2 ./minimap2 -ax sr ref.fa read1.fq.gz read2.fq.gz > aln.sam converted to bam file samtools…

vcfdist: accurately benchmarking phased small variant calls in human genomes

The affine gap design space for selecting variant representations As demonstrated in Fig. 1, the main issue with a difference-based format such as VCF is that often there are multiple reasonable sets of variant calls that can be used to represent the same final sequence relative to a reference FASTA. Since…

GetPileupSummaries intervals-list with Targeted Sequencing?

Hi! I am applying GetPileupSummaries for somatic variant calling, starting from targeted sequencing .fasta. I aligned the file to the GRCh38 reference, and currently I am at the GetPileupSummaries step. gatk --java-options -Xmx200G GetPileupSummaries \ -I $RECBAM \ -L ???? \ -O $OUTPUT \…

Infer ancestry for RNA-seq data

I generated VCF files with bcftools for 4 patient RNA-seq samples. I was also able to generate bed, bim, and fam files with PLINK for these files. I want some guidance on how to infer ancestry for these RNA-seq samples: How do I find…

How to subtract variants from one VCF file to another?

I have 2 VCF files from running the GATK Joint Genotyping workflow on two different groups of samples. I would like to filter out all the variants that are common to both VCF files and output a new VCF…
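
The subtraction described here amounts to a set difference on variant keys. The sketch below is a minimal illustration that assumes uncompressed, tab-delimited VCF lines and matches variants on (CHROM, POS, REF, ALT) without normalization; the function names are hypothetical. For real data, bcftools isec with its -C/--complement option covers the same idea and handles indexing and compression.

```python
def _alt_keys(line):
    """Yield (chrom, pos, ref, alt) for every ALT allele on one VCF data line."""
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    for allele in alt.split(","):
        yield (chrom, int(pos), ref, allele)

def _is_data(line):
    return bool(line.strip()) and not line.startswith("#")

def subtract_variants(vcf_a, vcf_b):
    """Return header lines of vcf_a plus data lines sharing no allele with vcf_b."""
    seen_in_b = set()
    for line in vcf_b:
        if _is_data(line):
            seen_in_b.update(_alt_keys(line))
    kept = []
    for line in vcf_a:
        if line.startswith("#"):
            kept.append(line)  # pass header lines through unchanged
        elif _is_data(line) and not any(k in seen_in_b for k in _alt_keys(line)):
            kept.append(line)
    return kept

# Hypothetical, truncated VCF content (first five columns only).
vcf_a = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t100\t.\tA\tG\n",
    "chr1\t200\t.\tC\tT\n",
    "chr2\t50\t.\tG\tA,C\n",
]
vcf_b = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t100\t.\tA\tG\n",
    "chr2\t50\t.\tG\tC\n",
]
unique_to_a = subtract_variants(vcf_a, vcf_b)
```

A multiallelic record is dropped as soon as any one of its ALT alleles is shared; splitting multiallelic sites first gives finer control.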

How to query 1000 genomes project VCF files for specific regions without downloading whole chromosomes first?

Hi, I am trying to find a way to extract an arbitrary region of human genome from the 1000 genomes project’s VCF files without having to download the genome or individual chromosome files…

Failed to open /ROH/.log. Try changing the --out parameter.

When I used this code in R: system("plink --vcf Pakistan.total.vcf --homozyg --homozyg-window-snp 50 --homozyg-snp 50 --homozyg-window-missing 3 --homozyg-kb 100 --homozyg-density 1000 --allow-extra-chr --out /ROH/plink/n") I got this error: Error: Failed to open /ROH/plink/n.log. Try changing the --out parameter. How…

Comparison of DNA sequencing services

This page lists the different DNA sequencing services. 2 main types can be distinguished: Whole exome sequencing is the middle ground between these two types, where a large amount of genes are sequenced, but only those that produce meaningful differences important for practical purposes, which is only 1% of the…

gatk SelectVariants is giving duplicate allele error while extracting SNPs out of vcf file

I am trying to extract SNPs from a merged VCF file using the gatk SelectVariants command, but it gives the following error: htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 73: Duplicate allele…

Convert NCBI Downloaded files to ANNOVAR format

I have been trying to understand from the ANNOVAR documentation and other sites the steps needed to make these files from NCBI available to ANNOVAR. I admit to being new to bioinformatics, but have been a software developer for 30+ years. My…

Genomics England hiring PhD Bioinformatics Intern in London, England, United Kingdom

Company Description: Genomics England partners with the NHS to provide whole genome sequencing diagnostics. We also equip researchers to find the causes of disease and develop new treatments – with patients and participants at the heart of it all. Our mission is to continue refining, scaling, and evolving our ability to…

ASEReadCounter output wrong number of coverage

Hi, I am using ASEReadCounter to count the number of reads per variant in a BAM file. For some positions, it reports 1 covering read (1 refCount or 1 altCount), while there is no read covering those positions when checking it in…

Building reference dbSNP file using WGS samples

Dear scientific community, I have to call variants from WGS samples of citrus. I used the GATK pipeline for post-processing of aligned reads, but a reference dbSNP file is not available for Citrus sinensis. I am using a bootstrapping method. Removed duplicates and called…

How To Install bedtools on Debian 11

In this tutorial we learn how to install bedtools on Debian 11. bedtools is a suite of utilities for comparing genomic features. What is bedtools? The BEDTools utilities allow one to address common genomics tasks such…

Issues with Chromosome Encoding and VCF Annotation in dbSNP Alpha Release

Body: Hello, Biostars Community, I am working on creating a custom database of variants using the VCF from the latest dbSNP alpha release available at ftp.ncbi.nih.gov/snp/population_frequency/latest_release/. I have encountered a couple of issues that I’m hoping someone might help me resolve. Firstly, the chromosome encoding uses RefSeq IDs (e.g., NC_000007.12)…

vcftools’ --weir-fst-pop and R hierfstat package’s varcomp.glob()

I’m getting different Fst values when I use vcftools --weir-fst-pop vs. the varcomp.glob() function from the R hierfstat package. I’m not sure why, because they both calculate Fst based on Weir & Cockerham’s 1984 paper. Can someone please explain why?…

vcftools

Hi, I tried this code but I couldn’t get any output. Please guide me to resolve this issue: for i in {1..2}; do vcftools --LROH --vcf Pakistan.total.vcf --out ${i} --chr i; done

locuszoom error

Hi there, I downloaded locuszoom in its entirety from their GitHub page. I have everything in order, but when I attempt to run the following I get an error: locuszoom_test/bin/locuszoom --metal chr7.snx13.marker.locuszoom.metal --ld chr7.imputed.concat.sorted.nomono.80.recode.maf01.pheno.vcf.2.ld.locuszoom.input --refsnp rs1533245 --prefix chr7.snx13.rs1533245 /share/hennlab/progs/locuszoom_test/bin/../src/m2zfast.py:82: SyntaxWarning: invalid escape sequence '\d' RE_SNP_1000G = re.compile("chr(\d+|[a-zA-z]+):(\d+)$");…

PhD Bioinformatics Intern Job in Greater London, Pharmaceuticals & Life Sciences Career, Intern/Graduate Jobs in Genomics England

Company Description: Genomics England partners with the NHS to provide whole genome sequencing diagnostics. We also equip researchers to find the causes of disease and develop new treatments – with patients and participants at the heart of it all. Our mission is to continue refining, scaling, and evolving our…

Association analysis of production traits of Japanese quail (Coturnix japonica) using restriction-site associated DNA sequencing

Tsudzuki, M. Mutations of Japanese quail (Coturnix japonica) and recent advances of molecular genetics for this species. J. Poult. Sci. 45, 159–179 (2008). Recoquillay, J. et al. A medium density genetic map and QTL for behavioral and production traits in Japanese quail. BMC Genom. 16, 10 (2015)….

Which program, tool, or strategy do you use to visualize genomic rearrangements?

Which program, tool, or strategy do you use to visualize genomic rearrangements? In relation to my master thesis I’m working on tools to visualize fusion genes. In that regard I’m interested in any and all strategies and…

Loftee no splice site annotations

Hello! I am using Loftee in my VEP pipeline, and after some fights with my code everything works now except the splice site annotations, meaning that I don’t get them. There is no error at all, but my VCF files do not contain a single…

Where do these snpEff annotations come from?

I am annotating a VCF with annotations from snpEff, which I eventually want to parse for predicted loss-of-function variants. I want to understand the annotations better and document how they are produced. I run this command: snpEff "hg38"…

BBtools bug in reporting the number of substitutions in the console output, it seems to report insanely high rates of heterozygosity

Hello, I know Brian is sometimes around, but here is my command: while read p; do callvariants.sh in=${p}.recal.bam ploidy=2 vcf=${p}.20score.vcf useidentity=f overwrite=true ref=ref.fsa -Xmx50g ; done <ID java -ea -Xmx50g -Xms50g -cp /home/alessandro/software/bbmap/current/ var2.CallVariants in=ancestor.recal.bam ploidy=2 vcf=ancestor.20score.vcf useidentity=f overwrite=true ref=ref.fsa -Xmx50g Executing var2.CallVariants [in=ancestor.recal.bam, ploidy=2, vcf=ancestor.20score.vcf, useidentity=f, overwrite=true, ref=Adineta_vaga.fsa,…

calculate nucleotide diversity from whole-genome-sequence data for individual genes

I am trying to calculate per-gene nucleotide diversity (pi) for whole-genome-sequence data. I basically have whole-genome resequencing data for many hundred individuals with ~1.2 million SNPs and a well-annotated species with 36k genes. I was wondering if…

GATK Mutect2 mouse dbSNP vcf files recommendations for mouse whole exome data

Dear all, Is there any best practice for the mouse SNP/indel VCF files to use with GATK Mutect2 for mouse whole exome data? For mm10, there seem to be several available; for mm39, it seems the newest is from…

Transformer-based tool recommendation system in Galaxy | BMC Bioinformatics

Kumar A, Rasche H, Grüning B, Backofen R. Tool recommender system in Galaxy using deep learning. GigaScience. 2021. doi.org/10.1093/gigascience/giaa152. The Galaxy Community. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update. Nucleic Acids Res. 2022;50(W1):W345–W351. doi.org/10.1093/nar/gkac247 Gil Y, Ratnakar…

Bcftools consensus when reference is a deletion

Hello, I am trying to call a consensus on a VCF file like so: bcftools consensus species.vcf.gz -f Reference.fasta --absent N > Consensus.fasta Error: The site SUPER_1:173197 overlaps with another variant, skipping… I looked at this site and included the previous site…

bam or VCF files from GSE75010

Hi all, I’m planning to run a variant calling analysis using microarray data GSE75010, which contains GSE75010_RAW.tar and GSE75010_complete_dataset.csv.gz. I used to download the .fastq files using SRA Run numbers through Ubuntu/Linux to get .bam and VCF files. However, this is not the…

VCF conversion into Treemix

I have a multi-sample VCF file with ~7 million SNPs. Now I want to convert it into the format required by Treemix. I ran vcf2treemix.sh along with plink2treemix.py, but plink2treemix.py runs very slowly, so that if I use it, the analysis in…

Global genetic diversity, introgression, and evolutionary adaptation of indicine cattle revealed by whole genome sequencing

Loftus, R. T., MacHugh, D. E., Bradley, D. G., Sharp, P. M. & Cunningham, P. Evidence for two independent domestications of cattle. Proc. Natl Acad. Sci. USA 91, 2757–2761 (1994). Verdugo Marta, P. et al. Ancient cattle genomics, origins, and rapid turnover…

Bedtools intersection

While intersecting a VCF file and a BED file to obtain the reads that map to a class of genes, bedtools gives the following error: ***** WARNING: File TARGT_First.bed has inconsistent naming convention for record: chr1 45794952 45795134 MUTYH 1 ***** WARNING: File TARGT_First.bed has inconsistent naming convention for…

[maftools]Too many multi_hit and missense mutation

Describe the issue: When using maftools to plot mutational summary data, I encountered some issues: I use WES data to generate a filtered VCF file, and then utilize VEP for annotation to obtain an MAF file. The MAF file contains an excessive number…

The number of variations in the pan-genome is reduced compared to the variations in the input VCF file

Does vg filter out some variants during the construction of the pan-genome, and if so, what are the criteria for filtering? The number of variations in the pan-genome is reduced compared…

Annotated file cells show string

Hello, I am new to bioinformatics and trying to get ANNOVAR to work. I was able to download the databases and get ANNOVAR working with the example files. But when I try to use VCF files, the annotated file’s cells have a…

Genotypes in vcf files

Hi all, I have a variety of genotypes outputted in my VCF, these being: 0, 1, 0/0, 0/1, 1/1, 1/2. Now I can assume these ones: 0 1 0/0 = homozygous for the reference 0/1 = heterozygous for the alternate 1/1 = homozygous for the…
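
As a reading aid for the genotype codes listed in the question, the sketch below classifies a GT string by allele count and allele identity. It is a minimal sketch based on the usual reading of the VCF GT field (allele indices separated by / or |, with 0 for REF); the function name and the category labels are hypothetical wording, and single-allele values such as 0 and 1 are treated as haploid calls.

```python
def classify_gt(gt):
    """Describe a VCF GT value such as 0, 1, 0/0, 0/1, 1/1 or 1/2."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if len(alleles) == 1:
        return "haploid reference" if alleles[0] == "0" else "haploid alternate"
    if len(set(alleles)) == 1:
        return "homozygous reference" if alleles[0] == "0" else "homozygous alternate"
    if "0" in alleles:
        return "heterozygous (reference and alternate)"
    return "heterozygous (two different alternates)"

labels = {gt: classify_gt(gt) for gt in ["0", "1", "0/0", "0/1", "1/1", "1/2"]}
```

A value like 1/2 only makes sense at a site with at least two ALT alleles, so it is worth checking the ALT column of those records.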

Pruning with --indep-pairwise with plink 1.9

I’m new to PLINK and I would like to obtain a file with SNPs in approximate linkage equilibrium. Here is my script and the outputs of each step. If someone could tell me if there is an error in the script because at…

selection of reference genome

Hello everyone, I have a VCF file with variants called using hg38 as the reference genome. I wonder what would happen if I used hg19 as the reference genome to annotate these variants. Would it be OK, or would it go wrong? Thanks!

SNPs of a specific mouse strain

Hi, I wonder how can I get SNPs for a particular mouse strain like C57BL6. I have downloaded a mouse reference vcf from ftp.ebi.ac.uk/pub/databases/mousegenomes/REL-2112-v8-SNPs_Indels/mgp_REL2021_snps.rsID.vcf.gz Its header is #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 129P2_OlaHsd 129S1_SvImJ 129S5SvEvBrd A_J AKR_J B10.RIII BALB_cByJ BALB_cJ BTBR_T+_Itpr3tf_J BUB_BnC3H_HeH C3H_HeJ C57BL_10J C57BL_10SnJ C57BL_6NJ…

Continue Reading SNPs of a specific mouse strain

normalize not left-normalizing?

I’m running plink2 to convert a VCF to a pgen with pseudobiallelic variants. Calling --normalize does not seem to left-normalize as I would expect, at least when I look at the .pvar. Log: PLINK v2.00a6LM AVX2 Intel (21 Nov 2023) www.cog-genomics.org/plink/2.0/ (C) 2005-2023 Shaun Purcell, Christopher Chang…

Continue Reading normalize not left-normalizing?

Help finding the correct file version for dbSNP VCF ID replacement

I tried to use dbSNP version 156 with bcftools to replace the ID field in a reference VCF that originally uses a different, position-based ID format. It seems the bcftools command did not work because of the numeric chromosome format in the #CHROM column, which might not be compatible with bcftools…

Continue Reading Help finding the correct file version for dbSNP VCF ID replacement

Merging several vcf files for GWAS?

Hello! I am a medical student without much background in bioinformatics trying to perform the analysis for my first GWAS, and I am tremendously overwhelmed. It’s a case-control association study with samples from 50 subjects, which we sequenced using the Novogene NGS platform. The problem is,…

Continue Reading Merging several vcf files for GWAS?

Creating a Variant containing FASTA for proteomics search from VCF and genomic FASTA

Dear Biostar Community, I’m currently trying to generate a protein FASTA containing all known variants from HeLa (from Cosmic CellLinesProject) for variant detection in proteomics measurements. For this, I’ve downloaded the variants file (VCF) and the…

Continue Reading Creating a Variant containing FASTA for proteomics search from VCF and genomic FASTA

update FMT/GT in VCF file using bcftools annotate

Hi – I am trying to use bcftools to overwrite the existing FMT/GT values in a VCF file, matching by the ID column, in addition to CHROM and POS. I tried creating a .txt.gz file as an annotation file, but got…

Continue Reading update FMT/GT in VCF file using bcftools annotate

How to overlap patient VCF with ClinVar database annotation using bedtools?

Hello, I’m trying to help a colleague who wants to add the ClinVar database’s clinical significance column to the VCF samples that she analysed. More specifically, we are trying to add overlapping/common variant annotation so that if the variant…

Continue Reading How to overlap patient VCF with ClinVar database annotation using bedtools?

All variants in a VCF register as “invalid genotype records in input file”

Hello, I am running into an error with convert2annovar.pl where it is registering all of the variants in my VCF as invalid. My VCF is 1925 variants plus the header with the following format:…

Continue Reading All variants in a VCF register as “invalid genotype records in input file”

The Biostar Herald for Monday, November 20, 2023

The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Istvan Albert, and was edited by Istvan…

Continue Reading The Biostar Herald for Monday, November 20, 2023

bcftools info and filter error

Hi. I am trying to do a comparative analysis of my VCF file against the VCF files of ExAC. I’m using bcftools isec here. I am getting an error that says: [W::vcf_parse_info] INFO ‘HOM_CONSANGUINEOUS’ is not defined in the header, assuming Type=String [W::vcf_parse_filter]…

Continue Reading bcftools info and filter error

BaseRecalibrator takes forever to run. Any suggestions?

Hello, I am trying to run the BaseRecalibrator tool from the GATK package and it takes forever (more than 4 days per BAM file). The command I’m using is: gatk BaseRecalibrator -I NG-01_1_S1_dedup_bwa.bam -R /rumi/shams/genomes/hg38/hg38.fa --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --known-sites 1000G_phase1.snps.high_confidence.hg38.vcf.gz --known-sites Homo_sapiens_assembly38.dbsnp138.vcf -O NG-01_1_S1_dedup_bwa_BSQR.table…

Continue Reading BaseRecalibrator takes forever to run. Any suggestions?

fasta – Get a certain gene sequence from bam/vcf and reference

I need to get a fasta sequence of a certain gene for a certain worm strain that is different from reference. I have a reference genome, BAM for the strain of interest, and coordinates of the gene. I know that vcftools can convert bam to fasta, but I do not…

Continue Reading fasta – Get a certain gene sequence from bam/vcf and reference

Compare two VCF / BAM / FASTQ files for degree of relativity

Hi, I have recently been reunited with a sibling, but we do not know if we only share the mother, or if we have a common father as well. I have already had a full genome sequencing test…

Continue Reading Compare two VCF / BAM / FASTQ files for degree of relativity

Handling male samples chrX vcf genotype from 1000G high-coverage 30x

Hello, I am working with the VCF files from the 1000G project high-coverage (30x) release. I do not completely understand how the authors have handled the genotypes of male individuals in the non-pseudoautosomal chrX regions. The genotypes in the…

Continue Reading Handling male samples chrX vcf genotype from 1000G high-coverage 30x

SNP calling with many samples using bcftools

Hello, I aim to identify SNPs from approximately 500 BAM files (non-human). I’m opting for bcftools since GATK, even with the Spark addition, takes a substantial 6 hours per sample. My objective is to generate a single VCF file encompassing all SNPs…

Continue Reading SNP calling with many samples using bcftools

set GT values to missing in VCF file for specific sample-variant combinations

Hi – I have a multi-subject vcf file and would like to set specific genotypes (GT) to missing for a set of subjects. However, the subjects that I need to set to missing are different for each variant. For example, suppose I have this: CHROM POS ID FORMAT sub1 sub2…

Continue Reading set GT values to missing in VCF file for specific sample-variant combinations

ImputePipelinePlugin fails when trying to impute SNPs on a gvcf file.

Hello everyone, I hope you’re doing great. I’m trying to impute a gVCF using a PHG database. As far as I can tell from the logs (attached here) of steps 1 and 2 in the PHG Wiki guide, it seems that I have established and populated the PHG…

Continue Reading ImputePipelinePlugin fails when trying to impute SNPs on a gvcf file.

Query regarding callsets used as known sites in Variant Calling

Hi, where can I learn more about the standard VCF files that are used as known sites during the BQSR step in variant calling with GATK? The files are: Homo_sapiens_assembly38.dbsnp138.vcf, Homo_sapiens_assembly38.known_indels.vcf.gz, Mills_and_1000G_gold_standard.indels.hg38.vcf.gz. I am aware that these files are…

Continue Reading Query regarding callsets used as known sites in Variant Calling

Appropriate genome reference for converting TCGA VCF files to MAF

I have a directory of MAF files obtained from TCGA and I want to convert it to VCF format. Reference: GRCh38.d1.vd1. Reference Sequence Source: gdc.cancer.gov/about-data/gdc-data-processing/gdc-reference-files. Command: maf2vcf.pl --input-maf maf/* --output-dir VCF -ref-fasta /home/melchua/.vep/homo_sapiens/GRCh38/GRCh38.d1.vd1.fa.tar.gz Traceback: Use of uninitialized value $lines in…

Continue Reading Appropriate genome reference for converting TCGA VCF files to MAF

Bgen file not being opened by PRSice

I used the following command to calculate the PRS of a sequenced file coming from a collaborator. I imputed the VCF file, which gave me separate VCF files for each chromosome. I then converted them to bgen and generated bgi and sample files…

Continue Reading Bgen file not being opened by PRSice

GATK SelectVariants --remove-unused-alternates dropping real INDELs?

I’m using a VCF that is generated by GenotypeGVCFs (so doing calibration based on a larger cohort of samples) and my goal is to only extract variants of interest to one specific sample. The VCF in the subset tends to include some variants that were present in the original joint…

Continue Reading GATK SelectVariants --remove-unused-alternates dropping real INDELs?

University of Alabama at Birmingham hiring BIOINFORMATICIAN I in Birmingham, Alabama, United States

Position Summary: The primary role is to execute a variety of data management and analysis tasks, ensuring the quality, reproducibility, and efficiency of processes related to high-dimensional data. You will collaborate with study investigators and fellow bioinformatics professionals within the department to contribute to high-quality, reproducible research across various scientific…

Continue Reading University of Alabama at Birmingham hiring BIOINFORMATICIAN I in Birmingham, Alabama, United States

Samtools index not working in Snakemake

I am setting up a Snakemake pipeline for sequencing read alignment and variant calling. But the samtools index rule is not activated, and the subsequent haplotype caller rule fails. I think it is because the samtools index rule is not perceived as necessary to produce the output of rule all…

Continue Reading Samtools index not working in Snakemake

Bioconductor – AnnotationHub

DOI: 10.18129/B9.bioc.AnnotationHub. Client to access AnnotationHub resources. Bioconductor version: Release (3.6). This package provides a client for the Bioconductor AnnotationHub web resource. The AnnotationHub web resource provides a central location where genomic files (e.g., VCF, bed, wig) and other resources from standard locations (e.g., UCSC, Ensembl) can be…

Continue Reading Bioconductor – AnnotationHub

Bug#1055669: bcftools: test_vcf_merge failures on armhf: Bus error

Source: bcftools Version: 1.18-1 Severity: serious Tags: ftbfs Justification: ftbfs Control: forwarded -1 github.com/samtools/bcftools/issues/2036 Dear Maintainer, bcftools currently ftbfs on armhf due to multiple test_vcf_merge failures with Bus error[1]. I already informed upstream[2]. This bug is mostly to keep track of the issue on Debian side and eventually comment on possible Debian specific…

Continue Reading Bug#1055669: bcftools: test_vcf_merge failures on armhf: Bus error

MELT-SINGLE “priors” list usage

I’m trying to get “priors” working with MELT-SINGLE but nothing I’ve done seems to be making a difference. As a test, I run 1 sample without a priors list  java -Xmx6G -jar MELT/MELTv2.2.2.jar Single -bamfile HT-7604-01A-11D-2088.bam -t MELT/me_refs/Hg38/ALU_MELT.zip -h hg38.chrXYM_alts.fa -n MELT/add_bed_files/Hg38/Hg38.genes.bed -w HT-7604-01A-11D-2088-run1/ Command Line:MELT.jar Single -bamfile HT-7604-01A-11D-2088.bam -t…

Continue Reading MELT-SINGLE “priors” list usage

Divergent mechanisms of reduced growth performance in Betula ermanii saplings from high-altitude and low-latitude range edges

Aizawa M, Yoshimaru H, Saito H, Katsuki T, Kawahara T, Kitamura K et al. (2009) Range‐wide genetic structure in a north‐east Asian spruce (Picea jezoensis) determined using nuclear microsatellite markers. J Biogeogr 36(5):996–1007. Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of ancestry in unrelated…

Continue Reading Divergent mechanisms of reduced growth performance in Betula ermanii saplings from high-altitude and low-latitude range edges

best annotator for mitochondria and heteroplasmy calculation

Dear all, I am analysing sequencing data on mitochondrial DNA with the aim of finding pathogenic variants. I only have the VCFs. Does anyone have any suggestions as to where I can annotate them? Unfortunately ANNOVAR doesn’t annotate anything for me and…

Continue Reading best annotator for mitochondria and heteroplasmy calculation

BAM file for phasing

Hi all, I’m new to bioinformatics, and I’m trying to do phasing and imputation to WGS level. For imputation with Beagle, I would like to make a bref file from a VCF file, and I have to phase the reference panel for that. Is a BAM…

Continue Reading BAM file for phasing

Bioinformatician job with Proclinical Staffing

Proclinical is seeking a Bioinformatician for a digital IT and data science provider located in Cambridge, MA. Must be eligible to work in the US. Job Responsibilities: Deploy and optimize bioinformatic workflows for the integration and analysis of NGS data, including short and long read sequencing data. Interpret results from…

Continue Reading Bioinformatician job with Proclinical Staffing

Landscape genomics reveals adaptive genetic differentiation driven by multiple environmental variables in naked barley on the Qinghai-Tibetan Plateau

Abebe TD, Naz AA, Léon J (2015) Landscape genomics reveal signatures of local adaptation in barley (Hordeum vulgare L.). Front Plant Sci 6:813. Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Res 19:1655–1664…

Continue Reading Landscape genomics reveals adaptive genetic differentiation driven by multiple environmental variables in naked barley on the Qinghai-Tibetan Plateau