Tag: VCF.gz

Error getting the genome on clinvaR

Hi, I am trying to use clinvaR following this vignette (here), but when I try to download and import the 1000 Genomes VCF, I get an error: Cannot open specified tabix file: ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr15.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz Error in read.table(text = paste(output, collapse = "\n"), header =…

Annotate variants with ensembl rest api

I have a variant file (.vcf.gz), and I want to annotate it using the Ensembl REST API, particularly the VEP REST API. I am new to variant annotation; however, I have seen a couple of code examples from the Ensembl page on…

Indigenous Australian genomes show deep structure and rich novel variation

Inclusion and ethics: The DNA samples analysed in this project form part of a collection of biospecimens, including historically collected samples, maintained under Indigenous governance by the NCIG (ref. 11) at the John Curtin School of Medical Research at the Australian National University (ANU). NCIG, a statutory body within ANU, was founded…

extract variants from 1000 Genome VCF files

Hi everyone, I have a gVCF containing genetic information from different individuals, and I would like to extract specific SNPs. The SNPs of interest are listed in a BED file with the following structure (the end position represents the real position of…
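The excerpt is cut off before the BED layout is shown, so the following is only a minimal sketch of the selection step. It assumes tab-separated BED lines whose end column holds the 1-based SNP position, as the post describes; the function names and sample records are hypothetical.

```python
def load_bed_positions(bed_lines):
    """Collect (chrom, pos) pairs, taking the BED end column as the 1-based position."""
    wanted = set()
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, _start, end = line.split("\t")[:3]
        wanted.add((chrom, int(end)))
    return wanted


def filter_vcf_lines(vcf_lines, wanted):
    """Yield header lines unchanged and only the records whose (CHROM, POS) is wanted."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if (chrom, int(pos)) in wanted:
            yield line


# Hypothetical toy data: two SNPs of interest, three records in the VCF.
bed = ["chr1\t99\t100\trsA", "chr1\t499\t500\trsB"]
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\t.\t.",
    "chr1\t250\t.\tC\tT\t.\t.\t.",
    "chr1\t500\t.\tG\tA\t.\t.\t.",
]
kept = list(filter_vcf_lines(vcf, load_bed_positions(bed)))
```

Matching on position alone ignores REF/ALT, so a gVCF reference block that merely starts at the same position would also be kept; whether that matters depends on the file.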

How to display a VCF/BCF file or stream as a paginated table in a python web framework (e.g. Django)?

Does anyone know how to display a VCF/BCF file or stream as a paginated table in a Python web framework (e.g. Django)? Is this possible at all? The number of variants…
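One framework-independent way to think about the question is to slice the record stream lazily, one page at a time. The sketch below is an assumption-laden illustration, not a Django recipe: `vcf_records` and `get_page` are hypothetical helpers, and the input is any iterable of text lines (for example a handle from `gzip.open(path, "rt")`).

```python
from itertools import islice


def vcf_records(lines):
    """Yield data records as lists of columns, skipping header lines."""
    for line in lines:
        if not line.startswith("#"):
            yield line.rstrip("\n").split("\t")


def get_page(lines, page, per_page=50):
    """Return the rows of a 1-based page without loading the whole stream."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * per_page
    return list(islice(vcf_records(lines), start, start + per_page))


# Hypothetical stream with ten records at POS 1..10.
stream = ["##fileformat=VCFv4.2\n", "#CHROM\tPOS\tID\tREF\tALT\n"]
stream += [f"chr1\t{pos}\t.\tA\tG\n" for pos in range(1, 11)]
page2 = get_page(stream, page=2, per_page=4)
```

Each request still scans from the start of the stream, so for large files an index-backed reader would probably be needed to jump directly to a page.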

GetPileupSummaries intervals-list with Targeted Sequencing?

Hi! I am running GetPileupSummaries for somatic variant calling, starting from targeted sequencing .fasta. I aligned the file to the GRCh38 reference, and I am currently at the GetPileupSummaries step. gatk --java-options -Xmx200G GetPileupSummaries \ -I $RECBAM \ -L ???? \ -O $OUTPUT \…

Issues with Chromosome Encoding and VCF Annotation in dbSNP Alpha Release

Hello, Biostars Community, I am working on creating a custom database of variants using the VCF from the latest dbSNP alpha release available at ftp.ncbi.nih.gov/snp/population_frequency/latest_release/. I have encountered a couple of issues that I’m hoping someone might help me resolve. Firstly, the chromosome encoding uses RefSeq IDs (e.g., NC_000007.12)…

GATK Mutect2 mouse dbSNP vcf files recommendations for mouse whole exome data

Dear all, is there any best practice for choosing the mouse SNP/indel VCF files when using GATK Mutect2 on mouse whole-exome data? For mm10, there seem to be several available; for mm39, it seems the newest is from…

Bcftools consensus when reference is a deletion

Hello, I am trying to call a consensus on a VCF file like so: bcftools consensus species.vcf.gz -f Reference.fasta --absent N > Consensus.fasta Error: The site SUPER_1:173197 overlaps with another variant, skipping... I looked at this site and included the previous site…

SNPs of a specific mouse strain

Hi, I wonder how I can get SNPs for a particular mouse strain such as C57BL6. I have downloaded a mouse reference VCF from ftp.ebi.ac.uk/pub/databases/mousegenomes/REL-2112-v8-SNPs_Indels/mgp_REL2021_snps.rsID.vcf.gz Its header is: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 129P2_OlaHsd 129S1_SvImJ 129S5SvEvBrd A_J AKR_J B10.RIII BALB_cByJ BALB_cJ BTBR_T+_Itpr3tf_J BUB_BnC3H_HeH C3H_HeJ C57BL_10J C57BL_10SnJ C57BL_6NJ…

update FMT/GT in VCF file using bcftools annotate

Hi, I am trying to use bcftools to overwrite the existing FMT/GT values in a VCF file, matching by the ID column, in addition to CHROM and POS. I tried creating a .txt.gz file as an annotation file, but got…

How to overlap patient VCF with ClinVar database annotation using bedtools?

Hello, I’m trying to help a colleague who wants to add the ClinVar database’s clinical significance column to the VCF samples that she analysed. More specifically, we are trying to add overlapping/common variant annotation so that if the variant…

BaseRecalibrator takes forever to run. Any suggestions?

Hello, I am trying to run the BaseRecalibrator tool from the GATK package and it takes forever (more than 4 days per BAM file). The command I’m using is: gatk BaseRecalibrator -I NG-01_1_S1_dedup_bwa.bam -R /rumi/shams/genomes/hg38/hg38.fa --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --known-sites 1000G_phase1.snps.high_confidence.hg38.vcf.gz --known-sites Homo_sapiens_assembly38.dbsnp138.vcf -O NG-01_1_S1_dedup_bwa_BSQR.table…

ImputePipelinePlugin fails when trying to impute SNPs on a gVCF file.

Hello everyone, I hope you’re doing great. I’m trying to impute a gVCF using a PHG database. As far as I can tell from the logs (attached here) of steps 1 and 2 in the PHG Wiki guide, it seems that I have established and populated the PHG…

Query regarding callsets used as known sites in Variant Calling

Hi, where can I learn more about the standard VCF files that are used as known sites during the BQSR step in variant calling with GATK? The files are: Homo_sapiens_assembly38.dbsnp138.vcf Homo_sapiens_assembly38.known_indels.vcf.gz Mills_and_1000G_gold_standard.indels.hg38.vcf.gz I am aware that these files are…

Bug#1055669: bcftools: test_vcf_merge failures on armhf: Bus error

Source: bcftools Version: 1.18-1 Severity: serious Tags: ftbfs Justification: ftbfs Control: forwarded -1 github.com/samtools/bcftools/issues/2036 Dear Maintainer, bcftools currently ftbfs on armhf due to multiple test_vcf_merge failures with Bus error[1]. I already informed upstream[2]. This bug is mostly to keep track of the issue on Debian side and eventually comment on possible Debian specific…

Need Help Understanding Variant Calling Issues in De Novo Yeast Assembly

We have two sample groups of a yeast species, control (1 sample) and treatment (1 sample), for which a complete reference genome isn’t available yet for alignment or variant calling. The objective of this project is straightforward, simply wanting to…

variant calling – How to run a GATK Docker Image with local files?

I’m trying to use the HaplotypeCaller from the GATK toolkit but I keep getting an error. I pulled GATK through Docker and am using this command: docker run -v /Users/rimo/ -it broadinstitute/gatk:latest gatk HaplotypeCaller -R /Users/rimo/reference.fasta -I /Users/rimo/sample1.bam -O /Users/rimo/sample1.g.vcf.gz -ERC GVCF. /Users/rimo is my home directory; it’s where the…

vcf – VEP annotation INFO field Ensembl IDs and locations

I have a VCF file that I annotated with VEP, for human data. I ran VEP with some additional parameters (as shown below in the ##VEP-command-line). However, my output is rather strange (mainly the INFO column). ##VEP="v108" time="2023-04-27 15:13:08" cache="workflow/resources/variants/cache_vep/homo_sapiens/108_GRCh38" ensembl-funcgen=108.56bb136 ensembl-variation=108.a885ada ensembl-io=108.58d13c1 ensembl=108.d8a9c80 1000genomes="phase3"…

bcftools compressing and indexing vcf files

Hello, I am trying to merge multiple VCF files using bcftools but it threw an error saying that the file is not compressed. I want to know if the right command to compress the file would be: bcftools view -I input.vcf -O z…

CombineGVCFs skips a chromosome

Hi! I am having issues for the first time with CombineGVCFs. Specifically, it outputs a combined gvcf without chromosome 8 (SUPER_8) even though this is present in the individual gvcfs that I input in the command. There is no error in the log file, the engine just shuts down after…

No samples in .vcf file.

I am trying to convert my VCF file into a BED format file. When I use this command: plink --vcf merge.bacteria.vcf.gz --make-bed --out merge.bacteria.vcf.bed I get the following error: PLINK v1.90b6.21 64-bit (19 Oct 2020) www.cog-genomics.org/plink/1.9/ (C) 2005-2020 Shaun Purcell, Christopher Chang GNU General Public License…
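The error text is truncated above, so its cause cannot be read off the excerpt. A quick first check, going by the title, is whether the VCF carries any genotype columns at all: in the VCF layout the first eight columns are fixed, the ninth is FORMAT, and sample IDs start at the tenth. The helper name below is hypothetical.

```python
def sample_names(vcf_lines):
    """Return the sample IDs on the #CHROM header line (empty for a sites-only VCF)."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            # Columns 1-8 are fixed, column 9 is FORMAT, samples start at column 10.
            return columns[9:]
        if not line.startswith("#"):
            break  # reached data records without seeing the column header
    raise ValueError("no #CHROM header line found")


# Hypothetical headers: one sites-only file, one with two samples.
sites_only = ["##fileformat=VCFv4.2\n", "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"]
with_samples = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n",
]
```

An empty list from `sample_names(sites_only)` would mean there are no genotypes to convert, which is one plausible reading of a "no samples" message.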

Bcftools Consensus – Choose Random Allele for Heterozygous Sites

Hello, I am trying to generate a haploid consensus sequence based on a VCF file. For sites which are heterozygous, I want to randomly choose one of the alleles. I don’t want to always choose reference and I don’t want…

problem with bcftools syntax

Hi all! I am having difficulty with creating a bcftools command. I have a .vcf.gz file downloaded from the 1000G site and a CSV file with columns chrom/pos/id/ref/alt. I would like to manipulate the downloaded VCF file so that it uses only the SNPs I…

1000 Genomes download files

Hi! I would like to download VCF files for a certain population from the 1000 Genomes site. However, I would like to do this for around 100 people, and each sample that I found in the data section has a .vcf.gz for each chromosome. My question…

Help me understand “for stripped” in bcftools isec output

Hello, I was given a set of VCF files comparing variants. The README gives the following command: bcftools isec -p dir -n-1 -c all ref.vcf.gz S1.vcf.gz S2.vcf.gz S3.vcf.gz S4.vcf.GZ S5.vcf.gz S5.vcf.gz First, if I understand the doc correctly, -n-1 means “look for SNPs found in at most a single file”?…
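As far as the bcftools documentation describes it, `-n` accepts `=N` (exactly N files), `+N` (N or more) and `-N` (N or fewer), so `-n-1` would indeed select sites present in at most one input. The sketch below only illustrates that counting rule on plain Python sets; it is not how bcftools is implemented, it ignores the `~bitmap` form and the `-c` matching mode, and `select_sites` is a hypothetical name.

```python
from collections import Counter


def select_sites(files, spec):
    """Apply the counting part of an isec-style spec such as '=2', '+2' or '-1'.

    `files` is a list of sets of site keys, one set per input file.
    """
    op, n = spec[0], int(spec[1:])
    counts = Counter(site for sites in files for site in sites)
    tests = {
        "=": lambda c: c == n,  # present in exactly n files
        "+": lambda c: c >= n,  # present in n or more files
        "-": lambda c: c <= n,  # present in n or fewer files
    }
    keep = tests[op]
    return sorted(site for site, c in counts.items() if keep(c))


# Hypothetical inputs keyed by (chrom, pos).
ref = {("1", 100), ("1", 200)}
s1 = {("1", 100), ("1", 300)}
s2 = {("1", 300), ("1", 400)}
private = select_sites([ref, s1, s2], "-1")
shared = select_sites([ref, s1, s2], "+2")
```

Here `private` holds the two sites seen in a single file and `shared` the two seen in at least two files.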

Allele frequencies in plink including physical position in the output

Hi, I am trying to compute allele frequencies for a large genotypic data set. The command I am using is as follows: plink2 --vcf my_file.vcf.gz --freq --map my_file.map --out my_outfile The reason I am using a map file is…

public databases – Converting VCF format to text for use with PLINK and understanding column mapping

I successfully completed Nature PRS tutorial, which is based on PLINK. Turning to my real data, I downloaded ukb-d-20544_1.vcf.gz. Now I’m facing the problem that I seem to be unable to use it in PLINK or find the correct data format to download at all, and I am a bit…

AlphaMissense Plugin VEP

I’ve installed the AlphaMissense plugin in VEP, but I can’t use it. I downloaded the required files and ran the tabix command before using it. Then I launched the command but got this error: WARNING: Failed to instantiate plugin AlphaMissense: ERROR: No file specified Try using…

GenotypeGVCF too many genotypes from pooled samples

Hello, I am trying to create a VCF file using GenotypeGVCFs in GATK4. I have 60 samples and each sample is pooled data. The ploidy per sample is 60. This is due to the biological system I work in. This data has been processed in HaplotypeCaller; below is an example…

vcf.gz to vcf

Hello. I want to unzip a ‘vcf.gz’ file. bgzip -d my.vcf.gz In Linux, when I run the above code, I get the following error: [bgzip] my.vcf.gz: not a compressed file -- ignored How can I solve this problem?
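One common reason for a message like this is that the file does not actually start with the gzip magic bytes despite its `.gz` name (for instance, it was already decompressed or the download saved something else). The sketch below is a hedged way to check: it classifies a file as BGZF, plain gzip, or uncompressed from its first bytes. `sniff_compression` is a hypothetical helper, and the demo files are synthetic.

```python
import gzip
import os
import tempfile

# The 28-byte empty BGZF block that bgzip appends as an end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "42430200" "1b00" "0300" "00000000" "00000000"
)


def sniff_compression(path):
    """Classify a file as 'bgzf', 'gzip' or 'plain' from its first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    if head[:2] != b"\x1f\x8b":
        return "plain"  # no gzip magic number
    # BGZF blocks are gzip members with the FEXTRA flag set and a 'BC' subfield.
    has_extra = len(head) > 3 and head[3] & 0x04
    if has_extra and head[12:14] == b"BC":
        return "bgzf"
    return "gzip"


# Synthetic demo: three files that all carry a .vcf.gz name.
samples = {
    "plain": b"##fileformat=VCFv4.2\n",
    "gzip": gzip.compress(b"##fileformat=VCFv4.2\n"),
    "bgzf": BGZF_EOF,
}
results = {}
with tempfile.TemporaryDirectory() as tmp:
    for label, payload in samples.items():
        path = os.path.join(tmp, f"{label}.vcf.gz")
        with open(path, "wb") as handle:
            handle.write(payload)
        results[label] = sniff_compression(path)
```

If the real file comes back as "plain", renaming it to `.vcf` may be all that is needed; that is an inference from the error text, not something the post confirms.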

Imputation server failing to see samples in VCF files

I used the following command to generate VCF files from the 23andMe zip file for the Michigan Imputation Server, but it keeps failing with the validation error: At least 20 samples must be uploaded. java -jar vcf-tools-0.1.jar vcf-generator --in 23_n_me/23andme-tools-output/genome_name_v4_Full_20230822212500.zip…

Quality Control of VCFs that used different genotyping arrays

I have three VCFs. Two of these VCFs were generated using the Precision Medicine Research Array (PMRA) and refer to SNPs as AX numbers. I was able to merge the two PMRA VCFs together. Merged PMRA VCFs (Total genotyping rate is 0.924427): 1 AX-150343089 0 837711 T C 1 AX-149471710…

Automate the Splitting of a VCF File by Sample (bcftools)

Hi! I’m very new to working with large .vcf files, and am trying to split up a particular file by strain (sample). There are about 1500 samples in the file, so going through manually isn’t really an option (although I have managed to get it to work). My problem has…

How to filter vcf file by MAF using bcftools?

I saw this thread (github.com/samtools/bcftools/issues/357) and it seems that sometimes bcftools recomputes MAF and sometimes it takes it from INFO. Which one is recommended, and how does this calculation differ from what is stated in INFO? Also, what is the…
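To make the distinction concrete, here is a minimal sketch of what "recomputing MAF from the genotypes" could look like for a biallelic site. It is an illustration under stated assumptions, not bcftools' own code: the function name is hypothetical, missing alleles are dropped from the allele number, and multiallelic sites are not handled. A value stored in INFO can differ from this if, for example, samples were removed after the INFO tag was written.

```python
import re


def minor_allele_frequency(genotypes):
    """Compute MAF for a biallelic site from GT strings such as '0/1' or '1|1'.

    Missing alleles ('.') are excluded from the allele number (AN).
    Returns None when no allele was called.
    """
    alleles = [a for gt in genotypes for a in re.split(r"[/|]", gt) if a != "."]
    if not alleles:
        return None
    alt_freq = sum(a != "0" for a in alleles) / len(alleles)
    return min(alt_freq, 1 - alt_freq)


# Hypothetical genotype columns for one site.
maf_rare = minor_allele_frequency(["0/0", "0/0", "0/1", "0/0"])
maf_flipped = minor_allele_frequency(["1|1", "1|1", "0|1"])
```

In the second call the ALT allele is the common one, so the minor allele frequency is that of the reference allele.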

bcftools error merging two VCFs: REF prefixes differ

Hi all, I am trying to merge two VCF files using bcftools merge. However, my command bcftools merge -m id VCF_d.vcf.gz VCF_p.vcf.gz -o merged.vcf.gz --force-samples returns the following: The REF prefixes differ: TG vs GA (2,2) Failed to merge alleles at 18:786377 in VCF_d.vcf.gz These are the entries in the…

how to extract unique snps in a vcf file by comparing with multiple vcf files

How to extract unique SNPs in a VCF file by comparing with multiple VCF files and make a file with the unique SNPs? EDIT by Ram: OP created another post a couple of hours later…

Finding sequences in unannotated genomes using reference coordinates

Hey Stars! I have a really confounding issue at hand. I am working on extracting upstream regions of genes from 100 different genomes of A. thaliana. The problem being, I have one reference genome for TAIR10 version (which has an annotated…

Filtering VCF to divide with equal sizes

Hello everyone! I have a very large VCF file (>400 GB), and I want to divide it to use with VEP. VEP recommends separating the VCF, so I generated a list of contigs, based on the header, with 3^7 bases for each chromosome…
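Generating the region list can be sketched in a few lines. The example below assumes contig lengths are already known (for instance from the `##contig` header lines) and emits 1-based, inclusive `contig:start-end` strings; the chunk size, contig names and function name are hypothetical and not taken from the post.

```python
def chunk_regions(contig_lengths, chunk_size):
    """Yield 1-based inclusive 'contig:start-end' regions of at most chunk_size bases."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    for contig, length in contig_lengths.items():
        for start in range(1, length + 1, chunk_size):
            end = min(start + chunk_size - 1, length)
            yield f"{contig}:{start}-{end}"


# Hypothetical contig lengths, chunked into windows of 1000 bases.
regions = list(chunk_regions({"chr1": 2500, "chr2": 1000}, 1000))
```

Equal-width windows do not hold equal numbers of variants, so if balanced VEP jobs are the goal, splitting by record count may be the better fit.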

Is a PON necessary for tumor-normal matched Mutect2?

I’m a bit confused about whether I should include GATK’s public PON (1000g_pon.hg38.vcf.gz, since I aligned with hg38), make my own from my normal samples, or leave it out and not include a PON. I am planning on…

Downstream analysis on multi-sample or single-sample VCF files?

Hello, I use GATK best practices in my analysis (mainly the dnaseq pipeline) and, as suggested, the pipeline calls genotypes on all the samples together and at the end creates an “allSamples.vcf.gz” file. At this stage one approach would be…

samtools – I am trying to create a subset of 10k variants from 25-30 unmapped contigs of a g.vcf file including the header

My objective is to take a g.vcf.gz file and from 25-30 unmapped contigs with titles like “NW_020192317.1”, I want to make a subset of ~10k variants from each of the unmapped contigs and make one final g.vcf file that includes the header from the original g.vcf.gz file. From the post…

there are extra regions when calculating Tajima’s D per gene

Hello all, I am new to PopGenome and would like to ask one question that greatly confused me. I was trying to calculate Tajima’s D by gene for my whole-genome data. I imported the GFF files and subsetted the data by “gene”. See my code below. However, when I…

Convert dosage to hardcall genotype, inconsistent result

Dear PLINK2 team, I am trying to convert dosages (from the TOPMed imputation server, stored in vcf.gz files) to hardcall genotypes using plink2 and have observed inconsistent results between dosage and hardcall. Here is the code I used to test: plink2 --vcf myvcf.dose.vcf.gz dosage=DS \ --hard-call-threshold 0.499 \ --double-id \ --make-bed \ --out test…
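As I understand the PLINK 2 documentation, a dosage is turned into the nearest integer genotype unless its distance from that integer exceeds the hard-call threshold, in which case the hardcall is set to missing. The sketch below encodes that reading only; it is an assumption, not PLINK's code, and the function name and the 0.1 default are chosen for illustration.

```python
import math


def hardcall(dosage, threshold=0.1):
    """Return the nearest integer genotype, or None when the dosage is too far from it."""
    nearest = math.floor(dosage + 0.5)
    if abs(dosage - nearest) > threshold:
        return None  # treated as a missing hardcall
    return nearest


# The same dosages under a strict and a permissive threshold.
strict = [hardcall(d) for d in (0.05, 0.3, 1.95)]
permissive = [hardcall(d, 0.499) for d in (0.05, 0.3, 1.6)]
```

Under this rule a dosage of 0.3 is missing at threshold 0.1 but becomes genotype 0 at 0.499, which is one way a hardcall and its dosage can look inconsistent.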

How to generate a consensus sequence from BAM file with bcftools?

Hello, I have aligned some files against a reference genome with BWA-MEM, then deduplicated and sorted them with sambamba to generate an AlnSrtDedSrt.bam file. The question is: how can I generate a consensus FASTA file? I have this fragment of…

bcftools view -r issue

Hi, I’m trying to extract a region from a VCF file using bcftools view; however, no matter what I do, it extracts the whole chromosome, not the region. The chromosome name is fairly complex (i.e. not just “chr1”). The command line I’m using is:…

Fastest way to find private SNPs for each sample

Hello, I have 20 VCF files, each representing a single sample, containing SNPs. I need to know, for each sample, which SNPs are private to this sample. What are the options? So far I am using bcftools isec, but I…

BCFtools isec output vs BCFtools query

I have 5 samples and I have performed variant calling using the Strelka2 software. The output of this software is a VCF file for each sample and a VCF file containing all the variants across all samples. I am interested in looking at which variants are shared among and/or unique to…

From where can I obtain vcf files of healthy exomes

I want to do a simulation analysis for my project which requires benchmarking on a cohort of 200-300 exomes of healthy people. I tried to download such data from gnomAD or the 1000 Genomes Project but each VCF contains…

Why Beagle v5.4 fails during phasing large genotypic data set?

I am trying to phase a large genotypic data set (~330 samples and ~25 million SNPs) using Beagle v5.4. My command is this: java -Xmx160g -jar ./beagle.22Jul22.46e.jar gt=./lines_ch1.vcf.gz ref=./ref_lines.bref3 chrom=1 map=./genmap_ch1.map nthreads=60 window=10.0 out=./pased_lines_ch1 impute=false The job ends when…

conform-gt error “Duplicate marker”

I am using the script below: java -jar conform-gt.24May16.cee.jar ref=Chr01.merged_files_Miss_new.vcf.gz gt=batch1-8_450k_edited.beagle.39474SNPs_new.vcf.gz chrom=1 match=ID out=mod.chr01.consistent to align my input file to the reference (unphased yet). Below is the error message: Exception in thread "main" java.lang.IllegalArgumentException: Duplicate marker Chr01.merged_files_Miss_new.vcf.gz]: 1 1 1_1.9e+07 C…

Problems with bcf concat

Hello! I have multiple VCF files for the same sample. I used the command bcftools concat -a -Oz -o concat.vcf.gz 242487_vs_242482.bcftools.vcf.gz 242487_vs_242482.freebayes.vcf.gz but I get the following error: Checking the headers and starting positions of 2 files [W::hts_idx_load3] The index file is older than the…

vg tools is running, but memory consumption is not happened and log files are not updated

Hi there, I use vg for Giraffe mapping. I started vg on 2022.01.13 and it is now 2022.01.17. vg writes log files: chunked FASTA, chunked VCF file, etc. I have checked the log files. log…

Create a reference genome from aligned bam file

So, the bam file you have is an alignment from fastq files to a reference genome. And I’m also assuming you did it yourself and have access to the files, so you can test out other alignment softwares. A note is that these commands are after the alignment using bowtie2,…

VG mapping paired-end reads: error [xg]: multiple hits for XXX

Hi everyone, I created my first graph using VCFs from HGDP as follows: vg construct -r ref.fa -v sub-chrXX.vcf.gz > pXX.vg vg ids -j $(for i in $(seq 1 22); do echo p${i}.vg; done) vg index -x all.xg $(for i in $(seq 1 22); do echo p${i}.vg; done) vg…

VG node not present in graph

Hello everyone, I’m currently working on constructing my first pangenome using VG and could use some help. My approach involves using VCF files from individuals in HGDP and following the steps outlined in their GitHub tutorial: GitHub Tutorial Link. This tutorial fits my needs but hasn’t been updated since 2020…

gatk – A bash script for running on a bunch of bam files

I have some BAM files in this directory: /data/Continuum/WES/results/ I want to run GATK mutation calling over the BAM files. I googled and realised that for one function I can do this: cd /data/Continuum/WES/vcf/ for file in *.bam ; do ./gatk CollectSequencingArtifactMetrics -I /data/Continuum/WES/results/NG-27280_CLTSS_LTS_0017_lib506243_7661_2_MarkedDup.bam -O NG-27280_CLTSS_LTS_0017_lib506243_7661_2_MarkedDup --FILE_EXTENSION .txt -R resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta done;…

display vcf after giraffe alignment on IGV

I performed Giraffe alignment to map reads to the HPRC pangenome, resulting in a GAM file. Subsequently, I used the following commands to generate a .vcf file. Generated a pack file with the following command: ./vg pack -x hprc-v1.0-mc-grch38-minaf.0.1.gbz -g hprc1004mapped.gam >…

The GATK “The given bam input has no sample names.” error

for f in MINIMAP BWA ; do ~/gatk-4.2.0.0/gatk HaplotypeCaller --reference /home/tmichel/projects/rbge/HybSeq_thibauld/reference_genomes/Begonia_loranthoides_scaffold.fasta --input Hillebrandia_sorted.$f.bam --output Hillebrandia.$f.g.vcf.gz --emit-ref-confidence GVCF ; done I have used GATK to call variants in BAM files with both minimap2 and bwa mem with the…

bcftools guess ploidy

Hi, I am trying to use bcftools guess-ploidy to check gender. This is how I tried to use it: bcftools view sample.vcf.gz -r chrX:2699521-154931043 | bcftools +guess-ploidy > guess_ploidy.output The VCF has data for around 50 samples from whole-genome sequencing and covers all chromosomes. However…

The result of Illumina/hap.py using the same file.

I am trying to use the tool ‘Illumina/hap.py’ from GA4GH to compare the results of variant calling tools (VCF files). Before I compared my results using this tool, I wanted to make sure that the results of this tool were reliable…

How does bcftools decide what sample name to assign when calling variants?

How does bcftools decide what sample names to assign in the VCF when performing variant calling using the mpileup and call commands? I’m using bcftools to call variants from an aligned BAM file like this: samtools mpileup -A…

vcf file chr notation

I have a single VCF file named ‘ALL.wgs.shapeit2_integrated_snvindels_v2a.GRCh38.27022019.sites.vcf.gz’. The issue at hand is that the file uses different chromosomal notation and lacks the ‘chr’ prefix, like this: ##fileformat=VCFv4.3 ##FILTER=<ID=PASS,Description="All filters passed"> ##fileDate=11032019_15h52m43s ##source=IGSRpipeline ##reference=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa ##contig=<ID=1> ##contig=<ID=2> ##contig=<ID=3> ##contig=<ID=4> ##contig=<ID=5> ##contig=<ID=6> ##contig=<ID=7> ##contig=<ID=8> ##contig=<ID=9> ##contig=<ID=10> ##contig=<ID=11> ##contig=<ID=12> ##contig=<ID=13> ##contig=<ID=14> ##contig=<ID=15> ##contig=<ID=16>…
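To illustrate what adding the prefix involves, the sketch below rewrites one VCF line at a time, touching both the `##contig` header entries and the CHROM column so the two stay consistent. It is a simplified illustration with hypothetical function names; it does not map `MT` to `chrM`, and in practice `bcftools annotate --rename-chrs` with a mapping file is the usual route.

```python
import re

CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def _with_chr(name):
    """Add the 'chr' prefix unless the name already has it."""
    return name if name.startswith("chr") else "chr" + name


def add_chr_prefix(line):
    """Prefix contig names with 'chr' in ##contig headers and in the CHROM column."""
    if line.startswith("##contig="):
        return CONTIG_ID.sub(lambda m: m.group(1) + _with_chr(m.group(2)), line)
    if line.startswith("#"):
        return line  # other header lines are left untouched
    chrom, sep, rest = line.partition("\t")
    return _with_chr(chrom) + sep + rest


# Hypothetical lines in the notation shown in the post.
renamed = [add_chr_prefix(l) for l in ("##contig=<ID=1>", "1\t100\t.\tA\tG")]
```

Applying the same function to a file that already uses `chr` names leaves it unchanged.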

how to change the ‘bcftools plugin’ temp directory

Hi, I am using the bcftools plugin ‘liftover’: cat input.vcf | bcftools +liftover --threads 10 -Oz -o output.liftover.vcf.gz -- -s original_reference.fna -f new_reference.fna -c original_to_new.chain 2>liftover.log but my runs fail with the following error: [main_plugin] Error: cannot write to Perognathus_filtered.liftover.vcf.gz…

How to add rsIDs to VCF?

Hey, it’s quite some time ago, but if anyone else is having this problem, I just wanted to say the following command worked for me: bcftools annotate -a /data/references/hg19/pipe/dbsnp138/00-All.vcf.gz -c ID -o samtools_annotated.vcf.gz samtools.vcf.gz The thing to look out for is, I think it…

Apply Plink2 Score – Error Invalid chromosome code

I am trying to run a calculator tool for polygenic scores called pgsc_calc (The Polygenic Score Catalog Calculator pipeline) that runs with Nextflow and Docker on Linux, with my own VCF file. It’s failing at step 8: process > PGSCATALOG_PGSCALC:PGSCALC:APPLY_SCORE:PLINK2_SCORE ERROR ~ Error executing process > 'PGSCATALOG_PGSCALC:PGSCALC:APPLY_SCORE:PLINK2_SCORE (NG13RY1WV.vcf.gz chromosome ALL effect…

GATK GetPileUpSummariesUsage

Hi, I am doing variant calling using hg19 as the reference. For GATK Mutect2, I converted the PON Mutect2-WGS-panel-b37.vcf into Mutect2-WGS-panel-b37-hg19.vcf and the germline resource af-only-gnomad.raw.sites.vcf into af-only-gnomad.hg19.raw.sites.vcf (with Picard LiftOver). After Mutect2, the next step is GATK GetPileupSummaries, which has this usage on the GATK website: gatk GetPileupSummaries \ -I tumor.bam…

bcftools view to failed reader data

I am confused about data editing when preprocessing with vcftools and bcftools together. I want to view the full results with bcftools, vcftools and zcat when I type a command, but it doesn’t work. For example, I want to see the same results with bcftools view input.file | less [environment]$ bcftools view input.vcf.gz |…

Beagle log not matching SNP data

Hello, I am trying to impute the genotypes of 32 individuals to WGS with Beagle 5.4 and I found inconsistencies between the Beagle logs and my SNP data. I ran Beagle with this: java -Xss51m -Xmx64g -jar beagle.22Jul22.46e.jar gt=genotype_Chr1.vcf ref=training_ref_Chr1.vcf.gz out=genotypeChr1_imputed These are the…

Converting string to numerical in bcftools

Hi everyone, I am using bcftools to filter variants from a VCF file. The variants from this VCF file have been annotated using ANNOVAR. I would like to filter variants having a CADD score > 20 in a field named “CADD_phred” which has…
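The excerpt stops before saying how the field is declared, but the title suggests it is typed as a string, which would make numeric comparison awkward. As a language-neutral illustration of the conversion, the sketch below pulls a key out of an INFO string and parses it as a float, treating `.` or other non-numeric text as missing. The helper names and the example INFO strings are hypothetical.

```python
def info_float(info, key):
    """Return INFO[key] as a float, or None if the key is absent or not numeric."""
    for field in info.split(";"):
        name, sep, value = field.partition("=")
        if name == key and sep:
            try:
                return float(value)
            except ValueError:
                return None  # e.g. '.' used as a missing-value placeholder
    return None


def passes_cadd(info, minimum=20.0, key="CADD_phred"):
    """True when the score is present, numeric and strictly above the minimum."""
    score = info_float(info, key)
    return score is not None and score > minimum


# Hypothetical INFO columns.
decisions = [passes_cadd(i) for i in ("DP=10;CADD_phred=23.4", "CADD_phred=12", "CADD_phred=.")]
```

Records with a missing score are dropped here; whether they should be kept instead is a choice the filter has to make explicitly.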

bcftools – How can I retrieve the GRCh38 coordinates of a list of rsids?

I have a list of about 100,000 rsids and I want to get their genomic coordinates on the GRCh38 genome build. Is there a command line tool that allows me to do this? If yes, which one? I have tried bcftools but, given the error message I got, I believe…

Continue Reading bcftools – How can I retrieve the GRCh38 coordinates of a list of rsids?

merging individuals vcf.gz files to one vcf file in linux

merging individuals vcf.gz files to one vcf file in linux 1 Hello everyone, I have imputed sequence-wide genotypes for animals from SkimSEEK (low pass). I want to merge all individuals together into one VCF. I ran the code below but I get an error: bcftools merge file1.vcf.gz file2.vcf.gz file3.vcf.gz …. -Oz -o merged4.vcf.gz failed…

Continue Reading merging individuals vcf.gz files to one vcf file in linux

how to pass Bam and Bam index as Input Channel?

Nextflow: how to pass Bam and Bam index as Input Channel? 2 I would like to pass in bam files pair_id.sorted.bam and their corresponding index files pair_id.sorted.bam.csi into a nextflow workflow. However I am having trouble passing in the files, with errors being thrown for def indexFile = new File("${it.getPath()}.bai")….

Continue Reading how to pass Bam and Bam index as Input Channel?

FORMAT column with GT for somatic SV VCFs

FORMAT column with GT for somatic SV VCFs 0 Hi, I have done tumor-only somatic structural variant (SV) calling with Manta. This results in several VCFs with candidate variants. However, some of these do not contain GT in FORMAT (GitHub issue). In tumor-only mode I do not get somaticSV.vcf.gz as…

Continue Reading FORMAT column with GT for somatic SV VCFs

Deepvariant variant calling by singularity

Deepvariant variant calling by singularity 1 The code I run: 1. BIN_VERSION="1.5.0" sudo apt -y update sudo apt-get -y install docker.io sudo docker pull google/deepvariant:"${BIN_VERSION}" 2. # Pull the image singularity pull :"${BIN_VERSION}" 3. mkdir 04.deepvariant # Run singularity singularity run -B /mountpoint/fastQ/:/input \ deepvariant_1.5.0.sif \ /opt/deepvariant/bin/run_deepvariant \ --model_type WES…

Continue Reading Deepvariant variant calling by singularity

unable to find most of SVs in constructed graph.vg

Hi vg team, I followed the instructions provided in the [Working with a whole genome variation graph](https://github.com/vgteam/vg/wiki/Working-with-a-whole-genome-variation-graph) to construct my own variation graph. After constructing the graph, I wanted to validate if my `input.vcf` file successfully passed all the structural variations (SVs) to the graph. My approach was to use…

Continue Reading unable to find most of SVs in constructed graph.vg

Mutect2 error – Cannot construct fragment from more than two reads

Mutect2 error – Cannot construct fragment from more than two reads 0 Hi, I’m trying to analyze WGS data and I’m currently running Mutect2, however, I’ve been receiving the following error message stating “Cannot construct fragment from more than two reads” and I’m not sure where things have gone wrong….

Continue Reading Mutect2 error – Cannot construct fragment from more than two reads

Annovar Error

I am trying to annotate a VCF with ANNOVAR using the following command: perl /annovar/table_annovar.pl chr21.vcf.gz annovar/humandb/ -buildver hg38 -out chr21 -remove -protocol refGene,ensGene,esp6500siv2_aa,esp6500siv2_ea,esp6500siv2_all -operation g,g,r,r,r -nastring . -vcfinput --nopolish I am not getting the output in VCF format and I do not understand the error. Error and log NOTICE: Running with system…

Continue Reading Annovar Error

multisample vcf filter with bcftools, condition true for ALL samples

Hello, I have a vcf file with 11 samples. I would like to keep sites where, for all samples, this condition is true: MAX(AD)/SUM(AD) <= 0.6 & MAX(AD)/SUM(AD) >=0.4 In other words, I want to keep sites where, for all samples, the allele frequencies to support the SNP are between…

Continue Reading multisample vcf filter with bcftools, condition true for ALL samples

gatk Funcotator

gatk Funcotator 0 Hello!! I’m getting an error like this “A USER ERROR has occurred: Cannot read because no suitable codecs found” while running gatk Funcotator. Can anyone guide me on how to solve the error? Thanks!!! Used command: java -jar gatk Funcotator -V file.vcf --ref-version hg19 -R ref_all.fasta --data-sources-path…

Continue Reading gatk Funcotator

Help with don’t have ID when running bcftools annotate

Help with don’t have ID when running bcftools annotate 0 Hi all, I ran a command similar to one I ran previously, but this time I don’t get the ID and I don’t know why. The only thing I changed is the VCF, which comes from another tool. Would you please have a…

Continue Reading Help with don’t have ID when running bcftools annotate

write output files with default name

write output files with default name 0 I have prepared a shell script file and for the output I want to have a default name, but something is wrong. Can anybody revise this command? calldir=/profile/variant/input/ base=$(echo $sam | sed "s/.sam.*/_sorted/g") sam=/profile/variant/input/s5000W_b2.bam --output $calldir/$(basename $base)_series_call.vcf.gz But in the output the file…

Continue Reading write output files with default name

[E::idx_find_and_load] Could not retrieve index file for Singularity /NextFlow

[E::idx_find_and_load] Could not retrieve index file for Singularity /NextFlow 0 Hi, I am writing my first code in DSL2 Nextflow using a Singularity container, as below: main.nf process pbc_varicall { publishDir "/data/shared/clinical/LongRead/Data/resources/" container 'docker://google/deepvariant:1.5.0' input: path bam output: file "*" path 'out_m84011_220902_175841_Chr20.vcf.gz' script: """ run_deepvariant --model_type PACBIO --ref ${params.ref_genome} --reads ${bam}…

Continue Reading [E::idx_find_and_load] Could not retrieve index file for Singularity /NextFlow

object of type ‘NoneType’ has no len()

I run Truvari for benchmarking of 2 vcf files truvari bench -b NA12878_S1.genome.vcf.gz -c b1.vcf.gz -o out. However, it gives the following error. First vcf file contains format, info, filter, contig and maxdepth headers, which is the vcf file I found on Internet. The second vcf file is output of…

Continue Reading object of type ‘NoneType’ has no len()

Very few snp and indels variation were identified using PAV variation input file base on vg call

Very few snp and indels variation were identified using PAV variation input file base on vg call 0 Hi all, We want to find the snp and indels variation from the result vcf file BS_graph_call.vcf by using the pan_genome vg analysis software. There are only **fewer than 20 snp and…

Continue Reading Very few snp and indels variation were identified using PAV variation input file base on vg call

Get proper link to Mutation Overview webpages in COSMIC

Get proper link to Mutation Overview webpages in COSMIC 0 Hi everyone, I have a VCF annotated using COSMIC database VCFs where the ID column is now the COSV identifier for each variant and I also have the LEGACY ID in the INFO column. My question is, using either identifier…

Continue Reading Get proper link to Mutation Overview webpages in COSMIC

Truvari error: Failed to parse TBX_VCF

I am using truvari tools truvari bench -b basegenome.genome.vcf.gz -c genome1.vcf.gz -o out. But it gives the following error. 2023-06-16 10:27:58,831 [WARNING] Excluding 1 contigs present in comparison calls header but not baseline calls. [E::get_intv] Failed to parse TBX_VCF, was wrong -p [type] used? The offending line was: “ead mapping…

Continue Reading Truvari error: Failed to parse TBX_VCF

Whole Genome VCF splitting with and without tbi file

More Efficient: Whole Genome VCF splitting with and without tbi file 1 Hello, I am currently writing code to split a whole genome vcf file by chromosome. Right now, I do so with bcftools to output 22 .vcf.gz files with the flag --target such that I can avoid the necessity…

Continue Reading Whole Genome VCF splitting with and without tbi file

PLINK, Unphased heterozygous hardcalls in partially-phased variants are poorly represented with bits=8

PLINK, Unphased heterozygous hardcalls in partially-phased variants are poorly represented with bits=8 1 Hi, I’m currently trying to convert dosage data from the vcf.gz format to bgen 1.2 format (8 bits), using plink2, in order to use it later with LDpred-2. However, during the conversion process, I encountered a warning…

Continue Reading PLINK, Unphased heterozygous hardcalls in partially-phased variants are poorly represented with bits=8

SnpEff output VCF in `.gz` format

SnpEff output VCF in `.gz` format 1 Can snpEff output in VCF format? java -Xmx4g -jar /cluster/work/grlab/share/software/snpEff/snpEff.jar hg38 -c /cluster/work/grlab/share/software/snpEff/snpEff.config -nodownload chr22_joint_genotyped.vcf.gz -stats chr22_snpeEff_summary.html >chr22_joint_genotyped.ann.vcf.gz The output VCF is not in gzip format. Is there a way to get the snpEff output in .vcf.gz format?

Continue Reading SnpEff output VCF in `.gz` format

Infinite value present in GRM, between sample ‘XXXX’ and itself

When I try to run the --pca command I get the error message: “Infinite value present in GRM, between sample ‘XXXX’ and itself”. What can the problem be? Command: ../scripts/plink2 --vcf filterxy.vcf.gz --pca allele-wts --exclude pca-clean.prune.out --freq counts --remove pca-clean.king.cutoff.out.id --out pca-clean Log: PLINK v2.00a5LM 64-bit Intel (16 May 2023) www.cog-genomics.org/plink/2.0/ (C) 2005-2023 Shaun Purcell,…

Continue Reading Infinite value present in GRM, between sample ‘XXXX’ and itself

no output from GATK CombineGVCFs

no output from GATK CombineGVCFs 1 Hello All, I am using GATK to do SNP calling on 140 RNA-seq samples. After variant calling of each sample with HaplotypeCaller, I get 140 g.vcf.gz files. Before performing the final joint genotyping through GenotypeGVCFs, I need to combine these 140 g.vcf.gz files into…

Continue Reading no output from GATK CombineGVCFs

plink2 –pmerge question

Hi there, I am so happy that --pmerge can now do all the work of --bmerge. It seems that I don’t need to use two different versions of plink to handle the merging of different datasets. I do have some quick questions on --pmerge. I would deeply appreciate if someone…

Continue Reading plink2 –pmerge question

How to perform a phylogenetic analysis from a vcf file

How to perform a phylogenetic analysis from a vcf file 0 Hi, I have a VCF file named subset_filtered.vcf.gz and want to perform a phylogenetic analysis to find out the relationships among different accessions. Could someone guide me on how to perform the analysis based on a VCF file?

Continue Reading How to perform a phylogenetic analysis from a vcf file

How to filter vcf file on minimum genotype depth and quality for each sample

How can I filter a VCF file on minimum genotype depth and genotype quality for each sample? I am looking for a way to filter variants from a VCF file by checking that all samples for a site pass 2 criteria: sample.DP > 10 sample.GQ > 15…

Continue Reading How to filter vcf file on minimum genotype depth and quality for each sample

PLINK not converting entire vcf to bed file

PLINK not converting entire vcf to bed file 0 Here is my code: [ethan.kreuzer@hydra1 2_Population_stratification]$ plink --vcf ALL.2of4intersection.20100804.genotypes.vcf.gz --make-bed --out ALL.2of4intersection.20100804.genotypes [mii] Please select a module to run plink: MODULE PARENT(S) 1 plink/1.9b_6.21-x86_64 StdEnv/2020 2 plink/1.07 nixpkgs/16.09 intel/2018.3 3 plink/1.9b_5.2-x86_64 nixpkgs/16.09 4 plink/1.9b_4.1-x86_64 nixpkgs/16.09 Make a selection (1-4, q…

Continue Reading PLINK not converting entire vcf to bed file

Error in Adding 1000Genomes Ancestral Allele info: Using VCF tools fill-aa

Error in Adding 1000Genomes Ancestral Allele info: Using VCF tools fill-aa 1 Hi, I am trying to add the ancestral allele to 1000 Genomes Phase3 VCF files. I have used the “human_ancestor_GRCh37_e59.tar.bz2” files as the ancestral allele input. The steps I have used are: cat human_ancestor_3.fa | sed 's,^>.*,>1,' | bgzip…

Continue Reading Error in Adding 1000Genomes Ancestral Allele info: Using VCF tools fill-aa

Removing indels in VCF file

Hello, I am trying to do something very simple, but running into confusing behaviour. I have a VCF file of multiple samples and want to remove all indels so that I can generate sequences with identical coordinates with bcftools consensus. I removed indels by specifying bcftools view --include 'TYPE="snp" ||…

Continue Reading Removing indels in VCF file

VEP/ CADD error – ERROR: Assembly is GRCh38 but CADD file does not contain GRCh38 in header.

Dear Biostars, I am having a confusing issue with my CADD plugin. It is confusing because when I run VEP for my whole trio, all the plugins work fine. However, when I try to run CADD for individual (pivoted) files, it no longer works and I get…

Continue Reading VEP/ CADD error – ERROR: Assembly is GRCh38 but CADD file does not contain GRCh38 in header.

Removing multi-variant records from vcf file

Removing multi-variant records from vcf file 3 I am using gatk ASEReadCounter to get the read counts per allele. To do so, I used the following command: gatk ASEReadCounter -R /path_to_genome/hg38_genome/GRCh38.p13.genome.fa -I sample.sorted.bam -V sample.vcf.gz -O output.table I used GATK4, but I realized that in my VCF at position chr1:1574033, there…

Continue Reading Removing multi-variant records from vcf file