Tag: VCF

michigan imputation server

Hi, I performed imputation on my GWAS data using the Michigan Imputation Server. Now I have two output files: 1) .dose.vcf.gz and 2) .info.gz. The Michigan Imputation Server uses Minimac3 (--format GT,DS,GP), and all three formats are present in the output file “.dose.vcf.gz”. I’m new to this kind of…

Continue Reading michigan imputation server
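
The DS and GP fields in a .dose.vcf.gz are related, and a small example makes the link concrete. This is a minimal sketch that assumes a biallelic site in a diploid sample, with GP listed in VCF genotype order as P(0/0), P(0/1), P(1/1). Under that assumption, the ALT-allele dosage DS is the expected number of ALT copies, P(0/1) + 2·P(1/1). The helper names and the example values are illustrative only and are not taken from real Minimac output.

```python
def parse_sample(format_col: str, sample_col: str) -> dict:
    """Map FORMAT keys (e.g. GT:DS:GP) to the values in one sample column."""
    return dict(zip(format_col.split(":"), sample_col.split(":")))


def dosage_from_gp(gp_field: str) -> float:
    """Expected ALT dosage from a biallelic, diploid GP field "P00,P01,P11".

    Assumption: exactly three probabilities, in VCF genotype order.
    """
    _p_hom_ref, p_het, p_hom_alt = (float(x) for x in gp_field.split(","))
    return p_het + 2.0 * p_hom_alt


# Illustrative sample column (made-up numbers, not real imputation output)
fields = parse_sample("GT:DS:GP", "0|1:1.1:0.1,0.7,0.2")
ds = dosage_from_gp(fields["GP"])
print(round(ds, 3), fields["DS"])  # 1.1 1.1
```

Haploid calls, for example on male chrX, carry only two GP values. For those, the three-way unpacking above raises a ValueError, so such sites would need separate handling.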

R: Main bedtools wrapper function.

tabix {bedr} R Documentation. Description: Main bedtools wrapper function. Usage: tabix(region, file.name, params = NULL, tmpDir = NULL, deleteTmpDir = TRUE, outputDir = NULL, outputFile = NULL, check.zero.based = TRUE, check.chr = TRUE, check.valid = TRUE, check.sort = TRUE, check.merge…

Continue Reading R: Main bedtools wrapper function.

BAM dataset to Genotype data conversion using PLINK

You must use a genotype caller to obtain genotypes from a .bam file; it’s not possible to ‘convert’ a .bam file to genotypes. There are many options, but bcftools is perhaps the simplest. Take a read of this…

Continue Reading BAM dataset to Genotype data conversion using PLINK

Must the REF of a SNP in a VCF be the same as the reference genome?

Does REF in a VCF refer to the base in the reference genome or to the base carried by the majority of the population? My purpose is to add some VCFs made by others into my VCF. My population…

Continue Reading Must the REF of a SNP in a VCF be the same as the reference genome?
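
In the VCF specification, REF holds the reference-genome base(s) at POS, not the population majority allele. VCFs from different sources should therefore only be combined if they were called against the same reference build. The following is a minimal sketch of that check. It assumes the reference has already been loaded into a dict mapping chromosome name to sequence; the toy sequence is made up and the function name is hypothetical.

```python
from typing import Dict


def ref_matches(reference: Dict[str, str], chrom: str, pos: int, ref: str) -> bool:
    """True if REF equals the reference sequence starting at 1-based POS."""
    seq = reference.get(chrom)
    if seq is None or pos < 1:
        return False  # unknown chromosome name (e.g. "chr1" vs "1") or invalid POS
    start = pos - 1  # VCF POS is 1-based; Python slices are 0-based
    return seq[start:start + len(ref)].upper() == ref.upper()


# Toy reference, not a real genome
toy_reference = {"chr1": "ACGTACGTAC"}
print(ref_matches(toy_reference, "chr1", 3, "G"))   # True
print(ref_matches(toy_reference, "chr1", 3, "GT"))  # True
print(ref_matches(toy_reference, "chr1", 3, "A"))   # False
```

A mismatch usually points to a different reference build or to chromosome-naming differences. On real files, bcftools norm -f ref.fa --check-ref performs a comparable check against a FASTA file.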

Nebula Genomics Black Friday & Cyber Monday Deal: Save $100!

Nebula Genomics has a Black Friday and Cyber Monday deal! DNA tests make great holiday gifts and the biggest sale of the year is here just in time for Black Friday and Cyber Monday! Save $100 on the 30x Deep Test Kit on 11/26 -11/29. Also now through November 30,…

Continue Reading Nebula Genomics Black Friday & Cyber Monday Deal: Save $100!

vcftools 012 apparently is giving me wrong genotypes

Hi, I want to convert a VCF to a numeric format. I am using the --012 option of vcftools/0.1.16. This is what my VCF looks like: zcat file.vcf.gz | grep -v "#" | cut -f 1,2,10,11,12 | head -n 4 Chr01 1076…

Continue Reading vcftools 012 apparently is giving me wrong genotypes

Trouble indexing a .vcf.gz file

Hello everyone, I am trying to index a .vcf.gz file in order to get a FASTA consensus with bcftools. This is the simple command I give: tabix myFile.vcf.gz and I get the following error: [E::get_intv] failed to parse TBX_VCF, was wrong -p [type]…

Continue Reading Trouble indexing a .vcf.gz file

struggling in using bcftools to set variants to missing

Below is a Python API solution using the pyvcf submodule from the fuc package I wrote. Imagine you have the following data: >>> from fuc import pyvcf >>> data = { ... 'CHROM': ['chr1', 'chr1', 'chr1', 'chr1'], ... 'POS': [100, 101, 102, 103], ... 'ID': ['.', '.', '.', '.'], …

Continue Reading struggling in using bcftools to set variants to missing

012 genotype matrix using vcf tools

Hello everyone, I have a VCF file containing nearly 11 million SNPs. I want to convert my VCF file into a 012 genotype matrix for LD pruning. I am using this code: /data/programs/vcftools_0.1.13/bin/vcftools --vcf my.file.vcf --012 --out output_geno.vcf So, I get the output, but I am…

Continue Reading 012 genotype matrix using vcf tools
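
vcftools --012 writes one row per individual and encodes each genotype as the number of non-reference alleles (0, 1 or 2), with -1 for missing. The following is a minimal pure-Python sketch of the same encoding, which can be handy for spot-checking a few sites against the .012 output. It assumes uncompressed, tab-separated VCF lines that contain a GT field. It is not a replacement for vcftools on an 11-million-SNP file, and the function names are hypothetical.

```python
def gt_to_012(gt: str) -> int:
    """Count non-reference alleles in a GT string; -1 if any allele is missing."""
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return -1
    return sum(1 for a in alleles if a != "0")


def vcf_to_012(lines):
    """Return (sample names, matrix) with one row per sample, one column per site."""
    samples, sites = [], []
    for line in lines:
        if line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            continue
        gt_index = fields[8].split(":").index("GT")
        row = []
        for sample in fields[9:]:
            parts = sample.split(":")
            row.append(gt_to_012(parts[gt_index] if gt_index < len(parts) else "."))
        sites.append(row)
    # Transpose sites x samples into samples x sites, like the .012 file
    return samples, [list(col) for col in zip(*sites)]


toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "Chr01\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:10\t1/1:8",
    "Chr01\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t./.:0\t0|0:12",
]
print(vcf_to_012(toy_vcf))  # (['S1', 'S2'], [[1, -1], [2, 0]])
```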

Consensus sequence calling with normalisation of indels

I’m following the workflow suggested by samtools here to produce a FASTA with the consensus sequence: samtools.github.io/bcftools/howtos/consensus-sequence.html The workflow goes like this: # call variants bcftools mpileup -Ou -f reference.fa alignments.bam | bcftools call -mv -Oz -o calls.vcf.gz bcftools index calls.vcf.gz #…

Continue Reading Consensus sequence calling with normalisation of indels

SKATO hangs after going through several genes

Hi there, I’m doing a SKAT-O analysis with rvtest/2.1.0. My script: rvtest --inVcf file.vcf.gz --pheno phenofile --out newfile --geneFile refFlat_hg19.txt --kernel skato It runs for a while, but then SKAT-O seems to hang after a few genes: Retrieve remote version failed, use '--noweb'…

Continue Reading SKATO hangs after going through several genes

bcftools consensus -m file.bed error

Hi everyone, with bcftools consensus I’m trying to replace the low-coverage regions with 'N'. I made a BED file from the BAM file with bedtools: bedtools genomecov -bga -ibam [file.bam] | awk '$4<5' > low_coverage.bed output: 1 0 10001 0 1 10001…

Continue Reading bcftools consensus -m file.bed error

Why bcftools can not call some variant in consensus sequence

Hi everyone! I have a question about the consensus sequence. I used the command below to create a consensus sequence: bcftools mpileup -f ref.fasta mapped.bam -d 80000 | bcftools call -c | vcfutils.pl vcf2fq > cons.fq If I see…

Continue Reading Why bcftools can not call some variant in consensus sequence

minimac4: autopkgtest regression: *** stack smashing detected ***: terminated

Source: minimac4 Version: 1.0.2-3 X-Debbugs-CC: debian…@lists.debian.org Severity: serious User: debian…@lists.debian.org Usertags: regression Dear maintainer(s), With a recent upload of minimac4 the autopkgtest of minimac4 fails in testing when that autopkgtest is run with the binary packages of minimac4 from unstable. It passes when run with only packages from testing. In…

Continue Reading minimac4: autopkgtest regression: *** stack smashing detected ***: terminated

Creating SNP index

Hello, I’m having a problem creating a SNP index of Brassica allopolyploids using GMAP. Where can I find SNP data for Brassica? In order to create a pseudo-genome I need the SNP index. I have tried the snpindex command of GMAP with…

Continue Reading Creating SNP index

Using tabix to subset a region from a VCF file

I’ve read quite a few posts suggesting that I can subset my VCF file to a region of interest using tabix. However, when I try as below, the output file is empty: VCF="/Volumes/Seagate Expansion Drive/temp/130iPSC_061118.snp.vcf.gz" tabix -p vcf "$VCF"…

Continue Reading Using tabix to subset a region from a VCF file

bcftools consensus

Good morning, I’m new to bioinformatics and I need some help with a task. I’m trying to create a consensus FASTA file from a sorted.marked.bam with bcftools. The command I’m using is: # call variants bcftools mpileup -B -Ou -f reference.fa alignments.bam | bcftools call -mv -M -Oz -o calls.vcf.gz…

Continue Reading bcftools consensus

Convert SHAPEIT haps file to vcf

I’ve phased some sequencing data using SHAPEIT, producing two files: 125QiPSC.haps and 125QiPSC.log. I would now like to subset the file for my SNPs of interest. To achieve this, I’m first attempting to convert it to a VCF. The SHAPEIT manual suggests this can be…

Continue Reading Convert SHAPEIT haps file to vcf

SHAPEIT using VCF unphased genotype input

I can get SHAPEIT to work with the default PLINK PED/MAP format input files, but not with a VCF as input. As an example, here I use the demo data that comes with SHAPEIT, which runs well. DEMO=/Users/michaelflower/bin/shapeit.v2.904.3.10.0-693.11.6.el7.x86_64/example shapeit -B $DEMO/gwas.bed $DEMO/gwas.bim $DEMO/gwas.fam -M $DEMO/genetic_map.txt -O "$DIR"/shapeit/gwas.phased However, when I…

Continue Reading SHAPEIT using VCF unphased genotype input

Polyploidy found, and not supported by vcftools for a diploid data set.

Hi, I used GATK Mutect2, then SelectVariants (retaining only SNPs), then CombineGVCFs to generate a VCF file for a diploid species. When I tried to process the VCF file using vcftools, some of the commands worked; however, when…

Continue Reading Polyploidy found, and not supported by vcftools for a diploid data set.

Extract the DP and AD from VCF file along with the chromosome position and alteration

Hi, I would like to extract the DP and AD from my VCF file, along with the chromosome, position and alteration. Example VCF file: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SRRS1…

Continue Reading Extract the DP and AD from VCF file along with the chromosome position and alteration
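
The following is a minimal sketch of pulling CHROM, POS, REF, ALT and the per-sample DP and AD FORMAT values out of uncompressed VCF lines. It assumes tab-separated input and writes "." when a sample lacks a field. The function name and the toy record are hypothetical.

```python
def extract_dp_ad(lines):
    """Yield (CHROM, POS, REF, ALT, sample, DP, AD) for every sample at every site."""
    samples = []
    for line in lines:
        if line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        keys = fields[8].split(":")
        for name, sample in zip(samples, fields[9:]):
            values = dict(zip(keys, sample.split(":")))
            # "." marks a FORMAT field that is absent for this sample
            yield chrom, int(pos), ref, alt, name, values.get("DP", "."), values.get("AD", ".")


toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
    "chr1\t1234\t.\tA\tG\t60\tPASS\t.\tGT:AD:DP\t0/1:7,5:12",
]
for row in extract_dp_ad(toy_vcf):
    # prints chr1, 1234, A, G, S1, 12, 7,5 separated by tabs
    print("\t".join(map(str, row)))
```

On real data, bcftools query does the same in one pass, for example bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%SAMPLE\t%DP\t%AD]\n' file.vcf.gz. The square brackets repeat the enclosed fields for every sample.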

Phasing using Beagle with a map file

I’d like to phase the SNPs in a vcf file and output consensus files for each haplotype, as suggested in this post: www.biostars.org/p/298635/ I’ve managed to install beagle in a conda environment: conda create -n beagle -c conda-forge -c bioconda beagle conda activate beagle When I run beagle using this…

Continue Reading Phasing using Beagle with a map file

GATK4 stripping header from .bam???? What the heck? : bioinformatics

Hi all. I have a problem. Code posted below for those who want to take a look. I have a series of 167 .bam files I need to variant call for my project. Aside from them being an absolute nightmare to work with on other grounds, a new problem has…

Continue Reading GATK4 stripping header from .bam???? What the heck? : bioinformatics

Post filtering analysis for exome data

Hello, I am following the GATK pipeline to process an exome data set. I am done with the preprocessing step and have filtered the dataset using hard filtering. Now I am looking for variants shared between the affected individuals. In the VCF file, I get the…

Continue Reading Post filtering analysis for exome data

heterozygous SNV AB>0.15, heterozygous indel<0.20 in UKB-WES

These gVCFs were joint genotyped using GLnexus (www.biorxiv.org/content/10.1101/572347v1) to create a single, unfiltered project-level VCF (pVCF). Genotype depth filters (SNV DP≥7, indel DP≥10) were applied prior to variant site filters requiring at least one variant genotype passing an allele balance filter (heterozygous…

Continue Reading heterozygous SNV AB>0.15, heterozygous indel<0.20 in UKB-WES
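
The following is a minimal sketch of the genotype-depth and allele-balance logic described in the quoted text. The DP thresholds (SNV DP≥7, indel DP≥10) come from that text. Everything else is an assumption, flagged in the code comments:

- AB is ALT reads / (REF + ALT reads), taken from the first two AD values.
- An indel is any site where REF and ALT differ in length.
- Only heterozygous 0/1 genotypes are tested.
- AB must exceed 0.15 for SNVs and 0.20 for indels, as read from the thread title.

This is an illustration, not the UK Biobank pipeline itself.

```python
SNV_MIN_DP, INDEL_MIN_DP = 7, 10        # genotype depth filters from the quoted text
SNV_MIN_AB, INDEL_MIN_AB = 0.15, 0.20   # assumed heterozygous allele-balance cut-offs


def is_indel(ref: str, alt: str) -> bool:
    # Assumption: a length difference between REF and ALT marks an indel
    return len(ref) != len(alt)


def allele_balance(ad: str) -> float:
    """ALT / (REF + ALT) read fraction; assumes AD holds numeric REF,ALT counts."""
    ref_reads, alt_reads = (int(x) for x in ad.split(",")[:2])
    total = ref_reads + alt_reads
    return alt_reads / total if total else 0.0


def site_passes(ref: str, alt: str, genotypes) -> bool:
    """Keep the site if at least one 0/1 genotype passes both the DP and AB filters.

    genotypes: iterable of (GT, AD, DP) tuples, e.g. ("0/1", "10,3", 13).
    """
    indel = is_indel(ref, alt)
    min_dp = INDEL_MIN_DP if indel else SNV_MIN_DP
    min_ab = INDEL_MIN_AB if indel else SNV_MIN_AB
    for gt, ad, dp in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if dp < min_dp or sorted(alleles) != ["0", "1"]:
            continue  # fails the depth filter or is not a 0/1 heterozygote
        if allele_balance(ad) > min_ab:
            return True
    return False


print(site_passes("A", "G", [("0/1", "10,3", 13)]))   # True  (AB ~0.23, DP 13)
print(site_passes("A", "G", [("0/1", "12,1", 13)]))   # False (AB ~0.08)
print(site_passes("AT", "A", [("0/1", "6,2", 8)]))    # False (indel DP 8 < 10)
```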

phasing VCF files with missing genotype

I want to phase a VCF file with missing values (./.). I want the output (also a VCF) to have all genotypes phased, but without imputing the ./. genotypes. I have tried SHAPEIT and Beagle, but both of them impute the missing…

Continue Reading phasing VCF files with missing genotype

error while annotating vcf file using snpEff

I want to annotate my VCF file with Ensembl gene IDs. I used the snpEff tool to annotate the VCF file with Ensembl gene IDs, but in my case the same gene ID is assigned to every rsID. This is the command I…

Continue Reading error while annotating vcf file using snpEff

Quality score discrepancy (bcftools and freebayes)

Hello, I called variants in my data using both bcftools and freebayes. In the VCF they differ in how they report their Phred quality score: bcftools reported the quality for the same variant site as 56, while freebayes reported it as 530. I was wondering…

Continue Reading Quality score discrepancy (bcftools and freebayes)
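
QUAL in a VCF is Phred-scaled: QUAL = -10·log10(P), where P is the probability that the ALT assertion is wrong. The following is a minimal sketch that converts the two values from the question into error probabilities; the helper names are hypothetical. Both values correspond to very small error probabilities. The two callers use different statistical models, so their QUAL values are generally not directly comparable.

```python
import math


def phred_to_error_prob(qual: float) -> float:
    """Probability that the ALT call is wrong, from a Phred-scaled QUAL."""
    return 10 ** (-qual / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse conversion: Phred score from an error probability."""
    return -10 * math.log10(p)


for caller, qual in [("bcftools", 56), ("freebayes", 530)]:
    print(caller, qual, f"{phred_to_error_prob(qual):.1e}")
# bcftools 56 2.5e-06
# freebayes 530 1.0e-53
```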

How do I get bcftools to rename some samples in a .vcf file?

I am trying to use the reheader tool from bcftools to rename some samples in a .vcf.gz file, but it does not work and gives some pretty strange errors. Before trying to rename…

Continue Reading How do I get bcftools to rename some samples in a .vcf file?

identification of ROH using plink

Hello all, I generated a VCF file using GATK (HaplotypeCaller --> CombineGVCFs --> GenotypeGVCFs, then hard filtering). After this, I converted the filtered VCF file into PLINK binary files (.bed, .fam, .bim; plink v1.9) using the --make-bed command. However, when I used these…

Continue Reading identification of ROH using plink

Shared variants

Hello, I have exome data sets from 6 individuals, of whom 4 are affected and 2 are not. I have to identify the variants that are shared among the four affected individuals. I did joint genotyping for the 4 affected individuals and filtered the…

Continue Reading Shared variants

GATK HaplotypeCaller works without GVCF option, but errors with GVCF

I’ve extracted chromosome 4 from a whole-genome BAM file as follows: samtools view -h "$BAM" chr4 > "$EXT/temp/"$PREFIX"_chr4.sam" samtools view -bS "$EXT"/temp/$PREFIX"_chr4.sam" > "$EXT"/temp/$PREFIX"_chr4.bam" Then I added read groups, as required by GATK: picard AddOrReplaceReadGroups I="$BAM" O="$EXT"/temp/$PREFIX"_chr4_rg.bam" RGID=4 RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=20 Index the BAM: samtools index "$BAM" Download the…

Continue Reading GATK HaplotypeCaller works without GVCF option, but errors with GVCF

“snp order check fail; snp list not ordered”

AdmixTools convertf error on a PED file: “snp order check fail; snp list not ordered”. I am trying to run the convertf tool in AdmixTools, which I have run successfully before. I am getting an error (or perhaps a warning, since it appears to run to completion?) that “snp order check fail;…

Continue Reading “snp order check fail; snp list not ordered”

Creating a per sample file from multi-sample vcf

I have a multi-sample VCF file, and I want to get a table with IDs in the left column and the variants for which they carry an alternate allele. It should look like this: ID1 chr2:87432:A:T_0/1 chr10:43234:C:G_1/1 ID2 chr2:87432_A:T_1/1 ID3 chr11:432434:T:G chr14:34234234:C:G…

Continue Reading Creating a per sample file from multi-sample vcf
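
The following is a minimal sketch that builds the requested table from uncompressed VCF lines. It assumes a GT field is present and treats any called allele other than 0 as carrying an ALT allele. The function name and the toy records are hypothetical; the output strings follow the CHROM:POS:REF:ALT_GT layout shown in the question.

```python
def variants_by_sample(lines):
    """Map each sample to "CHROM:POS:REF:ALT_GT" strings where it carries a non-REF allele."""
    samples, table = [], {}
    for line in lines:
        if line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            table = {name: [] for name in samples}
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        for name, sample in zip(samples, fields[9:]):
            parts = sample.split(":")
            gt = parts[gt_index] if gt_index < len(parts) else "."
            alleles = gt.replace("|", "/").split("/")
            # Any called allele other than 0 (REF) counts as carrying an ALT allele
            if any(a not in ("0", ".") for a in alleles):
                table[name].append(f"{chrom}:{pos}:{ref}:{alt}_{gt}")
    return table


toy_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tID1\tID2\tID3",
    "chr2\t87432\t.\tA\tT\t.\t.\t.\tGT\t0/1\t1/1\t0/0",
    "chr10\t43234\t.\tC\tG\t.\t.\t.\tGT\t1/1\t0/0\t./.",
]
for name, variants in variants_by_sample(toy_vcf).items():
    print("\t".join([name] + variants))
```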

PGT only available for some variants in GATK .vcf

I’ve got a VCF file that someone else prepared using GATK. I’m interested in the phasing information in the PGT tag, e.g. 0|1. This information seems to be available for some variants but not for others, e.g. below: chr1 16977 …

Continue Reading PGT only available for some variants in GATK .vcf

GATK Mutect2 errors during basic variant calling

I’ve just installed GATK and am trying to do some basic variant calling. However, when I try to run this line: gatk Mutect2 -R $REF -I "$BAM" -O "$DIR"/gatk/$PREFIX"_bwa_gatk_unfiltered.vcf" I get the error below. Reading the output, it looks like this is…

Continue Reading GATK Mutect2 errors during basic variant calling

Windows-Based Software Packages Which Can Analyze VCF Files

I would like to work with VCF files: select one person, and subset one gene, one chromosome or part of a chromosome. I tried VMware with Ubuntu, VCFtools, GATK and tabix, but I ran into a lot of errors. I don’t have…

Continue Reading Windows-Based Software Packages Which Can Analyze VCF Files

htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 5880: Duplicate allele added to VariantContext: GT

I am trying to index a VCF file using igvtools. For some reason, I am getting the following error: Error: htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line…

Continue Reading htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 5880: Duplicate allele added to VariantContext: GT

Phasing with Beagle 5.2 and no reference panel

Hi everyone, I have a question about phasing with Beagle 5.2 without a reference panel. I have seen answers in a couple of other posts about Beagle saying that trying to phase with too few samples and no reference panel is not…

Continue Reading Phasing with Beagle 5.2 and no reference panel

False negatives -Hard filtering

Hello, I need some suggestions on filtering the variants in exome data. I combined all the GVCF files into one file, did joint genotyping and created one VCF file. The variants in the file were hard-filtered. As a first step to evaluate the…

Continue Reading False negatives -Hard filtering

How can I know what is a good and bad variant call?

Hi, this is my first post/question; sorry if it is not allowed. I am generating variant calls with a variant caller, bcftools. I am analyzing/looking at the VCF in Excel and…

Continue Reading How can I know what is a good and bad variant call?

Malformed walker argument using MarkDuplicatesSpark

I am creating my own NGS pipeline, from Illumina FASTQ files to VCF, purely for learning purposes. When I run the following code, everything is OK: java -Xmx4000m "$javatmp" -jar "$picardpath" SortSam INPUT=/home/mdb1c20/my_onw_NGS_pipeline/files/sam/1.sam OUTPUT=/home/mdb1c20/my_onw_NGS_pipeline/files/bam/1_sorted.bam SORT_ORDER=coordinate COMPRESSION_LEVEL=5 java -Xmx4000m "$javatmp" -jar "$picardpath" MarkDuplicates INPUT=/home/mdb1c20/my_onw_NGS_pipeline/files/bam/1_sorted.bam…

Continue Reading Malformed walker argument using MarkDuplicatesSpark

Linked supergenes underlie split sex ratio and social organization in an ant

Significance Some social insects exhibit split sex ratios, wherein a subset of colonies produce future queens and others produce males. This phenomenon spawned many influential theoretical studies and empirical tests, both of which have advanced our understanding of parent–offspring conflicts and the maintenance of cooperative breeding. However, previous studies assumed…

Continue Reading Linked supergenes underlie split sex ratio and social organization in an ant

How to get the sample ID number to print with variant Information

Hello, I am very new to bioinformatics and I am working on analyzing a file. Using the code bcftools query -f '%CHROM\t%POS[\t%GT\t]\n' Regiononly.vcf.gz prints out the genotypes, and one sample is heterozygous for the variant of interest…

Continue Reading How to get the sample ID number to print with variant Information

In the NGS pipeline, why are reads sorted before marking duplicates?

I am creating my own NGS pipeline (from Illumina FASTQ to VCF file), following GATK Best Practices and the pipeline already used in the clinical lab where I work. I have seen that the FASTQ is…

Continue Reading In the NGS pipeline, why are reads sorted before marking duplicates?

How to add sample information to a new record?

I have to correct and remove some fields in the info column of a VCF file. Pysam seems the logical choice, but I fail to write a new file. Adding SAMPLE to a pysam.VariantFile seems to be the issue. Any insight highly appreciated! Code import pysam import re def build_new_header(vcf_header:pysam.VariantHeader)…

Continue Reading How to add sample information to a new record?

Janis Germline Variant-Calling Workflow (GATK)

This is a genomics pipeline to do a single germline sample variant-calling, adapted from GATK Best Practice Workflow. This workflow is a reference pipeline for using the Janis Python framework (pipelines assistant). Alignment: bwa-mem Variant-Calling: GATK HaplotypeCaller Outputs the final variants in the VCF format. Resources This pipeline has been…

Continue Reading Janis Germline Variant-Calling Workflow (GATK)

VCF file generation from multiple samples for PCA

I am trying to generate a VCF file for 80 (human) samples and use it for PCA. But when I try to get eigenvectors using PLINK, it says the genotyping rate is 0.12, and when I remove SNPs with a missing-data threshold, all data…

Continue Reading VCF file generation from multiple samples for PCA

How to annotate a human-sample VCF with dbSNP for GRCh38?

I am trying to annotate my 80-sample VCF file with dbSNP, as described in all DNA analysis pipelines. I am new to bioinformatics.

Continue Reading How to annotate a human-sample VCF with dbSNP for GRCh38?

How to interpret AC/RC fields in VCF files for a single sample

Hi everyone, I have received separate VCF files for multiple samples. I merged each sample into a single VCF file for each chromosome, and then sorted the resulting files using: bcftools merge -l ID_list.txt -r $chrom -O z -o ${chrom}_merged.vcf.gz bcftools sort ${chrom}_merged.vcf.gz -O z -o ${output}/${chrom}_sorted.vcf.gz I’ve pasted a…

Continue Reading How to interpret AC/RC fields in VCF files for a single sample

sciclone iteration does not converge

Hello everyone, I have a problem using sciClone. I extracted the information sciClone needs as input from the VCF file generated from paired tumor-normal data, and then encountered the following problem. I would like to ask whether…

Continue Reading sciclone iteration does not converge

vcf – Ensembl Variant Effect Predictor (VEP) issue during execution

vcf – Ensembl Variant Effect Predictor (VEP) issue during execution – Bioinformatics Stack Exchange …

Continue Reading vcf – Ensembl Variant Effect Predictor (VEP) issue during execution

Using VEP to get gnomAD frequencies

Hi all, I am using Ensembl VEP (command line) to annotate a VCF I have. I am specifically looking for gnomAD allele frequencies, which is fairly straightforward to do, technically speaking. However, the data looks off in some cases. For example, when I pass in: 10 69408929 COSM3751912 A…

Continue Reading Using VEP to get gnomAD frequencies

Developing my own NGS pipeline

I am a trainee bioinformatician working in a genomics lab. For learning purposes I want to develop my own NGS pipeline (from FASTQ file to VCF file). It would be great if someone could send me links where I can, step by step…

Continue Reading Developing my own NGS pipeline

The provided VCF file is malformed

I have VCF files that I want to convert to a more readable TSV file using GATK VariantsToTable, and I also want to load the VCF in IGV. However, when trying to do this, I get the same error for both…

Continue Reading The provided VCF file is malformed

How To Split Multiple Samples In Vcf File Generated By Gatk?

There now also is a plugin in bcftools which does the split in a single pass over the multi-sample VCF/BCF file. It does not seem to be very fast, but looks correct and there are options to do the split in custom ways. You do need to install bcftools with…

Continue Reading How To Split Multiple Samples In Vcf File Generated By Gatk?

Why does write.ped remove the first locus?

In order to get a VCF file from a genind object, I am going through the hierfstat function write.ped() and then converting the result to VCF with PLINK. This is my code (apologies, but I cannot provide reproducible data for this particular scenario):…

Continue Reading Why does write.ped remove the first locus?

Problem with vcf file columns

Hello. I’m having trouble with a VCF file I just generated with Stacks. The problem is that the column of the first sample (the first individual in my VCF file), instead of containing the genotype, the depth and the other information, it…

Continue Reading Problem with vcf file columns

VCF filters and variant intersection

Hi guys, I am using the joint genotyping method to generate a multi-sample VCF file, which involves variant calling, joint data aggregation and joint genotyping steps. I am unsure which filters I need to apply to the VCF. Which filters should I apply? I also…

Continue Reading VCF filters and variant intersection

I have a genomic file, but it has a different representation than usual

Recently I got access to genomic data from an organization. It is a .bgen file, so I converted it to a VCF file with qctool. But it uses a different SNP representation than the one I am used to. I am used to a SNP representation like…

Continue Reading I have a genomic file, but it has a different representation than usual

Laniakea@ReCaS: exploring the potential of customisable Galaxy on-demand instances as a cloud-based service | BMC Bioinformatics

Since the opening of the open-ended Call in February 2020 [30], Laniakea@ReCaS has accepted ten project proposals for a total of 18 Galaxy instances operating on the ReCaS infrastructure that altogether launched almost 30 k jobs, as of March 2021 (Fig. 3). Fig. 3 Cumulative number of jobs launched by all the…

Continue Reading Laniakea@ReCaS: exploring the potential of customisable Galaxy on-demand instances as a cloud-based service | BMC Bioinformatics

No header in VCF file

Hello everyone, I am working with a specific variant-calling pipeline, and the output is a VCF file with no header. There seems to be no option to add the header to the output. When I try to add a header with Picard FixVcfHeader, I get errors…

Continue Reading No header in VCF file

fuc.pyvcf Attribute Error

Hello, I am trying to extract GT information from the FORMAT field in my .vcf file using the fuc.pyvcf submodule. When I try to run my script: from fuc import pyvcf import pandas as pd vf = pyvcf.VcfFrame.from_file('P1_test.vcf') vf.df vf.extract_format('GT') I get the error: Traceback (most recent call last):…

Continue Reading fuc.pyvcf Attribute Error

map files

Hi all, I am performing imputation using IMPUTE2. The reference file is a custom genotype VCF file extracted using build b37. Will I need to provide a different genetic map file for the custom set, or can I use the 1000 Genomes data provided with IMPUTE2? And…

Continue Reading map files

How to convert GEN or .gen format from impute.me to vcf on windows 10?

I tried for days to convert a GEN file to VCF, but it did not work. I am a beginner, so I don’t know what is in VCF files and GEN files or how they…

Continue Reading How to convert GEN or .gen format from impute.me to vcf on windows 10?

sciClone input vaf file?

Dear all, I want to use sciClone on our exome sequencing data, but one thing I can’t understand is how varCount can be equal to 0. I have no idea about this; I just grepped the following data from the sciclone-meta-master manuscript figure 3 data…

Continue Reading sciClone input vaf file?

merge individual runs after bcftools mpileup | bcftools call

Hello! I am running bcftools mpileup | bcftools call for variant calling, and I have no problem getting the output file when I run 1 or 2 samples. When I try all samples (~50), I get the error message: “Failed…

Continue Reading merge individual runs after bcftools mpileup | bcftools call

Splitting A Vcf File

Hi, I downloaded a VCF file containing data for multiple genomes (multiple samples). I want to split the VCF file into one file per genome (a VCF file with 1 genome). I didn’t find any script; if you have one, please share it with me.

Continue Reading Splitting A Vcf File
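
On real files, a dedicated tool such as bcftools with its +split plugin is the practical route. As a hedged illustration of what such a split does, the following sketch keeps the header and the first nine columns and emits one sample column per output. It is a simplification in two ways: site-level INFO values such as AC or AN are copied unchanged and may no longer describe the single sample, and sites where the sample is homozygous reference or missing are kept. The function name is hypothetical.

```python
def split_by_sample(lines):
    """Return {sample: [VCF lines]} with exactly one sample column per output."""
    meta, samples, per_sample = [], [], {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            meta.append(line)
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            # Each output gets all meta lines plus a #CHROM line naming one sample
            per_sample = {s: meta + ["\t".join(fields[:9] + [s])] for s in samples}
            continue
        for i, s in enumerate(samples):
            per_sample[s].append("\t".join(fields[:9] + [fields[9 + i]]))
    return per_sample


toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1",
]
for name, vcf_lines in split_by_sample(toy_vcf).items():
    print(f"== {name} ==")
    print("\n".join(vcf_lines))
```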

How to import dosage information to plink binary files?

Hi all, I recently converted some very large TOPMed-imputed VCF files into PLINK format. The command I used to convert each VCF was plink1.9 --vcf ${VCF} --make-bed --out ${VCF}_binary. Additionally, I also spent a significant amount of time…

Continue Reading How to import dosage information to plink binary files?
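On the dosage question above: as far as we understand, PLINK 1.9's --vcf import keeps only hard-called genotypes, so dosages do not survive into the .bed file. PLINK 2 documents a dosage= modifier for --vcf (for example dosage=DS) that is worth checking in its manual. Minimac-style imputed VCFs carry DS, the expected ALT-allele count, alongside GP. The minimal sketch below shows how the two relate for a biallelic diploid site; the function names are hypothetical, and if your VCF already carries DS there is no need to recompute it.

```python
def parse_gp(field):
    """Parse a FORMAT/GP string such as '0.1,0.6,0.3' into floats (hypothetical helper)."""
    return tuple(float(x) for x in field.split(","))


def dosage_from_gp(gp):
    """Expected ALT-allele dosage from diploid biallelic genotype probabilities.

    gp is (P(0/0), P(0/1), P(1/1)); the dosage is
    0 * P(0/0) + 1 * P(0/1) + 2 * P(1/1), which lies between 0 and 2.
    """
    p_ref_hom, p_het, p_alt_hom = gp
    return p_het + 2.0 * p_alt_hom
```

For instance, GP = 0.1,0.6,0.3 gives a dosage of 0.6 + 2 × 0.3 = 1.2, while a confident 0/0 call gives 0.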

Allele frequency plot

Hi, I have to plot allele frequencies from two different SNP-chip datasets. I have two VCF files and would like to make a scatterplot in which the two datasets are plotted against each other. What is the easiest way to do this? I apologize…

Continue Reading Allele frequency plot

Unknown genotypes (.) in VCF, but have supporting reads?

In a VCF created by HaplotypeCaller, with reads from two haploid samples, I have some entries in which one sample has a mutation but the other doesn’t, where, as expected, I see a 1 for one sample and a 0…

Continue Reading Unknown genotypes (.) in VCF, but have supporting reads?

HaplotypeCaller calling mutations based on one read?

I’m using GATK HaplotypeCaller, via grenepipe, with the default options as specified by grenepipe except for -ploidy 1, as I am working with haploid yeast. I am seeing some mutations called on the basis of only a single read, if I am interpreting the…

Continue Reading HaplotypeCaller calling mutations based on one read?

GATK-Allele frequency

Hi all, I am running GATK on a BAM file for variant calling. In the output file, I noticed that the allele frequency is reported as 0.5 or 1.00. What might be the reason for this? Is it calculated correctly?

Continue Reading GATK-Allele frequency
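One likely explanation for the AF question above: GATK's AF INFO annotation is derived from the called genotypes as AC/AN (ALT alleles counted over all called alleles), not from read counts. With a single diploid sample, the only possible non-zero values are 0.5 for a 0/1 call and 1.00 for a 1/1 call. The minimal Python sketch below reproduces that arithmetic; the function name is hypothetical and only ALT allele 1 is counted. Read-level allele fractions would instead come from the AD field.

```python
def ac_an_af(genotypes):
    """Compute AC, AN and AF for ALT allele 1 from a list of GT strings.

    Hypothetical helper. AN counts called alleles and AC counts copies of
    allele '1'; missing alleles ('.') are skipped. Phased ('|') and
    unphased ('/') separators are both accepted.
    """
    ac = an = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue
            an += 1
            if allele == "1":
                ac += 1
    af = ac / an if an else float("nan")
    return ac, an, af
```

For a single sample, ac_an_af(["0/1"]) returns (1, 2, 0.5) and ac_an_af(["1/1"]) returns (2, 2, 1.0), matching the two values seen in the output.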

Problems Imputing X Chromosome with TOPMed

I have a large dataset whose autosomes I was able to successfully phase and impute using TOPMed. I have tried doing the same with the X chromosome but keep running into issues. Before trying to impute with TOPMed, I did per-individual QC and per-marker QC, then ran checkVCF, and corrected…

Continue Reading Problems Imputing X Chromosome with TOPMed

bcftools merge GP format issues

Hello, I am trying to merge VCF files from several samples from different sequencing runs. I ran bcftools merge on the VCF files and after ten hours I got the error message “Incorrect number of FORMAT/GP values at chr_Y:216795, cannot merge. The tag is defined as Number=G, but found 2…

Continue Reading bcftools merge GP format issues
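Some background on the error above: in the VCF specification, Number=G means one value per possible genotype, and with n alleles (REF plus ALTs) at ploidy p there are C(p + n - 1, n - 1) unordered genotypes. A biallelic site therefore carries 3 GP values for a diploid call but only 2 for a haploid one, so a chrY record with 2 values is consistent with a haploid call in at least one input file. That diagnosis is our inference, not something the excerpt confirms. A minimal sketch of the count (hypothetical function name):

```python
from math import comb


def n_genotypes(n_alleles, ploidy):
    """Number of values expected for a Number=G FORMAT field.

    n_alleles counts REF plus all ALT alleles; the number of unordered
    genotypes is C(ploidy + n_alleles - 1, n_alleles - 1).
    """
    return comb(ploidy + n_alleles - 1, n_alleles - 1)
```

For example, n_genotypes(2, 2) is 3 (diploid, biallelic), n_genotypes(2, 1) is 2 (haploid, biallelic) and n_genotypes(3, 2) is 6 (diploid, two ALT alleles).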

GT field in a 8 ploidy vcf

Hello, what is the meaning of lines that have only 4 alleles in the GT field of an 8-ploid VCF file? For example: 1/1/1/1:8:0,8:0:0:8:262 and 1/1/1/1:3:0,3:0:0:3:105 instead of 1/1/1/1/1/1/1/1:2:0,2:0:0:2:72. This is the command I used to create each of the VCF files: freebayes -f $REF -p 8 $SORTED_BAM > $OUTPUT. This is…

Continue Reading GT field in a 8 ploidy vcf

Calculate allele frequency from many VCF files in specific locus

Dear all, I have 100 VCF files (100 different samples). I would like to calculate the allele frequency at specific sites. At one specific locus I have three genotypes (GATK Best Practices workflow): rs-xxxxx: A/A occurring in 30 samples (ref…

Continue Reading Calculate allele frequency from many VCF files in specific locus
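For the allele-frequency question above, once the genotypes at a locus have been tallied across samples, each diploid sample contributes two alleles. The minimal Python sketch below assumes genotypes written as 'X/Y' and uses made-up counts in its example; the function name is hypothetical. It also assumes every sample has a call at the site: with 100 separate single-sample VCFs, a site missing from one file could mean homozygous reference or simply no coverage, and per-sample VCFs cannot tell these apart.

```python
def allele_freqs_from_genotype_counts(counts):
    """Allele frequencies at one diploid locus from genotype counts.

    Hypothetical helper. counts maps a genotype written as 'X/Y'
    (e.g. 'A/A', 'A/G') to the number of samples carrying it; each
    sample contributes its two alleles to the totals.
    """
    allele_counts = {}
    total = 0
    for genotype, n in counts.items():
        for allele in genotype.split("/"):
            allele_counts[allele] = allele_counts.get(allele, 0) + n
            total += n
    return {a: c / total for a, c in allele_counts.items()}
```

With made-up counts {"A/A": 6, "A/G": 3, "G/G": 1}, the 10 samples carry 20 alleles (15 A, 5 G), so the frequencies are 0.75 for A and 0.25 for G.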

Compare genotype genome sequences at basepair level

I have recently explored various alternatives for a similar problem and came away with the following potential solutions. Solution 1: The “easiest” way to do this would be to generate a VCF variant file with a SNP calling tool, then transform that variant file into a tabular file with bcftools view…

Continue Reading Compare genotype genome sequences at basepair level

Question about VCFtools --window-pi --window-pi-step

Hi all, I’m using VCFtools (v0.1.17) to estimate nucleotide diversity in my study species. I already have a VCF file made from mapping to a draft genome, and I used it to calculate pi. As you can see, the output shows the bin size and variants (here, I…

Continue Reading Question about VCFtools --window-pi --window-pi-step
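For the VCFtools question above, our understanding (please confirm it against the VCFtools manual or source) is that --window-pi sums a per-site diversity value over the variant sites in each window and divides by the full window length, so positions absent from the VCF are treated as invariant. With a --window-pi-step smaller than the window size, windows overlap and each output row is computed independently. A minimal Python sketch of that calculation for biallelic sites, with hypothetical function names:

```python
def site_pi(ref_count, alt_count):
    """Per-site diversity at a biallelic site: the fraction of pairs of sampled
    chromosomes that differ, 2*r*a / (n*(n-1)) with n = r + a."""
    n = ref_count + alt_count
    if n < 2:
        return 0.0
    return 2.0 * ref_count * alt_count / (n * (n - 1))


def window_pi(sites, start, size):
    """Windowed pi over the 1-based window [start, start + size - 1].

    sites is a list of (pos, ref_allele_count, alt_allele_count) tuples.
    The summed per-site values are divided by the window length, so
    positions not listed are implicitly counted as invariant.
    """
    end = start + size - 1
    total = sum(site_pi(ref, alt) for pos, ref, alt in sites if start <= pos <= end)
    return total / size
```

Dividing by the window length rather than by the number of callable sites means that poorly covered stretches of a draft genome pull pi downward.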

Mappability calculation based on 150 bp reads after mapping with bwa

Hi, I am trying to apply some filters to whole-exome sequencing data. First I did the mapping using BWA, and then I followed the pipeline proposed by GATK for calling variants on cohorts of samples using the…

Continue Reading Mappability calculation based on 150 bp reads after mapping with bwa

Calculating Allele Balance in GATK4

Hi all, I know GATK3 has an option to compute allele balance and populate the ABHet and ABHom fields, but I do not see this option in GATK4. I used to run this command in GATK3: java ${JAVAOPTS} -jar /usr/local/genome/GATK-3.6-0/GenomeAnalysisTK.jar -T VariantAnnotator -A AlleleBalance -I AF1.vcf.gz -R…

Continue Reading Calculating Allele Balance in GATK4
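For the allele-balance question above: GATK3 described ABHet, as far as we recall, as the allele balance at heterozygous calls, ref/(ref+alt). If no GATK4 equivalent fits, a similar value can be approximated directly from GT and AD. The minimal sketch below pools the read counts of all heterozygous samples at one biallelic site; that pooling choice and the function name are our assumptions, not GATK's exact implementation.

```python
def allele_balance_het(samples):
    """Pooled allele balance at heterozygous calls: ref / (ref + alt).

    Hypothetical helper. samples is a list of (GT, AD) pairs for one
    biallelic site, AD being 'ref_depth,alt_depth'. Only 0/1 calls (in any
    order, phased or not) contribute; returns None when no heterozygous
    sample has reads.
    """
    ref_total = alt_total = 0
    for gt, ad in samples:
        alleles = sorted(gt.replace("|", "/").split("/"))
        if alleles != ["0", "1"]:
            continue
        ref, alt = (int(x) for x in ad.split(",")[:2])
        ref_total += ref
        alt_total += alt
    depth = ref_total + alt_total
    return ref_total / depth if depth else None
```

A value near 0.5 is what one would expect for well-behaved heterozygous calls; values far from it can point to mapping artefacts or mis-called genotypes.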

Plink v2.0 does not produce a Z-compressed file (.zst)

Good morning, I would like to convert a merged VCF into PLINK 2 compressed format (.pgen, .psam and .pvar files), so I run plink2 --vcf MyMerged.vcf.gz --make-pgen --zst-level 3 --out MySamples. It basically works, as it produces these files: ls…

Continue Reading Plink v2.0 does not produce a Z-compressed file (.zst)

Unrecognized values used for CHROM, Replacing with 0.

Hi all! I was trying to run VCFtools on the .vcf output file from the dDocent program (ddocent.wordpress.com/) and I get this error: Unrecognized values used for CHROM: E81_L257 - Replacing with 0. I was wondering if anyone has encountered that and…

Continue Reading Unrecognized values used for CHROM, Replacing with 0.
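For the VCFtools CHROM warning above: as far as we know, the message appears when VCFtools writes PLINK-style output and meets contig names, such as dDocent's E81_L257, that are not standard chromosome names. VCFtools documents a --chrom-map option that takes a tab-separated file mapping each original name to a new one; check that your version supports it. The minimal Python sketch below builds such a map, numbering contigs in order of first appearance; the function names are hypothetical.

```python
def contigs_from_vcf(lines):
    """Contig names in order of first appearance in the VCF body (hypothetical helper)."""
    order, seen = [], set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom = line.split("\t", 1)[0]
        if chrom not in seen:
            seen.add(chrom)
            order.append(chrom)
    return order


def chrom_map_lines(contigs):
    """Two-column, tab-separated map lines renaming each contig to 1, 2, 3, ..."""
    return [f"{name}\t{i}" for i, name in enumerate(contigs, start=1)]
```

Writing "\n".join(chrom_map_lines(contigs_from_vcf(lines))) to a file gives one original-name/new-name pair per line.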

bcftools error in variant calling chapter

Hi, I am reading through the variant calling chapter of the Biostar book and ran into a problem at the step below: # Compute the genotypes from the alignment file. bcftools mpileup -Ovu -f $REF $BAM > genotypes.vcf # Then I get this error: Could not…

Continue Reading bcftools error in variant calling chapter

Human Exome Variant Reference

Hi, I want to compare the variants from my WES analysis against a reference using Illumina/hap.py. However, I can’t find reference variants for the whole exome. I know that the files (VCF, BED) from GIAB are usually used as reference variants, but I don’t know which file…

Continue Reading Human Exome Variant Reference

troubleshooting benchmarking small variants: hap.py and rtg

Hi! I tried to do what other posts reported, and I have a problem I do not fully understand … 1) I downloaded the FASTQ files from Garvan (ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/) together with the BED file. I had to convert the BED file to hg38 (my_regions) … as I understand it…

Continue Reading troubleshooting benchmarking small variants: hap.py and rtg

The Biostar Herald for Monday, November 01, 2021

The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contributions from Mensur Dlakic, Istvan Albert, GenoMax, and was…

Continue Reading The Biostar Herald for Monday, November 01, 2021

Construction of the reference genome database (GCA_000001405.15_GRCh38) with snpeff

Dear colleagues, I used the reference genome GRCh38, version GCA_000001405.15_GRCh38 / seqs_for_alignment_pipelines.ucsc_ids, downloaded from ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/. This version was used for alignment and variant calling; however, I wanted to annotate genetic variants with SnpEff v5. I did not find this version…

Continue Reading Construction of the reference genome database (GCA_000001405.15_GRCh38) with snpeff

Recreating QC of 1000 Genomes project: removing non-overlapping SNPs

Hi everyone, I am attempting to recreate the quality control analysis performed in the 1000 Genomes Project (tcag.ca/documents/tools/omni25_qcReport.pdf). I am fairly new to performing QC on a dataset and am currently stuck on section 5.1 of the…

Continue Reading Recreating QC of 1000 Genomes project

How can I obtain genotypes from .bams of RNAseq data?

Hi all, I am hoping to run an allele-specific expression analysis on a set of RNA-seq samples I have. I need to obtain the genotypes for all samples to determine the heterozygosity of each variant, which is needed for…

Continue Reading How can I obtain genotypes from .bams of RNAseq data?

pooled-heterozygosity calculation

Following Rubin et al., one method for identifying selection signatures in a genome-scale study is calculating pooled heterozygosity (Hp): “Hp = 2ΣnMAJ ΣnMIN / (ΣnMAJ + ΣnMIN)^2, where nMAJ and nMIN are the numbers of reads corresponding to the most and least abundant allele, respectively, the sum of these…

Continue Reading pooled-heterozygosity calculation
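The Hp formula quoted above can be computed per window as in this minimal Python sketch (hypothetical function name). It assumes one pair of read counts per biallelic SNP, takes the larger count as nMAJ and the smaller as nMIN at each SNP, and sums both over the window before applying the formula.

```python
def pooled_heterozygosity(read_counts):
    """Pooled heterozygosity Hp for one window of SNPs.

    read_counts holds one (count_allele1, count_allele2) pair of read
    counts per SNP. At each SNP the larger count is nMAJ and the smaller
    is nMIN; both are summed over the window and combined as
    Hp = 2 * sum(nMAJ) * sum(nMIN) / (sum(nMAJ) + sum(nMIN))**2.
    """
    sum_maj = sum(max(a, b) for a, b in read_counts)
    sum_min = sum(min(a, b) for a, b in read_counts)
    total = sum_maj + sum_min
    if total == 0:
        raise ValueError("window has no reads")
    return 2.0 * sum_maj * sum_min / total ** 2
```

For example, two SNPs with read counts (8, 2) and (4, 6) give ΣnMAJ = 14 and ΣnMIN = 6, so Hp = 2 × 14 × 6 / 20² = 0.42. Hp ranges from 0 (only one allele observed) to 0.5 (equal sums).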

Missing some predictions based on dbNSFP v4.2a

Hello everybody, I used the dbNSFP v4.2a database for functional prediction and variant annotation. As mentioned on the download site sites.google.com/site/jpopgen/dbNSFP, this version compiles prediction scores from several prediction algorithms (SIFT, SIFT4G, Polyphen2-HDIV, Polyphen2-HVAR, …) and other information, including allele frequencies observed in…

Continue Reading Missing some predictions based on dbNSFP v4.2a