Category: VCF

sciclone iteration does not converge

sciclone iteration does not converge 1 Hello everyone: I have a problem when using sciclone. I extracted the relevant information needed by sciclone from the vcf file generated from the paired normal tumor data as input, and then the following problem has been encountered. I would like to ask whether…

Continue Reading sciclone iteration does not converge

vcf – Ensembl Variant Effect Predictor (VEP) issue during execution

vcf – Ensembl Variant Effect Predictor (VEP) issue during execution – Bioinformatics Stack Exchange …

Continue Reading vcf – Ensembl Variant Effect Predictor (VEP) issue during execution

Using VEP to get gnomAD frequencies

Hi all, I am using Ensembl VEP (command line) to annotate a VCF I have. I am specifically looking for gnomAD allele frequencies, which is fairly straightforward to do, technically speaking. However, the data looks off in some cases. For example, when I pass in: 10 69408929 COSM3751912 A…

Continue Reading Using VEP to get gnomAD frequencies

Developing my own NGS pipeline

Developing my own NGS pipeline 1 I am a trainee bioinformatician working in a genomics lab. For learning purposes I want to develop my own NGS pipeline (from fastq file to VCF file). It would be great if someone could please pass me links where I can step by step…

Continue Reading Developing my own NGS pipeline

The provided VCF file is malformed

htsjdk.tribble.TribbleException: The provided VCF file is malformed 1 I have VCF files that I want to convert to a more readable TSV file using GATK VariantsToTable, and I also want to load in the VCF in IGV. However, when trying to do this, I get the same error for both…

Continue Reading The provided VCF file is malformed

How To Split Multiple Samples In Vcf File Generated By Gatk?

There now also is a plugin in bcftools which does the split in a single pass over the multi-sample VCF/BCF file. It does not seem to be very fast, but looks correct and there are options to do the split in custom ways. You do need to install bcftools with…

Continue Reading How To Split Multiple Samples In Vcf File Generated By Gatk?

Why does write.ped remove the first locus?

Why does write.ped remove the first locus? 0 In order to get a VCF file from genind, I am going through hierfstat function write.ped() and then with plink I convert the result to vcf. This is my code (apologies, but I cannot provide a reproducible data for this particular scenario):…

Continue Reading Why does write.ped remove the first locus?

Problem with vcf file columns

Problem with vcf file columns 0 Hello. I’m having troubles with a vcf file I just generated with Stacks. The thing is that the column of the first sample (the first individual in my vcf file) instead of having the information about the genotype, the depth and other things, it…

Continue Reading Problem with vcf file columns

VCF filters and variant intersection

VCF filters and variant intersection 0 Hi Guys, I am using joint genotyping method to generate multisample VCF file that involves variant calling, joint data aggregation and joint genotyping steps. I wonder about the filters which I need to apply to VCF. Which filters should I apply ?? I also…

Continue Reading VCF filters and variant intersection

I have genomic file. But it has different representation than usual

I have genomic file. But it has different representation than usual 0 Recently I got access to genomic data from an organization. It came as a .bgen file, so I converted it to a VCF file with qctool. But it has a different SNP representation than I am used to. I am used to SNP representations like…

Continue Reading I have genomic file. But it has different representation than usual

Laniakea@ReCaS: exploring the potential of customisable Galaxy on-demand instances as a cloud-based service | BMC Bioinformatics

Since the opening of the open-ended Call in February 2020 [30], Laniakea@ReCaS has accepted ten project proposals for a total of 18 Galaxy instances operating on the ReCaS infrastructure that altogether launched almost 30 k jobs, as of March 2021 (Fig. 3). Fig. 3 Cumulative number of jobs launched by all the…

Continue Reading Laniakea@ReCaS: exploring the potential of customisable Galaxy on-demand instances as a cloud-based service | BMC Bioinformatics

No header in VCF file

No header in VCF file 0 Hello everyone, I am working with a specific variant calling pipeline, and the output is a VCF file missing headers. It seems there is no option to add the header in the output. Trying to add a header with Picard FixVcfHeader, I get errors…

Continue Reading No header in VCF file

fuc.pyvcf Attribute Error

fuc.pyvcf Attribute Error 0 Hello, I try to extract GT information from FORMAT field in my .vcf file using fuc.pyvcf submodule. When I try to run my script: from fuc import pyvcf import pandas as pd vf = pyvcf.VcfFrame.from_file('P1_test.vcf') vf.df vf.extract_format('GT') I’ve got the error: Traceback (most recent call last):…

Continue Reading fuc.pyvcf Attribute Error

map files

map files 0 Hi all, I am performing imputation using IMPUTE2. the reference file is a custom genotype vcf file extracted using the b37 build. will i need to provide a different genome map file for the custom set, or can I use the 1000genome data provided by IMPUTE2? and…

Continue Reading map files

How to convert GEN or .gen format from impute.me to vcf on windows 10?

How to convert GEN or .gen format from impute.me to vcf on windows 10? 1 I tried for days to convert a gen file to vcf but it did not work. I am a beginner so i don’t know what are in vcf files and gen files or how they…

Continue Reading How to convert GEN or .gen format from impute.me to vcf on windows 10?

sciClone input vaf file?

sciClone input vaf file? 3 Dear All, Hi, I want to use sciclone on our exome sequencing data. but one thing I can’t understand that is how can I got varCount equal to 0? I have no idea about this, following data i just grep from sciclone-meta-master manuscript figure3 data…

Continue Reading sciClone input vaf file?

merge individual runs after bcftools mpileup

merge individual runs after bcftools mpileup | bcftools call 0 Hello! I am running bcftools mpileup | bcftools call for variant calling and I have no problems getting the output file when I run 1 or 2 samples. When I try all samples (~50), I get the error message: “Failed…

Continue Reading merge individual runs after bcftools mpileup

Splitting A Vcf File

Splitting A Vcf File 7 Hi, I downloaded a VCF file that contains data for multiple genomes (multiple samples). I want to split the VCF file into one file per genome (a VCF file with 1 genome). I didn’t find any script. If you have one, please share it with me. I know…

Continue Reading Splitting A Vcf File

How to import dosage information to plink binary files?

How to import dosage information to plink binary files? 0 Hi All, I recently converted a very large TOPMed-imputed VCF file into plink format. The command I used to convert this VCF was plink1.9 --vcf ${VCF} --make-bed --out ${VCF}_binary. Additionally, I also spent a significant amount of time…

Continue Reading How to import dosage information to plink binary files?

Alelle frequency plot

Alelle frequency plot 1 Hi, I have to plot allele frequencies of two different SNP chip datasets. I have two VCF files and would like to make a scatterplot in which these 2 datasets are plotted one against each other. What is the easiest way to do this? I apologize…

Continue Reading Alelle frequency plot

Unknown genotypes (.) in VCF, but have supporting reads?

Unknown genotypes (.) in VCF, but have supporting reads? 0 In a VCF created by HaplotypeCaller, with reads from two haploid samples, I have some entries in which one sample has a mutation but the other doesn’t, where as expected I see a 1 for one sample and a 0…

Continue Reading Unknown genotypes (.) in VCF, but have supporting reads?

HaplotypeCaller calling mutations based on one read?

HaplotypeCaller calling mutations based on one read? 0 I’m using GATK HaplotypeCaller, via grenepipe, with the default options as specified by grenepipe except for -ploidy 1 as I am working with haploid yeast. I am seeing some mutations called based on one single read only if I am interpreting the…

Continue Reading HaplotypeCaller calling mutations based on one read?

GATK-Allele frequency

GATK-Allele frequency 0 Hi Guys, I am running GATK on a BAM file for variant calling. In the output file, I noticed that the allele frequency is computed as 0.5 and 1.00. What may be the reason for this? Is it calculated correctly?

Continue Reading GATK-Allele frequency

Problems Imputing X Chromosome with TOPMed

I have a large dataset whose autosomes I was able to successfully phase and impute using TOPMed. I have tried doing the same with the X chromosome but keep running into issues. Before trying to impute with TOPMed, I did per-individual QC and per-marker QC, then ran checkVCF, and corrected…

Continue Reading Problems Imputing X Chromosome with TOPMed

bcftools merge GP format issues

Hello, I am trying to merge VCF files from several samples from different sequencing runs. I ran bcftools merge on the VCF files and after ten hours I got the error message “Incorrect number of FORMAT/GP values at chr_Y:216795, cannot merge. The tag is defined as Number=G, but found 2…

Continue Reading bcftools merge GP format issues

GT field in a 8 ploidy vcf

Hello, What is the meaning of lines that have only 4 GT values in an 8 ploidy VCF file? for example: 1/1/1/1:8:0,8:0:0:8:262 1/1/1/1:3:0,3:0:0:3:105 instead of 1/1/1/1/1/1/1/1:2:0,2:0:0:2:72 this is the command I used to create each one of the VCF files: freebayes -f $REF -p 8 $SORTED_BAM > $OUTPUT this is…

Continue Reading GT field in a 8 ploidy vcf

Calculate allele frequency from many VCF files in specific locus

Calculate allele frequency from many VCF files in specific locus 1 Dear all, I have 100 VCF files (100 different samples). I would like to calculate allele frequency in specific sites. In one specific locus I have three genotypes (GATK best practices workflow): rs-xxxxx: A/A occurring in 30 samples (ref…

Continue Reading Calculate allele frequency from many VCF files in specific locus

Compare genotype genome sequences at basepair level

I have recently explored various alternatives to a similar problem and came away with the following potential solutions: Solution 1 The “easiest” to do this would be to generate a VCF variant file with a SNP calling tool, then transform that variant file into a tabular file with bcftools view….

Continue Reading Compare genotype genome sequences at basepair level

Question about VCFtools --window-pi --window-pi-step

Hi all I’m using VCFtools (v0.1.17) for estimating nucleotide diversity of my study species. I already got a VCF file which was made from mapping to a draft genome, then I used it to calculate the pi value. As you can see, the output showed the bin size and variants (here, I…

Continue Reading Question about VCFtools --window-pi --window-pi-step

Mappability calculation based on 150 bp reads after mapping with bwa

Mappability calculation based on 150 bp reads after mapping with bwa 0 Hi, I am trying to apply some filters on whole exome sequencing data. Firstly I did the mapping using bwa and then I followed the proposed pipeline from GATK for Calling variants on cohorts of samples using the…

Continue Reading Mappability calculation based on 150 bp reads after mapping with bwa

Calculating Allele Balance in GATK4

Calculating Allele Balance in GATK4 0 Hi All, I know GATK3 has option to compute Allele Balance and populate ABHet and ABHom fields. I do not see this option in GATK4. I used to run this command in GATK3: java ${JAVAOPTS} -jar /usr/local/genome/GATK-3.6-0/GenomeAnalysisTK.jar -T VariantAnnotator -A AlleleBalance -I AF1.vcf.gz -R…

Continue Reading Calculating Allele Balance in GATK4

“initial epsilon zero or 1 locus 6912”

bayenv2: “initial epsilon zero or 1 locus 6912” 1 Hi everyone, currently I’m running bayenv2 on my data. First,I’ve converted my vcffile into bayenvformat but then, when I try to generate the covariance matrix, I get the message initial epsilon zero or 1 locus 6912. I’ve read that the epsilon…

Continue Reading “initial epsilon zero or 1 locus 6912”

Plink v2.0 does not produce a Z-compressed file (.zst)

Plink v2.0 does not produce a Z-compressed file (.zst) 0 Good morning, I would like to convert a merged VCF in a Plink compressed format (.pgen, .psam and .pvar files), so I run plink2 --vcf MyMerged.vcf.gz --make-pgen --zst-level 3 --out MySamples It basically works, as it produces such files: ls…

Continue Reading Plink v2.0 does not produce a Z-compressed file (.zst)

Unrecognized values used for CHROM, Replacing with 0.

VCFTools error: Unrecognized values used for CHROM, Replacing with 0. 1 Hi all! I was trying to run VCFtools on .vcf output file from dDocent program (ddocent.wordpress.com/) and I get this error: Unrecognized values used for CHROM: E81_L257 –  Replacing with 0. I was wondering if anyone encountered that and…

Continue Reading Unrecognized values used for CHROM, Replacing with 0.

bcftools error in variant calling chapter

bcftools error in variant calling chapter 1 Hi, I am reading through the variant calling chapter of biostar book and faced a problem at below step: # Compute the genotypes from the alignment file. bcftools mpileup -Ovu -f $REF $BAM > genotypes.vcf # Then I get this error: Could not…

Continue Reading bcftools error in variant calling chapter

Human Exome Variant Reference

Human Exome Variant Reference 3 Hi, I want to compare the variants for my WES analysis result using Illumina/hap.py. However I cant find the reference variants for the whole exome. I know that files (vcf, bed) in GiaB are usually used as reference variants, but I don’t know which file…

Continue Reading Human Exome Variant Reference

troubleshooting benchmarking small variants: hap.py and rtg

Hi! I tried to do what other posts reported and I have a problem that I do not fully understand why … 1) I downloaded the fastq files from Garvan (ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/) with the bed file. I had to convert the bed file to hg38 (my_regions) … as I understand it…

Continue Reading troubleshooting benchmarking small variants: hap.py and rtg

The Biostar Herald for Monday, November 01, 2021

The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Mensur Dlakic, Istvan Albert, GenoMax, and was…

Continue Reading The Biostar Herald for Monday, November 01, 2021

Construction of the reference genome database (GCA_000001405.15_GRCh38) with snpeff

Construction of the reference genome database (GCA_000001405.15_GRCh38) with snpeff 1 Dear colleagues I used the reference genome GRCh38 version GCA_000001405.15_GRCh38 / seqs_for_alignment_pipelines.ucsc_ids downloaded from ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/ This version was used for alignment and variant calling, however, I wanted to annotate genetic variants by snpeff v5. I did not find this version…

Continue Reading Construction of the reference genome database (GCA_000001405.15_GRCh38) with snpeff

Recreating QC of 1000 Genomes project

Recreating QC of 1000 Genomes project – removing non overlapping SNPs 0 Hi everyone, I am attempting to recreate the quality control analysis performed in the 1000 genomes project (tcag.ca/documents/tools/omni25_qcReport.pdf). I am fairly new to performing QC on a dataset, and am currently stuck on section 5.1 of the…

Continue Reading Recreating QC of 1000 Genomes project

How can I obtain genotypes from .bams of RNAseq data?

How can I obtain genotypes from .bams of RNAseq data? 0 Hi all, I am hoping to run an allele specific expression analysis on a set of RNAseq samples I have. I need to obtain the genotypes for all samples to determine heterozygosity of each variant which is needed for…

Continue Reading How can I obtain genotypes from .bams of RNAseq data?

pooled-heterozygosity calculation

pooled-heterozygosity calculation 0 According to Rubin et al., one method of selection signature identification in a genome-scale study is pooled heterozygosity (Hp) calculation. “Hp = 2 ΣnMAJ ΣnMIN / (ΣnMAJ + ΣnMIN)^2, where nMAJ and nMIN are the numbers of reads corresponding to the most and least abundant allele, respectively, the sum of these…

Continue Reading pooled-heterozygosity calculation
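
The Hp formula quoted in the excerpt above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the per-SNP read counts for the most and least abundant alleles in one window have already been extracted; the function name and the example counts are hypothetical and do not come from the original post.

```python
def pooled_heterozygosity(major_counts, minor_counts):
    """Hp = 2 * sum(nMAJ) * sum(nMIN) / (sum(nMAJ) + sum(nMIN))**2 for one window.

    major_counts / minor_counts: per-SNP read counts of the most and least
    abundant allele (hypothetical inputs, one value per SNP in the window).
    """
    sum_major = sum(major_counts)
    sum_minor = sum(minor_counts)
    total = sum_major + sum_minor
    if total == 0:
        raise ValueError("window contains no reads")
    return 2 * sum_major * sum_minor / total ** 2


# Made-up counts for three SNPs in a single window.
window_hp = pooled_heterozygosity([30, 25, 40], [10, 5, 0])
print(f"Hp = {window_hp:.4f}")
```

Summing the counts over the window before dividing follows the quoted formula; averaging per-SNP heterozygosity instead would give a different statistic.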

Missing some predictions based on dbNSFP v4.2a

Missing some predictions based on dbNSFP v4.2a 0 Hello everybody, I used dbNSFP v4.2a database for functional prediction and variant annotation. As mentioned in the download site sites.google.com/site/jpopgen/dbNSFP, this version compiles prediction scores from several prediction algorithms (SIFT, SIFT4G, Polyphen2-HDIV, Polyphen2-HVAR. ….), and other information, including allele frequencies observed in…

Continue Reading Missing some predictions based on dbNSFP v4.2a

Checking chromosome builds for genotyping data

Checking chromosome builds for genotyping data 0 Hi, I have several studies worth of data (In both PLINK and vcf format), and I was wondering if anyone knew of an online tool which I could use to check my chromosome build i.e GRCh37 vs GRCh38. (I thought I used one…

Continue Reading Checking chromosome builds for genotyping data

Prepare Allele Frequency input file for Sweepfinder2

Prepare Allele Frequency input file for Sweepfinder2 1 Hi, there. I have some problems converting a VCF to the input file of Sweepfinder2. According to the manual of Sweepfinder2, the fourth column of the input file is the indicator as to whether the site has been polarized (i.e., whether it is…

Continue Reading Prepare Allele Frequency input file for Sweepfinder2

How to convert vcf file to frequency file for sweepfinder2

How to convert vcf file to frequency file for sweepfinder2 1 Hi, I am analysing genome wide scan for selective sweeps using SF2, but some problems blocked me. I appreciate if you could help me with the file conversion from VCF to desired allele frequency file. I have converted VCF…

Continue Reading How to convert vcf file to frequency file for sweepfinder2

Extract columns from a vcf file using identifiers from a second file

Extract columns from a vcf file using identifiers from a second file 4 Dears people Maybe I am too naive but I am pretty new to bioinformatics. I have two files. One is a normal vcf file with column1 having the CHROMOSOME information and column2 the POSITION information. My second…

Continue Reading Extract columns from a vcf file using identifiers from a second file

How to merge different VCF files

How to merge different VCF files 0 Hi everyone, I have 2 different VCF files (v4.3) : one containing SNP and Indels and the other one containing CNVs. The latter has only the mandatory fields (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) while the former has also the field…

Continue Reading How to merge different VCF files

Best way to fill VCF with ancestral allele info AA for each SNP hg19

Best way to fill VCF with ancestral allele info AA for each SNP hg19 1 Hi all, I am currently working with some full-genome human sequence data (mapped to hg19) and created VCF (called only SNPS) files from my BAM files with gatk. For each SNP in my VCF I…

Continue Reading Best way to fill VCF with ancestral allele info AA for each SNP hg19

Bioconductor – Bioconductor 3.14 Released

Bioconductor 3.14 Released October 27, 2021 Bioconductors: We are pleased to announce Bioconductor 3.14, consisting of 2083 software packages, 408 experiment data packages, 904 annotation packages, 29 workflows and 8 books. There are 89 new software packages, 13 new data experiment packages, 10 new annotation packages, 1 new workflow,…

Continue Reading Bioconductor – Bioconductor 3.14 Released

Identify most and least abundant allele (nMAJ and nMIN) for pooled heterozygosity (hp) analysis from vcf file

Identify most and least abundant allele (nMAJ and nMIN) for pooled heterozygosity (hp) analysis from vcf file 1 I am trying to calculate Pooled Heterozygosity (hp) by identifying nMAJ and nMIN from vcf file with a sliding window 150kb. I am confused after reading papers where they calculated using formula…

Continue Reading Identify most and least abundant allele (nMAJ and nMIN) for pooled heterozygosity (hp) analysis from vcf file

Error:Could not open individual file

Error:Could not open individual file 0 When I ran the following command for Fst calculation vcftools --gzvcf All_samples.vcf.gz --weir-fst-pop Pop1_list.txt --weir-fst-pop Pop2_list.txt --fst-window-size 50000 --fst-window-step 25000 --out pop1_pop2_50_25.windowed.weir.txt I get this error Error:Could not open individual file: Pop1_list.txt format of Pop1_list.txt is one individual per line like ind1 ind2 ind3…

Continue Reading Error:Could not open individual file

gatk GetPileupSummaries and CalculateContamination result in NaN on mouse data

Hello! I ran gatk toolchain including CalculateContamination in galaxy on human exome sequencing data, and it worked fine. However when i try feeding it with murine data (and murine reference files), CalculateContamination gives me this contamination table: sample contamination error mouse1_tumor NaN 1.0 And being fed with this result, FilterMutectCalls…

Continue Reading gatk GetPileupSummaries and CalculateContamination result in NaN on mouse data

How to parallelize bcftools mpileup with GNU parallel?

I think that this would be a solution: I started with a file that contains the scaffold name and lengths: JH739887 30495534 JH739888 29527584 JH739889 22321128 […] I then added a column with “1” in it to make it the “start” position. Since the scaffolds are not placed on the…

Continue Reading How to parallelize bcftools mpileup with GNU parallel?

“Given ref” field is empty when a ref. allele was in VCF input

VEP: “Given ref” field is empty when a ref. allele was in VCF input 0 Hi there, I’m running VEP using the following command: ref="GRCh38.primary_assembly.genome.fa" vep="/opt/vep_ensembl/ensembl-vep/vep" for ea in *Somatic.hc.vcf do $vep -i $ea -o vep/"$(echo $ea | sed s/.vcf//)"_VEP.txt --cache --dir_cache "/home/shared/vep_cache/" --assembly GRCh38 --merged --fasta $ref --hgvs --hgvsg…

Continue Reading “Given ref” field is empty when a ref. allele was in VCF input

How do you convert vcf file to a genotype file useable by eigenstrat

How do you convert vcf file to a genotype file useable by eigenstrat 0 I am looking to use an R based eigenstrat function that takes ” a m*n matrix, in which the element in the ith row and the jth column represents the genotype of the jth subject at…

Continue Reading How do you convert vcf file to a genotype file useable by eigenstrat

Haplotype frequency calculation from .vcf files

Haplotype frequency calculation from .vcf files 1 Hi! I’m relatively new to bioinformatics and I’ve been working on haplotyping malaria based on the sequencing of two gene markers (msp1 and msp2). However, I have not found a good software/pipeline that can help me to calculate the haplotype frequency per sample…

Continue Reading Haplotype frequency calculation from .vcf files

Index genome not working in Tracy

I’m trying to follow the variant calling guide for Tracy. www.gear-genomics.com/docs/tracy/cli/#variant-calling I have a viral genome just as a fasta file, and when I try to call variants like this: tracy decompose -v -a cmv -r CMVrefGenome.fasta -o oututfile inputfile.ab1 It tells me the genome needs to be shorter than…

Continue Reading Index genome not working in Tracy

Removing indels +/- a buffer area? How? : bioinformatics

Hey everyone. Hopefully an easy question but my Googling and looking for papers hasn’t really come up with much. I am using a software (IBDMix) to analyze some Neanderthal DNA vs. Modern humans using the new HG38 1000 Genomes data from earlier this year. The method in the IBDMix paper…

Continue Reading Removing indels +/- a buffer area? How? : bioinformatics

vcftools not ouputting log file when run from perl

I am running 325 vcftools commands to generate Fst values, which obviously needs to be automated. An example: vcftools --vcf big.vcf --weir-fst-pop pop_lists/pop1.txt --weir-fst-pop pop_lists/pop2.txt --out weir_fst_results/pop1_vs_pop2 and when I run this job, it works fine when I run it one by one by the command line, i.e. there are…

Continue Reading vcftools not ouputting log file when run from perl

How to annotate SNVs in a BAC sequenced by NGS

Hello, I’m trying to annotate variations in NGS data from bacterial artificial chromosomes with respect to the reference sequence. To do this i build a map of the BAC (including vector) and map the NGS reads to this BAC map. I also use a variant caller to find any differences…

Continue Reading How to annotate SNVs in a BAC sequenced by NGS

How to convert a bcf file to vcf with bcftools?

How to convert a bcf file to vcf with bcftools? 1 I’ve been following the guide on the Tracy website for looking at variants in some Sanger sequences I have: www.gear-genomics.com/docs/tracy/cli/#variant-calling I’ve now generated the bcf files, and it says I can convert these to vcf using bcftools. How do…

Continue Reading How to convert a bcf file to vcf with bcftools?

samples cannot be empty when i run gatk-package-4.1.4.1-local.jar HaplotypeCaller

java.lang.IllegalArgumentException: samples cannot be empty when i run gatk-package-4.1.4.1-local.jar HaplotypeCaller 3 hello, when i run gatk, the error always occur, like this “java.lang.IllegalArgumentException: samples cannot be empty” is there mistake in my input file? thank you for your help ! my bam file as following: HWI-EAS418:3:37:1070:1462 83 chr20 46689301 255…

Continue Reading samples cannot be empty when i run gatk-package-4.1.4.1-local.jar HaplotypeCaller

export protein for 3D-View from own data

export protein for 3D-View from own data 0 Hello, I have a 30xWGS. I own the CRAM, FastQ and the VCF file. Now I would like to export individual proteins from this data and view them in a 3D viewer. Is this possible or which tool do I need for…

Continue Reading export protein for 3D-View from own data

How to filter a VCF file with a list of CHR or contig IDs?

I need to subset/filter a SNP vcf file by a long list of non-sequential contig IDs, which appear in the CHR column. My VCF file contains 13,971 contigs currently, and I want to retain a specific set of 7,748 contigs and everything associated with those contigs (headers, all variants and…

Continue Reading How to filter a VCF file with a list of CHR or contig IDs?

how to manage reference differences

VCF from GRCH37 to GRCH38: how to manage reference differences 0 Imagine you have a VCF with variants annotated for the GRCH37 assembly. Then, you want to convert these variants to the GRCH38 genome. Of course, new coordinates can be obtained using liftover and ref/alt will remain the same where…

Continue Reading how to manage reference differences

Difference between . and ./. for missing genotype in VCF

Difference between . and ./. for missing genotype in VCF 1 What is the difference between . and ./. for a missing genotype in a VCF file? For example in one VCF record I have these two sample genotypes. GT:AD:DP:GQ:MMQ:PGT:PID:PL .:0,0:.:.:.:.:.:. ./.:0,0:0:.:.:.:.:0,0,0 There is also is a difference in which…

Continue Reading Difference between . and ./. for missing genotype in VCF

Speeding up Eagle phasing and imputation

Speeding up Eagle phasing and imputation 0 Hi, I am doing imputation using Eagle and it is quite slow. The program gives me a warning: > WARNING: --vcfRef does not end in '.bcf'; BCF input is fastest The command line I am using is: eagle --vcfRef ref.vcf.gz --vcfTarget target.vcf.gz --geneticMapFile=genetic_map_1cMperMb.txt…

Continue Reading Speeding up Eagle phasing and imputation

gatk legacy bundles (where to get Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz)

gatk legacy bundles (where to get Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz) 0 I need the known indels vcf to run gatk BaseRecalibrator. So I need hg19 (not the b37) version of the known indels: Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz However this file is no longer available at: ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz the broad institute documentation says the ftp site was disabled…

Continue Reading gatk legacy bundles (where to get Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz)

Michigan imputation server input preparation

Michigan imputation server input preparation 1 I’m running the imputation on some WGS VCF files, and I’ve never done it before, so I started with Chr 22 (GRCh37). I’m just worried because I started with 594949 SNPs in the file, and after the prep (using the Will Rayner perl script…

Continue Reading Michigan imputation server input preparation

Filtering with vcftools by QUAL, Does not produce an output file

Filtering with vcftools by QUAL, Does not produce an output file 1 Hello, I want to filter my VCF file by QUAL >10 . AND GET AN OUTPUT VCF FILE so I could continue with my pipeline When I try to use this command there are no errors but no…

Continue Reading Filtering with vcftools by QUAL, Does not produce an output file

ABBA BABA test for two population

ABBA BABA test for two population 0 I generated a VCF file including 20 samples belonging to two species (A and B), variant calling was done using GATK best practice pipeline (only autosomes chromosomes). Now I want to detect the introgressed regions from A sp. to B sp. I searched…

Continue Reading ABBA BABA test for two population

filter barplot large VCF

filter barplot large VCF 0 hello everyone, I am doing in Rstudio the large VCF sorting DP by loci using dp = extract.gt(vcf, element="DP", as.numeric = TRUE) dploc = apply(dp, 1, sum) barplot(sort(dploc), las = 3, main = "DP", col = 1:12). How can I save…

Continue Reading filter barplot large VCF

Get Allele specific copy number regions from WGS data using ASCAT?

Get Allele specific copy number regions from WGS data using ASCAT? 0 Hi, I have some paired samples(normal vs. tumour) and want to do allele specific CNA analysis. can you help me to check if I use the right pipeline? I use bcftools mpileup and call to call snp from…

Continue Reading Get Allele specific copy number regions from WGS data using ASCAT?

Get Data ready for rqtl package

Get Data ready for rqtl package 0 I have a vcf file from gbs data. now I want to perform qtl analysis on this. I found out about the rqtl package in R. but it take csv file as an input or some cross format. I just want to know…

Continue Reading Get Data ready for rqtl package

Set ancestral alleles to upper case in vcf file

Set ancestral alleles to upper case in vcf file 2 I am trying to set my reference allele as the ancestral allele in 1000genomes vcf files. I can do this using the --derived option in vcftools. However most of the ancestral alleles are in lowercase so vcftools is not able…

Continue Reading Set ancestral alleles to upper case in vcf file

annotating vcf with variant type and variant effect, and most harmful effect

annotating vcf with variant type and variant effect, and most harmful effect 0 Hello, I have a VCF with ~6000 variants. The build is GRCh37. I want to annotate each variant with its type (substitution, deletion, inversion) and its effect (missense, silent, intergenic). If there are competing or multiple effects,…

Continue Reading annotating vcf with variant type and variant effect, and most harmful effect

Sample not found in BAM header

GATK Mutect2: Sample not found in BAM header 0 I get the following error when running Mutect2. Any idea what the reason is? A USER ERROR has occurred: Bad input: Sample N-PANCNGS-006 is not in BAM header: [] gatk Mutect2 --native-pair-hmm-threads 30 -R ~/genomes/BWA/Homo_sapiens.GRCh38.dna.primary_assembly.fa -I T-PANCNGS-006.bam -I N-PANCNGS-006.bam -normal N-PANCNGS-006…

Continue Reading Sample not found in BAM header

filtering large vcf. GQ and DP

filtering large vcf. GQ and DP 1 Hi everyone, I'm new to bioinformatics. I have a big VCF file, and I get DP from it using dp = extract.gt(vcf, element = "DP", as.numeric = TRUE); dpp = apply(dp, 2, sum), and I build a plot with barplot(sort(dpp), las = 3,…

Continue Reading filtering large vcf. GQ and DP

Calling CNV using bcftools

Calling CNV using bcftools 1 Hi, I used VarScan2 to call variants from whole exome sequencing data. The vcf file output looks like this: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR chr1 972251 . A AC . PASS DP=173;SOMATIC;SS=2;SSC=75;GPV=1;SPV=2.5611e-08 GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:65:63:0:0%:36,27,0,0 0/1:.:108:74:33:30.84%:46,28,18,15 I ran this command…

Continue Reading Calling CNV using bcftools

Calculating sensitivity and specificity between different NGS pipelines

Calculating sensitivity and specificity between different NGS pipelines 0 Hi all, We have installed the latest software available from Illumina on our MiSeq sequencer, and I am comparing the data generated by the previous version and the new version while changing some parameters. I have one VCF (actually this is 16…

Continue Reading Calculating sensitivity and specificity between different NGS pipelines

Extract the DP from VCF file along with the chromosome position and alteration

Extract the DP from VCF file along with the chromosome position and alteration 1 Hi, I would like to extract the DP from my VCF file, along with the chromosome position and alteration. Example VCF file: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SRRS1 chr1 8004518 . A…

Continue Reading Extract the DP from VCF file along with the chromosome position and alteration
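As a rough illustration of the kind of extraction asked about here, the following minimal Python sketch pulls CHROM, POS, REF, ALT and the INFO-level DP out of a single record. The function name `extract_dp` is hypothetical, and because the example record in the excerpt is truncated, the sketch borrows the VarScan2 record quoted in the "Calling CNV using bcftools" excerpt in this listing. It assumes DP is stored in the INFO column; a per-sample DP in the FORMAT column would need different handling.

```python
def extract_dp(vcf_line):
    """Return (chrom, pos, ref, alt, dp) for one tab-delimited VCF record.

    dp is taken from the INFO column and is None when no DP key is present.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    dp = None
    for entry in info.split(";"):
        # INFO entries are either KEY=VALUE pairs or bare flags such as SOMATIC
        key, _, value = entry.partition("=")
        if key == "DP":
            dp = int(value)
            break
    return chrom, int(pos), ref, alt, dp


# Record taken from the VarScan2 excerpt in this listing (fixed columns only)
record = "chr1\t972251\t.\tA\tAC\t.\tPASS\tDP=173;SOMATIC;SS=2;SSC=75;GPV=1;SPV=2.5611e-08"
print(*extract_dp(record), sep="\t")
```

For whole files, a dedicated tool such as bcftools query is likely the more robust choice; the sketch only shows where the values live in a record.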

Children’s Hospital of Philadelphia hiring Bioinformatics Scientist II – DBHI in Philadelphia, Pennsylvania, United States

Location: LOC_HOME-Home/Remote Office Location Req ID: 150400 Shift: Days Employment Status: Regular – Full Time Job Summary The Children’s Hospital of Philadelphia (CHOP) Research Institute and its Department of Biomedical and Health Informatics (DBHi) are seeking a bioinformatics scientist to help advance an enterprise-level data and informatics platform called “Arcus”….

Continue Reading Children’s Hospital of Philadelphia hiring Bioinformatics Scientist II – DBHI in Philadelphia, Pennsylvania, United States

Base recalibration in normal vs. tumor somatic variant calling in WXS data?

Base recalibration in normal vs. tumor somatic variant calling in WXS data? 0 Hi there, I have a tumor and a normal BAM file and am preparing to run base recalibration. I was planning on calling variants on the normal and using that, in addition to dbSNP, as input for…

Continue Reading Base recalibration in normal vs. tumor somatic variant calling in WXS data?

Oxford Nanopore Variant Calling Pipeline Calls Very Few of Variants of NA12878

Hello everyone, First, I am so sorry for this long and very amateur question. I am trying to build a pipeline for SNP calling for Oxford Nanopore MinION based long reads. I need to test the pipeline but apparently the number of test data is really low. I only have…

Continue Reading Oxford Nanopore Variant Calling Pipeline Calls Very Few of Variants of NA12878

inner merge vcfs in one step?

inner merge vcfs in one step? 1 I have 3 vcfs with single sample each and different variants. I want one vcf with 3 samples and only variants that are present in all 3 vcfs. I think I can do this a long way with bcftools isec --nfiles 3 a.vcf…

Continue Reading inner merge vcfs in one step?
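The "inner merge" described in this question amounts to a set intersection over variant keys. The sketch below shows only that idea, in Python, on small in-memory call sets; the function name `shared_sites` and the example sites are hypothetical, and keying a variant on (CHROM, POS, REF, ALT) is an assumption about what counts as "the same variant".

```python
def shared_sites(*callsets):
    """Return, in sorted order, the variant keys present in every call set.

    Each call set is an iterable of (chrom, pos, ref, alt) tuples.
    """
    key_sets = [set(calls) for calls in callsets]
    return sorted(set.intersection(*key_sets))


# Three hypothetical single-sample call sets
a = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")]
b = [("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")]
c = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")]
print(shared_sites(a, b, c))
```

Only the site found in all three inputs survives; building the merged three-sample VCF from those sites is a separate step that the sketch does not attempt.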

No VCF records found in the specified interval

Beagle 5 error: No VCF records found in the specified interval 0 Hi, I am running into an issue while doing imputation with Beagle 5 and am not sure what is causing the error. I have VCF files converted from PLINK by the following command: ./plink --bfile qcd_in --chr 20 --recode vcf-iid…

Continue Reading No VCF records found in the specified interval

vcf file individual column interpretation

vcf file individual column interpretation 0 QUAL 2460.98 FILTER PASS INFO ASP;CAF=0.9778,0.02216;COMMON=1; FORMAT GT:AD:AF:DP:F1R2:F2R1:GQ:PL sample1 0/0:2,0:.:2:.:.:6:0,6,57 sample2 0/0:30,0:.:30:.:.:90:0,90,1012 sample3 0/0:24,0:.:24:.:.:72:0,72,779 sample4 0/0:29,0:.:29:.:.:86:0,86,976 I would like to know what each sample column means. The result belongs to one rsID…

Continue Reading vcf file individual column interpretation
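One way to read the sample columns quoted in this question: the FORMAT column lists keys separated by colons, and each sample column lists the matching values in the same order. The Python sketch below pairs them up for sample1 from the excerpt; the function name `parse_sample` is hypothetical.

```python
def parse_sample(format_field, sample_field):
    """Pair each FORMAT key with the value at the same position in a sample column."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    return dict(zip(keys, values))


fmt = "GT:AD:AF:DP:F1R2:F2R1:GQ:PL"
sample1 = "0/0:2,0:.:2:.:.:6:0,6,57"

for key, value in parse_sample(fmt, sample1).items():
    print(key, value, sep="\t")

# GT = genotype (0/0 is homozygous reference)
# AD = allelic depths for the reference and alternate alleles
# DP = read depth at the site for this sample
# GQ = genotype quality
# PL = Phred-scaled genotype likelihoods
# "." marks a missing value
```

The authoritative meaning of each key is given by the ##FORMAT lines in the header of the VCF file itself, so those are worth checking for AF, F1R2 and F2R1 in particular.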

Error with Treemix input file

Error with Treemix input file 0 Hi everyone. I'm trying to run Treemix. I just installed it in a new conda environment (because I read that it sometimes has problems with other programs' dependencies). I had a filtered VCF file, and using Stacks I generated the Treemix input file. populations…

Continue Reading Error with Treemix input file

GT and GL fields in VCF file

GT and GL fields in VCF file 2 Hi! This might sound like a completely stupid, lazy, and silly question, but please help me understand these acronyms in the VCF file. I did have a look at the VCF format PDF and got even more confused. So, I have my data…

Continue Reading GT and GL fields in VCF file

VEP allele frequency from gnomAD genomes

VEP allele frequency from gnomAD genomes 1 Hi, Biostars community. According to the VEP documentation, the gnomAD genomes database can be used with the --custom option. Example from the VEP docs: ./vep -i examples/homo_sapiens_GRCh38.vcf --cache --custom gnomad.genomes.r2.0.1.sites.GRCh38.noVEP.vcf.gz,gnomADg,vcf,exact,0,AF_AFR,AF_AMR,AF_ASJ,AF_EAS,AF_FIN,AF_NFE,AF_OTH But there is no gnomAD genomes file for all chromosomes on Ensembl's FTP site: ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/gnomad/r2.1/genomes/ Only data…

Continue Reading VEP allele frequency from gnomAD genomes

EBI European Nucleotide Archive (ENA) aspera access broken

EBI European Nucleotide Archive (ENA) aspera access broken 0 I’m trying to download FASTQ files from the ENA via aspera. FTP still works. ascp -QT -P33001 -l 200m -i /home/me/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/SRR663/009/SRR6639099/SRR6639099_1.fastq.gz ./ As of sometime last week I am constantly getting: Session Stop (Error: failed to authenticate) ascp: failed to…

Continue Reading EBI European Nucleotide Archive (ENA) aspera access broken

Sort a sub column within a column while keeping the feature (LINUX)

I have a vcf file with these column headers: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT BS_25YES2E3 BS_G5B6AD28 BS_QCGPE1ZX A sample feature within that vcf file chr1 10450 . T C 27.94 VQSRTrancheSNP99.90to100.00+ AC=1;AF=0.167;AN=6;BaseQRankSum=-1.676e+00;ClippingRankSum=0.789;DP=102;ExcessHet=4.7712;FS=4.868;MLEAC=1;MLEAF=0.167;MQ=34.67;MQRankSum=-1.084e+00;PG=0,0,0;QD=1.55;ReadPosRankSum=-2.169e+00;SOR=0.707;VQSLOD=-1.050e+01;culprit=MQ;ANN=C|upstream_gene_variant|MODIFIER|**DDX11L1**|ENSG00000223972|Transcript|ENST00000450305|transcribed_unprocessed_pseudogene|||||||||||1560|1||SNV|HGNC|HGNC:37102||||chr1:g.10450T>C,C|upstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000456328|processed_transcript|||||||||||1419|1||SNV|HGNC|HGNC:37102|YES|||chr1:g.10450T>C,C|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000488147|unprocessed_pseudogene|||||||||||3954|-1||SNV|HGNC|HGNC:38034|YES|||chr1:g.10450T>C GT:AD:DP:FT:GQ:JL:JP:PL:PP 0/0:28,0:28:lowGQ:0:1:1:0,0,663:0,0,666 0/1:13,5:18:PASS:35:1:1:34,0,342:35,0,345 0/0:44,0:44:lowGQ:0:1:1:0,0,802:0,0,805 The portion in bold is what I want (DDX11L1). I…

Continue Reading Sort a sub column within a column while keeping the feature (LINUX)

About Foreign Fields in VCF (4.3)

About Foreign Fields in VCF (4.3) 0 I was looking at the VCF format 4.3 here (page 7) and 4.2 (page 4). They are more or less the same, both stating 8 mandatory, fixed, tab-delimited columns. I was a bit lost about the term ‘fixed’. I am wondering, say for…

Continue Reading About Foreign Fields in VCF (4.3)

Counting the number of SNPs (VCF) for each genomic coordinates (BED)

Counting the number of SNPs (VCF) for each genomic coordinates (BED) 1 I want to count the number of recorded variations for each genomic coordinates of a .bed file from the corresponding .vcf file. I guess it should be solved by vcftools, but I could not find any suitable option…

Continue Reading Counting the number of SNPs (VCF) for each genomic coordinates (BED)
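The counting task in this question can be sketched in Python with a sorted position list per chromosome and a binary search per interval. This is a minimal illustration on in-memory data, not the vcftools approach the poster was looking for; the function name `count_variants` and the example intervals are hypothetical. The one detail that matters is the coordinate convention: BED intervals are 0-based and half-open, while VCF positions are 1-based.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def count_variants(bed_intervals, variants):
    """Count VCF positions that fall inside each BED interval.

    bed_intervals: iterable of (chrom, start, end), 0-based half-open as in BED.
    variants: iterable of (chrom, pos), 1-based as in VCF.
    """
    positions = defaultdict(list)
    for chrom, pos in variants:
        positions[chrom].append(pos)
    for plist in positions.values():
        plist.sort()

    counts = []
    for chrom, start, end in bed_intervals:
        plist = positions.get(chrom, [])
        # BED [start, end) covers the 1-based positions start+1 .. end inclusive
        n = bisect_right(plist, end) - bisect_left(plist, start + 1)
        counts.append((chrom, start, end, n))
    return counts


bed = [("chr1", 0, 100), ("chr1", 100, 200), ("chr2", 0, 50)]
snps = [("chr1", 1), ("chr1", 100), ("chr1", 101), ("chr1", 250)]
for row in count_variants(bed, snps):
    print(*row, sep="\t")
```

On real files, an interval tool such as bedtools intersect with its count option covers the same ground without custom code.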

when choosing the best K for ADMIXTURE should I use delta K method or cross validation error

when choosing the best K for ADMIXTURE should I use delta K method or cross validation error 0 I ran ADMIXTURE with my VCF data. The ADMIXTURE manual says that I can find the appropriate K by checking the cross-validation error. What I read about cross-validation error is…

Continue Reading when choosing the best K for ADMIXTURE should I use delta K method or cross validation error

How we can put PASS filters on all variants in your VCF file and create a separate file?

By default, we trust PASS filters on VCF files. If a VCF is not filtered or PASS filters are not present, we use a very basic universal filter that relies on QUAL (Quality) and DP (Depth), which is better than nothing. If you want to run all of your data…

Continue Reading How we can put PASS filters on all variants in your VCF file and create a separate file?
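The "very basic universal filter" on QUAL and DP mentioned in this excerpt is not spelled out in the text, so the following Python sketch is only a guess at its general shape. The function name `passes_basic_filter` and both thresholds are hypothetical, and DP is assumed to be an INFO key. The example record reuses the fixed columns of the chr1:10450 record quoted elsewhere in this listing, with INFO cut down to DP.

```python
def passes_basic_filter(vcf_line, min_qual=30.0, min_dp=10):
    """Return True when a record meets both the QUAL and INFO/DP thresholds.

    The default thresholds are illustrative only, not values from the source.
    Records with a missing QUAL (".") or no DP entry are rejected.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    qual, info = fields[5], fields[7]
    if qual == ".":
        return False
    dp = None
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "DP":
            dp = int(value)
            break
    return float(qual) >= min_qual and dp is not None and dp >= min_dp


record = "chr1\t10450\t.\tT\tC\t27.94\t.\tDP=102"
print(passes_basic_filter(record))               # QUAL 27.94 is below the default 30.0
print(passes_basic_filter(record, min_qual=20))  # passes with a lower QUAL threshold
```

Records that pass could then be written to a separate file with their FILTER column set to PASS, which is the step the post title asks about.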

Technical Support Specialist – BioInformatics – Invitae

POSITION SUMMARY: The Technical Support Specialist provides first level technical support on Invitae Somatic Oncology products from the Invitae office in Boulder, CO. This individual will escalate customer inquiries effectively, collect customer feedback and share this with internal teams to improve products. This individual will assist in coordinating activities for special…

Continue Reading Technical Support Specialist – BioInformatics – Invitae

Technical Support Specialist – BioInformatics – Invitae (Formerly ArcherDx)

POSITION SUMMARY: The Technical Support Specialist provides first level technical support on Invitae Somatic Oncology products from the Invitae office in Boulder, CO. This individual will escalate customer inquiries effectively, collect customer feedback and share this with internal teams to improve products. This individual will assist in coordinating activities for special…

Continue Reading Technical Support Specialist – BioInformatics – Invitae (Formerly ArcherDx)