Tag: VCF.gz

How to modify VCF file?

Hi community, I have a question: the SNP positions in my VCF file are from GRCh37/hg19, and I need to change them to GRCh38. So I used UCSC liftOver to replace the hg19 positions with GRCh38 positions, deleted some SNPs, then sorted by position and saved to a new vcf…
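One step that often goes wrong after swapping positions by hand is the re-sort, because the header lines must stay on top. A minimal sketch of that step only, assuming an uncompressed VCF and only standard Unix tools; the function name sort_vcf and the file names are hypothetical, and chromosomes are ordered as plain text (so chr10 sorts before chr2):

```shell
# Hypothetical helper: keep '#' header lines first, then sort the records
# by CHROM (plain text order) and POS (numeric).
sort_vcf() {
  tab=$(printf '\t')
  # Header lines, in their original order (no error if there are none)
  grep '^#' "$1" || true
  # Data lines, sorted on column 1 and then numerically on column 2
  grep -v '^#' "$1" | LC_ALL=C sort -t "$tab" -k1,1 -k2,2n
}

# Example call (file names are placeholders):
# sort_vcf lifted.vcf > lifted.sorted.vcf
```

This only reorders records. It does not update contig lines in the header or check REF alleles against the new build.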


Bcftools equivalent of vcftools conversion to ped & map

I am converting a VCF to ped & map in vcftools like this: vcftools --gzvcf ZZZZZTYT.vcf.gz --plink --out ZZZZZTYT, which works fine. However, I have been searching and searching: can bcftools do the same with a BCF?…


difficulty filtering vcf file with vcftools

I have a large VCF file named “common_known_variants.vcf”, which contains all known human variants, downloaded from ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/00-common_all.vcf.gz -O common_known_variants.vcf.gz I’m trying to extract the known variants from only chromosomes 1, 2, 3, 9, 22, and X and write them to a new VCF file with the…
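For reference, the chromosome subset described here can also be produced without vcftools. A minimal sketch using only gzip and awk, assuming the CHROM column holds bare names such as 1 or X (no chr prefix); the function name keep_chroms and the output file name are hypothetical:

```shell
# Hypothetical helper: stream a gzipped VCF and keep header lines plus
# records whose CHROM (column 1) is in a comma-separated list.
keep_chroms() {
  gzip -dc < "$1" | awk -F '\t' -v list="$2" '
    BEGIN { n = split(list, a, ","); for (i = 1; i <= n; i++) keep[a[i]] = 1 }
    /^#/ { print; next }        # pass all header lines through
    ($1 in keep) { print }      # keep records on the requested chromosomes
  '
}

# Example call (output file name is a placeholder):
# keep_chroms common_known_variants.vcf.gz "1,2,3,9,22,X" > subset.vcf
```

The output is plain, uncompressed VCF; it would need to be compressed and indexed again before tools that expect an index can use it.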


wrong number of fields ?

Error occurrence after merging files with bcftools: wrong number of fields? I have multiple VCFs of CASES and CONTROLS variations annotated by VEP, SnpEff, SnpSift. First pair of VCFs -> only variations | CASES and CONTROLS. Second pair of VCFs -> variations + SnpEff | CASES and CONTROLS. Third pair of VCFs ->…


split gtex genotype data by chromosomes.

Hello, I edited the command line to use --vcf to import the VCF file. I used these commands:
for chr in $(seq 1 22); do plink --vcf /dbGAP/GTEx_Analysis_2017-06-05_v8_WholeExomeSeq_979Indiv_VEP_annot.vcf.gz --chr $chr --recode --out…


Understanding the number of intersection in bedtools jaccard

Hello, I am using bedtools jaccard to compare two VCF files, as:
bedtools jaccard -a ancestors.calls.norm.snp.vcf.gz -b GC078310.calls.norm.snp.vcf.gz
intersection union-intersection jaccard n_intersections
1606899 1806667 0.889427 1536700
What I do not get is why n_intersections is equal to 1536700. Especially, the difference…
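As a quick sanity check on the numbers quoted above: the third column is the first divided by the second, so the first two columns are base-pair totals, while n_intersections is (as far as the bedtools documentation describes it) a count of intersecting intervals and is not expected to match either. A minimal sketch of the arithmetic; the function name jaccard_from_counts is hypothetical:

```shell
# Hypothetical helper: Jaccard statistic from the first two output columns
# (intersection bp, and "union-intersection" bp).
jaccard_from_counts() {
  LC_ALL=C awk -v i="$1" -v d="$2" 'BEGIN { printf "%.6f\n", i / d }'
}

# Values copied from the output in the post
jaccard_from_counts 1606899 1806667
# prints 0.889427
```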


bcftools merged vcf file assigns all variants to one sample

I’ve made one VCF file for each of three samples. I then combined them using bcftools, like so:
# Make a list of vcf files to merge
cat "${OUT}/results/variants/vcf_list"
/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/3a7a-10.vcf.gz
/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/MF3.vcf.gz
/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/R507H-FB_S355_L001.vcf.gz
Then merge the list:
bcftools merge -l…


variant – Error running gatk HaplotypeCaller with allele specific annotations

I’ve got HaplotypeCaller working nicely in standard mode, like so:
# Run HaplotypeCaller
gatk --java-options "-Xmx4g" HaplotypeCaller --intervals "$INTERVALS" -R "$REF" -I "$OUT"/results/alignment/${SN}_sorted_marked_recalibrated.bam -O "$OUT"/results/variants/${SN}_g.vcf.gz -ERC GVCF
But when I try in allele-specific mode, I get the following error. All I’ve done is add the -G annotations at the end,…


why my VCF file generated with manta is missing genotype information

Hi, everybody, I am pretty new to coding and bioinformatics. I am using Manta as a tool to infer somatic structural variants (SVs) from a paired tumor/normal sample call. However, my somaticSV.vcf.gz file does not contain information about the genotype or the genotype quality (there is a dot instead of…


What file type does “PLINK --block” accept as input?

Hi, I have a set of SNPs (distributed over all the chromosomes) and I am trying to do some haplotype block estimation to identify whether some of them are part of the same haplotype block, etc. It seems like “PLINK --blocks”…


help with CrossMap

Hello all, I would really appreciate your help, as I am new to working with different file builds and am having a setback lifting a VCF file from build hg38 to hg19. In essence, when using CrossMap the chromosome value gets altered. For example, below is the…


Benchmarking the NVIDIA Clara Parabricks germline pipeline on AWS

This blog post was contributed by Ankit Sethia, PhD, and Timothy Harkins, PhD, at NVIDIA Parabricks, and Olivia Choudhury, PhD, Sujaya Srinivasan, and Aniket Deshpande at AWS. This blog provides an overview of NVIDIA’s Clara Parabricks along with a guide on how to use Parabricks within the AWS Marketplace. It…


bcftools merge of over 9000+ vcf files

Hi all, I have around 9000+ vcf files that I’m trying to merge using bcftools merge. They are all located in their own folder so essentially I have a folder containing 9000+ separate folders, each containing one vcf.gz file. I have tried out the following code via this tutorial bcftools…


Blast command line pipeline not working

Hello, I am now running a local BLAST pipeline on macOS. The goal here is to take the interval of the 5 best hits and then extract the SNP variants from multiple vcf.gz files. But I am facing an error which I cannot solve….


Padding out a GVCF file with 1000G exomes to get gatk VariantRecalibrator working with a small sample

I’ve got sequencing data for a small 500 bp amplicon from a few samples. The GATK Best Practices suggest running VariantRecalibrator on the GVCF files I generate. I’m trying to get this working, but I get an error about “Found annotations with zero variances”. Reading the gatk manual and other posts…


gatk VariantRecalibrator positional argument error

I’m trying to recalibrate my VCF using gatk VariantRecalibrator, but keep getting the error “Illegal argument value: Positional arguments were provided”. I don’t know what this means, or how to correct it! Here’s my call:
gatk VariantRecalibrator -R "/Volumes/Seagate Expansion Drive/refs/hg38/gatk download/Homo_sapiens_assembly38.fasta" -V "$OUT"/results/variants/"$SN".norm.vcf.gz -AS --resource hapmap,known=false,training=true,truth=true,prior=15.0: "/Volumes/Seagate…


How to call LOH with FreeC

Good morning, I am trying to infer loss of heterozygosity (LOH) from WGS data using FreeC. For this purpose, I am using these parameters in the “[BAF]” section of the configuration file:
[BAF]
makePileup = My_somaticVCF.vcf.gz
fastaFile = hg19.fa
SNPfile = hg19_snp142.SingleDiNucl.1based.txt.gz
When…


How to merge vcf files

Hi, I have 90 VCF files which I am looking to merge into one VCF file. I am trying to use VCFtools to merge these files. For that I am following the below process, but while using the vcf-merge command it is not able to merge…


GitHub – AI-sandbox/gnomix

This repository includes a Python implementation of Gnomix, a fast and accurate local ancestry method. Gnomix can be used in two ways: training a model from scratch using reference training data, or loading a pre-trained Gnomix model (see Pre-Trained Models below). In both cases the models are used to infer…


Filter criteria for variants based on GBS data

Are there recommended filter criteria for variants based on GBS data? I currently use this filter formula, which is used in bcbio for soft-filtering WGS-based variants:
bcftools --soft-filter GATKCutoffSNP -e TYPE="snp" && (MQRankSum < -12.5 || ReadPosRankSum < -8.0 ||…


Relatedness vs relatedness2 from vcftools give different results

Hello all, what do you think about these relatedness results? When I use relatedness2 in vcftools I get this:
vcftools --gzvcf p123.vcf.gz --relatedness2
INDV1 INDV2 N_AaAa N_AAaa N1_Aa N2_Aa RELATEDNESS_PHI
p1 p1 47388 0 47388 47388 0.5
p1 p2 28084 0…


Best Omic file compressor?

Our team has been having storage space issues; we predicted that we will not have enough available memory to store the files generated by our pipelines. Standard file compressors (gzip, bzip2, 7zip) weren’t cutting it and I started experimenting with file-specific compressors. This is where…


IMPUTE2 -merge_ref_panels

Hi all, I am trying to use IMPUTE2 with 2 reference panels to be merged. I am applying the code as per the example provided on the IMPUTE2 page, but somehow the merged reference panel doesn’t get produced (the REF file as per the example). Any ideas? for…


Interpreting output of BCFtools RoH

Hello! I am using BCFtools RoH for the first time, and I am having some trouble understanding its output file. The input is a gvcf file with genotype calls for one sample only, and I want to infer where there might be autozygous tracts….


Ensembl VEP Plugin not working

Hi all… I’m using the SubsetVCF plugin to extract some fields from my VCF file when using the VEP annotator. I’ve noticed that with VCFv4.1 the plugin works fine, but not with VCFv4.2. Are there any limitations on the VCF version that impact the plugin? Does…


Produce PCA bi-plot for 1000 Genomes Phase III

Note1 – Previous version: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old)
Note2 – this data is for hg19 / GRCh37
Note3 – GRCh38 data is available HERE
The tutorial has been updated based on the 1000 Genomes Phase III imputed genotypes. The original tutorial was…


tabix for ID column

Hello, I’m looking for something similar to tabix. But instead of looking for information within a given region, I would like to use the values in the ID column for quick lookup. So for example I would like to take the compressed dbSNP file, index…


The result of plink --freq is filled with NA

I downloaded the VCF file. Then I used plink to convert it to a bed file and calculated the allele frequency. However, the result of plink --freq was filled with NA. Can anyone give us an opinion? command ① ./plink --vcf…


Annotate Structural variants with population specific allele frequency values

Hi, has anyone tried filtering structural variants based on population-specific allele frequency (AF) values (for example gnomAD-SV or phase 3 1000 genome SV)? I have a set of SVs that I detected using a multipronged approach. For prioritising variants,…


Get rsID for a list of SNPs in an entire GWAS sumstats file

Here is a fairly efficient way to do this, assuming hg38 and that BEDOPS and standard Unix tools are installed.
$ bedmap --echo --echo-map-id --delim '\t' <(awk '{n=split($0,a,/[:_]/); print "chr"a[1]"\t"a[2]"\t"a[2]+1"\t"a[3]"/"a[4];}' sumstats.txt | sort-bed -) <(wget -qO- hgdownload.cse.ucsc.edu/goldenPath/hg38/database/snp150.txt.gz | gunzip -c | cut -f2-5 | sort-bed -) > answer.bed
This gets around making…


Speeding up HaplotypeCaller analysis

How can I speed up the HaplotypeCaller command? The input BAM file is about 16G and the running time using the command below is about 15 hours.
java -Xmx64G -jar GenomeAnalysisTK.jar -nt 1 -nct 34 -T HaplotypeCaller -R Renamed.fasta -I realigned.bam -o raw_variants.g.vcf.gz -ERC GVCF
GATK…


Fasta.fai file error

Hi, I have been struggling with an error in bedtools intersect. The command I am trying to run is as follows:
bedtools intersect -a sorted.vcf -b nstd166.GRCh38.variant_call_chr.vcf.gz -wo -sorted -f 0.8 -r -g Homo_sapiens_assembly38.fasta.fai
For some of the files that I am assessing, I don’t get…


How To Uncompress The 1000 Genome Vcf.Gz File

Hello, can somebody tell me how to uncompress 1000 Genome vcf.gz files? I am performing an RNA-editing analysis and would like to subtract annotated SNPs/INDELs. I have already done so using dbSNP data with bedtools intersect, but am still stuck with…
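Since bgzip output is gzip-compatible, plain gzip is usually enough to read these files. A minimal sketch, assuming only gzip and grep are available; the function names and file names are hypothetical:

```shell
# Hypothetical helper: write an uncompressed copy, leaving the .gz in place.
decompress_vcf() {
  gzip -dc < "$1" > "$2"
}

# Hypothetical helper: count data records without writing a copy to disk.
count_vcf_records() {
  gzip -dc < "$1" | grep -vc '^#'
}

# Example calls (file names are placeholders):
# decompress_vcf calls.vcf.gz calls.vcf
# count_vcf_records calls.vcf.gz
```

Streaming with gzip -dc avoids storing the much larger uncompressed file when a downstream tool can read from a pipe.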


comparing variants between two VCF files

I have two VCF files (e.g. SV1.vcf.gz, SV2.vcf.gz) and a bed file (reg.bed). I would like to compare the variants among them in the BED regions. The comparison includes the common variants and unique variants present in SV1 and SV2. I am currently…


Association test to get p values and OR in plink2, and file input format

Are there any commands for association testing in plink2 which will output p-value and OR in the resulting output file? If so, what kind of file input do I need to use for such commands…a…


Performing population stratification based on GWA tutorial

Hi, I’m performing the QC steps of Andries T. Marees’ GWA tutorial. Currently I’m stuck at the 7th step, where you begin the population stratification by downloading a 61GB vcf.gz file of 1000genomes containing genetic data of 629 individuals from different ethnic backgrounds. Successively…


Output per variant and per sample heterozygosity fraction from VCF.

As a QC measure I would like to know the per variant and per sample heterozygosity fraction. I already used vcftools to output the missingness per variant and sample. vcftools.github.io/man_latest.html Is there any tool that can do the same…
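To make the per-variant quantity concrete, here is a minimal sketch of one way to compute it with awk alone. It assumes an uncompressed VCF with diploid calls and GT as the first FORMAT subfield; the function name het_fraction_per_variant is hypothetical, and missing or non-diploid calls are simply skipped:

```shell
# Hypothetical helper: for each record print CHROM, POS, number of
# heterozygous calls, number of called genotypes, and their ratio.
het_fraction_per_variant() {
  awk -F '\t' '
    /^#/ { next }                              # skip header lines
    {
      het = 0; called = 0
      for (i = 10; i <= NF; i++) {             # sample columns start at 10
        split($i, f, ":")                      # f[1] is the GT subfield
        n = split(f[1], al, "[/|]")            # phased or unphased alleles
        if (n != 2 || al[1] == "." || al[2] == ".") continue
        called++
        if (al[1] != al[2]) het++
      }
      frac = (called > 0) ? sprintf("%.4f", het / called) : "NA"
      print $1 "\t" $2 "\t" het "\t" called "\t" frac
    }
  ' "$1"
}

# Example call (file name is a placeholder):
# het_fraction_per_variant cohort.vcf > het_per_variant.tsv
```

The per-sample version would accumulate the same two counters per column index instead of per row.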


Association test to get p values and OR in plink2, and file input format?

Are there any commands for association testing in plink2 which will output p-value and OR in the resulting output file? If so, what kind of file input do I need to use for such commands?…


phase_trio.sh | searchcode

/Phase/phase_trio.sh github.com/BioinformaticsArchive/fCNV Shell |…


bcftools merge; retaining sample names

When I do bcftools merge, the headers do not retain the filenames. How can I specify filenames? This is my command:
bcftools merge vcf/unfiltered/*.vcf.gz -O z > msa/pooled.vcf.gz
However, this is the relevant part of my header, despite the filenames I gave it. Is…


VariantRecalibrator no positional argument is defined for this tool.

Hi, I am trying to run the following command:
gatk VariantRecalibrator -R genome.fa -V all.Sample.SNP.vcf.gz --trust-all-polymorphic -tranche 100.0 -tranche 99.95 -tranche 99.9 -tranche 99.8 -tranche 99.6 -tranche 99.5 -tranche 99.4 -tranche 99.3 -tranche 99.0 -tranche 98.0 -tranche 97.0 -tranche 90.0 -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an SOR…


How to include/keep only the samples in a list in VCF.gz file?

Dear Friends, I have a list of 8000 samples in a file “samples.txt”: samples.txt: TCGA..barcode.. TCGA..barcode.. . . I am using bcftools to only keep these samples in the vcf.gz file. The vcf.gz file has 10000 samples….


Base recalibration -Java run time error and no sequence dictionary

Hello, I am stuck at the base recalibration step in NGS analysis. I used this command for the base recalibration step:
gatk BaseRecalibrator -I sample1.bam -R gch38.fa --known-sites GCF_000001405.39 -O recal_data.table
I got the following warning: WARN IndexUtils - Feature file…


Trio de novo analyses

Hi all, I have three VCFs, a child (male), a father and a mother, and I would like to extract de novo variants in the child. All three samples were called separately, however, using the same GATK pipeline. I ran rtg tool to try to extract the de novo variants…


Filter on Allele Balance using BCFTools

Hi All, I need to filter my variants based on the following criteria.
1) Include SNP sites with at least one heterozygous with allele balance (AB) > 0.15 or at least one homozygous variant
2) Include INDEL sites with at least one heterozygous with…


Error when Phasing with Beagle 5.2

I’m having trouble phasing a multi-sample (9-samples) vcf file produced by gatk HaplotypeCaller with Beagle 5.2. I do not have a genetic map or reference panel. I am working with a very heterozygous group of organisms (sea urchins). When I run beagle with…


So many variants detected.

Dear All, I have done variant calling on germline data that has a single sample for each individual and two genes. I did the following steps, but after checking the results I found too many variants. After HaplotypeCaller (step 6) I found 140900 known variants, and the…


How to set variant FILTER in a VCF file based on overlap with regions in a BED file

I figured out how to do the annotation using BCFTools. 2 steps are needed. The input BED file requires a 1 for each region where the annotation should be set:
Chr_01 1000 2000 1
Chr_05 5000 6000 1
Input header file:
##INFO=<ID=BAD_REGION,Number=0,Type=Flag,Description="My bad region for some reason">
bgzip and tabix the bed…
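Independent of the BCFTools steps, it can help to check which sites are expected to fall inside the regions. A minimal sketch of the overlap test only, assuming standard BED coordinates (0-based start, exclusive end) against 1-based VCF positions, and uncompressed inputs; the function name bed_overlap_flags and the file names are hypothetical, and this does not reproduce the annotation itself:

```shell
# Hypothetical helper: print CHROM, POS and 1/0 for whether the site lies
# inside any BED region. Arguments: BED file, then uncompressed VCF.
bed_overlap_flags() {
  awk -F '\t' '
    NR == FNR {                                # first file: load BED regions
      n[$1]++
      s[$1, n[$1]] = $2 + 0
      e[$1, n[$1]] = $3 + 0
      next
    }
    /^#/ { next }                              # skip VCF header lines
    {
      pos = $2 + 0; cnt = n[$1] + 0; hit = 0
      for (i = 1; i <= cnt; i++)
        # BED start is 0-based, so POS must be > start and <= end
        if (pos > s[$1, i] && pos <= e[$1, i]) { hit = 1; break }
      print $1 "\t" $2 "\t" hit
    }
  ' "$1" "$2"
}

# Example call (file names are placeholders):
# bed_overlap_flags bad_regions.bed calls.vcf
```

With the first region above, position 1000 would be flagged 0 and positions 1001 through 2000 would be flagged 1 under this convention.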


Understanding bcftools command

I need to perform the following action to combine multiple VCF files into one:
BCF=/path_to_bcftools
export BCFTOOLS_PLUGINS=$BCF/plugins
DIR=/path_to_normal_vcf_file
$BCF/bcftools merge -m all -f PASS,. --force-samples $DIR/*.vcf.gz | $BCF/bcftools plugin fill-AN-AC | $BCF/bcftools filter -i 'SUM(AC)>1' > panel_of_normal.vcf
I don’t have access to command-line bcftools, and since…


Error while subsetting VCF – error doesn’t check out with (z)grep

I’m using bcftools view -s to subset a VCF.gz file. I ran into an error:
[E::vcf_parse_format] Number of columns at chr9:44897051 does not match the number of samples (90 vs 99)
To look at this site, I ran…


bcftools consensus still returns “Could not parse the header” error

I attempted to create a consensus fasta file using bcftools, i.e.
bgzip -c All_SRR_SNP_Clean.vcf > All_SRR_SNP_Clean.vcf.gz
tabix All_SRR_SNP_Clean.vcf.gz
cat $ref | bcftools consensus $vcf_dir/All_SRR_SNP_Clean.vcf.gz > consensus.fasta
where $ref is the path to a Drosophila reference genome fa and the vcf…
