Tag: BCFtools

Genome-wide identification of enhancers and transcription factors regulating the myogenic differentiation of bovine satellite cells | BMC Genomics

1. Yin H, Price F, Rudnicki MA. Satellite cells and the muscle stem cell niche. Physiol Rev. 2013;93(1):23–67. 2. Hoppeler H, Fluck M. Plasticity of skeletal muscle mitochondria: structure and function. Med Sci Sport Exer. 2003;35(1):95–104. 3. Astruc T: Carcass Composition,…

bcftools merge of over 9000+ vcf files

Hi all, I have around 9000+ vcf files that I’m trying to merge using bcftools merge. They are all located in their own folder so essentially I have a folder containing 9000+ separate folders, each containing one vcf.gz file. I have tried out the following code via this tutorial bcftools…

Senior Bioinformatics Scientist (Statistical Geneticist) – Research – Cambridge, UK in San Diego, California

Senior Bioinformatics Scientist – Cambridge, UK Candidates wishing to work remotely from the Netherlands, France, or Belgium may also be considered. Overview Since 2001, the cost of DNA sequencing has dropped more than 100,000-fold, from $100,000,000 USD per human genome to less than $600 USD today. This is resulting in…

Padding out a GVCF file with 1000G exomes to get gatk VariantRecalibrator working with a small sample

I’ve got sequencing data for a small 500 bp amplicon from a few samples. GATK best principles suggest running VariantRecalibrator on the GVCF files I generate. I’m trying to get this working, but I get an error about “Found annotations with zero variances”. Reading the gatk manual and other posts…

How to call LOH with FreeC

Good morning, I am trying to infer loss of heterozygosity (LOH) from WGS data using FreeC. For this purpose, I am using these parameters in the “[BAF]” section of the configuration file: [BAF] makePileup = My_somaticVCF.vcf.gz fastaFile = hg19.fa SNPfile = hg19_snp142.SingleDiNucl.1based.txt.gz When…

How to merge vcf files

Hi, I have 90 VCF files which I am looking to merge into one VCF file. I am trying to use VCFtools to merge these files. For that I am following the process below, but the vcf-merge command is not able to merge…

Filter criteria for variants based on GBS data

Are there recommended filter criteria for variants based on GBS data? I currently use this filter formula, which is used in bcbio for soft-filtering WGS-based variants: bcftools --soft-filter GATKCutoffSNP -e TYPE="snp" && (MQRankSum < -12.5 || ReadPosRankSum < -8.0 ||…

plotting roh from bcftools

Hey, I am following this small tutorial on how to calculate ROHs from a VCF file using bcftools (samtools.github.io/bcftools/howtos/roh-calling.html) and I am getting this txt file: # This file was produced by: bcftools roh(1.10.2+htslib-1.10.2-3) # The command line was: bcftools roh -G30 --AF-dflt 0.4 my_file.vcf…

Interpreting output of BCFtools RoH

Hello! I am using BCFtools RoH for the first time, and I am having some trouble understanding its output file. The input is a gVCF file with genotype calls for one sample only, and I want to infer where there might be autozygous tracts…

How to merge many huge gVCFs with high speed.

Hello, in order to perform population genomic analysis, I am trying to merge many huge variant data files (gVCF), on the order of several dozen GB, over 20 files. bcftools merge and vcf-merge have been used so far but are very slow to merge…

Phylogeographic reconstruction of the marbled crayfish origin

Procambarus fallax collections and PCR genotyping Animals were collected from various wild populations (Table S1) in compliance with state and local regulations (Georgia department of natural resources scientific collection permit 115621108, state of Florida collection permits S-19-10 and S-20-04). DNA was isolated from abdominal muscle tissue using SDS-based extraction and precipitation…

User friendly (visual&interactive) VCF/BCF mining tools (2021)

What is currently the best user-friendly (visual and interactive) VCF/BCF mining tool in 2021, for VCF/BCF files similar in size to, or even larger than, the 1000 human genomes VCF? I guess most organizations do not have a visual and interactive VCF mining tool but use either: a website front-end…

compare two vcf files

Hi. I have a problem: I want to compare the rs numbers in two VCF files, so I want to check which of the rs numbers are in the top 10 percent. I don’t know what to do. Can you help me if I have…

The sardine run in southeastern Africa is a mass migration into an ecological trap

INTRODUCTION Large-scale annual migrations occur in an extraordinary range of animals, from insects to the great whales. While the driving mechanisms of these migrations are varied and sometimes poorly understood, they often represent a way of optimizing conditions for breeding and adult fitness when these are in conflict. Often, populations…

High frequency of an otherwise rare phenotype in a small and isolated tiger population

Significance Small and isolated populations have low genetic variation due to founding bottlenecks and genetic drift. Few empirical studies demonstrate visible phenotypic change associated with drift using genetic data in endangered species. We used genomic analyses of a captive tiger pedigree to identify the genetic basis for a rare trait,…

Produce PCA bi-plot for 1000 Genomes Phase III

Note1 – Previous version: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old). Note2 – this data is for hg19 / GRCh37. Note3 – GRCh38 data is available HERE. The tutorial has been updated based on the 1000 Genomes Phase III imputed genotypes. The original tutorial was…

Filtering long indels from VCF

Hi, to create a multi-sample VCF in a large cohort of WES samples of very different quality, I have to select only high-quality variants genotyped in as many samples as possible. I figured out that long indels have low quality only substitutions do not…

BCFtools Allelic Depth format nowhere explained?

Hi, I had a question about the allelic depth (AD) of BCFtools. It has this format, e.g. AD=262,18,0. Which number shows the depth of the REF and which the ALT? And what is the ‘0’? I found one forum where someone said…
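
For orientation, the VCF specification reserves AD as a Number=R tag, meaning one value per allele: the reference first, then each ALT allele in the order of the ALT column. The Python sketch below splits a value under that assumption. The function name `split_ad` is hypothetical, and which allele a trailing value such as the `0` belongs to depends on the ALT column of the record, which the excerpt does not show.

```python
from typing import List, Tuple


def split_ad(ad: str) -> Tuple[int, List[int]]:
    """Split an AD string into (REF depth, list of ALT depths).

    Assumes Number=R ordering: REF first, then ALT alleles in ALT-column order.
    """
    values = [int(v) for v in ad.split(",")]
    return values[0], values[1:]


ref_depth, alt_depths = split_ad("262,18,0")
print("REF depth:", ref_depth)
print("ALT depths:", alt_depths)
```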

bcftools merge

Check out the vcf_merge command I wrote: $ fuc vcf_merge -h usage: fuc vcf_merge [-h] [--how TEXT] [--format TEXT] [--sort] [--collapse] vcf_files [vcf_files ...] This command will merge multiple VCF files (both zipped and unzipped). It essentially wraps the ‘pyvcf.merge’ method from the fuc API. By default, only the GT…

Edit vcf file 0|0 to 0

I have a VCF file with GT format as 0|0 0|1 1|1 etc. I would like to convert those to a single number to create a dosage file. Ex: editing the VCF so that 0|0 becomes 0, 0|1 becomes 1, 1|1 becomes 2…
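
The mapping described above can be sketched in Python. This is a minimal illustration, not the answer from the original thread: it assumes biallelic sites, so the dosage is simply the number of non-reference alleles in the genotype, and it returns `None` when any allele is missing (`.`). The helper name `gt_to_dosage` is hypothetical.

```python
import re
from typing import Optional


def gt_to_dosage(gt: str) -> Optional[int]:
    """Count ALT alleles in a GT string such as '0|1' or '1/1'."""
    # Split on the phased (|) or unphased (/) separator.
    alleles = re.split(r"[|/]", gt)
    # Any missing allele ('.') makes the dosage undefined.
    if any(a == "." for a in alleles):
        return None
    # Assumes a biallelic site: every non-zero allele counts as one ALT copy.
    return sum(1 for a in alleles if a != "0")


for gt in ["0|0", "0|1", "1|1"]:
    print(gt, "->", gt_to_dosage(gt))
```

Applying this to a whole file would mean running it over every sample column of every record; multiallelic sites would need a per-allele decision that this sketch does not make.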

Pacific Biosciences hiring Bioinformatics Software Engineer in United States

PacBio’s Application Software Group focuses on building solid, strategic value around our core data type – highly accurate, long read sequencing – by producing innovative software that unlocks genomics in ways never seen before. We’re growing an interdisciplinary team of bioinformatic experts to tackle some of the most interesting problems…

How to filter GATK vcf file using other programs

Hi everyone, I called variants for a WGS project using GATK (HaplotypeCaller). Now, when I try to filter that VCF file with the VariantFiltration command in GATK, the following error message appears: java.lang.NumberFormatException: For input string: “10.90” I asked my…

comparing variants between two VCF files

I have two VCF files (e.g. SV1.vcf.gz, SV2.vcf.gz) and a bed file (reg.bed). I would like to compare the variants among them in the BED regions. The comparison includes the common variants and unique variants present in SV1 and SV2. I am currently…

bcftools isec -n operators

I am still very confused by the use of the bcftools isec -n flag. According to the manual (samtools.github.io/bcftools/bcftools.html#isec): -n, --nfiles [+-=]INT|~BITMAP output positions present in this many (=), this many or more (+), this many or fewer (-), or the exact same (~) files…
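
One way to read the manual text quoted above is as a predicate on the set of input files in which a site is present. The Python sketch below illustrates that reading only; it is not bcftools code. The function name `nfiles_match` and the representation of a site as a list of booleans (one per input file) are assumptions, and for the `~BITMAP` form it assumes the bitmap is read left to right in input-file order.

```python
from typing import List


def nfiles_match(spec: str, present: List[bool]) -> bool:
    """Return True if a site's presence pattern satisfies an -n style spec."""
    op, value = spec[0], spec[1:]
    if op == "~":
        # Exact pattern: '1' means present in that file, '0' means absent.
        return [c == "1" for c in value] == present
    count = sum(present)  # number of files containing the site
    n = int(value)
    if op == "=":
        return count == n   # exactly this many files
    if op == "+":
        return count >= n   # this many or more
    if op == "-":
        return count <= n   # this many or fewer
    raise ValueError(f"unsupported spec: {spec!r}")


site = [True, True, False]  # present in files 1 and 2 of 3
for spec in ["=2", "+2", "-1", "~110"]:
    print(spec, nfiles_match(spec, site))
```

Under this reading, `=2` and `+2` both accept the example site, `-1` rejects it, and `~110` accepts it only because the site is in exactly the first two files.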

bcftools multiallelic split not working

I am attempting to split multiallelic sites using bcftools norm with the following command: zcat ${inputVcf} | sed 's/AD,Number=./AD,Number=R/g' | sed 's/ADR,Number=./ADR,Number=R/g' | sed 's/ADF,Number=./ADF,Number=R/g' | bcftools norm --fasta-ref ${genomeFa} --check-ref s --multiallelics -any --output ${outputVcf} The sed commands were based on the recommendation from here. However I’m still getting…

phase_trio.sh | searchcode

/Phase/phase_trio.sh github.com/BioinformaticsArchive/fCNV Shell |…

bcftools merge; retaining sample names

When I do bcftools merge, the headers do not retain the filenames. How can I specify filenames? This is my command: bcftools merge vcf/unfiltered/*.vcf.gz -O z > msa/pooled.vcf.gz However, this is the relevant part of my header, despite the filenames I gave it. Is…

Bcftools how to add DP to FORMAT field (get per sample read depth for REF vs ALT alleles )

I’m trying to achieve what this post was looking for: Add Dp Tag To Genotype Field Of Vcf File. Currently this is my command: bcftools mpileup -Ou --max-depth 8000 --min-MQ…

Vcfutils error code

20-08-2021 code at line (I think) just to get it to write a proper fq. Second issue is this error: substr outside of string at /usr/local/bin/object91.ru line We can do this in a single…

Pybedtools error sans

20-08-2021 pysam – Error when I install samtools for python on windows – I am trying to install the pysam and pybedtools modules on python and got an error: ($i=1; $i[email protected] temp]$ conda install pysam bedtools hisat2 [ snip. However,…

how to install conda bcftools +fill-tags plugin ?

Hi, I am very new to bioinformatics, and I wanted to know whether there is a way to install the bcftools +fill-tags plugin in a conda env. Please note that when I check my env there is no bcftools folder and subsequently the…

Calling variants on reads with MAPQ=0 on HaplotypeCaller or bcftools mpileup

I am working with about 500 samples of human exome data. I used hg19 to align my reads and ran a standard best-practices GATK workflow, only to realise later that a small 1 Mb locus has not mapped properly due…

Extracting variations in the gene regions and from 100 bp of gene boundary from multiple VCF files

Hi, I sincerely hope that I am not repeating an already answered question. I couldn’t find the answer to my exact problem. I have three VCF files derived using bcftools (isec). Those…

How to include/keep only the samples in a list in VCF.gz file?

Dear Friends, I have a list of 8000 samples in a file “samples.txt”: samples.txt: TCGA..barcode.. TCGA..barcode.. . . I am using bcftools to only keep these samples in the vcf.gz file. The vcf.gz file has 10000 samples…

Filter on Allele Balance using BCFTools

Hi All, I need to filter my variants based on the following criteria. 1) Include SNP sites with at least one heterozygous with allele balance (AB) > 0.15 or at least one homozygous variant 2) Include INDEL sites with at least one heterozygous with…

Inquiry related to vcf file and formatting

Hello everyone, I am trying to run the PrediXcan software, but it is showing a segmentation fault error, implying that there is something wrong with my VCF files. I am sharing the header of the VCF file: ##fileformat=VCFv4.1 ##INFO=<ID=LDAF,Number=1,Type=Float,Description="MLE Allele Frequency Accounting for LD"> ##INFO=<ID=AVGPOST,Number=1,Type=Float,Description="Average posterior probability from MaCH/Thunder"> ##INFO=<ID=RSQ,Number=1,Type=Float,Description="Genotype imputation quality from…

How to set variant FILTER in a VCF file based on overlap with regions in a BED file

I figured out how to do the annotation using BCFTools. Two steps are needed. The input BED file requires a 1 for each region where the annotation should be set: Chr_01 1000 2000 1 and Chr_05 5000 6000 1. Input header file: ##INFO=<ID=BAD_REGION,Number=0,Type=Flag,Description="My bad region for some reason"> bgzip and tabix the bed…

Understanding bcftools command

I need to perform the following action to combine multiple VCF files into one: BCF=/path_to_bcftools export BCFTOOLS_PLUGINS=$BCF/plugins DIR=/path_to_normal_vcf_file $BCF/bcftools merge -m all -f PASS,. --force-samples $DIR/*.vcf.gz | $BCF/bcftools plugin fill-AN-AC | $BCF/bcftools filter -i 'SUM(AC)>1' > panel_of_normal.vcf I don’t have access to command-line bcftools, and since…

EOF marker absent in VCF

EOF marker absent in VCF – can this be safely ignored? Hi, I generated a VCF file using a bcftools mpileup | bcftools call pipeline. I have done this before, and the file produced then looks fine. However, the log for this one had [W::bgzf_read_block] EOF marker is absent…

Error while subsetting VCF – error doesn’t check out with (z)grep

I’m using bcftools view -s to subset a VCF.gz file. I ran into an error: [E::vcf_parse_format] Number of columns at chr9:44897051 does not match the number of samples (90 vs 99) To look at this site, I ran…

bcftools consensus still returns “Could not parse the header” error

I attempted to create a consensus fasta file using bcftools, i.e. bgzip -c All_SRR_SNP_Clean.vcf > All_SRR_SNP_Clean.vcf.gz; tabix All_SRR_SNP_Clean.vcf.gz; cat $ref | bcftools consensus $vcf_dir/All_SRR_SNP_Clean.vcf.gz > consensus.fasta, where $ref is the path to a Drosophila reference genome fa and the vcf…
