Tag: VCF

Bcftools equivalent of vcftools conversion to ped & map

I am converting a VCF to ped & map in vcftools with vcftools --gzvcf ZZZZZTYT.vcf.gz --plink --out ZZZZZTYT, which works fine. However, after much searching I still cannot tell: can bcftools do the same with a BCF?


Z697 – YFull YTree Info

R-Z697 – YFull YTree Info SNPs currently defining R-Z697 Z697     Sample ID Country / Language Info Ref File Testing company Statistics Status YF009397 Sweden (Västra Götalands län) R-Z697* —— Hg19 .BAM FTDNA (Y500) 81X, 14.4 Mbp, 165 bp YF084333 Italy (Chieti) R-FT285492 —— Hg38 .BAM Dante Labs 14X, 23.4…


difficulty filtering vcf file with vcftools

I had a large VCF file named “common_known_variants.vcf”, which contains all known human variants, downloaded from ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/00-common_all.vcf.gz -O common_known_variants.vcf.gz. I’m trying to extract the known variants from only chromosomes 1, 2, 3, 9, 22, and X and write them to a new VCF file with the…
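
The chromosome filtering described in this question can be sketched in plain Python. This is only an illustration, not the poster's vcftools approach: the function name filter_vcf_by_chrom and the toy records (including the rsA/rsB/rsC IDs) are made up for the example, and it assumes chromosome names are written without a "chr" prefix.

```python
from typing import Iterable, Iterator, Set


def filter_vcf_by_chrom(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield header lines unchanged, plus data lines whose CHROM is in `keep`."""
    for line in lines:
        if line.startswith("#"):
            # Meta-information and column header lines are always kept.
            yield line
            continue
        chrom = line.split("\t", 1)[0]  # CHROM is the first tab-separated column
        if chrom in keep:
            yield line


# Toy input; a real file would be read with gzip.open(path, "rt").
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\trsA\tA\tG\t.\t.\t.\n",
    "4\t200\trsB\tC\tT\t.\t.\t.\n",
    "X\t300\trsC\tG\tA\t.\t.\t.\n",
]
wanted = {"1", "2", "3", "9", "22", "X"}
kept = list(filter_vcf_by_chrom(demo, wanted))
```

Because the function works on any iterable of lines, the same code can stream a compressed file without loading it into memory.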


Error in BAFFromGVCFs – GenotypeGVCFs

Bug Report
Affected module(s) or script(s): Module00c/BAFFromGVCFs/GenotypeGVCFs
Affected version(s):
Description: I’m running GATKSVPipelineBatch and I got the following error in the GenotypeGVCFs task: A USER ERROR has occurred: Input /tmp/scratch/bean-resources/broad-references/v0/Homo_sapiens_assembly38.dbsnp138.vcf must support random access to enable queries by interval. If it’s a file, please index it using the bundled tool…


Latest dbSNP VCF

This is the directory you’re looking for: ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/

curl -s ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.39.gz | zcat | head
##fileformat=VCFv4.2
##fileDate=20210513
##source=dbSNP
##dbSNP_BUILD_ID=155
##reference=GRCh38.p13
##phasing=partial
##INFO=<ID=RS,Number=1,Type=Integer,Description="dbSNP ID (i.e. rs number)">
##INFO=<ID=GENEINFO,Number=1,Type=String,Description="Pairs each of gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a…


Missing data per site

Hi, I want to calculate per-site missing-data statistics for my VCF file. Using vcftools --missing-site gives wrong stats for several sites. Is there any other way to calculate it? Thank you! I have 36 samples, and here is an example of the vcftools --missing-site output…
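
One way to cross-check per-site missingness independently of vcftools is to count missing genotypes directly from each record. The sketch below is a rough illustration under stated assumptions: the helper site_missingness is hypothetical, a sample counts as missing if any allele in its GT field is ".", and each record has a FORMAT column containing GT and at least one sample. vcftools may define missingness differently.

```python
def site_missingness(vcf_line: str) -> float:
    """Return the fraction of samples with a missing GT on one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")   # FORMAT column, e.g. GT:DP
    gt_index = format_keys.index("GT")
    samples = fields[9:]                 # one column per sample
    missing = 0
    for sample in samples:
        parts = sample.split(":")
        # Trailing FORMAT fields may be dropped, so guard the index.
        gt = parts[gt_index] if gt_index < len(parts) else "."
        alleles = gt.replace("|", "/").split("/")
        if any(allele == "." for allele in alleles):
            missing += 1
    return missing / len(samples)


# Toy record with four samples, two of which have missing genotypes.
record = "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t./.:0\t1|1:9\t.\n"
fraction = site_missingness(record)
```

Comparing these fractions with the vcftools output for a handful of sites can show whether the discrepancy comes from the data or from how missingness is being counted.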


bedtools intersect doesn’t return a VCF file?

I am filtering a VCF file with a BED file using bedtools. I have done this successfully with bedtools intersect -wb -a myVCF.vcf -b myBEDfile.bed > output.txt. However, what I want is to get a VCF file with the metadata and…


Hard filtering on GATK HaplotypeCaller giving multiple warnings

I’m using this pipeline for deriving variants from RNA sequencing data: github.com/modupeore/VAP which uses specific versions of various tools, including HaplotypeCaller from GATK (v3.8-0-ge9d806836). The final step is a set of hard filters on the called variants (applied using VariantFilter), but looking at the log files, there are a lot…


How Can I Merge VCF File ?

The most secure and trustworthy way to merge several VCF files into a single VCF is to use an effective VCF merge tool. One of my colleagues has just used the VCF Merge Tool, which allowed him to merge multiple VCF files while maintaining high data integrity…


snp – Reference variant detected as altered one in bam file

I received several .bam files from the manufacturer and used four callers (samtools, freebayes, HaplotypeCaller, DeepVariant) to find sequence variants. In the resulting .vcf files, I took a closer look at some calls. I found an interesting homozygous one, rs477033 (C/G Ref/Alt), with the flag ‘COMMON=0’ and a very low MAF. I also…


Bioinformatics Scientist for Whole Genome and Whole Exome Sequencing

The NeuroGenomics and Informatics (NGI) Center, led by Dr. Carlos Cruchaga at Washington University School of Medicine, is recruiting a Bioinformatics Scientist to work on Whole Genome and Whole Exome Sequencing. We are seeking an experienced, self-motivated, self-driven scientist…


Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes

Sequencing data. We used publicly available sequencing data from the GIAB consortium [45], 1000 Genomes Project high-coverage data [46] and the Human Genome Structural Variation Consortium (HGSVC) [4]. All datasets include only samples consented for public dissemination of the full genomes. Statistics and reproducibility. For generating the assemblies, we used all 14 samples for…


how to extract unique variants from GVCF

[Note: cross-posted on the GATK forum, still awaiting a response.] I have a GVCF (generated using GATK’s HaplotypeCaller with the -ERC GVCF parameter) of 36 related samples and would like to determine the (potentially de novo) variants that are unique to each sample…


wrong number of fields ?

Error occurrence after merging files with bcftools: wrong number of fields? I have multiple VCFs of CASES and CONTROLS variations annotated by VEP, SnpEff and SnpSift:
first pair vcf -> only variations | CASES and CONTROLS
second pair vcf -> variations + SnpEff | CASES and CONTROLS
third pair vcf ->…


L1193 – YFull YTree Info

I-L1193 – YFull YTree Info SNPs currently defining I-L1193 L1193     FGC87558     Y72031     Sample ID Country / Language Info Ref File Testing company Statistics Status ASH1 Ireland (Tipperary) I-L1193* —— Hg19 .BAM Ancient 1X, 10.5 Mbp, 101 bp PB581 Ireland (Clare) I-L1193* —— Hg19 .BAM Ancient 2X, 15.8…


Y18411 – YFull YTree Info

J-Y18411 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF072520 Albania J-BY111710 —— Hg19 .BAM Dante Labs 10X, 22.8 Mbp, 151 bp YF067307 Palestine (Nablus) J-BY111710 —— Hg38 .BAM FTDNA (Y700) 34X, 18.7 Mbp, 151 bp NA20827 Italy (Firenze) J-CTS3330 —— Hg19…


How to Merge VCF files in Windows 10

Many organizations working with VCF files face the task of collecting and combining emails. Hiring technicians increases the data management cost. Beyond that disadvantage, downtime is a big issue: it hampers work. Technicians often try to fix the problem manually, which is a time-consuming process, so trusting a VCF merge application is…


Variant quality and filters on GATK HaplotypeCaller generated VCFs

Hi, I am analysing human WGS data to diagnose rare inherited diseases. I followed the GATK Best Practices Guidelines for “Germline short variants discovery” for single-sample data to generate a VCF using HaplotypeCaller. The guidelines then point to the use…


Merge only bim files with plink

Hello. For the same dataset, they provide single BED and FAM files covering all the chromosomes. However, the BIM files are split by chromosome. I would like to generate the VCF file with the genotype calls for all chromosomes, but I need…


BioInformatics Product Manager at Helix (remote)

You + Helix Helix is a place where innovators and doers gather in order to drive significant progress in population genomics. We have come together to work at the intersection of clinical care, research, and genomics.   If you’re excited by the idea of making a meaningful impact and joining a…


rna seq – RNAseq SNP discovery: deciding upon filters and dealing with allele expression bias

I am working with non-model plant RNA samples which have been deep sequenced and analysed using the STAR aligner under default parameters. Aim: We would like to conduct SNP discovery on these samples. Objective: Our ultimate goal with this genotypic data is to search for variants (both SNPs and indels)…


Parallel reduction in flowering time from de novo mutations enable evolutionary rescue in colonizing lineages

Díaz, S. et al. Summary for Policymakers of the Global Assessment Report on Biodiversity and Ecosystem Services of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES, 2019). Fisher, R. A. The correlation between relatives on the supposition of Mendelian inheritance. Earth Environ. Sci. Trans. R. Soc. Edinb. 52,…


using ANNOVAR annotation clinvar database out wrong position

Hello Biostars, I was trying to annotate a VCF using ANNOVAR, but I get wrong output; it seems my ClinVar database is not suitable. bcftools_callCommand=call -m -v -o /project/plantform/20220316PCR/03.amplify/L2107973CFD7G5kxT1/L2107973CFD7G5kxT1.variation.vcf /project/plantform/20220316PCR/03.amplify/L2107973CFD7G5kxT1/L2107973CFD7G5kxT1.mpileup.vcf


M8498 – YFull YTree Info

B-M8498 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF004283 Saudi Arabia B-M8498* —— Hg19 .BAM FTDNA (Y500) 43X, 13.7 Mbp, 165 bp HGDP00992 Namibia B-M7650* —— Hg38 .BAM Scientific 18X, 23.5 Mbp, 151 bp YF013963 —— B-Y82361 —— Hg38 .BAM FTDNA…


FGC15109 – YFull YTree Info

I-FGC15109 – YFull YTree Info SNPs currently defining I-FGC15109 FGC15109     Sample ID Country / Language Info Ref File Testing company Statistics Status SZ43 Hungary (Somogy) I-BY138* —— Hg19 .BAM Ancient 8X, 22.8 Mbp, 32 bp YF010533 —— I-BY138* —— Hg19 .BAM FTDNA (Y500) 73X, 14.9 Mbp, 165 bp YF019250…


bedtools -u not giving unique files

These are the steps I’m following. The first step, extracting the sample using a BED file, is this (here the BED file is the input BED file converted to Hg38):
tabix -h -R Hg19_to_Hg38_sorted.bed.gz gnomad.genomes.v{g_version}.hgdp_tgp.chr{chr}.vcf.bgz | perl {vcftools} -c {sample_name} > {sample_name}_out.vcf’
output({sample_name}_out.vcf’) chr2 113982416 rs56177103 TATAAAATAAAATAAA…


bam – Detect mutation context in a read of a sam file

That kind of custom fiddling with reads and variants is very cumbersome, non-standard and also error-prone. Run a standard variant calling pipeline and then filter for the mutations that you want. Then extract the variant position (that is, the coordinates) and get the variant context from the reference genome. Using individual…
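
The last step of this answer, looking up the sequence context of a called variant in the reference, can be sketched as follows. This is a minimal illustration, assuming the reference contig is already in memory as a string; the function name variant_context and the toy sequence are hypothetical, and a real workflow would fetch the bases from an indexed FASTA.

```python
def variant_context(reference: str, pos: int, flank: int = 1) -> str:
    """Return the reference bases around a 1-based position (VCF convention).

    With flank=1 this yields the trinucleotide context of a SNV.
    """
    index = pos - 1  # convert the 1-based VCF position to a 0-based index
    if index - flank < 0 or index + flank >= len(reference):
        raise ValueError("position is too close to the contig end for this flank")
    return reference[index - flank : index + flank + 1]


toy_reference = "ACGTTGCA"
context = variant_context(toy_reference, 4)  # position 4 is the first T
```

Working from the reference rather than from individual reads keeps the context independent of sequencing errors and soft-clipping in any single read.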


vcf – Why does GATK produce both 0/1 and 1/0 genotypes in the same file? Are the two not equivalent?

I have always thought that 1/0 and 0/1 in VCF genotype fields are equivalent. And yet, GATK uses both. For example, these are two variants called in the same sample and the same run of GATK 4.1.4.0:
chr7 117120317 . ATTCATTGTTTTGAAAGAAAGATGGAAGAATGAACTGAAG A 748.97 . AC=1;AF=0.5;AN=2;DP=64;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;QD=11.89;SOR=7.223 GT:AD:DP:GQ:PL:SB 1/0:0,36:63:99:2294,1042,933:0,0,0,36
chr7 117120306 …
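
In the VCF specification the "/" separator marks an unphased genotype, so allele order carries no phase information. If downstream code compares genotype strings literally, one option is to normalise unphased calls first. The sketch below is only an illustration with a hypothetical helper name, and it does not explain why GATK emits both forms.

```python
def normalize_unphased_gt(gt: str) -> str:
    """Sort the alleles of an unphased genotype; leave phased genotypes alone."""
    if "|" in gt:
        return gt  # phased: allele order is meaningful and must be preserved
    alleles = gt.split("/")

    def sort_key(allele: str):
        # Numeric alleles sort by value; missing alleles (".") sort last.
        return (allele == ".", int(allele) if allele != "." else 0)

    return "/".join(sorted(alleles, key=sort_key))


normalized = [normalize_unphased_gt(g) for g in ["1/0", "0/1", "1|0", "2/1", "./."]]
```

After normalisation, 1/0 and 0/1 compare equal, while phased calls such as 1|0 are left untouched.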


split gtex genotype data by chromosomes.

Hello, I used and edited the command line to use --vcf to import the VCF file. I used these commands: for chr in $(seq 1 22); do plink --vcf /dbGAP/GTEx_Analysis_2017-06-05_v8_WholeExomeSeq_979Indiv_VEP_annot.vcf.gz --chr $chr --recode --out…


Understanding the number of intersection in bedtools jaccard

Hello, I am using bedtools jaccard to compare two VCF files, as: bedtools jaccard -a ancestors.calls.norm.snp.vcf.gz -b GC078310.calls.norm.snp.vcf.gz
intersection union-intersection jaccard n_intersections
1606899 1806667 0.889427 1536700
What I do not get is why n_intersections is equal to 1536700. Especially, the difference…
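
The reported columns can be checked with a little arithmetic. As far as the bedtools documentation describes it, the first two columns are measured in base pairs and the jaccard column is their ratio, while n_intersections counts intersecting intervals, so it need not match the base-pair figure. The snippet below only reproduces the ratio from the numbers quoted in the question; the variable names are illustrative.

```python
# Values copied from the bedtools jaccard output quoted above.
intersection_bp = 1606899
union_minus_intersection_bp = 1806667

# The jaccard column is the ratio of the first two columns.
jaccard = intersection_bp / union_minus_intersection_bp
print(f"{jaccard:.6f}")
```

The printed value agrees with the 0.889427 shown in the output, which suggests the first three columns are internally consistent and the open question is only how intervals are counted.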


FGC19851 – YFull YTree Info

R-FGC19851 – YFull YTree Info SNPs currently defining R-FGC19851 FGC19851     Sample ID Country / Language Info Ref File Testing company Statistics Status YF072967 United States (Georgia) R-FGC19851* —— Hg38 .BAM FTDNA (Y700) 34X, 18.7 Mbp, 151 bp YF009427 —— R-FGC65264* —— Hg19 .BAM FTDNA (Y500) 38X, 12.8 Mbp, 165…


Re: Quick Way to Merge Multiple VCF Files into One

vCard files (.vcf) are essential for professional, personal, and even home purposes. Users need to merge multiple vCard files when managing them separately becomes impractical. Another reason for users to combine multiple VCF files is security concerns. Here are some of the reasons: Organize your address book contacts…


YP4024 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status ERS2478532 Turkmenistan Q-YP4024* —— Hg19 .BAM Scientific 17X, 16.7 Mbp, 151 bp YF006625 Russia (Tomskaya oblast’) / Selkup Q-YP4024* —— Hg19 .BAM FTDNA (Y500) 67X, 14.8 Mbp, 165 bp DA162 Russia (Severnaya Osetiya-Alaniya, Respublika) Q-BZ5214* —— Hg19 .BAM…


HRJOB7442 Bioinformatics Scientist 2 (Various Locations) in Nether Alderley, Macclesfield (SK10) | Almac Group (Uk) Ltd

Bioinformatics Scientist 2 Hours: 37.5 hours per week Salary: Competitive Ref No: HRJOB7442 Business Unit: Diagnostic Services Location: Craigavon or Manchester Open To: Internal and External Applicants The Company Almac Diagnostic Services is a leading stratified medicine business, specialising in biomarker-driven clinical trials. We are incredibly proud to be involved…


Genomic variation from an extinct species is retained in the extant radiation following speciation reversal

Vamosi, J. C., Magallon, S., Mayrose, I., Otto, S. P. & Sauquet, H. Macroevolutionary patterns of flowering plant speciation and extinction. Annu. Rev. Plant Biol. 69, 685–706 (2018). Rhymer, J. M. & Simberloff, D. Extinction by hybridization and introgression. Annu. Rev. Ecol. Syst. 27, 83–109 (1996)…


How to calculate r2 for IMPUTE2

Hi all, with all the help I was finally able to remove some SNPs from the VCF file and then run it through IMPUTE2. This means I have the original VCF and the imputed VCF; how do I run an r2 analysis? Is there a…


Y570 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status AF2 —— Q-Y570 Q-Y570*, Q-F746* Hg19 .BAM Ancient 1X, 1.3 Mbp, 94 bp YF093124 —— Q-M120* —— Hg38 .BAM Nebula Genomics 57X, 23.6 Mbp, 150 bp Kolyma1 Russia (Sakha, Respublika [Yakutiya]) Q-Y222276* —— Hg19 .BAM Ancient 7X, 13.4…


How to apply vcftools --diff and extract only the different variants

Hello, I am trying to apply vcftools --diff in order to extract the variants that differ between two VCF files:
vcftools --vcf marked_IO002_tumor-pe.vcf --diff marked_IO002_normal-pe.vcf --diff-site --out t_v_n
I am getting this as a result: VCFtools - 0.1.16 (C)…


PF6747 – YFull YTree Info

E-PF6747 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF010216 Azerbaijan (Qəbələ) E-PF6747* —— Hg19 .BAM FTDNA (Y500) 50X, 13.7 Mbp, 165 bp YF064736 Egypt (Al Minūfīyah) E-FT97857* —— Hg38 .BAM FTDNA (Y700) 35X, 18.5 Mbp, 151 bp YF093064 Yemen (Tā’izz) E-Y280593…


PostDoc Plant Bioinformatics job with SKOLKOVO INSTITUTE OF SCIENCE AND TECHNOLOGY

Want to participate in the outstanding new area of agro-genomics? To put into practice how genetic diversity and genome-assisted breeding in crops contribute to providing healthy, high-quality food to humankind in a sustainable way? Strong in bioinformatics and interested in working with very large datasets…


java – GATK: HaplotypeCaller IntelPairHmm only detecting 1 thread

I can’t seem to get GATK to recognise the number of available threads. I am running GATK (4.2.4.1) in a conda environment which is part of a nextflow (v20.10.0) pipeline I’m writing. For whatever reason, I cannot get GATK to see there is more than one thread. I’ve tried different…


Z2039 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status YF003382 Finland (Länsi-Suomen lääni) I-Z2040* —— Hg19 .BAM FTDNA (Y500) 47X, 13.3 Mbp, 165 bp YF067917 Ireland I-FGC69701* —— Hg19 .BAM Dante Labs 9X, 22.9 Mbp, 151 bp YF078735 Belarus (Vicebskaja voblasc’) / Polish I-FGC69702 —— Hg38 .VCF…


BY7447 – YFull YTree Info

E-BY7447 – YFull YTree Info SNPs currently defining E-BY7447 BY7447     Sample ID Country / Language Info Ref File Testing company Statistics Status YF075635 Yemen (Al Bayḑā’) E-FT183181 —— Hg38 .BAM FTDNA (Y700) 39X, 18.2 Mbp, 151 bp YF067501 Yemen (Şan’ā’) E-FT183181 —— Hg38 .BAM FTDNA (Y700) 44X, 18.8 Mbp,…


Ensembl VEP gnomAD annotated allele frequencies different from gnomAD browser

I’ve annotated some variants using VEP, and was looking at the minor allele frequencies. Some of the variants had very different MAFs in the annotation than I expected (I expected MAF < 1%, whereas some annotated MAFs were >50%). I looked up the same variants on the gnomAD v3 browser,…


Bioconductor on Microsoft Azure – Microsoft Tech Community

Co-authored by: Nitesh Turaga – Scientist at Dana Farber/Harvard, Bioconductor Core Team Erdal Cosgun – Sr. Data Scientist at Microsoft Biomedical Platforms and Genomics team Vincent Carey – Professor at Harvard Medical School, Bioconductor Core Team   Introduction   The Bioconductor project promotes the statistical analysis and comprehension of current and emerging…


DF109 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status YF016926 Ireland R-DF109 R-DF109*, R-A18726* Hg38 .BAM FTDNA (Y500) 27X, 12.7 Mbp, 165 bp YF016394 United States (Ohio) R-DF109 R-DF109*, R-A18726* Hg38 .BAM FTDNA (Y500) 34X, 11.9 Mbp, 151 bp YF011566 Ireland (Mayo) R-DF109 R-DF109*, R-A18726*, R-FGC23742* Hg38…


Errors when compiling older version **samtools**

I have downloaded a BCF file from the ricevarmap website. In order to “view” this old BCF format and convert it to a newer one, it’s said that I have to install samtools-0.1.17, which includes an older version of bcftools. When I make…


GATK HaplotypeCaller with interval list

I am trying to use the -L option of GATK HaplotypeCaller to call SNPs and short InDels within an interval list. My interval list file (top8snp.interval_list) content is as follows:
12 33029845 33030845 + rs24767598
13 40586682 40587682 + rs24748362
18 24373857 24374857 + rs8856159
21 50381146 50382146 +…


Split multiallelic SNPs to biallelic from vcf

Dear all, I have a particular VCF file like this:
chrX 29 . G A,T . PASS AC=1,1;AN=3 GT:DP:HF:CILOW:CIUP:SDP 0/1/2:4839:0.003,0.001:0.002,0.0:0.005,0.003:14;0,4;2
I tried various tools to split this, but I get the following results, so the FORMAT and INFO lines are identical:
chrX 29 . G A . PASS AC=1,1;AN=3;OLD_MULTIALLELIC=chrM:899:G/A/T GT:DP:HF:CILOW:CIUP:SDP…


ZP77 – YFull YTree Info

R-ZP77 – YFull YTree Info SNPs currently defining R-ZP77 ZP77 / FGC6562     Sample ID Country / Language Info Ref File Testing company Statistics Status YF008362 —— R-ZP77* —— Hg19 .BAM FTDNA (Y500) 41X, 13.8 Mbp, 165 bp YF067652 Unknown R-BY40744 —— Hg38 .BAM FTDNA (Y700) 36X, 18.7 Mbp, 151…


python – How can I fix the dash bio error: devtools cannot load source map dashbio@1.0.1 bundle.js.map?

I am implementing a website in Python with Django framework and using django-plotly-dash to display data. I am trying to use dash_bio’s IGV feature to display some chromosome data, but when I attempt to call the functionality, I receive the following errors and the callback that returns ‘dashbio.igv’ is unable…


bcftools merged vcf file assigns all variants to one sample

I’ve made one VCF file for each of three samples. I then combined them using bcftools, like so:
# Make a list of vcf files to merge
cat "${OUT}/results/variants/vcf_list"
/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/3a7a-10.vcf.gz
/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/MF3.vcf.gz
/mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/R507H-FB_S355_L001.vcf.gz
Then merge the list: bcftools merge -l…


variant – Where should you put you cache for ensembl-vep using conda

I’ve installed vep in conda like so:
conda install ensembl-vep=105.0-0
And then I installed the human cache like so:
vep_install -a cf -s homo_sapiens -y GRCh38 -c /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/refs/vep --CONVERT
But when I try and run vep I get an error:
vep --dir_cache /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/refs/vep -i /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/cohort.norm_recalibrated.vcf -o /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/cohort.norm_recalibrated_vep.vcf
Am I doing…


variant – Error running gatk HaplotypeCaller with allele specific annotations

I’ve got HaplotypeCaller working nicely in standard mode, like so:
# Run HaplotypeCaller
gatk --java-options "-Xmx4g" HaplotypeCaller --intervals "$INTERVALS" -R "$REF" -I "$OUT"/results/alignment/${SN}_sorted_marked_recalibrated.bam -O "$OUT"/results/variants/${SN}_g.vcf.gz -ERC GVCF
But when I try in allele-specific mode, I get the following error. All I’ve done is add the -G annotations at the end,…


linux – How to fix Perl from anaconda not installing bioperl? Bailing out the installation for BioPerl

vep -i examples/homo_sapiens_GRCh38.vcf --database
Can't locate Bio/PrimarySeqI.pm in @INC (you may need to install the Bio::PrimarySeqI module) (@INC contains: /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/modules /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0 /home/youssef/anaconda3/envs/ngs1/lib/site_perl/5.26.2/x86_64-linux-thread-multi /home/youssef/anaconda3/envs/ngs1/lib/site_perl/5.26.2 /home/youssef/anaconda3/envs/ngs1/lib/5.26.2/x86_64-linux-thread-multi /home/youssef/anaconda3/envs/ngs1/lib/5.26.2 .) at /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/Bio/EnsEMBL/Slice.pm line 75.
BEGIN failed--compilation aborted at /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/Bio/EnsEMBL/Slice.pm line 75.
Compilation failed in require at /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/Bio/EnsEMBL/Feature.pm line 84.
BEGIN failed--compilation aborted at /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/Bio/EnsEMBL/Feature.pm…


Variant calls of published already assembled genomes

I have a set of short-read sequencing data for the 172 kb Epstein-Barr virus genome. We successfully called our variants against a reference genome using GATK. A publication linked below, from a different population, compared variants (also from short-read sequencing) to…


why my VCF file generated with manta is missing genotype information

Hi, everybody, I am pretty new to coding and bioinformatics. I am using Manta as a tool to infer somatic structural variants (SVs) from a paired tumor/normal sample call. However, my somaticSV.vcf.gz file does not contain information about the genotype nor the genotype quality (there is a dot instead of…


bedtools intersect error: Invalid record in file

Hello to all. I am trying to run bedtools intersect with a VCF file and a BED file (my goal is to add the depth data to my VCF). I get an error running this command: bedtools intersect -a depth.bed -b fish.vcf -wa -wb > $out The error: “Error: Invalid record…


What file type does “PLINK --block” accept as input?

Hi, I have a set of SNPs (distributed over all the chromosomes) and I am trying to do some haplotype block estimation to identify whether some of them are part of the same haplotype block, etc. It seems like “PLINK --blocks”…


VEP issue: ERROR: Cache assembly version (GRCh37) and database or selected assembly version (GRCh38) do not match

Describe the issue: VEP gives errors even though my query and reference have the same assembly version.
Command: $ ./vep -i examples/homo_sapiens_GRCh37.vcf --cache --refseq
Cache reference details while running install.pl: ? 458 NB: Remember to use --refseq when running the VEP with this cache! downloading ftp.ensembl.org/pub/release-104/variation/indexed_vep_cache/homo_sapiens_refseq_vep_104_GRCh37.tar.gz unpacking homo_sapiens_refseq_vep_104_GRCh37.tar.gz converting cache, this may…


dbSNP specific to C57BL6J

Hi, is it possible to obtain a dbSNP file that is specific to a strain, e.g. C57BL6J? I tried looking for it on the NCBI, MGI and Jackson websites, but I can’t seem to find a strain-specific VCF. Thanks


Failed to instantiate plugin dbNSFP in VEP

Hi Team, my VEP (version 105, installed by perl INSTALL.pl) works well, but I face some problems using the dbNSFP plugin (also installed by perl INSTALL.pl) with VEP. My dbNSFP version 4.2a was installed by the following code without any warning…


help with CrossMap

Hello all, I would really appreciate your help, as I am new to working with different builds and have hit a setback lifting a VCF file from build hg38 to hg19. In essence, when using CrossMap the chromosome value gets altered. For example, below is the…


Variant physical position must be monotonically increasing

I want to calculate XPEHH for each SNP position. When I run the following command:
selscan --xpehh --vcf B10_beagle.vcf --vcf-ref D6_beagle.vcf --map MAP.map --threads 8 --out B10vsD6
I get this error:
ERROR: Variant physical position must be monotonically increasing Ch2:66 66…
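
Before rerunning selscan, it may help to locate the first record whose position fails to increase. The sketch below is a rough pre-check, not selscan's own logic: the function name first_non_increasing is hypothetical, and whether equal consecutive positions count as a violation is an assumption exposed through the strict parameter.

```python
from typing import Optional, Sequence


def first_non_increasing(positions: Sequence[int], strict: bool = True) -> Optional[int]:
    """Return the index of the first position that breaks increasing order.

    With strict=True a repeated position also counts as a violation.
    Returns None when the whole sequence is in order.
    """
    previous = None
    for index, position in enumerate(positions):
        if previous is not None:
            out_of_order = position <= previous if strict else position < previous
            if out_of_order:
                return index
        previous = position
    return None


# Toy physical positions, as might be read from the fourth column of a map file.
toy_positions = [10, 20, 20, 30]
bad_index = first_non_increasing(toy_positions)
```

Running this per chromosome on the map or VCF positions points at the offending record, which can then be sorted or de-duplicated.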


sniffles failed detect SV on minimap2 alignments

When I use ngmlr, sniffles worked. The coverage is more than 90%. The code I sent on GitHub is exactly what it generated; I don’t think there is any error.


Benchmarking the NVIDIA Clara Parabricks germline pipeline on AWS

This blog post was contributed by Ankit Sethia, PhD, and Timothy Harkins, PhD, at NVIDIA Parabricks, and Olivia Choudhury, PhD,  Sujaya Srinivasan, and Aniket Deshpande at AWS. This blog provides an overview of NVIDIA’s Clara Parabricks along with a guide on how to use Parabricks within the AWS Marketplace. It…


rust-bio-tools 0.35.0 – Docs.rs

rust-bio-tools-0.35.0 is not a library. A set of ultra fast and robust command line utilities for bioinformatics tasks based on Rust-Bio. Rust-Bio-Tools provides a command rbt, which currently supports the following operations: a linear time implementation for fuzzy matching of two vcf/bcf files (rbt vcf-match) a vcf/bcf to txt converter,…


bcftools merge of over 9000+ vcf files

Hi all, I have around 9000+ vcf files that I’m trying to merge using bcftools merge. They are all located in their own folder so essentially I have a folder containing 9000+ separate folders, each containing one vcf.gz file. I have tried out the following code via this tutorial bcftools…

Continue Reading bcftools merge of over 9000+ vcf files
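For the layout described above (one parent folder, one sub-folder per sample, one vcf.gz in each), a common approach is to collect the paths into a list file and pass it to `bcftools merge --file-list`. The sketch below only gathers the paths and builds the command; the function names and the example paths are hypothetical. It assumes the inputs are bgzip-compressed and indexed, which `bcftools merge` requires, and with thousands of inputs the system's open-file limit may become an issue, in which case merging in batches is a frequently used workaround.

```python
from pathlib import Path


def write_merge_list(parent_dir, list_path):
    """Find one-level-deep *.vcf.gz files under parent_dir, write them one
    per line to list_path, and return the sorted list of paths."""
    paths = sorted(str(p) for p in Path(parent_dir).glob("*/*.vcf.gz"))
    Path(list_path).write_text("".join(p + "\n" for p in paths))
    return paths


def merge_command(list_path, out_path):
    """Build (but do not run) a bcftools merge command that reads its inputs
    from list_path and writes a compressed VCF to out_path."""
    return ["bcftools", "merge", "--file-list", str(list_path),
            "-O", "z", "-o", str(out_path)]
```

The returned list can be handed to `subprocess.run(..., check=True)` once the list file has been inspected.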

Sort vcf file based on Satsuma synteny output

Hi all, I have been using Satsuma synteny to assign scaffolds (from the genome of my study species) to the chromosomes of a closely related species. I now have a tab-delimited file that lists these scaffolds in the order that…

Continue Reading Sort vcf file based on Satsuma synteny output

Dragen-gatk for trio

Hi everyone, the Dragen gatk pipeline works great for a single sample. However, I would like to know if anyone has used this pipeline for a trio. If so, how did you do it? It is recommended to do hard filtering based on QUAL, but how…

Continue Reading Dragen-gatk for trio

Reference panel data to be used for GCTA-COJO

I performed a genome-wide meta-analysis based on summary statistics from the four cohorts to identify significant loci. Next, I would like to perform a conditional analysis using GCTA-COJO to search for SNPs independent of significant lead SNPs. I know that GCTA…

Continue Reading Reference panel data to be used for GCTA-COJO

Blast command line pipeline not working

Hello, I am now running a local BLAST pipeline on macOS. The goal here is to take the interval of the 5 best hits and then extract the SNP variants from multiple vcf.gz files. But I am facing an error which I cannot solve…

Continue Reading Blast command line pipeline not working

Padding out a GVCF file with 1000G exomes to get gatk VariantRecalibrator working with a small sample

I’ve got sequencing data for a small 500 bp amplicon from a few samples. The GATK Best Practices suggest running VariantRecalibrator on the GVCF files I generate. I’m trying to get this working, but I get an error about “Found annotations with zero variances”. Reading the gatk manual and other posts…

Continue Reading Padding out a GVCF file with 1000G exomes to get gatk VariantRecalibrator working with a small sample

Large-scale genome-wide study reveals climate adaptive variability in a cosmopolitan pest

Genomic data. The foundational resource for this study was a dataset of 40,107,925 nuclear SNPs sequenced from a worldwide sample of 532 DBM individuals collected in 114 different sites based on our previous project (15). DNA was extracted from each of the 532 individuals using DNeasy Blood and Tissue Kit (Qiagen,…

Continue Reading Large-scale genome-wide study reveals climate adaptive variability in a cosmopolitan pest

how to add reference alleles to VCF?

I’m converting gVCFs to VCF, but the reference alleles are missing. An example below: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 180525_FD02929177 1 97547947 . T . . . DP=31 GT:DP:RGQ 0/0:31:81 1 97915614 . C . . . DP=40…

Continue Reading how to add reference alleles to VCF?

gatk VariantRecalibrator positional argument error

I’m trying to recalibrate my vcf using gatk VariantRecalibrator, but keep getting an error “Illegal argument value: Positional arguments were provided”. But I don’t know what this means, or how to correct it! Here’s my call: gatk VariantRecalibrator -R "/Volumes/Seagate Expansion Drive/refs/hg38/gatk download/Homo_sapiens_assembly38.fasta" -V "$OUT"/results/variants/"$SN".norm.vcf.gz -AS --resource hapmap,known=false,training=true,truth=true,prior=15.0: "/Volumes/Seagate…

Continue Reading gatk VariantRecalibrator positional argument error

Senior Bioinformatics Scientist II/ Staff Bioinformatics Scientist

Inscripta was founded in 2015 and recently launched the world’s first benchtop Digital Genome Engineering platform. The company is growing aggressively, investing in its leadership, team, and technology with a recent $150 million financing round led by Fidelity and T. Rowe Price. The company’s advanced CRISPR-based platform, consisting of an instrument, reagents,…

Continue Reading Senior Bioinformatics Scientist II/ Staff Bioinformatics Scientist

Why invariant blocks in GATK consistently have very low quality scores (but not variant sites)

I am using the latest GATK 4.1.2.0 to do variant calling on insect samples with a reference genome of a closely related species. The heterozygosity is approximately 0.02. I followed the standard pipeline of “HaplotypeCaller --> GenomicsDBImport --> GenotypeGVCFs” to get my unfiltered VCFs; however, although my variant sites have…

Continue Reading Why invariant blocks in GATK consistently have very low quality scores (but not variant sites)

No quality in non-variant sites GATK

Heys, I am doing SNP calling with HaplotypeCaller BP_RESOLUTION, CombineGVCFs with convert-to-base-pair-resolution and GenotypeGVCFs with include-non-variant-sites in GATK, and when I get my vcf file, the non-variant sites do not have any quality at all: #CHROM POS ID REF ALT QUAL FILTER…

Continue Reading No quality in non-variant sites GATK

How to call LOH with FreeC

Good morning, I am trying to infer loss of heterozygosity (LOH) from WGS data using FreeC. For this purpose, I am using these parameters in the “[BAF]” section of the configuration file: [BAF] makePileup = My_somaticVCF.vcf.gz fastaFile = hg19.fa SNPfile = hg19_snp142.SingleDiNucl.1based.txt.gz When…

Continue Reading How to call LOH with FreeC

How can I calculate LD?

I have sequencing data in .vcf format from expanded whole exome sequencing of 2 trios (father, mother & index); one family is affected and the other is not. I want to find out whether any linkage block is present in any one of…

Continue Reading How can I calculate LD?

What is the single nucleotide polymorphism database (dbSNP)?

The Single Nucleotide Polymorphism Database (dbSNP) is a free public archive for genetic variation within and across different species developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI). Furthermore, are there any databases for single nucleotide polymorphisms? As there…

Continue Reading What is the single nucleotide polymorphism database (dbSNP)?

How to merge vcf files

Hi, I have 90 VCF files which I am looking to merge into one VCF file. I am trying to use VCFtools to merge these files. For that I am following the below process but while using vcf-merge command is not able to merge…

Continue Reading How to merge vcf files

vcftools- extract allele frequencies from pooled samples on a sample by sample basis

I am looking to extract minor allele frequencies using vcftools for pooled samples on a sample by sample basis, as allele frequency output by vcftools is only on a site basis. Further, the reference alleles should…

Continue Reading vcftools- extract allele frequencies from pooled samples on a sample by sample basis

Filtering of rare variants

Hello, I have exome datasets from 6 samples, of which four are affected and two are unaffected. I did joint genotyping for all six samples and annotated the vcf file. From this annotated vcf file, I have to look for rare variants shared…

Continue Reading Filtering of rare variants

SnpEff does not create htmlStats

SnpEff does not create htmlStats with the below command: $ snpEff eff -Xmx20G LAB330 LabUsa16cWild01-20_L-Q.vcf | head ##fileformat=VCFv4.0 ##filedate=20210414 ##source=SGSautoSNP ##reference=NbLab330.genome.softmasked.fasta ##phasing=allhomozygote ##INFO=<ID=DP,Number=1,Type=Integer,Description="Read depth over all samples"> ##INFO=<ID=PL,Number=0,Type=String,Description="Panel"> ##SnpEffVersion="5.0e (build 2021-03-09 06:01), by Pablo Cingolani" ##SnpEffCmd="SnpEff LAB330 LabUsa16cWild01-20_L-Q.vcf " ##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation…

Continue Reading SnpEff does not create htmlStats

How to call variant by --max-depth for RNAseq

Hi everyone! I have a query regarding variant calling from a high coverage site on the basis of the maximum likelihood variant. I have a BAM file of mapped RNA-seq data. I called variants using the below command. “bcftools mpileup --max-depth 10000 -Oz -f ref.fa sample.bam | bcftools call -mv -Oz -o…

Continue Reading How to call variant by --max-depth for RNAseq

Parallel genomic responses to historical climate change and high elevation in East Asian songbirds

Extreme environments present profound physiological stress. The adaptation of closely related species to these environments is likely to invoke congruent genetic responses resulting in similar physiological and/or morphological adaptations, a process termed “parallel evolution” (1). Existing evidence shows that parallel evolution is more common at the phenotypic level than at…

Continue Reading Parallel genomic responses to historical climate change and high elevation in East Asian songbirds

How to extract homologous sequence data from multiple .vcf.gz files?

Hello, I have short read data from multiple samples stored as scaffolds.vcf.gz files. I have some gene sequences of interest. I want to find the closest homologous sequence of the respective genes from all the other samples. At first,…

Continue Reading How to extract homologous sequence data from multiple .vcf.gz files?

VCF samtools

Hello, I am having trouble when doing variant calling with samtools. I am getting only the header and no variants. If I instead use FreeBayes, I do get a lot of variants, and with GATK, I get just a few. What can the problem be? Do…

Continue Reading VCF samtools

One-hot encoding for PLINK or VCF

I want to write an autoencoder for SNP data. Is there an established way to one-hot-encode binary PLINK or VCF input? I believe that can be done by manipulating PLINK’s bed file but am afraid to do something wrong. By one-hot encoding I…

Continue Reading One-hot encoding for PLINK or VCF
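As an illustration of what a one-hot encoding of biallelic SNP genotypes could look like, here is a minimal sketch. It is not an established standard and may differ from what the (truncated) question has in mind. It assumes genotypes have already been decoded to alternate-allele counts 0, 1 or 2, with `None` for a missing call, and it encodes a missing call as an all-zero row; both the input convention and the function name `one_hot_genotypes` are assumptions made for this example.

```python
def one_hot_genotypes(dosages, n_classes=3):
    """One-hot encode a sequence of genotype dosages.

    dosages: iterable of ints in [0, n_classes) or None for missing calls.
    Returns a list of rows, each a list of n_classes zeros and ones.
    Missing calls become an all-zero row (an assumption, not a standard)."""
    encoded = []
    for dosage in dosages:
        row = [0] * n_classes
        if dosage is not None:
            if not 0 <= dosage < n_classes:
                raise ValueError(f"dosage out of range: {dosage}")
            row[dosage] = 1
        encoded.append(row)
    return encoded
```

A separate fourth class for missing calls is an equally reasonable choice; the all-zero row simply keeps the width at three per SNP.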

How to handle VCFs from the same sample but using different aligners and variant callers?

Hi, I’m using whole-exome sequencing (WES) for somatic variant calling. During the process, I tried to follow the approach described here: pubmed.ncbi.nlm.nih.gov/28420412/ Basically my workflow is as follows: FASTQ preprocessing: Using 2 aligners (BWA-MEM, Bowtie2) BAM calibration Variant calling: Using 3 software (Mutect2, Strelka2, Lancet) Variant filtering: I keep just…

Continue Reading How to handle VCFs from the same sample but using different aligners and variant callers?

Somatic Variant Calling

Hi, I need to call somatic variants from a BAM file of a cancer panel. Can anyone please suggest a suitable tool for calling the variants and generating a VCF file? Thank you. “Suitable” is very context-dependent, are you working…

Continue Reading Somatic Variant Calling

add gene names to ‘isec’ output files of bcftools

I had two vcf files and I used isec from the bcftools software to find typical and common mutations between samples. The output of the isec function was four vcf.gz files, as shown below: isec_output/0000.vcf.gz would be variants unique to 1.vcf.gz isec_output/0001.vcf.gz…

Continue Reading add gene names to ‘isec’ output files of bcftools

Detecting chromosome notation in vcf files

Hi, I recently ran into an issue where a pipeline I wrote did not work on a new vcf file. As it turns out the problem was simply that the vcf file used “chr7” instead of just 7 for chromosome notation which confused…

Continue Reading Detecting chromosome notation in vcf files
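The “chr7” versus “7” mismatch described above can be detected up front by looking at the first column of the data lines. The following is a small sketch under the assumption of a plain-text, tab-separated VCF; the function name `detect_chrom_notation` and its return labels are hypothetical, and it treats any contig name that does not start with "chr" as the plain style.

```python
def detect_chrom_notation(vcf_lines):
    """Classify the chromosome naming used in the CHROM column.

    Returns 'chr' if every record uses a 'chr' prefix, 'plain' if none
    does, 'mixed' if both styles occur, and 'unknown' if there are no
    data records at all."""
    seen_chr = False
    seen_plain = False
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom = line.split("\t", 1)[0]
        if chrom.startswith("chr"):
            seen_chr = True
        else:
            seen_plain = True
        if seen_chr and seen_plain:
            return "mixed"
    if seen_chr:
        return "chr"
    if seen_plain:
        return "plain"
    return "unknown"
```

A pipeline could call this once per input and either rename contigs or stop with a clear message when the result does not match what downstream steps expect.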

Making consensus sequence for each haplotype

I’m dealing with paired-end amplicon sequencing data. I’ve produced a GVCF file with haplotype calls using: gatk HaplotypeCaller -R $REF -I "$BAM" -O "$OUT"/results/variants/${SN}_HaplotypeCallerPGT.vcf -ERC GVCF The vcf file it produces contains the PGT flag, and variants are called in the format…

Continue Reading Making consensus sequence for each haplotype

state and usage of compressed file standards better than BAM and FASTQ

Extra compressed formats for raw/aligned reads and variant tables have been around for some time but I think they saw slow adoption. Our current disk space usage is making us have another look at switching to file…

Continue Reading state and usage of compressed file standards better than BAM and FASTQ

Error: PLINK does not support more than 2^31

Hi there, I was converting my vcf file into bfiles in plink, and I got an error ‘Error: PLINK does not support more than 2^31 - 3 variants’. We recommend other software, such as PLINK/SEQ, for very deep…

Continue Reading Error: PLINK does not support more than 2^31

Split a large VCF into user defined regions

Hello everyone! Does anyone have an idea how to split huge vcf files into smaller vcf files by user-defined regions? I have a bed file with my regions of interest (around 300) and I would like to extract into 300 different…

Continue Reading Split a large VCF into user defined regions
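One way to approach the splitting question above is to turn each BED interval into a `bcftools view -r` call that writes its own output file. The sketch below only builds the commands. It assumes a standard BED file (0-based start, end-exclusive), converts to the 1-based inclusive regions that bcftools expects, and assumes the input VCF is bgzip-compressed and indexed, which region queries require. The function name and the output naming scheme are hypothetical.

```python
def bed_to_view_commands(bed_lines, vcf_path, out_prefix):
    """Build one 'bcftools view' command (as an argument list) per BED
    interval. BED starts are 0-based, so 1 is added to get the 1-based
    inclusive start used in bcftools region strings."""
    commands = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and BED header lines
        chrom, start, end = line.split()[:3]
        first = int(start) + 1
        last = int(end)
        region = f"{chrom}:{first}-{last}"
        out_path = f"{out_prefix}.{chrom}_{first}_{last}.vcf.gz"
        commands.append(["bcftools", "view", "-r", region,
                         "-O", "z", "-o", out_path, vcf_path])
    return commands
```

Each argument list can be passed to `subprocess.run(..., check=True)`; with around 300 regions, running them sequentially in a loop is usually sufficient.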

how to draw more than one regression line for a plot of dissimilarity matrices

I have created a genetic distance matrix from VCF SNP data and a matrix of geographic distance from x y coordinates, both with the R function “dist”, then I plot them adding two-dimensional Kernel density…

Continue Reading how to draw more than one regression line for a plot of dissimilarity matrices