Tag: VCF
Bcftools equivalent of vcftools conversion to ped & map
I am converting a VCF to ped & map in vcftools thus: vcftools --gzvcf ZZZZZTYT.vcf.gz --plink --out ZZZZZTYT which works fine. However, I have been searching and searching: can bcftools do the same with a BCF?…
Z697 – YFull YTree Info
R-Z697 – YFull YTree Info SNPs currently defining R-Z697 Z697 Sample ID Country / Language Info Ref File Testing company Statistics Status YF009397 Sweden (Västra Götalands län) R-Z697* —— Hg19 .BAM FTDNA (Y500) 81X, 14.4 Mbp, 165 bp YF084333 Italy (Chieti) R-FT285492 —— Hg38 .BAM Dante Labs 14X, 23.4…
difficulty filtering vcf file with vcftools
I had a large VCF file named “common_known_variants.vcf” which contains all known human variants, downloaded from ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/00-common_all.vcf.gz -O common_known_variants.vcf.gz I’m trying to extract the known variants from only chromosomes 1, 2, 3, 9, 22, and X and write them to a new VCF file with the…
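As an illustration only (this is not the poster's vcftools command), the filtering goal described above can be sketched in plain Python: keep every header line and only those records whose CHROM column is in the wanted set. The function name `filter_chromosomes` and the miniature VCF are hypothetical, and the sketch assumes the file names chromosomes as 1, 2, ..., X rather than chr1, chr2, ..., chrX.

```python
def filter_chromosomes(lines, wanted):
    """Yield header lines and records whose CHROM (first column) is in `wanted`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep all metadata lines and the column header
        elif line.split("\t", 1)[0] in wanted:
            yield line  # data record on one of the requested chromosomes


# Hypothetical miniature VCF, used only to demonstrate the function.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\trs1\tA\tG\t.\t.\t.\n",
    "4\t200\trs2\tC\tT\t.\t.\t.\n",
    "X\t300\trs3\tG\tA\t.\t.\t.\n",
]
wanted = {"1", "2", "3", "9", "22", "X"}
kept = list(filter_chromosomes(example, wanted))
```

For a compressed file, the same generator could be fed from `gzip.open(path, "rt")` instead of a list.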
Error in BAFFromGVCFs – GenotypeGVCFs
Bug Report Affected module(s) or script(s) Module00c/BAFFromGVCFs/GenotypeGVCFs Affected version(s) Description I’m running GATKSVPipelineBatch and I got the following error in the GenotypeGVCFs task: A USER ERROR has occurred: Input /tmp/scratch/bean-resources/broad-references/v0/Homo_sapiens_assembly38.dbsnp138.vcf must support random access to enable queries by interval. If it’s a file, please index it using the bundled tool…
Latest dbSNP VCF
This is the directory you’re looking for: ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/ curl -s ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.39.gz | zcat | head ##fileformat=VCFv4.2 ##fileDate=20210513 ##source=dbSNP ##dbSNP_BUILD_ID=155 ##reference=GRCh38.p13 ##phasing=partial ##INFO=<ID=RS,Number=1,Type=Integer,Description="dbSNP ID (i.e. rs number)"> ##INFO=<ID=GENEINFO,Number=1,Type=String,Description="Pairs each of gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a…
Missing data per site
Hi, I want to calculate missing-data statistics for each site in my VCF file. Using vcftools --missing-site gives wrong stats for several sites. Is there any other way to calculate it? Thank you! I have 36 samples and here is an example of the vcftools --missing-site output…
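One way to cross-check such numbers independently is to count missing genotypes directly from each VCF record. The sketch below is a minimal, hypothetical helper (`site_missingness` is not part of any tool); it assumes a genotype is missing only when every allele in the GT field is ".", and it counts per sample, which may differ from how vcftools tallies missing data.

```python
def site_missingness(record_line):
    """Return (n_missing, n_samples, fraction_missing) for one VCF data line."""
    fields = record_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
    samples = fields[9:]
    n_missing = 0
    for sample in samples:
        parts = sample.split(":")
        gt = parts[gt_index] if gt_index < len(parts) else "."
        alleles = gt.replace("|", "/").split("/")
        if all(allele == "." for allele in alleles):
            n_missing += 1  # assumption: missing means no allele was called
    fraction = n_missing / len(samples) if samples else 0.0
    return n_missing, len(samples), fraction


# Hypothetical record with three samples, one of them uncalled.
record = "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:10\t./.:0\t1/1:8\n"
result = site_missingness(record)
```

Applying the function to every non-header line gives a per-site table that can be compared against the vcftools output.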
bedtools intersect doesn’t return a VCF file?
I am filtering a VCF file with a BED file using bedtools. I have carried this out successfully with bedtools intersect -wb -a myVCF.vcf -b myBEDfile.bed > output.txt However, what I want is to get a VCF file with the metadata and…
Hard filtering on GATK HaplotypeCaller giving multiple warnings
I’m using this pipeline for deriving variants from RNA sequencing data: github.com/modupeore/VAP which uses specific versions of various tools, including HaplotypeCaller from GATK (v3.8-0-ge9d806836). The final step is a set of hard filters on the called variants (applied using VariantFilter), but looking at the log files, there are a lot…
How Can I Merge VCF File ?
A secure and trustworthy way to merge several VCF files into a single VCF is to use an effective VCF Merge Tool. One of my colleagues has just used such a VCF Merge Tool, which let him merge multiple VCF files while maintaining high data integrity….
snp – Reference variant detected as altered one in bam file
I received several .bam files from the manufacturer and used four callers (samtools, freebayes, haplotypecaller, deepvariant) to find sequence variants. In the resulting .vcf files, I took a closer look at some calls. I found an interesting homozygous one, rs477033 (C/G Ref/Alt), with flag ‘COMMON=0’ and very low MAF. I also…
Bioinformatics Scientist for Whole Genome and Whole Exome Sequencing
The NeuroGenomics and Informatics (NGI) Center, led by Dr. Carlos Cruchaga at Washington University School of Medicine, is recruiting a Bioinformatics Scientist to work on Whole Genome and Whole Exome Sequencing. We are seeking an experienced, self-motivated, self-driven scientist…
Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes
Sequencing data We used publicly available sequencing data from the GIAB consortium45, 1000 Genomes Project high-coverage data46 and Human Genome Structural Variation Consortium (HGSVC)4. All datasets include only samples consented for public dissemination of the full genomes. Statistics and reproducibility For generating the assemblies, we used all 14 samples for…
how to extract unique variants from GVCF
[note: cross-posted on GATK forum – still awaiting a response] I have a GVCF (generated using GATK’s HaplotypeCaller w/ -ERC GVCF parameter) of 36 related samples and would like to determine the (potentially de novo) variants that are unique to each sample….
wrong number of fields ?
Error occurrence after merging files with bcftools: wrong number of fields? I have multiple VCFs of CASES and CONTROLS variations annotated by VEP, SnpEff, SnpSift. first pair vcf -> only variations| CASES and CONTROLS second pair vcf -> variations + SnpEff | CASES and CONTROLS third pair vcf->…
L1193 – YFull YTree Info
I-L1193 – YFull YTree Info SNPs currently defining I-L1193 L1193 FGC87558 Y72031 Sample ID Country / Language Info Ref File Testing company Statistics Status ASH1 Ireland (Tipperary) I-L1193* —— Hg19 .BAM Ancient 1X, 10.5 Mbp, 101 bp PB581 Ireland (Clare) I-L1193* —— Hg19 .BAM Ancient 2X, 15.8…
Y18411 – YFull YTree Info
J-Y18411 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF072520 Albania J-BY111710 —— Hg19 .BAM Dante Labs 10X, 22.8 Mbp, 151 bp YF067307 Palestine (Nablus) J-BY111710 —— Hg38 .BAM FTDNA (Y700) 34X, 18.7 Mbp, 151 bp NA20827 Italy (Firenze) J-CTS3330 —— Hg19…
How to Merge VCF files in Windows 10
Many organizations working on VCF have to face collecting and combining emails. Hiring technicians increase the data management cost. Along with the disadvantage, downtime is a big issue. It hampers work. Technicians often try to fix the problem manually. It is a time-consuming process, so trusting a vcf merge application is…
Variant quality and filters on GATK HaplotypeCaller generated VCFs
Hi, I am analysing human WGS data to diagnose rare inherited diseases. I followed the GATK Best Practices Guidelines for “Germline short variants discovery” for single-sample data to generate a VCF using HaplotypeCaller. The guidelines then point to the use…
Merge only bim files with plink
Hello, for the same dataset they provide single BED and FAM files covering all the chromosomes. However, the BIM files are split by chromosome. I would like to generate the VCF file with the genotyping calls of all chromosomes but I need…
BioInformatics Product Manager at Helix (remote)
You + Helix Helix is a place where innovators and doers gather in order to drive significant progress in population genomics. We have come together to work at the intersection of clinical care, research, and genomics. If you’re excited by the idea of making a meaningful impact and joining a…
rna seq – RNAseq SNP discovery: deciding upon filters and dealing with allele expression bias
I am working with non-model plant RNA samples which have been deep sequenced and analysed using the STAR aligner under default parameters. Aim We would like to conduct SNP discovery of these samples. Objective Our ultimate goal with this genotypic data is to search for variants (both SNPs and indels)…
Parallel reduction in flowering time from de novo mutations enable evolutionary rescue in colonizing lineages
Díaz, S. et al. Summary for Policymakers of the Global Assessment Report on Biodiversity and Ecosystem Services of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES, 2019). Fisher, R. A. The correlation between relatives on the supposition of Mendelian inheritance. Earth Environ. Sci. Trans. R. Soc. Edinb. 52,…
using ANNOVAR annotation clinvar database out wrong position
Hello Biostars, I was trying to annotate the VCF using ANNOVAR, but I get wrong output; it seems my clinvar database is not suitable. bcftools_callCommand=call -m -v -o /project/plantform/20220316PCR/03.amplify/L2107973CFD7G5kxT1/L2107973CFD7G5kxT1.variation.vcf /project/plantform/20220316PCR/03.amplify/L2107973CFD7G5kxT1/L2107973CFD7G5kxT1.mpileup.vcf
M8498 – YFull YTree Info
B-M8498 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF004283 Saudi Arabia B-M8498* —— Hg19 .BAM FTDNA (Y500) 43X, 13.7 Mbp, 165 bp HGDP00992 Namibia B-M7650* —— Hg38 .BAM Scientific 18X, 23.5 Mbp, 151 bp YF013963 —— B-Y82361 —— Hg38 .BAM FTDNA…
FGC15109 – YFull YTree Info
I-FGC15109 – YFull YTree Info SNPs currently defining I-FGC15109 FGC15109 Sample ID Country / Language Info Ref File Testing company Statistics Status SZ43 Hungary (Somogy) I-BY138* —— Hg19 .BAM Ancient 8X, 22.8 Mbp, 32 bp YF010533 —— I-BY138* —— Hg19 .BAM FTDNA (Y500) 73X, 14.9 Mbp, 165 bp YF019250…
bedtools -u not giving unique files
These are the steps I’m following. The first step, extracting the sample using a BED file, is this (here the BED file is the input BED file converted to Hg38): tabix -h -R Hg19_to_Hg38_sorted.bed.gz gnomad.genomes.v{g_version}.hgdp_tgp.chr{chr}.vcf.bgz | perl {vcftools} -c {sample_name} > {sample_name}_out.vcf’ output({sample_name}_out.vcf’) chr2 113982416 rs56177103 TATAAAATAAAATAAA…
bam – Detect mutation context in a read of a sam file
That kind of custom fiddling with reads and variants is very cumbersome, non-standard and also error-prone. Run a standard variant calling pipeline and then filter for the mutations that you want. Then extract the variant position (so the coordinates) and get the variant context from the reference genome. Using individual…
vcf – Why does GATK produce both 0/1 and 1/0 genotypes in the same file? Are the two not equivalent?
I have always thought that 1/0 and 0/1 in VCF genotype fields are equivalent. And yet, GATK uses both. For example, these are two variants called in the same sample and the same run of GATK 4.1.4.0: chr7 117120317 . ATTCATTGTTTTGAAAGAAAGATGGAAGAATGAACTGAAG A 748.97 . AC=1;AF=0.5;AN=2;DP=64;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;QD=11.89;SOR=7.223 GT:AD:DP:GQ:PL:SB 1/0:0,36:63:99:2294,1042,933:0,0,0,36 chr7 117120306 ….
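In the VCF format, "/" separates unphased alleles and "|" separates phased ones, so an unphased 1/0 and 0/1 name the same unordered pair of alleles. When comparing calls, one option is to normalise unphased genotypes before testing equality. The helper below is a hypothetical sketch (`canonical_genotype` is not a GATK or bcftools function) and does not explain why GATK writes both orders.

```python
def canonical_genotype(gt):
    """Sort the alleles of an unphased genotype; leave phased genotypes untouched."""
    if "|" in gt:
        return gt  # phased: allele order carries haplotype information
    alleles = gt.split("/")
    # Called alleles sort numerically; missing alleles (".") sort last.
    alleles.sort(key=lambda a: (a == ".", int(a) if a != "." else 0))
    return "/".join(alleles)


same_call = canonical_genotype("1/0") == canonical_genotype("0/1")
phased_kept = canonical_genotype("1|0")
```

With this normalisation the two records quoted above compare as the same heterozygous genotype, while phased calls keep their original order.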
split gtex genotype data by chromosomes.
Hello, I used and edited the command line to use --vcf to import the VCF file. I used these commands: for chr in $(seq 1 22); do plink --vcf /dbGAP/GTEx_Analysis_2017-06-05_v8_WholeExomeSeq_979Indiv_VEP_annot.vcf.gz --chr $chr --recode --out…
Understanding the number of intersection in bedtools jaccard
Hello, I am using bedtools jaccard to compare two vcf files, as: bedtools jaccard -a ancestors.calls.norm.snp.vcf.gz -b GC078310.calls.norm.snp.vcf.gz intersection union-intersection jaccard n_intersections 1606899 1806667 0.889427 1536700 What I do not get is why n_intersections is equal to 1536700. Especially, the difference…
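The relationship between the four reported columns can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the interpretation in the comments (first two columns in base pairs, last column a count of overlapping intervals) is my reading of the bedtools documentation and should be treated as an assumption.

```python
# Values copied from the bedtools jaccard output quoted above.
intersection = 1606899              # assumed: base pairs shared by both files
union_minus_intersection = 1806667  # assumed: base pairs covered by either file
n_intersections = 1536700           # assumed: number of overlapping intervals

# The reported jaccard column is the first column divided by the second.
jaccard = intersection / union_minus_intersection

# If the first column is in base pairs, it can exceed the interval count
# whenever some overlapping records span more than one base.
mean_bp_per_intersection = intersection / n_intersections
```

Under that reading, a difference between `intersection` and `n_intersections` would simply reflect overlaps longer than a single base.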
FGC19851 – YFull YTree Info
R-FGC19851 – YFull YTree Info SNPs currently defining R-FGC19851 FGC19851 Sample ID Country / Language Info Ref File Testing company Statistics Status YF072967 United States (Georgia) R-FGC19851* —— Hg38 .BAM FTDNA (Y700) 34X, 18.7 Mbp, 151 bp YF009427 —— R-FGC65264* —— Hg19 .BAM FTDNA (Y500) 38X, 12.8 Mbp, 165…
Re: Quick Way to Merge Multiple VCF Files into One
vCard files(.vcf) are essential for professional, personal, and even home purposes. Users need to merge multiple vCard files when they do not manage multiple vCard files. Another reason for users to combine multiple VCF files is security concerns. Here are some of the reasons: Organize your address book contacts…
YP4024 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status ERS2478532 Turkmenistan Q-YP4024* —— Hg19 .BAM Scientific 17X, 16.7 Mbp, 151 bp YF006625 Russia (Tomskaya oblast’) / Selkup Q-YP4024* —— Hg19 .BAM FTDNA (Y500) 67X, 14.8 Mbp, 165 bp DA162 Russia (Severnaya Osetiya-Alaniya, Respublika) Q-BZ5214* —— Hg19 .BAM…
HRJOB7442 Bioinformatics Scientist 2 (Various Locations) in Nether Alderley, Macclesfield (SK10) | Almac Group (Uk) Ltd
Bioinformatics Scientist 2 Hours: 37.5 hours per week Salary: Competitive Ref No: HRJOB7442 Business Unit: Diagnostic Services Location: Craigavon or Manchester Open To: Internal and External Applicants The Company Almac Diagnostic Services is a leading stratified medicine business, specialising in biomarker-driven clinical trials. We are incredibly proud to be involved…
Genomic variation from an extinct species is retained in the extant radiation following speciation reversal
Vamosi, J. C., Magallon, S., Mayrose, I., Otto, S. P. & Sauquet, H. Macroevolutionary patterns of flowering plant speciation and extinction. Annu. Rev. Plant Biol. 69, 685–706 (2018). Rhymer, J. M. & Simberloff, D. Extinction by hybridization and introgression. Annu. Rev. Ecol. Syst. 27, 83–109 (1996)….
How to calculate r2 for IMPUTE2
Hi all, with all the help I was finally able to remove some SNPs from the VCF file and then run it through IMPUTE2. This means I have the original VCF and the imputed VCF; how do I run an r2 analysis? Is there a…
Y570 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status AF2 —— Q-Y570 Q-Y570*, Q-F746* Hg19 .BAM Ancient 1X, 1.3 Mbp, 94 bp YF093124 —— Q-M120* —— Hg38 .BAM Nebula Genomics 57X, 23.6 Mbp, 150 bp Kolyma1 Russia (Sakha, Respublika [Yakutiya]) Q-Y222276* —— Hg19 .BAM Ancient 7X, 13.4…
How to apply vcftools --diff and extract only the different variants
Hello, I am trying to apply vcftools --diff in order to extract the different variants between two VCF files. vcftools --vcf marked_IO002_tumor-pe.vcf --diff marked_IO002_normal-pe.vcf --diff-site --out t_v_n I am getting this as result: VCFtools - 0.1.16 (C)…
PF6747 – YFull YTree Info
E-PF6747 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF010216 Azerbaijan (Qəbələ) E-PF6747* —— Hg19 .BAM FTDNA (Y500) 50X, 13.7 Mbp, 165 bp YF064736 Egypt (Al Minūfīyah) E-FT97857* —— Hg38 .BAM FTDNA (Y700) 35X, 18.5 Mbp, 151 bp YF093064 Yemen (Tā’izz) E-Y280593…
PostDoc Plant Bioinformatics job with SKOLKOVO INSTITUTE OF SCIENCE AND TECHNOLOGY
Want to participate in the outstanding new area of agro-genomics? To put into practice how genetic diversity and genome-assisted breeding in crops contribute to providing healthy and high-quality food in a sustainable way to humankind? Strong in bioinformatics and interested in working with very large datasets…
java – GATK: HaplotypeCaller IntelPairHmm only detecting 1 thread
I can’t seem to get GATK to recognise the number of available threads. I am running GATK (4.2.4.1) in a conda environment which is part of a nextflow (v20.10.0) pipeline I’m writing. For whatever reason, I cannot get GATK to see there is more than one thread. I’ve tried different…
Z2039 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status YF003382 Finland (Länsi-Suomen lääni) I-Z2040* —— Hg19 .BAM FTDNA (Y500) 47X, 13.3 Mbp, 165 bp YF067917 Ireland I-FGC69701* —— Hg19 .BAM Dante Labs 9X, 22.9 Mbp, 151 bp YF078735 Belarus (Vicebskaja voblasc’) / Polish I-FGC69702 —— Hg38 .VCF…
BY7447 – YFull YTree Info
E-BY7447 – YFull YTree Info SNPs currently defining E-BY7447 BY7447 Sample ID Country / Language Info Ref File Testing company Statistics Status YF075635 Yemen (Al Bayḑā’) E-FT183181 —— Hg38 .BAM FTDNA (Y700) 39X, 18.2 Mbp, 151 bp YF067501 Yemen (Şan’ā’) E-FT183181 —— Hg38 .BAM FTDNA (Y700) 44X, 18.8 Mbp,…
Ensembl VEP gnomAD annotated allele frequencies different from gnomAD browser
I’ve annotated some variants using VEP, and was looking at the minor allele frequencies. Some of the variants had very different MAFs in the annotation than I expected (I expected MAF < 1%, whereas some annotated MAFs were >50%). I looked up the same variants on the gnomAD v3 browser,…
Bioconductor on Microsoft Azure – Microsoft Tech Community
Co-authored by: Nitesh Turaga – Scientist at Dana Farber/Harvard, Bioconductor Core Team Erdal Cosgun – Sr. Data Scientist at Microsoft Biomedical Platforms and Genomics team Vincent Carey – Professor at Harvard Medical School, Bioconductor Core Team Introduction The Bioconductor project promotes the statistical analysis and comprehension of current and emerging…
DF109 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status YF016926 Ireland R-DF109 R-DF109*, R-A18726* Hg38 .BAM FTDNA (Y500) 27X, 12.7 Mbp, 165 bp YF016394 United States (Ohio) R-DF109 R-DF109*, R-A18726* Hg38 .BAM FTDNA (Y500) 34X, 11.9 Mbp, 151 bp YF011566 Ireland (Mayo) R-DF109 R-DF109*, R-A18726*, R-FGC23742* Hg38…
Errors when compiling older version of samtools
I have downloaded a BCF file from the ricevarmap website. In order to “view” this old BCF format and convert it to a newer one, it’s said that I have to install samtools-0.1.17, which includes an older version of bcftools. When I make…
GATK HaplotypeCaller with interval list
I am trying to use the -L option of GATK HaplotypeCaller to call SNPs and short InDels within an interval list. My interval list file (top8snp.interval_list) content is as follows: 12 33029845 33030845 + rs24767598 13 40586682 40587682 + rs24748362 18 24373857 24374857 + rs8856159 21 50381146 50382146 +…
Split multiallelic SNPs to biallelic from vcf
Dear all, I have a particular vcf file like this, chrX 29 . G A,T . PASS AC=1,1;AN=3 GT:DP:HF:CILOW:CIUP:SDP 0/1/2:4839:0.003,0.001:0.002,0.0:0.005,0.003:14;0,4;2 I tried various tools to split this, but I get the following results, so the FORMAT and INFO lines are identical. chrX 29 . G A . PASS AC=1,1;AN=3;OLD_MULTIALLELIC=chrM:899:G/A/T GT:DP:HF:CILOW:CIUP:SDP…
ZP77 – YFull YTree Info
R-ZP77 – YFull YTree Info SNPs currently defining R-ZP77 ZP77 / FGC6562 Sample ID Country / Language Info Ref File Testing company Statistics Status YF008362 —— R-ZP77* —— Hg19 .BAM FTDNA (Y500) 41X, 13.8 Mbp, 165 bp YF067652 Unknown R-BY40744 —— Hg38 .BAM FTDNA (Y700) 36X, 18.7 Mbp, 151…
python – How can I fix the dash bio error: devtools cannot load source map dashbio@1.0.1 bundle.js.map?
I am implementing a website in Python with Django framework and using django-plotly-dash to display data. I am trying to use dash_bio’s IGV feature to display some chromosome data, but when I attempt to call the functionality, I receive the following errors and the callback that returns ‘dashbio.igv’ is unable…
bcftools merged vcf file assigns all variants to one sample
I’ve made one vcf file for each of three samples. I then combined them using bcftools, like so: # Make a list of vcf files to merge cat "${OUT}/results/variants/vcf_list" /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/3a7a-10.vcf.gz /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/MF3.vcf.gz /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/R507H-FB_S355_L001.vcf.gz Then merge the list: bcftools merge -l…
variant – Where should you put you cache for ensembl-vep using conda
I’ve installed vep in conda like so: conda install ensembl-vep=105.0-0 And then I installed the human cache like so: vep_install -a cf -s homo_sapiens -y GRCh38 -c /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/refs/vep --CONVERT But when I try and run vep I get an error: vep --dir_cache /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/refs/vep -i /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/cohort.norm_recalibrated.vcf -o /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/cohort.norm_recalibrated_vep.vcf Am I doing…
variant – Error running gatk HaplotypeCaller with allele specific annotations
I’ve got HaplotypeCaller working nicely in standard mode, like so: # Run HaplotypeCaller gatk --java-options "-Xmx4g" HaplotypeCaller --intervals "$INTERVALS" -R "$REF" -I "$OUT"/results/alignment/${SN}_sorted_marked_recalibrated.bam -O "$OUT"/results/variants/${SN}_g.vcf.gz -ERC GVCF But when I try in allele-specific mode, I get the following error. All I’ve done is add the -G annotations at the end,…
linux – How to fix Perl from anaconda not installing bioperl? Bailing out the installation for BioPerl
vep -i examples/homo_sapiens_GRCh38.vcf --database Can't locate Bio/PrimarySeqI.pm in @INC (you may need to install the Bio::PrimarySeqI module) (@INC contains: /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/modules /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0 /home/youssef/anaconda3/envs/ngs1/lib/site_perl/5.26.2/x86_64-linux-thread-multi /home/youssef/anaconda3/envs/ngs1/lib/site_perl/5.26.2 /home/youssef/anaconda3/envs/ngs1/lib/5.26.2/x86_64-linux-thread-multi /home/youssef/anaconda3/envs/ngs1/lib/5.26.2 .) at /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/Bio/EnsEMBL/Slice.pm line 75. BEGIN failed--compilation aborted at /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/Bio/EnsEMBL/Slice.pm line 75. Compilation failed in require at /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/Bio/EnsEMBL/Feature.pm line 84. BEGIN failed--compilation aborted at /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/Bio/EnsEMBL/Feature.pm…
Variant calls of published already assembled genomes
I have a set of short-read sequencing data for the 172 kb Epstein-Barr virus genome. We successfully called our variants against a reference genome using GATK. A publication linked below from a different population compared variants (also from short read sequencing) to…
why my VCF file generated with manta is missing genotype information
Hi, everybody, I am pretty new to coding and bioinformatics. I am using Manta as a tool to infer somatic structural variants (SVs) from a paired tumor/normal sample call. However, my somaticSV.vcf.gz file does not contain information about the genotype nor the genotype quality (there is a dot instead of…
bedtools intersect error: Invalid record in file
Hello to all I am trying to run bedtools intersect with vcf file and a bed file (my goal is to add the depth data to my VCF) I get an error running this command: bedtools intersect -a depth.bed -b fish.vcf -wa -wb > $out The error: “Error: Invalid record…
What file type does “PLINK --block” accept as input?
Hi, I have a set of SNPs (distributed over all the chromosomes) and I am trying to do some haplotype block estimation to identify whether some of them are part of the same haplotype block, etc. It seems like “PLINK --blocks”…
VEP issue: ERROR: Cache assembly version (GRCh37) and database or selected assembly version (GRCh38) do not match
Describe the issue: VEP gives errors even though my query and reference have the same assembly version. Command: $ ./vep -i examples/homo_sapiens_GRCh37.vcf --cache --refseq Cache reference details while running install.pl: ? 458 NB: Remember to use --refseq when running the VEP with this cache! downloading ftp.ensembl.org/pub/release-104/variation/indexed_vep_cache/homo_sapiens_refseq_vep_104_GRCh37.tar.gz unpacking homo_sapiens_refseq_vep_104_GRCh37.tar.gz converting cache, this may…
dbSNP specific to C57BL6J
Hi, is it possible to obtain a dbSNP file that is specific to a strain, e.g. C57BL6J? I tried looking for it on the NCBI, MGI and Jackson websites, but I can’t seem to find a strain-specific VCF. Thanks…
Failed to instantiate plugin dbNSFP in VEP
Hi Team, My VEP (version 105, installed by perl INSTALL.pl) works well. But I have some problems using the dbNSFP plugin (also installed by perl INSTALL.pl) with the VEP tool. My dbNSFP version 4.2a was installed by the following code without any warning…
help with CrossMap
Hello all, I would really appreciate your help as I am new to working with different file builds and have hit a setback lifting a VCF file from build hg38 to hg19. In essence, when using CrossMap the chromosome value gets altered. For example, below is the…
Variant physical position must be monotonically increasing
I want to calculate XPEHH for each SNP position. When I run the following command selscan --xpehh --vcf B10_beagle.vcf --vcf-ref D6_beagle.vcf --map MAP.map --threads 8 --out B10vsD6 I get this error ERROR: Variant physical position must be monotonically increasing Ch2:66 66…
sniffles failed to detect SVs on minimap2 alignments
When I use ngmlr, sniffles works. The coverage is more than 90%. The code I sent on GitHub is exactly what it generated; I don’t think there is any error…
Benchmarking the NVIDIA Clara Parabricks germline pipeline on AWS
This blog post was contributed by Ankit Sethia, PhD, and Timothy Harkins, PhD, at NVIDIA Parabricks, and Olivia Choudhury, PhD, Sujaya Srinivasan, and Aniket Deshpande at AWS. This blog provides an overview of NVIDIA’s Clara Parabricks along with a guide on how to use Parabricks within the AWS Marketplace. It…
rust-bio-tools 0.35.0 – Docs.rs
rust-bio-tools-0.35.0 is not a library. A set of ultra fast and robust command line utilities for bioinformatics tasks based on Rust-Bio. Rust-Bio-Tools provides a command rbt, which currently supports the following operations: a linear time implementation for fuzzy matching of two vcf/bcf files (rbt vcf-match) a vcf/bcf to txt converter,…
bcftools merge of over 9000+ vcf files
Hi all, I have around 9000+ vcf files that I’m trying to merge using bcftools merge. They are all located in their own folder so essentially I have a folder containing 9000+ separate folders, each containing one vcf.gz file. I have tried out the following code via this tutorial bcftools…
Sort vcf file based on Satsuma synteny output
Hi all, I have been using Satsuma synteny to assign scaffolds (from the genome of my study species) to the chromosomes of a closely related species. I now have a tab-delimited file that lists these scaffolds in the order that…
Dragen-gatk for trio
Hi everyone, the Dragen-GATK pipeline works great for a single sample. However, I would like to know if anyone has used this pipeline for a trio. If so, how did you do it? It is recommended to do hard filtering based on QUAL, but how…
Reference panel data to be used for GCTA-COJO
I performed a genome-wide meta-analysis based on summary statistics from the four cohorts to identify significant loci. Next, I would like to perform a conditional analysis using GCTA-COJO to search for SNPs independent of significant lead SNPs. I know that GCTA…
Blast command line pipeline not working
Hello, I am currently running a local BLAST pipeline on macOS. The goal here is to take the intervals of the 5 best hits and then extract the SNP variants from multiple vcf.gz files. But I am facing an error which I cannot solve….
Padding out a GVCF file with 1000G exomes to get gatk VariantRecalibrator working with a small sample
I’ve got sequencing data for a small 500 bp amplicon from a few samples. GATK best principles suggest running VariantRecalibrator on the GVCF files I generate. I’m trying to get this working, but I get an error about “Found annotations with zero variances”. Reading the gatk manual and other posts…
Large-scale genome-wide study reveals climate adaptive variability in a cosmopolitan pest
Genomic data The foundational resource for this study was a dataset of 40,107,925 nuclear SNPs sequenced from a worldwide sample of 532 DBM individuals collected in 114 different sites based on our previous project15. DNA was extracted from each of the 532 individuals using DNeasy Blood and Tissue Kit (Qiagen,…
how to add reference alleles to VCF?
I’m converting gVCFs to VCF, but the reference alleles are missing. An example below: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 180525_FD02929177 1 97547947 . T . . . DP=31 GT:DP:RGQ 0/0:31:81 1 97915614 . C . . . DP=40…
gatk VariantRecalibrator positional argument error
I’m trying to recalibrate my VCF using gatk VariantRecalibrator, but keep getting an error “Illegal argument value: Positional arguments were provided”. But I don’t know what this means, or how to correct it! Here’s my call: gatk VariantRecalibrator -R "/Volumes/Seagate Expansion Drive/refs/hg38/gatk download/Homo_sapiens_assembly38.fasta" -V "$OUT"/results/variants/"$SN".norm.vcf.gz -AS --resource hapmap,known=false,training=true,truth=true,prior=15.0: "/Volumes/Seagate…
Senior Bioinformatics Scientist II/ Staff Bioinformatics Scientist
Inscripta was founded in 2015 and recently launched the world’s first benchtop Digital Genome Engineering platform. The company is growing aggressively, investing in its leadership, team, and technology with a recent $150M financing round led by Fidelity and T. Rowe Price. The company’s advanced CRISPR-based platform, consisting of an instrument, reagents,…
Why invariant blocks in GATK consistently have very low quality scores (but not variant sites)
I am using the latest GATK 4.1.2.0 to do variant calling on insect samples with a reference genome of a closely related species. The heterozygosity is approximately 0.02. I followed the standard pipeline of “HaplotypeCaller -> GenomicDBImport -> GenotypeGVCFs” to get my unfiltered VCFs, however, although my variant sites have…
No quality in non-variant sites GATK
Hey, I am doing SNP calling in GATK with HaplotypeCaller BP_Resolution, CombineGVCFs with convert-to-base-pair-resolution and GenotypeGVCFs with include-non-variant-sites, and when I get my VCF file, the non-variant sites do not have any quality at all: #CHROM POS ID REF ALT QUAL FILTER…
How to call LOH with FreeC
Good morning, I am trying to infer loss of heterozygosity (LOH) from WGS data using Freec. For this purpose, I am using these parameters in the “[BAF]” section of the configuration file: [BAF] makePileup = My_somaticVCF.vcf.gz fastaFile = hg19.fa SNPfile = hg19_snp142.SingleDiNucl.1based.txt.gz When…
How can I calculate LD ?
I have sequencing data in .vcf format from expanded whole exome sequencing of 2 trios (father, mother & index); one family is affected and the other is not. I want to find out whether any linkage block is present in any one of…
What is the single nucleotide polymorphism database ( dbsnp )?
The Single Nucleotide Polymorphism Database (dbSNP) is a free public archive for genetic variation within and across different species developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI). Furthermore, are there any databases for single nucleotide polymorphisms? As there…
How to merge vcf files
Hi, I have 90 VCF files which I am looking to merge into one VCF file. I am trying to use VCFtools to merge these files. I am following the process below, but the vcf-merge command is not able to merge…
vcftools- extract allele frequencies from pooled samples on a sample by sample basis
I am looking to extract minor allele frequencies using vcftools for pooled samples on a sample by sample basis, as allele frequency output by vcftools is only on a site basis. Further, the reference alleles should…
Filtering of rare variants
Hello, I have exome datasets from 6 samples, of which four are affected and two are unaffected. I did joint genotyping for all six samples and annotated the VCF file. From this annotated VCF file, I have to look for rare variants shared…
SnpEff does not create htmlStats
SnpEff does not create htmlStats with the below command: $ snpEff eff -Xmx20G LAB330 LabUsa16cWild01-20_L-Q.vcf | head ##fileformat=VCFv4.0 ##filedate=20210414 ##source=SGSautoSNP ##reference=NbLab330.genome.softmasked.fasta ##phasing=allhomozygote ##INFO=<ID=DP,Number=1,Type=Integer,Description="Read depth over all samples"> ##INFO=<ID=PL,Number=0,Type=String,Description="Panel"> ##SnpEffVersion="5.0e (build 2021-03-09 06:01), by Pablo Cingolani" ##SnpEffCmd="SnpEff LAB330 LabUsa16cWild01-20_L-Q.vcf " ##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation…
How to call variants with --max-depth for RNA-seq
Hi everyone! I have a query regarding variant calling from a high coverage site on the basis of the maximum likelihood variant. I have a BAM file of mapped RNA-seq data. I called variants using the command below. “bcftools mpileup --max-depth 10000 -Oz -f ref.fa sample.bam | bcftools call -mv -Oz -o…
Parallel genomic responses to historical climate change and high elevation in East Asian songbirds
Extreme environments present profound physiological stress. The adaptation of closely related species to these environments is likely to invoke congruent genetic responses resulting in similar physiological and/or morphological adaptations, a process termed “parallel evolution” (1). Existing evidence shows that parallel evolution is more common at the phenotypic level than at…
How to extract homologous sequence data from multiple .vcf.gz files?
Hello, I have short read data from multiple samples stored as scaffolds.vcf.gz files. I have some gene sequences of interest. I want to find the closest homologous sequence of the respective genes from all the other samples. At first,…
VCF samtools
Hello, I am having trouble when doing variant calling with samtools. I am getting only the header and no variants. If I instead use FreeBayes, I do get a lot of variants, and with GATK, I get just a few. What can the problem be? Do…
One-hot encoding for PLINK or VCF
I want to write an autoencoder for SNP data. Is there an established way to one-hot-encode binary PLINK or VCF input? I believe that can be done by manipulating PLINK’s bed file but am afraid of doing something wrong. By one-hot encoding I…
How to handle VCFs from the same sample but using different aligners and variant callers?
Hi, I’m using whole-exome sequencing (WES) for somatic variant calling. During the process, I tried to follow the approach described here: pubmed.ncbi.nlm.nih.gov/28420412/ Basically my workflow is as follows: FASTQ preprocessing: Using 2 aligners (BWA-MEM, Bowtie2) BAM calibration Variant calling: Using 3 software (Mutect2, Strelka2, Lancet) Variant filtering: I keep just…
Somatic Variant Calling
Hi, I need to call somatic variants from a BAM file of a cancer panel. Can anyone please suggest a suitable tool for calling the variants and generating a VCF file? Thank you. “Suitable” is very context-dependent, are you working…
add gene names to ‘isec’ output files of bcftools
I had two vcf files and I used isec from bcftools software to find typical and common mutations between samples. The output of the isec function was four vcf.gz files, as shown below: isec_output/0000.vcf.gz would be variants unique to 1.vcf.gz isec_output/0001.vcf.gz…
Detecting chromosome notation in vcf files
Hi, I recently ran into an issue where a pipeline I wrote did not work on a new vcf file. As it turns out the problem was simply that the vcf file used “chr7” instead of just 7 for chromosome notation which confused…
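As a rough illustration of the check described in this excerpt, the sketch below classifies a VCF's chromosome naming from its first data record. The function name `detect_chrom_notation` and the example records are hypothetical and not taken from the original post; the sketch also assumes the file uses one convention throughout and only inspects the first record.

```python
def detect_chrom_notation(lines):
    """Classify the CHROM naming style of the first data record.

    Returns "chr" (e.g. chr7), "plain" (e.g. 7), or None when the input
    contains no data records.
    """
    for line in lines:
        # Skip meta-information (##...), the #CHROM header line, and blanks.
        if line.startswith("#") or not line.strip():
            continue
        # CHROM is the first tab-separated column of a VCF record.
        chrom = line.split("\t", 1)[0]
        return "chr" if chrom.startswith("chr") else "plain"
    return None


# Hypothetical in-memory records; for a real file, pass an open text handle,
# e.g. gzip.open(path, "rt") for a .vcf.gz.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr7\t140453136\t.\tA\tT\t.\t.\t.\n",
]
print(detect_chrom_notation(example))
```

A pipeline could run such a check up front and either rename contigs or fail with a clear message, rather than silently returning no matches.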
Making consensus sequence for each haplotype
I’m dealing with paired-end amplicon sequencing data. I’ve produced a GVCF file with haplotype calls using: gatk HaplotypeCaller -R $REF -I "$BAM" -O "$OUT"/results/variants/${SN}_HaplotypeCallerPGT.vcf -ERC GVCF The vcf file it produces contains the PGT flag, and variants are called in the format…
state and usage of compressed file standards better than BAM and FASTQ
Extra compressed formats for raw/aligned reads and variant tables have been around for some time but I think they saw slow adoption. Our current disk space usage is making us have another look at switching to file…
Error: PLINK does not support more than 2^31 - 3 variants
Hi there, I was converting my vcf file into bfiles in plink, and I got an error ‘Error: PLINK does not support more than 2^31 - 3 variants’. We recommend other software, such as PLINK/SEQ, for very deep…
Split a large VCF into user defined regions
Hello everyone! Does anyone have an idea how to split huge VCF files into smaller VCF files by user-defined regions? I have a bed file with my regions of interest (around 300) and I would like to extract into 300 different…
how to draw more than one regression line for a plot of dissimilarity matrices
I have created a genetic distance matrix from VCF SNP data and a matrix of geographic distance from x y coordinates, both with function “dist” of R, then I plot them adding two-dimensional Kernel density…