Tag: BCFtools
A Benchmark of Genetic Variant Calling Pipelines Using Metagenomic Short-Read Sequencing
Introduction Short-read metagenomic sequencing is the technique most widely used to explore the natural habitat of millions of bacteria. In comparison with 16S rRNA sequencing, shotgun metagenomic sequencing (MGS) provides sequence information of the whole genomes, which can be used to identify different genes present in an individual bacterium and…
PCA from plink2 for SGDP using a pangenome and DeepVariant
Hi there, I’m doing my first experiments with PCA and UMAP as dimensionality reductions to visualize a dataset I’ve been working on. Basically, I used the samples from the SGDP which I then mapped on the human pangenome for, finally, calling small variants with DeepVariant. I moved on with some…
Genomic insights into Plasmodium vivax population structure and diversity in central Africa | Malaria Journal
Hamblin MT, Di Rienzo A. Detection of the signature of natural selection in humans: evidence from the Duffy blood group locus. Am J Hum Genet. 2000;66:1669–79. Hamblin MT, Thompson EE, Di Rienzo A. Complex signatures of natural selection at the Duffy blood group…
Search for specific SNPs in VCF files of patients.
I have 490 genomes from 490 patients in VCF format. I created a multi-VCF file from these VCFs. I want to find 2 mutations (Y215C and G325R) in these patients, count the number of patients who have these SNPs…
A super-pangenome of the North American wild grape species | Genome Biology
Alston JM, Sambucci O. Grapes in the world economy. In: Cantu D, Walker MA, editors. The grape genome. Springer International Publishing; 2019. p. 1–24. Rahemi A, Dodson Peterson JC, Lund KT. Grape rootstocks and related species. Cham: Springer International Publishing; 2022. Walker MA, Heinitz C, Riaz S, Uretsky…
Multiallelic variants when merging VCF’s with GLnexus
I’m attempting to combine around 140 .g.vcf files into a single file using GLnexus on the DNAnexus platform. To examine multiallelic variants, I’m normalizing the files using the bcftools norm -m-any $file command. While merging the original VCF files (generated with GATK)…
Diversity and dissemination of viruses in pathogenic protozoa
Wang, A. L. & Wang, C. C. Viruses of the protozoa. Annu. Rev. Microbiol. 45, 251–263 (1991). Banik, G., Stark, D., Rashid, H. & Ellis, J. Recent advances in molecular biology of parasitic viruses. Infect. Disord. – Drug Targets 14, 155–167 (2015). …
Chromosome-level genome assembly of the Stoliczka’s Asian trident bat (Aselliscus stoliczkanus)
Dobson, G. E. On a new genus and species of Rhinolophidae, with description of a new species of Vesperus, and notes on some other species of insectivorous bats from Persia. J. Asiat. Soc. Bengal. 40, 455–461 (1871). Bates, P., Bumrungsri, S., Francis, C., Csorba, G. & Furey, N….
Require Genotypes in VCF file in order to output IMPUTE format.
Hello, I am trying to export a VCF file in IMPUTE format and keep getting the same error message: Code: module load htslib/1.17 module load samtools/1.17 module load bcftools/1.17 module load java/17.0.8 module load python3 module load…
Indigenous Australian genomes show deep structure and rich novel variation
Inclusion and ethics The DNA samples analysed in this project form part of a collection of biospecimens, including historically collected samples, maintained under Indigenous governance by the NCIG11 at the John Curtin School of Medical Research at the Australian National University (ANU). NCIG, a statutory body within ANU, was founded…
The landscape of genomic structural variation in Indigenous Australians
Cohorts Saliva and/or blood samples were collected from consenting individuals among four NCIG-partnered communities: Tiwi Islands (comprising the Wurrumiyanga, Pirlangimpi and Millikapiti communities), Galiwin’ku, Titjikala and Yarrabah, between 2015 and 2019. Non-Indigenous comparison data, generated from unrelated Australian individuals of European ancestry, was drawn from two existing biomedical research cohorts:…
ubuntu – Medaka: unrecognized command ‘tools’ and samtools not found
When trying to run medaka_consensus in ubuntu, I am getting the following error. I installed into a virtualenv to run on ubuntu. (medaka) ubuntu:~/medaka$ medaka_consensus -i combined.fastq -d curated.fasta -t -o ~/medaka 10 -m r941_sup_plant_g610 TF_CPP_MIN_LOG_LEVEL is set to ‘3’ [main] unrecognized command ‘tools’ Attempting to automatically select model version….
Characterizing viral species in mosquitoes (Culicidae) in the Colombian Orinoco: insights from a preliminary metagenomic study
Kraemer, M. U. et al. The global distribution of the arbovirus vectors Aedes aegypti and Ae. albopictus. Elife 4, e08347. doi.org/10.7554/eLife.08347 (2015). Bhatt, S. et al. The global distribution and burden of dengue. Nature 496, 504–507. doi.org/10.1038/nature12060 (2013). …
extract variants from 1000 Genome VCF files
Hi everyone, I have a gVCF containing genetic information from different individuals, and I would like to extract specific SNPs. The SNPs of interest are listed in a BED file with the following structure (the end position represents the real position of…
bcftools=1.18 not filtering correcting MAF
Hi, I have encountered some issues when using bcftools v1.11, v1.14 or v1.18. I want to filter MAF<=0.01 & ‘F_MISSING<0.1’ for rare-variant analysis. I have a VCF file mapped to GRCh37, left-aligned, and multi-allelic split. bcftools view -q 0.01:minor test1.vcf > test2.vcf…
r – Fst calculation from VCF files
I have four vcf files, SNPs_s1.vcf, SNPs_s2.vcf, SNPs_s3.vcf, and SNPs_s4.vcf, which contain information about SNPs. These vcf files were obtained by using the following methods: the initial input files were short-paired reads I did mapping with minimap2 ./minimap2 -ax sr ref.fa read1.fq.gz read2.fq.gz > aln.sam converted to bam file samtools…
The MetaInvert soil invertebrate genome resource provides insights into below-ground biodiversity and evolution
FAO, ITPS, GSBI, CBD & EC. State of knowledge of soil biodiversity – Status, challenges and potentialities, Report 2020. (FAO). doi.org/10.4060/cb1928en. 2020. Potapov, A. M. et al. Feeding habits and multifunctional classification of soil-associated consumers from protists to vertebrates. Biol. Rev. 97, 1057–1117 (2022). García-Palacios, P.,…
Infer ancestry for RNA-seq data
I generated VCF files with bcftools for 4 patient RNA-seq samples. I was also able to generate bed, bim, and fam files with PLINK for these files. I want some guidance on how to infer ancestry for these RNA-seq samples: How do I find…
Genomics England hiring PhD Bioinformatics Intern in London, England, United Kingdom
Company Description: Genomics England partners with the NHS to provide whole genome sequencing diagnostics. We also equip researchers to find the causes of disease and develop new treatments – with patients and participants at the heart of it all. Our mission is to continue refining, scaling, and evolving our ability to…
Issues with Chromosome Encoding and VCF Annotation in dbSNP Alpha Release
Hello, Biostars Community, I am working on creating a custom database of variants using the VCF from the latest dbSNP alpha release available at ftp.ncbi.nih.gov/snp/population_frequency/latest_release/. I have encountered a couple of issues that I’m hoping someone might help me resolve. Firstly, the chromosome encoding uses RefSeq IDs (e.g., NC_000007.12)…
vcftools
Hi, I tried this code but I couldn’t get any output. Please guide me to resolve this issue: for i in {1..2}; do vcftools --LROH --vcf Pakistan.total.vcf --out ${i} --chr i; done
PhD Bioinformatics Intern Job in Greater London, Pharmaceuticals & Life Sciences Career, Intern/Graduate Jobs in Genomics England
Company Description Genomics England partners with the NHS to provide whole genome sequencing diagnostics. We also equip researchers to find the causes of disease and develop new treatments – with patients and participants at the heart of it all. Our mission is to continue refining, scaling, and evolving our…
Quorum-sensing synthase mutations re-calibrate autoinducer concentrations in clinical isolates of Pseudomonas aeruginosa to enhance pathogenesis
Centers for Disease Control and Prevention (U.S.). Antibiotic Resistance Threats in the United States, 2019. doi.org/10.15620/cdc:82532 (2019). Centers for Disease Control and Prevention. COVID-19: U.S. Impact on Antimicrobial Resistance, Special Report 2022. doi.org/10.15620/CDC:117915 (2022). Fricks-Lima, J. et al. Differences in biofilm formation and antimicrobial resistance of Pseudomonas aeruginosa isolated from…
Bcftools consensus when reference is a deletion
Hello, I am trying to call a consensus on a VCF file like so: bcftools consensus species.vcf.gz -f Reference.fasta --absent N > Consensus.fasta Error: The site SUPER_1:173197 overlaps with another variant, skipping… I looked at this site and included the previous site…
Ancient diversity in host-parasite interaction genes in a model parasitic nematode
Van Valen, L. A new evolutionary law. Evol. Theory 1, 1–30 (1973). Woolhouse, M. E. J., Webster, J. P., Domingo, E., Charlesworth, B. & Levin, B. R. Biological and biomedical implications of the co-evolution of pathogens and their hosts. Nat. Genet. 32, 569–577 (2002). …
The genomic epidemiology of shigellosis in South Africa
Institute for Health Metrics and Evaluation. Global Burden of Disease. vizhub.healthdata.org/gbd-results/ 2019. Troeger, C. E. et al. Quantifying risks and interventions that have affected the burden of diarrhoea among children younger than 5 years: an analysis of the Global Burden of Disease Study 2017. Lancet Infect. Dis. 20, 37–59 (2020)….
Pruning with --indep-pairwise with plink 1.9
I’m new to PLINK and I would like to obtain a file with SNPs in approximate linkage equilibrium. Here is my script and the outputs of each step. If someone could tell me if there is an error in the script because at…
bcftools error: error while loading shared libraries: libcrypto.so.1.0.0
I’m having trouble installing bcftools using conda and mamba. I ran the following command: conda install -c bioconda bcftools but there are errors in the results: bcftools error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such…
Help finding the correct file version for dbSNP VCF ID replacement
I tried to use dbSNP version 156 with bcftools to replace the ID field in a reference VCF which originally contains a different position ID format. It seems the bcftools command did not work because of a numeric chromosome format in the #CHROM field, which might not be compatible with bcftools…
Whole mitochondrial genome sequencing provides new insights into the phylogeography of loggerhead turtles (Caretta caretta) in the Mediterranean Sea
Andrews S (2010) FastQC: a quality control tool for high throughput sequence data. www.bioinformatics.babraham.ac.uk/projects/fastqc Avise JC (1986) Mitochondrial DNA and the evolutionary genetics of higher animals. Philos Trans R Soc Lond B 312:325–342. doi.org/10.1098/rstb.1986.0011 Baker CS, Steel D, Calambokidis J, Falcone E, González-Peral U, Barlow J,…
update FMT/GT in VCF file using bcftools annotate
Hi – I am trying to use bcftools to overwrite the existing FMT/GT values in a VCF file, matching by the ID column, in addition to CHROM and POS. I tried creating a .txt.gz file as an annotation file, but got…
bcftools info and filter error
Hi. I am trying to do a comparative analysis of my VCF file against the VCF files of ExAC. I’m using bcftools isec here. I am getting an error that says: [W::vcf_parse_info] INFO 'HOM_CONSANGUINEOUS' is not defined in the header, assuming Type=String [W::vcf_parse_filter]…
Bbtools callvariant multisample mode, + base recalibration
Hello, it’s not clear to me if the multisample mode is actually joint calling, or if it’s simply the equivalent of bcftools norm + bcftools merge. I am specifically interested in NOT doing joint calling. I am also wondering if base…
Issue with Merging BCF Files: Invalid INFO id Error
I am attempting to combine two BCF (Binary Variant Call Format) files into a single file using the command bcftools merge output1.bcf output2.bcf --force-samples -o test.bcf. However, when I try to view the resulting BCF file, I encounter an error…
Shotgun metagenomes from productive lakes in an urban region of Sweden
Williamson, C. E., Saros, J. E., Vincent, W. F. & Smol, J. P. Lakes and reservoirs as sentinels, integrators, and regulators of climate change. Limnology and Oceanography 54, 2273–2282, doi.org/10.4319/lo.2009.54.6_part_2.2273 (2009). Cavicchioli, R. et al. 2019. Scientists’ warning to humanity: microorganisms and climate change. Nature Reviews…
SNP calling with many samples using bcftools
Hello, I aim to identify SNPs from approximately 500 BAM files (non-human). I’m opting for bcftools since GATK, even with the Spark addition, takes a substantial 6 hours per sample. My objective is to generate a single VCF file encompassing all SNPs…
set GT values to missing in VCF file for specific sample-variant combinations
Hi – I have a multi-subject vcf file and would like to set specific genotypes (GT) to missing for a set of subjects. However, the subjects that I need to set to missing are different for each variant. For example, suppose I have this: CHROM POS ID FORMAT sub1 sub2…
Clinically relevant antibiotic resistance genes are linked to a limited set of taxa within gut microbiome worldwide
Murray, C. J. et al. Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis. Lancet 399, 629–655 (2022). Brito, I. L. et al. Mobile genes in the human microbiome are structured from global to individual scales. Nature 535, 435–439 (2016). …
bcftools mpileup error: format error, unexpected A at line 1
I had a problem using bcftools. After running the command line (below), there is an error in my results. The error message stated: “Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid [E::fai_build_core] Format error, unexpected…
GATK SelectVariants --remove-unused-alternates dropping real INDELs?
I’m using a VCF that is generated by GenotypeGVCFs (so doing calibration based on a larger cohort of samples) and my goal is to only extract variants of interest to one specific sample. The VCF in the subset tends to include some variants that were present in the original joint…
Bug#1055669: bcftools: test_vcf_merge failures on armhf: Bus error
Source: bcftools Version: 1.18-1 Severity: serious Tags: ftbfs Justification: ftbfs Control: forwarded -1 github.com/samtools/bcftools/issues/2036 Dear Maintainer, bcftools currently ftbfs on armhf due to multiple test_vcf_merge failures with Bus error[1]. I already informed upstream[2]. This bug is mostly to keep track of the issue on Debian side and eventually comment on possible Debian specific…
ILIAD: a suite of automated Snakemake workflows for processing genomic data for downstream applications | BMC Bioinformatics
Pipeline architecture and configuration file Genomic data processing poses a challenge for genetic research studies because it involves multiple program dependency installations, vast numbers of samples with raw data from various next-generation sequencing (NGS) platforms, and inconsistent genetic variant ID and/or positions among datasets. The Iliad suite of genomic data…
Need Help Understanding Variant Calling Issues in De Novo Yeast Assembly
We have two sample groups of a yeast species, control (1 sample) and treatment (1 sample), for which a complete reference genome isn’t yet available for alignment or variant calling. The objective of this project is straightforward, simply wanting to…
Unzipped chromosome-level genomes reveal allopolyploid nematode origin pattern as unreduced gamete hybridization
Nematode materials and species identification Mi, Mj, and Mg were collected from farmlands in Wuhan city of Hubei province, Longyan city of Fujian province, and Changsha city of Hunan Provinces, respectively. Two Ma samples were collected from farmlands in Shenyang city of Liaoning province and Shiping city of Yunnan province….
bcftools compressing and indexing vcf files
Hello, I am trying to merge multiple VCF files using bcftools but it threw an error saying that the file is not compressed. I want to know if the right command to compress the file would be: bcftools view -I input.vcf -O z…
How to efficiently count missense mutations from an annotated vcf file?
Hi! I am currently working on my undergraduate study about the frequency of missense mutations in early and advanced stages of early luminal breast cancer. The vcf file contains 47 transcriptomic samples: 12 early (stage II) and 35 advanced…
Production of leishmanin skin test antigen from Leishmania donovani for future reintroduction in the field
Study design and ethical statement All research complies with all relevant ethical regulations. Animal experiments in this study were reviewed and approved by the Animal Care and Use Committee of the Center for Biologics Evaluation and Research, U.S. Food and Drug Administration (ASP-1999#23 and ASP-1995#26) and the National Institute of…
Single-nucleus DNA sequencing reveals hidden somatic loss-of-heterozygosity in Cerebral Cavernous Malformations
Ethical statement Our research complies with all relevant ethical regulations, including the Declaration of Helsinki and has been approved by the Institutional Review Boards of University of Chicago, Duke University and the Alliance to Cure Cavernous Malformations. Cerebral cavernous malformation lesions All human CCM tissue specimens have been previously reported18,19…
Inferring bacterial transmission dynamics using deep sequencing genomic surveillance data
Study design Experiments were performed in accordance with the New Zealand Animal Welfare Act (1999) and institutional guidelines provided by the University of Auckland Animal Ethics Committee, which reviewed and approved these experiments under application R1003. We did not use any specific randomisation process to allocate animals to a particular…
No samples in .vcf file.
I am trying to convert my vcf file into a BED format file. When I use this command: plink --vcf merge.bacteria.vcf.gz --make-bed --out merge.bacteria.vcf.bed I get the following error stating: PLINK v1.90b6.21 64-bit (19 Oct 2020) www.cog-genomics.org/plink/1.9/ (C) 2005-2020 Shaun Purcell, Christopher Chang GNU General Public License…
Comparative Analysis of Structural Variant Callers on Short-Read Whole-Genome Sequencing Data
Pang, A.W., MacDonald, J.R., Pinto, D., et al., Towards a comprehensive structural variation map of an individual human genome, Genome Biol., 2010, vol. 11, no. 5, p. R52. doi.org/10.1186/gb-2010-11-5-r52 The International HapMap Consortium, The international HapMap project, Nature, 2003, pp. 789–796. doi.org/10.1038/nature02168 Sudmant,…
Bioconductor – Bioconductor 3.18 Released
October 25, 2023. Bioconductors: We are pleased to announce Bioconductor 3.18, consisting of 2266 software packages, 429 experiment data packages, 920 annotation packages, 30 workflows and 4 books. There are 69 new software packages, 10 new data experiment packages, 8 new annotation packages, no new workflows,…
Normalisation of PLINK/VCF files?
Variant notations can vary significantly, and although there are numerous tools available to address this issue, such as bcftools +fixref or bcftools norm, there’s still a chance that something might be overlooked. Is there a comprehensive tool or pipeline that automates this process to ensure…
Bcftools Consensus – Choose Random Allele for Heterozygous Sites
Hello, I am trying to generate a haploid consensus sequence based on a VCF file. For sites which are heterozygous, I want to randomly choose one of the alleles. I don’t want to always choose reference and I don’t want…
Genome sequences of 36,000- to 37,000-year-old modern humans at Buran-Kaya III in Crimea
Hajdinjak, M. et al. Initial Upper Palaeolithic humans in Europe had recent Neanderthal ancestry. Nature 592, 253–257 (2021). Slimak, L. et al. Modern human incursion into Neanderthal territories 54,000 years ago at Mandrin, France. Sci. Adv. 8, eabj9496 (2022). …
NGS one-liner to call variants
This is a tutorial about creating a pipeline for sequence analysis in a single line. It is designed with capture/amplicon short-read sequencing of human DNA in mind and was tested with the reference exome sequencing data described here. I share the process and debugging steps…
NGS oneliner
This is a tutorial about creating a pipeline for sequence analysis in a single line. I share the process and debugging steps gone through while putting it together. Source is available at: github.com/barslmn/ngsoneliner/ I couldn’t make a longer post; the complete version of this post is at: omics.sbs/blog/NGSoneliner/NGSoneliner.html Pipeline # fastp --in1 "$R1"…
problem with bcftools syntax
Hi all! I am having difficulty with creating a bcftools command. I have a .vcf.gz file downloaded from the 1000G site and a csv file with columns chrom/pos/id/ref/alt. I would like to manipulate the downloaded vcf file so that it uses only the snps I…
Low mutation rate in epaulette sharks is consistent with a slow rate of evolution in sharks
Compagno, L. J. V. Alternative life-history styles of cartilaginous fishes in time and space. Environ. Biol. Fishes 28, 33–75 (1990). Kriwet, J., Witzmann, F., Klug, S. & Heidtke, U. H. J. First direct evidence of a vertebrate three-level trophic chain in the fossil record. Proc. Biol. Sci….
How to merge my vcf files (n=6) with existing Pf6 vcf file and do pca?
I sampled some Pf strains and had them whole-genome sequenced. Now I want to merge them with existing Pf6 data. For this I downloaded Pf6 data for all 14 chromosomes. I then used bcftools…
Troubleshooting multallelic variant merging issue
Hello, I want to recode the IIDs of imputed data .bgen files into two different filesets, and merge these (working on eye-level analyses with Regenie). As I’m only interested in dosages, I’ve converted these to .pgen using PLINK2 (ref-first as UK Biobank): plink2 --bgen data.bgen ref-first --sample data.sample --update-ids recoded_ids_a.txt --make-pgen…
ILIAD: A suite of automated Snakemake workflows for processing genomic data for downstream applications
Abstract Background: Processing raw genomic data for downstream applications such as imputation, association studies, and modeling requires numerous third-party bioinformatics software tools. It is highly time-consuming and resource-intensive with computational demands and storage limitations that pose significant challenges that increase cost. The use of software tools independent of one another,…
‘samtools’ aligned sequence utilities interface
Rsamtools-package {Rsamtools}, R Documentation. Description: This package provides facilities for parsing samtools BAM (binary) files representing aligned sequences. Details: See packageDescription('Rsamtools') for package details. A useful starting point is the scanBam manual page. Note: This package documents the following…
Help me understand “for stripped” in bcftools isec output
Hello, I was given a set of VCF files, comparing variants. The Readme gives the following command bcftools isec -p dir -n-1 -c all ref.vcf.gz S1.vcf.gz S2.vcf.gz S3.vcf.gz S4.vcf.GZ S5.vcf.gz S5.vcf.gz First, if I understand the doc correctly, -n-1 means “looks for SNPs found at most in a single file”?…
Most sensible way to find private SNPs from a multisamples vcf with bcftools
Hello, this question is somehow complementary to what I asked yesterday here: Using bcftools to find unique alt homozygous sites Now let’s say I want to find the SNPs 0/1 unique to the sample D3A350g_bcftools2 (see below) I know I can use bcftools view -s D3A350g_bcftools2.bcf -x all_bcftools2_merged.vcf But there…
public databases – Converting VCF format to text for use with PLINK and understanding column mapping
I successfully completed Nature PRS tutorial, which is based on PLINK. Turning to my real data, I downloaded ukb-d-20544_1.vcf.gz. Now I’m facing the problem that I seem to be unable to use it in PLINK or find the correct data format to download at all, and I am a bit…
Using bcftools to find unique alt homozygous sites
Hello, I have a vcf with 20 samples. I want to find for each sample the sites that are 1/1, only in that sample (so other samples must have genotypes 0/1 or 0/0). I know I can use filters such as 'GT="aa"'. However, how do I say GT="aa" for sample…
Genotyping, sequencing and analysis of 140,000 adults from Mexico City
Recruitment of study participants The MCPS was established in the late 1990s following discussions between Mexican scientists at the National Autonomous University of Mexico (UNAM) and British scientists at the University of Oxford about how best to measure the changing health effects of tobacco in Mexico. These discussions evolved into…
VCF indexing
Hi all, Is it required to have the index files for VCF files in the same directory as the VCFs? I’m attempting to create index files by reading VCF files from a write-protected directory. Thus, I am unable to make index files in the same directory. My goal…
What is a tool to get the genome build of a VCF?
It shouldn’t be too hard to create one, but if one exists already that’s even better. I need it to be automatable / non-web based (assume no relevant info exists in the header).
Determine INDELs number (both classes separately) from reference and graph-based VCF files
Hi there, this is more so of a hint/suggestion post than a real question since I could manage to find some related posts here on Biostars but appreciate a feedback on the procedure/results for the analysis. In principle, I’m trying to compare the bwa-mem_GATK pipeline working on the linear reference…
Solved We are now going to call variants with two different
We are now going to call variants with two different approaches from the files we have been working with all course. Please use the following files, parameters, and listed versions of the software for this assignment. We will use the reference Ebola genome: /data/compres/refs/AF086833.2.fasta And this set of paired-end sequences:…
Filter vcf SNPs by sample GT value
I have a merged VCF file with multiple samples on a joined SNP set, where original genotypes have a 0|0 / 0|1 / 1|0 / 1|1 genotype (GT) and merged ones are formatted as 0/0 if the SNP was missing (--missing-ref option of bcftools merge)….
RNAseq based variant dataset in a black poplar association panel | BMC Research Notes
Dickmann DI, Kuzovkina J. Poplars and willows of the world, with emphasis on silviculturally important species. In: Isebrands JG, Richardson J, editors. Poplars and willows: trees for society and the environment. Wallingford: CABI; 2014. Imbert E, Lefèvre F. Dispersal and gene flow of Populus nigra (Salicaceae) along a…
Match variants from RNAseq with known databases
Hi all, I’ve managed to call variants from RNAseq using existing tools out there (Strelka2, HC). Now I want to compare my variants with subsets of known variants coming from specific datasets. Here’s the question: variants are reported only on the forward…
Effect of recombination on genetic diversity of Caenorhabditis elegans
Strong correlation exists between recombination rate and abundance and proportion of indels Whole-genome sequence data of many C. elegans wild isolates now exist. These include Illumina paired-end data of over 600 wild isolates by CeNDR, which also obtained first-generation PacBio long-read data of 14 wild isolates. Second-generation PacBio HiFi data20…
Quality Control of VCFs that used different genotyping arrays
I have three VCFs. Two of these VCFs were generated using the Precision Medicine Research Array (PMRA) and refer to SNPs as AX numbers. I was able to merge the two PMRA VCFs together. Merged PMRA VCFs (Total genotyping rate is 0.924427): 1 AX-150343089 0 837711 T C 1 AX-149471710…
Automate the Splitting of a VCF File by Sample (bcftools)
Hi! I’m very new to working with large .vcf files, and am trying to split up a particular file by strain (sample). There are about 1500 samples in the file, so going through manually isn’t really an option (although I have managed to get it to work). My problem has…
Splitting VCF/BCF file into seperate gene files
I have a multi-sample bcf file which I would like to split into smaller files per gene so I can use this for some downstream eQTL analysis. I’ve started a bash script which pipes bcftools query -f '%SAMPLE\t%POS\t%REF\t%ALT\t%GT\n' into an awk script…
How to filter vcf file by MAF using bcftools?
I saw this thread (github.com/samtools/bcftools/issues/357) and it seems sometimes bcftools recomputes MAF and sometimes it takes it from INFO. Which one is recommended to use, and how does this calculation differ from what is stated in INFO? Also, what is the…
Transposon-encoded nucleases use guide RNAs to promote their selfish spread
Siguier, P., Gourbeyre, E., Varani, A., Ton-Hoang, B. & Chandler, M. Everyman’s guide to bacterial insertion sequences. Microbiol. Spectr. 3, MDNA3-0030-2014 (2015). He, S. et al. The IS200/IS605 family and “peel and paste” single-strand transposition mechanism. Microbiol. Spectr. 3, MDNA3-0039-2014 (2015). Kapitonov,…
Where to find info on VCF file format?
The format of my vcf file is: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT chr8 1023470 chr8_1023470_T_G T G 48 . AF=3e-06;AQ=48 GT:DP:AD:GQ:PL:RNC What do AQ, AD and RNC mean? I tried looking here: samtools.github.io/hts-specs/VCFv4.1.pdf but couldn’t find them….
bcftools error merging two VCFs: REF prefixes differ
Hi all, I am trying to merge two VCF files using bcftools merge. However, my command bcftools merge -m id VCF_d.vcf.gz VCF_p.vcf.gz -o merged.vcf.gz --force-samples returns the following: The REF prefixes differ: TG vs GA (2,2) Failed to merge alleles at 18:786377 in VCF_d.vcf.gz These are the entries in the…
how to extract unique snps in a vcf file by comparing with multiple vcf files
how to extract unique snps in a vcf file by comparing with multiple vcf files and make a file with unique snps. EDIT by Ram: OP created another post a couple of hours later…
filtering variants in a Strelka2 VCF file based on AD and AF
Dear all, I would appreciate your suggestions on the following. I am working with a VCF file that was produced by Strelka2 on tumor-normal pairs. As is well known, Strelka2 does not provide Allele Depth (AD) or VAF (variant allele fraction) in the VCF fields. I have used…
Fast and sensitive validation of fusion transcripts in whole-genome sequencing data | BMC Bioinformatics
Al-Salama ZT, Keam SJ. Entrectinib: first global approval. Drugs. 2019;79(13):1477–83. Article PubMed Google Scholar Drilon A, Laetsch TW, Kummar S, DuBois SG, Lassen UN, Demetri GD, Nathenson M, Doebele RC, Farago AF, Pappo AS, et al. Efficacy of larotrectinib in TRK fusion-positive cancers in adults and children. N Engl J…
The localization of centromere protein A is conserved among tissues
Earnshaw, W. C. & Migeon, B. R. Three related centromere proteins are absent from the inactive centromere of a stable isodicentric chromosome. Chromosoma 92, 290–296 (1985). Article CAS PubMed Google Scholar Choo, K. H. Centromerization. Trends Cell Biol. 10, 182–188 (2000). Article CAS PubMed Google Scholar Marshall, O. J., Chueh,…
Filtering VCF to divide with equal sizes
Hello everyone! I have a very large VCF file (>400 GB), and I want to divide it to use with VEP. VEP recommends separating the vcf, so I generated a list of contigs, based on the header, with 3^7 bases for each chromosome…
mRNA vaccine quality analysis using RNA sequencing
Design and synthesis of reference plasmid A reference construct was first designed, with the intention of optimising the production of RNA therapeutics for pre-clinical research. The coding sequence of eGFP30 was selected as a reporter in the coding region, as its protein product can be assayed simply through Flow cytometry…
What is the difference between norm --multiallelics -any versus --atomize?
Hello, forgive my ignorance. Suppose input.vcf contains a complex multiallelic site. What is the difference between bcftools norm --multiallelics -any -f hg38.fa input.vcf versus bcftools norm --atomize -f hg38.fa input.vcf I understand what --multiallelics -any does but not sure…
after gatk VariantAnnotator -V *_com_norm.vcf -A AlleleFraction -O *_norm_AB.vcf There “nan,nan” or “nan” in my vcf file
After I run this code gatk VariantAnnotator -V *_com_norm.vcf -A AlleleFraction -O *_norm_AB.vcf there is “nan,nan” or “nan” in my vcf file. The input file doesn’t have “nan,nan” or “nan”; it (*_com_norm.vcf) comes…
Filter VCF File by VCF Format Variants
I am trying to filter a VCF file to only include variants that are within another file, which is a txt file with VCF formatted columns (CHR POS REF ALT). I have been having a hard time finding a way to filter…
Data Import Issue detectRUNS R
I am running into an issue when importing data with detectRUNS in R. The following commands to import PLINK files have not been successful, and result in blank data frames. genotypeFilePath <- system.file("extdata", "genome.ped", package="detectRUNS") mapFilePath <- system.file("extdata", "genome.map", package="detectRUNS") head(genotypeFilePath) [1] "" The PLINK data are correctly formatted. OR I…
Recommended ways to merge multiple filesets with disjoint samples
Hi Chris, Thanks for the quick reply. I am getting used to eccentricities and the mentality of “right tool for right job”. BCF is highly optimized but the file floats which ends up taking entire space. My 2TiB dataset ended up taking 4.3 TiB in bcf file format while the…
Liftover GRCh37 to hg38 1kg/GATK.
I need to liftover a few variants from GRCh37 to hg38 1kg/GATK. UCSC liftOver does not have this reference genome version available. I have tried with the standard hg38 but the conversions are wrong. Where can I find GRCh37 to hg38 1kg/GATK chain files or…
Mismatch repair deficiency is not sufficient to elicit tumor immunogenicity
Mice All animal use was approved by the Department of Comparative Medicine at the Massachusetts Institute of Technology (MIT) and the Institutional Animal Care and Use Committee under protocol no. 0714-076-17. Mice were housed with a 12-h light/12-h dark cycle with temperatures in the range 20–22 °C and 30–70% humidity. KrasLSL-G12D…
The genomic footprint of whaling and isolation in fin whale populations
Samples and sequencing Tissue samples from 50 fin whales (Balaenoptera physalus) were collected using a standard protocol to obtain skin biopsies from free-ranging cetacean species, which use a small stainless-steel biopsy dart deployed from a crossbow or rifle73,74. These samples were collected throughout the Eastern North Pacific (ENP; N = 30, represented…
What is BCFtools “f_missing” removing?
I’m a bit confused by the BCFtools “f_missing” command, and what exactly it is removing. E.g. if using F_MISSING<0.1, is it removing individuals, or variants which have greater than 10% missingness? Is someone able to help explain this command, as in the manual it just…
Finding Unique values on specific INFO field of the VCF file (dbNSFP, vep annotated multisample VCF)
Hello everyone! I searched the forum but couldn’t find a question that is like mine. I have a multisample annotated VCF file (with dbNSFP plugin) which I have filtered using VEP, like so: /scratch/ensembl-vep-109/ensembl-vep/filter_vep --force_overwrite --input_file {1} --output_file /home/filtering/2/2_{1/.}.vcf --only_matched --filter "(clinvar_clnsig is Pathogenic) or (clinvar_clnsig is Likely_pathogenic)" After…
Finding Unique values on specific INFO field of the VCF file
Hello everyone! I searched the forum but couldn’t find a question that is like mine. I have a VCF file which I have filtered using VEP, like so: /scratch/ensembl-vep-109/ensembl-vep/filter_vep --force_overwrite --input_file {1} --output_file /home/filtering/2/2_{1/.}.vcf --only_matched --filter "(clinvar_clnsig…