Tag: BCFtools

A Benchmark of Genetic Variant Calling Pipelines Using Metagenomic Short-Read Sequencing

Introduction: Short-read metagenomic sequencing is the technique most widely used to explore the natural habitat of millions of bacteria. In comparison with 16S rRNA sequencing, shotgun metagenomic sequencing (MGS) provides sequence information for whole genomes, which can be used to identify different genes present in an individual bacterium and…

PCA from plink2 for SGDP using a pangenome and DeepVariant

Hi there, I’m doing my first experiments with PCA and UMAP as dimensionality reductions to visualize a dataset I’ve been working on. Basically, I used the samples from the SGDP which I then mapped on the human pangenome for, finally, calling small variants with DeepVariant. I moved on with some…

Genomic insights into Plasmodium vivax population structure and diversity in central Africa | Malaria Journal

Hamblin MT, Di Rienzo A. Detection of the signature of natural selection in humans: evidence from the Duffy blood group locus. Am J Hum Genet. 2000;66:1669–79. Hamblin MT, Thompson EE, Di Rienzo A. Complex signatures of natural selection at the Duffy blood group…

Search for specific SNPs in VCF files of patients.

I have 490 genomes from 490 patients in VCF format. I created a multi-VCF file from these VCFs. I want to find 2 mutations (Y215C and G325R) in these patients, count the number of patients who have these SNPs…

A super-pangenome of the North American wild grape species | Genome Biology

Alston JM, Sambucci O. Grapes in the world economy. In: Cantu D, Walker MA, editors. The grape genome. Springer International Publishing; 2019. p. 1–24. Rahemi A, Dodson Peterson JC, Lund KT. Grape rootstocks and related species. Cham: Springer International Publishing; 2022. Walker MA, Heinitz C, Riaz S, Uretsky…

Multiallelic variants when merging VCF’s with GLnexus

I’m attempting to combine around 140 .g.vcf files into a single file using GLnexus on the DNAnexus platform. To examine multiallelic variants, I’m normalizing the files using the bcftools norm -m-any $file command. While merging the original VCF files (generated with GATK)…

Diversity and dissemination of viruses in pathogenic protozoa

Wang, A. L. & Wang, C. C. Viruses of the protozoa. Annu. Rev. Microbiol. 45, 251–263 (1991). Banik, G., Stark, D., Rashid, H. & Ellis, J. Recent advances in molecular biology of parasitic viruses. Infect. Disord. – Drug Targets 14, 155–167 (2015). …

Chromosome-level genome assembly of the Stoliczka’s Asian trident bat (Aselliscus stoliczkanus)

Dobson, G. E. On a new genus and species of Rhinolophidae, with description of a new species of Vesperus, and notes on some other species of insectivorous bats from Persia. J. Asiat. Soc. Bengal. 40, 455–461 (1871). Bates, P., Bumrungsri, S., Francis, C., Csorba, G. & Furey, N….

Require Genotypes in VCF file in order to output IMPUTE format.

Hello, I am trying to export a VCF file in IMPUTE format and keep getting the same error message: Code: module load htslib/1.17; module load samtools/1.17; module load bcftools/1.17; module load java/17.0.8; module load python3; module load…

Indigenous Australian genomes show deep structure and rich novel variation

Inclusion and ethics The DNA samples analysed in this project form part of a collection of biospecimens, including historically collected samples, maintained under Indigenous governance by the NCIG11 at the John Curtin School of Medical Research at the Australian National University (ANU). NCIG, a statutory body within ANU, was founded…

The landscape of genomic structural variation in Indigenous Australians

Cohorts Saliva and/or blood samples were collected from consenting individuals among four NCIG-partnered communities: Tiwi Islands (comprising the Wurrumiyanga, Pirlangimpi and Millikapiti communities), Galiwin’ku, Titjikala and Yarrabah, between 2015 and 2019. Non-Indigenous comparison data, generated from unrelated Australian individuals of European ancestry, was drawn from two existing biomedical research cohorts:…

ubuntu – Medaka: unrecognized command ‘tools’ and samtools not found

When trying to run medaka_consensus in ubuntu, I am getting the following error. I installed into a virtualenv to run on ubuntu. (medaka) ubuntu:~/medaka$ medaka_consensus -i combined.fastq -d curated.fasta -t -o ~/medaka 10 -m r941_sup_plant_g610 TF_CPP_MIN_LOG_LEVEL is set to ‘3’ [main] unrecognized command ‘tools’ Attempting to automatically select model version….

Characterizing viral species in mosquitoes (Culicidae) in the Colombian Orinoco: insights from a preliminary metagenomic study

Kraemer, M. U. et al. The global distribution of the arbovirus vectors Aedes aegypti and Ae. albopictus. Elife 4, e08347. doi.org/10.7554/eLife.08347 (2015). Bhatt, S. et al. The global distribution and burden of dengue. Nature 496, 504–507. doi.org/10.1038/nature12060 (2013). …

extract variants from 1000 Genome VCF files

Hi everyone, I have a gVCF containing genetic information from different individuals, and I would like to extract specific SNPs. The SNPs of interest are listed in a BED file with the following structure (the end position represents the real position of…

bcftools=1.18 not filtering correcting MAF

Hi, I have encountered some issues when using bcftools v1.11, v1.14 or v1.18. I want to filter MAF<=0.01 & ‘F_MISSING<0.1’ for rare-variant analysis. I have a VCF file mapped to GRCh37, left-aligned, and multi-allelic split. bcftools view -q 0.01:minor test1.vcf > test2.vcf…

r – Fst calculation from VCF files

I have four VCF files, SNPs_s1.vcf, SNPs_s2.vcf, SNPs_s3.vcf, and SNPs_s4.vcf, which contain information about SNPs. These VCF files were obtained as follows: the initial input files were short paired-end reads; I mapped them with minimap2 (./minimap2 -ax sr ref.fa read1.fq.gz read2.fq.gz > aln.sam), then converted the output to a BAM file: samtools…

The MetaInvert soil invertebrate genome resource provides insights into below-ground biodiversity and evolution

FAO, ITPS, GSBI, CBD & EC. State of knowledge of soil biodiversity – Status, challenges and potentialities, Report 2020. (FAO). doi.org/10.4060/cb1928en. 2020. Potapov, A. M. et al. Feeding habits and multifunctional classification of soil-associated consumers from protists to vertebrates. Biol. Rev. 97, 1057–1117 (2022). García-Palacios, P.,…

Infer ancestry for RNA-seq data

I generated VCF files with bcftools for 4 patient RNA-seq samples. I was also able to generate bed, bim, and fam files with PLINK for these files. I want some guidance on how to infer ancestry for these RNA-seq samples: How do I find…

Genomics England hiring PhD Bioinformatics Intern in London, England, United Kingdom

Company Description Genomics England partners with the NHS to provide whole genome sequencing diagnostics. We also equip researchers to find the causes of disease and develop new treatments – with patients and participants at the heart of it all. Our mission is to continue refining, scaling, and evolving our ability to…

Issues with Chromosome Encoding and VCF Annotation in dbSNP Alpha Release

Body: Hello, Biostars Community, I am working on creating a custom database of variants using the VCF from the latest dbSNP alpha release available at ftp.ncbi.nih.gov/snp/population_frequency/latest_release/. I have encountered a couple of issues that I’m hoping someone might help me resolve. Firstly, the chromosome encoding uses RefSeq IDs (e.g., NC_000007.12)…

vcftools

Hi, I tried this code but I couldn’t get any output. Please guide me to resolve this issue: for i in {1..2}; do vcftools --LROH --vcf Pakistan.total.vcf --out ${i} --chr i; done

PhD Bioinformatics Intern Job in Greater London, Pharmaceuticals & Life Sciences Career, Intern/Graduate Jobs in Genomics England

Company Description Genomics England partners with the NHS to provide whole genome sequencing diagnostics. We also equip researchers to find the causes of disease and develop new treatments – with patients and participants at the heart of it all. Our mission is to continue refining, scaling, and evolving our…

Quorum-sensing synthase mutations re-calibrate autoinducer concentrations in clinical isolates of Pseudomonas aeruginosa to enhance pathogenesis

Centers for Disease Control and Prevention (U.S.). Antibiotic Resistance Threats in the United States, 2019. doi.org/10.15620/cdc:82532 (2019). Centers for Disease Control and Prevention. COVID-19: U.S. Impact on Antimicrobial Resistance, Special Report 2022. doi.org/10.15620/CDC:117915 (2022). Fricks-Lima, J. et al. Differences in biofilm formation and antimicrobial resistance of Pseudomonas aeruginosa isolated from…

Bcftools consensus when reference is a deletion

Hello, I am trying to call a consensus on a VCF file like so: bcftools consensus species.vcf.gz -f Reference.fasta --absent N > Consensus.fasta Error: The site SUPER_1:173197 overlaps with another variant, skipping… I looked at this site and included the previous site…

Ancient diversity in host-parasite interaction genes in a model parasitic nematode

Van Valen, L. A new evolutionary law. Evol. Theory 1, 1–30 (1973). Woolhouse, M. E. J., Webster, J. P., Domingo, E., Charlesworth, B. & Levin, B. R. Biological and biomedical implications of the co-evolution of pathogens and their hosts. Nat. Genet. 32, 569–577 (2002). …

The genomic epidemiology of shigellosis in South Africa

Institute for Health Metrics and Evaluation. Global Burden of Disease. vizhub.healthdata.org/gbd-results/ 2019. Troeger, C. E. et al. Quantifying risks and interventions that have affected the burden of diarrhoea among children younger than 5 years: an analysis of the Global Burden of Disease Study 2017. Lancet Infect. Dis. 20, 37–59 (2020)….

Pruning with --indep-pairwise with plink 1.9

I’m new to PLINK and I would like to obtain a file with SNPs in approximate linkage equilibrium. Here is my script and the outputs of each step. If someone could tell me if there is an error in the script because at…

: error while loading shared libraries: libcrypto.so.1.0.0:

I’m having trouble installing bcftools using conda and mamba. I ran the following command: conda install -c bioconda bcftools, but the result contains errors: bcftools error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such…

Help finding the correct file version for dbSNP VCF ID replacement

I tried to use dbSNP version 156 with bcftools to replace the ID field in a reference VCF that originally contains a different position ID format. It seems the bcftools command did not work because of the numeric chromosome format in the #CHROM field, which might not be compatible with bcftools…

Whole mitochondrial genome sequencing provides new insights into the phylogeography of loggerhead turtles (Caretta caretta) in the Mediterranean Sea

Andrews S (2010) FastQC: a quality control tool for high throughput sequence data. www.bioinformatics.babraham.ac.uk/projects/fastqc Avise JC (1986) Mitochondrial DNA and the evolutionary genetics of higher animals. Philos Trans R Soc Lond B 312:325–342. doi.org/10.1098/rstb.1986.0011 Baker CS, Steel D, Calambokidis J, Falcone E, González-Peral U, Barlow J,…

update FMT/GT in VCF file using bcftools annotate

Hi – I am trying to use bcftools to overwrite the existing FMT/GT values in a VCF file, matching by the ID column, in addition to CHROM and POS. I tried creating a .txt.gz file as an annotation file, but got…

bcftools info and filter error

Hi. I am trying to do a comparative analysis of my VCF file against the VCF files of ExAC. I’m using bcftools isec here. I am getting an error that says: [W::vcf_parse_info] INFO ‘HOM_CONSANGUINEOUS’ is not defined in the header, assuming Type=String [W::vcf_parse_filter]…

Bbtools callvariant multisample mode, + base recalibration

Hello, it’s not clear to me if the multisample mode is actually joint calling, or if it’s simply the equivalent of bcftools norm + bcftools merge. I am specifically interested in NOT doing joint calling. I am also wondering if base…

Issue with Merging BCF Files: Invalid INFO id Error

I am attempting to combine two BCF (Binary Variant Call Format) files into a single file using the command bcftools merge output1.bcf output2.bcf --force-samples -o test.bcf. However, when I try to view the resulting BCF file, I encounter an error…

Shotgun metagenomes from productive lakes in an urban region of Sweden

Williamson, C. E., Saros, J. E., Vincent, W. F. & Smol, J. P. Lakes and reservoirs as sentinels, integrators, and regulators of climate change. Limnology and Oceanography 54, 2273–2282, doi.org/10.4319/lo.2009.54.6_part_2.2273 (2009). Cavicchioli, R. et al. 2019. Scientists’ warning to humanity: microorganisms and climate change. Nature Reviews…

SNP calling with many samples using bcftools

Hello, I aim to identify SNPs from approximately 500 BAM files (non-human). I’m opting for bcftools since GATK, even with the Spark addition, takes a substantial 6 hours per sample. My objective is to generate a single VCF file encompassing all SNPs…

set GT values to missing in VCF file for specific sample-variant combinations

Hi – I have a multi-subject vcf file and would like to set specific genotypes (GT) to missing for a set of subjects. However, the subjects that I need to set to missing are different for each variant. For example, suppose I have this: CHROM POS ID FORMAT sub1 sub2…

Clinically relevant antibiotic resistance genes are linked to a limited set of taxa within gut microbiome worldwide

Murray, C. J. et al. Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis. Lancet 399, 629–655 (2022). Brito, I. L. et al. Mobile genes in the human microbiome are structured from global to individual scales. Nature 535, 435–439 (2016). …

format error, unexpected A at line 1

I had a problem using bcftools mpileup. After running the command line (below), I got an error. The error message stated: “Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid [E::fai_build_core] Format error, unexpected…

GATK SelectVariants --remove-unused-alternates dropping real INDELs?

I’m using a VCF that is generated by GenotypeGVCFs (so doing calibration based on a larger cohort of samples) and my goal is to only extract variants of interest to one specific sample. The VCF in the subset tends to include some variants that were present in the original joint…

Bug#1055669: bcftools: test_vcf_merge failures on armhf: Bus error

Source: bcftools Version: 1.18-1 Severity: serious Tags: ftbfs Justification: ftbfs Control: forwarded -1 github.com/samtools/bcftools/issues/2036 Dear Maintainer, bcftools currently ftbfs on armhf due to multiple test_vcf_merge failures with Bus error[1]. I already informed upstream[2]. This bug is mostly to keep track of the issue on Debian side and eventually comment on possible Debian specific…

ILIAD: a suite of automated Snakemake workflows for processing genomic data for downstream applications | BMC Bioinformatics

Pipeline architecture and configuration file Genomic data processing poses a challenge for genetic research studies because it involves multiple program dependency installations, vast numbers of samples with raw data from various next-generation sequencing (NGS) platforms, and inconsistent genetic variant ID and/or positions among datasets. The Iliad suite of genomic data…

Need Help Understanding Variant Calling Issues in De Novo Yeast Assembly

We have two sample groups of a yeast species, control (1 sample) and treatment (1 sample), for which a complete reference genome isn’t yet available for alignment or variant calling. The objective of this project is straightforward, simply wanting to…

Unzipped chromosome-level genomes reveal allopolyploid nematode origin pattern as unreduced gamete hybridization

Nematode materials and species identification Mi, Mj, and Mg were collected from farmlands in Wuhan city of Hubei province, Longyan city of Fujian province, and Changsha city of Hunan Provinces, respectively. Two Ma samples were collected from farmlands in Shenyang city of Liaoning province and Shiping city of Yunnan province….

bcftools compressing and indexing vcf files

Hello, I am trying to merge multiple VCF files using bcftools but it threw an error saying that the file is not compressed. I want to know if the right command to compress the file would be: bcftools view -I input.vcf -O z…

How to efficiently count missense mutations from an annotated vcf file?

Hi! I am currently working on my undergraduate study about the frequency of missense mutations in early and advanced stages of early luminal breast cancer. The VCF file contains 47 transcriptomic samples: 12 early (stage II) and 35 advanced…

Production of leishmanin skin test antigen from Leishmania donovani for future reintroduction in the field

Study design and ethical statement All research complies with all relevant ethical regulations. Animal experiments in this study were reviewed and approved by the Animal Care and Use Committee of the Center for Biologics Evaluation and Research, U.S. Food and Drug Administration (ASP-1999#23 and ASP-1995#26) and the National Institute of…

Single-nucleus DNA sequencing reveals hidden somatic loss-of-heterozygosity in Cerebral Cavernous Malformations

Ethical statement Our research complies with all relevant ethical regulations, including the Declaration of Helsinki and has been approved by the Institutional Review Boards of University of Chicago, Duke University and the Alliance to Cure Cavernous Malformations. Cerebral cavernous malformation lesions All human CCM tissue specimens have been previously reported18,19…

Inferring bacterial transmission dynamics using deep sequencing genomic surveillance data

Study design Experiments were performed in accordance with the New Zealand Animal Welfare Act (1999) and institutional guidelines provided by the University of Auckland Animal Ethics Committee, which reviewed and approved these experiments under application R1003. We did not use any specific randomisation process to allocate animals to a particular…

No samples in .vcf file.

I am trying to convert my VCF file into a BED format file. When I use this command: plink --vcf merge.bacteria.vcf.gz --make-bed --out merge.bacteria.vcf.bed I get the following error stating: PLINK v1.90b6.21 64-bit (19 Oct 2020) www.cog-genomics.org/plink/1.9/ (C) 2005-2020 Shaun Purcell, Christopher Chang GNU General Public License…

Comparative Analysis of Structural Variant Callers on Short-Read Whole-Genome Sequencing Data

Pang, A.W., MacDonald, J.R., Pinto, D., et al., Towards a comprehensive structural variation map of an individual human genome, Genome Biol., 2010, vol. 11, no. 5, p. R52. doi.org/10.1186/gb-2010-11-5-r52 The International HapMap Consortium, The international HapMap project, Nature, 2003, pp. 789–796. doi.org/10.1038/nature02168 Sudmant,…

Bioconductor – Bioconductor 3.18 Released

October 25, 2023. Bioconductors: We are pleased to announce Bioconductor 3.18, consisting of 2266 software packages, 429 experiment data packages, 920 annotation packages, 30 workflows and 4 books. There are 69 new software packages, 10 new data experiment packages, 8 new annotation packages, no new workflows,…

Normalisation of PLINK/VCF files?

Variant notations can vary significantly, and although there are numerous tools available to address this issue, such as bcftools +fixref or bcftools norm, there’s still a chance that something might be overlooked. Is there a comprehensive tool or pipeline that automates this process to ensure…

Bcftools Consensus – Choose Random Allele for Heterozygous Sites

Hello, I am trying to generate a haploid consensus sequence based on a VCF file. For sites which are heterozygous, I want to randomly choose one of the alleles. I don’t want to always choose reference and I don’t want…

Genome sequences of 36,000- to 37,000-year-old modern humans at Buran-Kaya III in Crimea

Hajdinjak, M. et al. Initial Upper Palaeolithic humans in Europe had recent Neanderthal ancestry. Nature 592, 253–257 (2021). Slimak, L. et al. Modern human incursion into Neanderthal territories 54,000 years ago at Mandrin, France. Sci. Adv. 8, eabj9496 (2022). …

NGS one-liner to call variants

This is a tutorial about creating a pipeline for sequence analysis in a single line. It was made with capture/amplicon short-read sequencing of human DNA in mind and tested with reference exome sequencing data described here. I share the process and debugging steps…

NGS oneliner

This is a tutorial about creating a pipeline for sequence analysis in a single line. I share the process and debugging steps gone through while putting it together. Source is available at: github.com/barslmn/ngsoneliner/ I couldn’t make a longer post; the complete version of this post is at: omics.sbs/blog/NGSoneliner/NGSoneliner.html Pipeline # fastp --in1 "$R1"…

problem with bcftools syntax

Hi all! I am having difficulty with creating a bcftools command. I have a .vcf.gz file downloaded from the 1000G site and a CSV file with columns chrom/pos/id/ref/alt. I would like to manipulate the downloaded VCF file so that it uses only the SNPs I…

Low mutation rate in epaulette sharks is consistent with a slow rate of evolution in sharks

Compagno, L. J. V. Alternative life-history styles of cartilaginous fishes in time and space. Environ. Biol. Fishes 28, 33–75 (1990). Kriwet, J., Witzmann, F., Klug, S. & Heidtke, U. H. J. First direct evidence of a vertebrate three-level trophic chain in the fossil record. Proc. Biol. Sci….

How to merge my vcf files (n=6) with existing Pf6 vcf file and do pca?

I sampled some Pf strains and had them whole-genome sequenced. Now I want to merge them with existing Pf6 data. For this I downloaded Pf6 data for all 14 chromosomes. I then used bcftools…

Troubleshooting multiallelic variant merging issue

Hello, I want to recode the IIDs of imputed data .bgen files into two different filesets, and merge these (working on eye-level analyses with Regenie). As I’m only interested in dosages, I’ve converted these to .pgen using PLINK2 (ref-first as UK Biobank): plink2 --bgen data.bgen ref-first --sample data.sample --update-ids recoded_ids_a.txt --make-pgen…

ILIAD: A suite of automated Snakemake workflows for processing genomic data for downstream applications

Abstract Background: Processing raw genomic data for downstream applications such as imputation, association studies, and modeling requires numerous third-party bioinformatics software tools. It is highly time-consuming and resource-intensive with computational demands and storage limitations that pose significant challenges that increase cost. The use of software tools independent of one another,…

‘samtools’ aligned sequence utilities interface

Rsamtools-package {Rsamtools}, R Documentation. Description: This package provides facilities for parsing samtools BAM (binary) files representing aligned sequences. Details: See packageDescription(‘Rsamtools’) for package details. A useful starting point is the scanBam manual page. Note: This package documents the following…

Help me understand “for stripped” in bcftools isec output

Hello, I was given a set of VCF files, comparing variants. The Readme gives the following command bcftools isec -p dir -n-1 -c all ref.vcf.gz S1.vcf.gz S2.vcf.gz S3.vcf.gz S4.vcf.GZ S5.vcf.gz S5.vcf.gz First, if I understand the doc correctly, -n-1 means “looks for SNPs found at most in a single file”?…

Most sensible way to find private SNPs from a multisamples vcf with bcftools

Hello, this question is somehow complementary to what I asked yesterday here: Using bcftools to find unique alt homozygous sites Now let’s say I want to find the SNPs 0/1 unique to the sample D3A350g_bcftools2 (see below) I know I can use bcftools view -s D3A350g_bcftools2.bcf -x all_bcftools2_merged.vcf But there…

public databases – Converting VCF format to text for use with PLINK and understanding column mapping

I successfully completed Nature PRS tutorial, which is based on PLINK. Turning to my real data, I downloaded ukb-d-20544_1.vcf.gz. Now I’m facing the problem that I seem to be unable to use it in PLINK or find the correct data format to download at all, and I am a bit…

Continue Reading public databases – Converting VCF format to text for use with PLINK and understanding column mapping

Using bcftools to find unique alt homozygous sites

Hello, I have a vcf with 20 samples. I want to find for each sample the sites that are 1/1, only in that sample (so other samples must have genotypes 0/1 or 0/0). I know I can use filters such as GT="aa". However, how do I say GT="aa" for sample…

Continue Reading Using bcftools to find unique alt homozygous sites
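
The selection rule described in the question above (a site is 1/1 in exactly one sample while every other sample is 0/0 or 0/1) can be sketched in plain Python. This is only an illustration on hypothetical toy records, not a bcftools command; the function name, the record layout and the decision to reject sites where another sample is missing are all assumptions.

```python
def private_hom_alt_sites(records, samples):
    """Map each sample to the sites where it alone is homozygous ALT.

    records: iterable of (chrom, pos, genotypes), genotypes ordered like samples.
    A site qualifies when exactly one sample is 1/1 and all others are
    0/0 or 0/1 (assumption: missing genotypes in other samples disqualify it).
    """
    result = {s: [] for s in samples}
    for chrom, pos, gts in records:
        # Treat phased and unphased genotypes alike
        norm = [gt.replace("|", "/") for gt in gts]
        hom_alt = [i for i, gt in enumerate(norm) if gt == "1/1"]
        if len(hom_alt) != 1:
            continue
        others = [gt for i, gt in enumerate(norm) if i != hom_alt[0]]
        if all(gt in ("0/0", "0/1", "1/0") for gt in others):
            result[samples[hom_alt[0]]].append((chrom, pos))
    return result

# Hypothetical toy data
samples = ["S1", "S2", "S3"]
records = [
    ("chr1", 100, ["1/1", "0/1", "0/0"]),  # private to S1
    ("chr1", 200, ["1/1", "1/1", "0/0"]),  # shared, skipped
    ("chr1", 300, ["0/0", "0/0", "1|1"]),  # private to S3 (phased)
    ("chr1", 400, ["0/1", "./.", "1/1"]),  # S2 missing, skipped
]

private = private_hom_alt_sites(records, samples)
print(private)
```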

Genotyping, sequencing and analysis of 140,000 adults from Mexico City

Recruitment of study participants The MCPS was established in the late 1990s following discussions between Mexican scientists at the National Autonomous University of Mexico (UNAM) and British scientists at the University of Oxford about how best to measure the changing health effects of tobacco in Mexico. These discussions evolved into…

Continue Reading Genotyping, sequencing and analysis of 140,000 adults from Mexico City

VCF indexing

Hi all, is it required to have the index files for VCF files in the same directory as the VCFs? I’m attempting to create index files by reading VCF files from a write-protected directory. Thus, I am unable to make index files in the same directory. My goal…

Continue Reading VCF indexing

What is a tool to get the genome build of a VCF?

It shouldn’t be too hard to create one, but if one exists already that’s even better. I need it to be automatable / non-web based (assume no relevant info exists in the header)…

Continue Reading What is a tool to get the genome build of a VCF?

Determine INDELs number (both classes separately) from reference and graph-based VCF files

Hi there, this is more of a hint/suggestion post than a real question, since I managed to find some related posts here on Biostars, but I would appreciate feedback on the procedure/results for the analysis. In principle, I’m trying to compare the bwa-mem_GATK pipeline working on the linear reference…

Continue Reading Determine INDELs number (both classes separately) from reference and graph-based VCF files

Solved We are now going to call variants with two different

We are now going to call variants with two different approaches from the files we have been working with all course. Please use the following files, parameters, and listed versions of the software for this assignment. We will use the reference Ebola genome: /data/compres/refs/AF086833.2.fasta And this set of paired-end sequences:…

Continue Reading Solved We are now going to call variants with two different

Filter vcf SNPs by sample GT value

I have a merged VCF file with multiple samples on a joined SNP set, where the original genotypes have a 0|0 / 0|1 / 1|0 / 1|1 genotype (GT) and the merged ones are formatted as 0/0 if the SNP was missing (--missing-to-ref option of bcftools merge)…

Continue Reading Filter vcf SNPs by sample GT value

RNAseq based variant dataset in a black poplar association panel | BMC Research Notes

Dickmann DI, Kuzovkina J. Poplars and willows of the world, with emphasis on silviculturally important species. In: Isebrands JG, Richardson J, editors. Poplars and willows: trees for society and the environment. Wallingford: CABI; 2014. Google Scholar  Imbert E, Lefèvre F. Dispersal and gene flow of Populus nigra (Salicaceae) along a…

Continue Reading RNAseq based variant dataset in a black poplar association panel | BMC Research Notes

Match variants from RNAseq with known databases

Hi all, I’ve managed to call variants from RNAseq using existing tools out there (Strelka2, HC). Now I want to compare my variants with subsets of known variants coming from specific datasets. Here’s the question: variants are reported only on the forward…

Continue Reading Match variants from RNAseq with known databases

Effect of recombination on genetic diversity of Caenorhabditis elegans

Strong correlation exists between recombination rate and abundance and proportion of indels Whole-genome sequence data of many C. elegans wild isolates now exist. These include Illumina paired-end data of over 600 wild isolates by CeNDR, which also obtained first-generation PacBio long-read data of 14 wild isolates. Second-generation PacBio HiFi data20…

Continue Reading Effect of recombination on genetic diversity of Caenorhabditis elegans

Quality Control of VCFs that used different genotyping arrays

I have three VCFs. Two of these VCFs were generated using the Precision Medicine Research Array (PMRA) and refer to SNPs as AX numbers. I was able to merge the two PMRA VCFs together. Merged PMRA VCFs (Total genotyping rate is 0.924427): 1 AX-150343089 0 837711 T C 1 AX-149471710…

Continue Reading Quality Control of VCFs that used different genotyping arrays

Automate the Splitting of a VCF File by Sample (bcftools)

Hi! I’m very new to working with large .vcf files, and am trying to split up a particular file by strain (sample). There are about 1500 samples in the file, so going through manually isn’t really an option (although I have managed to get it to work). My problem has…

Continue Reading Automate the Splitting of a VCF File by Sample (bcftools)

Splitting VCF/BCF file into seperate gene files

I have a multi-sample bcf file which I would like to split into smaller files per gene so I can use this for some downstream eQTL analysis. I’ve started a bash script which pipes bcftools query -f '%SAMPLE\t%POS\t%REF\t%ALT\t%GT\n' into an awk script…

Continue Reading Splitting VCF/BCF file into seperate gene files

How to filter vcf file by MAF using bcftools?

I saw this thread (github.com/samtools/bcftools/issues/357) and it seems sometimes bcftools recomputes MAF and sometimes it takes it from INFO. Which one is recommended to use, and how is this calculation different from what is stated in INFO? Also, what is the…

Continue Reading How to filter vcf file by MAF using bcftools?

Transposon-encoded nucleases use guide RNAs to promote their selfish spread

Siguier, P., Gourbeyre, E., Varani, A., Ton-Hoang, B. & Chandler, M. Everyman’s guide to bacterial insertion sequences. Microbiol. Spectr. 3, MDNA3-0030-2014 (2015). Article  PubMed  Google Scholar  He, S. et al. The IS200/IS605 family and “peel and paste” single-strand transposition mechanism. Microbiol. Spectr. 3, MDNA3-0039-2014 (2015). Article  ADS  Google Scholar  Kapitonov,…

Continue Reading Transposon-encoded nucleases use guide RNAs to promote their selfish spread

Where to find info on VCF file format?

The format of my vcf file is: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT chr8 1023470 chr8_1023470_T_G T G 48 . AF=3e-06;AQ=48 GT:DP:AD:GQ:PL:RNC What do AQ, AD and RNC mean? I tried looking here: samtools.github.io/hts-specs/VCFv4.1.pdf but couldn’t find them….

Continue Reading Where to find info on VCF file format?

bcftools error merging two VCFs: REF prefixes differ

Hi all, I am trying to merge two VCF files using bcftools merge. However, my command bcftools merge -m id VCF_d.vcf.gz VCF_p.vcf.gz -o merged.vcf.gz --force-samples returns the following: The REF prefixes differ: TG vs GA (2,2) Failed to merge alleles at 18:786377 in VCF_d.vcf.gz These are the entries in the…

Continue Reading bcftools error merging two VCFs: REF prefixes differ

how to extract unique snps in a vcf file by comparing with multiple vcf files

How to extract unique SNPs in a VCF file by comparing with multiple VCF files and make a file with the unique SNPs. EDIT by Ram: OP created another post a couple of hours later…

Continue Reading how to extract unique snps in a vcf file by comparing with multiple vcf files

filtering variants in a Strelka2 VCF file based on AD and AF

Dear all, I would appreciate your suggestions on the following. I am working with a VCF file that was produced by Strelka on Tumor-Normal pairs. As is well known, Strelka2 does not provide Allele Depth (AD) or VAF (variant allele fraction) in the VCF fields. I have used…

Continue Reading filtering variants in a Strelka2 VCF file based on AD and AF

Fast and sensitive validation of fusion transcripts in whole-genome sequencing data | BMC Bioinformatics

Al-Salama ZT, Keam SJ. Entrectinib: first global approval. Drugs. 2019;79(13):1477–83. Article  PubMed  Google Scholar  Drilon A, Laetsch TW, Kummar S, DuBois SG, Lassen UN, Demetri GD, Nathenson M, Doebele RC, Farago AF, Pappo AS, et al. Efficacy of larotrectinib in TRK fusion-positive cancers in adults and children. N Engl J…

Continue Reading Fast and sensitive validation of fusion transcripts in whole-genome sequencing data | BMC Bioinformatics

The localization of centromere protein A is conserved among tissues

Earnshaw, W. C. & Migeon, B. R. Three related centromere proteins are absent from the inactive centromere of a stable isodicentric chromosome. Chromosoma 92, 290–296 (1985). Article  CAS  PubMed  Google Scholar  Choo, K. H. Centromerization. Trends Cell Biol. 10, 182–188 (2000). Article  CAS  PubMed  Google Scholar  Marshall, O. J., Chueh,…

Continue Reading The localization of centromere protein A is conserved among tissues

Filtering VCF to divide with equal sizes

Hello everyone! I have a very large VCF file (>400gb), and I want to divide it to use with VEP. VEP recommends separating the vcf, so I generated a list of contigs, based on the header, with 3^7 bases for each chromosome….

Continue Reading Filtering VCF to divide with equal sizes
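
One way to picture the splitting step described above is to cut each contig into fixed-width windows and emit region strings of the form contig:start-end. The sketch below does this in plain Python; the function name, the contig lengths and the window size are hypothetical, and in practice the lengths would come from the ##contig lines of the VCF header. Note that windows of equal width in bases do not guarantee equal numbers of variants per chunk.

```python
def chunk_regions(contig_lengths, chunk_size):
    """Split each contig into consecutive 1-based, inclusive windows.

    contig_lengths: dict mapping contig name to its length in bases.
    chunk_size: window width in bases (the last window may be shorter).
    """
    regions = []
    for contig, length in contig_lengths.items():
        start = 1
        while start <= length:
            end = min(start + chunk_size - 1, length)
            regions.append(f"{contig}:{start}-{end}")
            start = end + 1
    return regions

# Hypothetical toy contigs; real lengths would be read from the header
regions = chunk_regions({"chr1": 250, "chr2": 100}, 100)
print(regions)
```

Each region string could then be used to extract one chunk from an indexed file, one output file per region.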

mRNA vaccine quality analysis using RNA sequencing

Design and synthesis of reference plasmid A reference construct was first designed, with the intention of optimising the production of RNA therapeutics for pre-clinical research. The coding sequence of eGFP30 was selected as a reporter in the coding region, as its protein product can be assayed simply through Flow cytometry…

Continue Reading mRNA vaccine quality analysis using RNA sequencing

What is the difference between norm –multiallelics -any versus –atomize?

Hello, forgive my ignorance. Suppose input.vcf contains a complex multiallelic site. What is the difference between bcftools norm --multiallelics -any -f hg38.fa input.vcf versus bcftools norm --atomize -f hg38.fa input.vcf? I understand what --multiallelics -any does but not sure…

Continue Reading What is the difference between norm –multiallelics -any versus –atomize?

after gatk VariantAnnotator -V *_com_norm.vcf -A AlleleFraction -O *_norm_AB.vcf There “nan,nan” or “nan” in my vcf file

After I run gatk VariantAnnotator -V *_com_norm.vcf -A AlleleFraction -O *_norm_AB.vcf, there is “nan,nan” or “nan” in my VCF file. The input file doesn’t have “nan,nan” or “nan”; it (*_com_norm.vcf) comes…

Continue Reading after gatk VariantAnnotator -V *_com_norm.vcf -A AlleleFraction -O *_norm_AB.vcf There “nan,nan” or “nan” in my vcf file

Filter VCF File by VCF Format Variants

I am trying to filter a VCF file to only include variants that are within another file, which is a txt file with VCF formatted columns (CHR POS REF ALT). I have been having a hard time finding a way to filter…

Continue Reading Filter VCF File by VCF Format Variants

Data Import Issue detectRUNS R

I am running into an issue when importing data with detectRUNS in R. The following commands to import PLINK files have not been successful, and result in blank data frames. genotypeFilePath <- system.file("extdata", "genome.ped", package="detectRUNS") mapFilePath <- system.file("extdata", "genome.map", package="detectRUNS") head(genotypeFilePath) [1] "" The PLINK data are correctly formatted. OR I…

Continue Reading Data Import Issue detectRUNS R

Recommended ways to merge multiple filesets with disjoint samples

Hi Chris, Thanks for the quick reply. I am getting used to eccentricities and the mentality of “right tool for right job”. BCF is highly optimized but the file floats which ends up taking entire space. My 2TiB dataset ended up taking 4.3 TiB in bcf file format while the…

Continue Reading Recommended ways to merge multiple filesets with disjoint samples

Liftover GRCh37 to hg38 1kg/GATK.

I need to liftover a few variants from GRCh37 to hg38 1kg/GATK. UCSC liftover does not have this reference genome version available. I have tried with the standard hg38 but the conversions are wrong. Where can I find GRCh37 to hg38 1kg/GATK chain files or…

Continue Reading Liftover GRCh37 to hg38 1kg/GATK.

Mismatch repair deficiency is not sufficient to elicit tumor immunogenicity

Mice All animal use was approved by the Department of Comparative Medicine at the Massachusetts Institute of Technology (MIT) and the Institutional Animal Care and Use Committee under protocol no. 0714-076-17. Mice were housed with a 12-h light/12-h dark cycle with temperatures in the range 20–22 °C and 30–70% humidity. KrasLSL-G12D…

Continue Reading Mismatch repair deficiency is not sufficient to elicit tumor immunogenicity

The genomic footprint of whaling and isolation in fin whale populations

Samples and sequencing Tissue samples from 50 fin whales (Balaenoptera physalus) were collected using a standard protocol to obtain skin biopsies from free-ranging cetacean species, which use a small stainless-steel biopsy dart deployed from a crossbow or rifle73,74. These samples were collected throughout the Eastern North Pacific (ENP; N = 30, represented…

Continue Reading The genomic footprint of whaling and isolation in fin whale populations

what is BCFtools” f_missing” removing?

I’m a bit confused by the BCFtools “F_MISSING” expression, and what exactly it is removing. E.g. if using F_MISSING<0.1, is it removing individuals, or variants which have greater than 10% missingness? Is someone able to help explain this, as in the manual it just…

Continue Reading what is BCFtools” f_missing” removing?
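
For orientation on the question above: in bcftools filtering expressions, F_MISSING is the fraction of missing genotypes at a site, so it is evaluated per variant and not per individual. The plain-Python sketch below shows the arithmetic on hypothetical toy genotypes. The function names are made up, only fully missing calls ("." or "./.") are counted, and how half-missing calls are treated is not modeled here.

```python
MISSING = {".", "./.", ".|."}

def fraction_missing(genotypes):
    """Fraction of samples whose genotype is fully missing at one site."""
    if not genotypes:
        return 0.0
    return sum(1 for gt in genotypes if gt in MISSING) / len(genotypes)

def keep_sites(sites, threshold):
    """Keep the sites whose missing fraction is below threshold.

    sites: dict mapping a site label to its list of per-sample genotypes.
    Mirrors an include-style filter such as F_MISSING<0.1: whole sites
    are kept or dropped, samples are never removed.
    """
    return [name for name, gts in sites.items()
            if fraction_missing(gts) < threshold]

# Hypothetical toy data: four samples per site
sites = {
    "chr1:100": ["0/0", "0/1", "1/1", "0/0"],  # 0 of 4 missing
    "chr1:200": ["0/0", "./.", "0/1", "0/0"],  # 1 of 4 missing
}

kept = keep_sites(sites, 0.1)
print(kept)
```

Whether the matching sites are kept or dropped then depends on whether the expression is passed as an include or an exclude filter.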

Finding Unique values on specific INFO field of the VCF file (dbNSFP, vep annotated multisample VCF)

Hello everyone! I searched the forum but couldn’t find a question like mine. I have a multisample annotated VCF file (with the dbNSFP plugin) which I have filtered using VEP, like so: /scratch/ensembl-vep-109/ensembl-vep/filter_vep --force_overwrite --input_file {1} --output_file /home/filtering/2/2_{1/.}.vcf --only_matched --filter "(clinvar_clnsig is Pathogenic) or (clinvar_clnsig is Likely_pathogenic) After…

Continue Reading Finding Unique values on specific INFO field of the VCF file (dbNSFP, vep annotated multisample VCF)

Finding Unique values on specific INFO field of the VCF file

Hello everyone! I searched the forum but couldn’t find a question like mine. I have a VCF file which I have filtered using VEP, like so: /scratch/ensembl-vep-109/ensembl-vep/filter_vep --force_overwrite --input_file {1} --output_file /home/filtering/2/2_{1/.}.vcf --only_matched --filter "(clinvar_clnsig…

Continue Reading Finding Unique values on specific INFO field of the VCF file