Tag: QUAL

Processing WES VCF for case control GWAS analysis

Dear community, I’m currently trying to run a GWAS for a case-control study in PLINK using whole-exome VCF files. I understand that I need to convert these VCF files into PLINK’s binary format (bed/bim/fam) before I can proceed with the GWAS. From my reading of the literature, I have to manually create…

convert fasta to fastq without quality score input file

Here’s another beginner BioPython question from me… I’m running some genome assemblies for someone who has some new Illumina sequence data and also had done some sequencing a few years ago. They have some Sanger and 454 sequences (a couple thousand sequences with a couple thousand base pairs for each)…
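When no quality file exists, one common workaround is to give every base the same placeholder Phred score. The sketch below does this with the standard library only; the function name `fasta_to_fastq` and the default score of 40 are assumptions made for illustration and do not come from the post.

```python
import io


def fasta_to_fastq(fasta_handle, fastq_handle, phred=40):
    """Write each FASTA record as FASTQ with a constant placeholder quality."""
    qual_char = chr(phred + 33)  # offset-33 (Sanger / Illumina 1.8+) encoding
    name, chunks = None, []

    def flush():
        # Emit the record collected so far, if any.
        if name is not None:
            seq = "".join(chunks)
            fastq_handle.write(f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n")

    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            name, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    flush()


# Usage with in-memory handles; real files opened with open() work the same way.
demo = io.StringIO()
fasta_to_fastq(io.StringIO(">read1\nACGT\n"), demo)
print(demo.getvalue(), end="")
```

The placeholder scores carry no real information, so downstream tools that weight bases by quality will treat every base as equally reliable.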

gatk Hardfilter Error

Hello, I ran the code but got an error. Could you please help me with this issue? How can I fix it? gatk VariantFiltration -R reference.fasta -V input.vcf --filter-expression "FILTER == 'PASS' && (QUAL < 30.0 || QD < 2.0 || FS > 60.0…

Microsatellite markers reveal genetic diversity and population structure of Portunus trituberculatus in the Bohai Sea, China

Liu, L. et al. Identification of quantitative trait loci for growth-related traits in the swimming crab Portunus trituberculatus. Aquacult. Res. 46, 850–860 (2015). Sun, Q. F. et al. Astaxanthin: The ubiquitous and abundant carotenoid as a pivotal interior factor of anti-oxidation and immune for the moulting…

problems with MAF for MutSigCV (vcf2maf)

I am trying to run MutSigCV and got stuck with this error: MutSigCV allsamples.md.tc.ir.br.pr.ug.dbsnp.vep.maf \ "$anno"exome_full192.coverage.txt \ "$anno"gene.covariates.txt \ my_results \ "$anno"mutation_type_dictionary_file.txt \ "$anno"chr_files_hg19 ====================================== MutSigCV v1.4 (c) Mike Lawrence and Gaddy Getz Broad Institute of MIT and Harvard ====================================== MutSigCV: PREPROCESS -------------------- Loading mutation_file… Error using MutSigCV>MutSig_preprocess (line 246)…

DanMAC5: a browser of aggregated sequence variants from 8,671 whole genome sequenced Danish individuals | BMC Genomic Data

Demographics: Data from three studies were included. Dan-NICAD: 1,649 individuals with symptoms of obstructive coronary artery disease, predominantly chest pain, undergoing coronary computed tomography angiography. In total, 52% were female, the mean age was 57 years (+/- 9 SD), the median coronary artery calcium score was 0 [0–82] and 24% of…

A review on blockchain for DNA sequence: security issues, application in DNA classification, challenges and future trends

Abadi M, Agarwal A, Barham P, et al. (2016) TensorFlow: large-scale machine learning on heterogeneous distributed systems. Cornell University Library website. arxiv.org/abs/1603.04467. Published 2016. Accessed October 2016. Afshar P, Mohammadi A, Plataniotis KN. (2018) Brain tumor type classification via capsule networks, in Proc. 25th IEEE Int Conf Image Process. pp. 3129–3133…

The Pathfinder plasmid toolkit for genetically engineering newly isolated bacteria enables the study of Drosophila-colonizing Orbaceae

Elston KM, Leonard SP, Geng P, Bialik SB, Robinson E, Barrick JE. Engineering insects from the endosymbiont out. Trends Microbiol. 2022;30:79–96. Brophy JAN, Triassi AJ, Adams BL, Renberg RL, Stratis-Cullum DN, Grossman AD, et al. Engineered integrative and conjugative elements for efficient and inducible DNA…

Splitting of VCF file of CSQ field in the INFO column to tabular format.

A VCF file has seven fixed columns plus the INFO column: CHROM, POS, ID, REF, ALT, QUAL, FILTER, and INFO. The INFO column holds the variant-related information. Within INFO, the CSQ field contains multiple subfields: a fixed set of 82, separated by the delimiter “|”…
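The splitting described here can be sketched in a few lines of Python. This is a minimal illustration, not the poster's solution: `split_csq` is a hypothetical helper, and the three field names in the usage lines are placeholders. In VEP-annotated files the real subfield names are listed in the `##INFO=<ID=CSQ,...>` header line, and multiple annotations for one variant are separated by commas.

```python
def split_csq(info, field_names):
    """Return one dict per CSQ entry found in a VCF INFO string."""
    csq_value = None
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "CSQ":
            csq_value = value
            break
    if csq_value is None:
        return []  # no CSQ annotation on this record

    rows = []
    for entry in csq_value.split(","):   # one entry per annotation
        values = entry.split("|")        # subfields are pipe-delimited
        rows.append(dict(zip(field_names, values)))
    return rows


# Placeholder names; take the real list from the CSQ header line of your file.
names = ["Allele", "Consequence", "SYMBOL"]
for row in split_csq("DP=60;CSQ=T|missense_variant|GENE1,T|intron_variant|GENE2", names):
    print("\t".join(row[name] for name in names))
```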

how can I generate a VCF (in hg38 coords) of differences between hg38 and CHM13?

I downloaded s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph/hprc-v1.0-minigraph-grch38.gfa.gz which contains hg38, chm13, and other assemblies, and now am trying to use vg to generate a VCF with the variants in CHM13 relative to hg38. After converting to vg format, by running vg convert <(gunzip -c hprc-v1.0-minigraph-grch38.gfa.gz) > hprc-v1.0-minigraph-grch38.vg, I tried a few different variations of…

PLINK: Whole genome data analysis toolset – dbSNP

1. Introduction 2. Basic information 3. Downloads and general notes 4. Command reference table 5. Basic usage/data formats 6. Data management 7. Summary stats 8. Inclusion thresholds 9. Population stratification 10. IBS/IBD estimation 11. Association 12. Family-based association 13. Permutation procedures 14. LD calculations 15. Multimarker tests 16. Conditional haplotype…

sequence analysis – How can I make this Biopython program (to correct erroneous barcodes) run faster, and is there any alternative method?

This question was migrated from Biology Stack Exchange because it can be answered on Bioinformatics Stack Exchange. It has also been asked on Biostars. I am looking forward to getting a valuable suggestion for a bioinformatics problem. Background: Currently, I am performing a de novo whole…

vcf.Reader() in Python doesn’t read my VCF

I was working on building a GUI with wxPython and wanted to import a VCF file: with open(file_path, 'r') as file: vcf_reader = vcf.Reader(file) for record in vcf_reader: # Access the fields of each VCF record as needed chrom = record.CHROM…

Metagenome and metabolome insights into the energy compensation and exogenous toxin degradation of gut microbiota in high-altitude rhesus macaques (Macaca mulatta)

Ma, Y. et al. Gut microbiota adaptation to high altitude in indigenous animals. Biochem. Biophys. Res. Commun. 516, 120–126 (2019). Guo, N. et al. Seasonal dynamics of diet-gut microbiota interaction in adaptation of yaks to life at high altitude. NPJ Biofilms Microbiomes 7, 38 (2021)…

I want to correct the erroneous barcode file, and the Python code that I’ve written, using Biopython, is very slow. How can I make this process faster, and is there any alternative method?

I am looking forward to getting a valuable suggestion for a bioinformatics problem. Background: Currently, I am performing a de novo whole genome assembly. At the barcode correction stage, I lost nearly half of all the reads due to erroneous barcodes. During library construction, 18-base molecular barcodes were used…

Calculating Variant Allele Frequency

I have a VCF for which I need to calculate the variant allele frequency of each variant at each position. My understanding is that variant allele frequency is AD / DP. There are multiple samples for each position (NA0001, NA0002, NA0003). Do I get the average for each of them as they…
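As a rough illustration of the AD / DP ratio mentioned in the question, the sketch below computes one value per sample from a FORMAT string and a sample column. The helper name `sample_vaf` is hypothetical, and the code assumes AD is a comma-separated list whose first entry is the reference depth, as in GATK-style VCFs. Whether the per-sample values should then be averaged depends on the analysis, so that step is left out.

```python
def sample_vaf(format_field, sample_field, alt_index=1):
    """Return ALT depth / total depth for one sample, or None if unavailable."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    ad = values.get("AD", ".")
    dp = values.get("DP", ".")
    if ad == "." or dp in (".", "0"):
        return None  # missing call or zero coverage
    depths = ad.split(",")  # assumed layout: ref, alt1, alt2, ...
    if alt_index >= len(depths) or depths[alt_index] == ".":
        return None
    return int(depths[alt_index]) / int(dp)


# One value per sample column of a single VCF record.
fmt = "GT:AD:DP"
for name, column in [("NA0001", "0/1:30,10:40"), ("NA0002", "./.:.:.")]:
    print(name, sample_vaf(fmt, column))
```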

Why is CombineGVCFs in gatk not showing all the sample names?

Hi, I combined 64 .vcf files using CombineGVCFs in gatk. The command completed successfully, but the output shows only one sample column. Why are the rest of the 64 samples missing? The output is here: #CHROM…

Invalid QUAL value on line

Dear all, I have been working with a VCF file produced by HaplotypeCaller, where QUAL=Infinity for some variants. I am trying to convert it to PGEN format, but I get this error: Error: Invalid QUAL value on line 17 of test_123-temporary.pvar.zst. The default range for QUAL is…

How to add attributes (i.e. gene_id, transcript_id, exon_id, etc.) annotation from .bed file onto VCF?

I’m trying to annotate genes onto a VCF file with bcftools. My annotation file is a .bed file that originally was a hg38 UCSC knownGene gtf file, converted by BEDOPS: hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/ Original GTF file: chr1 11868 12227 . . + knownGene exon . gene_id “ENST00000456328.2”; transcript_id “ENST00000456328.2”; exon_number “1”; exon_id…

LOINC 5034-4 Streptococcus agalactiae rRNA [Presence] in Specimen by Probe

zh-CN Chinese (China) 无乳链球菌 rRNA:存在情况或阈值:时间点:XXX:序数型:DNA探针Synonyms: B 族链球菌 rRNA B 族链球菌;B 组链球菌;B 群链球菌;GBS B 组链球菌 rRNA B 群链球菌 rRNA 不明的;其他;将在相应消息内其他部分之中加以详细说明;未作详细说明的;未作说明的;未做说明的标本;未加规定的;未加说明的标本;杂项 依次型;分类顺序型;定性的;序数型(或称等级型);性质上的;有序型;有序性分类应答;有序性分类结果;秩次型;等级型;筛查;顺序型 存在情况;存在;存在与否;是否存在;阈值;界值;界限;阀值;临界值;存在情况(存在、存在与否、是否存在)或阈值(界值、界限、阀值、临界值) 微生物学;微生物学试验;微生物学试验(培养、DNA、抗原及抗体) 时刻;随机;随意;瞬间 核糖体核糖核酸;核糖核酸 RNA 核蛋白体 RNA 核蛋白体 RNA(Ribosomal RNA,RRNA) 核蛋白体核糖核酸 核醣体 RNA 脱氧核糖核酸探针 es-AR Spanish (Argentina) ARNr de Streptococcus agalactiae:concentración arbitraria:punto en el tiempo:XXX:ordinal:sonda fr-CA French (Canada) Streptococcus agalactiae…

What is the major problem with this pipeline of SNPs analysis?

First, I have raw Illumina sequencing data for several Aspergillus flavus (a fungal species) samples, as pairs of fastq.gz files (sample1_filtered_1.fastq.gz and sample1_filtered_2.fastq.gz). I wanted to assemble the Illumina reads and run a SNP (single nucleotide polymorphism) analysis against the reference genome, Aspergillus flavus NRRL3357, provided as a FASTA file. At the end…

Problem to convert genotypes to plink format from bcftools

I had a vcf file with imputed genotypes from beagle like this: ##fileformat=VCFv4.2 ##filedate=20210504 ##source=”beagle.18May20.d20.jar” ##INFO=<ID=AF,Number=A,Type=Float,Description=”Estimated ALT Allele Frequencies”> ##INFO=<ID=DR2,Number=1,Type=Float,Description=”Dosage R-Squared: estimated squared correlation between estimated REF dose [P(RA) + 2*P(RR)] and true REF dose”> ##INFO=<ID=IMP,Number=0,Type=Flag,Description=”Imputed marker”> ##FORMAT=<ID=GT,Number=1,Type=String,Description=”Genotype”> ##FORMAT=<ID=DS,Number=A,Type=Float,Description=”estimated ALT dose [P(RA) + 2*P(AA)]”> ##FORMAT=<ID=GP,Number=G,Type=Float,Description=”Estimated Genotype Probability”> #CHROM POS ID…

Example of overlapping vcf calls with *

I’m looking for help understanding the example of spanning alleles and multi-allelic loci on luntergroup.github.io/octopus/docs/guides/advanced/vcf/ The ALT field values are OK, but the GT values don’t make sense to me. BAM files in IGV-style display show 3 samples. 1st, HG002, has a 4-bp del starting at 728 in half the…

plink2 export tped 12 not working as expected

Hi, When I take a VCF file, convert it to plink, and then export it as a tped, the modifier ’12’ does not work as expected. I thought this modifier causes ALT1 alleles to be coded as ‘1’ and REF alleles as ‘2’. However, it appears to code REF alleles…

Variant caller reports a homozygous variant genotype, but more reads are associated with reference

Hi there, I’m confused about how to interpret this output from calling variants using bcftools: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GSM5292852 1065632 chr9 41242177 . T C 6.65947 . DP=37;VDB=0.133454;SGB=-0.662043;RPBZ=2.91136;MQBZ=4.0715;BQBZ=1.05041;SCBZ=-0.480069;MQ0F=0;AC=2;AN=2;DP4=0,26,0,9;MQ=5 GT:PL:AD…

Variant Calling with Multiple Individuals in Freebayes

Hello, I am a new student in bioinformatics and am trying to do variant calling with Freebayes. I have 10 individuals and a reference genome. I have already sorted and indexed all my .bam files, ready for variant…

mpileup2sync

Hello there, I’m new to this type of analysis. I’m trying to convert an mpileup file into the synchronized file format (sync), but I have a problem using the script that I found. This is the script: mpileup2sync --input pools_all.mpileup --output pools_all.sync --fastq-type sanger --min-qual 20 --threads…

TxDB.Hsapiens.UCSC.hg38.knownGene with locateVariants() identifying SNPs from various chromosome being part of the same gene

I am trying to annotate a list of SNPs using the hg38 genome (knownGene) and locateVariants(). The program is able to successfully run and provide “GeneIDs” for several of the loci. However, some GeneIDs are applied to SNPs in completely different regions and on completely different chromosomes. When I cross…

Picking the Perfect Data Visualization: Barplots

This blogpost is the second in a series where we explain the most common data visualization types and how you can best use them to explore your data and tell its story. In this post, we’ll cover barplots, which can give us great insight into…

LOINC 76575-0 Bacterial 16S rRNA [Presence] in Specimen by NAA with probe detection

Term Description: This term was created for, but is not limited in use to, Pathogenius Laboratories’ Level 1 Wound and/or ENT test panels, which use quantitative PCR to identify commonly-found microorganisms in wound or ear, nose, and throat samples, respectively. Source: Regenstrief LOINC. Part Description: LP189395-9 Bacterial 16S rRNA. Sequencing…

Can not get bcftools norm to join biallelics into a multiallelic.

Is this the right way to use bcftools to join/merge biallelic records into a multiallelic? If so, it is not working. There are no errors, but it gives me the same file with my command added to the headers. Example…

LOINC 16584-5 Chlamydophila pneumoniae rRNA [Presence] in Sputum by Probe

zh-CN Chinese (China) 肺炎嗜性衣原体 rRNA:存在情况或阈值:时间点:痰液:序数型:DNA探针Synonyms: CPN;Twar 制剂;Twar 株;肺炎衣原体;肺炎衣原体 Twar 株;鹦鹉热衣原体TWAR-TW株 下呼吸道;痰 亲衣原体 亲衣原体属 依次型;分类顺序型;定性的;序数型(或称等级型);性质上的;有序型;有序性分类应答;有序性分类结果;秩次型;等级型;筛查;顺序型 嗜性衣原体属 嗜衣体 嗜衣体属 嗜衣原体 嗜衣原体属 存在情况;存在;存在与否;是否存在;阈值;界值;界限;阀值;临界值;存在情况(存在、存在与否、是否存在)或阈值(界值、界限、阀值、临界值) 微生物学;微生物学试验;微生物学试验(培养、DNA、抗原及抗体) 时刻;随机;随意;瞬间 核糖体核糖核酸;核糖核酸 RNA 核蛋白体 RNA 核蛋白体 RNA(Ribosomal RNA,RRNA) 核蛋白体核糖核酸 核醣体 RNA 脱氧核糖核酸探针 衣原体 衣原体属 fr-CA French (Canada) Chlamydophila pneumoniae , ARNr:Présence-Seuil:Temps ponctuel:Expectorations:Ordinal:Sonde es-AR Spanish (Argentina) ADN de Chlamydophila pneumoniae:concentración…

Getting matrix of QDs from VCF file

Hi, I have a VCF file and I would like to get a site-by-individual matrix of read depths (the DP label) and a second matrix of just the GQ scores. What is the easiest way to do this? Thanks in advance! Ex…
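One way to build such a matrix without extra dependencies is to read the FORMAT column of each record and pull the requested tag out of every sample column. The sketch below assumes an uncompressed, tab-delimited VCF; the function name `format_matrix` is hypothetical, and missing values are reported as ".".

```python
def format_matrix(vcf_lines, tag):
    """Return (sample_names, rows); each row is (site_label, values per sample)."""
    samples, rows = [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names follow the FORMAT column
            continue
        keys = cols[8].split(":")
        idx = keys.index(tag) if tag in keys else None
        row = []
        for sample in cols[9:]:
            parts = sample.split(":")
            # Trailing fields may be dropped for missing genotypes such as "./."
            row.append(parts[idx] if idx is not None and idx < len(parts) else ".")
        rows.append((f"{cols[0]}:{cols[1]}", row))
    return samples, rows


example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP:GQ\t0/1:12:99\t./.\n",
]
for tag in ("DP", "GQ"):
    names, matrix = format_matrix(example, tag)
    print(tag, names, matrix)
```

Calling the function once with "DP" and once with "GQ" yields the two matrices asked for.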

Genome-wide significant risk loci for mood disorders in the Old Order Amish founder population

GBD 2017 Disease and Injury Incidence and Prevalence Collaborators. Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990-2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392:1789–858. Smoller JW, Finn…

Filtering of tricky overlapping sites in VCF

I’m working on a project where I’m working with variants ranging from low frequency (AF~=0.03) up to high frequency (AF=1) after variant calling with lofreq. There are some cases where positions in the vcf are duplicated and this is leading to issues with my downstream processing of the data. Consider…

Survival outcomes of patients newly diagnosed with diffuse large B-cell lymphoma: real-world evidence from a German claims database

Adzersen K, Friedrich S, Becker N (2016) Are epidemiological data on lymphoma incidence comparable? Results from an application of the coding recommendations of WHO, InterLymph, ENCR and SEER to a cancer registry dataset. J Cancer Res Clin Oncol 142:167–175. Ardeshna KM, Smith P, Norton A et…

Health-related quality of life with pembrolizumab or placebo plus chemotherapy with or without bevacizumab for persistent, recurrent, or metastatic cervical cancer (KEYNOTE-826): a randomised, double-blind, placebo-controlled, phase 3 trial

Summary. Background: In the KEYNOTE-826 study, the addition of the anti-PD-1 monoclonal antibody pembrolizumab to chemotherapy with or without bevacizumab improved overall survival and progression-free survival (primary endpoints) versus placebo plus chemotherapy with or without bevacizumab, with manageable toxicity, in patients with persistent, recurrent, or metastatic cervical cancer. In this…

LOINC 5024-5 Mycobacterium gordonae rRNA [Presence] in Specimen by Probe

zh-CN Chinese (China) 戈登分枝杆菌 rRNA:存在情况或阈值:时间点:XXX:序数型:DNA探针Synonyms: AFB 不明的;其他;将在相应消息内其他部分之中加以详细说明;未作详细说明的;未作说明的;未做说明的标本;未加规定的;未加说明的标本;杂项 依次型;分类顺序型;定性的;序数型(或称等级型);性质上的;有序型;有序性分类应答;有序性分类结果;秩次型;等级型;筛查;顺序型 分支杆菌 分支杆菌属 分枝杆菌属 劳瘵 存在情况;存在;存在与否;是否存在;阈值;界值;界限;阀值;临界值;存在情况(存在、存在与否、是否存在)或阈值(界值、界限、阀值、临界值) 微生物学;微生物学试验;微生物学试验(培养、DNA、抗原及抗体) 戈氏分支杆菌 rRNA 戈氏分支杆菌;戈氏分枝杆菌;戈登分支杆菌 戈氏分枝杆菌 rRNA 戈登分支杆菌 rRNA 抗酸 抗酸杆菌 抗酸杆菌(Acid fast bacillus,AFB) 抗酸菌 时刻;随机;随意;瞬间 核糖体核糖核酸;核糖核酸 RNA 核蛋白体 RNA 核蛋白体 RNA(Ribosomal RNA,RRNA) 核蛋白体核糖核酸 核醣体 RNA 痨 痨病 结核 结核病 肺结核 脱氧核糖核酸探针 es-AR Spanish (Argentina) ARNr de Mycobacterium…

problem with chromosomes in michigan imputation server

Hi, I am trying to impute a dataset using the Michigan imputation server, but I got this error: No valid chromosomes found! I have VCF files (with tabix indexes) for chromosomes 1-23 that look like this: Should I add chr at the…

summary | Finding WGCNA modules that are “absent” from the healthy cells and so “exclusive” to cancer cells ?

Hello, I am currently using the hdWGCNA package on single cell data. So I get modules on a scRNAseq of cancer cells that I project on healthy cells in order to see what are the modules that are “absent” from the healthy cells and so “exclusive” to cancer cells. To…

how to separate VEP INFO column into separate columns

I have a VCF file like the one below: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT treatmentSample chr1 857100 . C T 1756.06 PASS AC=2;AF=1;AN=2;DP=60;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=29.27;SOR=1.812;CSQ=chr1:857100|T|SNV|ENSG00000228794|ENST00000445118|LINC01128||1|MODIFIER|non_coding_transcript_exon_variant||||5/5|||||||||||||||||| GT:AD:DP:GQ:PL 1/1:0,60:60:99:1770,180,0 Does anyone know how to separate the INFO column into different columns? And also how to separate the treatmentSample column following the FORMAT order? I…
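Both splits asked about here follow directly from the VCF layout: INFO is a semicolon-separated list of key=value pairs (or bare flags), and a sample column lines up field by field with FORMAT. The helpers `split_info` and `split_sample` below are hypothetical names used only for this sketch, and the INFO string in the usage lines is shortened from the record quoted in the post.

```python
def split_info(info_field):
    """Turn 'AC=2;AF=1;DB' into {'AC': '2', 'AF': '1', 'DB': True}."""
    out = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True  # flags carry no '=value' part
    return out


def split_sample(format_field, sample_field):
    """Pair each FORMAT key with the matching value in a sample column."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


info = split_info("AC=2;AF=1;AN=2;DP=60;QD=29.27")
sample = split_sample("GT:AD:DP:GQ:PL", "1/1:0,60:60:99:1770,180,0")
print(info["QD"], sample["GT"], sample["AD"])
```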

GATK HaplotypeCaller combine info from two BAM into one line in vcf (not divide into samples column)

Hi, I ran GATK HaplotypeCaller hoping to get a file in which each sample has its own column. My BAM files are: input_bam/SRR8859080.bam input_bam/ENCFF477JTA_new.bam This is my GATK command: allele_chunk_file=rs_coord.vcf gatk_run_line="../bin/gatk-4.1.2.0/gatk" outfile=wgs_test_out.genotypes.vcf bam_file=wgs_test.bam.list genome_seq="../hg38.fa" intervals=wgs_test.bed $gatk_run_line \ HaplotypeCaller\ --reference $genome_seq \ --input $bam_file \ --genotyping-mode GENOTYPE_GIVEN_ALLELES \…

How To Install bbmap on Ubuntu 20.04

In this tutorial we learn how to install bbmap on Ubuntu 20.04. bbmap is a short read aligner bundled with other bioinformatics tools. What is bbmap? BBMap: Short read aligner for DNA and RNA-seq data. Capable…

Miseq mothur error with SOP in galaxy

Is there an update for the following SOP: 16S Microbial Analysis with mothur (extended)? We keep getting an error after the unique.seq command…

fastq-dump split-spot and skip-technical

Hi all, can anyone give a clear explanation of the split-spot and skip-technical options in fastq-dump? --split-spot: Split spots into individual reads. (**I guess this option splits each read into two parts**) --skip-technical: Dump only biological reads. What can --split-spot be used for? What is the principle behind skip-technical? What is the difference between…

Help please! Miseq SOP with galaxy toolsuite

Is there an update for the following SOP: 16S Microbial Analysis with mothur (extended)? We keep getting an error after the unique.seq command…

proteins – open and read Fasta file (raw data)

It sounds like all you really want is to print every 2nd line of a text file. If so, you don’t need Python, let alone BioPython, you can do it with basic *nix tools: $ awk ‘NR%4==2’ pdb.fasta_qual.dataset 247 NR is the current line number, and % is the modulo…
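For readers who do want to stay in Python, the same selection can be written with `enumerate`. This is only a sketch of the modulo test the awk one-liner performs: it keeps lines whose 1-based number leaves remainder 2 when divided by 4, that is, lines 2, 6, 10 and so on. The function name is hypothetical.

```python
def second_of_every_four(lines):
    """Yield lines whose 1-based line number n satisfies n % 4 == 2."""
    for number, line in enumerate(lines, start=1):
        if number % 4 == 2:
            yield line


# With a real file: for line in second_of_every_four(open(path)): ...
records = ["@r1", "ACGT", "+", "IIII", "@r2", "GGCC", "+", "IIII"]
print(list(second_of_every_four(records)))
```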

calctruequality in bbmap

I’m trying to recalibrate Q scores of a NextSeq run using MiSeq contigs assembled with Tadpole. The commands I used to map the reads to the reference were as follows: bbmap.sh in=concatABC.fastq.gz outm=mapped.sam ref=./Lpe09_06TdpAssemblies/contigs09_06.fa ignorequality maxindel=100 minratio=0.4 ambig=toss qahist=qahist_raw.txt qhist=qhist_raw.txt mhist=mhist_raw.txt The command I used to…

Randomize Read Order In Multigbp Fastq File?

Is there any method to randomize the read order in a multi-Gbp fastq file? Assuming you are talking about a single-end file, you can use awk to put each 4-line fastq entry on a single line. You then…
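The same idea (treat each 4-line entry as one unit, then shuffle the units) can be sketched in Python. Note that this version holds every record in memory, which is an assumption that may not hold for a multi-Gbp file; the function name `shuffle_fastq` and the `seed` parameter are hypothetical and exist only for this illustration.

```python
import random


def shuffle_fastq(lines, seed=None):
    """Return FASTQ lines with the order of the 4-line records randomized."""
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 4 != 0:
        raise ValueError("expected a multiple of 4 lines")
    # Group consecutive lines into records so each read stays intact.
    records = [lines[i:i + 4] for i in range(0, len(lines), 4)]
    random.Random(seed).shuffle(records)
    return [line for record in records for line in record]


reads = ["@a", "AC", "+", "II", "@b", "GT", "+", "II", "@c", "TT", "+", "II"]
print("\n".join(shuffle_fastq(reads, seed=1)))
```

For paired-end data the two files would need to be shuffled with the same permutation so that mates stay in matching order.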

Issue with VCF format while using Pharmcat

Hello everybody, I am using the PharmCAT tool’s preprocessor feature to preprocess my VCF file with the command > python3 pharmcat_vcf_preprocessor.py -vcf sample.vcf But I think there is some issue with my VCF file, as this command outputs an error > Reading samples from sample.vcf … Saving output to . > >…

How to Calculate Allele Frequency from a VCF File?

I have a VCF file with 200 samples (mitochondrial genome of Plasmodium falciparum). Here is a pic to take a look at: And a few relevant lines from the actual file: ##INFO=<ID=AC,Number=A,Type=Integer,Description=”Allele count in genotypes, for each ALT allele, in the same order as listed”> ##INFO=<ID=AF,Number=A,Type=Float,Description=”Allele Frequency, for each ALT…

An error occurs while running pre.cluster command – Commands in mothur

Sam92, January 25, 2023, 6:37am, #1: Hi, I am running into a problem with the pre.cluster step. I am running it as a batch file, and the output in the mothur log file is as follows. mothur > summary.seqs(fasta=current) Using MethodF2.trim.pcr.trim.good.unique.good.filter.unique.fasta as input file for the fasta parameter. Using 40…

PHG Load haplotype and create consensus

Here are my PHG scripts, config, and wgs_keyfile. 1. Create valid intervals: docker run --name test_assemblies --rm -v /DATA/jysong/PHG/ver1.0_phg/:/phg/ -t maizegenetics/phg:1.0 /tassel-5-standalone/run_pipeline.pl -Xmx100G -debug -configParameters /phg/Masterconfig.txt -CreateValidIntervalsFilePlugin -intervalsFile /phg/inputDir/reference/glyma.Wm82.gnm4.ann1.T8TQ.gene_models_main.bed -referenceFasta /phg/inputDir/reference/glyma.Wm82.gnm4.4PTR.genome_main.fixed.fna.gz -mergeOverlaps true -generatedFile /phg/validBedFile.bed -endPlugin &> Log/1.Create_validinterval.txt & 2. Create initial DB: docker run --name create_initial_db --rm -v /DATA/jysong/PHG/ver1.0_phg/:/phg/ -t…

Download “Epigenetic Memory” Phenomenon in Induced Pluripotent Stem Cells. PDF

REVIEWS: “Epigenetic Memory” Phenomenon in Induced Pluripotent Stem Cells. E.A. Vaskova (1,2,3), A.E. Stekleneva (1,2,3), S.P. Medvedev (1,2,3), S.M. Zakian (1,2,3,*). (1) Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, prosp. Akad. Lavrentyeva, 10, Novosibirsk, Russia, 630090. (2) Meshalkin State Research Institute of Circulation Pathology, Rechkunovskaya Str., 15, Novosibirsk, Russia, 630055. (3) Institute of…

Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment

Sample preparation We ordered the GIAB samples from the Coriell Institute (NA24385, NIST ID HG002; NA24149, NIST-ID HG003 and NA24143, NIST-ID HG004). DNA concentration was measured by Qubit. The library was constructed according to Illumina TruSeq DNA PCR Free Library Prep protocol HT (Illumina Inc., San Diego, CA, USA) for…

Compressing BAM, SAM, CRAM | Genozip

How good is Genozip at compressing BAM files? See Benchmarks. Compressing a BAM, SAM or CRAM file: In the rest of this page we will give examples of BAM files. Genozip is also capable of compressing SAM files and, with some limitations, CRAM files as well…

As of July 2015, the VCFtools project has been moved to github! Please visit the new website here: vcftools.github.io/man_0112a.html

NAME SYNOPSIS DESCRIPTION EXAMPLES BASIC OPTIONS SITE FILTERING OPTIONS INDIVIDUAL FILTERING OPTIONS GENOTYPE FILTERING OPTIONS OUTPUT OPTIONS COMPARISON OPTIONS AUTHOR NAME VCFtools v0.1.12a - Utilities for the variant call format (VCF) and binary variant call format (BCF) SYNOPSIS vcftools [ --vcf FILE | --gzvcf FILE | --bcf FILE]…

GATK vs DeepVariant : bioinformatics

Hi everyone, I am currently working on a medium-sized WES cohort study and wanted to know what the bioinformatics community would regard a cutting-edge workflow. As the big labs usually utilize GATK I also started with that. The results for SNPs are ok, but manual inspection (IGV) still uncovers a…

python – Matching two files(vcf to maf) using a dictionaries, and appending the contents

annotation_file ##INFO=<ID=ClinVar_CLNSIG,Number=.,xxx ##INFO=<ID=ClinVar_CLNREVSTAT,Number=.,yyy ##INFO=<ID=ClinVar_CLNDN,Number=.zzz #CHROM POS ID REF ALT QUAL FILTER INFO chr1 10145 . AAC A 101.83 . AC=2;AF=0.067;AN=30;aaa chr1 10146 . AC A 98.25 . AC=2;AF=0.083;AN=24;bbb chr1 10146 . AC * 79.25 . AC=2;AF=0.083;AN=24;ccc chr1 10439 . AC A 81.33 . AC=1;AF=0.008333;AN=120;ddd chr1 10450 . T G 53.09…

BAM file and no RNAME or POS information? : bioinformatics

Newbie here. Please, play nice. I got possession of a set of 4 .bam files that store the exome of an individual, around 400 MB each. I used samtools to generate a 2.4 GB .sam file from one of the .bam files, and I found it contains lines with…

(ERR): bowtie2-align exited with value 13

I am trying to run bowtie2, but the following error occurs every time: bowtie2 --very-fast-local -x bowtie -q -1 R1.fastq -2 R2.fastq -s aligned.sam Saw ASCII character 10 but expected 33-based Phred qual. terminate called after throwing an instance of 'int' Aborted…

Continue Reading (ERR): bowtie2-align exited with value 13

Why did I achieve shorter than initial reads subset after aligned reads extraction.

Hello dear colleagues! I recently ran into a problem. I have been working with long WGS reads. First I filtered the longest subset of reads and aligned them to the custom sequence with several structural variants…

Continue Reading Why did I achieve shorter than initial reads subset after aligned reads extraction.

Ubuntu Manpage: sambamba-view – tool for extracting information from SAM/BAM files

Provided by: sambamba_0.8.2+dfsg-2_amd64 NAME sambamba-view – tool for extracting information from SAM/BAM files SYNOPSIS sambamba view OPTIONS <input.bam | input.sam> [region1 […]] DESCRIPTION sambamba view allows to efficiently filter SAM/BAM files for alignments satisfying various conditions, as well as access its SAM header and information about reference sequences. In order…

Continue Reading Ubuntu Manpage: sambamba-view – tool for extracting information from SAM/BAM files

Chief of Bioinformatics | ID/HIV Career Center

Business Title Chief, Bioinformatics, Public Health Laboratory Civil Service Title CITY RESEARCH SCIENTIST Title Classification Non-Competitive Proposed Salary Range $ 96,772.00 – $140,660.00 (Annual) Work Location 455 First Ave., N.Y. Division/Work Unit PHL Admin & Lab Support As of August 2, 2021, all new hires must be vaccinated against the…

Continue Reading Chief of Bioinformatics | ID/HIV Career Center

BBTools – BioGrids Consortium – Supported Software

BBTools Description: a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data. BBTools can handle common sequencing file formats such as fastq, fasta, sam, scarf, fasta+qual, compressed or raw, with autodetection of quality encoding and interleaving. Installation: Use the following command to…

Continue Reading BBTools – BioGrids Consortium – Supported Software

SeqIO object get cleared away after being accessed

I’m using Biopython to parse a fastq file, and I found that the SeqIO object gets cleared once I have accessed it.

    from Bio import SeqIO
    record_fastqIO = SeqIO.parse('SRR835775_1.first1000.fastq','fastq')
    for record in record_fastqIO:
        print(record.id)

This script works perfectly. But if I add one line to the script: from Bio import…

Continue Reading SeqIO object get cleared away after being accessed
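The symptom in this excerpt is consistent with `SeqIO.parse` returning a one-pass iterator, which yields nothing more once it has been consumed. The line the poster added is cut off in the excerpt, so that reading is an assumption. The sketch below reproduces the behaviour with a plain Python generator; `parse_ids` and the sample reads are hypothetical stand-ins, not Biopython calls.

```python
def parse_ids(lines):
    """Hypothetical stand-in for a one-pass parser: lazily yields FASTQ record IDs."""
    for i in range(0, len(lines), 4):
        # The first line of each 4-line FASTQ record starts with '@'.
        yield lines[i][1:].split()[0]


fastq_lines = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "TTGA", "+", "IIII",
]

records = parse_ids(fastq_lines)
first_pass = list(records)   # consumes the iterator
second_pass = list(records)  # the iterator is exhausted, so this is empty

# Materialising the records once keeps them available for repeated use.
cached = list(parse_ids(fastq_lines))
```

If the records are needed more than once, either store them in a list as above (memory permitting) or call the parser again to get a fresh iterator.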

Split multiallelic SNPs to biallelic from vcf

Dear all, I have a particular vcf file like this:

    chrX 29 . G A,T . PASS AC=1,1;AN=3 GT:DP:HF:CILOW:CIUP:SDP 0/1/2:4839:0.003,0.001:0.002,0.0:0.005,0.003:14;0,4;2

I tried various tools to split this, but I get the following results, where the FORMAT and INFO fields are identical:

    chrX 29 . G A . PASS AC=1,1;AN=3;OLD_MULTIALLELIC=chrM:899:G/A/T GT:DP:HF:CILOW:CIUP:SDP…

Continue Reading Split multiallelic SNPs to biallelic from vcf

bedtools intersect error: Invalid record in file

Hello to all, I am trying to run bedtools intersect with a vcf file and a bed file (my goal is to add the depth data to my VCF). I get an error running this command: bedtools intersect -a depth.bed -b fish.vcf -wa -wb > $out The error: "Error: Invalid record…

Continue Reading bedtools intersect error: Invalid record in file

Issue with fastq after converting phred 64 to phred 33 quality scores

Hello, I ran seqtk seq -VQ64 read1.fastq.gz > read1_phred33.fastq to convert my Phred+64 reads to Phred+33. However, when I attempted to run them through tophat alignment I got this error: Saw ASCII character 4 but expected 33-based Phred qual. terminate called after…

Continue Reading Issue with fastq after converting phred 64 to phred 33 quality scores
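Converting Phred+64 to Phred+33 shifts every quality character down by 31 ASCII positions. One possible reading of the error above, offered only as a guess, is that the input was already Phred+33: a `#` (ASCII 35) shifted by 31 would become ASCII 4. The sketch below performs the shift and refuses characters below the Phred+64 range. Both function names are hypothetical, and it assumes Illumina-style Phred+64 whose lowest character is `@` (ASCII 64).

```python
def phred64_to_phred33(qual: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (each character shifted by -31)."""
    out = []
    for ch in qual:
        score = ord(ch) - 64
        if score < 0:
            # Assumption: valid Phred+64 input never goes below '@' (ASCII 64).
            raise ValueError(
                f"{ch!r} (ASCII {ord(ch)}) is below the Phred+64 range; "
                "the input may already be Phred+33"
            )
        out.append(chr(score + 33))
    return "".join(out)


def looks_like_phred33(qual: str) -> bool:
    """Heuristic: any character below ASCII 64 cannot come from Phred+64 data."""
    return any(ord(ch) < 64 for ch in qual)
```

Running `looks_like_phred33` over a sample of quality lines before converting is a cheap way to avoid shifting a file twice.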

Dragen-gatk for trio

Hi everyone, the Dragen-GATK pipeline works great for a single sample. However, I would like to know if anyone has used this pipeline for a trio. If so, how did you do it? It is recommended to do hard filtering based on QUAL, but how…

Continue Reading Dragen-gatk for trio

how to add reference alleles to VCF?

I’m converting gVCFs to VCF, but the reference alleles are missing. An example below:

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 180525_FD02929177
    1 97547947 . T . . . DP=31 GT:DP:RGQ 0/0:31:81
    1 97915614 . C . . . DP=40…

Continue Reading how to add reference alleles to VCF?

No quality in non-variant sites GATK

Hi, I am doing SNP calling with GATK: HaplotypeCaller with BP_RESOLUTION, CombineGVCFs with convert-to-base-pair-resolution, and GenotypeGVCFs with include-non-variant-sites. When I get my vcf file, the non-variant sites do not have any quality at all: #CHROM POS ID REF ALT QUAL FILTER…

Continue Reading No quality in non-variant sites GATK

VCF samtools

Hello, I am having trouble doing variant calling with samtools. I am getting only the header and no variants. If I instead use FreeBayes, I do get a lot of variants, and with GATK, I get just a few. What can the problem be? Do…

Continue Reading VCF samtools

add gene names to ‘isec’ output files of bcftools’

I had two vcf files and I used isec from bcftools to find typical and common mutations between samples. The output of isec was four vcf.gz files, as shown below: isec_output/0000.vcf.gz would be variants unique to 1.vcf.gz isec_output/0001.vcf.gz…

Continue Reading add gene names to ‘isec’ output files of bcftools’

I can’t get a dosage file using PLINK

Hi, I have been trying to get a dosage file from vcf, map and fam files. For that, I have written this bash script: plink --fam plink.fam --map plink.map --dosage one.vcf --write-dosage However, I got this error: --dosage: Reading from one.vcf. Error: Line 1 of one.vcf has fewer tokens…

Continue Reading I can’t get a dosage file using PLINK

vcf to bgen conversion using qctool v2 yields 0 snps

Hi all, I have a vcf file that was extracted from UKB data using qctool (v2.0.6-Ubuntu16.04-x86_64) and contains data in the GP format. This contains a bunch of SNPs from a single chromosome. ❱ wc -l chromosome1.vcf 260 chromosome1.vcf Then I try to convert this file to .bgen again using…

Continue Reading vcf to bgen conversion using qctool v2 yields 0 snps

predixcan error

Hello, I am trying to run the predict.py script from the predixcan software, but it is showing an error. The command used: python $PXCN_TOOLS/PrediXcan.py --model_db_path $MODELS/en_Whole_Blood.db --model_db_snp_key rsid --vcf_mode genotyped --vcf_genotypes $VCF_FILES/*.vcf --prediction_output $OUTPUT/GVDS_PrediXcan_Test_2021.txt The error: [E::bcf_hdr_parse] Could not parse the header, sample line not found Segmentation fault I…

Continue Reading predixcan error

bcftools merge

Check out the vcf_merge command I wrote: $ fuc vcf_merge -h usage: fuc vcf_merge [-h] [--how TEXT] [--format TEXT] [--sort] [--collapse] vcf_files [vcf_files ...] This command will merge multiple VCF files (both zipped and unzipped). It essentially wraps the 'pyvcf.merge' method from the fuc API. By default, only the GT…

Continue Reading bcftools merge

Edit vcf file 0|0 to 0

I have a vcf file with the GT format as 0|0, 0|1, 1|1, etc. I would like to convert those to a single number to create a dosage file. For example, editing the vcf so that 0|0 becomes 0, 0|1 becomes 1, 1|1 becomes 2…

Continue Reading Edit vcf file 0|0 to 0
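The mapping described in this excerpt (0|0 to 0, 0|1 to 1, 1|1 to 2) amounts to counting non-reference alleles per genotype. Below is a minimal sketch of that idea, not a method taken from the thread. The function names, the `NA` marker for missing calls, the tab-separated input, and the choice to keep only the first five columns ahead of the dosages are all assumptions made for illustration. At multiallelic sites every non-zero allele is counted as one.

```python
import re


def gt_to_dosage(gt: str) -> str:
    """Count non-reference alleles in a GT string such as '0|1' or '1/1'."""
    alleles = re.split(r"[|/]", gt)
    if any(a == "." for a in alleles):
        return "NA"  # assumption: missing calls are written as NA
    return str(sum(1 for a in alleles if a != "0"))


def convert_line(line: str) -> str:
    """Turn one tab-separated VCF record into CHROM POS ID REF ALT plus dosages."""
    line = line.rstrip("\n")
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    fields = line.split("\t")
    gt_idx = fields[8].split(":").index("GT")  # position of GT within FORMAT
    dosages = [gt_to_dosage(s.split(":")[gt_idx]) for s in fields[9:]]
    return "\t".join(fields[:5] + dosages)
```

Because the sample columns no longer follow the FORMAT definition, the result is a plain dosage table rather than a valid VCF.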

Output of samtools view, what does the third column actually represent?

The samtools view command outputs information from SAM and BAM files in SAM format. You can find a description of the SAM format here: samtools.github.io/hts-specs/SAMv1.pdf Section 1.4 deals with the meaning of each of the mandatory columns. It includes the following table:

Col | Field | Type | Regexp/Range | Brief description
1 | QNAME…

Continue Reading Output of samtools view, what does the third column actually represent?
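The SAM specification defines eleven mandatory tab-separated columns, and the third one is RNAME, the name of the reference sequence the read is aligned to (`*` when it is unmapped). The sketch below labels the columns of one alignment line. `parse_sam_line` is a hypothetical helper, and the `TAGS` key for optional fields is an illustrative choice.

```python
SAM_FIELDS = [
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
]
INT_FIELDS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")


def parse_sam_line(line: str) -> dict:
    """Label the 11 mandatory columns of a SAM alignment line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(SAM_FIELDS):
        raise ValueError(f"expected at least 11 columns, got {len(cols)}")
    record = dict(zip(SAM_FIELDS, cols[:11]))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    record["TAGS"] = cols[11:]  # optional TAG:TYPE:VALUE fields, if any
    return record


example = "r001\t99\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*"
rec = parse_sam_line(example)
```

Header lines, which start with `@`, are not alignment records and should be skipped before calling the helper.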

Extract multiple times a fasta sequence from a list by name

Hi everybody! I have loaded into R a list of 9K fasta sequences, to which 40K SNPs map, which means some sequences host more than one SNP. I have an R object (and a vcf as well) with the fasta sequence names and the SNP positions and I want to…

Continue Reading Extract multiple times a fasta sequence from a list by name

bcftools merge; retaining sample names

When I do bcftools merge, the headers do not retain the filenames. How can I specify filenames? This is my command: bcftools merge vcf/unfiltered/*.vcf.gz -O z > msa/pooled.vcf.gz However, this is the relevant part of my header, despite the filenames I gave it. Is…

Continue Reading bcftools merge; retaining sample names

Bcftools how to add DP to FORMAT field (get per sample read depth for REF vs ALT alleles )

I’m trying to achieve what this post was looking for: Add Dp Tag To Genotype Field Of Vcf File. Currently this is my command: bcftools mpileup -Ou --max-depth 8000 --min-MQ…

Continue Reading Bcftools how to add DP to FORMAT field (get per sample read depth for REF vs ALT alleles )

FreeBayes VCF output with FORMAT unknown

Hey, I am looking for a way to add sample ID names to the FORMAT in my vcf file. I have 10 sorted BAM files. I used FreeBayes to create vcf files and my next step is merging all 10 files for VcfSampleCompare. And for that I need to define…

Continue Reading FreeBayes VCF output with FORMAT unknown

Change chromosome notation in dbSNP VCF file

Hi, I have downloaded the dbSNP VCF file from [ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/]. The format is as follows:

    #CHROM POS ID REF ALT QUAL FILTER INFO
    1 10019 rs775809821 TA T . . RS=775809821;RSPOS=10020;dbSNPBuildID=144;SSR=0;SAO=0;VP=0x050000020005000002000200;GENEINFO=DDX11L1:100287102;WGT=1;VC=DIV;R5;ASP
    1 10039 rs978760828 A C . . RS=978760828;RSPOS=10039;dbSNPBuildID=150;SSR=0;SAO=0;VP=0x050000020005000002000100;GENEINFO=DDX11L1:100287102;WGT=1;VC=SNV;R5;ASP
    1 10043 rs1008829651 T…

Continue Reading Change chromosome notation in dbSNP VCF file
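The excerpt is cut off before it says which notation is wanted, so the sketch below assumes the common case of turning `1`, `2`, ... into `chr1`, `chr2`, ... and `MT` into `chrM`. `add_chr_prefix` is a hypothetical helper that rewrites only the CHROM column of tab-separated data lines. Header lines are passed through untouched, so any `##contig` entries would still need to be renamed separately.

```python
def add_chr_prefix(line: str) -> str:
    """Prefix the CHROM column of a VCF data line with 'chr' (assumed target notation)."""
    if line.startswith("#"):
        return line  # header lines are left as they are
    chrom, sep, rest = line.partition("\t")
    if chrom.startswith("chr"):
        return line  # already in the assumed target notation
    if chrom == "MT":
        chrom = "M"  # assumption: the mitochondrial contig should become chrM
    return "chr" + chrom + sep + rest


def convert(lines):
    """Apply the renaming lazily to an iterable of VCF lines."""
    for line in lines:
        yield add_chr_prefix(line)
```

For a large dbSNP file, `convert` can be fed an open file handle and its output written line by line, so the whole file never has to sit in memory.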

Convert a VCF-file in a user specific Format

Hello everyone, I am curious whether it is possible to convert a VCF file (with multiple samples) into a format with 5 columns. The columns should be: Sample ID; position on the chromosome; genotype; number of reads covering the site; QUAL phred-scaled quality…

Continue Reading Convert a VCF-file in a user specific Format

Platypus

Hi, I’m super new to WGS and bioinformatics, but I’m a classic software data scientist, so I know enough to be annoying. I’m using Platypus to call variants on 100X WGS via Nebula Genomics. I found an odd series of calls and am not sure if this is…

Continue Reading Platypus

Variant Calling Heterozygous Reference Alleles

I am going to be working with VCF files a lot in the near future so I thought I would brush up on the practice. After much reading and research, there’s something that I just can’t wrap my head around. 1) In a diploid organism, you have 2 alleles for…

Continue Reading Variant Calling Heterozygous Reference Alleles

Inquiry related to vcf file and formatting

Hello everyone, I am trying to run the predixcan software, but it fails with a segmentation fault, implying that there is something wrong with my vcf files. I am sharing the header of the vcf file.

    ##fileformat=VCFv4.1
    ##INFO=<ID=LDAF,Number=1,Type=Float,Description="MLE Allele Frequency Accounting for LD">
    ##INFO=<ID=AVGPOST,Number=1,Type=Float,Description="Average posterior probability from MaCH/Thunder">
    ##INFO=<ID=RSQ,Number=1,Type=Float,Description="Genotype imputation quality from…

Continue Reading Inquiry related to vcf file and formatting

print only columns with data from every line

Hi, I have a vcf file with about 60,000 columns. Here is an example of the first three lines: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 10022-20416-17 10024-34469-18A 10025-34469-18B 10034-31625-18A 10035-31625-18B 10036-31625-18C 10042-29083-18 10044-34485-18A 10045-34485-18B 10046-34485-18C 10069-33802-18 10070-20895-17…

Continue Reading print only columns with data from every line

bcftools consensus still returns “Could not parse the header” error

I attempted to create a consensus fasta file using bcftools, i.e.

    bgzip -c All_SRR_SNP_Clean.vcf > All_SRR_SNP_Clean.vcf.gz
    tabix All_SRR_SNP_Clean.vcf.gz
    cat $ref | bcftools consensus $vcf_dir/All_SRR_SNP_Clean.vcf.gz > consensus.fasta

where $ref is the path to a Drosophila reference genome fa and the vcf…

Continue Reading bcftools consensus still returns “Could not parse the header” error

VCF Filter On Small Genomes

Hi guys, I am working on NGS data from a yeast species (Candida glabrata) to find any mutations related to drug resistance. I am new to bioinformatics, so I am using Galaxy.eu to get used to the algorithms. There is literature about some genes that mutations…

Continue Reading VCF Filter On Small Genomes