Cool Bioinformatics Scripts | PythonRepo

You can use this script in two ways

  • read tons of millions of P values from stdin
# python 
zcat pval.txt.gz | qqplot.py -out test -title "QQ plot on the fly"
# julia
zcat pval.txt.gz | qqplot.jl --out test --title "QQ plot on the fly"

warning : If you have 100 billion P values to process you should definitely use qqplot.jl instead of qqplot.py. The hourly processed lines of julia version is 3 billion while python is only 700 million on my server.

  • use qqplot.py in your script
import numpy as np
from qqplot import qqplot
p = np.random.random(1000000)
qqplot(x=p, figname="test.png")

click to see the output png

Before running bcftools merge, you maybe need to fix the ref and alt and corresponding genotypes, otherwise bcftools will surprise you.

usage: fixref.py [-h] REF_VCF IN_VCF OUT_VCF

When you run imputation analysis with BEAGLE (or other imputation tools), you may want to know the distribution of genotype discordance between the original vcf and imputed vcf.

warning : Before running the script, you must be sure the two vcfs have the exact same sites and samples for each chromosome.

usage: calc_imputed_gt_discord.py [-h] [-chr STRING] VCF1 VCF2 OUT

click to see the output png

Read more here: Source link