GitHub – Zilong-Li/BioScripts: Cool Bioinformatics Scripts

You can use make a QQ plot in the following ways.

  • one-liner for reading tons of millions of P values from the pipe
# python 
zcat pval.txt.gz | qqplot.py -out test -title "QQ plot on the fly"
# julia (recommand to run it in the REPL)
zcat pval.txt.gz | qqplot.jl --out test --title "QQ plot on the fly"

warning : If you have 100 billion P values to process you should definitely use qqplot.jl instead of qqplot.py. The hourly processed number of lines of julia version is 5 billion while python is only 700 million on my server.

  • running in a julia REPL (recommanded)
include("qqplot.jl")
cmd = pipeline(`zcat pval.gz`, `awk 'NR>1{print $10}'`)
qqfly("test", cmd=cmd)
  • use qqplot.py in your script
import numpy as np
from qqplot import qqplot
p = np.random.random(1000000)
qqplot(x=p, figname="test.png")

click to see the output png

Before running bcftools merge, you maybe need to fix the ref and alt and corresponding genotypes, otherwise bcftools will surprise you.

usage: fixref.py [-h] REF_VCF IN_VCF OUT_VCF

When you run imputation analysis with BEAGLE (or other imputation tools), you may want to know the distribution of genotype discordance between the original vcf and imputed vcf.

usage: calc_imputed_gt_discord.py [-h] [-chr STRING] VCF1 VCF2 OUT

warning : Before running the script, you must be sure the two vcfs have the exact same sites and samples for each chromosome.

click to see the output png

Read more here: Source link