I am attempting to split multiallelic sites using bcftools norm with the following command:
zcat ${inputVcf} |
  sed 's/AD,Number=./AD,Number=R/g' |
  sed 's/ADR,Number=./ADR,Number=R/g' |
  sed 's/ADF,Number=./ADF,Number=R/g' |
  bcftools norm \
    --fasta-ref ${genomeFa} \
    --check-ref s \
    --multiallelics -any \
    --output ${outputVcf}
The sed commands were based on the recommendation from here. However, I'm still getting FORMAT entries such as the following:

GT:GQ:GQX:DPI:AD:ADF:ADR:FT:PL 1/0:44:44:56:1,10,5:1,4,2:0,6,3:PASS:511,99,48 ./.:.:.:.:.:.:.:.:. 0/1:53:53:63:0,12,6:0,4,1:0,8,5:PASS:483,210,164

These are clearly still multiallelic: AD, ADF and ADR each carry three values per sample. Does anybody know how to fix this?
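In case it helps narrow things down, here is a minimal sketch of two checks that could be run on the output: whether the header rewrite actually took effect for a given FORMAT tag, and whether any record still lists more than one ALT allele. It assumes a standard tab-delimited VCF on stdin. The function names `format_number` and `count_multiallelic` are hypothetical helpers, not part of bcftools.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a VCF stream; not part of bcftools.

# Print the Number= value declared in the header for a given FORMAT tag.
# Assumes the header line has the usual ##FORMAT=<ID=...,Number=...,...> layout.
format_number() {
    tag="$1"
    sed -n "s/^##FORMAT=<ID=${tag},Number=\([^,]*\),.*/\1/p"
}

# Count data records whose ALT column (field 5) still lists several alleles.
count_multiallelic() {
    awk -F'\t' '!/^#/ && $5 ~ /,/ { n++ } END { print n + 0 }'
}
```

Assuming the output was written as uncompressed VCF, these could be used as `format_number AD < ${outputVcf}` and `count_multiallelic < ${outputVcf}`. If the first prints `R` and the second prints `0` while AD still has three values, the sites were split but the per-allele fields were not trimmed.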