Set ancestral alleles to upper case in vcf file

I am trying to set the reference allele to the ancestral allele in 1000 Genomes VCF files. I can do this with the --derived option in vcftools. However, most of the ancestral alleles are in lowercase, and vcftools does not recognise them in that form.

I am currently looking at extracting the ancestral alleles and converting them to uppercase like this:

bcftools view -G -H file.vcf.gz | awk -F'[;=|]' '{for(i=1;i<=NF;i++)if($i=="AA"){print toupper($(i+1));next}}'

I would then reinsert them into the VCF file.

This is quite a convoluted way of doing things, and I wonder if anyone has a tidier method?


Here is a single entry from the VCF file (with genotype info hidden):

11  128196  rs576393503 A   G   100 PASS    AC=453;AF=0.0904553;AN=5008;NS=2504;DP=5057;EAS_AF=0.0159;AMR_AF=0.0259;AFR_AF=0.3071;EUR_AF=0.006;SAS_AF=0.0072;AA=g|||;VT=SNP

So here the ancestral allele is g (AA=g) and I need it to be in uppercase so that vcftools recognises it when running the --derived option.
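
One possible way to avoid the extract-and-reinsert round trip is to uppercase the AA value in place as the VCF streams through awk. This is a minimal sketch, not a tested pipeline: the function name uppercase_aa is made up for illustration, and it assumes a tab-delimited VCF with INFO in column 8 and AA stored as a plain AA=value entry. It only changes the case, so AA=g||| becomes AA=G|||; whether vcftools also needs the trailing pipes removed is not addressed here.

```shell
# Hypothetical helper: reads a VCF on stdin, writes it to stdout with the
# AA value in the INFO column (column 8) converted to uppercase.
uppercase_aa() {
  awk 'BEGIN { FS = OFS = "\t" }
  /^#/ { print; next }                      # pass header lines through untouched
  {
    n = split($8, info, ";")                # break INFO into key=value entries
    for (i = 1; i <= n; i++)
      if (info[i] ~ /^AA=/)                 # only the ancestral allele entry
        info[i] = "AA=" toupper(substr(info[i], 4))
    out = info[1]
    for (i = 2; i <= n; i++)
      out = out ";" info[i]                 # rebuild INFO in the original order
    $8 = out
    print
  }'
}
```

It could then be used along the lines of "bcftools view file.vcf.gz | uppercase_aa | bgzip > file.upperAA.vcf.gz", where the output file name is just a placeholder. Genotype columns and all other INFO entries pass through unchanged.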


