Indel statistics

Hi,

I have VCF statistics for heterozygous and homozygous cases, and I would like to match them against my MAF file. The problem is that the MAF file represents the alleles differently: it excludes the nucleotides that are shared with the other allele. For example, if the VCF has REF CAA and ALT CAAAAA, the MAF file reports only AAA.

So I need code that rewrites the REF and ALT fields in my statistics file (perhaps by adding separate REF2 and ALT2 columns).

Here is a snippet of my file:

CHR POS ID REF ALT
chr11 71579744 rs71049992 A ACAGCAGCTGGACTGGGAGCAGCAGGACCTG
(insertion case)

chr11 124880551 rs71859853 CCGGAGT C
(deletion case)

I think I should first count the number of nucleotides in columns 4 and 5. If column 4 is longer than column 5 (meaning a deletion), then REF2 should start from the first nucleotide after the part shared with the alternative allele.

For an insertion, ALT2 would be the alternative allele with the nucleotides shared with the reference skipped, while REF2 stays the same as REF.
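
The rule described above could be sketched roughly like this in Python. This is only a sketch under some assumptions: the "shared" nucleotides are taken to be the common leading bases of REF and ALT, each record has a single ALT allele (no comma-separated alternatives), and the function name `trim_indel` is made up for illustration.

```python
def trim_indel(ref, alt):
    """Return (ref2, alt2): the longer allele loses the leading bases it shares with the shorter one."""
    # Count how many leading bases are identical in both alleles.
    shared = 0
    for r, a in zip(ref, alt):
        if r != a:
            break
        shared += 1

    if len(ref) > len(alt):
        # Deletion: trim the reference, keep the alternative as it is.
        return ref[shared:], alt
    if len(alt) > len(ref):
        # Insertion: keep the reference as it is, trim the alternative.
        return ref, alt[shared:]
    # Same length (not an indel): leave both alleles unchanged.
    return ref, alt


print(trim_indel("CCGGAGT", "C"))
print(trim_indel("A", "ACAGCAGCTGGACTGGGAGCAGCAGGACCTG"))
```

Alleles of equal length are passed through untouched, since the description only covers insertions and deletions.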

As a result, I would like to have this:

CHR POS ID REF ALT REF2 ALT2
chr11 71579744 rs71049992 A ACAGCAGCTGGACTGGGAGCAGCAGGACCTG A CAGCAGCTGGACTGGGAGCAGCAGGACCTG

chr11 124880551 rs71859853 CCGGAGT C CGGAGT C
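
To produce a table like the one above, something along these lines might work. It is a sketch that assumes whitespace-separated columns with REF and ALT in columns 4 and 5, one ALT allele per row, and a header row whose fourth field is literally `REF`; the helper name `add_trimmed_columns` is hypothetical. It uses `os.path.commonprefix` from the standard library to find the shared leading bases.

```python
import os


def add_trimmed_columns(lines):
    """Yield each row with REF2 and ALT2 appended; rows with fewer than 5 fields pass through."""
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            # Blank lines or notes such as "(insertion case)": keep as they are.
            yield line.rstrip("\n")
            continue
        if fields[3] == "REF":
            # Header row: add the two new column names.
            yield " ".join(fields + ["REF2", "ALT2"])
            continue
        ref, alt = fields[3], fields[4]
        # Number of leading bases shared by REF and ALT.
        n = len(os.path.commonprefix([ref, alt]))
        ref2 = ref[n:] if len(ref) > len(alt) else ref  # trimmed only for deletions
        alt2 = alt[n:] if len(alt) > len(ref) else alt  # trimmed only for insertions
        yield " ".join(fields + [ref2, alt2])


sample = [
    "CHR POS ID REF ALT\n",
    "chr11 71579744 rs71049992 A ACAGCAGCTGGACTGGGAGCAGCAGGACCTG\n",
    "chr11 124880551 rs71859853 CCGGAGT C\n",
]
for row in add_trimmed_columns(sample):
    print(row)
```

For a real file, an open file handle can be passed in place of `sample`, since the function only iterates over lines.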

Thank you very much in advance!
