Filtering long indels from VCF
To create a multi-sample VCF from a large cohort of WES samples of very different quality, I have to select only high-quality variants genotyped in as many samples as possible.
I have found that:
- long indels have low quality
- substitutions alone do not provide enough variants for my analysis.
I know how to filter out indels using bcftools. Is there a command that filters out only long indels but keeps 1-2 bp insertions/deletions? I feel an AWK command should be very fast, but I don't know how to count the number of characters in the REF and ALT columns of the VCF and print only the variants where both the REF and ALT alleles are shorter than 3 characters.
I'd appreciate any help; quick googling did not solve the problem.
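
For illustration, here is a minimal sketch of the kind of AWK filter described above. One caveat: in VCF, indel alleles normally include an anchoring base, so a 2 bp insertion looks like REF=A, ALT=ATG, and a "shorter than 3 characters" test would drop it. The sketch therefore compares the length difference between REF and each ALT allele instead. The function name `filter_long_indels` and the `max_diff` parameter are made up for this example, and it assumes plain-text, tab-separated VCF on stdin with REF in column 4 and ALT in column 5.

```shell
# Hypothetical helper: keep header lines and records in which every ALT
# allele differs from REF by at most max_diff bases (default 2).
# Assumption: alleles are plain sequences; symbolic alleles such as <DEL>
# are simply measured by their string length and not handled specially.
filter_long_indels() {
    awk -F'\t' -v max_diff="${1:-2}" '
        /^#/ { print; next }                  # pass header lines through
        {
            n = split($5, alt, ",")           # ALT may list several alleles
            for (i = 1; i <= n; i++) {
                d = length(alt[i]) - length($4)
                if (d < 0) d = -d             # absolute length difference
                if (d > max_diff) next        # one long allele drops the record
            }
            print
        }
    '
}

# Possible usage (file names are placeholders):
#   bcftools view cohort.vcf.gz | filter_long_indels 2 | bgzip > filtered.vcf.gz
```

Substitutions have a length difference of 0, so they pass through unchanged. A multiallelic site is dropped as soon as any one of its ALT alleles is too long; splitting multiallelic sites first would be the alternative if that is too strict.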