Extracting variants in gene regions and within 100 bp of gene boundaries from multiple VCF files
I sincerely hope that I am not repeating an already answered question. I couldn’t find the answer to my exact problem.
I have three VCF files produced with bcftools isec. These files contain the variants the samples share relative to the reference sequence. In the end, I have:
- Three VCF files representing three varieties (containing only the shared variants)
- A reference FASTA file
- An annotation (GFF3) file for the reference
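As a starting point, the gene coordinates can be pulled out of the GFF3 file. Below is a minimal Python sketch (standard library only). It assumes genes are annotated with feature type `gene` in column 3 and carry an `ID` attribute in column 9. `Gene`, `read_genes` and `parse_attributes` are hypothetical helper names, not part of any library. Coordinates are kept 1-based and inclusive, which is how both GFF3 and the VCF `POS` column count positions.

```python
from typing import Iterable, List, NamedTuple
from urllib.parse import unquote


class Gene(NamedTuple):
    seqid: str    # sequence name; must match the VCF CHROM column
    start: int    # 1-based, inclusive (GFF3 convention)
    end: int      # 1-based, inclusive
    strand: str   # "+", "-", or "." if unknown
    gene_id: str


def parse_attributes(field: str) -> dict:
    """Split a GFF3 column-9 string such as 'ID=g1;Name=abc' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = unquote(value)  # GFF3 uses URL-style escaping
    return attrs


def read_genes(lines: Iterable[str], feature_type: str = "gene") -> List[Gene]:
    """Collect features of one type (assumed 'gene') from GFF3 text lines."""
    genes = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = parse_attributes(cols[8])
        fallback_id = f"{cols[0]}:{cols[3]}-{cols[4]}"
        genes.append(Gene(cols[0], int(cols[3]), int(cols[4]), cols[6],
                          attrs.get("ID", fallback_id)))
    return genes
```

With a real file, something like `with open("annotation.gff3") as fh: genes = read_genes(fh)` (the file name is hypothetical) would return one `Gene` per annotated gene. A dedicated library such as gffutils could replace the hand-written parser.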
What I want to do is extract the variants found in:
- Gene regions
- The 100 bp flanking the TSS (+1) and the stop codon
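One way to read the two requested regions is: the gene body, plus 100 bp upstream of the TSS and 100 bp downstream of the gene end. The sketch below assumes the `gene` feature's start and end stand in for the TSS and the stop codon. If the annotation includes UTRs, the gene end is the transcript end rather than the stop codon, and the last coordinate of the CDS would be a better anchor. Upstream and downstream are strand-aware: on the minus strand the TSS is the feature's end coordinate. Only the VCF `POS` is compared, so a multi-base REF allele that starts just outside a region is not counted. `gene_regions` and `classify_variants` are hypothetical names.

```python
from typing import Iterable, Iterator, List, Tuple

FLANK = 100  # bp, taken from the question

# Assumed gene representation: (seqid, start, end, strand, gene_id), 1-based inclusive
GeneTuple = Tuple[str, int, int, str, str]
# Output region: (seqid, start, end, gene_id, label)
Region = Tuple[str, int, int, str, str]


def gene_regions(gene: GeneTuple, flank: int = FLANK) -> List[Region]:
    """Return the gene body plus strand-aware upstream/downstream flanks."""
    seqid, start, end, strand, gene_id = gene
    left = (seqid, max(1, start - flank), start - 1)
    right = (seqid, end + 1, end + flank)
    # On the minus strand the TSS sits at `end`, so upstream lies to the right.
    upstream, downstream = (right, left) if strand == "-" else (left, right)
    regions = [(seqid, start, end, gene_id, "gene")]
    for (sid, s, e), label in ((upstream, "upstream"), (downstream, "downstream")):
        if s <= e:  # skip empty flanks, e.g. a gene starting at position 1
            regions.append((sid, s, e, gene_id, label))
    return regions


def classify_variants(vcf_lines: Iterable[str], genes: Iterable[GeneTuple],
                      flank: int = FLANK) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (chrom, pos, gene_id, label) for every variant/region overlap."""
    regions = [r for g in genes for r in gene_regions(g, flank)]
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip VCF header lines
        chrom, pos_text = line.split("\t", 2)[:2]
        pos = int(pos_text)  # VCF POS is 1-based, like GFF3
        for seqid, start, end, gene_id, label in regions:
            if seqid == chrom and start <= pos <= end:
                yield chrom, pos, gene_id, label
```

The nested loop compares every variant with every region, which should be acceptable for a 5 Mb sequence. For larger inputs, sorting the regions and searching them with `bisect`, or using an interval-tree library, would scale better. Running the same function over each of the three VCF files gives per-variety results that can be compared directly.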
Please note that this is a 5 Mb region (not a whole genome, so there are no chromosomes).
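Because the whole 5 Mb region is a single sequence, every region shares one sequence name, and that name must match the VCF `CHROM` column exactly. If bcftools should do the extraction instead, the regions can be merged and written as a tab-delimited file of CHROM, BEG and END (1-based, inclusive), a format accepted by `bcftools view -R`. The sketch below merges overlapping or adjacent intervals so that, for example, the flank of one gene and the body of a neighbouring gene collapse into a single region. `merge_intervals` and `regions_text` are hypothetical names.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Interval = Tuple[str, int, int]  # (seqid, start, end), 1-based inclusive


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals, separately for each sequence."""
    by_seq: Dict[str, List[Tuple[int, int]]] = {}
    for seqid, start, end in intervals:
        by_seq.setdefault(seqid, []).append((start, end))
    merged: List[Interval] = []
    for seqid in sorted(by_seq):
        cur_start: Optional[int] = None
        cur_end: Optional[int] = None
        for start, end in sorted(by_seq[seqid]):
            if cur_end is not None and start <= cur_end + 1:
                cur_end = max(cur_end, end)  # overlaps or touches: extend
            else:
                if cur_end is not None:
                    merged.append((seqid, cur_start, cur_end))
                cur_start, cur_end = start, end
        if cur_end is not None:
            merged.append((seqid, cur_start, cur_end))
    return merged


def regions_text(intervals: Iterable[Interval]) -> str:
    """Format merged intervals as CHROM<TAB>BEG<TAB>END lines (1-based, inclusive).

    Merging drops gene IDs and labels; keep the unmerged list if you need them.
    """
    return "".join(f"{s}\t{b}\t{e}\n" for s, b, e in merge_intervals(intervals))
```

Writing the returned text to a file such as `regions.tsv` (hypothetical name) and running `bcftools view -R regions.tsv -Oz -o variety1.regions.vcf.gz variety1.vcf.gz` for each of the three files should then extract the matching records. Note that `-R` needs a bgzip-compressed VCF indexed with `bcftools index`.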
I would appreciate any help with this.