Counting the number of SNPs (VCF) for each genomic coordinates (BED)

Counting the number of SNPs (VCF) for each genomic coordinates (BED)

1

I want to count the number of recorded variations for each genomic coordinates of a .bed file from the corresponding .vcf file. I guess it should be solved by vcftools, but I could not find any suitable option to use.

Let’s suppose the bed file like this:

> chr10.bed
chr10   200383  202383  ENSG00000212331
chr10   224433  226434  ENSG00000015171
chr10   224453  226453  ENSG00000259741
chr10   254333  256391  ENSG00000015171
chr10   284402  286402  ENSG00000015171
chr10   288911  290911  ENSG00000015171

and the vcf file (the Data part just to show) as:

>chr10.vcf
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT GTEX-14PKV
10      200901   10_66040_CAG_C_b37      CAG     C       .       PASS    .       GT      0|0
10      225434   10_66326_A_G_b37        A       G       .       PASS    .       GT      0|0
10      225151   10_68431_T_TA_b37       T       TA      .       PASS    .       GT      0|0
10      254737   10_71776_T_C_b37        T       C       .       PASS    .       GT      1|1
10      254367   10_72040_CAAT_C_b37     CAAT    C       .       PASS    .       GT      0|0
10      286302   10_75794_G_T_b37        G       T       .       PASS    .       GT      1|1
10      290011   10_79191_G_A_b37        G       A       .       PASS    .       GT      0|0

Then, the output can be something like this:

> output_chr10.bed
ENSG00000212331  1
ENSG00000015171  0
ENSG00000259741  2
ENSG00000015171  3
ENSG00000015171  1
ENSG00000015171  1

Thank you in advance.


vcf


vcftools


bed


bedtools

• 41 views

Read more here: Source link