Phasing a VCF file with SHAPEIT

Hi everybody,

I would like to phase (phasing only, no imputation) a VCF file containing about 1100 individuals from a single human population. The data come from whole-genome sequencing, and the VCF was produced with GATK. From what I found, SHAPEIT is the most commonly used tool. According to its manual, it requires a genetic map for phasing. However, the link provided for the genetic map (HapMap-based, build 37) does not work; the error says “The requested URL /genetics_software/shapeit/shapeit.html/files/genetic_map_b37.tar.gz was not found on this server”. My questions are:

1) Could you please tell me where I can find the genetic map? I also need it for hg38: is there a genetic map for hg38, or how can the build 37 (b37) map be converted to hg38?

2) The SHAPEIT manual also describes “read aware phasing”, which takes BAM and VCF files as input and extracts phase informative reads (PIRs) that are then used to phase the VCF file in the next step. As I understand it, the genetic map is no longer required in this case. Since I have the sequencing data, I think I should use this method rather than the one based on the genetic map. Am I right?

3) Also, is it possible to phase only a subset of interest from the VCF file of a given chromosome, together with the corresponding region extracted from the BAM file, rather than using the whole BAM and VCF files?
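
To make the subsetting in question 3 concrete, here is a minimal Python sketch of what "a subset of interest" could mean on the VCF side: keep the header lines plus the records on one chromosome within a position range. The function name, the region, and the demo records are all hypothetical and are not taken from the SHAPEIT manual. In practice, dedicated tools such as bcftools (for the VCF) and samtools (for the matching BAM region) would be the usual choice. The sketch says nothing about whether SHAPEIT accepts such a subset, which is exactly the open question.

```python
from typing import Iterable, Iterator


def subset_vcf_lines(
    lines: Iterable[str], chrom: str, start: int, end: int
) -> Iterator[str]:
    """Yield VCF header lines plus records on `chrom` with start <= POS <= end.

    Positions are 1-based and inclusive, as in the VCF POS column.
    """
    for line in lines:
        # Header and metadata lines start with '#'; keep them all.
        if line.startswith("#"):
            yield line
            continue
        # Only the first two tab-separated columns (CHROM, POS) are needed.
        fields = line.split("\t", 2)
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield line


# Hypothetical demo records (assumption: chromosome names carry a "chr" prefix).
DEMO_LINES = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr20\t999999\t.\tA\tG\t.\tPASS\t.",
    "chr20\t1500000\t.\tC\tT\t.\tPASS\t.",
    "chr21\t1500000\t.\tG\tA\t.\tPASS\t.",
]

SUBSET = list(subset_vcf_lines(DEMO_LINES, "chr20", 1_000_000, 2_000_000))
```

With the demo input, only the two header lines and the chr20 record at position 1500000 are kept. The BAM file would need to be restricted to the same region so that reads and variants match.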

Any suggestions or help would be highly appreciated.
