New datasets for ancestry estimation and imputation?


What datasets are people using nowadays for genotype imputation and ancestry estimation? HapMap and 1000 Genomes are good, but both were released years ago and are limited in the number of populations included and in resolution (especially HapMap, which is a few genome builds behind and requires lifting over).

I know of several much larger studies like:

  • 100 000 Genomes (UK-centric)
  • UK Biobank (UK-centric)
  • GenomeAsia100K (Asia-centric, in a pilot phase)

and all three require pre-approved access (which may not be granted for simple use in QC and imputation due to concerns about the privacy of participants).

An answer from 5 years ago mentions the Simons Genome Diversity Project (SGDP) and the Estonian Biocentre Human Genome Diversity Panel (EGDP), which do cover more populations but have smaller sample sizes.

Are there any other international projects like HapMap or 1000 Genomes that could be used for ancestry estimation and genotype imputation? Or is there anything in a pilot phase?







