Where to find 1000 Genomes phase 3 whole-genome data and select only the European population



Hello,

I was trying to download whole-genome data from the 1000 Genomes phase 3 release and extract only the EUR population (GBR, TSI, FIN, IBS, CEU). I used this file from the FTP site:

ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz

but apparently it is not the file I need; the error message says:

Error: No samples in .vcf file.

My question is: where can I get the whole-genome 1000 Genomes phase 3 data? I also checked the Data Slicer in Ensembl GRCh37. It allows population selection, but the maximum region that can be extracted is 2.5 Mb, so I can't get whole-genome data that way, even if I manage to download the whole-genome dataset from the FTP site above (assuming it exists).
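The error is consistent with the file name: a `.sites.vcf.gz` file carries only the variant sites (the eight fixed VCF columns, `#CHROM` to `INFO`) with no `FORMAT` column and no per-sample genotype columns, whereas the `.genotypes.vcf.gz` files have one column per sample after `FORMAT`. A minimal sketch for checking this before loading a file, assuming `gzip` and `awk` are available; `vcf_sample_count` is a hypothetical helper name (if bcftools is installed, `bcftools query -l file.vcf.gz` lists the sample names instead).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the number of sample columns in a (b)gzipped VCF.
# The #CHROM header line has 8 fixed columns, then FORMAT and one column per
# sample, so a sites-only file prints 0.
vcf_sample_count() {
  gzip -dc "$1" | awk -F'\t' '/^#CHROM/ { print (NF > 9 ? NF - 9 : 0); exit }'
}
```

Run on the sites file above, this would be expected to print 0; on a phase 3 genotype file it should print the number of samples in that file.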

Opal


Tags: whole genome, 1000Genome phase3, EUR population


It is not exactly what you are asking for, but I came across your question while searching for this:

If you are looking for PLINK files (.fam, .bim, .bed) for the 1000 Genomes phase 3 European (EUR) samples, the following is a good source:

ctg.cncr.nl/software/magma

The following commands will download and unpack this data for you.

wget https://ctg.cncr.nl/software/MAGMA/ref_data/g1000_eur.zip 
unzip g1000_eur.zip

The file is 1.8 gigabytes, so the download can take a while depending on your connection. Make sure it has downloaded completely before trying to use it.
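A small sketch for the download step, assuming GNU wget and Info-ZIP unzip: `wget -c` resumes a partially downloaded file instead of starting over, `unzip -t` tests the archive's integrity before extracting, and counting rows in the `.fam` and `.bim` files gives the number of samples and variants. The function names are hypothetical, and the `g1000_eur` prefix in the usage line assumes the archive unpacks to `g1000_eur.bed/.bim/.fam`, which is worth confirming after unzipping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: download (resuming if interrupted), verify, then unpack a zip.
fetch_and_unpack() {
  local url="$1" zip="${1##*/}"           # zip = file name part of the URL
  wget -c "$url" || return 1              # -c continues a partial download
  unzip -t "$zip" >/dev/null || return 1  # fail early on a truncated/corrupt archive
  unzip -o "$zip"                         # -o overwrites existing files without prompting
}

# Hypothetical helper: count samples (.fam rows) and variants (.bim rows)
# of a PLINK binary fileset, given its prefix.
plink_summary() {
  local prefix="$1" n_samples n_variants
  n_samples=$(wc -l < "${prefix}.fam")
  n_variants=$(wc -l < "${prefix}.bim")
  # $(( )) strips the leading spaces some wc implementations print
  echo "samples=$((n_samples)) variants=$((n_variants))"
}
```

Usage: `fetch_and_unpack https://ctg.cncr.nl/software/MAGMA/ref_data/g1000_eur.zip && plink_summary g1000_eur`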

To view the data in a .vcf.gz file, use zcat or bcftools view, or decompress it with gunzip.
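For example, a sketch that prints the `#CHROM` column header and the first few records while skipping the `##` meta-information lines. `vcf_peek` is a hypothetical name; it uses `gzip -dc` rather than `zcat` because `zcat` on some macOS versions expects `.Z` files.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show the #CHROM header line plus the first N records
# (default 5) of a (b)gzipped VCF, skipping the ## meta-information lines.
vcf_peek() {
  local vcf="$1" n="${2:-5}"
  gzip -dc "$vcf" | grep -v '^##' | head -n "$((n + 1))"
}
```

With bcftools, `bcftools view -h file.vcf.gz` prints only the header and `bcftools view -H file.vcf.gz` prints only the records.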


You can also download the data on a per-chromosome basis:

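# Base URL and file-name suffix of the phase 3 per-chromosome genotype VCFs.
# Check the FTP listing first: the version tag in the file name (v5a here) can differ, e.g. for chrX.
prefix="ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr"

suffix=".phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"

# Download each VCF together with its tabix index (.tbi)
for chr in {1..22} X; do
    wget "$prefix$chr$suffix" "$prefix$chr$suffix.tbi"
done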
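If you prefer a single whole-genome file, one option (assuming bcftools is installed) is `bcftools concat`, which joins VCFs that contain the same samples in the same order but cover different regions; `-f` reads the input file names, in output order, from a list file. The sketch below builds that list from the local file names (wget saves each file under its base name, so the local prefix is `ALL.chr`). Both function names and the output file name are hypothetical, and if the chrX file turns out to use a different suffix, the list has to be adjusted by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the local per-chromosome file names in genome order (1-22, X).
chrom_files() {
  local prefix="$1" suffix="$2" chr
  for chr in {1..22} X; do
    echo "${prefix}${chr}${suffix}"
  done
}

# Hypothetical helper: concatenate the per-chromosome VCFs into one bgzipped,
# tabix-indexed VCF. Assumes every file has the same samples in the same order.
concat_chromosomes() {
  local prefix="$1" suffix="$2" out="$3" list status
  list=$(mktemp) || return 1
  chrom_files "$prefix" "$suffix" > "$list"
  bcftools concat -f "$list" -Oz -o "$out" && bcftools index -t "$out"
  status=$?
  rm -f "$list"
  return "$status"
}
```

Usage: `concat_chromosomes ALL.chr .phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz ALL.wgs.genotypes.vcf.gz`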

You can then combine these into a single file or keep them separate. Either way, next download the 1000 Genomes PED file, which records each sample's population and can be used to obtain the sample IDs for filtering:

wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130606_sample_info/20130606_g1k.ped ;
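A sketch of the filtering step, assuming the PED file is tab-delimited with a header row naming its columns (including `Individual ID` and `Population`) and that bcftools is installed. It writes the IDs of the five EUR populations (CEU, FIN, GBR, IBS, TSI) to a temporary file and keeps only those samples with `bcftools view -S`. The PED file may list individuals that are not in the phase 3 VCFs, so `--force-samples` is used to warn about, rather than fail on, IDs missing from the VCF. Function and output file names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the Individual IDs of EUR samples from a tab-delimited
# 1000 Genomes PED file, locating the columns by their header names.
eur_ids() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "Individual ID") id = i
        if ($i == "Population") pop = i
      }
      if (!id || !pop) { print "header columns not found" > "/dev/stderr"; exit 1 }
      next
    }
    $pop ~ /^(CEU|FIN|GBR|IBS|TSI)$/ { print $id }
  ' "$1"
}

# Hypothetical helper: keep only EUR samples in one VCF (bgzipped output plus tabix index).
subset_eur() {
  local ped="$1" in_vcf="$2" out_vcf="$3" ids status
  ids=$(mktemp) || return 1
  eur_ids "$ped" > "$ids" || { rm -f "$ids"; return 1; }
  # --force-samples: warn about, rather than fail on, IDs absent from the VCF
  bcftools view -S "$ids" --force-samples -Oz -o "$out_vcf" "$in_vcf" && bcftools index -t "$out_vcf"
  status=$?
  rm -f "$ids"
  return "$status"
}
```

Usage, per chromosome or on a concatenated file: `subset_eur 20130606_g1k.ped ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz chr22.EUR.vcf.gz`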

Note that if you use a Mac, wget may not be installed. You can install it with Homebrew: brew install wget

Kevin

