Filter dosage file by list of SNP IDs

Hello, does anyone know of a fast, computationally efficient way to select the lines of a .dosage file whose first-column SNP ID also appears in a .txt file of SNP IDs?

The .dosage file is in the following format:

SNPID Position REF ALT Sample1Dosage Sample2Dosage Sample3Dosage . . .
1:100:A:C A C 0 2 1 . . .
1:101:C:T C T 1 2 1 . . .
. . .

The list of SNP IDs in a .txt document is in the following format:

1:100:A:C
1:101:C:T
1:103:G:A
1:105:C:T

. . .

I have tried using grep -f snp_IDs.txt example.dosage > filtered_example.dosage, but the command is too slow: my server hits the max wall time before it finishes.
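For reference, one approach that may be faster is to load the ID list into a hash set and stream the dosage file once, comparing only the first whitespace-separated field of each line against that set. By default grep -f treats every ID as a regular expression and looks for it anywhere in the line, so an ID such as 1:100:A:C can also match a line that starts with 11:100:A:C. The sketch below is a minimal illustration, not a tuned tool. It assumes the ID list fits in memory, that columns are whitespace-separated, and that the first line of the .dosage file is a header worth keeping. The function name filter_dosage and the keep_header parameter are made up for this example.

```python
import sys


def filter_dosage(ids_path, dosage_path, out_path, keep_header=True):
    """Copy lines of dosage_path whose first field is listed in ids_path.

    Returns the number of matching data lines written.
    Assumption: the first line of the dosage file is a header.
    """
    # Load the wanted SNP IDs into a set for constant-time lookups.
    with open(ids_path) as ids_file:
        wanted = {line.strip() for line in ids_file if line.strip()}

    kept = 0
    with open(dosage_path) as src, open(out_path, "w") as dst:
        for line_number, line in enumerate(src):
            # Split off only the first field; the dosage columns stay unparsed.
            first = line.split(None, 1)[:1]
            is_header = keep_header and line_number == 0
            is_match = bool(first) and first[0] in wanted
            if is_header or is_match:
                dst.write(line)
            kept += is_match
    return kept


# Only runs when exactly three paths are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 4:
    count = filter_dosage(sys.argv[1], sys.argv[2], sys.argv[3])
    print(f"kept {count} matching lines", file=sys.stderr)
```

If this were saved as filter_dosage.py (a hypothetical name), it could be run as python filter_dosage.py snp_IDs.txt example.dosage filtered_example.dosage. Because the match is an exact comparison on the first field, partial ID overlaps are not picked up.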

