Extract columns from a vcf file using identifiers from a second file
Dear all,
Maybe I am being naive, but I am quite new to bioinformatics. I have two files. The first is a standard VCF file in which column 1 holds the chromosome (CHROM) and column 2 the position (POS). The second is a txt file listing the CHROM IDs of interest, with only one entry per chromosome. The data come from de novo RNA assemblies, so instead of chromosomes 1, 2, 3, etc., the #CHROM column contains thousands of contigs.
For example:
file1
#CHROM POS ID REF ALT QUAL FILTER
TRINITY_DN4621_c0_g1 45 . G T 6641.77 PASS
TRINITY_DN4621_c0_g1 304 . T A 9057.77 PASS
TRINITY_DN12351_c0_g1 34 . G T 131.03 PASS
TRINITY_DN12351_c0_g1 328 . T C 1795.77 PASS
TRINITY_DN12351_c0_g1 774 . C T 1649.77 PASS
TRINITY_DN12351_c0_g1 942 . G A 2202.77 PASS
TRINITY_DN12351_c0_g1 1035 . T A 4024.77 PASS
TRINITY_DN12351_c0_g1 1224 . A T 7691.77 PASS
TRINITY_DN12351_c0_g1 1821 . A T 4930.77 PASS
TRINITY_DN12351_c0_g1 2133 . T A 4647.77 PASS
TRINITY_DN12351_c0_g1 2160 . G A 2677.77 PASS
TRINITY_DN12351_c0_g1 2241 . A G 2563.77 PASS
TRINITY_DN12351_c0_g1 2631 . A C 5120.77 PASS
TRINITY_DN11255_c4_g2 212 . T C 200.84 PASS
TRINITY_DN11255_c4_g2 491 . G A 3052.77 PASS
TRINITY_DN11255_c4_g2 581 . C T 3994.77 PASS
TRINITY_DN11255_c4_g2 639 . A G 3725.77 PASS
TRINITY_DN12185_c0_g1 713 . A T 4053.77 PASS
TRINITY_DN12185_c0_g1 733 . T A 3150.77 PASS
TRINITY_DN576_c0_g1 1389 . T A 160.8 PASS
TRINITY_DN7282_c0_g1 127 . A G 94.28 PASS
TRINITY_DN11386_c5_g2 109 . A G 79.28 PASS
TRINITY_DN11386_c5_g2 157 . T A 54.74 PASS
TRINITY_DN11386_c5_g1 660 . G A 18333.8 PASS
TRINITY_DN11386_c5_g1 1002 . A C 23923.8 PASS
TRINITY_DN11386_c5_g1 1341 . C A 18387.8 PASS
TRINITY_DN12464_c8_g1 417 . G A 8615.77 PASS
file2
TRINITY_DN4621_c0_g1
TRINITY_DN12351_c0_g1
TRINITY_DN11255_c4_g2
TRINITY_DN12185_c0_g1
TRINITY_DN576_c0_g1
TRINITY_DN7282_c0_g1
TRINITY_DN11386_c5_g2
TRINITY_DN11386_c5_g1
TRINITY_DN12464_c8_g1
TRINITY_DN12481_c4_g1
TRINITY_DN12018_c2_g1
TRINITY_DN12013_c0_g1
TRINITY_DN2189_c0_g1
TRINITY_DN739_c0_g1
TRINITY_DN11060_c0_g1
TRINITY_DN11770_c2_g1
My question is: how can I extract from file1 all positions belonging to the contigs listed in file2? I need both the #CHROM and the POS columns.
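A minimal Python sketch of one way to do this (not from the original thread). It assumes file1 is a VCF whose header lines start with "#" and whose first two columns are CHROM and POS (tab- or space-separated), and that file2 holds one contig ID per line. The contig IDs are loaded into a set, and every VCF record is kept if its CHROM is in that set, so one entry in file2 returns all of that contig's positions. The function names and script name are hypothetical.

```python
import sys
from typing import Iterable, List, Set, Tuple


def load_ids(lines: Iterable[str]) -> Set[str]:
    """Read one contig ID per line into a set, ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def extract_positions(vcf_lines: Iterable[str], wanted: Set[str]) -> List[Tuple[str, int]]:
    """Return (CHROM, POS) for every VCF record whose CHROM is in `wanted`."""
    hits = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, '##' meta lines and the '#CHROM' header
        # Split off only the first two fields; works for tab- or space-separated input.
        fields = line.split(None, 2)
        chrom, pos = fields[0], int(fields[1])
        if chrom in wanted:  # exact match, so ..._c5_g1 and ..._c5_g2 stay distinct
            hits.append((chrom, pos))
    return hits


def main(vcf_path: str, ids_path: str) -> None:
    with open(ids_path) as fh:
        wanted = load_ids(fh)
    with open(vcf_path) as fh:
        print("#CHROM\tPOS")
        for chrom, pos in extract_positions(fh, wanted):
            print(f"{chrom}\t{pos}")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python extract_pos.py file1.vcf file2.txt > positions.tsv
    main(sys.argv[1], sys.argv[2])
```

Because the match is an exact lookup on the CHROM field rather than a substring search, IDs that share a prefix (e.g. TRINITY_DN11386_c5_g1 and TRINITY_DN11386_c5_g2) are not confused. Contigs listed in file2 that have no variant in file1, such as TRINITY_DN12481_c4_g1 in the excerpt above, simply produce no output rows.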
Thanks a lot for any help