Extract columns from a vcf file using identifiers from a second file

Dear all,
Maybe this is a naive question, but I am pretty new to bioinformatics. I have two files. The first is a normal VCF file, with the chromosome in column 1 (#CHROM) and the position in column 2 (POS). The second is a txt file that lists the #CHROM values of interest, with only one entry per chromosome. The data come from de novo RNA assemblies, so instead of chromosomes 1, 2, 3 and so on, I have thousands of contigs in the #CHROM column.
e.g.
file1

#CHROM                  POS ID  REF ALT QUAL    FILTER
TRINITY_DN4621_c0_g1    45  .   G   T   6641.77 PASS
TRINITY_DN4621_c0_g1    304 .   T   A   9057.77 PASS
TRINITY_DN12351_c0_g1   34  .   G   T   131.03  PASS
TRINITY_DN12351_c0_g1   328 .   T   C   1795.77 PASS
TRINITY_DN12351_c0_g1   774 .   C   T   1649.77 PASS
TRINITY_DN12351_c0_g1   942 .   G   A   2202.77 PASS
TRINITY_DN12351_c0_g1   1035    .   T   A   4024.77 PASS
TRINITY_DN12351_c0_g1   1224    .   A   T   7691.77 PASS
TRINITY_DN12351_c0_g1   1821    .   A   T   4930.77 PASS
TRINITY_DN12351_c0_g1   2133    .   T   A   4647.77 PASS
TRINITY_DN12351_c0_g1   2160    .   G   A   2677.77 PASS
TRINITY_DN12351_c0_g1   2241    .   A   G   2563.77 PASS
TRINITY_DN12351_c0_g1   2631    .   A   C   5120.77 PASS
TRINITY_DN11255_c4_g2   212 .   T   C   200.84  PASS
TRINITY_DN11255_c4_g2   491 .   G   A   3052.77 PASS
TRINITY_DN11255_c4_g2   581 .   C   T   3994.77 PASS
TRINITY_DN11255_c4_g2   639 .   A   G   3725.77 PASS
TRINITY_DN12185_c0_g1   713 .   A   T   4053.77 PASS
TRINITY_DN12185_c0_g1   733 .   T   A   3150.77 PASS
TRINITY_DN576_c0_g1 1389    .   T   A   160.8   PASS
TRINITY_DN7282_c0_g1    127 .   A   G   94.28   PASS
TRINITY_DN11386_c5_g2   109 .   A   G   79.28   PASS
TRINITY_DN11386_c5_g2   157 .   T   A   54.74   PASS
TRINITY_DN11386_c5_g1   660 .   G   A   18333.8 PASS
TRINITY_DN11386_c5_g1   1002    .   A   C   23923.8 PASS
TRINITY_DN11386_c5_g1   1341    .   C   A   18387.8 PASS
TRINITY_DN12464_c8_g1   417 .   G   A   8615.77 PASS

file2

TRINITY_DN4621_c0_g1    
TRINITY_DN12351_c0_g1   
TRINITY_DN11255_c4_g2   
TRINITY_DN12185_c0_g1   
TRINITY_DN576_c0_g1 
TRINITY_DN7282_c0_g1    
TRINITY_DN11386_c5_g2   
TRINITY_DN11386_c5_g1   
TRINITY_DN12464_c8_g1   
TRINITY_DN12481_c4_g1   
TRINITY_DN12018_c2_g1   
TRINITY_DN12013_c0_g1   
TRINITY_DN2189_c0_g1    
TRINITY_DN739_c0_g1 
TRINITY_DN11060_c0_g1   
TRINITY_DN11770_c2_g1

My question is: how can I extract from file1 every row (position) whose #CHROM value is listed in file2? I need both columns in the output, the #CHROM column and the POS column.
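
To make the goal concrete, here is a minimal Python sketch of the filtering I have in mind. It is only an illustration under a few assumptions: the function names `load_ids` and `extract_chrom_pos` are made up for this example, the VCF columns are whitespace- or tab-separated, and the small in-memory strings stand in for file1 and file2.

```python
import io


def load_ids(handle):
    """Collect contig names, one per line, ignoring blank lines and stray spaces."""
    return {line.strip() for line in handle if line.strip()}


def extract_chrom_pos(vcf_handle, wanted):
    """Yield (CHROM, POS) for every VCF record whose CHROM is in `wanted`."""
    for line in vcf_handle:
        stripped = line.strip()
        # Skip blank lines and header/meta lines such as "#CHROM ..." or "##...".
        if not stripped or stripped.startswith("#"):
            continue
        # Only the first two columns are needed, so split at most twice.
        fields = stripped.split(None, 2)
        if len(fields) < 2:
            continue  # malformed line without a POS column
        chrom, pos = fields[0], fields[1]
        if chrom in wanted:
            yield chrom, pos


# Tiny stand-ins for file1 (VCF) and file2 (contig list).
# TRINITY_DN99999_c0_g1 is a made-up contig that is NOT in the list.
vcf_text = (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\n"
    "TRINITY_DN4621_c0_g1\t45\t.\tG\tT\t6641.77\tPASS\n"
    "TRINITY_DN4621_c0_g1\t304\t.\tT\tA\t9057.77\tPASS\n"
    "TRINITY_DN99999_c0_g1\t10\t.\tA\tC\t50.00\tPASS\n"
)
ids_text = "TRINITY_DN4621_c0_g1    \nTRINITY_DN12351_c0_g1   \n"

wanted = load_ids(io.StringIO(ids_text))
result = list(extract_chrom_pos(io.StringIO(vcf_text), wanted))

for chrom, pos in result:
    print(chrom, pos, sep="\t")
```

With real data, the two `io.StringIO(...)` objects would presumably be replaced by `open(...)` on the actual file paths. Holding the contig names in a set keeps each lookup cheap even with thousands of contigs.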

Thanks a lot for any help


snp
