bash script


Hello everyone,
I have a file like this:
RSID1 RSID2

chr1_169894240_G_T_b38  chr1_169894240_G_T_b38
chr1_169894240_G_T_b38  chr1_169891332_G_A_b38
chr1_169891332_G_A_b38  chr1_169891332_G_A_b38
chr1_169661963_G_A_b38  chr1_169661963_G_A_b38
chr1_169661963_G_A_b38  chr1_169697456_A_T_b38
chr1_169697456_A_T_b38  chr1_169697456_A_T_b38
chr1_27636786_T_C_b38   chr1_27636786_T_C_b38
chr1_196651787_C_T_b38  chr1_196651787_C_T_b38
chr6_143501715_T_C_b38  chr6_143501715_T_C_b38

I want to extract just the chromosome and position, like this:
chr1_169894240 chr1_169894240
I don't want any of the other info, only chr_pos.
I am confused about how to extract this because the length varies: in one case it is 9 characters long and in another it is 10. So if I use the cut command, for some rows it shows the right value (chr_pos), but for others it shows chr_pos_.
Can anyone please help me out with this?



You can use cut or awk with "_" as the field separator, e.g., cut -f 1 yourfile.txt | awk -v FS="_" '{print $1"_"$2}'. If you have a 2-column TSV file, you can try:

paste <(cut -f 1 yourfile.txt  | awk -v FS="_" '{print $1"_"$2}') <(cut -f 2 yourfile.txt  | awk -v FS="_" '{print $1"_"$2}')
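
The same idea can probably be done in a single awk pass instead of one cut/awk pipeline per column. This is only a sketch: trim_ids is a made-up helper name, and it assumes the columns are whitespace-separated and that every ID starts with chr_pos. Fields without an underscore (such as a header line) are left alone.

```shell
# Hypothetical helper: keep only the first two "_"-separated parts of each column.
# Reads the files given as arguments, or stdin if there are none.
trim_ids() {
  awk 'BEGIN { OFS = "\t" }
  {
    for (i = 1; i <= NF; i++) {
      n = split($i, parts, "_")          # chr1_169894240_G_T_b38 -> chr1, 169894240, G, T, b38
      if (n >= 2) $i = parts[1] "_" parts[2]
    }
    print                                # modified lines are rebuilt with tab separators
  }' "$@"
}

# Example call on one line of the input shown in the question
printf 'chr1_169894240_G_T_b38\tchr1_169891332_G_A_b38\n' | trim_ids
```

Because it loops over NF, this should not care whether the file has two columns or more.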

$ sed -r 's/_\w_\w_\w{3}//g' test.txt

$ awk -v OFS="\t" -F '[_\t]' '{print $1"_"$2,$6"_"$7}' test.txt

$ parallel --colsep "_|\t" echo {1}_{2} {6}_{7} :::: test.txt  | sed 's/\s/\t/'

chr1_169894240  chr1_169894240
chr1_169894240  chr1_169891332
chr1_169891332  chr1_169891332
chr1_169661963  chr1_169661963
chr1_169661963  chr1_169697456
chr1_169697456  chr1_169697456
chr1_27636786   chr1_27636786
chr1_196651787  chr1_196651787
chr6_143501715  chr6_143501715
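
If you want to double-check output like the above, a quick sanity check could look like the sketch below. check_chr_pos is a hypothetical helper name, and the pattern assumes every field should look like chr<name>_<digits>. A header line such as RSID1 RSID2 would be reported too, so strip it first if your file has one.

```shell
# Hypothetical helper: report any field that is not exactly chr<name>_<digits>.
# Exits with status 1 if at least one unexpected field was found, 0 otherwise.
check_chr_pos() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i !~ /^chr[^_]+_[0-9]+$/) {
        printf "line %d: unexpected field %s\n", NR, $i
        bad = 1
      }
  }
  END { exit (bad ? 1 : 0) }' "$@"
}

# Example: the first field still carries a trailing allele, so it gets reported
printf 'chr1_169894240_G\tchr1_169891332\n' | check_chr_pos || echo "some fields need another look"
```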

For the win, you can even do a fancy regex with sed:

cat data.tsv 
chr1_169894240_G_T_b38  chr1_169894240_G_T_b38
chr1_169894240_G_T_b38  chr1_169891332_G_A_b38
chr1_169891332_G_A_b38  chr1_169891332_G_A_b38
chr1_169661963_G_A_b38  chr1_169661963_G_A_b38
chr1_169661963_G_A_b38  chr1_169697456_A_T_b38
chr1_169697456_A_T_b38  chr1_169697456_A_T_b38
chr1_27636786_T_C_b38   chr1_27636786_T_C_b38
chr1_196651787_C_T_b38  chr1_196651787_C_T_b38
chr6_143501715_T_C_b38  chr6_143501715_T_C_b38

sed 's/_[ATGC]_[ATGC]_[a-z][0-9]*//g' data.tsv 
chr1_169894240  chr1_169894240
chr1_169894240  chr1_169891332
chr1_169891332  chr1_169891332
chr1_169661963  chr1_169661963
chr1_169661963  chr1_169697456
chr1_169697456  chr1_169697456
chr1_27636786   chr1_27636786
chr1_196651787  chr1_196651787
chr6_143501715  chr6_143501715
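
A variation on the same regex idea, as a sketch only: instead of describing the part to delete, capture the chr_pos part and drop whatever follows it up to the next whitespace. keep_chr_pos is a made-up helper name. The assumption is that every ID starts with chr, then a name without underscores, then an underscore and digits. The possible advantage is that it should also cope with alleles longer than one base, which the single-letter pattern above would not match.

```shell
# Hypothetical helper: keep the leading chr<name>_<digits> of each ID and drop
# the rest of that ID (everything up to the next whitespace character).
keep_chr_pos() {
  sed -E 's/(chr[^_[:space:]]+_[0-9]+)_[^[:space:]]*/\1/g' "$@"
}

# Example with a multi-base allele in the second column (made-up ID for illustration)
printf 'chr1_169894240_G_T_b38\tchr6_143501715_TA_C_b38\n' | keep_chr_pos
```

The original whitespace between the columns is left untouched, so a tab-separated file stays tab-separated.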

Kevin

