Merging columns from multiple files and adding headers- Linux

I have over 50 files that have the following format:

File1.tab

A 25
B 56
C 87
D 82
E 91
F 34

File2.tab

A 89
B 55
C 92
D 36
E 28
F 19

File3.tab

A 98
B 76
C 82
D 76
E 49
F 42

etc.

I would like the following:

  • The columns merged together based on the first column.
  • each column to have the file name as a header.
  • The values separated by a tab not space so when I open the merged file in excel it will have multiple columns (i.e column1= “row1 is empty, row2 is A and so on”; column2= “row1 is File1.tab, row2 is 25 and so on”; column3= “row1 is File2.tab, row2 is 89 and so on”; etc).

final.tab

  File1.tab File2.tab File3.tab
A    25        89        98
B    56        55        76
C    87        92        82
D    82        36        76
E    91        28        49
F    34        19        42

I tried searching for answers and followed the following command: from unix.stackexchange.com/questions/638521/merge-multiple-files-by-first-column

Print with headers: for the multiple files version:

$ hdr() { awk 'FNR==1{ print "", FILENAME }1' "$1"; }
$ join -a1 -a2 -e 1 -o auto <(hdr file1) <(hdr file2) >join.tmp
$ for file in rest_files*; do
     join -a1 -a2 -e 1 -o auto join.tmp <(hdr "$file") >join.tmp.1
     mv join.tmp.1 join.tmp
  done
$ tr -d '' <join.tmp >final.file

When I opened the output in excel, I had the following problems:

  • The headers were assigned twice in two rows
  • Columns from the first and second files were added twice and then the files was added again at the end.
  • All the values and headers were separated by a space and just assigned to one column in excel.

Here is what it looked like:

final.tab

 File1.tab File2.tab File3.tab join.temp
 File1.tab File2.tab File3.tab File1.tab
A 25 89 25 89 98 25
B 56 55 56 55 76 56
C 87 92 87 92 82 87
D 82 36 82 36 76 82
E 91 28 91 28 49 91
F 34 19 34 19 42 34

Can you fix the current code or provide me with a new one that will do I need?

Read more here: Source link