I have over 50 files that have the following format:
File1.tab
A 25
B 56
C 87
D 82
E 91
F 34
File2.tab
A 89
B 55
C 92
D 36
E 28
F 19
File3.tab
A 98
B 76
C 82
D 76
E 49
F 42
etc.
I would like the following:
- The columns merged together based on the first column.
- each column to have the file name as a header.
- The values separated by a tab not space so when I open the merged file in excel it will have multiple columns (i.e column1= “row1 is empty, row2 is A and so on”; column2= “row1 is File1.tab, row2 is 25 and so on”; column3= “row1 is File2.tab, row2 is 89 and so on”; etc).
final.tab
File1.tab File2.tab File3.tab
A 25 89 98
B 56 55 76
C 87 92 82
D 82 36 76
E 91 28 49
F 34 19 42
I tried searching for answers and followed the following command: from unix.stackexchange.com/questions/638521/merge-multiple-files-by-first-column
Print with headers: for the multiple files version:
$ hdr() { awk 'FNR==1{ print " ", FILENAME }1' "$1"; }
$ join -a1 -a2 -e 1 -o auto <(hdr file1) <(hdr file2) >join.tmp
$ for file in rest_files*; do
join -a1 -a2 -e 1 -o auto join.tmp <(hdr "$file") >join.tmp.1
mv join.tmp.1 join.tmp
done
$ tr -d ' ' <join.tmp >final.file
When I opened the output in excel, I had the following problems:
- The headers were assigned twice in two rows
- Columns from the first and second files were added twice and then the files was added again at the end.
- All the values and headers were separated by a space and just assigned to one column in excel.
Here is what it looked like:
final.tab
File1.tab File2.tab File3.tab join.temp
File1.tab File2.tab File3.tab File1.tab
A 25 89 25 89 98 25
B 56 55 56 55 76 56
C 87 92 87 92 82 87
D 82 36 82 36 76 82
E 91 28 91 28 49 91
F 34 19 34 19 42 34
Can you fix the current code or provide me with a new one that will do I need?
Read more here: Source link