I have multiple files, each with two columns.
For example:
file1.txt
ID Value1
1 40
2 30
3 70

file2.txt
ID Value2
3 50
4 70
9 20

And so on, up to
file1230.txt
ID Value150
9 98
10 52
2 71

How do I merge these files on the first column, filling in 0 wherever an ID is missing from a file? My output should be:
ID Value1 Value2 Value150
1 40 0 0
2 30 0 71
3 70 50 0
4 0 70 0
9 0 20 98
10 0 0 52

Could someone help with this using awk or other Linux commands?
Thank you.
2 Answers
Here is one way you can do that using Python.
Code:
import sys
columns = []
data = {}
ids = set()
for filename in sys.argv[1:]:
    # 'rU' mode was removed in Python 3.11; the default text mode
    # already handles universal newlines.
    with open(filename) as f:
        key = next(f).strip().split()[1]
        columns.append(key)
        data[key] = {}
        for line in f:
            if line.strip():
                id, value = line.strip().split()
                try:
                    data[key][int(id)] = value
                except ValueError:
                    raise ValueError(
                        "Problem in line: '{}' '{}' '{}'".format(
                            id, value, line.rstrip()))
                ids.add(int(id))
print('\t'.join(['ID'] + columns))
for id in sorted(ids):
    line = []
    for column in columns:
        line.append(data[column].get(id, '0'))
    print('\t'.join([str(id)] + line))

Results:
ID Value1 Value2 Value150
1 40 0 0
2 30 0 71
3 70 50 0
4 0 70 0
9 0 20 98
10 0 0 52

A Bash solution with command-line tools. The input file list was out of order, hence ls -v is used to feed the files to cat in natural (version) sort order.
while read line; do
    if [[ "$line" =~ ID ]]; then
        array=${line##* }
        index+=($array)
        continue
    else
        eval $array'[${line% *}]=${line#* }'
    fi
done <<<"$( cat $(ls -v file[0-9]*.txt) )"
printf ID
for name in ${index[@]}; do
    printf ' %s' $name
done
echo
max_ind=$( sort -nu file[0-9]*.txt | tail -n1 | cut -d' ' -f1 )
for (( j = 1 ; j <= $max_ind ; j++ )); do
    for (( i = 0 ; i < ${#index[@]} ; i++ )); do
        value=$( eval 'echo ${'${index[i]}'[j]}' )
        roll+=$( [ "$value" ] && printf "%-${#index[i]}s " $value || printf "%-${#index[i]}s " 0 )
    done
    [[ "$roll" =~ [^0\ ] ]] && printf '%-4s%s\n' $j "$roll"
    unset roll
done
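Since the question explicitly asks for awk, here is one more sketch in portable (POSIX) awk. It assumes whitespace-separated columns and positive integer IDs, and, like the Bash answer, it skips IDs that appear in no file. The printf lines only recreate the sample inputs from the question so the example runs as-is.

```shell
# Recreate the sample inputs from the question so the demo is self-contained.
printf 'ID Value1\n1 40\n2 30\n3 70\n'    > file1.txt
printf 'ID Value2\n3 50\n4 70\n9 20\n'    > file2.txt
printf 'ID Value150\n9 98\n10 52\n2 71\n' > file1230.txt

# Merge on column 1; print 0 where an ID is missing from a file.
awk '
FNR == 1 { ncol++; header = header " " $2; next }  # one Value column per file
{
    ids[$1]                                        # remember every ID seen
    val[$1, ncol] = $2                             # value for this (ID, file)
    if ($1 > max) max = $1
}
END {
    print "ID" header
    for (id = 1; id <= max; id++) {
        if (!(id in ids)) continue                 # skip IDs no file mentions
        line = id
        for (c = 1; c <= ncol; c++)
            line = line " " ((id, c) in val ? val[id, c] : 0)
        print line
    }
}' file1.txt file2.txt file1230.txt
# Prints:
# ID Value1 Value2 Value150
# 1 40 0 0
# 2 30 0 71
# 3 70 50 0
# 4 0 70 0
# 9 0 20 98
# 10 0 0 52
```

Passing the files in the same ls -v order used above keeps the Value columns in the expected sequence, since each file contributes one column in argument order.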