Intersect multiple bed and keep all fields

Assuming the inputs are disjoint, here is a modification of a general BEDOPS-based solution from Shane Neph for N input files (for you, three inputs, but this works for the general case of N inputs):

$ bedops --everything file1.bed file2.bed ... fileN.bed 
    | bedmap --echo-map - 
    | awk '(split($0, a, ";") == 3)' - 
    | sed 's/;/n/g' - 
    | sort-bed - 
    | uniq - 
    > answer.bed

The file answer.bed contains a unique list of elements from N number of BED files that overlap between three elements by at least one or more bases.

Let’s walk through this line by line.

  1. Take the multiset union of all elements from all N input files with bedops --everything. Note that this is not the same as a --merge operation, which instead makes contiguous regions from all overlapping input regions. Pipe this unioned set to bedmap.
  2. For each element from all files, use bedmap --echo-map to print any other elements from the multiset union that overlap that first element by one or more bases, using the default overlap criteria. The result will be a semi-colon delimited string that contains one or more overlapping elements. Note that this result will always include the original element, because it overlaps itself by 100%. Pipe this result to the awk statement.
  3. The awk command splits the string on the semi-colon delimiter and prints out the string if there are exactly three overlapping elements. We do this filter step, because we are interested in any one element that overlaps two other elements by the stated overlap criterion. We pipe this string to sed.
  4. The sed statement replaces all semi-colons with newline characters. This creates BED-formatted output that we pipe to sort-bed.
  5. The sort-bed tool takes the overlapping elements and puts them in lexicographically-sorted order. We pipe this to uniq to remove duplicate elements.

The file answer.bed is a sorted BED file that contains all unique elements from file1.bed through fileN.bed, where there is mutual, minimally one-base overlap between three sets.

If you look at that awk statement, you’ll see how to modify that test for your other two use cases, such as when you want elements that overlap no other elements in any other sets (“one”), or the case where you want elements that overlap with only one other set (“two”).

You might also find this page useful, which describes this more general approach to this problem and why it is scalable with BEDOPS.

Read more here: Source link