module Bio::BioAlignment::TreeSplitter

Split an alignment based on its phylogeny

Public Instance Methods

split_on_distance(target_size = nil) click to toggle source

Split an alignment using a phylogeny tree. One half contains sequences that are relatively homologues, the other half contains the rest. This is described in the tree-split.feature in the features directory.

The target_size parameter gives the size of the homologues sequence set. If target_size is nil, the set will be split in half.

Returns two alignments with their matching trees attached

# File lib/bio-alignment/edit/tree_splitter.rb, line 15
def split_on_distance target_size = nil
  target_size = size/2+1 if not target_size

  aln1 = clone
  # Start from the root of the tree (FIXME: what if there is no root?)
  prev_root = nil
  new_root = aln1.tree.root
  while new_root
    # find the nearest child (shortest edge)
    near_children = new_root.nearest_children
    # We possibly have multiple matches, so we are going to split on the
    # number of leafs, or we leave it like it is, if the split will be
    # too far from the target
    prev_root = new_root
    new_root = near_children.first
    near_children.each do |c|
      next if c == new_root
      # find the nearest match
      if (c.leaves.size-target_size).abs < (new_root.leaves.size-target_size).abs
        new_root = c 
      end
    end
    # Break out of the loop when we hit the target
    break if new_root.leaves.size <= target_size 
  end
  # Now see if whether the last step actually was an improvement, otherwise
  # we take one node up
  # p [(prev_root.leaves.size-target_size).abs,(new_root.leaves.size-target_size).abs]
  new_root = prev_root if (prev_root.leaves.size-target_size).abs < (new_root.leaves.size-target_size).abs
  branch = aln1.tree.clone_subtree(new_root)
  reduced_tree = aln1.tree.clone_tree_without_branch(new_root)
  # p branch.map { |n| n.name }.compact
  # p reduced_tree.map { |n| n.name }.compact

  # Now reduce the alignments themselves to match the trees
  aln1 = tree_reduce(reduced_tree)
  aln2 = tree_reduce(branch)
  return aln1,aln2
end