class FlatKit::MergeTree
Merge
a list of sorted records from Readers into a single output Writer
The MergeTree
implements a Tournament Tree algorightm to do a k-way merge between Reader
objects:
https://en.wikipedia.org/wiki/K-way_merge_algorithm
Usage:
compare_fields = %w[ key timestamp ] format = ::FlatKit::Format.for('json') readers = ARGV.map do |path| format.reader.new(source: path, compare_fields: compare_fields) end write_path = "merged.jsonl" writer = format.writer.new(destination: write_path.to_s) tree = ::FlatKit::MergeTree.new(readers) tree.each do |record| writer.write(record) end writer.close
Attributes
leaves[R]
levels[R]
readers[R]
Public Class Methods
new(readers)
click to toggle source
# File lib/flat_kit/merge_tree.rb, line 36 def initialize(readers) @readers = readers @leaves = [] @levels = [] @readers.each do |reader| @leaves << LeafNode.new(reader) end # Need to pad the leaves to an even number so that the slicing by 2 for # the tournament will work if @leaves.size.odd? then @leaves << SentinelLeafNode.new end init_tree end
Public Instance Methods
depth()
click to toggle source
The number of levels, this shold be the logn(readers.size)
# File lib/flat_kit/merge_tree.rb, line 86 def depth @levels.size end
each() { |value| ... }
click to toggle source
Iterate over all the ements from all the readers yielding them in sorted order.
# File lib/flat_kit/merge_tree.rb, line 94 def each loop do break if root.leaf.finished? yield root.value # consume the yielded value and have the tournament tree replay those # brackets affected root.leaf.update_and_replay end end
init_tree()
click to toggle source
Initialize the tournament tree, go in depths - bottom layer will be the winners of the 2 leaf nodes, continuing until top layer is just 1 node
# File lib/flat_kit/merge_tree.rb, line 58 def init_tree values = @leaves.dup loop do break if values.size == 1 winners = [] # we alays need a left and a right node, there is the possibility of # adding a single Sentinel node as the final right node in each level values.each_slice(2) do |left, right| right = SentinelInternalNode.new if right.nil? winners << InternalNode.new(left: left, right: right) end values = winners @levels << winners end end
root()
click to toggle source
Root is the last level - should only have one node
# File lib/flat_kit/merge_tree.rb, line 79 def root @levels[-1].first end