module StreamSampler
Constants
- VERSION
Public Class Methods
reservoir_sample(iterable, n, iteration_method = :each)
click to toggle source
Uses Vitter's reservoir sampling algorithm to take n items without replacement, from a iterable of items of unknown, arbitrary length.
The algorithm is O(n) in the length of the iterable. The original paper is at:
www.cs.umd.edu/~samir/498/vitter.pdf
# File lib/stream_sampler.rb, line 12 def self.reservoir_sample(iterable, n, iteration_method = :each) out = [] return out if n.to_i == 0 raise ArgumentError, "n must be >= 0" unless n > 0 iterable.send(iteration_method).with_index do |i, idx| if idx < n out << i else j = rand(idx + 1) out[j] = i if j < n end end out end