class Obuf

An object buffer for Ruby objects. Use it to sequentially store a large number of objects on disk and then retrieve them one by one. Make sure to call clear when you are done with it to discard the stored blob.

a = Obuf.new
parse_big_file do | one_node |
  a.push(one_node)
end

a.size #=> 30932 # We've stored 30 thousand objects on disk without breaking a sweat
a.each do | node_read_from_disk |
  # do something with the node that has been recovered from disk
end

a.clear # ensure that the file is deleted

Both reading and writing aim to be thread-safe.
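
The pattern behind that claim, as visible in push further down, is a single Mutex that guards both the write and the size counter. A minimal sketch of that pattern follows, with a plain Array standing in for the on-disk store; the names log and size are placeholders for this illustration and are not part of Obuf.

```ruby
# One mutex guards both the append and the counter, as Obuf#push does with @sem.
sem = Mutex.new
log = []   # stand-in for the disk-backed store (assumption for this sketch)
size = 0

threads = 4.times.map do |thread_number|
  Thread.new do
    250.times do |i|
      sem.synchronize do
        log << [thread_number, i]
        size += 1
      end
    end
  end
end
threads.each(&:join)
```

Because the append and the increment happen inside the same synchronize block, the counter and the stored records cannot drift apart, whichever thread wins the lock.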

Constants

VERSION

Attributes

size[R]

Returns the number of objects stored so far

Public Class Methods

new(enumerable = []) { |self| ... }

Initializes a new Obuf. If an Enumerable argument is passed, each element it yields is stored in the Obuf (so you can pass an IO, for example). If a block is given, the new Obuf is yielded to it afterwards.

# File lib/obuf.rb, line 33
def initialize(enumerable = [])
  @sem = Mutex.new
  @store = Tempfile.new("obuf")
  @store.binmode
  @size = 0
  
  @lens = Obuf::ProtectedLens.new(@store)
  
  # Store everything from the enumerable in self
  enumerable.each { |e| push(e) }
  
  # ...and yield self for any configuration
  yield self if block_given?
end
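
When an IO is passed, the elements stored are whatever its each method yields, which for Ruby IO objects is one line at a time. The sketch below only illustrates that behavior; it uses StringIO as a stand-in for a file and a plain Array as a stand-in for the Obuf, both assumptions made for the example.

```ruby
require "stringio"

# StringIO stands in for any IO; like File, its #each yields one line at a time.
io = StringIO.new("alpha\nbeta\ngamma\n")

# Same shape as `enumerable.each { |e| push(e) }` in the constructor,
# with a plain Array standing in for the Obuf.
stored = []
io.each { |line| stored << line }
```

Each stored element keeps its trailing newline, so a caller that wants bare strings would need to chomp them before or after storing.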

Public Instance Methods

<<(object_to_store)
Alias for: push

[](slice)

Retrieve the object at the given index, or an Array of objects when slice responds to each (for example a Range or an Array of indices)

# File lib/obuf.rb, line 82
def [](slice)
  slice.respond_to?(:each) ? slice.map{|i| recover_at(i) } : recover_at(slice)
end
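
The dispatch above hinges on respond_to?(:each): an Integer does not respond to it, while a Range or an Array does. A minimal sketch of the same branching, using a hypothetical fetch_slice helper with an in-memory Array in place of the disk store and Array#fetch in place of recover_at:

```ruby
# Hypothetical helper mirroring the dispatch in Obuf#[]; `items` stands in
# for the disk-backed store and Array#fetch for recover_at.
def fetch_slice(items, slice)
  slice.respond_to?(:each) ? slice.map { |i| items.fetch(i) } : items.fetch(slice)
end

items = %w[a b c d]
single = fetch_slice(items, 1)       # one index gives one object
ranged = fetch_slice(items, 1..2)    # a Range gives an Array
picked = fetch_slice(items, [0, 3])  # so does an Array of indices
```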

clear()

Calls close! on the datastore, which deletes the objects in it, and resets size to 0

# File lib/obuf.rb, line 74
def clear
  @sem.synchronize do
    @store.close!
    @size = 0
  end
end
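
The deletion comes from Tempfile#close!, which closes the handle and unlinks the file in one step. A short sketch of that behavior on its own, outside of Obuf; the variable names are placeholders for this example.

```ruby
require "tempfile"

store = Tempfile.new("obuf-sketch")
path = store.path                  # remember the path; it is not available after close!
store.write("some serialized objects")
store.flush

existed_before = File.exist?(path)
store.close!                       # close the handle and unlink the file
exists_after = File.exist?(path)
```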

each() { |recover_object| ... }

Retrieve each stored object in succession. All other Enumerable methods are also available, but be careful with Enumerable#map and to_a, since they build an in-memory Array of every stored object

# File lib/obuf.rb, line 66
def each
  with_separate_read_io do | iterable |
    reading_lens = Obuf::Lens.new(iterable)
    @size.times { yield(reading_lens.recover_object) }
  end
end
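
One way to avoid materializing everything is Enumerable#lazy, which works on any class that defines each and includes Enumerable. The sketch below uses a hypothetical SketchBuffer class as a stand-in for Obuf so that it runs on its own; the yielded counter exists only to show that iteration stops early.

```ruby
# Hypothetical stand-in for Obuf: any class with #each plus Enumerable will do.
class SketchBuffer
  include Enumerable

  attr_reader :yielded

  def initialize(items)
    @items = items
    @yielded = 0
  end

  def each
    @items.each do |item|
      @yielded += 1   # count how many items were actually handed out
      yield item
    end
  end
end

buf = SketchBuffer.new(1..10_000)

# lazy chains the steps per item, and first(3) stops the iteration early
first_even_squares = buf.lazy.select(&:even?).map { |n| n * n }.first(3)
```

Here only a handful of items are read before the chain stops, whereas buf.map { ... }.first(3) would have walked all ten thousand and built a full Array first.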

empty?()

Tells whether the buffer is empty

# File lib/obuf.rb, line 49
def empty?
  @size.zero?
end

push(object_to_store)

Stores an object and returns it

# File lib/obuf.rb, line 54
def push(object_to_store)
  @sem.synchronize {
    @lens << object_to_store
    @size += 1
  }
  object_to_store
end
Also aliased as: <<
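
The serialization itself happens in the lens classes, which are not shown on this page. As an illustration only, sequential storage of arbitrary Ruby objects can be sketched with Marshal and a length prefix per record. The write_record and read_record helpers and the 4-byte header are assumptions made for this example, not the actual Obuf::Lens format.

```ruby
require "stringio"

# Hypothetical record format: 4-byte big-endian length, then the Marshal blob.
def write_record(io, object)
  blob = Marshal.dump(object)
  io.write([blob.bytesize].pack("N"))
  io.write(blob)
end

# Reads the next record, or returns nil when the stream is exhausted.
def read_record(io)
  header = io.read(4)
  return nil if header.nil?
  length = header.unpack1("N")
  Marshal.load(io.read(length))
end

io = StringIO.new(String.new)   # binary in-memory stand-in for the Tempfile
write_record(io, { name: "node", value: 1 })
write_record(io, [1, 2, 3])

io.rewind
first = read_record(io)
second = read_record(io)
third = read_record(io)         # nothing left to read
```

A length prefix lets a reader skip over records without deserializing them, which is one plausible way to support index-based access on a sequential file.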

Private Instance Methods

recover_at(idx)
# File lib/obuf.rb, line 88
def recover_at(idx)
  with_separate_read_io do | iterable |
    reading_lens = Obuf::Lens.new(iterable)
    reading_lens.recover_at(idx)
  end
end

with_separate_read_io() { |iterable| ... }

We first flush pending writes so that everything is in the disk-backed file, then open that file again as read-only and iterate through the new handle (we will have one IO handle per loop nest)

# File lib/obuf.rb, line 97
def with_separate_read_io
  # Ensure all data is written before we read it
  iterable = @sem.synchronize do
    @store.flush
    File.open(@store.path, "rb")
  end
  
  begin
    yield(iterable)
  ensure
    iterable.close
  end
end
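
The point of one handle per loop nest is that nested iterations do not disturb each other's read position. A minimal sketch of the same flush-then-reopen pattern, using plain text lines instead of serialized objects; the with_reader helper is a hypothetical name for this example.

```ruby
require "tempfile"

store = Tempfile.new("obuf-sketch")
store.binmode
store.write("first\nsecond\n")
store.flush   # without this, a fresh reader might see an incomplete file

# Hypothetical helper: open an independent read-only handle and always close it.
def with_reader(path)
  io = File.open(path, "rb")
  begin
    yield io
  ensure
    io.close
  end
end

pairs = []
with_reader(store.path) do |outer|
  outer.each_line do |a|
    # The inner loop gets its own handle, so the outer position is untouched.
    with_reader(store.path) do |inner|
      inner.each_line { |b| pairs << [a.chomp, b.chomp] }
    end
  end
end

store.close!
```

Sharing a single handle between the two loops would let the inner loop consume the file and leave the outer loop with nothing more to read.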