class JustKeepZipping

Allows creating large ZIP files in a streaming fashion.

Example:

require 'just-keep-zipping'

zip = JustKeepZipping.new
zip.add 'file1.txt', 'Data to be zipped'
data1 = zip.read
progress = Marshal.dump zip # into an object store?

zip = Marshal.load progress # load from object store?
zip.add 'file2.txt', 'More data to be zipped'
zip.close
data2 = zip.read

complete_archive = data1 + data2

Attributes

entries[R]

Public Class Methods

new()

Use the constructor for the initial object creation. Use Marshal.dump and Marshal.load (e.g. with Redis) to transfer this instance between compute units (e.g. Sidekiq jobs).

# File lib/just-keep-zipping.rb, line 25
def initialize
  @entries = []
  @data = ''
end

Public Instance Methods

add(filename, body)

Add a file to the archive.

Params:

filename

a string representing the name of the file as it should appear in the archive

body

a string or IO object that represents the contents of the file

# File lib/just-keep-zipping.rb, line 43
def add(filename, body)
  io = Zip::OutputStream.write_buffer do |zip|
    zip.put_next_entry filename
    zip.write body.respond_to?(:read) ? body.read : body
  end
  io.rewind
  io.set_encoding 'ASCII-8BIT'
  d = Zip::CentralDirectory.read_from_stream io

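  e = d.entries.first
  # 30 bytes is the fixed part of a ZIP local file header;
  # the filename and the compressed data follow it.
  payload_size = 30 + e.name.length + e.compressed_size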

  io.rewind
  @data += io.read(payload_size)

  e.zipfile = nil
  @entries << e
  nil
end
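The 30 in payload_size matches the fixed-length part of a ZIP local file header, which is followed by the filename and then the compressed bytes. The sketch below rebuilds one such record by hand with Zlib from the standard library to show where the arithmetic comes from. local_record is a hypothetical helper, not part of this class, and it assumes no extra field and no data descriptor, which is what the formula above assumes as well.

```ruby
require 'zlib'

# Hypothetical helper, not part of JustKeepZipping: builds the three parts of
# one local file record (fixed header, name, deflated body) by hand.
# Assumes no extra field and no trailing data descriptor.
def local_record(name, body)
  deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS) # raw deflate
  compressed = deflater.deflate(body, Zlib::FINISH)
  deflater.close

  header = [
    0x04034b50,          # local file header signature  (4 bytes)
    20,                  # version needed to extract    (2)
    0,                   # general purpose flags        (2)
    8,                   # compression method: deflate  (2)
    0, 0,                # DOS time and date, zeroed for brevity (2 + 2)
    Zlib.crc32(body),    # CRC-32 of uncompressed data  (4)
    compressed.bytesize, # compressed size              (4)
    body.bytesize,       # uncompressed size            (4)
    name.bytesize,       # filename length              (2)
    0                    # extra field length           (2)
  ].pack('VvvvvvVVVvv')

  [header, name.b, compressed]
end

header, name, compressed = local_record('file1.txt', 'Data to be zipped')
payload_size = 30 + name.bytesize + compressed.bytesize

puts header.bytesize
puts payload_size == (header + name + compressed).bytesize
```

The sketch counts the name with bytesize because the header fields count bytes; bytesize and length differ for non-ASCII filenames held in a multibyte encoding.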
close()

Finalizes the archive by appending the ZIP central directory (the trailing index of entries). Call read one final time afterwards to get that data.

No further files should be added after calling close.

# File lib/just-keep-zipping.rb, line 67
def close
  contents_size = 0
  @entries.each do |e|
    e.local_header_offset = contents_size
    contents_size += 30 + e.name.length + e.compressed_size
  end

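  # The central directory is written to a fresh buffer, so tell is shifted
  # by the size of the data already emitted; the recorded offsets are then
  # relative to the start of the whole archive.
  io = StringIO.new
  io.instance_eval "@offset = #{contents_size}"
  def io.tell
    super + @offset
  end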
  Zip::CentralDirectory.new(@entries).write_to_stream io
  io.rewind
  tail = io.read
  tail.force_encoding 'ASCII-8BIT'
  @data += tail
  contents_size
end
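Before writing the central directory, close walks the entries and gives each one the offset at which its local header starts in the concatenated output. That bookkeeping can be sketched on its own as follows. Entry is a hypothetical Struct standing in for the rubyzip entry objects, and the sizes are made-up numbers.

```ruby
# Hypothetical stand-in for the entry objects; only the fields used here.
Entry = Struct.new(:name, :compressed_size, :local_header_offset)

# Assigns each entry the running byte offset of its local header and
# returns the combined size of all local records, which is where the
# central directory would begin.
def assign_offsets(entries)
  entries.reduce(0) do |offset, e|
    e.local_header_offset = offset
    offset + 30 + e.name.bytesize + e.compressed_size
  end
end

entries = [Entry.new('file1.txt', 19), Entry.new('file2.txt', 24)]
directory_start = assign_offsets(entries)

entries.each { |e| puts "#{e.name} starts at byte #{e.local_header_offset}" }
puts "central directory starts at byte #{directory_start}"
```

Because the offsets depend only on names and compressed sizes, they can be computed at the end without keeping any of the file data around.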
current_size()

The size of the ZIP data buffered since the last read. Use this as a stopping or checkpoint condition, to keep memory from growing too large.

# File lib/just-keep-zipping.rb, line 33
def current_size
  @data.size
end
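One way to use current_size as a checkpoint condition is to flush with read whenever the buffer passes a threshold. The sketch below assumes a hypothetical BufferStandIn class that mimics only the add, current_size and read interface (it appends raw bytes rather than producing ZIP data) so that it runs without the gem. The LIMIT value and the chunks array standing in for an object store are also illustrative.

```ruby
# Hypothetical stand-in exposing the same add/current_size/read interface.
# It stores raw bytes only; it does not produce ZIP output.
class BufferStandIn
  def initialize
    @data = ''.b
  end

  def add(_filename, body)
    @data += body.b
    nil
  end

  def current_size
    @data.size
  end

  def read
    data = @data
    @data = ''.b
    data
  end
end

LIMIT = 1024 # illustrative threshold in bytes

# Adds every file, handing the buffered data to the block whenever the
# buffer reaches LIMIT, and once more at the end for whatever is left.
def add_with_checkpoints(zip, files)
  files.each do |name, body|
    zip.add name, body
    yield zip.read if zip.current_size >= LIMIT
  end
  rest = zip.read
  yield rest unless rest.empty?
end

files = { 'a.bin' => 'a' * 700, 'b.bin' => 'b' * 700, 'c.bin' => 'c' * 100 }
chunks = [] # stands in for uploads to an object store
add_with_checkpoints(BufferStandIn.new, files) { |chunk| chunks << chunk }

puts chunks.map(&:bytesize).inspect
```

With the real class, close would be called after the last add and before the final read, so that the last chunk also carries the central directory.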
read()

Returns the buffered ZIP data and clears the buffer, so that the result can be saved in an object store like S3 or GCS.

Do this before persisting this instance with Marshal.dump, to avoid placing too much progress data into a temporary object store like Redis.

# File lib/just-keep-zipping.rb, line 92
def read
  data = @data
  @data = ''
  data
end
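The effect of calling read before Marshal.dump can be seen by comparing the serialized size before and after draining the buffer. This sketch uses a hypothetical ProgressStandIn class whose initialize and read mirror the listings above. The append method is an invented helper for filling the buffer, and the store Hash stands in for something like Redis.

```ruby
# Hypothetical stand-in; initialize and read mirror the listings above.
class ProgressStandIn
  attr_reader :entries

  def initialize
    @entries = []
    @data = ''.b
  end

  # Invented for this sketch: pretend some archive bytes were produced.
  def append(name, bytes)
    @entries << name
    @data += bytes
    nil
  end

  def read
    data = @data
    @data = ''.b
    data
  end
end

store = {} # stands in for a temporary store such as Redis

zip = ProgressStandIn.new
zip.append 'file1.txt', 'x' * 10_000

size_before = Marshal.dump(zip).bytesize
chunk = zip.read                      # drain first, e.g. to upload elsewhere
store['progress'] = Marshal.dump(zip) # only the entry list is left

restored = Marshal.load(store['progress'])

puts chunk.bytesize
puts size_before > store['progress'].bytesize
puts restored.entries.inspect
```

The serialized progress then grows only with the number of entries, not with the amount of data already handed off.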