class MlmmjArchiver::Archiver

Archiver class. Point it at the target directory that should hold your web archive, add some MLs to process, and start the run via archive!. You can influence the (temporary) MHonArc RC file it uses by passing arguments to ::new.

Note that archiving for the web is a two-step process. First, the mails in mlmmj’s archive folder are split into a directory structure that allows processing them month by month rather than all at once, which keeps the web archive easier to navigate. In the second step, each of these month directories is passed to mhonarc, which converts the mails to HTML and stores them in the final directory.

Constants

ARCHIVE_DIR

Path relative to ML root containing the mails.

CONTROL_FILE

Path relative to ML root containing the file that requests the web archiving.

MHONARC

Path to the mhonarc executable.

MRC_DEFAULTS

Default values for the MHonArc RC file.

MRC_TEMPLATE

Template for generating the temporary MHonArc RC file.

Public Class Methods

new(target, rc_args = {})

Create a new Archiver that stores its HTML mails below the given target directory. rc_args customizes the MHonArc RC file that is used. It is a hash that accepts the following keys (default values are given in parentheses):

header (“<p>ML archive</p>”)

HTML header to prepend to every page. $IDXTITLE$ is replaced by the title of the respective index.

tlevels (8)

Number of levels to nest threads before flattening.

archiveadmin (postmaster@example.org)

Email address of the archive administrator.

checknoarchive (true)

If set, adds <CHECKNOARCHIVE> to the rc file. Otherwise adds <NOCHECKNOARCHIVE>.

searchtarget (nil)

If set, a link called “search” pointing to the location given here is displayed next to the index links.

stylefile (“/archive.css”)

CSS style file referenced from the generated HTML pages.

mhonarc (“/usr/bin/mhonarc”)

Path to the mhonarc executable to create the archive.

cachedir (nil)

Path to a directory in which the mails are stored sorted by year and month. If unset, a temporary directory is created and removed again at exit. Pointing this at permanent storage speeds up archiving for large MLs.

# File lib/mlmmj-archiver/archiver.rb, line 77
def initialize(target, rc_args = {})
  @target_dir   = Pathname.new(target).expand_path
  @mailinglists = []
  @mutex        = Mutex.new
  @rc_args      = MRC_DEFAULTS.merge(rc_args)
  @debug        = false
  @inotify_thread = nil
  @mhonarc      = rc_args[:mhonarc] || MHONARC

  if rc_args[:cachedir]
    @sorted_target = Pathname.new(rc_args[:cachedir]).expand_path
  else
    @sorted_target = Pathname.new(Dir.mktmpdir)
    at_exit{FileUtils.rm_rf(@sorted_target)}
  end

end
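
The way rc_args is combined with MRC_DEFAULTS can be sketched with a plain Hash#merge. The DEFAULTS hash below is only a stand-in that mirrors the documented default values; the real MRC_DEFAULTS constant is not shown on this page.

```ruby
# Stand-in for MRC_DEFAULTS (assumption); values mirror the documented defaults.
DEFAULTS = {
  header:         "<p>ML archive</p>",
  tlevels:        8,
  archiveadmin:   "postmaster@example.org",
  checknoarchive: true,
  searchtarget:   nil,
  stylefile:      "/archive.css"
}.freeze

# Keys supplied by the caller win, everything else falls back to the default.
rc_args = DEFAULTS.merge(tlevels: 4, searchtarget: "/search")

puts rc_args[:tlevels]   # overridden by the caller
puts rc_args[:stylefile] # taken from the defaults
```

Because merge returns a new hash, the defaults themselves stay untouched. Note that a key passed explicitly as nil or false also replaces its default.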

Public Instance Methods

<<(path)

Like add_ml, but returns self for method chaining.

# File lib/mlmmj-archiver/archiver.rb, line 114
def <<(path)
  add_ml(path)
  self
end
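
Returning self is what makes several << calls chainable. A minimal sketch with a hypothetical stand-in class (ListCollector and the paths are invented for illustration, they are not part of MlmmjArchiver):

```ruby
# Hypothetical stand-in that only mimics the add_ml / << pair shown above.
class ListCollector
  attr_reader :lists

  def initialize
    @lists = []
  end

  def add_ml(path)
    @lists.push(path) # returns the array, so add_ml itself is not chainable
  end

  def <<(path)
    add_ml(path)
    self # hand back the collector so the next << applies to it again
  end
end

collector = ListCollector.new
collector << "/var/spool/mlmmj/users" << "/var/spool/mlmmj/devel"
puts collector.lists.size
```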
add_ml(path)

Add an mlmmj ML directory to process.

# File lib/mlmmj-archiver/archiver.rb, line 106
def add_ml(path)
  dir = Pathname.new(path).expand_path
  debug("Adding ML directory: #{dir}")

  @mailinglists.push(dir)
end

archive!()

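Run MHonArc over the sorted month directories of every added ML whose CONTROL_FILE exists and write the resulting HTML below the target directory. MLs without the control file are skipped.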

# File lib/mlmmj-archiver/archiver.rb, line 169
def archive!
  @mutex.synchronize do
    rcpath = generate_rcfile

    @mailinglists.each do |path|
      control_file = path + CONTROL_FILE
      next unless control_file.file?

      process_ml(@sorted_target + path.basename, @target_dir + path.basename, rcpath)
    end
  end
end
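
The opt-in check at the top of the loop can be tried in isolation. This is a minimal sketch in a temporary directory; the control file path used here is a made-up value, since the real CONTROL_FILE constant is not shown on this page.

```ruby
require "pathname"
require "tmpdir"
require "fileutils"

# Hypothetical value; the real CONTROL_FILE may point somewhere else.
CONTROL_FILE_SKETCH = "control/webarchive"

selected = []
Dir.mktmpdir do |root|
  root      = Pathname.new(root)
  opted_in  = root + "users" # gets a control file
  opted_out = root + "devel" # has none

  (opted_in + "control").mkpath
  FileUtils.touch(opted_in + CONTROL_FILE_SKETCH)
  opted_out.mkpath

  [opted_in, opted_out].each do |ml|
    # Same guard as in archive!: skip MLs that did not request web archiving
    next unless (ml + CONTROL_FILE_SKETCH).file?
    selected << ml.basename.to_s
  end
end

puts selected.inspect
```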
debug_mode=(val)

Enable/disable debugging output.

# File lib/mlmmj-archiver/archiver.rb, line 96
def debug_mode=(val)
  @debug = val
end

debug_mode?()

True if debugging output is enabled, see debug_mode=.

# File lib/mlmmj-archiver/archiver.rb, line 101
def debug_mode?
  @debug
end

preprocess_mlmmj_mails!()

Iterates over all mailing lists and copies new messages into the intermediate year/month directory structure.

# File lib/mlmmj-archiver/archiver.rb, line 157
def preprocess_mlmmj_mails!
  @sorted_target.mkpath unless @sorted_target.directory?

  @mutex.synchronize do
    @mailinglists.each do |path|
      hsh = collect_messages(path + ARCHIVE_DIR)
      split_messages_into_month_dirs(hsh, @sorted_target + path.basename) # path.basename is the ML name
    end
  end
end

stop_watching_mlmmj_mails!()

Terminate the watching thread started by watch_mlmmj_mails!.

# File lib/mlmmj-archiver/archiver.rb, line 151
def stop_watching_mlmmj_mails!
  # Nothing to terminate if watch_mlmmj_mails! was never called
  @inotify_thread.terminate if @inotify_thread
end

watch_mlmmj_mails!()

An event-driven alternative to preprocess_mlmmj_mails!. Instead of rescanning every mail to find the new ones, this uses inotify so that Linux notifies us whenever a new file is added to an ML's archive directory. rb-inotify must be available on your system for this method to work; otherwise a NotImplementedError is raised.

# File lib/mlmmj-archiver/archiver.rb, line 124
def watch_mlmmj_mails!
  raise(NotImplementedError, "This is only possible with rb-inotify!") unless defined?(INotify)

  @inotifier = INotify::Notifier.new

  @mailinglists.each do |path|
    archive_dir = path + ARCHIVE_DIR

    @inotifier.watch(archive_dir.to_s, :create) do |event|
      next unless File.file?(event.absolute_name)
      next unless event.name =~ /^\d+$/

      debug "Got a new mail: #{event.name}"
      sleep 2 # Wait for the file to be fully written

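      @mutex.synchronize do
        mail      = Mail.read(event.absolute_name)
        # Same <year>/<month> layout as split_messages_into_month_dirs
        month_dir = @sorted_target + path.basename + mail.date.year.to_s + mail.date.month.to_s
        month_dir.mkpath # otherwise cp would create a plain file named after the month
        FileUtils.cp(event.absolute_name, month_dir)
      end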
    end
  end

  debug "Watching MLs via inotify."
  @inotify_thread = Thread.new{@inotifier.run}
end
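
The threading pattern used here (a background thread reacts to events, a mutex serialises the actual work) can be sketched without rb-inotify. In this sketch a Queue stands in for the inotify event stream, which is an assumption made purely for illustration; the file names are invented.

```ruby
events = Queue.new # stand-in for the inotify event stream (assumption)
mutex  = Mutex.new
copied = []

worker = Thread.new do
  while (name = events.pop) != :stop
    # mlmmj archive files are named by number; ignore everything else
    next unless name.match?(/\A\d+\z/)

    # Serialise with other code that touches the sorted directory
    mutex.synchronize { copied << name }
  end
end

%w[1 2 index.tmp 3].each { |name| events << name }
events << :stop
worker.join

puts copied.inspect
```

The sketch anchors the pattern with \A and \z, which match the whole string; ^ and $ in Ruby match at line boundaries instead.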

Private Instance Methods

collect_messages(mail_dir)

Collect the mails in the given directory in a nested hash like this:

{year1 => {month1 => [...], month2 => [...]}, year2 => {...}}

# File lib/mlmmj-archiver/archiver.rb, line 264
def collect_messages(mail_dir)
  # Unknown years get a month hash, unknown months get an empty array
  hsh = Hash.new{|years, year| years[year] = Hash.new{|months, month| months[month] = []}}

  debug "Collecting messages in #{mail_dir}"

  mail_dir.each_child do |path|
    next unless path.file?

    mail = Mail.read(path)
    hsh[mail.date.year][mail.date.month] << path
  end

  hsh
end
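
The nested default-block hash is what allows appending to hsh[year][month] without creating the inner containers first. A minimal sketch using hypothetical (name, date) pairs in place of parsed mails:

```ruby
# Hypothetical (file name, date) pairs standing in for parsed mails.
mails = [
  ["1", Time.utc(2013, 1, 5)],
  ["2", Time.utc(2013, 1, 20)],
  ["3", Time.utc(2013, 2, 1)],
  ["4", Time.utc(2014, 7, 9)]
]

# Unknown years get a month hash, unknown months get an empty array.
by_month = Hash.new { |years, year| years[year] = Hash.new { |months, month| months[month] = [] } }

mails.each { |name, date| by_month[date.year][date.month] << name }

puts by_month.inspect
```

One consequence of this design: merely reading a missing key via [] creates it. Use key? when a lookup must not modify the hash.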
debug(str)

Prints str to stdout via puts if debug_mode? is true.

# File lib/mlmmj-archiver/archiver.rb, line 322
def debug(str)
  puts str if debug_mode?
end

generate_rcfile()

Generate an RC file for MHonArc and return the path to it. The following values from rc_args (see ::new) are substituted into MRC_TEMPLATE:

header (“<p>ML archive</p>”)

HTML header to prepend to every page. $IDXTITLE$ is replaced by the title of the respective index.

tlevels (8)

Number of levels to nest threads before flattening.

archiveadmin (postmaster@example.org)

E-Mail address of the archive administrator.

checknoarchive (true)

If set, adds <CHECKNOARCHIVE> to the rc file. Otherwise adds <NOCHECKNOARCHIVE>.

searchtarget (“/search”)

Target for the “search” link.

stylefile (“/archive.css”)

CSS style file to reference from the outputted HTML pages.

# File lib/mlmmj-archiver/archiver.rb, line 225
def generate_rcfile
  tempfile = Tempfile.new("archive-mhonarc")
  rcpath   = tempfile.path
  at_exit{File.delete(rcpath)}

  debug "Generating MhonArc RC file at #{rcpath}"

  header         = @rc_args[:header]
  tlevels        = @rc_args[:tlevels]
  archiveadmin   = @rc_args[:archiveadmin]
  # Emit exactly one of the two resources, as documented for ::new
  checknoarchive = @rc_args[:checknoarchive] ? "<CHECKNOARCHIVE>" : "<NOCHECKNOARCHIVE>"
  searchtarget   = @rc_args[:searchtarget]
  stylefile      = @rc_args[:stylefile]

  mrc = MRC_TEMPLATE.result(binding)
  tempfile.write(mrc)
  tempfile.flush # make sure mhonarc sees the complete file on disk

  rcpath
end
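
The call MRC_TEMPLATE.result(binding) suggests that the template is an ERB object which picks up the local variables defined just before it. The following sketch works under that assumption; the two-resource template is invented for illustration and is much smaller than a real RC file.

```ruby
require "erb"
require "tempfile"

# Hypothetical miniature template; the real MRC_TEMPLATE is not shown here.
template = ERB.new(<<~MRC)
  <TLEVELS>
  <%= tlevels %>
  </TLEVELS>
  <%= checknoarchive %>
MRC

# Local variables that the template reads through the binding
tlevels        = 8
checknoarchive = "<NOCHECKNOARCHIVE>"

rcfile = Tempfile.new("archive-mhonarc")
rcfile.write(template.result(binding))
rcfile.flush # push buffered content to disk before another process reads it

written = File.read(rcfile.path)
rcfile.close! # close and delete the temporary file

puts written
```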
mhonarc(source, rel_target, rcpath)

Run mhonarc over the source directory and place the results in rel_target, a path relative to the target passed to ::new. rcpath is the path to the MHonArc RC file to use.

# File lib/mlmmj-archiver/archiver.rb, line 312
def mhonarc(source, rel_target, rcpath)
  target = @target_dir + rel_target
  target.mkpath unless target.directory?

  ary = [@mhonarc.to_s, "-rcfile", rcpath.to_s, "-outdir", target.to_s, "-add", source.to_s]
  debug "Executing: #{ary.inspect}"
  system(*ary)
end
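
Passing the command to system as separate arguments bypasses the shell, so paths containing spaces or shell metacharacters reach the program unchanged. The sketch below builds the same argument vector and then demonstrates the no-shell behaviour using the running Ruby interpreter instead of mhonarc; all paths and the helper name mhonarc_argv are hypothetical.

```ruby
require "rbconfig"

# Builds the same argument vector as the mhonarc method above.
def mhonarc_argv(executable, rcpath, outdir, source)
  [executable.to_s, "-rcfile", rcpath.to_s, "-outdir", outdir.to_s, "-add", source.to_s]
end

argv = mhonarc_argv("/usr/bin/mhonarc", "/tmp/rc file",
                    "/srv/www/archive/users/2013/01",
                    "/var/cache/sorted/users/2013/1")
puts argv.inspect

# With the multi-argument form nothing is split on spaces or expanded by a
# shell: the child process receives "a b" and "$HOME" verbatim.
check = "exit(ARGV == ['a b', '$HOME'] ? 0 : 1)"
ok = system(RbConfig.ruby, "-e", check, "a b", "$HOME")
puts ok
```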
process_ml(sorted_mail_dir, archive_dir, rcpath)

Process all mails in sorted_mail_dir and output an HTML directory structure in archive_dir. rcpath is the path to an MHonArc RC file to use.

# File lib/mlmmj-archiver/archiver.rb, line 248
def process_ml(sorted_mail_dir, archive_dir, rcpath)
  debug "Processing sorted ML directory #{sorted_mail_dir} ===> #{archive_dir}"

  # Create the target directory
  archive_dir.mkpath unless archive_dir.directory?

  # Let mhonarc process the messages
  sorted_mail_dir.each_child do |yeardir|
    yeardir.each_child do |monthdir|
      mhonarc(monthdir, archive_dir + sprintf("%04d/%02d", yeardir.basename.to_s.to_i, monthdir.basename.to_s.to_i), rcpath)
    end
  end
end
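
The sprintf call maps the unpadded directory names of the sorted tree (for example 2013/1) to zero-padded output directories (2013/01), so the HTML directories sort correctly by name. A small sketch with hypothetical paths:

```ruby
require "pathname"

# Hypothetical sorted month directory as produced by the first step
monthdir = Pathname.new("/var/cache/sorted/users/2013/1")
yeardir  = monthdir.parent

# Same formatting as in process_ml: four-digit year, two-digit month
rel      = sprintf("%04d/%02d", yeardir.basename.to_s.to_i, monthdir.basename.to_s.to_i)
html_dir = Pathname.new("/srv/www/archive/users") + rel

puts html_dir
```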
split_messages_into_month_dirs(hsh, target)

Takes the result of collect_messages and writes the messages out to a directory structure under target like this:

2013/
  1/
    msg1
  2/
    msg1
    msg2
...

Already existing messages will not be copied again.

# File lib/mlmmj-archiver/archiver.rb, line 289
def split_messages_into_month_dirs(hsh, target)
  debug "Splitting into year-month directories under #{target}"
  target.mkpath unless target.directory?

  hsh.each_pair do |year, months|
    year_dir = target + year.to_s
    year_dir.mkdir unless year_dir.directory?

    months.each do |month, messages|
      month_dir = year_dir + month.to_s
      month_dir.mkdir unless month_dir.directory?

      messages.each do |msgpath|
        FileUtils.cp(msgpath, month_dir) unless month_dir.join(msgpath.basename).file?
      end
    end
  end
end
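
The skip-if-present check makes the splitting step safe to repeat. This can be tried in a temporary directory; the sketch below is a simplified stand-in (copy_into_month_dirs is a hypothetical helper, not the method above) that counts how many files each run actually copies.

```ruby
require "pathname"
require "tmpdir"
require "fileutils"

# Simplified stand-in for split_messages_into_month_dirs that returns the
# number of files it copied.
def copy_into_month_dirs(grouped, target)
  copied = 0
  grouped.each do |year, months|
    months.each do |month, messages|
      month_dir = target + year.to_s + month.to_s
      month_dir.mkpath
      messages.each do |msg|
        next if (month_dir + msg.basename).file? # already sorted earlier
        FileUtils.cp(msg, month_dir)
        copied += 1
      end
    end
  end
  copied
end

first_run = second_run = layout = nil
Dir.mktmpdir do |tmp|
  tmp    = Pathname.new(tmp)
  source = tmp + "archive"
  source.mkpath
  %w[1 2 3].each { |n| (source + n).write("mail #{n}") }

  grouped = { 2013 => { 1 => [source + "1"], 2 => [source + "2", source + "3"] } }
  target  = tmp + "sorted"

  first_run  = copy_into_month_dirs(grouped, target)
  second_run = copy_into_month_dirs(grouped, target) # nothing left to copy
  layout = Dir.glob((target + "*/*/*").to_s)
              .map { |p| Pathname.new(p).relative_path_from(target).to_s }
              .sort
end

puts layout.inspect
```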