class SO2DB::Importer

Base class for StackOverflow data importers. Drives database setup and imports the data files found in a directory.

Implementations of this class must provide a method with the following signature:

import_stream(formatter)

This method may be private. It performs the actual data import, using data from the provided formatter, which has type SO2DB::Formatter. The formatter supports both streaming data to STDIN (e.g., PostgreSQL’s COPY command) and writing data to a file before import (e.g., for MySQL’s mysqlimport utility).
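The following is a minimal sketch of the shape such a method might take, not the implementation of any real importer. Everything in it is a stand-in: FakeFormatter and its format(io) method are hypothetical (the SO2DB::Formatter interface is not described on this page), SketchImporter does not inherit from SO2DB::Importer, and a StringIO takes the place of a database client's STDIN.

```ruby
require 'stringio'

# Hypothetical stand-in for SO2DB::Formatter. The real interface is not
# shown on this page; format(io) is an assumption made for illustration.
class FakeFormatter
  def initialize(rows, delimiter)
    @rows = rows
    @delimiter = delimiter
  end

  # Writes each row to the given IO as one delimited line.
  def format(io)
    @rows.each { |row| io.puts(row.join(@delimiter)) }
  end
end

# Sketch of an importer that streams formatted rows into a sink.
# A real subclass would inherit from SO2DB::Importer and write to the
# database client rather than to a StringIO.
class SketchImporter
  attr_reader :sink

  def initialize
    @sink = StringIO.new
  end

  private

  # The one method a subclass must supply; it may be private.
  def import_stream(formatter)
    formatter.format(@sink)
  end
end

importer = SketchImporter.new
formatter = FakeFormatter.new([%w[1 Alice], %w[2 Bob]], "\v")
importer.send(:import_stream, formatter)
```

In a streaming importer the sink would typically be the write end of a pipe to the database client; a file-based importer would instead write to a temporary file and then invoke the bulk-load utility on it.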

The importer uses ActiveRecord for table creation and Foreigner for creating table relationships, so you are limited to the databases these libraries support. In addition, a ‘uuid’ method must be available to the adapter provided to ActiveRecord. (See so2pg for an example of an adapter extension that provides the method.)

The class also provides two accessors for subclasses:

attr_reader :conn_opts
attr_accessor :format_delimiter

The conn_opts property provides the ActiveRecord connection data (e.g., :database, :host, etc.). The format_delimiter property sets the delimiter used by the formatter. The default delimiter is the vertical tab, \v (0x0B).
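As a small illustration of the default delimiter, the snippet below shows that 11.chr is the vertical tab and that fields containing commas or tabs survive a join and split on it. The field values are invented for the example, and the reason for choosing this character (that it seldom appears in post text) is an assumption, not something this page states.

```ruby
# 11.chr is the vertical tab character, written "\v" in a Ruby string literal.
delimiter = 11.chr
same = (delimiter == "\v")

# Hypothetical field values containing characters that would break
# comma- or tab-delimited output.
fields = ['42', 'Hello, world', "tab\there"]
line = fields.join(delimiter)
parsed = line.split(delimiter)
```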

Attributes

conn_opts[R]
format_delimiter[RW]

Public Class Methods

new(relations = false, optionals = false, adapter = '', options = {})

Initializes the importer.

Arguments:

relations: (Boolean) Indicates whether database relationships should
                     be created.
optionals: (Boolean) Indicates whether optional database tables and
                     content should be created.
adapter:   (String)  The ActiveRecord adapter name (e.g., 'postgresql').
options:   (Hash)    The database connection options, as required by
                     ActiveRecord for the provided adapter.
# File lib/so2db.rb, line 65
def initialize(relations = false, optionals = false, adapter = '', options = {})
  @relations = relations
  @optionals = optionals
  @conn_opts = options.merge( { :adapter => adapter } )
  @format_delimiter = 11.chr.to_s
end
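To make the effect of the merge concrete, this sketch reproduces it outside the class. The database name and host are hypothetical values chosen for the example.

```ruby
# Mirrors the merge performed in initialize: the adapter name is folded
# into the connection options under the :adapter key.
options = { :database => 'so_dump', :host => 'localhost' } # hypothetical values
adapter = 'postgresql'
conn_opts = options.merge({ :adapter => adapter })
```

Because Hash#merge returns a new hash, the caller's options hash is left unchanged, and an :adapter key already present in options would be overridden by the adapter argument.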

Public Instance Methods

import(dir)

Creates the database tables and relationships, and imports the data in the files in the specified directory.

Arguments:

dir:  (String) The directory path containing the StackOverflow data
               dump XML files (e.g., badges.xml, posts.xml, etc.).
# File lib/so2db.rb, line 78
def import(dir)
  setup
  create_basics
  import_data(dir)
  create_relations if @relations
  create_optionals if @optionals
  create_optional_relations if @relations and @optionals
end
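The order of operations in import can be summarized with a small helper. import_steps is a hypothetical function written for this illustration; it only lists the private methods that would run for a given pair of flags and does not touch a database.

```ruby
# Returns the names of the steps import would run, in order, for the
# given flags. Illustrative only; not part of SO2DB.
def import_steps(relations, optionals)
  steps = [:setup, :create_basics, :import_data]
  steps << :create_relations if relations
  steps << :create_optionals if optionals
  steps << :create_optional_relations if relations && optionals
  steps
end

all_steps = import_steps(true, true)
basic_steps = import_steps(false, false)
```

Note that relationships are created only after the data has been imported, and that the optional relationships require both flags to be set.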

Private Instance Methods

create_basics()
# File lib/so2db.rb, line 97
def create_basics
  SO2DB::CreateBasicTables.new.up
end

create_optional_relations()
# File lib/so2db.rb, line 117
def create_optional_relations
  SO2DB::CreateOptionalRelationships.new.up
end

create_optionals()
# File lib/so2db.rb, line 113
def create_optionals
  SO2DB::CreateOptionals.new.up
end

create_relations()
# File lib/so2db.rb, line 109
def create_relations
  SO2DB::CreateRelationships.new.up
end

import_data(dir)
# File lib/so2db.rb, line 101
def import_data(dir)
  files = Dir.entries(dir).delete_if { |x| !x.end_with? 'xml' }
  files.each do |f|
    f = Formatter.new(File.join(dir, f), @format_delimiter)
    import_stream f
  end
end
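
The file selection in import_data can be tried in isolation. The sketch below applies the same filter to a temporary directory; the file names are invented for the example.

```ruby
require 'tmpdir'

selected = nil
Dir.mktmpdir do |dir|
  # Hypothetical directory contents.
  %w[badges.xml posts.xml readme.txt notxml].each do |name|
    File.write(File.join(dir, name), '')
  end
  # Same filter as import_data: keep names ending in the characters 'xml'.
  # The result is sorted here only to make it deterministic.
  selected = Dir.entries(dir).delete_if { |x| !x.end_with? 'xml' }.sort
end
```

The filter tests for the suffix 'xml' rather than the extension '.xml', so a file named notxml is also selected. Keeping only the dump files in the directory avoids handing unrelated files to the formatter.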

setup()
# File lib/so2db.rb, line 92
def setup
  ActiveRecord::Base.establish_connection @conn_opts
  Foreigner.load
end