class SO2DB::Importer
Base class for StackOverflow data importers. It drives database setup and imports data from the files in a directory.
Implementations of this class must provide a method with the following signature:
import_stream(formatter)
This method may be private. It performs the actual data import, using data from the provided formatter, which has type SO2DB::Formatter. The formatter supports both streaming data to STDIN (e.g., for PostgreSQL’s COPY command) and writing data to a file before import (e.g., for MySQL’s mysqlimport utility).
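This contract is a template-method arrangement: the base class's import loop calls import_stream once per file, and because Ruby lets an object call its own private methods with an implicit receiver, a subclass may declare its implementation private. The sketch below is self-contained and does not load so2db; MiniImporter, EchoImporter and FakeFormatter are hypothetical stand-ins, and FakeFormatter only records a path and a delimiter because the real SO2DB::Formatter API is not documented here.

```ruby
# Hypothetical stand-in for SO2DB::Formatter: it only records the file path
# and delimiter it was built with (the real Formatter API is not shown here).
FakeFormatter = Struct.new(:path, :delimiter)

# Hypothetical, stripped-down base class mirroring Importer#import_data:
# it builds one formatter per file and hands it to import_stream.
class MiniImporter
  def initialize(delimiter = 11.chr)
    @delimiter = delimiter
  end

  def import_files(paths)
    paths.each { |p| import_stream(FakeFormatter.new(p, @delimiter)) }
  end

  private

  # Subclasses must override this; the base version only signals the omission.
  def import_stream(_formatter)
    raise NotImplementedError, "#{self.class} must implement import_stream"
  end
end

# Example subclass: import_stream is private, yet MiniImporter#import_files
# can call it because the call uses an implicit receiver (self).
class EchoImporter < MiniImporter
  attr_reader :seen

  def initialize(delimiter = 11.chr)
    super(delimiter)
    @seen = []
  end

  private

  # Records each formatter's path instead of loading anything into a database.
  def import_stream(formatter)
    @seen << formatter.path
  end
end

importer = EchoImporter.new
importer.import_files(%w[badges.xml posts.xml])
p importer.seen # => ["badges.xml", "posts.xml"]
```

A subclass that forgets to override import_stream fails with NotImplementedError on the first file rather than silently importing nothing.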
The importer uses ActiveRecord for table creation and Foreigner for creating table relationships, so you are limited to the databases these libraries support. In addition, a ‘uuid’ method must be available to the adapter provided to ActiveRecord. (See so2pg for an example of an adapter extension that provides the method.)
The importer also provides two accessors for subclasses:
attr_reader :conn_opts
attr_accessor :delimiter
The conn_opts property provides the ActiveRecord connection data (e.g., :database, :host). The delimiter property sets the delimiter used by the formatter; by default it is \v (vertical tab, 0xB).
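The default delimiter is built as 11.chr in the constructor, which is the vertical-tab character. The sketch below confirms that equivalence and shows, on a hypothetical row, why such a rarely used control character is a convenient separator. That the formatter joins fields exactly this way is an assumption, since the Formatter source is not shown here.

```ruby
# The constructor builds the default delimiter as 11.chr.
DEFAULT_DELIMITER = 11.chr

puts DEFAULT_DELIMITER == "\v"         # => true
puts DEFAULT_DELIMITER.ord.to_s(16)    # => b

# Hypothetical row: commas and tabs inside a field are not mistaken for
# separators when fields are joined with \v (assumed formatter behavior).
row = ["42", "Hello, world", "tab\there"]
line = row.join(DEFAULT_DELIMITER)
puts line.split(DEFAULT_DELIMITER) == row # => true
```

The approach still breaks if a field itself contains \v, so the choice trades generality for simplicity.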
Attributes
conn_opts [R]
delimiter [RW]
Public Class Methods
Initializes the importer.
Arguments:
relations: (Boolean) Indicates whether database relationships should be created.
optionals: (Boolean) Indicates whether optional database tables and content should be created.
adapter: (String) The ActiveRecord adapter name (e.g., 'postgresql').
options: (Hash) The database connection options, as required by ActiveRecord for the provided adapter.
# File lib/so2db.rb, line 65
def initialize(relations = false, optionals = false, adapter = '', options = {})
  @relations = relations
  @optionals = optionals
  @conn_opts = options.merge( { :adapter => adapter } )
  @format_delimiter = 11.chr.to_s
end
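Because conn_opts is built with options.merge( { :adapter => adapter } ), the adapter argument always wins over an :adapter key already present in options: Hash#merge keeps the value from its argument on key collisions. The helper conn_opts_for below is a hypothetical, standalone copy of that one expression, so the behavior can be checked without ActiveRecord.

```ruby
# Hypothetical helper reproducing the conn_opts computation in
# Importer#initialize: @conn_opts = options.merge( { :adapter => adapter } )
def conn_opts_for(adapter, options = {})
  options.merge(adapter: adapter)
end

opts = conn_opts_for('postgresql', database: 'so_dump', host: 'localhost')
p opts

# An :adapter key inside options is overridden by the adapter argument.
p conn_opts_for('postgresql', adapter: 'mysql2')[:adapter] # => "postgresql"
```

With a concrete subclass (PgImporter is a hypothetical name), construction might look like PgImporter.new(true, false, 'postgresql', :database => 'so_dump').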
Public Instance Methods
Creates the basic database tables, imports the data from the XML files in the specified directory, and then, depending on the constructor flags, creates the relationships and optional tables.
Arguments:
dir: (String) The directory path containing the StackOverflow data dump XML files (e.g., badges.xml, posts.xml, etc.).
# File lib/so2db.rb, line 78
def import(dir)
  setup
  create_basics
  import_data(dir)
  create_relations if @relations
  create_optionals if @optionals
  create_optional_relations if @relations and @optionals
end
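The two constructor flags select which of the private steps import runs, and in what order. The function below is a hypothetical model of that control flow, not library code; it returns the step names as symbols so each flag combination can be inspected.

```ruby
# Models the sequence of private calls made by Importer#import for a
# given pair of constructor flags (illustrative only).
def import_steps(relations, optionals)
  steps = [:setup, :create_basics, :import_data]
  steps << :create_relations if relations
  steps << :create_optionals if optionals
  steps << :create_optional_relations if relations && optionals
  steps
end

p import_steps(false, false) # => [:setup, :create_basics, :import_data]
p import_steps(true, true)
```

In every combination the data is loaded before any relationship is created, so foreign keys are only added once all the basic tables are populated.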
Private Instance Methods
# File lib/so2db.rb, line 97
def create_basics
  SO2DB::CreateBasicTables.new.up
end

# File lib/so2db.rb, line 117
def create_optional_relations
  SO2DB::CreateOptionalRelationships.new.up
end

# File lib/so2db.rb, line 113
def create_optionals
  SO2DB::CreateOptionals.new.up
end

# File lib/so2db.rb, line 109
def create_relations
  SO2DB::CreateRelationships.new.up
end
# File lib/so2db.rb, line 101
def import_data(dir)
  files = Dir.entries(dir).delete_if { |x| !x.end_with? 'xml' }
  files.each do |f|
    f = Formatter.new(File.join(dir, f), @format_delimiter)
    import_stream f
  end
end
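The file filter in import_data can be exercised on its own. The sketch below uses a hypothetical helper, xml_entries, that applies an equivalent select with the same end_with?('xml') test to a temporary directory of throwaway files. Two properties show up: the suffix check has no leading dot, so a name such as notxml also matches, and Dir.entries returns names in filesystem order, which is why the result is sorted before printing.

```ruby
require 'tmpdir'

# Same test as Importer#import_data, written as select instead of
# delete_if with a negated condition.
def xml_entries(dir)
  Dir.entries(dir).select { |name| name.end_with?('xml') }
end

# Dir.mktmpdir returns the block's value and removes the directory afterwards.
found = Dir.mktmpdir do |dir|
  %w[badges.xml posts.xml readme.txt notxml].each do |name|
    File.write(File.join(dir, name), '')
  end
  xml_entries(dir).sort
end

p found # => ["badges.xml", "notxml", "posts.xml"]
```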
# File lib/so2db.rb, line 92
def setup
  ActiveRecord::Base.establish_connection @conn_opts
  Foreigner.load
end