class PgImporter

Imports the StackOverflow data into a PostgreSQL database.

Public Class Methods

import_from_argv(argv)
# File lib/so2pg.rb, line 46
def self.import_from_argv(argv)
  # Parse the command-line options
  cmd_opts = PgOptionsParser.parse(argv)

  # If all validation passed, then execute the import!
  if cmd_opts
    start = Time.now
    pg = PgImporter.new(cmd_opts.has_key?(:relationships),
                        cmd_opts.has_key?(:optionals),
                        cmd_opts)
    pg.import(cmd_opts[:dir])
    puts "Import completed in #{Time.now - start}s"
  end
end
new(relations = false, optionals = false, options = {})

(See SO2DB::Importer.initialize documentation)

Calls superclass method SO2DB::Importer::new
# File lib/so2pg.rb, line 42
def initialize(relations = false, optionals = false, options = {})
  super(relations, optionals, "postgresql", options)
end

Private Instance Methods

build_cmd(sql)

Builds the psql shell command that runs the given SQL command with the global connection options.

Example:

>> sql = "COPY ..."
>> puts build_cmd(sql)
=> psql -d test -h localhost -c "COPY ..."
# File lib/so2pg.rb, line 91
def build_cmd(sql)
  # ENV changes apply only to this process and the child processes it spawns
  # (psql reads PGPASSWORD from its environment), so the password does not
  # persist in the calling shell after the script has completed
  ENV['PGPASSWORD'] = conn_opts[:password] if conn_opts.has_key? :password

  cmd = "psql"
  cmd << " -d #{conn_opts[:database]}" if conn_opts.has_key? :database
  cmd << " -h #{conn_opts[:host]}" if conn_opts.has_key? :host
  cmd << " -U #{conn_opts[:username]}" if conn_opts.has_key? :username
  cmd << " -p #{conn_opts[:port]}" if conn_opts.has_key? :port
  cmd << " -c \"#{sql}\""

  return cmd
end
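
The command above interpolates the connection options and the SQL directly into a shell string, so values containing spaces or quotes could be split or misread by the shell. A minimal sketch of one way to guard against that, assuming a standalone helper (the name build_psql_cmd and the explicit conn_opts argument are hypothetical, not part of so2pg), is to assemble the arguments as an array and escape them with Ruby's Shellwords module:

```ruby
require 'shellwords'

# Hypothetical standalone variant of build_cmd; conn_opts is passed in
# explicitly here instead of being read from the importer instance.
def build_psql_cmd(conn_opts, sql)
  args = ['psql']
  args << '-d' << conn_opts[:database] if conn_opts.key?(:database)
  args << '-h' << conn_opts[:host] if conn_opts.key?(:host)
  args << '-U' << conn_opts[:username] if conn_opts.key?(:username)
  args << '-p' << conn_opts[:port].to_s if conn_opts.key?(:port)
  args << '-c' << sql
  # Escape every argument so the shell sees each one as a single word
  Shellwords.join(args)
end

example_cmd = build_psql_cmd({ database: 'test', host: 'localhost' },
                             'COPY users FROM STDIN')
puts example_cmd
```

Splitting the result with Shellwords.split gives back the original argument list, which is a quick way to check that nothing was mangled.
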
build_sql(value_str)

Builds the SQL command used for bulk loading the tables.

# File lib/so2pg.rb, line 80
def build_sql(value_str)
  "COPY #{value_str} FROM STDIN WITH (FORMAT csv, DELIMITER E'\x0B')"
end
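
One detail worth spelling out: because the SQL is a double-quoted Ruby string, Ruby itself expands \x0B into a literal vertical-tab character before PostgreSQL ever sees the statement. The sketch below reproduces the method as a standalone function to show this. The name build_copy_sql and the sample value string "users(id, name)" are assumptions made for illustration, since the exact shape of value_str comes from the formatter and is not documented here.

```ruby
# Standalone copy of the documented method body, renamed for illustration.
def build_copy_sql(value_str)
  "COPY #{value_str} FROM STDIN WITH (FORMAT csv, DELIMITER E'\x0B')"
end

# "users(id, name)" is an assumed example of what value_str might contain.
copy_sql = build_copy_sql('users(id, name)')

# The delimiter inside E'...' is a real vertical tab, not the four
# characters backslash, x, 0, B.
puts copy_sql.include?("\v")
puts copy_sql.include?('\x0B')
```
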
execute_cmd(cmd, formatter)

Executes the provided shell command and pumps the data from the formatter to it over stdin.

# File lib/so2pg.rb, line 108
def execute_cmd(cmd, formatter)
  IO.popen(cmd, 'r+') do |s|
    formatter.format(s)
    s.close_write
  end
end
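
To see the pipe pattern in isolation, here is a minimal sketch that swaps psql for the Unix cat command, so it runs without a database. StubFormatter and pipe_to_command are hypothetical names, and the only assumption about the real formatter is that it exposes format(io), as the code above suggests. Unlike execute_cmd, this version also reads the child's output after closing the write end.

```ruby
# Hypothetical stand-in for the formatter: writes each row to the given IO,
# joining the fields with a vertical tab.
class StubFormatter
  def initialize(rows)
    @rows = rows
  end

  def format(io)
    @rows.each { |row| io.puts(row.join("\v")) }
  end
end

# Same shape as execute_cmd, but returns whatever the child process printed.
def pipe_to_command(cmd, formatter)
  IO.popen(cmd, 'r+') do |io|
    formatter.format(io) # stream the rows to the child's stdin
    io.close_write       # signal end of input so the child can finish
    io.read              # collect the child's stdout
  end
end

piped_output = pipe_to_command('cat', StubFormatter.new([[1, 'alice'], [2, 'bob']]))
```

Calling close_write before reading matters: a command that consumes stdin until end of file would otherwise never terminate.
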
import_stream(formatter)

Imports the data from the formatter into the PostgreSQL database.

Note that what follows is just one way to implement the importer. You could just as easily write the formatted data to a file and then have the database load that file.

# File lib/so2pg.rb, line 68
def import_stream(formatter)
  puts "Importing file #{formatter.file_name}..."
  start = Time.now

  sql = build_sql(formatter.value_str)
  cmd = build_cmd(sql)
  execute_cmd(cmd, formatter)

  puts "   -> #{Time.now - start}s"
end
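
The file-based alternative mentioned in the note could look roughly like the sketch below. It is not part of so2pg: FileStubFormatter and stage_to_file are hypothetical, and the sketch only stages the data and builds the statement without running psql. It uses psql's \copy meta-command, which reads the file on the client side. A plain server-side COPY ... FROM 'file' would instead require the file to be readable by the database server. The sketch also assumes the temp file path contains no single quotes.

```ruby
require 'tempfile'

# Hypothetical stand-in exposing the two members import_stream relies on.
class FileStubFormatter
  attr_reader :value_str

  def initialize(value_str, rows)
    @value_str = value_str
    @rows = rows
  end

  def format(io)
    @rows.each { |row| io.puts(row.join("\x0B")) }
  end
end

# Writes the formatted rows to a temp file and returns [file, sql], where
# sql is a \copy statement that could be passed to psql with -c.
def stage_to_file(formatter)
  file = Tempfile.new(['so2pg', '.csv'])
  formatter.format(file)
  file.flush # make sure the rows are on disk before psql reads them
  sql = "\\copy #{formatter.value_str} FROM '#{file.path}' " \
        "WITH (FORMAT csv, DELIMITER E'\x0B')"
  [file, sql]
end

staged_file, staged_sql = stage_to_file(
  FileStubFormatter.new('users(id, name)', [[1, 'alice'], [2, 'bob']])
)
staged_contents = File.read(staged_file.path)
staged_file.close! # delete the temp file once it is no longer needed
```
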