module Wukong::Hadoop::ReduceLogic

Implements the logic for determining the correct reducer commandline from wu-hadoop's arguments, and for deciding whether to run a map-only (no-reduce) job.

Public Instance Methods

explicit_reduce_command?()

Were we given an explicit reduce command (like 'uniq -c') or are we to introspect and construct the command?

@return [true, false]

# File lib/wukong-hadoop/runner/reduce_logic.rb, line 30
def explicit_reduce_command?
  settings[:reduce_command]
end
explicit_reduce_processor?()

Were we given a processor to use as our reducer explicitly by name or are we to introspect to discover the correct processor?

@return [true, false]

# File lib/wukong-hadoop/runner/reduce_logic.rb, line 39
def explicit_reduce_processor?
  settings[:reducer]
end
explicit_reducer?()

Were we given an explicit reducer (either as a command or as a processor) or should we introspect to find one?

@return [true, false]

# File lib/wukong-hadoop/runner/reduce_logic.rb, line 47
def explicit_reducer?
  explicit_reduce_processor? || explicit_reduce_command?
end
map_only?()

Is this a map-only job?

@see reduce?

@return [true, false]

# File lib/wukong-hadoop/runner/reduce_logic.rb, line 83
def map_only?
  (! reduce?)
end
reduce?()

Should we perform a reduce or is this a map-only job?

We will reduce if either:

- given an explicit <tt>--reduce_command</tt>
- we discovered a reducer

We will not reduce if:

- <tt>--reduce_tasks</tt> was explicitly set to 0 (this check runs first and overrides the two conditions above)

@return [true, false]

# File lib/wukong-hadoop/runner/reduce_logic.rb, line 71
def reduce?
  return false if settings[:reduce_tasks] && settings[:reduce_tasks].to_i == 0
  return true  if settings[:reduce_command]
  return true  if reducer_name
  false
end
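
The order of the checks matters, so here is a minimal stand-alone sketch of it. The method name `reduce_decision` is hypothetical, `settings` is assumed to be a plain Hash, and the discovered reducer name is passed in directly instead of being looked up.

```ruby
# Hypothetical stand-alone copy of the decision order in reduce?.
# `settings` is a plain Hash here; `reducer_name` stands in for the
# value the real reducer_name method would discover.
def reduce_decision(settings, reducer_name = nil)
  # An explicit --reduce_tasks=0 wins over everything else.
  return false if settings[:reduce_tasks] && settings[:reduce_tasks].to_i == 0
  # An explicit reduce command always means we reduce.
  return true  if settings[:reduce_command]
  # So does a reducer found by introspection.
  return true  if reducer_name
  false
end

reduce_decision({ reduce_tasks: '0', reduce_command: 'uniq -c' }) # => false
reduce_decision({ reduce_command: 'uniq -c' })                     # => true
reduce_decision({}, 'counter')                                      # => true
reduce_decision({})                                                 # => false
```

Note that setting the reduce tasks to 0 yields a map-only job even when a reduce command is also supplied.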
reducer_arg()

The argument that we should introspect on to turn into our reducer.

@return [String]

# File lib/wukong-hadoop/runner/reduce_logic.rb, line 55
def reducer_arg
  args.last
end
reducer_commandline()

Return the actual commandline used by the reducer, whether running in local or Hadoop mode.

You should be able to copy, paste, and run this command unmodified to debug the reducer.

@return [String]

# File lib/wukong-hadoop/runner/reduce_logic.rb, line 16
def reducer_commandline
  return ''                        unless reduce?
  return settings[:reduce_command] if     explicit_reduce_command?
  arg = (mode == :hadoop ? File.basename(reducer_arg) : reducer_arg)
  [command_prefix, 'wu-local', arg].tap do |cmd|
    cmd << "--run=#{reducer_name}" if reducer_needs_run_arg?
    cmd << non_wukong_hadoop_params_string
  end.compact.map(&:to_s).reject(&:empty?).join(' ')
end
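
To illustrate how the pieces are joined, the following sketch reproduces only the assembly step. The helper `build_reducer_commandline`, its keyword arguments, and the sample values (`examples/word_count.rb`, `counter`, `bundle exec`, `--foo=bar`) are all assumptions made for this example; in the real method these values come from `command_prefix`, `reducer_arg`, `reducer_name`, `reducer_needs_run_arg?`, and `non_wukong_hadoop_params_string`.

```ruby
# Hypothetical helper mirroring the assembly step of reducer_commandline.
# Every argument is supplied by the caller instead of the runner's state.
def build_reducer_commandline(prefix:, arg:, reducer_name:, needs_run_arg:, extra_params:, mode: :local)
  # In Hadoop mode only the file's basename is used.
  arg = File.basename(arg) if mode == :hadoop
  [prefix, 'wu-local', arg].tap do |cmd|
    cmd << "--run=#{reducer_name}" if needs_run_arg
    cmd << extra_params
  end.compact.map(&:to_s).reject(&:empty?).join(' ') # drop nil/empty parts
end

hadoop_cmd = build_reducer_commandline(
  prefix: nil, arg: 'examples/word_count.rb', reducer_name: 'counter',
  needs_run_arg: true, extra_params: '--foo=bar', mode: :hadoop
)
# => "wu-local word_count.rb --run=counter --foo=bar"

local_cmd = build_reducer_commandline(
  prefix: 'bundle exec', arg: 'examples/word_count.rb', reducer_name: 'counter',
  needs_run_arg: true, extra_params: ''
)
# => "bundle exec wu-local examples/word_count.rb --run=counter"
```

The `compact` and `reject(&:empty?)` calls are what keep a missing prefix or an empty parameter string from leaving stray spaces in the result.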
reducer_name()

Return the name of the processor to use as the reducer.

Will raise a Wukong::Error if a given reducer is invalid. Will return nil if no reducer can be guessed.

Most of the logic that examines explicit command line arguments and checks for the existence of named processors or files is here.

@return [String, nil]

# File lib/wukong-hadoop/runner/reduce_logic.rb, line 109
def reducer_name
  case
  when explicit_reducer?
    if processor_registered?(settings[:reducer])
      settings[:reducer]
    else
      raise Error.new("No such processor: '#{settings[:reducer]}'")
    end
  when single_job_arg? && explicit_mapper? && processor_registered?(reducer_arg)
    reducer_arg
  when separate_map_and_reduce_args? && processor_registered?(reducer_arg)
    reducer_arg
  when separate_map_and_reduce_args? && file_is_processor?(reducer_arg)
    processor_name_from_file(reducer_arg)
  when processor_registered?('reducer')
    'reducer'
  end
end
reducer_needs_run_arg?()

Does the reducer commandline need an explicit --run argument?

It is not needed if the processor name is the same as the name of the script, with or without its .rb extension.

@return [true, false]

# File lib/wukong-hadoop/runner/reduce_logic.rb, line 93
def reducer_needs_run_arg?
  return false if reducer_arg.to_s == reducer_name.to_s
  return false if File.basename(reducer_arg.to_s, '.rb') == reducer_name
  true
end
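
A small sketch of the same comparison, written as a free-standing method so it can be tried in isolation. The name `needs_run_arg?` and the sample script paths are hypothetical; the two arguments stand in for `reducer_arg` and `reducer_name`.

```ruby
# Hypothetical free-standing version of the check in reducer_needs_run_arg?.
def needs_run_arg?(reducer_arg, reducer_name)
  # The argument is already the processor name.
  return false if reducer_arg.to_s == reducer_name.to_s
  # The argument is a script whose basename (minus .rb) is the processor name.
  return false if File.basename(reducer_arg.to_s, '.rb') == reducer_name
  true
end

needs_run_arg?('counter', 'counter')           # => false
needs_run_arg?('lib/counter.rb', 'counter')    # => false
needs_run_arg?('lib/word_count.rb', 'counter') # => true
```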