module Wukong::Hadoop::HadoopInvocation
Provides methods for executing a map/reduce job on a Hadoop
cluster via Hadoop streaming.
Public Instance Methods
hadoop_commandline()
click to toggle source
Return the Hadoop
command used to launch this job in a Hadoop
cluster.
You should be able to copy, paste, and run this command unmodified when debugging.
@return [String]
# File lib/wukong-hadoop/runner/hadoop_invocation.rb, line 24 def hadoop_commandline [ hadoop_runner, "jar #{hadoop_streaming_jar}", hadoop_jobconf_options, "-D mapred.job.name='#{job_name}'", hadoop_files, hadoop_other_args, "-mapper '#{mapper_commandline}'", "-reducer '#{reducer_commandline}'", "-input '#{input_paths}'", "-output '#{output_path}'", io_formats, hadoop_recycle_env, ].flatten.compact.join(" \t\\\n ") end
input_format()
click to toggle source
The input format to use.
Respects the value of --input_format
.
@return [String]
# File lib/wukong-hadoop/runner/hadoop_invocation.rb, line 59 def input_format settings[:input_format] end
job_name()
click to toggle source
The job name that will be passed to Hadoop
.
Respects the --job_name
option if given, otherwise constructs one from the given processors, input, and output paths.
@return [String]
# File lib/wukong-hadoop/runner/hadoop_invocation.rb, line 48 def job_name return settings[:job_name] if settings[:job_name] relevant_filename = args.compact.uniq.map { |path| File.basename(path, '.rb') }.join('-') "#{relevant_filename}---#{input_paths}---#{output_path}".gsub(%r{[^\w/\.\-\+]+}, '') end
output_format()
click to toggle source
The output format to use.
Respects the value of --output_format
.
@return [String]
# File lib/wukong-hadoop/runner/hadoop_invocation.rb, line 68 def output_format settings[:output_format] end
remove_output_path()
click to toggle source
Remove the output path.
Will not actually do anything if the --dry_run
option is also given.
# File lib/wukong-hadoop/runner/hadoop_invocation.rb, line 13 def remove_output_path execute_command("#{hadoop_runner} fs -rmr '#{output_path}'") end