class CorpusProcessor::Cli

The operations available to users from the CLI.

Public Instance Methods

process(input_file = STDIN, output_file = STDOUT)

Convert a given corpus from one format to another.

By default the input format is LâMPADA and the output format is the one used by Stanford NER in training.

@param input_file [String, IO] the file that contains the original corpus.

@param output_file [String, IO] the file in which the converted corpus is written.

@return [void]

# File lib/corpus-processor/cli.rb, line 23
def process input_file = STDIN, output_file = STDOUT
  input_file  = File.open( input_file, 'r') if  input_file.is_a? String
  output_file = File.open(output_file, 'w') if output_file.is_a? String
  categories  = if options[:categories]
                  CorpusProcessor::Categories.load(options[:categories])
                else
                  CorpusProcessor::Categories.default
                end

  output_file.puts CorpusProcessor::Processor.new(categories: categories)
                                             .process(input_file.read)

  output_file.close
end
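
The String-or-IO handling in the listing above can be tried without the gem installed. The following is a minimal sketch, not the library's implementation: `convert` and `process_sketch` are hypothetical names, and `convert` is a trivial stand-in (an upcase) for `CorpusProcessor::Processor#process`. Unlike the listing, which closes `output_file` unconditionally (including `STDOUT` when the default is used), this variant closes only the files it opened itself.

```ruby
require 'stringio'

# Hypothetical stand-in for CorpusProcessor::Processor#process.
# Assumption: a plain upcase replaces the real format conversion.
def convert(text)
  text.upcase
end

# Accepts either a path (String) or an IO-like object on each side,
# mirroring the signature of Cli#process.
def process_sketch(input = $stdin, output = $stdout)
  opened = [] # only the handles this method opened itself

  if input.is_a?(String)
    input = File.open(input, 'r')
    opened << input
  end

  if output.is_a?(String)
    output = File.open(output, 'w')
    opened << output
  end

  output.puts convert(input.read)
ensure
  # Caller-supplied IO objects (including $stdout) are left open.
  opened.each(&:close)
end

# Usage with in-memory IO objects instead of real files.
source = StringIO.new('some corpus text')
target = StringIO.new
process_sketch(source, target)
result = target.string
```

Passing `StringIO` objects works because the method relies only on `read` and `puts`, which is also why the original accepts either a path or an `IO`.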