class CorpusProcessor::Generators::StanfordNer

The generator for Stanford NER corpora.

Generates a corpus in the format used for Stanford NER training: one token per line, with the word and its category label separated by a tab.

Public Class Methods

new(categories = CorpusProcessor::Categories.default)

@param categories [Hash] the category definitions loaded by {CorpusProcessor::Categories}.

# File lib/corpus-processor/generators/stanford_ner.rb, line 8
def initialize categories = CorpusProcessor::Categories.default
  @categories = categories.fetch :output
end
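
The constructor keeps only the :output entry of the categories hash. A minimal sketch of what that lookup does, assuming a hand-built hash: the name sample_categories and its labels are hypothetical and are not the values returned by CorpusProcessor::Categories.default.

```ruby
# Hypothetical categories definition, for illustration only; the real
# defaults come from CorpusProcessor::Categories.default and may differ.
sample_categories = {
  output: { person: "PERSON", location: "LOCATION" }
}

# Hash#fetch returns the value stored under :output ...
sample_output = sample_categories.fetch(:output)

# ... and raises KeyError when the key is absent, so a definition
# without :output fails in the constructor rather than in #generate.
missing_key_error =
  begin
    {}.fetch(:output)
    nil
  rescue KeyError => e
    e
  end
```

Because fetch is used instead of [], a categories hash that lacks :output is rejected immediately instead of producing a nil lookup table.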

Public Instance Methods

generate(tokens)

Generate the corpus from tokens.

@param tokens [Array<CorpusProcessor::Token>] the tokens from which the corpus is generated.

@return [String] the generated corpus.

# File lib/corpus-processor/generators/stanford_ner.rb, line 17
def generate tokens
  tokens.map { |token|
    "#{ token.word }\t#{ @categories[token.category] }"
  }.join("\n") + "\n"
end
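
The output format can be tried without the gem installed. The sketch below uses stand-ins: SketchToken and SketchStanfordNer are hypothetical names that mirror the two methods documented above, and the labels (including "O" as the fallback for untagged tokens) are assumptions rather than the gem's defaults.

```ruby
# Stand-in for CorpusProcessor::Token, assumed to expose #word and #category.
SketchToken = Struct.new(:word, :category)

# Stand-in generator that mirrors the documented #initialize and #generate.
class SketchStanfordNer
  def initialize(categories)
    @categories = categories.fetch(:output)
  end

  def generate(tokens)
    tokens.map { |token|
      "#{token.word}\t#{@categories[token.category]}"
    }.join("\n") + "\n"
  end
end

# Hypothetical labels; the "O" default for untagged tokens is an assumption.
sketch_labels = Hash.new("O")
sketch_labels.update(person: "PERSON", location: "LOCATION")

sketch_tokens = [
  SketchToken.new("Maria",  :person),
  SketchToken.new("lives",  nil),
  SketchToken.new("in",     nil),
  SketchToken.new("Lisbon", :location)
]

sketch_corpus = SketchStanfordNer.new(output: sketch_labels).generate(sketch_tokens)
puts sketch_corpus
```

Each token becomes one line of the form word, tab, label, and the result always ends with a newline.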