class CorpusProcessor::Processor

The entry point for processing a corpus.

@example Simple use with default configuration.

CorpusProcessor::Processor.new.process('<P>Some text</P>')
# => "Some\tO\ntext\tO\n.\tO\n"

Public Class Methods

new(categories: CorpusProcessor::Categories.default, parser: CorpusProcessor::Parsers::Lampada.new(categories), generator: CorpusProcessor::Generators::StanfordNer.new(categories))

@param categories [Hash] the categories extracted with {Categories}.
@param parser [#parse] the parser for the original corpus.
@param generator [#generate] the generator that converts tokens into the transformed corpus.
# File lib/corpus-processor/processor.rb, line 12
def initialize(
  categories: CorpusProcessor::Categories.default,
  parser:     CorpusProcessor::Parsers::Lampada.new(categories),
  generator:  CorpusProcessor::Generators::StanfordNer.new(categories))
  @parser    = parser
  @generator = generator
end
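
Because the initializer only stores parser and generator, any object that responds to parse or generate can be injected in place of the defaults. The sketch below illustrates this under stated assumptions: SketchProcessor is a stand-in that mirrors the initializer documented here so the example runs without the gem installed, and WhitespaceParser and OutsideTagGenerator are hypothetical collaborators, not classes shipped with the library. The array of strings passed between them is also an assumption; the real parser's return type is not documented in this section.

```ruby
# Stand-in mirroring the documented initializer (assumption: used here only
# so the sketch is runnable without the corpus-processor gem).
class SketchProcessor
  def initialize(parser:, generator:)
    @parser    = parser
    @generator = generator
  end

  def process(corpus)
    @generator.generate(@parser.parse(corpus))
  end
end

# Hypothetical parser: strips markup tags and splits on whitespace.
class WhitespaceParser
  def parse(corpus)
    corpus.gsub(/<[^>]+>/, ' ').split
  end
end

# Hypothetical generator: emits one "token<TAB>O" line per token.
class OutsideTagGenerator
  def generate(tokens)
    tokens.map { |token| "#{token}\tO\n" }.join
  end
end

processor = SketchProcessor.new(
  parser:    WhitespaceParser.new,
  generator: OutsideTagGenerator.new
)
result = processor.process('<P>Some text</P>')
```

With the gem installed, the same keyword arguments would presumably be passed to CorpusProcessor::Processor.new instead of the stand-in.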

Public Instance Methods

process(corpus)

Processes the corpus: parses it and passes the result to the generator.

@return [String] the converted corpus.

# File lib/corpus-processor/processor.rb, line 23
def process corpus
  @generator.generate @parser.parse(corpus)
end
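
Since process is a plain composition of parse and generate, the hand-off between the two can be checked with simple recording doubles. The following is a minimal sketch, not part of the library: RecordingParser and RecordingGenerator are hypothetical names, and the intermediate value they exchange is arbitrary.

```ruby
# Hypothetical double: returns a recognizable value wrapping its input.
class RecordingParser
  def parse(corpus)
    [:parsed, corpus]
  end
end

# Hypothetical double: remembers what it was asked to generate from.
class RecordingGenerator
  attr_reader :received

  def generate(tokens)
    @received = tokens
    'generated'
  end
end

parser    = RecordingParser.new
generator = RecordingGenerator.new

# Same composition as the body of #process.
output = generator.generate(parser.parse('<P>Some text</P>'))
```

After this runs, generator.received holds exactly what the parser returned, which is the property a unit test of process would assert.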