class DoverToCalais::Dover

This class parses text from a data source and sends it to OpenCalais. The data source is passed to the class constructor and can be a file path or a URL, in any document format the underlying text extractor supports. Callers can register one or more callbacks ({#to_calais}), which are invoked once OpenCalais has processed the data source.

@!attribute [r] data_src

@return [String] the data source to be processed, either a file path or a URL.

@!attribute [r] error

@return [String, nil] any error that occurred during data-source processing, nil if none occurred

Constants

CALAIS_SERVICE

Attributes

data_src[R]
error[R]

Public Class Methods

new(data_src)

Creates a new Dover object for the named data source.

@param data_src [String] the name of the data source to be processed

# File lib/dover_to_calais.rb, line 480
def initialize(data_src)
  @data_src = data_src
  @callbacks = []
end

Public Instance Methods

analyse_this(output_format=nil)

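Extracts the text from the data source. If extraction succeeds, the text is POSTed to OpenCalais via an EventMachine HTTP request, and a callback is set to handle the OpenCalais response. When the request completes, every callback registered on the Dover object is called with the result. If extraction fails, the error is stored in {#error} and no request is made.

@param output_format [Object, nil] any truthy value requests 'application/json' output; when omitted, 'Text/Simple' is used

@return N/A; a {ResponseData} object is passed to each registered callback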

# File lib/dover_to_calais.rb, line 531
def analyse_this(output_format=nil)

  if output_format
    @output_format = 'application/json'
  else
    @output_format = 'Text/Simple'
  end

  @document = get_src_data(@data_src)
  begin
    if @document[0..2].eql?('ERR')
      raise 'Invalid data source'
    else
      response = nil

      connection_options = {:inactivity_timeout => 0}


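      # Merge in the proxy settings when PROXY is a Hash whose first key is :proxy.
      # is_a?(Hash) replaces the class-to-String comparison, which is always false.
      if DoverToCalais::PROXY &&
        DoverToCalais::PROXY.is_a?(Hash) &&
        DoverToCalais::PROXY.keys[0].eql?(:proxy)

        connection_options = connection_options.merge(DoverToCalais::PROXY)
      end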

      request_options = {
        :body => @document.to_s,
        :head => {
          'x-calais-licenseID' => DoverToCalais::API_KEY,
          :content_type => 'TEXT/RAW',
          :enableMetadataType => 'GenericRelations,SocialTags',
          :outputFormat => @output_format
        }
      }

      http = EventMachine::HttpRequest.new(CALAIS_SERVICE, connection_options ).post request_options


      http.callback do

        if http.response_header.status == 200
          if @output_format == 'Text/Simple'
            http.response.match(/<OpenCalaisSimple>/) do |m|
              response = Nokogiri::XML('<OpenCalaisSimple>' + m.post_match)  do |config|
                #strict xml parsing, disallow network connections
                config.strict.nonet
              end #block
            end
          else #@output_format == 'application/json'
            response = JSON.parse(http.response) #response should now be a Hash

          end #if

          case response.class.to_s
          when 'NilClass'
            result = ResponseData.new(nil,'ERR: cannot parse response data - source invalid?')
          when 'Nokogiri::XML::Document'
            result = ResponseData.new(response, nil)
          when 'Hash'
            result = ResponseData.new(response, nil)
          else
            result = ResponseData.new(nil,'ERR: cannot parse response data - unrecognized format!')
          end


        else #non-200 response
          result = ResponseData.new nil,
          "ERR: OpenCalais service responded with #{http.response_header.status} - response body: '#{http.response}'"
        end

        @callbacks.each { |c| c.call(result) }

      end  #callback


      http.errback do
        result = ResponseData.new nil, "ERR: #{http.error}"
        @callbacks.each { |c| c.call(result) }
      end  #errback


    end  #if
  rescue  Exception=>e
    #result = ResponseData.new nil,  "ERR: #{e}"
    #@callbacks.each { |c| c.call(result) }
    @error = "ERR: #{e}"
  end

end

Also aliased as: analyze_this

analyze_this(output_format=nil)

Alias for: analyse_this
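
The response handling inside analyse_this can be illustrated without a network call. The following is a minimal sketch, not the library's code: `ParsedResult` and `classify_response` are hypothetical names, the Nokogiri parsing step is replaced by keeping the raw XML string, and invalid JSON is treated as unparseable (the original source does not rescue at that point).

```ruby
require 'json'

# Hypothetical stand-in for the library's ResponseData: a payload plus an error message.
ParsedResult = Struct.new(:data, :error)

# Classify an HTTP response along the lines of the callback in analyse_this.
# status, body and output_format are plain values; no request is made.
def classify_response(status, body, output_format)
  unless status == 200
    return ParsedResult.new(nil, "ERR: OpenCalais service responded with #{status} - response body: '#{body}'")
  end

  parsed =
    if output_format == 'Text/Simple'
      # Keep everything from the <OpenCalaisSimple> tag onward. The original
      # hands this to Nokogiri; the sketch keeps the raw XML string instead.
      m = body.match(/<OpenCalaisSimple>/)
      m && (m[0] + m.post_match)
    else
      begin
        JSON.parse(body)
      rescue JSON::ParserError
        nil # assumption of this sketch: invalid JSON counts as unparseable
      end
    end

  # Matching on the value itself avoids comparing class names as strings.
  case parsed
  when nil          then ParsedResult.new(nil, 'ERR: cannot parse response data - source invalid?')
  when String, Hash then ParsedResult.new(parsed, nil)
  else                   ParsedResult.new(nil, 'ERR: cannot parse response data - unrecognized format!')
  end
end
```

Every branch yields a single result object, so callbacks only need to check whether the error is set before reading the data.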

to_calais(&block)

Registers a user callback. If no error has been recorded for the data source, the block is stored and called when the OpenCalais HTTP request completes. If reading the data source has already failed, the block is called immediately with a {ResponseData} object built from the recorded error.

@param block a user-defined block

@return N/A

# File lib/dover_to_calais.rb, line 510
def to_calais(&block)
  # Store the block unless an error has already been recorded
  if !@error
    @callbacks << block
  else
    result = ResponseData.new nil, @error
    block.call(result)
  end

end
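
The two branches of to_calais can be exercised in isolation. This is a minimal sketch under stated assumptions: `CallbackSketch`, `CallbackResult`, `finish` and `fail_with` are hypothetical names that stand in for the Dover object, ResponseData, the completed HTTP request and a failed read respectively. Only the branching in `to_calais` follows the source above.

```ruby
# Hypothetical stand-in for the library's ResponseData.
CallbackResult = Struct.new(:data, :error)

class CallbackSketch
  attr_reader :error

  def initialize
    @callbacks = []
    @error = nil
  end

  # Same branching as to_calais: queue the block, or call it at once
  # with a result built from the recorded error.
  def to_calais(&block)
    if @error
      block.call(CallbackResult.new(nil, @error))
    else
      @callbacks << block
    end
  end

  # Stand-in for the point where the HTTP request completes.
  def finish(data)
    result = CallbackResult.new(data, nil)
    @callbacks.each { |c| c.call(result) }
  end

  # Stand-in for a failed read of the data source.
  def fail_with(message)
    @error = "ERR: #{message}"
  end
end

seen = []

ok = CallbackSketch.new
ok.to_calais { |r| seen << [:first, r.data] }
ok.to_calais { |r| seen << [:second, r.data] }
ok.finish('payload') # both stored blocks run, in registration order

failed = CallbackSketch.new
failed.fail_with('Invalid data source')
failed.to_calais { |r| seen << [:failed, r.error] } # runs immediately
```

Judging from the source above, a block registered before a read error is recorded is not invoked for that error, so the immediate call only happens when to_calais runs after the failure.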

Private Instance Methods

get_src_data(src)

Uses the {github.com/Erol/yomu yomu} gem to extract text from a number of document formats and URLs. If an exception is rescued, its message is written to the @error instance variable.

@param [String] src the name of the data source (file path or URI)

@return [String] the extracted text, or the 'ERR: ...' message if an exception was rescued

# File lib/dover_to_calais.rb, line 491
def get_src_data(src)
  begin
    yomu = Yomu.new src

  rescue Exception=>e
    @error = "ERR: #{e}"
  else
    yomu.text
  end

end
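
The return value of get_src_data on failure follows from how Ruby evaluates begin/rescue/else, and it explains why analyse_this checks for an 'ERR' prefix. The sketch below is illustrative only: `ReaderSketch` and its `extractor` lambda are hypothetical and stand in for the Yomu call, and the sketch rescues StandardError rather than Exception.

```ruby
# Hypothetical reader; extractor stands in for the Yomu call.
class ReaderSketch
  attr_reader :error

  def initialize(extractor)
    @extractor = extractor
  end

  def get_src_data(src)
    begin
      text = @extractor.call(src)
    rescue StandardError => e
      # The assignment is the last expression evaluated in this branch,
      # so the error string becomes the method's return value.
      @error = "ERR: #{e}"
    else
      text
    end
  end
end

good = ReaderSketch.new(->(src) { "text of #{src}" })
bad  = ReaderSketch.new(->(_src) { raise IOError, 'no such file' })

good_text = good.get_src_data('report.pdf')
bad_text  = bad.get_src_data('missing.pdf')

# A caller can tell the two cases apart by the prefix, as analyse_this does.
failed = bad_text[0..2].eql?('ERR')
```

Returning the error message in place of the text keeps the method's result a String in both cases, at the cost of relying on a prefix convention.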