class Spacy::PyLanguage

See also the spaCy Python API documentation for [`Language`](https://spacy.io/api/language).

Attributes

py_nlp[R]

@return [Object] a Python `Language` instance accessible via `PyCall`

spacy_nlp_id[R]

@return [String] an identifier string that can be used to refer to the Python `Language` object inside `PyCall::exec` or `PyCall::eval`

Public Class Methods

new(model = "en_core_web_sm")

Creates a language model instance, which is conventionally referred to by a variable named `nlp`.
@param model [String] the name of a spaCy language model package installed on the system

# File lib/ruby-spacy.rb, line 219
def initialize(model = "en_core_web_sm")
  @spacy_nlp_id = "nlp_#{model.object_id}"
  PyCall.exec("import spacy; #{@spacy_nlp_id} = spacy.load('#{model}')")
  @py_nlp = PyCall.eval(@spacy_nlp_id)
end
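
The constructor names the Python-side variable after the Ruby `object_id` of the `model` string, and that name is what `spacy_nlp_id` later exposes for use in `PyCall.exec` and `PyCall.eval`. The sketch below reproduces only this string-building step in plain Ruby, so it runs without PyCall or spaCy. `load_statement_for` is a hypothetical helper, not part of ruby-spacy.

```ruby
# Hypothetical helper mirroring the string building in PyLanguage#initialize.
# It only builds strings; nothing is sent to Python here.
def load_statement_for(model)
  spacy_nlp_id = "nlp_#{model.object_id}"
  statement = "import spacy; #{spacy_nlp_id} = spacy.load('#{model}')"
  [spacy_nlp_id, statement]
end

# Keep both String objects alive so their object_ids are guaranteed distinct.
model_a = String.new("en_core_web_sm")
model_b = String.new("en_core_web_sm")
id_a, statement_a = load_statement_for(model_a)
id_b, _statement_b = load_statement_for(model_b)

puts statement_a  # e.g. import spacy; nlp_<some id> = spacy.load('en_core_web_sm')
puts id_a == id_b # prints false
```

Because the name depends on the identity of the `model` string rather than its contents, loading the same model with two different String objects binds two separate Python variables.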

Public Instance Methods

get_lexeme(text)

A utility method to get a Python `Lexeme` object.
@param text [String] a text string representing a lexeme
@return [Object] a Python `Lexeme` object (https://spacy.io/api/lexeme)

# File lib/ruby-spacy.rb, line 257
def get_lexeme(text)
  @py_nlp.vocab[text]
end
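
In spaCy, indexing `Vocab` with a string it has not seen creates and stores a new `Lexeme` instead of raising, so `get_lexeme` always returns an object. The toy class below imitates only that lookup behavior in plain Ruby. `ToyVocab` and `ToyLexeme` are hypothetical stand-ins, not spaCy or ruby-spacy types.

```ruby
# Hypothetical stand-in for a lexeme: its text and a running integer ID
# (spaCy itself identifies lexemes by hash values instead).
ToyLexeme = Struct.new(:text, :id)

# Hypothetical stand-in for spaCy's Vocab: an unseen string is added on
# lookup, and a repeated lookup returns the stored entry.
class ToyVocab
  def initialize
    @entries = {}
  end

  def [](text)
    @entries[text] ||= ToyLexeme.new(text, @entries.size)
  end

  def size
    @entries.size
  end
end

vocab = ToyVocab.new
dog = vocab["dog"]
again = vocab["dog"]
puts dog.equal?(again) # prints true: the same entry is returned
puts vocab.size        # prints 1
```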

matcher()

Generates a matcher for the current language model.
@return [Matcher]

# File lib/ruby-spacy.rb, line 233
def matcher
  Matcher.new(@py_nlp)
end

method_missing(name, *args)

Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.…

# File lib/ruby-spacy.rb, line 309
def method_missing(name, *args)
  @py_nlp.send(name, *args)
end
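
`method_missing` forwards any unknown method call to the wrapped Python `Language` object. A common Ruby companion to this pattern is `respond_to_missing?`, which keeps `respond_to?` consistent with what `method_missing` accepts; the listing above does not define it. The sketch below shows the full forwarding pattern in plain Ruby, with an `Array` standing in for the Python object. `ForwardingWrapper` is hypothetical.

```ruby
# Hypothetical wrapper illustrating the forwarding pattern used by
# PyLanguage#method_missing, with a plain Ruby object as the target.
class ForwardingWrapper
  def initialize(target)
    @target = target
  end

  # Forward unknown calls, including keyword arguments and blocks.
  def method_missing(name, *args, **kwargs, &block)
    if @target.respond_to?(name)
      @target.public_send(name, *args, **kwargs, &block)
    else
      super # raises NoMethodError as usual
    end
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name) || super
  end
end

wrapped = ForwardingWrapper.new([3, 1, 2])
p wrapped.sort                # => [1, 2, 3]
p wrapped.respond_to?(:sort)  # => true
p wrapped.map { |x| x * 10 }  # => [30, 10, 20]
```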

most_similar(vector, n)

Returns the `n` lexemes whose vector representations are most similar to the given vector representation of a word.
@param vector [Object] a vector representation of a word (whether or not the word exists in the vocabulary)
@param n [Integer] the number of lexemes to return
@return [Array<Hash{:key => Integer, :text => String, :best_row => Integer, :score => Float}>] an array of hashes, each containing the `key`, `text`, `best_row`, and similarity `score` of a lexeme. Each hash also responds to reader methods named after its keys (e.g. `result.text`).

# File lib/ruby-spacy.rb, line 271
def most_similar(vector, n)
  vec_array = Numpy.asarray([vector])
  py_result = @py_nlp.vocab.vectors.most_similar(vec_array, n: n)
  key_texts = PyCall.eval("[[str(n), #{@spacy_nlp_id}.vocab[n].text] for n in #{py_result[0][0].tolist}]")
  keys = key_texts.map{|kt| kt[0]}
  texts = key_texts.map{|kt| kt[1]}
  best_rows = PyCall::List.(py_result[1])[0]
  scores = PyCall::List.(py_result[2])[0]

  results = []
  n.times do |i|
    result = {key: keys[i].to_i,
              text: texts[i],
              best_row: best_rows[i],
              score: scores[i]
    }
    result.each_key do |key|
      result.define_singleton_method(key){ result[key] }
    end
    results << result
  end
  results
end
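
spaCy's `Vectors.most_similar` ranks stored vectors by cosine similarity to the query and returns their keys, row indices, and scores, which the method above repackages as hashes with reader methods. The pure-Ruby sketch below reproduces that ranking over a tiny in-memory table so the shape of each result is easy to see. The table, its vectors, `cosine`, and `toy_most_similar` are all hypothetical and much simpler than spaCy's batched implementation.

```ruby
# Cosine similarity between two equal-length numeric arrays.
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# Hypothetical vector table; each row is [key, text, vector].
ROWS = [
  [101, "cat",   [1.0, 0.9, 0.0]],
  [102, "dog",   [0.9, 1.0, 0.1]],
  [103, "stone", [0.0, 0.1, 1.0]]
].freeze

# Return the n rows most similar to `vector`, shaped like the hashes
# produced by PyLanguage#most_similar.
def toy_most_similar(vector, n)
  scored = ROWS.each_with_index.map do |(key, text, vec), row|
    { key: key, text: text, best_row: row, score: cosine(vector, vec) }
  end
  top = scored.max_by(n) { |result| result[:score] } # highest score first
  top.each do |result|
    # Same trick as the original: expose each hash key as a reader method.
    result.each_key { |k| result.define_singleton_method(k) { result[k] } }
  end
  top
end

top = toy_most_similar([1.0, 1.0, 0.0], 2)
top.each { |r| puts format("%-5s row=%d score=%.3f", r.text, r.best_row, r.score) }
```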

pipe(texts, disable: [], batch_size: 50)

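A utility method to process many texts in batches.
@param texts [Array<String>] the texts to process
@param disable [Array<String>] names of pipeline components to disable
@param batch_size [Integer] the number of texts to buffer per batch
@return [Array<Doc>] one `Doc` per input text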

# File lib/ruby-spacy.rb, line 300
def pipe(texts, disable: [], batch_size: 50) 
  docs = []
  PyCall::List.(@py_nlp.pipe(texts, disable: disable, batch_size: batch_size)).each do |py_doc|
    docs << Doc.new(@py_nlp, py_doc: py_doc)
  end
  docs
end
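
`pipe` hands the whole list to spaCy's `Language.pipe`, which processes texts in batches of `batch_size` and skips the components named in `disable`; each resulting Python `Doc` is then wrapped in a Ruby `Doc`. The plain-Ruby sketch below mimics only that control flow: texts are grouped with `each_slice`, and a hypothetical set of named components runs on each text unless disabled. `COMPONENTS`, its component names, and `toy_pipe` are assumptions for illustration.

```ruby
# Hypothetical post-tokenization components, keyed by name.
COMPONENTS = {
  "lowercase" => ->(tokens) { tokens.map(&:downcase) },
  "dedupe"    => ->(tokens) { tokens.uniq }
}.freeze

# Process texts batch by batch, skipping any component named in `disable`.
def toy_pipe(texts, disable: [], batch_size: 50)
  texts.each_slice(batch_size).flat_map do |batch|
    batch.map do |text|
      tokens = text.split # tokenization always runs; it is not a disableable component
      COMPONENTS.reduce(tokens) do |acc, (name, component)|
        disable.include?(name) ? acc : component.call(acc)
      end
    end
  end
end

results = toy_pipe(["The cat THE", "A dog"], disable: ["dedupe"], batch_size: 1)
p results # => [["the", "cat", "the"], ["a", "dog"]]
```

Batching changes how work is grouped, not the per-text result. Because the method above converts spaCy's generator with `PyCall::List.(...)`, all resulting documents are held in memory at once.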

pipe_names()

A utility method to list pipeline components.
@return [Array<String>] an array of text strings representing pipeline components

# File lib/ruby-spacy.rb, line 246
def pipe_names
  pipe_array = []
  PyCall::List.(@py_nlp.pipe_names).each do |pipe|
    pipe_array << pipe
  end
  pipe_array
end

read(text)

Reads and analyzes the given text.
@param text [String] a text to be read and analyzed
@return [Doc] a `Doc` object holding the analysis

# File lib/ruby-spacy.rb, line 227
def read(text)
  Doc.new(py_nlp, text: text)
end

vocab(text)

Returns a Ruby `Lexeme` object.
@param text [String] a text string representing the vocabulary item
@return [Lexeme]

# File lib/ruby-spacy.rb, line 264
def vocab(text)
  Lexeme.new(@py_nlp.vocab[text])
end

vocab_string_lookup(id)

A utility method to look up the string associated with the given ID in the vocabulary's string store.
@param id [Integer] a vocabulary (hash) ID
@return [String] the string stored under that ID (https://spacy.io/api/stringstore)

# File lib/ruby-spacy.rb, line 240
def vocab_string_lookup(id)
  PyCall.eval("#{@spacy_nlp_id}.vocab.strings[#{id}]")
end
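
`vocab.strings` is spaCy's `StringStore`, a two-way mapping between strings and 64-bit hash IDs, so `vocab_string_lookup` turns an ID back into its text. The toy below shows the same two-way lookup in plain Ruby. It assigns sequential integers instead of spaCy's hash values, and `ToyStringStore` is hypothetical.

```ruby
# Hypothetical two-way string table. spaCy's StringStore uses 64-bit hashes
# as IDs; this toy uses sequential integers to stay dependency-free.
class ToyStringStore
  def initialize
    @id_by_string = {}
    @string_by_id = {}
  end

  # Add a string if it is new, and return its ID either way.
  def add(string)
    @id_by_string[string] ||= begin
      id = @string_by_id.size
      @string_by_id[id] = string
      id
    end
  end

  # Look up by ID (Integer) or by text (String); unknown keys raise KeyError.
  def [](key)
    key.is_a?(Integer) ? @string_by_id.fetch(key) : @id_by_string.fetch(key)
  end
end

strings = ToyStringStore.new
coffee_id = strings.add("coffee")
strings.add("tea")
puts strings[coffee_id] # prints coffee
puts strings["tea"]     # prints 1
```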