class Spacy::PyLanguage
See also the spaCy Python API documentation for [`Language`](spacy.io/api/language).
Attributes
py_nlp
@return [Object] a Python `Language` instance accessible via `PyCall`

spacy_nlp_id
@return [String] an identifier string that can be used to refer to the Python `Language` object inside `PyCall::exec` or `PyCall::eval`
Public Class Methods
new(model = "en_core_web_sm")
Creates a language model instance, which is conventionally referred to by a variable named `nlp`.
@param model [String] the name of a language model installed on the system

    # File lib/ruby-spacy.rb, line 219
    def initialize(model = "en_core_web_sm")
      # Name of the Python-side variable that will hold the loaded model
      @spacy_nlp_id = "nlp_#{model.object_id}"
      PyCall.exec("import spacy; #{@spacy_nlp_id} = spacy.load('#{model}')")
      # Ruby-side handle to the same Python object
      @py_nlp = PyCall.eval(@spacy_nlp_id)
    end
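The constructor above derives a Python-side variable name from the model string's `object_id` and interpolates it into a Python statement. The following minimal sketch reproduces only that string-building step in plain Ruby, so it can be inspected without PyCall or spaCy installed. `python_load_statement` is a hypothetical helper introduced for illustration and is not part of ruby-spacy.

```ruby
# Hypothetical helper (not part of ruby-spacy): builds the identifier and the
# Python source that the constructor passes to PyCall.exec.
def python_load_statement(model = "en_core_web_sm")
  nlp_id = "nlp_#{model.object_id}"
  source = "import spacy; #{nlp_id} = spacy.load('#{model}')"
  [nlp_id, source]
end

nlp_id, source = python_load_statement("en_core_web_sm")
puts nlp_id   # the name later reused inside PyCall.eval strings
puts source   # the Python statement that loads the model
```

Because the identifier is built from `object_id` rather than from the model name itself, it is always a valid Python variable name, even for model names that contain characters Python identifiers do not allow.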
Public Instance Methods
get_lexeme(text)
A utility method to get a Python `Lexeme` object.
@param text [String] a text string representing a lexeme
@return [Object] a Python `Lexeme` object (spacy.io/api/lexeme)

    # File lib/ruby-spacy.rb, line 257
    def get_lexeme(text)
      @py_nlp.vocab[text]
    end
matcher()
Generates a matcher for the current language model.
@return [Matcher]

    # File lib/ruby-spacy.rb, line 233
    def matcher
      Matcher.new(@py_nlp)
    end
method_missing(name, *args)
Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.…

    # File lib/ruby-spacy.rb, line 309
    def method_missing(name, *args)
      @py_nlp.send(name, *args)
    end
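The delegation pattern used by `method_missing` can be tried out in isolation. The sketch below is a slightly more defensive variant of that pattern, not the ruby-spacy implementation: it forwards only methods the wrapped object responds to and adds `respond_to_missing?`, which the code above does not define. `FakePyObject` and `Wrapper` are hypothetical stand-ins so the example runs without PyCall.

```ruby
# FakePyObject stands in for the Python `Language` object; it is hypothetical
# and exists only so the sketch runs without PyCall.
class FakePyObject
  def lang
    "en"
  end
end

# Wrapper is a hypothetical stand-in for the wrapping class.
class Wrapper
  def initialize(target)
    @target = target
  end

  # Forward any method this class does not define to the wrapped object.
  def method_missing(name, *args, &block)
    if @target.respond_to?(name)
      @target.public_send(name, *args, &block)
    else
      super
    end
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name) || super
  end
end

wrapper = Wrapper.new(FakePyObject.new)
puts wrapper.lang
```

Calling a method that neither the wrapper nor the wrapped object defines falls through to `super` and raises `NoMethodError` as usual.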
most_similar(vector, n)
Returns the n lexemes whose vector representations are most similar to the given vector representation of a word.
@param vector [Object] a vector representation of a word (whether existing or non-existing)
@param n [Integer] the number of similar lexemes to return
@return [Array<Hash{:key => Integer, :text => String, :best_row => Integer, :score => Float}>] an array of hashes, each containing the `key`, `text`, `best_row` and similarity `score` of a lexeme

    # File lib/ruby-spacy.rb, line 271
    def most_similar(vector, n)
      vec_array = Numpy.asarray([vector])
      py_result = @py_nlp.vocab.vectors.most_similar(vec_array, n: n)
      key_texts = PyCall.eval("[[str(n), #{@spacy_nlp_id}.vocab[n].text] for n in #{py_result[0][0].tolist}]")
      keys = key_texts.map { |kt| kt[0] }
      texts = key_texts.map { |kt| kt[1] }
      best_rows = PyCall::List.(py_result[1])[0]
      scores = PyCall::List.(py_result[2])[0]

      results = []
      n.times do |i|
        result = { key: keys[i].to_i, text: texts[i], best_row: best_rows[i], score: scores[i] }
        # Expose each hash key as a reader method on this hash as well
        result.each_key do |key|
          result.define_singleton_method(key) { result[key] }
        end
        results << result
      end
      results
    end
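Each hash returned by `most_similar` can be read either with hash syntax or with method calls, because a singleton reader is defined for every key. The sketch below shows that mechanism on its own in plain Ruby. The values are made up for illustration and are not real spaCy output.

```ruby
# Sample values are made up for illustration; they are not real spaCy output.
result = { key: 12345, text: "queen", best_row: 7, score: 0.78 }

# Give this one hash a reader method per key, as most_similar does.
result.each_key do |key|
  result.define_singleton_method(key) { result[key] }
end

puts result[:text]  # hash-style access
puts result.text    # method-style access to the same value
```

One side effect worth knowing: the reader named `key` replaces the built-in `Hash#key` method on these particular hashes, so `result.key` returns the stored `:key` value rather than performing a reverse lookup.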
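pipe(texts, disable: [], batch_size: 50)
A utility method to batch-process many texts.
@param texts [Array<String>] the texts to process
@param disable [Array<String>] names of pipeline components to disable
@param batch_size [Integer] the number of texts to buffer per batch
@return [Array<Doc>]

    # File lib/ruby-spacy.rb, line 300
    def pipe(texts, disable: [], batch_size: 50)
      docs = []
      # Wrap each Python doc produced by the pipeline in a Ruby Doc
      PyCall::List.(@py_nlp.pipe(texts, disable: disable, batch_size: batch_size)).each do |py_doc|
        docs << Doc.new(@py_nlp, py_doc: py_doc)
      end
      docs
    end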
pipe_names()
A utility method to list pipeline components.
@return [Array<String>] an array of text strings representing pipeline components

    # File lib/ruby-spacy.rb, line 246
    def pipe_names
      pipe_array = []
      PyCall::List.(@py_nlp.pipe_names).each do |pipe|
        pipe_array << pipe
      end
      pipe_array
    end
read(text)
Reads and analyzes the given text.
@param text [String] a text to be read and analyzed
@return [Doc]

    # File lib/ruby-spacy.rb, line 227
    def read(text)
      Doc.new(py_nlp, text: text)
    end
vocab(text)
Returns a Ruby `Lexeme` object.
@param text [String] a text string representing the vocabulary item
@return [Lexeme]

    # File lib/ruby-spacy.rb, line 264
    def vocab(text)
      Lexeme.new(@py_nlp.vocab[text])
    end
vocab_string_lookup(id)
A utility method to look up the vocabulary item with the given id.
@param id [Integer] a vocabulary id
@return [String] the string stored under that id in the vocabulary's string store (spacy.io/api/stringstore)

    # File lib/ruby-spacy.rb, line 240
    def vocab_string_lookup(id)
      PyCall.eval("#{@spacy_nlp_id}.vocab.strings[#{id}]")
    end