class Spacy::Doc

See also the spaCy Python API documentation for [`Doc`](spacy.io/api/doc).

Attributes

py_doc[R]

@return [Object] a Python `Doc` instance accessible via `PyCall`

py_nlp[R]

@return [Object] a Python `Language` instance accessible via `PyCall`

text[R]

@return [String] a text string of the document

Public Class Methods

new(nlp, py_doc: nil, text: nil)

Using the {Language#read} method is the recommended way to create a doc. If you need to create one using {Doc#initialize}, there are two method signatures: `Spacy::Doc.new(nlp, py_doc: Object)` and `Spacy::Doc.new(nlp, text: String)`. If both `py_doc` and `text` are given, `py_doc` takes precedence and `text` is ignored.

@param nlp [Language] an instance of {Language} class
@param py_doc [Object] an instance of Python `Doc` class
@param text [String] the text string to be analyzed

# File lib/ruby-spacy.rb, line 65
def initialize(nlp, py_doc: nil, text: nil)
  @py_nlp = nlp
  if py_doc
    @py_doc = py_doc
  else
    @py_doc = nlp.(text)
  end
  @text = @py_doc.text
end
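The two construction paths can be tried out without a Python runtime. The following is a minimal sketch, not the library's code: `FakePyDoc`, `FAKE_NLP`, and `SketchDoc` are hypothetical stand-ins. It relies only on the fact that `nlp.(text)` is Ruby shorthand for `nlp.call(text)`, so any callable can play the role of the pipeline.

```ruby
# Hypothetical stand-ins: FakePyDoc mimics a Python Doc that exposes #text,
# and FAKE_NLP mimics a callable pipeline. Neither is part of ruby-spacy.
FakePyDoc = Struct.new(:text)
FAKE_NLP = ->(text) { FakePyDoc.new(text) }

# Mirrors the branching of Doc#initialize in a self-contained class.
class SketchDoc
  attr_reader :py_nlp, :py_doc, :text

  def initialize(nlp, py_doc: nil, text: nil)
    @py_nlp = nlp
    # Reuse an already analyzed doc if one is given; otherwise run the
    # pipeline. `nlp.(text)` is shorthand for `nlp.call(text)`.
    @py_doc = py_doc || nlp.(text)
    @text = @py_doc.text
  end
end

from_text = SketchDoc.new(FAKE_NLP, text: "Hello world")
from_doc  = SketchDoc.new(FAKE_NLP, py_doc: FakePyDoc.new("Hello world"))
puts from_text.text == from_doc.text
```

Either path ends with `@text` taken from the wrapped doc, which is why `text:` is ignored when `py_doc:` is supplied.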

Public Instance Methods

[](range)

Returns a span if given a range object, or a token if given an integer representing a position in the doc.

@param range [Range, Integer] an ordinary Ruby range object such as `0..3`, `1...4`, or `3..-1`, or an integer position in the doc

# File lib/ruby-spacy.rb, line 178
def [](range)
  if range.is_a?(Range)
    py_span = @py_doc[range]
    return Span.new(self, start_index: py_span.start, end_index: py_span.end - 1)
  else
    return Token.new(@py_doc[range])
  end
end
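The listing subtracts 1 from the Python span's `end` because Python spans use an exclusive end, while `Span.new` is given an inclusive `end_index`. A rough pure-Ruby model of the dispatch is sketched here. `lookup` is a hypothetical helper operating on a plain array, and it assumes a non-negative range start (the real method gets the resolved start from the Python span).

```ruby
# Hypothetical helper, not part of ruby-spacy: models how Doc#[] treats
# its argument, using a plain array of strings in place of a Python Doc.
def lookup(tokens, index_or_range)
  if index_or_range.is_a?(Range)
    slice = tokens[index_or_range]
    start_index = index_or_range.begin # assumed to be non-negative
    # Inclusive end index, in the same convention as Span.new's end_index.
    { start_index: start_index, end_index: start_index + slice.size - 1, tokens: slice }
  else
    # An integer selects a single element.
    tokens[index_or_range]
  end
end

tokens = %w[I love New York City]
p lookup(tokens, 0)
p lookup(tokens, 1...4)
p lookup(tokens, 3..-1)
```

In this model both `1..3` and `1...4` resolve to the same inclusive bounds, and `3..-1` runs to the last element.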

displacy(style: "dep", compact: false)

Visualizes the document in one of two styles: “dep” (dependencies) or “ent” (named entities).

@param style [String] either `dep` or `ent`
@param compact [Boolean] only relevant to the `dep` style
@return [String] an SVG string for the `dep` style, or an HTML string for the `ent` style

# File lib/ruby-spacy.rb, line 198
def displacy(style: "dep", compact: false)
  PyDisplacy.render(py_doc, style: style, options: {compact: compact}, jupyter: false)
end

each() { |token| ... }

Iterates over the elements in the doc yielding a token instance each time.

# File lib/ruby-spacy.rb, line 114
def each
  PyCall::List.(@py_doc).each do |py_token|
    yield Token.new(py_token)
  end
end
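A class that defines `each` this way can gain `map`, `select`, and similar methods by including `Enumerable`. Whether `Spacy::Doc` does so is not shown in this section, so the sketch below treats it as an assumption and uses a hypothetical `SketchTokens` class over plain strings. Returning an enumerator when no block is given is also an addition of the sketch.

```ruby
# Hypothetical class, not the real Spacy::Doc: it wraps an array of
# strings and exposes #each in the same style, yielding one element at a time.
class SketchTokens
  include Enumerable # assumption: builds map, select, etc. on top of #each

  def initialize(words)
    @words = words
  end

  def each
    # Sketch-only convenience: hand back an enumerator when called without a block.
    return to_enum(:each) unless block_given?

    @words.each { |word| yield word }
  end
end

doc = SketchTokens.new(%w[Apple is looking at startups])
doc.each { |token| puts token }
p doc.select { |token| token.size > 4 }
```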

ents()

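Returns an array of spans, each representing a named entity. Each element is the underlying Python span, extended with a `label` method that returns its `label_` value.

@return [Array<Span>]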

# File lib/ruby-spacy.rb, line 164
def ents
  # so that ents can be "each"-ed in Ruby
  ent_array = []
  PyCall::List.(@py_doc.ents).each do |ent|
    ent.define_singleton_method :label do
      return self.label_
    end
    ent_array << ent
  end
  ent_array
end
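The `define_singleton_method` call attaches a `label` method to each individual entity object instead of changing its class. A minimal sketch of that technique follows. `FakeEnt` and `with_label_alias` are hypothetical names used only for illustration, with a `Struct` standing in for the Python entity.

```ruby
# Hypothetical stand-in for a Python entity span, which exposes its label
# as `label_`. Not part of ruby-spacy or PyCall.
FakeEnt = Struct.new(:text, :label_)

def with_label_alias(ents)
  ents.map do |ent|
    # Add a per-object `label` method that forwards to `label_`.
    ent.define_singleton_method(:label) { label_ }
    ent
  end
end

ents = with_label_alias([FakeEnt.new("Apple", "ORG"), FakeEnt.new("U.K.", "GPE")])
ents.each { |ent| puts "#{ent.text}: #{ent.label}" }
```

Only the objects passed through the helper respond to `label`; other instances of the same class are unaffected.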

method_missing(name, *args)

Methods defined in Python but not wrapped in ruby-spacy can be called by this dynamic method handling mechanism.

# File lib/ruby-spacy.rb, line 203
def method_missing(name, *args)
  @py_doc.send(name, *args)
end
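The delegation pattern can be illustrated with a plain Ruby object as the target. This is a sketch under stated assumptions, not the library's implementation: `SketchWrapper` is hypothetical, and the `respond_to_missing?` override and the `respond_to?` guard are additions that the listing above does not include.

```ruby
# Hypothetical wrapper, not the real Spacy::Doc: forwards unknown calls
# to a wrapped object.
class SketchWrapper
  def initialize(target)
    @target = target
  end

  def method_missing(name, *args, &block)
    # Forward only what the target understands; otherwise fall back to
    # the default NoMethodError.
    return super unless @target.respond_to?(name)

    @target.public_send(name, *args, &block)
  end

  # Keeps respond_to? consistent with method_missing (sketch-only addition).
  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

wrapped = SketchWrapper.new("ruby-spacy")
puts wrapped.upcase
puts wrapped.respond_to?(:upcase)
```

Pairing `method_missing` with `respond_to_missing?` is the usual Ruby convention, since it lets callers probe for a forwarded method before invoking it.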

noun_chunks()

Returns an array of spans representing noun chunks.

@return [Array<Span>]

# File lib/ruby-spacy.rb, line 142
def noun_chunks
  chunk_array = []
  py_chunks = PyCall::List.(@py_doc.noun_chunks)
  py_chunks.each do |py_chunk|
    chunk_array << Span.new(self, start_index: py_chunk.start, end_index: py_chunk.end - 1)
  end
  chunk_array
end

retokenize(start_index, end_index, attributes = {})

Retokenizes the text, merging a span into a single token.

@param start_index [Integer] the start position of the span to be retokenized in the document
@param end_index [Integer] the end position (inclusive) of the span to be retokenized in the document
@param attributes [Hash] attributes to set on the merged token

# File lib/ruby-spacy.rb, line 79
def retokenize(start_index, end_index, attributes = {})
  PyCall.with(@py_doc.retokenize()) do |retokenizer|
    retokenizer.merge(@py_doc[start_index .. end_index], attrs: attributes)
  end
end
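Because the listing slices with `start_index .. end_index`, both positions are included in the merged token. The effect on token positions can be modeled on a plain array. `merge_tokens` is a hypothetical helper, and joining with a single space is a simplification of this sketch, not a description of how spaCy builds the merged text.

```ruby
# Hypothetical helper, not part of ruby-spacy: models the effect of
# merging tokens start_index..end_index (inclusive) on a plain array.
def merge_tokens(tokens, start_index, end_index, separator = " ")
  merged = tokens[start_index..end_index].join(separator)
  # Keep everything before the span, the merged token, and everything after.
  tokens[0...start_index] + [merged] + tokens[(end_index + 1)..-1]
end

tokens = %w[I live in New York City now]
p merge_tokens(tokens, 3, 5)
```

After the merge, every token that followed the span moves to a lower position, so indices computed before retokenizing should not be reused.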

retokenize_split(pos_in_doc, split_array, head_pos_in_split, ancestor_pos, attributes = {})

Retokenizes the text, splitting the specified token.

@param pos_in_doc [Integer] the position of the token to be split in the document
@param split_array [Array<String>] text strings of the split results
@param head_pos_in_split [Integer] the position, within the split results, of the element that serves as the head of the first split element
@param ancestor_pos [Integer] the position of the immediate ancestor element of the split elements in the document
@param attributes [Hash] the attributes of the split elements

# File lib/ruby-spacy.rb, line 90
def retokenize_split(pos_in_doc, split_array, head_pos_in_split, ancestor_pos, attributes = {})
  PyCall.with(@py_doc.retokenize()) do |retokenizer|
    heads = [[@py_doc[pos_in_doc], head_pos_in_split], @py_doc[ancestor_pos]]
    retokenizer.split(@py_doc[pos_in_doc], split_array, heads: heads, attrs: attributes)
  end
end

sents()

Returns an array of spans, each representing a sentence.

@return [Array<Span>]

# File lib/ruby-spacy.rb, line 153
def sents
  sentence_array = []
  py_sentences = PyCall::List.(@py_doc.sents)
  py_sentences.each do |py_sent|
    sentence_array << Span.new(self, start_index: py_sent.start, end_index: py_sent.end - 1)
  end
  sentence_array
end

similarity(other)

Returns a semantic similarity estimate.

@param other [Doc] the other doc to compare this doc against
@return [Float]

# File lib/ruby-spacy.rb, line 190
def similarity(other)
  py_doc.similarity(other.py_doc)
end
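The method simply forwards to the Python `Doc.similarity`. spaCy's documentation describes its default estimate as cosine similarity over averaged word vectors, although the actual result depends on the loaded model. As a sketch of that measure only, the hypothetical helper `cosine_similarity` below works on plain Ruby arrays, and the vectors are made-up values.

```ruby
# Cosine similarity of two equal-length numeric vectors.
# Hypothetical helper for illustration; not part of ruby-spacy.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  # A zero vector has no direction; report 0.0 instead of dividing by zero.
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Made-up vectors standing in for averaged word vectors.
puts cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
puts cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing in the same direction score close to 1, and orthogonal vectors score 0.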

span(range_or_start, optional_size = nil)

Returns a span of the specified range within the doc. The method can be called in either of two ways: `Doc#span(range)` or `Doc#span(start_pos, size_of_span)`.

@param range_or_start [Range, Integer] a range object, or, alternatively, an integer that represents the start position of the span
@param optional_size [Integer] an integer representing the size of the span
@return [Span]

# File lib/ruby-spacy.rb, line 125
def span(range_or_start, optional_size = nil)
  if optional_size
    start_index = range_or_start
    temp = tokens[start_index ... start_index + optional_size]
  else
    start_index = range_or_start.first
    range = range_or_start
    temp = tokens[range]
  end

  end_index = start_index + temp.size - 1

  Span.new(self, start_index: start_index, end_index: end_index)
end
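The index arithmetic of the two call forms can be checked on a plain array. This is a minimal sketch: `span_bounds` is a hypothetical helper that returns the `[start_index, end_index]` pair (inclusive end) instead of building a `Span`.

```ruby
# Hypothetical helper, not part of ruby-spacy: reproduces the index
# arithmetic of Doc#span on a plain array and returns [start, end],
# where end is inclusive.
def span_bounds(tokens, range_or_start, optional_size = nil)
  if optional_size
    # Form 2: start position plus size.
    start_index = range_or_start
    selected = tokens[start_index...start_index + optional_size]
  else
    # Form 1: a Ruby range.
    start_index = range_or_start.first
    selected = tokens[range_or_start]
  end
  [start_index, start_index + selected.size - 1]
end

tokens = %w[a b c d e f]
p span_bounds(tokens, 2..4)
p span_bounds(tokens, 2, 3)
p span_bounds(tokens, 4, 10)
```

Because the end index is derived from the number of tokens actually selected, a size that runs past the end of the doc is clamped to the last token in this model.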

to_s()

Returns the string representation of the document.

@return [String]

# File lib/ruby-spacy.rb, line 99
def to_s
  @text
end

tokens()

Returns an array of tokens contained in the doc.

@return [Array<Token>]

# File lib/ruby-spacy.rb, line 105
def tokens
  results = []
  PyCall::List.(@py_doc).each do |py_token|
    results << Token.new(py_token)
  end
  results
end