class XapianFu::XapianDoc

A XapianDoc represents a document in a XapianDb. Searches return XapianDoc objects and they are used internally when adding new documents to the database. You usually don't need to instantiate them yourself unless you're doing something a bit advanced.

Constants

STOPPER_STRATEGIES

Attributes

data[R]

An arbitrary blob of data stored alongside the document in the Xapian database.

db[RW]

The XapianDb object that this document was retrieved from, or should be stored in.

fields[R]

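The fields given to this object on initialization, as a hash. When initialized from something other than a hash-like object, the text is stored under the :content key; when initialized from a Xapian::Document, the hash is empty.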

id[RW]

The unsigned integer “primary key” for this document in the Xapian database.

match[R]

The Xapian::Match object for this document when returned as part of a search result.

weight[R]

The search score (weight) of this document when returned as part of a search result. It can also be set with the :weight option on initialization.

Public Class Methods

new(doc, options = {})

Accepts a Xapian::Match, a Xapian::Document, a Hash-like object (one responding to has_key? and []), an object responding to to_xapian_fu_string, or anything with a to_s method; anything else raises a XapianTypeError. A Hash-like object's :id key, if present, becomes the document id. The :weight option sets the search weight when setting up search results, the :data option sets additional data to be stored with the document in the database, and the :xapian_db option sets the XapianDb to allow saves and term enumeration.

    # File lib/xapian_fu/xapian_doc.rb
    def initialize(doc, options = {})
      @options = options

      @fields = {}
      if doc.is_a? Xapian::Match
        match = doc
        doc = match.document
        @match = match
        @weight = @match.weight
      end

      # Handle initialisation from a Xapian::Document, which is
      # usually a search result from a Xapian database
      if doc.is_a?(Xapian::Document)
        @xapian_document = doc
        @id = doc.docid
      # Handle initialisation from a hash-like object
      elsif doc.respond_to?(:has_key?) and doc.respond_to?("[]")
        @fields = doc
        @id = doc[:id] if doc.has_key?(:id)
      # Handle initialisation from an object with a to_xapian_fu_string method
      elsif doc.respond_to?(:to_xapian_fu_string)
        @fields = { :content => doc.to_xapian_fu_string }
      # Handle initialisation from anything else that can be coerced
      # into a string
      elsif doc.respond_to? :to_s
        @fields = { :content => doc.to_s }
      else
        raise XapianTypeError, "Can't handle indexing a '#{doc.class}' object"
      end
      @weight = options[:weight] if options[:weight]
      @data = options[:data] if options[:data]
      @db = options[:xapian_db] if options[:xapian_db]
    end
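
The branch order in initialize decides how an argument is interpreted. Below is a minimal sketch of that dispatch in plain Ruby; classify_doc, NotIndexable and Headline are hypothetical names, and the Xapian::Match and Xapian::Document branches are left out so the sketch runs without the Xapian bindings.

```ruby
# Hypothetical stand-in for XapianFu::XapianTypeError.
class NotIndexable < StandardError; end

# Hypothetical helper mirroring the non-Xapian branches of
# XapianDoc#initialize; returns [fields, id].
def classify_doc(doc)
  if doc.respond_to?(:has_key?) && doc.respond_to?(:[])
    # Hash-like: used as the fields; an :id key becomes the document id
    [doc, doc.has_key?(:id) ? doc[:id] : nil]
  elsif doc.respond_to?(:to_xapian_fu_string)
    [{ content: doc.to_xapian_fu_string }, nil]
  elsif doc.respond_to?(:to_s)
    [{ content: doc.to_s }, nil]
  else
    raise NotIndexable, "Can't handle indexing a '#{doc.class}' object"
  end
end

# Hypothetical class that opts in to custom indexing text.
class Headline
  def initialize(text)
    @text = text
  end

  def to_xapian_fu_string
    "Headline: #{@text}"
  end
end

p classify_doc({ title: "Red fox", id: 7 })  # hash used as-is, id 7
p classify_doc(Headline.new("Foxes"))        # via to_xapian_fu_string
p classify_doc(42)                           # via to_s
```

Because nearly every Ruby object responds to to_s, the final raise is reached only by unusual objects, such as one whose to_s has been undefined.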

Public Instance Methods

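==(b)

Compare this document with another XapianDoc. They are equal when their ids match and they belong to the same database: either the same XapianDb object or databases with the same directory. Comparison with any other kind of object falls back to the superclass's ==.
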
    # File lib/xapian_fu/xapian_doc.rb
    def ==(b)
      if b.is_a?(XapianDoc)
        id == b.id && (db == b.db || db.dir == b.db.dir)
      else
        super(b)
      end
    end
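
The equality rule can be tried out without a database. A minimal sketch, assuming hypothetical FakeDb and FakeDoc classes that carry only the dir, id and db attributes that XapianDoc#== reads:

```ruby
# Hypothetical database stand-in: plain object identity, plus a dir path.
class FakeDb
  attr_reader :dir

  def initialize(dir)
    @dir = dir
  end
end

# Hypothetical document stand-in applying the same rule as XapianDoc#==.
class FakeDoc
  attr_reader :id, :db

  def initialize(id, db)
    @id = id
    @db = db
  end

  def ==(other)
    if other.is_a?(FakeDoc)
      # same id, and either the same db object or a db at the same directory
      id == other.id && (db == other.db || db.dir == other.db.dir)
    else
      super
    end
  end
end

a = FakeDoc.new(1, FakeDb.new("/var/index"))
b = FakeDoc.new(1, FakeDb.new("/var/index"))  # different object, same dir
c = FakeDoc.new(2, a.db)                      # same db, different id
p a == b        # prints true
p a == c        # prints false
p a == "index"  # prints false (falls back to Object#==)
```

Under this rule, when the ids match but the db objects differ, both databases must respond to dir; the comparison never looks at the documents' fields.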

create()

Add this document to the Xapian database and set id to the document id the database assigns.

    # File lib/xapian_fu/xapian_doc.rb
    def create
      self.id = db.rw.add_document(to_xapian_document)
    end
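
inspect()

Return a short, human-readable summary of the document: its id, its search weight to five decimal places (when set) and its database.
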
    # File lib/xapian_fu/xapian_doc.rb
    def inspect
      s = ["<#{self.class.to_s} id=#{id}"]
      s << "weight=%.5f" % weight if weight
      s << "db=#{db.nil? ? 'nil' : db}"
      s.join(' ') + ">"
    end
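
A sketch of the string inspect builds, pulled out into a hypothetical free-standing method (inspect_line) so the formatting can be checked in isolation. The weight is printed with five decimal places and omitted entirely when it is nil.

```ruby
# Hypothetical free-standing copy of the formatting in XapianDoc#inspect.
def inspect_line(class_name, id, weight, db)
  s = ["<#{class_name} id=#{id}"]
  s << "weight=%.5f" % weight if weight
  s << "db=#{db.nil? ? 'nil' : db}"
  s.join(' ') + ">"
end

puts inspect_line("XapianFu::XapianDoc", 3, 0.123456, nil)
# prints: <XapianFu::XapianDoc id=3 weight=0.12346 db=nil>
puts inspect_line("XapianFu::XapianDoc", nil, nil, nil)
# prints: <XapianFu::XapianDoc id= db=nil>
```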

language()

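Return this document's language: the :language option given on initialization if present, otherwise the database's language (when a database is set and has one), otherwise :english. The result is cached after the first call.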

    # File lib/xapian_fu/xapian_doc.rb
    def language
      if @language
        @language
      else
        @language =
          if ! @options[:language].nil?
            @options[:language]
          elsif db and db.language
            db.language
          else
            :english
          end
      end
    end
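
The fallback chain can be checked with a hypothetical stand-in database. This sketch (resolve_language and LangDb are illustrative names) reproduces the decision in language but leaves out the caching in @language.

```ruby
# Hypothetical stand-in for a XapianDb; only #language is consulted.
LangDb = Struct.new(:language)

# Mirrors the fallback chain in XapianDoc#language (without the caching).
def resolve_language(options, db)
  if !options[:language].nil?
    options[:language]
  elsif db && db.language
    db.language
  else
    :english
  end
end

p resolve_language({ language: :french }, LangDb.new(:german))  # prints :french
p resolve_language({}, LangDb.new(:german))                     # prints :german
p resolve_language({}, LangDb.new(nil))                         # prints :english
p resolve_language({}, nil)                                     # prints :english
```

Because the real method stores its result in @language, a later change to the database's language is not seen by a document that has already been asked for its language.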

save()

Add this document to the Xapian Database, or replace it if it already has an id.

    # File lib/xapian_fu/xapian_doc.rb
    def save
      id ? update : create
    end
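
save chooses between create and update purely on whether id is set. The sketch below models that with a hypothetical in-memory MemoryRw standing in for db.rw (its add_document returns a new id) and a hypothetical SketchDoc; no Xapian calls are made.

```ruby
# Hypothetical in-memory stand-in for db.rw: add_document stores a document
# under a fresh id and returns it; replace_document overwrites by id.
class MemoryRw
  attr_reader :docs

  def initialize
    @docs = {}
    @next_id = 1
  end

  def add_document(doc)
    id = @next_id
    @next_id += 1
    @docs[id] = doc
    id
  end

  def replace_document(id, doc)
    @docs[id] = doc
  end
end

# Hypothetical document with the same create/update/save logic as XapianDoc.
class SketchDoc
  attr_accessor :id, :body

  def initialize(rw, body)
    @rw = rw
    @body = body
  end

  def create
    self.id = @rw.add_document(body)
  end

  def update
    @rw.replace_document(id, body)
  end

  def save
    id ? update : create
  end
end

rw = MemoryRw.new
doc = SketchDoc.new(rw, "first draft")
doc.save                  # no id yet, so create: id becomes 1
doc.body = "second draft"
doc.save                  # id is set, so update replaces document 1
p rw.docs                 # one document, holding "second draft"
```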

stemmer()

Return the stemmer for this document. It is built from the :stemmer option given on initialization, or failing that the :language option; without either, the database's stemmer is used, and without a database it defaults to an English stemmer.

    # File lib/xapian_fu/xapian_doc.rb
    def stemmer
      if @stemmer
        @stemmer
      else
        @stemmer =
          if ! @options[:stemmer].nil?
            @options[:stemmer]
          elsif @options[:language]
            @options[:language]
          elsif db
            db.stemmer
          else
            :english
          end
        @stemmer = StemFactory.stemmer_for(@stemmer)
      end
    end
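
The precedence in stemmer can be sketched without Xapian. stemmer_source and StemDb are hypothetical names; the sketch returns the value the real method would pass to StemFactory.stemmer_for, whose behaviour is not modelled. stopper follows the same order, with :stopper and db.stopper in place of :stemmer and db.stemmer.

```ruby
# Hypothetical stand-in for a XapianDb; only #stemmer is consulted.
StemDb = Struct.new(:stemmer)

# Mirrors the precedence in XapianDoc#stemmer; returns the value the real
# method would hand to StemFactory.stemmer_for (not modelled here).
def stemmer_source(options, db)
  if !options[:stemmer].nil?
    options[:stemmer]
  elsif options[:language]
    options[:language]
  elsif db
    db.stemmer
  else
    :english
  end
end

p stemmer_source({ stemmer: :danish, language: :french }, nil)  # prints :danish
p stemmer_source({ language: :french }, StemDb.new(:german))    # prints :french
p stemmer_source({}, StemDb.new(:german))                       # prints :german
p stemmer_source({}, nil)                                       # prints :english
```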

stemmer=(s)

Set the stemmer to use for this document. Accepts any string that the Xapian::Stem class accepts (either the English name of the language or its two-letter ISO 639 code), or an existing Xapian::Stem object.

    # File lib/xapian_fu/xapian_doc.rb
    def stemmer=(s)
      @stemmer = StemFactory.stemmer_for(s)
    end

stopper()

Return the stopper for this document. It is built from the :stopper option given on initialization, or failing that the :language option; without either, the database's stopper is used, and without a database it defaults to an English stopper.

    # File lib/xapian_fu/xapian_doc.rb
    def stopper
      if @stopper
        @stopper
      else
        @stopper =
          if ! @options[:stopper].nil?
            @options[:stopper]
          elsif @options[:language]
            @options[:language]
          elsif db
            db.stopper
          else
            :english
          end
        @stopper = StopperFactory.stopper_for(@stopper)
      end
    end
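
stopper_strategy()

Return the stopper strategy for this document: the :stopper_strategy option given on initialization if present, otherwise the database's stopper strategy, otherwise :stemmed. The result is cached after the first call.
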
    # File lib/xapian_fu/xapian_doc.rb
    def stopper_strategy
      if @stopper_strategy
        @stopper_strategy
      else
        @stopper_strategy =
          if ! @options[:stopper_strategy].nil?
            @options[:stopper_strategy]
          elsif db
            db.stopper_strategy
          else
            :stemmed
          end
      end
    end
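
generate_terms turns this symbol into an integer with XapianDoc::STOPPER_STRATEGIES.fetch(stopper_strategy, 2), so an unrecognised strategy falls back to 2. The constant's contents are not listed in this documentation; the sketch below assumes it maps :none, :all and :stemmed onto Xapian's TermGenerator stop strategies STOP_NONE, STOP_ALL and STOP_STEMMED (0, 1 and 2). ASSUMED_STRATEGIES, StrategyDb and the helper names are hypothetical.

```ruby
# ASSUMPTION: mirrors Xapian's stop_strategy enum (STOP_NONE = 0,
# STOP_ALL = 1, STOP_STEMMED = 2); the real STOPPER_STRATEGIES may differ.
ASSUMED_STRATEGIES = { none: 0, all: 1, stemmed: 2 }.freeze

# Hypothetical stand-in for a XapianDb; only #stopper_strategy is consulted.
StrategyDb = Struct.new(:stopper_strategy)

# Mirrors XapianDoc#stopper_strategy: option, then database, then :stemmed.
def resolve_strategy(options, db)
  if !options[:stopper_strategy].nil?
    options[:stopper_strategy]
  elsif db
    db.stopper_strategy
  else
    :stemmed
  end
end

# Mirrors the lookup in generate_terms, including the default of 2.
def strategy_code(name)
  ASSUMED_STRATEGIES.fetch(name, 2)
end

p strategy_code(resolve_strategy({ stopper_strategy: :all }, nil))  # prints 1
p strategy_code(resolve_strategy({}, StrategyDb.new(:none)))         # prints 0
p strategy_code(resolve_strategy({}, nil))                           # prints 2
p strategy_code(:no_such_strategy)                                   # prints 2
```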

terms()

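Return the list of terms the database holds for this document. Raises XapianFu::XapianDbNotSet if db is not set, and returns nil if the document has no id (for example, because it has not been saved yet).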

    # File lib/xapian_fu/xapian_doc.rb
    def terms
      raise XapianFu::XapianDbNotSet unless db
      db.ro.termlist(id) if db.respond_to?(:ro) and db.ro and id
    end
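
The guards in terms can be exercised with hypothetical stand-ins: DbNotSet replaces XapianFu::XapianDbNotSet, and FakeRo answers termlist(id) from a Hash instead of a real read-only Xapian database.

```ruby
# Hypothetical stand-in for XapianFu::XapianDbNotSet.
class DbNotSet < StandardError; end

# Hypothetical read-only handle: termlist(id) looks terms up in a Hash.
FakeRo = Struct.new(:terms_by_id) do
  def termlist(id)
    terms_by_id.fetch(id, [])
  end
end
FakeTermDb = Struct.new(:ro)

# Mirrors the guards in XapianDoc#terms.
def terms_for(db, id)
  raise DbNotSet unless db
  db.ro.termlist(id) if db.respond_to?(:ro) && db.ro && id
end

db = FakeTermDb.new(FakeRo.new({ 1 => %w[fox red] }))
p terms_for(db, 1)    # prints ["fox", "red"]
p terms_for(db, nil)  # prints nil: an unsaved document has no id
```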

to_xapian_document()

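Return a Xapian::Document ready for putting into a Xapian database. Requires the db attribute to be set (otherwise raises XapianFu::XapianDbNotSet). The document's data is set, and its values and terms are cleared and regenerated from the fields on every call.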

    # File lib/xapian_fu/xapian_doc.rb
    def to_xapian_document
      raise XapianFu::XapianDbNotSet unless db
      xapian_document.data = data
      # Clear and add values
      xapian_document.clear_values
      add_values_to_xapian_document
      # Clear and add terms
      xapian_document.clear_terms
      generate_terms
      xapian_document
    end
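
Because to_xapian_document clears the existing values and terms before regenerating them, calling it again after editing a document (for example, on a second save) replaces the old terms instead of accumulating them. A minimal sketch of that clear-then-rebuild pattern, using a hypothetical FakeXapianDocument in place of Xapian::Document and a plain word list in place of the term generator:

```ruby
# Hypothetical stand-in for Xapian::Document with only the calls used here.
class FakeXapianDocument
  attr_accessor :data
  attr_reader :terms

  def initialize
    @terms = []
    @values = {}
  end

  def clear_values
    @values.clear
  end

  def clear_terms
    @terms.clear
  end

  def add_term(term)
    @terms << term
  end
end

# Hypothetical helper following the order of XapianDoc#to_xapian_document:
# set data, clear values (re-adding them is not modelled), clear terms,
# then add fresh terms.
def rebuild(xdoc, data, words)
  xdoc.data = data
  xdoc.clear_values
  xdoc.clear_terms
  words.each { |w| xdoc.add_term(w) }
  xdoc
end

xdoc = FakeXapianDocument.new
rebuild(xdoc, "draft 1", %w[apple pie])
rebuild(xdoc, "draft 2", %w[apple tart])
p xdoc.terms  # prints ["apple", "tart"]
p xdoc.data   # prints "draft 2"
```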

update()

Replace the copy of this document stored under its id in the Xapian database.

    # File lib/xapian_fu/xapian_doc.rb
    def update
      db.rw.replace_document(id, to_xapian_document)
    end

values()

The XapianFu::XapianDocValueAccessor for accessing the values in this document.

    # File lib/xapian_fu/xapian_doc.rb
    def values
      @value_accessor ||= XapianDocValueAccessor.new(self)
    end

xapian_document()

The Xapian::Document for this XapianFu::XapianDoc. If this document was retrieved from a XapianDb, it is the document Xapian returned; otherwise a new Xapian::Document is allocated on first access and reused afterwards.

    # File lib/xapian_fu/xapian_doc.rb
    def xapian_document
      @xapian_document ||= Xapian::Document.new
    end

Private Instance Methods

add_values_to_xapian_document()

Copy each field named in the database's store_values into this document's values, and return the list of those field names.

    # File lib/xapian_fu/xapian_doc.rb
    def add_values_to_xapian_document
      db.store_values.collect do |key|
        values[key] = fields[key]
        key
      end
    end
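
Only the fields named in store_values are copied, and the method returns those names. A sketch with hypothetical stand-ins: copy_store_values is an illustrative helper, a plain Array plays db.store_values, and a plain Hash plays the XapianDocValueAccessor.

```ruby
# Mirrors XapianDoc#add_values_to_xapian_document with plain Ruby objects:
# store_values is the list of field names, values a Hash standing in for
# the XapianDocValueAccessor.
def copy_store_values(store_values, fields, values)
  store_values.collect do |key|
    values[key] = fields[key]
    key
  end
end

values = {}
stored = copy_store_values([:price, :created_at], { title: "Lamp", price: 30 }, values)
p stored  # prints [:price, :created_at]
p values  # :title is not copied; :created_at is absent from the fields, so nil
```

In this sketch a stored field missing from the document is written as nil; how the real value accessor treats nil is not covered here.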

fields_with_field_names_only()

Array of field names to index only with their field-name prefix (no unprefixed terms are added). Empty when no database is set.

    # File lib/xapian_fu/xapian_doc.rb
    def fields_with_field_names_only
      db ? db.fields_with_field_names_only : []
    end

fields_without_field_names()

Array of field names to index only without their field-name prefix. Empty when no database is set.

    # File lib/xapian_fu/xapian_doc.rb
    def fields_without_field_names
      db ? db.fields_without_field_names : []
    end

generate_terms()

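Run the Xapian term generator over this document's fields: each indexed field is added with and/or without its field-name prefix, exact-match fields also get a single whole-value term, and boolean fields get boolean terms.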

    # File lib/xapian_fu/xapian_doc.rb
    def generate_terms
      tg = Xapian::TermGenerator.new
      tg.database = db.rw
      tg.document = xapian_document
      tg.stopper = stopper if stopper
      tg.stemmer = stemmer
      tg.set_stopper_strategy(XapianDoc::STOPPER_STRATEGIES.fetch(stopper_strategy, 2))
      tg.set_flags Xapian::TermGenerator::FLAG_SPELLING if db.spelling
      index_method = db.index_positions ? :index_text : :index_text_without_positions
      fields.each do |k,o|
        next if unindexed_fields.include?(k)

        if db.fields[k] == Array
          values = Array(o)
        else
          values = [o]
        end

        values.each do |v|
          if v.respond_to?(:to_xapian_fu_string)
            v = v.to_xapian_fu_string
          else
            v = v.to_s
          end

          # get the custom term weight if a weights function exists
          weight = db.weights_function ? db.weights_function.call(k, v, fields).to_i : db.field_weights[k]
          # add value with field name
          tg.send(index_method, v, weight, 'X' + k.to_s.upcase) unless fields_without_field_names.include?(k)
          # add value without field name
          tg.send(index_method, v, weight) unless fields_with_field_names_only.include?(k)

          if db.field_options[k] && db.field_options[k][:exact]
            xapian_document.add_term("X#{k.to_s.upcase}#{v.to_s.downcase}", weight)
          end
        end
      end

      db.boolean_fields.each do |name|
        Array(fields[name]).each do |value|
          xapian_document.add_boolean_term("X#{name.to_s.upcase}#{value.to_s.downcase}")
        end
      end

      xapian_document
    end
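
The prefix rules in generate_terms can be summarised without running Xapian. The sketch below is a simplified, hypothetical model: indexing_plan records which index_text calls and direct terms generate_terms would produce for each field, but does not tokenize, stem or weight anything, and ignores Array-typed fields. Its keyword arguments stand in for fields_without_field_names, fields_with_field_names_only, unindexed_fields, the :exact field option and db.boolean_fields.

```ruby
# Hypothetical, simplified model of the prefix rules in
# XapianDoc#generate_terms. Each entry records one call the real method
# would make; tokenizing, stemming, weights and Array fields are ignored.
def indexing_plan(fields, without_names: [], names_only: [], unindexed: [],
                  exact: [], boolean: [])
  plan = []
  fields.each do |k, v|
    next if unindexed.include?(k)

    text = v.to_s
    prefix = "X#{k.to_s.upcase}"
    # index_text with the field-name prefix
    plan << [:text, prefix, text] unless without_names.include?(k)
    # index_text without a prefix
    plan << [:text, nil, text] unless names_only.include?(k)
    # exact-match term: prefix plus the whole lowercased value
    plan << [:term, "#{prefix}#{text.downcase}"] if exact.include?(k)
  end
  boolean.each do |name|
    Array(fields[name]).each do |value|
      plan << [:boolean, "X#{name.to_s.upcase}#{value.to_s.downcase}"]
    end
  end
  plan
end

plan = indexing_plan({ title: "Red Fox", tag: "Animals", notes: "internal" },
                     names_only: [:tag], unindexed: [:notes],
                     exact: [:title], boolean: [:tag])
plan.each { |entry| p entry }
```

The exact and boolean terms are built from the whole lowercased value, so "Red Fox" yields the single term XTITLEred fox, whereas Xapian's index_text splits the text into individual (possibly stemmed) words.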

unindexed_fields()

Array of field names not to run through the TermGenerator

    # File lib/xapian_fu/xapian_doc.rb
    def unindexed_fields
      db ? db.unindexed_fields : []
    end