class XapianFu::XapianDoc
A XapianDoc represents a document in a XapianDb. Searches return XapianDoc objects, and they are used internally when adding new documents to the database. You usually don't need to instantiate them yourself unless you're doing something a bit advanced.
Constants
- STOPPER_STRATEGIES
Attributes
- data: An arbitrary blob of data stored alongside the document in the Xapian database.
- db: The XapianDb object that this document was retrieved from, or should be stored in.
- fields: A hash of the fields given to this object on initialize.
- id: The unsigned integer “primary key” for this document in the Xapian database.
- match: The Xapian::Match object for this document when returned as part of a search result.
- weight: The search score of this document when returned as part of a search result.
Public Class Methods
Expects a Xapian::Match, a Xapian::Document, a Hash-like object, an object with a to_xapian_fu_string method, or anything with a to_s method. Anything else raises a XapianTypeError. The :weight option sets the search weight when setting up search results. The :data option sets some additional data to be stored with the document in the database. The :xapian_db option sets the XapianDb to allow saves and term enumeration.
    # File lib/xapian_fu/xapian_doc.rb, line 73
    def initialize(doc, options = {})
      @options = options

      @fields = {}
      if doc.is_a? Xapian::Match
        match = doc
        doc = match.document
        @match = match
        @weight = @match.weight
      end

      # Handle initialisation from a Xapian::Document, which is
      # usually a search result from a Xapian database
      if doc.is_a?(Xapian::Document)
        @xapian_document = doc
        @id = doc.docid
      # Handle initialisation from a hash-like object
      elsif doc.respond_to?(:has_key?) and doc.respond_to?("[]")
        @fields = doc
        @id = doc[:id] if doc.has_key?(:id)
      # Handle initialisation from an object with a to_xapian_fu_string method
      elsif doc.respond_to?(:to_xapian_fu_string)
        @fields = { :content => doc.to_xapian_fu_string }
      # Handle initialisation from anything else that can be coerced
      # into a string
      elsif doc.respond_to? :to_s
        @fields = { :content => doc.to_s }
      else
        raise XapianTypeError, "Can't handle indexing a '#{doc.class}' object"
      end
      @weight = options[:weight] if options[:weight]
      @data = options[:data] if options[:data]
      @db = options[:xapian_db] if options[:xapian_db]
    end
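The order of the checks in the constructor matters, because almost every Ruby object responds to to_s. The following is a minimal standalone sketch of that order, not the library's code: it does not load Xapian, it leaves out the Xapian::Match and Xapian::Document branches, and `fields_for`, `Note`, and the use of TypeError in place of XapianTypeError are assumptions made for illustration.

```ruby
# Hypothetical helper mirroring the duck-typing order of the constructor.
# Returns the fields hash that would be built for the given input.
def fields_for(doc)
  if doc.respond_to?(:has_key?) && doc.respond_to?(:[])
    doc                                       # hash-like: used as the fields directly
  elsif doc.respond_to?(:to_xapian_fu_string)
    { :content => doc.to_xapian_fu_string }   # custom conversion is preferred over to_s
  elsif doc.respond_to?(:to_s)
    { :content => doc.to_s }                  # last resort: plain string coercion
  else
    # TypeError stands in for XapianTypeError in this sketch
    raise TypeError, "Can't handle indexing a '#{doc.class}' object"
  end
end

# Illustrative class, not part of XapianFu.
class Note
  def to_xapian_fu_string
    "indexed text"
  end

  def to_s
    "display text"
  end
end

p fields_for({ :id => 7, :title => "Hello" })
p fields_for(Note.new)
p fields_for(42)
```

An object that defines both to_xapian_fu_string and to_s is indexed using the former, which lets a class keep a display string separate from its searchable text.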
Public Instance Methods
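Compare IDs with another XapianDoc. Two documents are equal when their IDs match and they belong to the same XapianDb, or to databases with the same dir.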
    # File lib/xapian_fu/xapian_doc.rb, line 149
    def ==(b)
      if b.is_a?(XapianDoc)
        id == b.id && (db == b.db || db.dir == b.db.dir)
      else
        super(b)
      end
    end
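As a rough illustration of these comparison rules, the sketch below reimplements the same check on stand-in classes. `FakeDb` and `FakeDoc` are hypothetical and exist only so the example runs without Xapian.

```ruby
# Stand-in for a database handle; only the directory matters here.
class FakeDb
  attr_reader :dir

  def initialize(dir)
    @dir = dir
  end
end

# Stand-in for a document, with the same equality rule as the method above.
class FakeDoc
  attr_reader :id, :db

  def initialize(id, db)
    @id = id
    @db = db
  end

  def ==(other)
    if other.is_a?(FakeDoc)
      # Same id, and either the same db object or two handles on the same directory
      id == other.id && (db == other.db || db.dir == other.db.dir)
    else
      super(other)
    end
  end
end

first_handle  = FakeDb.new("example.db")
second_handle = FakeDb.new("example.db")   # different object, same directory
other_db      = FakeDb.new("other.db")

puts FakeDoc.new(1, first_handle) == FakeDoc.new(1, second_handle)
puts FakeDoc.new(1, first_handle) == FakeDoc.new(1, other_db)
puts FakeDoc.new(1, first_handle) == FakeDoc.new(2, first_handle)
```

Note that the comparison calls dir on both databases, so it assumes both documents have a database set.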
Add this document to the Xapian Database
    # File lib/xapian_fu/xapian_doc.rb, line 171
    def create
      self.id = db.rw.add_document(to_xapian_document)
    end
    # File lib/xapian_fu/xapian_doc.rb, line 157
    def inspect
      s = ["<#{self.class.to_s} id=#{id}"]
      s << "weight=%.5f" % weight if weight
      s << "db=#{db.nil? ? 'nil' : db}"
      s.join(' ') + ">"
    end
Return this document's language, which is set on initialize, inherited from the database, or defaults to :english.
    # File lib/xapian_fu/xapian_doc.rb, line 253
    def language
      if @language
        @language
      else
        @language =
          if ! @options[:language].nil?
            @options[:language]
          elsif db and db.language
            db.language
          else
            :english
          end
      end
    end
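The precedence can be summarised in a few lines. This is a simplified sketch rather than the library's code: `resolve_language` is a hypothetical helper, and `db_language` stands in for db.language (nil when no database is set or the database has no language).

```ruby
# Hypothetical helper: explicit option first, then the database, then :english.
def resolve_language(options, db_language = nil)
  return options[:language] unless options[:language].nil?
  return db_language if db_language

  :english
end

p resolve_language({ :language => :french }, :german)
p resolve_language({}, :german)
p resolve_language({})
```

The real method also memoizes the result in @language, so the lookup runs only once per document.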
Add this document to the Xapian Database, or replace it if it already has an id.
    # File lib/xapian_fu/xapian_doc.rb, line 166
    def save
      id ? update : create
    end
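The create-or-replace behaviour can be tried out without Xapian using an in-memory stand-in. In this sketch `FakeWritableDb` and `FakeSavable` are hypothetical classes; only the shape of save, create, and update follows the methods documented here.

```ruby
# In-memory stand-in for a writable database; not the Xapian API.
class FakeWritableDb
  attr_reader :docs

  def initialize
    @docs = {}
    @next_id = 0
  end

  # Stores the document under a fresh id and returns that id.
  def add_document(doc)
    @next_id += 1
    @docs[@next_id] = doc
    @next_id
  end

  # Overwrites whatever is stored under the given id.
  def replace_document(id, doc)
    @docs[id] = doc
  end
end

# Stand-in document that saves plain text instead of a Xapian::Document.
class FakeSavable
  attr_accessor :id, :text

  def initialize(db, text)
    @db = db
    @text = text
  end

  # No id yet means the document is new; otherwise replace the stored copy.
  def save
    id ? update : create
  end

  def create
    self.id = @db.add_document(text)
  end

  def update
    @db.replace_document(id, text)
  end
end

store = FakeWritableDb.new
note = FakeSavable.new(store, "first draft")
note.save                 # no id yet, so this creates
note.text = "second draft"
note.save                 # id is set, so this replaces
p store.docs
```

Because the id is assigned by the first save, calling save repeatedly on the same object never creates duplicates.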
Return the stemmer for this document. If not set on initialize by the :stemmer or :language option, it will try the database's stemmer and otherwise defaults to an English stemmer.
    # File lib/xapian_fu/xapian_doc.rb, line 191
    def stemmer
      if @stemmer
        @stemmer
      else
        @stemmer =
          if ! @options[:stemmer].nil?
            @options[:stemmer]
          elsif @options[:language]
            @options[:language]
          elsif db
            db.stemmer
          else
            :english
          end
        @stemmer = StemFactory.stemmer_for(@stemmer)
      end
    end
Set the stemmer to use for this document. Accepts any string that the Xapian::Stem class accepts (either the English name for the language or the two-letter ISO 639 code). Can also be an existing Xapian::Stem object.
    # File lib/xapian_fu/xapian_doc.rb, line 184
    def stemmer=(s)
      @stemmer = StemFactory.stemmer_for(s)
    end
Return the stopper for this document. If not set on initialize by the :stopper or :language option, it will try the database's stopper and otherwise defaults to an English stopper.
    # File lib/xapian_fu/xapian_doc.rb, line 212
    def stopper
      if @stopper
        @stopper
      else
        @stopper =
          if ! @options[:stopper].nil?
            @options[:stopper]
          elsif @options[:language]
            @options[:language]
          elsif db
            db.stopper
          else
            :english
          end
        @stopper = StopperFactory.stopper_for(@stopper)
      end
    end
    # File lib/xapian_fu/xapian_doc.rb, line 236
    def stopper_strategy
      if @stopper_strategy
        @stopper_strategy
      else
        @stopper_strategy =
          if ! @options[:stopper_strategy].nil?
            @options[:stopper_strategy]
          elsif db
            db.stopper_strategy
          else
            :stemmed
          end
      end
    end
Return a list of terms that the db has for this document.
    # File lib/xapian_fu/xapian_doc.rb, line 121
    def terms
      raise XapianFu::XapianDbNotSet unless db
      db.ro.termlist(id) if db.respond_to?(:ro) and db.ro and id
    end
Return a Xapian::Document ready for putting into a Xapian database. Requires that the db attribute has been set up.
    # File lib/xapian_fu/xapian_doc.rb, line 128
    def to_xapian_document
      raise XapianFu::XapianDbNotSet unless db
      xapian_document.data = data
      # Clear and add values
      xapian_document.clear_values
      add_values_to_xapian_document
      # Clear and add terms
      xapian_document.clear_terms
      generate_terms
      xapian_document
    end
Update this document in the Xapian Database
    # File lib/xapian_fu/xapian_doc.rb, line 176
    def update
      db.rw.replace_document(id, to_xapian_document)
    end
The XapianFu::XapianDocValueAccessor for accessing the values in this document.
    # File lib/xapian_fu/xapian_doc.rb, line 116
    def values
      @value_accessor ||= XapianDocValueAccessor.new(self)
    end
The Xapian::Document for this XapianFu::XapianDoc. If this document was retrieved from a XapianDb then this will have been initialized by Xapian, otherwise a new Xapian::Document is allocated.
    # File lib/xapian_fu/xapian_doc.rb, line 144
    def xapian_document
      @xapian_document ||= Xapian::Document.new
    end
Private Instance Methods
Add all the fields to be stored as XapianDb values.
    # File lib/xapian_fu/xapian_doc.rb, line 286
    def add_values_to_xapian_document
      db.store_values.collect do |key|
        values[key] = fields[key]
        key
      end
    end
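In other words, only the fields that the database is configured to store are copied into the document's values. A minimal sketch of that selection, using a plain Hash in place of the value accessor (`stored_values` is a hypothetical helper, not part of XapianFu):

```ruby
# Hypothetical helper: copies only the configured keys from fields into a new hash.
# store_keys plays the role of db.store_values.
def stored_values(fields, store_keys)
  store_keys.each_with_object({}) do |key, values|
    values[key] = fields[key]   # a field missing from the document is stored as nil
  end
end

p stored_values({ :title => "Hello", :body => "Long text" }, [:title])
```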
Array of field names to index with field names only.
    # File lib/xapian_fu/xapian_doc.rb, line 281
    def fields_with_field_names_only
      db ? db.fields_with_field_names_only : []
    end
Array of field names not to index with field names.
    # File lib/xapian_fu/xapian_doc.rb, line 276
    def fields_without_field_names
      db ? db.fields_without_field_names : []
    end
Run the Xapian term generator against this document's text.
    # File lib/xapian_fu/xapian_doc.rb, line 294
    def generate_terms
      tg = Xapian::TermGenerator.new
      tg.database = db.rw
      tg.document = xapian_document
      tg.stopper = stopper if stopper
      tg.stemmer = stemmer
      tg.set_stopper_strategy(XapianDoc::STOPPER_STRATEGIES.fetch(stopper_strategy, 2))
      tg.set_flags Xapian::TermGenerator::FLAG_SPELLING if db.spelling
      index_method = db.index_positions ? :index_text : :index_text_without_positions
      fields.each do |k,o|
        next if unindexed_fields.include?(k)

        if db.fields[k] == Array
          values = Array(o)
        else
          values = [o]
        end

        values.each do |v|
          if v.respond_to?(:to_xapian_fu_string)
            v = v.to_xapian_fu_string
          else
            v = v.to_s
          end

          # get the custom term weight if a weights function exists
          weight = db.weights_function ? db.weights_function.call(k, v, fields).to_i : db.field_weights[k]
          # add value with field name
          tg.send(index_method, v, weight, 'X' + k.to_s.upcase) unless fields_without_field_names.include?(k)
          # add value without field name
          tg.send(index_method, v, weight) unless fields_with_field_names_only.include?(k)

          if db.field_options[k] && db.field_options[k][:exact]
            xapian_document.add_term("X#{k.to_s.upcase}#{v.to_s.downcase}", weight)
          end
        end
      end

      db.boolean_fields.each do |name|
        Array(fields[name]).each do |value|
          xapian_document.add_boolean_term("X#{name.to_s.upcase}#{value.to_s.downcase}")
        end
      end

      xapian_document
    end
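The term strings built for exact and boolean fields follow a simple scheme: an "X" prefix, the upcased field name, then the downcased value. The sketch below isolates that string handling so it can be run without Xapian. `prefixed_term` and `boolean_terms` are hypothetical helpers, and the field names and values are made up for illustration.

```ruby
# Hypothetical helper: builds a prefixed term the same way the method above does.
def prefixed_term(field, value)
  "X#{field.to_s.upcase}#{value.to_s.downcase}"
end

# Hypothetical helper: one term per value. Array() lets a field hold
# a single value, several values, or nil.
def boolean_terms(field, raw_value)
  Array(raw_value).map { |value| prefixed_term(field, value) }
end

p prefixed_term(:colour, "Red")
p boolean_terms(:tag, ["Ruby", "Search"])
p boolean_terms(:tag, "One")
p boolean_terms(:tag, nil)
```

Downcasing the value means lookups on these terms are effectively case-insensitive, provided the query side builds its terms the same way.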
Array of field names not to run through the TermGenerator.
    # File lib/xapian_fu/xapian_doc.rb, line 271
    def unindexed_fields
      db ? db.unindexed_fields : []
    end