class XapianFu::XapianDb

XapianFu::XapianDb encapsulates a Xapian database and handles the setup of stemmers, stoppers, query parsers and so on. It is the core of XapianFu.

Opening and creating the database

The :dir option specifies where the Xapian database is read from and written to. Without it, an in-memory Xapian database is used. By default, the on-disk database will not be created if it doesn't already exist. See the :create option.

Setting the :create option to true will allow XapianDb to create a new Xapian database on-disk. If one already exists, it is just opened. The default is false.

Setting the :overwrite option to true will force XapianDb to wipe the current on-disk database and start afresh. The default is false.

Setting the :type option to either :glass or :chert will force that database backend, if supported. Leave it as nil to auto-detect existing databases and create new databases with the library default (recommended). Requires Xapian >= 1.4.

db = XapianDb.new(:dir => '/tmp/mydb', :create => true)
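
How :create and :overwrite interact can be sketched without Xapian installed. The snippet below mirrors the flag selection in the constructor, using plain symbols in place of the Xapian::DB_* constants; the helper name db_flag_for is hypothetical and not part of XapianFu.

```ruby
# Stand-ins for the Xapian::DB_* constants, so this runs without the gem.
# db_flag_for is a hypothetical helper, not part of XapianFu.
def db_flag_for(options = {})
  flag = :open                                        # default: open only
  flag = :create_or_open if options[:create]
  flag = :create_or_overwrite if options[:overwrite]  # takes precedence over :create
  flag
end

p db_flag_for
p db_flag_for(:create => true)
p db_flag_for(:create => true, :overwrite => true)
```

Because :overwrite is checked last, it wins whenever both options are set.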

Language, Stemmers and Stoppers

The :language option specifies the default document language, and controls the default type of stemmer and stopper that will be used when indexing. The stemmer and stopper can be overridden with the :stemmer and :stopper options.

The :language, :stemmer and :stopper options can be set to one of the following: :danish, :dutch, :english, :finnish, :french, :german, :hungarian, :italian, :norwegian, :portuguese, :romanian, :russian, :spanish, :swedish, :turkish. Set an option to false to specify none.

The default for all is :english.

db = XapianDb.new(:language => :italian, :stopper => false)

The :stopper_strategy option specifies the default stop strategy that will be used when indexing. It can be :none, :all or :stemmed, and defaults to :stemmed.
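
As a rough illustration of how these defaults cascade, the sketch below repeats the Hash#fetch calls from the constructor in a standalone helper. The name resolve_language_options is hypothetical; only the fetch logic is taken from the source listing.

```ruby
# resolve_language_options is a hypothetical helper mirroring the
# Hash#fetch calls in XapianDb#initialize.
def resolve_language_options(options = {})
  language = options.fetch(:language, :english)
  {
    :language         => language,
    :stemmer          => options.fetch(:stemmer, language),
    :stopper          => options.fetch(:stopper, language),
    :stopper_strategy => options.fetch(:stopper_strategy, :stemmed)
  }
end

p resolve_language_options(:language => :italian, :stopper => false)
```

Using fetch rather than `||` means an explicit false is kept, so :stopper => false disables the stopper instead of falling back to the language default.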

Spelling suggestions

The :spelling option controls generation of a spelling dictionary during indexing and its use during searches. When enabled, Xapian will build a dictionary of words for the database whilst indexing documents and will enable spelling suggestion by default for searches. Building the dictionary will impact indexing performance and database size. It is enabled by default. See the search section for information on getting spelling correction information during searches.

Fields and values

The :store option specifies which document fields should be stored in the database. By default, fields are only indexed - the original values cannot be retrieved.

The :sortable option specifies which document fields will be available for sorting results on. It does the same thing as :store and exists only so that you can be explicit.

The :collapsible option specifies which document fields can be used to group (“collapse”) results. It also does the same thing as :store and exists only so that you can be explicit.
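
The way the three options end up in one list can be sketched as follows. The helper name merged_store_values is hypothetical; the flatten, uniq and compact steps are the ones used in the constructor.

```ruby
# merged_store_values is a hypothetical helper; it repeats the
# flatten/uniq/compact steps from XapianDb#initialize.
def merged_store_values(options = {})
  values = []
  values << options[:store]
  values << options[:sortable]
  values << options[:collapsible]
  values.flatten.uniq.compact   # accept single names or arrays, drop nils and repeats
end

p merged_store_values(:store => [:title, :author],
                      :sortable => :created_at,
                      :collapsible => :author)
```

A field named in more than one option is stored once.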

A more complete way of defining fields is available:

XapianDb.new(:fields => { :title => { :type => String },
                          :slug => { :type => String, :index => false },
                          :created_at => { :type => Time, :store => true },
                          :votes => { :type => Fixnum, :store => true },
                        })

XapianFu will use the :type option when instantiating a stored value, so you'll get back a Time object rather than the result of Time's to_s method, which is the default. Defining the type for numerical classes (such as Time, Fixnum and Bignum) allows XapianFu to store them on disk in a much more efficient way, and to sort them efficiently (without having to resort to storing leading zeros or anything like that).
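
To see why a typed numeric encoding matters for sorting, compare plain to_s strings with a fixed-width binary form. This is only an illustration of the principle: the 4-byte big-endian packing below is an assumption chosen for the example and is not the encoding XapianFu or Xapian actually uses.

```ruby
votes = [300, 5, 40]

# Sorting the to_s forms compares character by character,
# so "300" sorts before "40", which sorts before "5".
as_strings = votes.map(&:to_s).sort

# Packing each value as a 4-byte big-endian unsigned integer gives
# fixed-width strings whose byte order matches numeric order
# (valid for 0..2**32-1 only).
as_packed = votes.sort_by { |n| [n].pack('N') }

p as_strings
p as_packed
```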

Indexing options

If :index is false, then the field will not be tokenized, stemmed or stopped. It will only be searchable by its entire exact contents. This is useful for fields where only exact matches make sense, such as slugs, identifiers or keys.

If :index is true (the default) then the field will be tokenized, stemmed and stopped twice, once with the field name and once without. This allows searches like both “name:lily” and simply “lily”, but it does require that the full text of the field content is indexed twice, which increases the size of your index on disk.

If you know you will never need to search the field using its field name, then you can set :index to :without_field_names and only one tokenization pass will be done, without the field names as token prefixes.

If you know you will only ever search the field using its field name, then you can set :index to :with_field_names_only and only one tokenization pass will be done, with only the field names as token prefixes.
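
The three tokenizing modes can be pictured with a much simplified sketch. The helper terms_for is hypothetical: it only lowercases and splits on whitespace (no stemming or stop words), and the "X" plus upper-cased field name prefix is an assumption borrowed from the boolean term format shown in boolean_filter_query. The :index => false case is not modelled.

```ruby
# terms_for is a hypothetical helper. It lowercases and splits on
# whitespace only; real indexing also stems and applies stop words.
def terms_for(field, text, index = true)
  tokens   = text.downcase.split
  prefixed = tokens.map { |t| "X#{field.to_s.upcase}#{t}" }
  case index
  when true                   then tokens + prefixed   # two passes
  when :without_field_names   then tokens              # "lily" only
  when :with_field_names_only then prefixed            # "name:lily" only
  else raise ArgumentError, "mode #{index.inspect} is not modelled here"
  end
end

p terms_for(:name, "Lily Allen")
p terms_for(:name, "Lily Allen", :without_field_names)
p terms_for(:name, "Lily Allen", :with_field_names_only)
```

The default mode produces twice as many terms as either single-pass mode, which is where the extra index size comes from.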

Term Weights

The :weights option accepts a Proc or lambda that sets custom term weights.

Your function will receive the term key, its value and the full hash of fields, and should return an integer weight to be applied to that term when the document is indexed.

In this example,

XapianDb.new(:weights => Proc.new do |key, value, fields|
  # next, not return: return inside a Proc tries to exit the enclosing method
  next 10 if fields.keys.include?('culturally_important')
  next 3  if key == 'title'
  1
end)

terms in the title will be weighted three times as heavily as other terms, and all terms in 'culturally important' items will be weighted ten times as heavily.
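
A weights function can be exercised on its own before it is handed to XapianDb. The sketch below assumes the function is called with (key, value, fields) as described above; the sample documents are made up for illustration.

```ruby
# A weights function written as a lambda, so `return` leaves only the lambda.
weights = lambda do |key, value, fields|
  return 10 if fields.keys.include?('culturally_important')
  return 3  if key == 'title'
  1
end

# Hypothetical documents, represented as plain field hashes.
plain   = { 'title' => 'Shopping list', 'body' => 'eggs, milk' }
notable = plain.merge('culturally_important' => true)

p weights.call('title', plain['title'], plain)       # title term, plain document
p weights.call('body',  plain['body'],  plain)       # any other term
p weights.call('body',  notable['body'], notable)    # any term, flagged document
```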

Attributes

boolean_fields[R]

An array of fields that will be treated as boolean terms

dir[R]

Path to the on-disk database. Nil for an in-memory database.

field_options[R]
field_weights[R]
fields[R]

A hash of field names and their types

fields_with_field_names_only[R]

An array of fields to be indexed only with their field names

fields_without_field_names[R]

An array of fields to be indexed without their field names

index_positions[R]

True if term positions will be stored

language[R]

The default document language. Used for setting up stoppers and stemmers.

sortable_fields[R]
spelling[R]

Whether this db will generate a spelling dictionary during indexing

stopper_strategy[RW]

The default stopper strategy

store_values[R]

An array of the fields that will be stored in the Xapian database

unindexed_fields[R]

An array of fields that will not be indexed

weights_function[RW]

Public Class Methods

new( options = { } )

# File lib/xapian_fu/xapian_db.rb
def initialize( options = { } )
  @options = { :index_positions => true, :spelling => true }.merge(options)
  @dir = @options[:dir]
  @index_positions = @options[:index_positions]
  @db_flag = Xapian::DB_OPEN
  @db_flag = Xapian::DB_CREATE_OR_OPEN if @options[:create]
  @db_flag = Xapian::DB_CREATE_OR_OVERWRITE if @options[:overwrite]
  case @options[:type]
  when :glass
    raise XapianFuError.new("type glass not recognised") unless defined?(Xapian::DB_BACKEND_GLASS)
    @db_flag |= Xapian::DB_BACKEND_GLASS
  when :chert
    raise XapianFuError.new("type chert not recognised") unless defined?(Xapian::DB_BACKEND_CHERT)
    @db_flag |= Xapian::DB_BACKEND_CHERT
  when nil
    # use library defaults
  else
    raise XapianFuError.new("type #{@options[:type].inspect} not recognised")
  end
  @tx_mutex = Mutex.new
  @language = @options.fetch(:language, :english)
  @stemmer = @options.fetch(:stemmer, @language)
  @stopper = @options.fetch(:stopper, @language)
  @stopper_strategy = @options.fetch(:stopper_strategy, :stemmed)
  @field_options = {}
  setup_fields(@options[:fields])
  @store_values << @options[:store]
  @store_values << @options[:sortable]
  @store_values << @options[:collapsible]
  @store_values = @store_values.flatten.uniq.compact
  @spelling = @options[:spelling]
  @weights_function = @options[:weights]
end

Public Instance Methods

<<(doc)

Alias for: add_doc

add_doc(doc)

Short-cut to documents.add

# File lib/xapian_fu/xapian_db.rb
def add_doc(doc)
  documents.add(doc)
end

Also aliased as: <<

add_synonym(term, synonym)

Add a synonym to the database.

If you want to search with synonym support, remember to add the option:

db.search("foo", :synonyms => true)

Note that in-memory databases don't support synonyms.

# File lib/xapian_fu/xapian_db.rb
def add_synonym(term, synonym)
  rw.add_synonym(term, synonym)
end

close()

Closes the database.

# File lib/xapian_fu/xapian_db.rb
def close
  raise ConcurrencyError if @tx_mutex.locked?

  @rw.close if @rw
  @rw = nil

  @ro.close if @ro
  @ro = nil
end

documents()

The XapianFu::XapianDocumentsAccessor for this database

# File lib/xapian_fu/xapian_db.rb
def documents
  @documents_accessor ||= XapianDocumentsAccessor.new(self)
end

flush()

Flush any changes to disk and reopen the read-only database. Raises ConcurrencyError if a transaction is in progress.

# File lib/xapian_fu/xapian_db.rb
def flush
  raise ConcurrencyError if @tx_mutex.locked?
  rw.flush
  ro.reopen
end

ro()

The read-only Xapian::Database

# File lib/xapian_fu/xapian_db.rb
def ro
  @ro ||= setup_ro_db
end

rw()

The writable Xapian::WritableDatabase

# File lib/xapian_fu/xapian_db.rb
def rw
  @rw ||= setup_rw_db
end

serialize_value(field, value, type = nil)

# File lib/xapian_fu/xapian_db.rb
def serialize_value(field, value, type = nil)
  if sortable_fields.include?(field)
    Xapian.sortable_serialise(value)
  else
    (type || fields[field] || Object).to_xapian_fu_storage_value(value)
  end
end

size()

The number of docs in the Xapian database

# File lib/xapian_fu/xapian_db.rb
def size
  ro.doccount
end

stemmer()

Return a new stemmer object for this database

# File lib/xapian_fu/xapian_db.rb
def stemmer
  StemFactory.stemmer_for(@stemmer)
end

stopper()

The stopper object for this database

# File lib/xapian_fu/xapian_db.rb
def stopper
  StopperFactory.stopper_for(@stopper)
end

transaction(flush_on_commit = true) { || ... }

Run the given block in a XapianDB transaction. Any changes to the Xapian database made in the block will be atomically committed at the end.

If an exception is raised by the block, all changes are discarded and the exception re-raised.

Xapian does not support multiple concurrent transactions on the same Xapian database. Any attempts at this will be serialized by XapianFu, which is not perfect but probably better than just kicking up an exception.

# File lib/xapian_fu/xapian_db.rb
def transaction(flush_on_commit = true)
  @tx_mutex.synchronize do
    begin
      rw.begin_transaction(flush_on_commit)
      yield
    rescue Exception => e
      rw.cancel_transaction
      ro.reopen
      raise e
    end
    rw.commit_transaction
    ro.reopen
  end
end
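
The commit and rollback flow can be tried without a real database. In the sketch below, FakeWritableDb and TxRunner are hypothetical stand-ins that only record which calls were made; they mimic the control flow of the method above and nothing else.

```ruby
# FakeWritableDb and TxRunner are hypothetical stand-ins that record calls;
# they only mimic the control flow of XapianDb#transaction.
class FakeWritableDb
  attr_reader :calls

  def initialize
    @calls = []
  end

  def begin_transaction(flush = true)
    @calls << :begin
  end

  def commit_transaction
    @calls << :commit
  end

  def cancel_transaction
    @calls << :cancel
  end
end

class TxRunner
  def initialize(db)
    @db = db
    @tx_mutex = Mutex.new
  end

  def transaction(flush_on_commit = true)
    @tx_mutex.synchronize do
      begin
        @db.begin_transaction(flush_on_commit)
        yield
      rescue Exception => e
        @db.cancel_transaction   # discard everything done in the block
        raise e
      end
      @db.commit_transaction     # reached only if the block succeeded
    end
  end
end

db = FakeWritableDb.new
runner = TxRunner.new(db)

runner.transaction { :ok }
begin
  runner.transaction { raise ArgumentError, "bad document" }
rescue ArgumentError
  # the error is re-raised to the caller after the rollback
end

p db.calls
```

The first block commits; the second is cancelled and its exception still reaches the caller.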

unserialize_value(field, value, type = nil)

# File lib/xapian_fu/xapian_db.rb
def unserialize_value(field, value, type = nil)
  if sortable_fields.include?(field)
    Xapian.sortable_unserialise(value)
  else
    (type || fields[field] || Object).from_xapian_fu_storage_value(value)
  end
end

Private Instance Methods

boolean_filter_query(field, values)

# File lib/xapian_fu/xapian_db.rb
def boolean_filter_query(field, values)
  subqueries = values.map do |value|
    Xapian::Query.new("X#{field.to_s.upcase}#{value.to_s.downcase}")
  end

  Xapian::Query.new(Xapian::Query::OP_OR, subqueries)
end
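
The term strings built above can be inspected on their own. The helper boolean_term below is hypothetical; it only repeats the string interpolation from the method and does not touch Xapian.

```ruby
# boolean_term is a hypothetical helper that repeats the string
# interpolation used in boolean_filter_query.
def boolean_term(field, value)
  "X#{field.to_s.upcase}#{value.to_s.downcase}"
end

p boolean_term(:colour, "Red")
p [:small, "LARGE"].map { |v| boolean_term(:size, v) }
```

The field name is upper-cased and the value lower-cased, so filter values match regardless of the case they are given in.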

filter_query(query, filter)

# File lib/xapian_fu/xapian_db.rb
def filter_query(query, filter)
  subqueries = filter.map do |field, values|
    values = Array(values)

    if sortable_fields[field]
      sortable_filter_query(field, values)
    elsif boolean_fields.include?(field)
      boolean_filter_query(field, values)
    end
  end

  combined_subqueries = Xapian::Query.new(Xapian::Query::OP_AND, subqueries)

  Xapian::Query.new(Xapian::Query::OP_FILTER, query, combined_subqueries)
end

setup_fields(field_options)

Set up the fields hash and store_values list from the given options

# File lib/xapian_fu/xapian_db.rb
def setup_fields(field_options)
  @fields = { }
  @unindexed_fields = []
  @fields_without_field_names = []
  @fields_with_field_names_only = []
  @store_values = []
  @sortable_fields = {}
  @boolean_fields = []
  @field_weights = Hash.new(1)
  return nil if field_options.nil?
  default_opts = {
    :store => true,
    :index => true,
    :type => String
  }
  boolean_default_opts = default_opts.merge(
    :store => false,
    :index => false
  )
  # Convert array argument to hash, with String as default type
  if field_options.is_a? Array
    fohash = { }
    field_options.each { |f| fohash[f] = { :type => String } }
    field_options = fohash
  end
  field_options.each do |name,opts|
    # Handle simple setup by type only
    opts = { :type => opts } unless opts.is_a? Hash
    if opts[:boolean]
      opts = boolean_default_opts.merge(opts)
    else
      opts = default_opts.merge(opts)
    end
    @store_values << name if opts[:store]
    @sortable_fields[name] = {:range_prefix => opts[:range_prefix], :range_postfix => opts[:range_postfix]} if opts[:sortable]
    @unindexed_fields << name if opts[:index] == false
    @fields_without_field_names << name if opts[:index] == :without_field_names
    @fields_with_field_names_only << name if opts[:index] == :with_field_names_only
    @boolean_fields << name if opts[:boolean]
    @fields[name] = opts[:type]
    @field_weights[name] = opts[:weight] if opts.include?(:weight)
    @field_options[name] = opts
  end
  @fields
end
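
The option-merging part of this method accepts three shapes of input: an array of names, a name-to-type hash, and a name-to-options hash. A minimal sketch of just that normalization follows; the helper normalize_field_options and the two constants are hypothetical names, and the bookkeeping of the instance variables is left out.

```ruby
DEFAULT_OPTS         = { :store => true, :index => true, :type => String }.freeze
BOOLEAN_DEFAULT_OPTS = DEFAULT_OPTS.merge(:store => false, :index => false).freeze

# normalize_field_options is a hypothetical helper covering only the
# option-merging part of setup_fields.
def normalize_field_options(field_options)
  return {} if field_options.nil?
  # An array of names becomes a hash with String as the type.
  if field_options.is_a?(Array)
    field_options = field_options.to_h { |f| [f, { :type => String }] }
  end
  field_options.to_h do |name, opts|
    opts = { :type => opts } unless opts.is_a?(Hash)   # shorthand: type only
    defaults = opts[:boolean] ? BOOLEAN_DEFAULT_OPTS : DEFAULT_OPTS
    [name, defaults.merge(opts)]
  end
end

fields = normalize_field_options(:title  => String,
                                 :colour => { :boolean => true },
                                 :slug   => { :index => false })
p fields[:title]
p fields[:colour]
p fields[:slug]
```

Boolean fields default to being neither stored nor indexed, while explicit options always override the defaults.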

setup_ordering(enquiry, order = nil, reverse = true)

Setup ordering for the given Xapian::Enquire objects

# File lib/xapian_fu/xapian_db.rb
def setup_ordering(enquiry, order = nil, reverse = true)
  if order.to_s == "id"
    # Sorting by a value that doesn't exist falls back to docid ordering
    enquiry.sort_by_value!((1 << 32)-1, reverse)
    enquiry.docid_order = reverse ? Xapian::Enquire::DESCENDING : Xapian::Enquire::ASCENDING
  elsif order.is_a? String or order.is_a? Symbol
    enquiry.sort_by_value!(XapianDocValueAccessor.value_key(order), reverse)
  else
    enquiry.sort_by_relevance!
  end
  enquiry
end

setup_ro_db()

Setup the read-only database

# File lib/xapian_fu/xapian_db.rb
def setup_ro_db
  if dir
    @ro = Xapian::Database.new(dir)
  else
    # In memory db
    @ro = rw
  end
end

setup_rw_db()

Setup the writable database

# File lib/xapian_fu/xapian_db.rb
def setup_rw_db
  if dir
    @rw = Xapian::WritableDatabase.new(dir, db_flag)
    @rw.flush if @options[:create]
    @rw
  else
    # In memory database
    @spelling = false # inmemory doesn't support spelling
    @rw = Xapian::inmemory_open
  end
end

sortable_filter_query(field, values)

# File lib/xapian_fu/xapian_db.rb
def sortable_filter_query(field, values)
  subqueries = values.map do |value|
    from, to = value.split("..")
    slot = XapianDocValueAccessor.value_key(field)

    if from.empty?
      Xapian::Query.new(Xapian::Query::OP_VALUE_LE, slot, Xapian.sortable_serialise(to.to_f))
    elsif to.nil?
      Xapian::Query.new(Xapian::Query::OP_VALUE_GE, slot, Xapian.sortable_serialise(from.to_f))
    else
      Xapian::Query.new(Xapian::Query::OP_VALUE_RANGE, slot, Xapian.sortable_serialise(from.to_f), Xapian.sortable_serialise(to.to_f))
    end
  end

  Xapian::Query.new(Xapian::Query::OP_OR, subqueries)
end
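
The range strings this method accepts can be explored with a small sketch of the parsing step alone. The helper parse_range is hypothetical; it returns a symbol and floats where the real method builds Xapian::Query objects.

```ruby
# parse_range is a hypothetical helper; it returns a symbol and floats
# where the real method builds Xapian::Query objects.
def parse_range(value)
  from, to = value.split("..")
  if from.empty?
    [:le, to.to_f]               # "..20": upper bound only
  elsif to.nil?
    [:ge, from.to_f]             # "10..": lower bound only
  else
    [:range, from.to_f, to.to_f] # "10..20": both bounds
  end
end

p parse_range("..20")
p parse_range("10..")
p parse_range("10..20")
```

String#split drops trailing empty strings but keeps leading ones, which is why an open upper bound shows up as a nil `to` and an open lower bound as an empty `from`.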