class XapianFu::XapianDb
XapianFu::XapianDb encapsulates a Xapian database and handles setting up stemmers, stoppers, query parsers and the like. It is the core of XapianFu.
Opening and creating the database
The :dir option specifies where the Xapian database is read from and written to. Without it, an in-memory Xapian database is used. By default, the on-disk database will not be created if it doesn't already exist; see the :create option.
Setting the :create option to true allows XapianDb to create a new Xapian database on disk. If one already exists, it is simply opened. The default is false.
Setting the :overwrite option to true forces XapianDb to wipe the current on-disk database and start afresh. The default is false.
Setting the :type option to either :glass or :chert forces that database backend, if supported. Leave it as nil to auto-detect existing databases and create new databases with the library default (recommended). Requires Xapian >= 1.4.
db = XapianDb.new(:dir => '/tmp/mydb', :create => true)
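The way :create and :overwrite interact can be sketched without Xapian installed. This is only an illustration: the symbols below stand in for the Xapian::DB_* constants, and db_flag_for is a hypothetical helper that mirrors the order of the checks in the initialize listing further down.

```ruby
# Stand-in symbols for the Xapian::DB_* constants, so this runs without Xapian.
DB_OPEN                = :open
DB_CREATE_OR_OPEN      = :create_or_open
DB_CREATE_OR_OVERWRITE = :create_or_overwrite

# Hypothetical helper: later checks win, so :overwrite takes precedence
# over :create when both are set.
def db_flag_for(options)
  flag = DB_OPEN
  flag = DB_CREATE_OR_OPEN if options[:create]
  flag = DB_CREATE_OR_OVERWRITE if options[:overwrite]
  flag
end

db_flag_for({})                                      # => :open
db_flag_for(:create => true)                         # => :create_or_open
db_flag_for(:create => true, :overwrite => true)     # => :create_or_overwrite
```

Because the checks run in sequence, passing both options behaves the same as passing :overwrite alone.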
Language, Stemmers and Stoppers
The :language option specifies the default document language, and controls the default type of stemmer and stopper used when indexing. The stemmer and stopper can be overridden with the :stemmer and :stopper options.
The :language, :stemmer and :stopper options can each be set to one of the following: :danish, :dutch, :english, :finnish, :french, :german, :hungarian, :italian, :norwegian, :portuguese, :romanian, :russian, :spanish, :swedish, :turkish. Set an option to false to specify none.
The default for all three is :english.
db = XapianDb.new(:language => :italian, :stopper => false)
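The following sketch shows why an explicit false survives as "no stopper" instead of falling back to the language. It assumes the same Hash#fetch defaulting that appears in the initialize listing further down; resolve_language_options is a hypothetical name used only for this illustration.

```ruby
# Hypothetical helper mirroring the fetch-based defaults in the constructor.
# Hash#fetch only uses the fallback when the key is absent, so an explicit
# false is kept rather than replaced by the language.
def resolve_language_options(options)
  language = options.fetch(:language, :english)
  {
    :language => language,
    :stemmer  => options.fetch(:stemmer, language),
    :stopper  => options.fetch(:stopper, language)
  }
end

resolve_language_options(:language => :italian, :stopper => false)
# => {:language=>:italian, :stemmer=>:italian, :stopper=>false}
```

Using `options[:stopper] || language` instead would silently turn false back into the language default, which is why fetch matters here.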
The :stopper_strategy option specifies the default stop strategy used when indexing. It can be :none, :all or :stemmed, and defaults to :stemmed.
Spelling suggestions
The :spelling
option controls generation of a spelling dictionary during indexing and its use during searches. When enabled, Xapian will build a dictionary of words for the database whilst indexing documents and will enable spelling suggestion by default for searches. Building the dictionary will impact indexing performance and database size. It is enabled by default. See the search section for information on getting spelling correction information during searches.
Fields and values
The :store option specifies which document fields should be stored in the database. By default, fields are only indexed; the original values cannot be retrieved.
The :sortable option specifies which document fields will be available for sorting results on. It does the same thing as :store and exists only so that the intent can be stated explicitly.
The :collapsible option specifies which document fields can be used to group (“collapse”) results. It also does the same thing as :store and exists only so that the intent can be stated explicitly.
A more complete way of defining fields is available:
XapianDb.new(:fields => {
  :title      => { :type => String },
  :slug       => { :type => String, :index => false },
  :created_at => { :type => Time, :store => true },
  :votes      => { :type => Fixnum, :store => true },
})
XapianFu will use the :type option when instantiating a stored value, so you'll get back a Time object rather than the result of Time's to_s method, as is the default. Defining the type for numerical classes (such as Time, Fixnum and Bignum) allows XapianFu to store them on disk in a much more efficient way, and to sort them efficiently (without having to resort to storing leading zeros or anything like that).
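The remark about leading zeros can be illustrated in plain Ruby. This sketch does not touch Xapian; it only shows why numbers stored as ordinary strings sort incorrectly, and what the zero-padding workaround looks like. The helper names string_order and padded_order are made up for this example.

```ruby
# Stored as plain strings, values compare character by character.
def string_order(numbers)
  numbers.map(&:to_s).sort
end

# Zero-padding to a fixed width is the usual workaround for string storage.
def padded_order(numbers, width = 10)
  numbers.map { |n| n.to_s.rjust(width, '0') }.sort
end

votes = [9, 100, 25]
string_order(votes)     # => ["100", "25", "9"]
padded_order(votes, 3)  # => ["009", "025", "100"]
votes.sort              # => [9, 25, 100]
```

Declaring a numeric :type is what lets the library avoid the padded form while still getting the numeric order.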
Indexing options
If :index is false, the field will not be tokenized, stemmed or stopped. It will only be searchable by its entire exact contents. This is useful for fields where only exact matches make sense, such as slugs, identifiers or keys.
If :index is true (the default), the field will be tokenized, stemmed and stopped twice: once with the field name and once without. This allows searches like “name:lily” as well as simply “lily”, but it requires the full text of the field to be indexed twice, which increases the size of your index on disk.
If you know you will never need to search the field using its field name, you can set :index to :without_field_names and only one tokenization pass will be done, without the field names as token prefixes.
If you know you will only ever search the field using its field name, you can set :index to :with_field_names_only and only one tokenization pass will be done, with only the field names as token prefixes.
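The four :index modes can be modelled roughly as follows. This is a simplified illustration, not XapianFu's indexer: stemming and stopping are left out, sketch_terms is a hypothetical helper, and the "X" plus upcased field name prefix is borrowed from the boolean_filter_query listing further down (whether text terms use exactly that scheme is an assumption, as is the single-term form used for :index => false).

```ruby
# Hypothetical model of the terms each :index mode would produce.
def sketch_terms(field, text, index_mode)
  prefix   = "X#{field.to_s.upcase}"
  tokens   = text.downcase.split(/\W+/).reject(&:empty?)
  prefixed = tokens.map { |t| prefix + t }

  case index_mode
  when false                  then [prefix + text.downcase] # one exact-match term
  when :without_field_names   then tokens                   # single pass, no prefix
  when :with_field_names_only then prefixed                 # single pass, prefixed
  else                             tokens + prefixed        # true: both passes
  end
end

sketch_terms(:name, "Lily Allen", true)
# => ["lily", "allen", "XNAMElily", "XNAMEallen"]
sketch_terms(:name, "Lily Allen", :without_field_names)
# => ["lily", "allen"]
```

The default mode produces twice as many terms as either single-pass mode, which is the index size cost described above.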
Term Weights
The :weights option accepts a Proc or lambda that sets custom term weights.
Your function will receive the term key and value and the full list of fields, and should return an integer weight to be applied for that term when the document is indexed.
In this example,
XapianDb.new(:weights => Proc.new do |key, value, fields|
  # next (not return) hands a value back from a Proc block
  next 10 if fields.keys.include?('culturally_important')
  next 3 if key == 'title'
  1
end)
terms in the title will be weighted three times more than other terms, and all terms in 'culturally important' items will be weighted ten times more.
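A weights function of this shape can be exercised on its own, without a database. The sketch below uses next rather than return, because return inside a Proc.new block tries to return from the enclosing method and raises LocalJumpError when there is none. The constant WEIGHTS and the sample field hashes are invented for this illustration, and string keys are an assumption carried over from the example.

```ruby
# Same weighting rules as the example, written so the Proc can be called directly.
WEIGHTS = Proc.new do |key, value, fields|
  next 10 if fields.keys.include?('culturally_important')
  next 3 if key == 'title'
  1
end

ordinary  = { 'title' => 'Teapot', 'body' => 'A plain teapot' }
important = ordinary.merge('culturally_important' => true)

WEIGHTS.call('title', 'Teapot', ordinary)          # => 3
WEIGHTS.call('body', 'A plain teapot', ordinary)   # => 1
WEIGHTS.call('body', 'A plain teapot', important)  # => 10
```

A lambda with return would behave the same way, as long as it is called with exactly three arguments.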
Attributes
boolean_fields: An array of fields that will be treated as boolean terms
dir: Path to the on-disk database. Nil if in-memory database
fields: A hash of field names and their types
fields_with_field_names_only: An array of fields to be indexed only with their field names
fields_without_field_names: An array of fields to be indexed without their field names
index_positions: True if term positions will be stored
language: The default document language. Used for setting up stoppers and stemmers.
spelling: Whether this db will generate a spelling dictionary during indexing
stopper_strategy: The default stopper strategy
store_values: An array of the fields that will be stored in the Xapian database
unindexed_fields: An array of fields that will not be indexed
Public Class Methods
# File lib/xapian_fu/xapian_db.rb, line 182
def initialize( options = { } )
  @options = { :index_positions => true, :spelling => true }.merge(options)
  @dir = @options[:dir]
  @index_positions = @options[:index_positions]
  @db_flag = Xapian::DB_OPEN
  @db_flag = Xapian::DB_CREATE_OR_OPEN if @options[:create]
  @db_flag = Xapian::DB_CREATE_OR_OVERWRITE if @options[:overwrite]
  case @options[:type]
  when :glass
    raise XapianFuError.new("type glass not recognised") unless defined?(Xapian::DB_BACKEND_GLASS)
    @db_flag |= Xapian::DB_BACKEND_GLASS
  when :chert
    raise XapianFuError.new("type chert not recognised") unless defined?(Xapian::DB_BACKEND_CHERT)
    @db_flag |= Xapian::DB_BACKEND_CHERT
  when nil
    # use library defaults
  else
    raise XapianFuError.new("type #{@options[:type].inspect} not recognised")
  end
  @tx_mutex = Mutex.new
  @language = @options.fetch(:language, :english)
  @stemmer = @options.fetch(:stemmer, @language)
  @stopper = @options.fetch(:stopper, @language)
  @stopper_strategy = @options.fetch(:stopper_strategy, :stemmed)
  @field_options = {}
  setup_fields(@options[:fields])
  @store_values << @options[:store]
  @store_values << @options[:sortable]
  @store_values << @options[:collapsible]
  @store_values = @store_values.flatten.uniq.compact
  @spelling = @options[:spelling]
  @weights_function = @options[:weights]
end
Public Instance Methods
Short-cut to documents.add
# File lib/xapian_fu/xapian_db.rb, line 247
def add_doc(doc)
  documents.add(doc)
end
Add a synonym to the database.
If you want to search with synonym support, remember to add the option:
db.search("foo", :synonyms => true)
Note that in-memory databases don't support synonyms.
# File lib/xapian_fu/xapian_db.rb, line 261
def add_synonym(term, synonym)
  rw.add_synonym(term, synonym)
end
Closes the database.
# File lib/xapian_fu/xapian_db.rb, line 397
def close
  raise ConcurrencyError if @tx_mutex.locked?

  @rw.close if @rw
  @rw = nil

  @ro.close if @ro
  @ro = nil
end
The XapianFu::XapianDocumentsAccessor for this database
# File lib/xapian_fu/xapian_db.rb, line 242
def documents
  @documents_accessor ||= XapianDocumentsAccessor.new(self)
end
Flush any changes to disk and reopen the read-only database. Raises ConcurrencyError if a transaction is in progress.
# File lib/xapian_fu/xapian_db.rb, line 390
def flush
  raise ConcurrencyError if @tx_mutex.locked?
  rw.flush
  ro.reopen
end
The read-only Xapian::Database
# File lib/xapian_fu/xapian_db.rb, line 232
def ro
  @ro ||= setup_ro_db
end
The writable Xapian::WritableDatabase
# File lib/xapian_fu/xapian_db.rb, line 227
def rw
  @rw ||= setup_rw_db
end
Conduct a search on the Xapian database, returning an array of XapianFu::XapianDoc objects for the matches wrapped in a XapianFu::ResultSet.
The :limit option sets how many results to return. For compatibility with the will_paginate plugin, the :per_page option does the same thing (and overrides :limit if both are given). Defaults to 10.
The :page option sets which page of results to return. Defaults to 1.
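How :page, :per_page and :limit turn into a result window can be sketched in plain Ruby. The arithmetic follows the search listing further down; page_window is a hypothetical helper name, and the error handling of the original (the rescue fallbacks) is left out for brevity.

```ruby
# Hypothetical helper: converts paging options into a zero-based offset
# and a page size, as the search method does before querying Xapian.
def page_window(options = {})
  page = options.fetch(:page, 1).to_i
  page = page > 1 ? page - 1 : 0              # pages are 1-based for callers
  per_page = (options[:per_page] || options[:limit] || 10).to_i
  { :offset => page * per_page, :per_page => per_page }
end

page_window({})                                          # => {:offset=>0, :per_page=>10}
page_window(:page => 3, :limit => 25)                    # => {:offset=>50, :per_page=>25}
page_window(:page => 2, :limit => 25, :per_page => 5)    # => {:offset=>5, :per_page=>5}
```

Page values of 0 or below are treated the same as page 1.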
The :order option specifies the stored field to order the results by (instead of the default search result weight).
The :reverse option reverses the order of the results, so lowest search weight first (or lowest stored field value first).
The :collapse option specifies which stored field value to collapse (group) the results on. Works a bit like the SQL GROUP BY behaviour.
The :spelling option controls whether spelling suggestions will be made for queries. It defaults to whatever the database spelling setting is (true by default). When enabled, spelling suggestions are available using the XapianFu::ResultSet corrected_query method.
The :check_at_least option controls how many documents will be sampled. This allows for accurate page and facet counts. Specifying the special value of :all will make Xapian sample every document in the database. Be aware that this can hurt your query performance.
The :query_builder option allows you to pass a proc that will return the final query to be run. The proc receives the parsed query as its only argument.
The first parameter can also be :all or :nothing, to match all documents or no documents respectively.
For additional options on how the query is parsed, see XapianFu::QueryParser.
# File lib/xapian_fu/xapian_db.rb, line 311
def search(q, options = {})
  defaults = { :page => 1, :reverse => false,
    :boolean => true, :boolean_anycase => true, :wildcards => true,
    :lovehate => true, :spelling => spelling, :pure_not => false }
  options = defaults.merge(options)
  page = options[:page].to_i rescue 1
  page = page > 1 ? page - 1 : 0
  per_page = options[:per_page] || options[:limit] || 10
  per_page = per_page.to_i rescue 10
  offset = page * per_page

  check_at_least = options.include?(:check_at_least) ? options[:check_at_least] : 0
  check_at_least = self.size if check_at_least == :all

  qp = XapianFu::QueryParser.new({ :database => self }.merge(options))
  query = qp.parse_query(q.is_a?(Symbol) ? q : q.to_s)

  if options.include?(:query_builder)
    query = options[:query_builder].call(query)
  end

  query = filter_query(query, options[:filter]) if options[:filter]

  enquiry = Xapian::Enquire.new(ro)
  setup_ordering(enquiry, options[:order], options[:reverse])
  if options[:collapse]
    enquiry.collapse_key = XapianDocValueAccessor.value_key(options[:collapse])
  end
  if options[:facets]
    spies = options[:facets].inject({}) do |accum, name|
      accum[name] = spy = Xapian::ValueCountMatchSpy.new(XapianDocValueAccessor.value_key(name))
      enquiry.add_matchspy(spy)
      accum
    end
  end

  if options.include?(:posting_source)
    query = Xapian::Query.new(Xapian::Query::OP_AND_MAYBE, query, Xapian::Query.new(options[:posting_source]))
  end

  enquiry.query = query

  ResultSet.new(:mset => enquiry.mset(offset, per_page, check_at_least),
                :current_page => page + 1,
                :per_page => per_page,
                :corrected_query => qp.corrected_query,
                :spies => spies,
                :xapian_db => self
                )
end
# File lib/xapian_fu/xapian_db.rb, line 407
def serialize_value(field, value, type = nil)
  if sortable_fields.include?(field)
    Xapian.sortable_serialise(value)
  else
    (type || fields[field] || Object).to_xapian_fu_storage_value(value)
  end
end
The number of docs in the Xapian database
# File lib/xapian_fu/xapian_db.rb, line 237
def size
  ro.doccount
end
Return a new stemmer object for this database
# File lib/xapian_fu/xapian_db.rb, line 217
def stemmer
  StemFactory.stemmer_for(@stemmer)
end
The stopper object for this database
# File lib/xapian_fu/xapian_db.rb, line 222
def stopper
  StopperFactory.stopper_for(@stopper)
end
Run the given block in a XapianDb transaction. Any changes to the Xapian database made in the block will be atomically committed at the end.
If an exception is raised by the block, all changes are discarded and the exception is re-raised.
Xapian does not support multiple concurrent transactions on the same Xapian database. Any attempts at this will be serialized by XapianFu, which is not perfect but probably better than just raising an exception.
# File lib/xapian_fu/xapian_db.rb, line 373
def transaction(flush_on_commit = true)
  @tx_mutex.synchronize do
    begin
      rw.begin_transaction(flush_on_commit)
      yield
    rescue Exception => e
      rw.cancel_transaction
      ro.reopen
      raise e
    end
    rw.commit_transaction
    ro.reopen
  end
end
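The commit-or-cancel behaviour can be tried out without Xapian by substituting a stub for the writable database. In this sketch FakeWritableDb and TransactionSketch are hypothetical classes that only record which calls were made; the control flow follows the listing above, minus the read-only reopen.

```ruby
# Hypothetical stand-in for a writable database; it only records calls.
class FakeWritableDb
  attr_reader :log

  def initialize
    @log = []
  end

  def begin_transaction(flush = true)
    @log << :begin
  end

  def commit_transaction
    @log << :commit
  end

  def cancel_transaction
    @log << :cancel
  end
end

# Hypothetical wrapper reproducing the mutex-guarded transaction flow.
class TransactionSketch
  def initialize(rw)
    @rw = rw
    @tx_mutex = Mutex.new
  end

  def transaction
    @tx_mutex.synchronize do
      begin
        @rw.begin_transaction
        yield
      rescue Exception
        @rw.cancel_transaction
        raise # re-raise the original exception after cancelling
      end
      @rw.commit_transaction
    end
  end
end

rw = FakeWritableDb.new
db = TransactionSketch.new(rw)
db.transaction { :ok }
begin
  db.transaction { raise ArgumentError, "boom" }
rescue ArgumentError
  # the block's exception reaches the caller after the cancel
end
rw.log # => [:begin, :commit, :begin, :cancel]
```

Because the whole body runs inside synchronize, a second thread calling transaction simply waits for the first to finish, which is the serialization described above.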
# File lib/xapian_fu/xapian_db.rb, line 415
def unserialize_value(field, value, type = nil)
  if sortable_fields.include?(field)
    Xapian.sortable_unserialise(value)
  else
    (type || fields[field] || Object).from_xapian_fu_storage_value(value)
  end
end
Private Instance Methods
# File lib/xapian_fu/xapian_db.rb, line 542
def boolean_filter_query(field, values)
  subqueries = values.map do |value|
    Xapian::Query.new("X#{field.to_s.upcase}#{value.to_s.downcase}")
  end

  Xapian::Query.new(Xapian::Query::OP_OR, subqueries)
end
# File lib/xapian_fu/xapian_db.rb, line 509
def filter_query(query, filter)
  subqueries = filter.map do |field, values|
    values = Array(values)

    if sortable_fields[field]
      sortable_filter_query(field, values)
    elsif boolean_fields.include?(field)
      boolean_filter_query(field, values)
    end
  end

  combined_subqueries = Xapian::Query.new(Xapian::Query::OP_AND, subqueries)

  Xapian::Query.new(Xapian::Query::OP_FILTER, query, combined_subqueries)
end
Setup the fields hash and stored_values list from the given options
# File lib/xapian_fu/xapian_db.rb, line 463
def setup_fields(field_options)
  @fields = { }
  @unindexed_fields = []
  @fields_without_field_names = []
  @fields_with_field_names_only = []
  @store_values = []
  @sortable_fields = {}
  @boolean_fields = []
  @field_weights = Hash.new(1)
  return nil if field_options.nil?
  default_opts = {
    :store => true,
    :index => true,
    :type => String
  }
  boolean_default_opts = default_opts.merge(
    :store => false,
    :index => false
  )
  # Convert array argument to hash, with String as default type
  if field_options.is_a? Array
    fohash = { }
    field_options.each { |f| fohash[f] = { :type => String } }
    field_options = fohash
  end
  field_options.each do |name,opts|
    # Handle simple setup by type only
    opts = { :type => opts } unless opts.is_a? Hash
    if opts[:boolean]
      opts = boolean_default_opts.merge(opts)
    else
      opts = default_opts.merge(opts)
    end
    @store_values << name if opts[:store]
    @sortable_fields[name] = {:range_prefix => opts[:range_prefix], :range_postfix => opts[:range_postfix]} if opts[:sortable]
    @unindexed_fields << name if opts[:index] == false
    @fields_without_field_names << name if opts[:index] == :without_field_names
    @fields_with_field_names_only << name if opts[:index] == :with_field_names_only
    @boolean_fields << name if opts[:boolean]
    @fields[name] = opts[:type]
    @field_weights[name] = opts[:weight] if opts.include?(:weight)
    @field_options[name] = opts
  end
  @fields
end
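The three accepted shapes of the :fields option (an array of names, a name-to-type hash, and a name-to-options hash) all end up as full option hashes. The sketch below isolates just that normalization step from the listing above; normalize_fields and the two constants are hypothetical names, and none of the bookkeeping arrays are modelled.

```ruby
DEFAULT_OPTS = { :store => true, :index => true, :type => String }.freeze
BOOLEAN_DEFAULT_OPTS = DEFAULT_OPTS.merge(:store => false, :index => false).freeze

# Hypothetical helper: expands every field definition to a full options hash.
def normalize_fields(field_options)
  return {} if field_options.nil?

  # An array of names means "String fields with default options".
  if field_options.is_a?(Array)
    field_options = field_options.each_with_object({}) { |f, h| h[f] = { :type => String } }
  end

  field_options.each_with_object({}) do |(name, opts), result|
    opts = { :type => opts } unless opts.is_a?(Hash)   # bare type shorthand
    defaults = opts[:boolean] ? BOOLEAN_DEFAULT_OPTS : DEFAULT_OPTS
    result[name] = defaults.merge(opts)
  end
end

normalize_fields([:title])
# => {:title=>{:store=>true, :index=>true, :type=>String}}
normalize_fields(:created_at => Time, :draft => { :boolean => true })
# :created_at gets the normal defaults with Time as its type;
# :draft gets :store => false and :index => false from the boolean defaults
```

Explicitly given options always win over the defaults, since they are merged last.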
Setup ordering for the given Xapian::Enquire objects
# File lib/xapian_fu/xapian_db.rb, line 449
def setup_ordering(enquiry, order = nil, reverse = true)
  if order.to_s == "id"
    # Sorting by a value that doesn't exist falls back to docid ordering
    enquiry.sort_by_value!((1 << 32)-1, reverse)
    enquiry.docid_order = reverse ? Xapian::Enquire::DESCENDING : Xapian::Enquire::ASCENDING
  elsif order.is_a? String or order.is_a? Symbol
    enquiry.sort_by_value!(XapianDocValueAccessor.value_key(order), reverse)
  else
    enquiry.sort_by_relevance!
  end
  enquiry
end
Setup the read-only database
# File lib/xapian_fu/xapian_db.rb, line 439
def setup_ro_db
  if dir
    @ro = Xapian::Database.new(dir)
  else
    # In memory db
    @ro = rw
  end
end
Setup the writable database
# File lib/xapian_fu/xapian_db.rb, line 426
def setup_rw_db
  if dir
    @rw = Xapian::WritableDatabase.new(dir, db_flag)
    @rw.flush if @options[:create]
    @rw
  else
    # In memory database
    @spelling = false # inmemory doesn't support spelling
    @rw = Xapian::inmemory_open
  end
end
# File lib/xapian_fu/xapian_db.rb, line 525
def sortable_filter_query(field, values)
  subqueries = values.map do |value|
    from, to = value.split("..")
    slot = XapianDocValueAccessor.value_key(field)

    if from.empty?
      Xapian::Query.new(Xapian::Query::OP_VALUE_LE, slot, Xapian.sortable_serialise(to.to_f))
    elsif to.nil?
      Xapian::Query.new(Xapian::Query::OP_VALUE_GE, slot, Xapian.sortable_serialise(from.to_f))
    else
      Xapian::Query.new(Xapian::Query::OP_VALUE_RANGE, slot, Xapian.sortable_serialise(from.to_f), Xapian.sortable_serialise(to.to_f))
    end
  end

  Xapian::Query.new(Xapian::Query::OP_OR, subqueries)
end
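The range strings accepted for sortable filters are easier to follow with the parsing step on its own. In this sketch parse_range is a hypothetical helper that applies the same split and branching as the listing above, but returns a plain array describing the comparison instead of building a Xapian::Query.

```ruby
# Hypothetical helper: classifies a "from..to" filter string.
# String#split drops trailing empty strings, so "10.." yields only ["10"]
# and the upper bound comes back as nil.
def parse_range(value)
  from, to = value.split("..")
  if from.empty?
    [:le, to.to_f]                   # "..20": at most 20
  elsif to.nil?
    [:ge, from.to_f]                 # "10..": at least 10
  else
    [:range, from.to_f, to.to_f]     # "10..20": between 10 and 20
  end
end

parse_range("10..20")  # => [:range, 10.0, 20.0]
parse_range("..20")    # => [:le, 20.0]
parse_range("10..")    # => [:ge, 10.0]
```

Note that the values are expected to be strings; passing a Ruby Range object would fail at the split call.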