Traject settings

Traject settings are a flat list of key/value pairs – a single Hash, not nested. Keys are always strings, and dots (“.”) can be used for grouping and namespacing.
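
As a small illustration of the flat structure, the sketch below uses a plain Ruby Hash with made-up keys (the example.* names are hypothetical, not real traject settings). It shows that a dotted key is looked up as one whole string, and that grouping by prefix is only a naming convention.

```ruby
# A flat settings hash: every key is a single string, dots included.
# The "example.*" keys are hypothetical, chosen only for illustration.
settings = {
  "example.reader.format" => "xml",
  "example.reader.batch"  => "100",
  "example.writer.target" => "stdout"
}

# Lookup uses the whole dotted string; there is no nested "example" hash.
format = settings["example.reader.format"]
nested = settings["example"] # nil, because "example" alone is not a key

# "Grouping" is just selecting the keys that share a prefix.
reader_settings = settings.select { |key, _value| key.start_with?("example.reader.") }
```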

Values are usually strings, but occasionally something else. String values can be easily set via the command line.

Settings can be set in configuration files, usually like:

  settings do
    provide "key", "value"
  end

or on the command line: -s key=value. There are also command-line shortcuts for some commonly used settings; see traject -h.

provide sets the key only if it was previously unset, so the first value to be set 'wins'. Command-line settings are applied first of all, so they take precedence over anything set with provide. Using provide is recommended.

store is also available; it forces the new value, overriding any value set previously.
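
The difference between provide and store can be sketched with a toy Hash subclass. This is a minimal model written for illustration, assuming only the behavior described above; ToySettings is a hypothetical name and is not traject's own settings class.

```ruby
# Toy stand-in for a settings object; NOT traject's actual implementation.
class ToySettings < Hash
  # Set the key only if it has not been set yet ("first one wins").
  def provide(key, value)
    self[key] = value unless key?(key)
    self[key]
  end
  # Hash#store is inherited unchanged and always overwrites.
end

settings = ToySettings.new

# Simulate a command-line setting (-s key=from_command_line), applied first.
settings["key"] = "from_command_line"

# A later provide in a configuration file does not replace it.
settings.provide("key", "from_config_file")
after_provide = settings["key"]

# store forces the new value regardless of what was set before.
settings.store("key", "forced")
after_store = settings["key"]
```

In this model the command-line value survives the later provide, and only store replaces it.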

Known settings

Reading (general)

Error handling

You may instead want to skip the record and continue with indexing, or even conditionally decide which to do. In a custom handler, if you want to halt execution, you should re-raise the exception (or raise another). If you want to skip the record and continue, call context.skip! and do not raise.

The “stabby lambda” syntax (->) is useful here: it produces a lambda object with parsing precedence that makes extra parentheses unnecessary.

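  # Thread-safe counter, since mapping may run in several threads.
  error_count = Concurrent::AtomicFixnum.new(0)
  settings do
    provide "mapping_rescue", ->(context, exception) {
      total = error_count.increment
      context.logger.error "Encountered exception: #{exception}, total errors #{total}"
      # my_should_skip? is a placeholder for your own decision logic.
      if my_should_skip?(context, exception)
        context.skip!     # skip this record and continue
      else
        raise exception   # halt execution
      end
    }
  end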

At present, mapping_rescue only handles exceptions raised while running mapping/indexing logic; unexpected raises in readers or writers may not be caught here.
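
The skip-or-halt decision inside a handler can be tried out on its own, without running an indexer. The sketch below is a stand-alone illustration: FakeContext is a hypothetical stand-in that models only the skip! call, and the policy in my_should_skip? (skip on ArgumentError, halt on anything else) is an assumption chosen for the example.

```ruby
# Hypothetical stand-in for the context object passed to the handler;
# it models only the skip! call described above.
class FakeContext
  def initialize
    @skipped = false
  end

  def skip!
    @skipped = true
  end

  def skipped?
    @skipped
  end
end

# Hypothetical policy: skip records that fail with ArgumentError,
# halt on any other exception.
def my_should_skip?(_context, exception)
  exception.is_a?(ArgumentError)
end

handler = ->(context, exception) {
  if my_should_skip?(context, exception)
    context.skip!     # skip this record and continue
  else
    raise exception   # halt execution
  end
}

# A skippable error marks the context as skipped and does not raise.
skipped_context = FakeContext.new
handler.call(skipped_context, ArgumentError.new("bad field"))

# Any other error is re-raised to the caller.
halted = false
begin
  handler.call(FakeContext.new, IOError.new("unreadable input"))
rescue IOError
  halted = true
end
```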

Threads

NOTE: If your processing code isn't thread-safe, set the pool size to 0 or nil to disable the thread pool and do all processing in the main thread.

Choose a pool size based on the size of your machine and the complexity of your indexing rules. You might want to try different sizes and measure which works best for you. There is probably no reason for it ever to exceed the number of cores on the indexing machine.
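
One way to turn that guidance into a starting value is sketched below. suggested_pool_size is a hypothetical helper, not part of traject; it assumes the rule of thumb above (no more threads than cores, and 0 when the processing code is not thread-safe) and uses Etc.nprocessors from the Ruby standard library to count cores. Treat the result as a first guess to measure against, not a tuned value.

```ruby
require "etc"

# Hypothetical helper: cap a requested pool size at the number of cores,
# and return 0 (no thread pool) when processing code is not thread-safe.
def suggested_pool_size(requested, thread_safe: true, cores: Etc.nprocessors)
  return 0 unless thread_safe
  [[requested, 0].max, cores].min
end

# The cores: argument is passed explicitly here to keep the example
# deterministic; leave it out to use the current machine's core count.
capped   = suggested_pool_size(16, cores: 4)
within   = suggested_pool_size(3, cores: 8)
disabled = suggested_pool_size(8, thread_safe: false, cores: 8)
```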

Writing (general)

Writing to solr

Dealing with MARC data

Logging and progress