Table of Contents - traject-3.6.0 Documentation
Pages
- batch_execution
- Hints for running traject as a batch job
- Ruby version setting
- for chruby
- chruby monster wrapper script
- for rbenv
- for rvm
- Exit codes
- Logs and Error Reporting
- separate error log
- Completely customizable logging with yell
- Bundler
- extending
- Extending With Your Own Code
- Expert Summary
- Custom code local to your project
- Using gems in your traject project
- indexing_rules
- Details on Traject Indexing: from custom logic to Macros
- How to_field works
- record argument
- accumulator argument
- context argument
- Gotcha: Use closures to make your code more efficient
- Back to macros
- Combining multiple macros, lambdas and blocks
- Manipulating context.output_hash directly
- each_record
- More tips and gotchas about indexing steps
- other_commands
- programmatic_use
- Programmatic/Embedded Use of Traject
- Initializing an indexer
- Configuring an indexer
- Configuring indexer subclasses
- Running the indexer
- process: probably not what you want
- map_record: just map a single record, handle transformed output yourself
- process_record: send a single record to instance writer
- process_with: an in between option for easier programmatic use
- Indexer performance, re-use, and concurrency
- Concurrency concerns
- An example
- Rails concerns, and disabling concurrency
- settings
- xml
- load_maps.rake
- lcc_top_level.yaml
- marc_genre_007.yaml
- marc_genre_leader.yaml
- marc_geographic.yaml
- marc_instruments.yaml
- marc_languages.yaml
Classes and Modules
- Traject
- Traject::ArrayWriter
- Traject::CSVWriter
- Traject::CommandLine
- Traject::DebugWriter
- Traject::DelimitedWriter
- Traject::ExperimentalNokogiriStreamingReader
- Traject::ExperimentalNokogiriStreamingReader::PathTracker
- Traject::Hashie
- Traject::Hashie::IndifferentAccessFix
- Traject::Indexer
- Traject::Indexer::AfterProcessingStep
- Traject::Indexer::ConfigLoadError
- Traject::Indexer::Context
- Traject::Indexer::EachRecordStep
- Traject::Indexer::MarcIndexer
- Traject::Indexer::NokogiriIndexer
- Traject::Indexer::Settings
- Traject::Indexer::Settings::DefaultsHash
- Traject::Indexer::ToFieldStep
- Traject::JsonWriter
- Traject::LineWriter
- Traject::Macros
- Traject::Macros::Basic
- Traject::Macros::Marc21
- Traject::Macros::Marc21Semantics
- Traject::Macros::MarcFormatClassifier
- Traject::Macros::MarcFormats
- Traject::Macros::NokogiriMacros
- Traject::Macros::Transformation
- Traject::MarcExtractor
- Traject::MarcExtractor::Spec
- Traject::MarcExtractor::SpecSet
- Traject::MarcReader
- Traject::MockReader
- Traject::NDJReader
- Traject::NokogiriReader
- Traject::NullWriter
- Traject::OaiPmhNokogiriReader
- Traject::QualifiedConstGet
- Traject::SolrJsonWriter
- Traject::SolrJsonWriter::BadHttpResponse
- Traject::SolrJsonWriter::MaxSkippedRecordsExceeded
- Traject::ThreadPool
- Traject::TranslationMap
- Traject::TranslationMap::Cache
- Traject::TranslationMap::NotFound
- Traject::Util
- Traject::YamlWriter
Methods
- ::apply_class_configure_block — Traject::Indexer
- ::apply_extraction_options — Traject::Macros::Marc21
- ::assemble_lcsh — Traject::Macros::Marc21Semantics
- ::backtrace_from_config — Traject::Util
- ::backtrace_lineno_for_config — Traject::Util
- ::cached — Traject::MarcExtractor
- ::concurrency_disabled? — Traject::ThreadPool
- ::configure — Traject::Indexer
- ::create_controlfield_spec — Traject::MarcExtractor::Spec
- ::create_datafield_spec — Traject::MarcExtractor::Spec
- ::default_processing_thread_pool — Traject::Indexer::Settings
- ::default_settings — Traject::Indexer
- ::default_settings — Traject::Indexer::MarcIndexer
- ::default_settings — Traject::Indexer::NokogiriIndexer
- ::default_settings= — Traject::Indexer
- ::disable_concurrency! — Traject::ThreadPool
- ::drain_queue — Traject::Util
- ::exception_to_log_message — Traject::Util
- ::extract_caller_location — Traject::Util
- ::extract_marc — Traject::Macros::Marc21
- ::extract_marc_from — Traject::Macros::Marc21
- ::filing_version — Traject::Macros::Marc21Semantics
- ::first! — Traject::Macros::Marc21
- ::get_sortable_author — Traject::Macros::Marc21Semantics
- ::get_sortable_title — Traject::Macros::Marc21Semantics
- ::hash_from_string — Traject::MarcExtractor::Spec
- ::io_name — Traject::Util
- ::is_jruby? — Traject::Util
- ::legacy_marc_mode! — Traject::Indexer
- ::new — Traject::ArrayWriter
- ::new — Traject::CommandLine
- ::new — Traject::CSVWriter
- ::new — Traject::DebugWriter
- ::new — Traject::DelimitedWriter
- ::new — Traject::ExperimentalNokogiriStreamingReader
- ::new — Traject::ExperimentalNokogiriStreamingReader::PathTracker
- ::new — Traject::Indexer
- ::new — Traject::Indexer::ConfigLoadError
- ::new — Traject::Indexer::Context
- ::new — Traject::Indexer::Settings
- ::new — Traject::Indexer::EachRecordStep
- ::new — Traject::Indexer::ToFieldStep
- ::new — Traject::Indexer::AfterProcessingStep
- ::new — Traject::LineWriter
- ::new — Traject::Macros::MarcFormatClassifier
- ::new — Traject::MarcExtractor
- ::new — Traject::MarcExtractor::SpecSet
- ::new — Traject::MarcExtractor::Spec
- ::new — Traject::MarcReader
- ::new — Traject::MockReader
- ::new — Traject::NDJReader
- ::new — Traject::NokogiriReader
- ::new — Traject::NullWriter
- ::new — Traject::OaiPmhNokogiriReader
- ::new — Traject::SolrJsonWriter
- ::new — Traject::ThreadPool
- ::new — Traject::TranslationMap
- ::new — Traject::TranslationMap::Cache
- ::new — Traject::TranslationMap::NotFound
- ::oclcnum_extract — Traject::Macros::Marc21Semantics
- ::publication_date — Traject::Macros::Marc21Semantics
- ::read_properties — Traject::TranslationMap
- ::reset_cache! — Traject::TranslationMap
- ::trim_punctuation — Traject::Macros::Marc21
- #<< — Traject::Indexer
- #== — Traject::MarcExtractor::Spec
- #[] — Traject::TranslationMap
- #_lookup! — Traject::TranslationMap::Cache
- #_write — Traject::CSVWriter
- #_write — Traject::DelimitedWriter
- #_write — Traject::LineWriter
- #add — Traject::MarcExtractor::SpecSet
- #add_accumulator_to_context! — Traject::Indexer::ToFieldStep
- #add_output — Traject::Indexer::Context
- #after_processing — Traject::Indexer
- #append — Traject::Macros::Transformation
- #arg_check! — Traject::CommandLine
- #assemble_settings_hash — Traject::CommandLine
- #byte1= — Traject::MarcExtractor::Spec
- #byte2= — Traject::MarcExtractor::Spec
- #check_solr_update_url — Traject::SolrJsonWriter
- #check_uncompleted — Traject::Indexer
- #clear! — Traject::ArrayWriter
- #close — Traject::LineWriter
- #close — Traject::NullWriter
- #close — Traject::SolrJsonWriter
- #collect_exception — Traject::ThreadPool
- #collect_matching_lines — Traject::MarcExtractor
- #collect_subfields — Traject::MarcExtractor
- #command_commit! — Traject::CommandLine
- #command_marcout! — Traject::CommandLine
- #commit — Traject::SolrJsonWriter
- #complete — Traject::Indexer
- #completed? — Traject::Indexer
- #configure — Traject::Indexer
- #control_field? — Traject::MarcExtractor
- #create_logger — Traject::Indexer
- #create_slop! — Traject::CommandLine
- #current_node_doc — Traject::ExperimentalNokogiriStreamingReader::PathTracker
- #default — Traject::Macros::Transformation
- #default_mapping_rescue — Traject::Indexer
- #default_namespaces — Traject::ExperimentalNokogiriStreamingReader
- #default_namespaces — Traject::NokogiriReader
- #default_namespaces — Traject::Macros::NokogiriMacros
- #delete — Traject::SolrJsonWriter
- #delete_all! — Traject::SolrJsonWriter
- #delimiter= — Traject::DelimitedWriter
- #derive_solr_update_url_from_solr_url — Traject::SolrJsonWriter
- #determine_solr_update_url — Traject::SolrJsonWriter
- #each — Traject::ExperimentalNokogiriStreamingReader
- #each — Traject::MarcReader
- #each — Traject::MockReader
- #each — Traject::NDJReader
- #each — Traject::NokogiriReader
- #each — Traject::OaiPmhNokogiriReader
- #each_matching_line — Traject::MarcExtractor
- #each_record — Traject::Indexer
- #each_record_xpath — Traject::ExperimentalNokogiriStreamingReader
- #each_record_xpath — Traject::NokogiriReader
- #effective_tag — Traject::MarcExtractor::SpecSet
- #escape — Traject::CSVWriter
- #escape — Traject::DelimitedWriter
- #escaped_delimiter — Traject::DelimitedWriter
- #execute — Traject::CommandLine
- #execute — Traject::Indexer::EachRecordStep
- #execute — Traject::Indexer::ToFieldStep
- #execute — Traject::Indexer::AfterProcessingStep
- #extra_xpath_hooks — Traject::ExperimentalNokogiriStreamingReader
- #extra_xpath_hooks — Traject::NokogiriReader
- #extra_xpath_hooks — Traject::OaiPmhNokogiriReader
- #extract — Traject::MarcExtractor
- #extract_all_marc_values — Traject::Macros::Marc21
- #extract_marc — Traject::Macros::Marc21
- #extract_marc_filing_version — Traject::Macros::Marc21Semantics
- #extract_xpath — Traject::Macros::NokogiriMacros
- #fill_in_defaults! — Traject::Indexer::Settings
- #first_only — Traject::Macros::Transformation
- #fix_namespaces — Traject::ExperimentalNokogiriStreamingReader::PathTracker
- #floating? — Traject::ExperimentalNokogiriStreamingReader::PathTracker
- #flush — Traject::SolrJsonWriter
- #formats — Traject::Macros::MarcFormatClassifier
- #freeze — Traject::MarcExtractor
- #genre — Traject::Macros::MarcFormatClassifier
- #get_input_io — Traject::CommandLine
- #gsub — Traject::Macros::Transformation
- #handle_mapping_errors — Traject::Indexer
- #http_client — Traject::OaiPmhNokogiriReader
- #includes_subfield_code? — Traject::MarcExtractor::Spec
- #indicator1= — Traject::MarcExtractor::Spec
- #indicator2= — Traject::MarcExtractor::Spec
- #initialize_indexer! — Traject::CommandLine
- #inspect — Traject::Indexer::Settings
- #inspect — Traject::Indexer::EachRecordStep
- #inspect — Traject::Indexer::ToFieldStep
- #inspect — Traject::Indexer::AfterProcessingStep
- #interesting_tag? — Traject::MarcExtractor
- #interesting_tags — Traject::MarcExtractor
- #internal_delimiter= — Traject::DelimitedWriter
- #internal_reader — Traject::MarcReader
- #is_jruby? — Traject::ExperimentalNokogiriStreamingReader::PathTracker
- #joinable? — Traject::MarcExtractor::Spec
- #keys — Traject::Indexer::Settings
- #lambda= — Traject::Indexer::EachRecordStep
- #literal — Traject::Macros::Basic
- #load_config_file — Traject::Indexer
- #load_configuration_files! — Traject::CommandLine
- #load_ndjson — Traject::MockReader
- #log_skip — Traject::Indexer
- #logger — Traject::Indexer
- #logger — Traject::NDJReader
- #logger — Traject::OaiPmhNokogiriReader
- #logger — Traject::SolrJsonWriter
- #logger_format — Traject::Indexer
- #lookup — Traject::TranslationMap::Cache
- #manuscript_archive? — Traject::Macros::MarcFormatClassifier
- #map — Traject::TranslationMap
- #map_record — Traject::Indexer
- #map_to_context! — Traject::Indexer
- #marc_era_facet — Traject::Macros::Marc21Semantics
- #marc_formats — Traject::Macros::MarcFormats
- #marc_geo_facet — Traject::Macros::Marc21Semantics
- #marc_instrument_codes_normalized — Traject::Macros::Marc21Semantics
- #marc_instrumentation_humanized — Traject::Macros::Marc21Semantics
- #marc_languages — Traject::Macros::Marc21Semantics
- #marc_lcc_to_broad_category — Traject::Macros::Marc21Semantics
- #marc_lcsh_formatted — Traject::Macros::Marc21Semantics
- #marc_publication_date — Traject::Macros::Marc21Semantics
- #marc_series_facet — Traject::Macros::Marc21Semantics
- #marc_sortable_author — Traject::Macros::Marc21Semantics
- #marc_sortable_title — Traject::Macros::Marc21Semantics
- #match? — Traject::ExperimentalNokogiriStreamingReader::PathTracker
- #match_path? — Traject::ExperimentalNokogiriStreamingReader::PathTracker
- #matches_indicators? — Traject::MarcExtractor::Spec
- #maybe_in_thread_pool — Traject::ThreadPool
- #merge — Traject::TranslationMap
- #merge — Traject::Hashie::IndifferentAccessFix
- #microform? — Traject::Macros::MarcFormatClassifier
- #normalized_gmd — Traject::Macros::MarcFormatClassifier
- #oclcnum — Traject::Macros::Marc21Semantics
- #online? — Traject::Macros::MarcFormatClassifier
- #open_output_file — Traject::CSVWriter
- #open_output_file — Traject::LineWriter
- #output_values — Traject::DelimitedWriter
- #parse_path — Traject::ExperimentalNokogiriStreamingReader::PathTracker
- #pop — Traject::ExperimentalNokogiriStreamingReader::PathTracker
- #prepend — Traject::Macros::Transformation
- #print? — Traject::Macros::MarcFormatClassifier
- #proceeding? — Traject::Macros::MarcFormatClassifier
- #process — Traject::Indexer
- #process_record — Traject::Indexer
- #process_with — Traject::Indexer
- #provide — Traject::Indexer::Settings
- #push — Traject::ExperimentalNokogiriStreamingReader::PathTracker
- #put — Traject::ArrayWriter
- #put — Traject::LineWriter
- #put — Traject::NullWriter
- #put — Traject::SolrJsonWriter
- #qualified_const_get — Traject::QualifiedConstGet
- #raise_collected_exception! — Traject::ThreadPool
- #raw_output_values — Traject::DelimitedWriter
- #read_and_parse_response — Traject::OaiPmhNokogiriReader
- #reader! — Traject::Indexer
- #reader_class — Traject::Indexer
- #record_inspect — Traject::Indexer::Context
- #record_number — Traject::DebugWriter
- #reparent_node_to_root — Traject::NokogiriReader
- #reset_cache! — Traject::TranslationMap::Cache
- #resumption_url — Traject::OaiPmhNokogiriReader
- #reverse_merge — Traject::Indexer::Settings
- #reverse_merge! — Traject::Indexer::Settings
- #run_after_processing_steps — Traject::Indexer
- #run_extra_xpath_hooks — Traject::ExperimentalNokogiriStreamingReader::PathTracker
- #run_extra_xpath_hooks — Traject::NokogiriReader
- #send_batch — Traject::SolrJsonWriter
- #send_single — Traject::SolrJsonWriter
- #serialize — Traject::DebugWriter
- #serialize — Traject::DelimitedWriter
- #serialize — Traject::JsonWriter
- #serialize — Traject::LineWriter
- #serialize — Traject::NullWriter
- #serialize — Traject::YamlWriter
- #serialized_marc — Traject::Macros::Marc21
- #set_bytes — Traject::MarcExtractor::Spec
- #settings — Traject::Indexer
- #should_close_stream? — Traject::LineWriter
- #show_interest_in_tag — Traject::MarcExtractor
- #shutdown_and_wait — Traject::ThreadPool
- #skip! — Traject::Indexer::Context
- #skip? — Traject::Indexer::Context
- #skippable_exceptions — Traject::SolrJsonWriter
- #skipped_record_count — Traject::SolrJsonWriter
- #solr_update_url_with_query — Traject::SolrJsonWriter
- #source_record_id — Traject::Indexer::Context
- #source_record_id_proc — Traject::Indexer
- #source_record_id_proc — Traject::Indexer::MarcIndexer
- #source_record_id_proc — Traject::Indexer::NokogiriIndexer
- #specs_covering_field — Traject::MarcExtractor
- #specs_for_tag — Traject::MarcExtractor::SpecSet
- #specs_matching_field — Traject::MarcExtractor::SpecSet
- #split — Traject::Macros::Transformation
- #start_url — Traject::OaiPmhNokogiriReader
- #start_url_verb — Traject::OaiPmhNokogiriReader
- #strip — Traject::Macros::Transformation
- #tags — Traject::MarcExtractor::SpecSet
- #thesis? — Traject::Macros::MarcFormatClassifier
- #timeout — Traject::OaiPmhNokogiriReader
- #to_field — Traject::Indexer
- #to_field_step? — Traject::Indexer::EachRecordStep
- #to_field_step? — Traject::Indexer::ToFieldStep
- #to_field_step? — Traject::Indexer::AfterProcessingStep
- #to_hash — Traject::TranslationMap
- #transform — Traject::Macros::Transformation
- #translate_array — Traject::TranslationMap
- #translate_array! — Traject::TranslationMap
- #translation_map — Traject::Macros::Transformation
- #trim_punctuation — Traject::Macros::Marc21
- #unique — Traject::Macros::Transformation
- #validate! — Traject::Indexer::EachRecordStep
- #validate! — Traject::Indexer::ToFieldStep
- #validate_limited_xpath — Traject::ExperimentalNokogiriStreamingReader
- #validate_xpath — Traject::NokogiriReader
- #with_defaults — Traject::Indexer::Settings
- #write_header — Traject::DelimitedWriter
- #writer — Traject::Indexer
- #writer! — Traject::Indexer
- #writer_class — Traject::Indexer