class HTML::Pipeline
GitHub HTML processing filters and utilities. This module includes a small framework for defining DOM-based content filters and applying them to user-provided content.
See HTML::Pipeline::Filter for information on building filters.
Construct a Pipeline for running multiple HTML filters. A pipeline is created once with one or more filters and can then be `call`ed many times with input over the course of its lifetime.
filters - Array of Filter objects. Each must respond to call(doc, context, result) and return the modified DocumentFragment or a String containing HTML markup. Filters are performed in the order provided.
default_context - The default context hash. Values specified here act as defaults; the values passed to each individual pipeline run are merged over them. Can NOT be nil. Default: empty Hash.
result_class - The default Class of the result object for individual calls. Default: Hash. Protip: Pass in a Struct to get some semblance of type safety.
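The Struct tip above can be illustrated with a short sketch. `PipelineResult` and its `mentioned_usernames` member are hypothetical names chosen for this example; the only key the documentation says the pipeline itself writes is `:output`.

```ruby
# Hypothetical result class; only :output is written by the pipeline itself.
PipelineResult = Struct.new(:output, :mentioned_usernames)

result = PipelineResult.new

# A Struct supports Hash-style access by symbol, which is what
# `result[:output] = ...` in the pipeline relies on.
result[:output] = '<p>hello</p>'
result[:mentioned_usernames] = ['octocat']

# Unlike a Hash, a Struct rejects members it was not defined with,
# so a misspelled key fails immediately instead of being stored silently.
begin
  result[:mentionned_usernames] = []
  typo_rejected = false
rescue NameError
  typo_rejected = true
end
```

Under these assumptions, passing `PipelineResult` as the third constructor argument would make filters that write to an unknown result key fail fast.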
Constants
- DocumentFragment
Our DOM implementation.
- VERSION
Attributes
Public: Default instrumentation service for new pipeline objects.
Public: String name for this Pipeline. Defaults to Class name.
Public: Instrumentation service for the pipeline. Set an ActiveSupport::Notifications compatible object to enable.
Public Class Methods
# File lib/html/pipeline_plus.rb, line 87
def initialize(filters, default_context = {}, result_class = nil)
  raise ArgumentError, 'default_context cannot be nil' if default_context.nil?
  @filters = filters.flatten.freeze
  @default_context = default_context.freeze
  @result_class = result_class || Hash
  @instrumentation_service = self.class.default_instrumentation_service
end
Parse a String into a DocumentFragment object. When a DocumentFragment is provided, return it verbatim.
# File lib/html/pipeline_plus.rb, line 59
def self.parse(document_or_html)
  document_or_html ||= ''
  if document_or_html.is_a?(String)
    DocumentFragment.parse(document_or_html)
  else
    document_or_html
  end
end
# File lib/html/pipeline_plus.rb, line 47
def self.require_dependency(name, requirer)
  require name
rescue LoadError => e
  raise MissingDependencyError, "Missing dependency '#{name}' for #{requirer}. See README.md for details.\n#{e.class.name}: #{e}"
end
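The dependency-loading pattern in the listing above can be tried in isolation. This is a minimal sketch: `DependencySketch` is a hypothetical wrapper module, its `MissingDependencyError` is a stand-in defined locally so the example runs without the library, and the missing gem name is invented.

```ruby
module DependencySketch
  # Local stand-in for the library's error class (assumption for this sketch).
  class MissingDependencyError < RuntimeError; end

  # Same shape as the listing: turn a LoadError into a friendlier error
  # that names both the missing library and the component that needs it.
  def self.require_dependency(name, requirer)
    require name
  rescue LoadError => e
    raise MissingDependencyError,
          "Missing dependency '#{name}' for #{requirer}. See README.md for details.\n#{e.class.name}: #{e}"
  end
end

# 'set' ships with Ruby, so this loads without raising.
DependencySketch.require_dependency('set', 'ExampleFilter')

# A made-up library name that should not be on any load path.
begin
  DependencySketch.require_dependency('surely_not_an_installed_gem_xyz', 'ExampleFilter')
  message = nil
rescue DependencySketch::MissingDependencyError => e
  message = e.message
end
```

The re-raised message keeps the original `LoadError` text on its second line, so the underlying cause is not lost.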
Public Instance Methods
Apply all filters in the pipeline to the given HTML.
html - A String containing HTML or a DocumentFragment object.
context - The context hash passed to each filter. See the Filter docs for more info on possible values. This object MUST NOT be modified in place by filters. Use the Result for passing state back.
result - The result Hash passed to each filter for modification. This is where Filters store extracted information from the content.
Returns the result Hash after being filtered by this Pipeline. Contains an :output key with the DocumentFragment or String HTML markup based on the output of the last filter in the pipeline.
# File lib/html/pipeline_plus.rb, line 107
def call(html, context = {}, result = nil)
  context = @default_context.merge(context)
  context = context.freeze
  result ||= @result_class.new
  payload = default_payload filters: @filters.map(&:name),
                            context: context, result: result
  instrument 'call_pipeline.html_pipeline', payload do
    result[:output] =
      @filters.inject(html) do |doc, filter|
        perform_filter(filter, doc, context, result)
      end
  end
  result
end
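The core of `call` is a context merge followed by a fold over the filters. The sketch below reproduces just those two steps with plain Ruby and no instrumentation. The lambdas, the `:tag` context key, and the `:filters_run` result key are hypothetical. Note that the real `call` also asks each filter for its `name`, which plain lambdas do not provide, so these stand-ins would not work as filters in an actual pipeline.

```ruby
# Stand-in filters: each takes (doc, context, result) and returns the new doc.
upcase_filter = lambda do |doc, _context, result|
  result[:filters_run] = (result[:filters_run] || 0) + 1
  doc.upcase
end

wrap_filter = lambda do |doc, context, result|
  result[:filters_run] = (result[:filters_run] || 0) + 1
  "<#{context[:tag]}>#{doc}</#{context[:tag]}>"
end

filters = [upcase_filter, wrap_filter].freeze
default_context = { tag: 'p' }.freeze

# Per-call values override the defaults; the merged hash is frozen
# so filters cannot modify it in place.
context = default_context.merge(tag: 'div').freeze
result = {}

# Each filter receives the previous filter's return value, in order.
result[:output] = filters.inject('hello') do |doc, filter|
  filter.call(doc, context, result)
end
```

Because the context is frozen, any state a filter wants to hand back has to go through `result`, which matches the guidance in the parameter description.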
Internal: Default payload for instrumentation.
Accepts a Hash of additional payload data to be merged.
Returns a Hash.
# File lib/html/pipeline_plus.rb, line 180
def default_payload(payload = {})
  { pipeline: instrumentation_name }.merge(payload)
end
Internal: If the `instrumentation_service` object is set, instruments the block; otherwise the block is run without instrumentation.
Returns the result of the provided block.
# File lib/html/pipeline_plus.rb, line 167
def instrument(event, payload = nil)
  payload ||= default_payload
  return yield(payload) unless instrumentation_service
  instrumentation_service.instrument event, payload do |payload|
    yield payload
  end
end
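Any object with an `instrument(event, payload) { |payload| ... }` method has the shape this code expects from an ActiveSupport::Notifications compatible service. The sketch below uses a hypothetical `RecordingService` and a free-standing `instrument_with` helper (both invented for illustration) to show the two paths: with a service and without one.

```ruby
# Hypothetical service: records each event and how long its block took.
class RecordingService
  attr_reader :events

  def initialize
    @events = []
  end

  # Yields the payload to the block and returns the block's value;
  # the ensure clause records the event even if the block raises.
  def instrument(event, payload = {})
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield payload
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @events << [event, payload, elapsed]
  end
end

# Mirrors the documented fallback: without a service the block simply runs.
def instrument_with(service, event, payload)
  return yield(payload) unless service

  service.instrument(event, payload) { |inner| yield inner }
end

service = RecordingService.new
payload = { filter: 'ExampleFilter' }

with_service = instrument_with(service, 'call_filter.html_pipeline', payload) do |data|
  data[:filter].upcase
end

without_service = instrument_with(nil, 'call_filter.html_pipeline', payload) do |data|
  data[:filter].downcase
end
```

In both cases the caller gets the block's return value back, which is what lets `perform_filter` return the filter's output through `instrument`.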
# File lib/html/pipeline_plus.rb, line 77
def instrumentation_name
  return @instrumentation_name if defined?(@instrumentation_name)
  @instrumentation_name = self.class.name
end
Internal: Applies a specific filter to the supplied doc.
The filter is instrumented.
Returns the result of the filter.
# File lib/html/pipeline_plus.rb, line 127
def perform_filter(filter, doc, context, result)
  payload = default_payload filter: filter.name,
                            context: context, result: result
  instrument 'call_filter.html_pipeline', payload do
    filter.call(doc, context, result)
  end
end
Public: Set up instrumentation for this pipeline.
Returns nothing.
# File lib/html/pipeline_plus.rb, line 157
def setup_instrumentation(name = nil, service = nil)
  self.instrumentation_name = name
  self.instrumentation_service =
    service || self.class.default_instrumentation_service
end
Like call, but guarantees that the value returned is a DocumentFragment. Pipelines may return a DocumentFragment or a String. Callers that need a DocumentFragment should use this method.
# File lib/html/pipeline_plus.rb, line 138
def to_document(input, context = {}, result = nil)
  result = call(input, context, result)
  HTML::Pipeline.parse(result[:output])
end
Like call, but guarantees that the value returned is a String of HTML markup.
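# File lib/html/pipeline_plus.rb, line 144
def to_html(input, context = {}, result = nil)
  # Note: as written, `result = nil` resets the local variable, so any
  # result object passed in by the caller is not forwarded to call.
  result = call(input, context, result = nil)
  output = result[:output]
  if output.respond_to?(:to_html)
    output.to_html
  else
    output.to_s
  end
end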
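The final branch of `to_html` is ordinary duck typing: anything that responds to `to_html` is serialized that way, and everything else falls back to `to_s`. A minimal sketch, assuming a hypothetical `FakeFragment` in place of a real DocumentFragment and a hypothetical `output_to_html` helper:

```ruby
# Hypothetical stand-in for a DocumentFragment: it only needs to_html.
FakeFragment = Struct.new(:markup) do
  def to_html
    markup
  end
end

# Same decision as the listing: prefer to_html, otherwise use to_s.
def output_to_html(output)
  output.respond_to?(:to_html) ? output.to_html : output.to_s
end

from_fragment = output_to_html(FakeFragment.new('<p>hi</p>'))
from_string = output_to_html('<p>hi</p>')
from_nil = output_to_html(nil)
```

In plain Ruby a String does not respond to `to_html`, so string output passes through `to_s` unchanged, and a `nil` output becomes an empty String.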