class Paru::Pandoc

Pandoc is a wrapper around the pandoc document converter. See <pandoc.org/README.html> for details about pandoc. The Pandoc class is a straightforward translation of the pandoc command-line program to Ruby: a Rubyesque API for working with pandoc.

For information about writing pandoc filters in Ruby see {Filter}.

Creating a Paru pandoc converter in Ruby is straightforward: you create a new Paru::Pandoc object with a block that configures that object with pandoc options. Each command-line option to pandoc is a method on the Pandoc object. Command-line options with dashes in them, such as “--reference-docx”, are called by dropping the leading dashes and replacing each remaining dash with an underscore. So, “--reference-docx” becomes the method reference_docx.

Pandoc command-line flags, such as “--parse-raw”, “--chapters”, or “--toc”, have been translated to Paru::Pandoc methods that take an optional Boolean parameter; true is the default value. Therefore, if you want to enable a flag, no parameter is needed.

All other pandoc command-line options are translated to Paru::Pandoc methods that take either one String or Number argument, or a list of String arguments if that command-line option can occur more than once (such as “--include-before-body” or “--filter”).
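The three kinds of option methods described above can be illustrated with a small standalone sketch. `MiniConverter` and its `OPTION_DEFAULTS` table are hypothetical stand-ins, not Paru's actual implementation. The sketch only assumes that a flag has a Boolean default, a repeatable option has an Array default, and every other option takes a single value.

```ruby
# Hypothetical stand-in for Paru::Pandoc; it records options and nothing more.
class MiniConverter
  # Assumed option table: the type of the default decides the method's shape.
  OPTION_DEFAULTS = {
    toc: false,            # flag
    reference_docx: "",    # single value
    filter: [""]           # may occur more than once
  }.freeze

  attr_reader :options

  def initialize
    @options = {}
  end

  OPTION_DEFAULTS.each do |name, default|
    case default
    when true, false
      # Flags: calling the method without an argument enables the flag.
      define_method(name) { |value = true| @options[name] = value }
    when Array
      # Repeatable options: every call appends another value.
      define_method(name) { |value| (@options[name] ||= []) << value }
    else
      # All other options: one String or Number value; the last call wins.
      define_method(name) { |value| @options[name] = value }
    end
  end
end

converter = MiniConverter.new
converter.toc                            # same as converter.toc true
converter.reference_docx "styled.docx"   # --reference-docx
converter.filter "first_filter.rb"
converter.filter "second_filter.rb"

p converter.options
```

Calling `converter.toc false` afterwards would switch the flag off again, which mirrors the optional Boolean parameter described above.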

Once you have configured a Paru::Pandoc converter, you can call convert or +<<+ (which is an alias for convert) with a string to convert. You can call convert as often as you like and, if you like, reconfigure the converter in between!

@example Convert the markdown string ‘hello world’ to HTML

Paru::Pandoc.new do
    from 'markdown'
    to 'html'
end << 'hello *world*'

@example Convert an HTML file to DOCX with a reference file

Paru::Pandoc.new do
    from "html"
    to "docx"
    reference_docx "styled_output.docx"
    output "output.docx"
end.convert File.read("input.html")

@example Convert a markdown file to HTML, adding references in APA style

Paru::Pandoc.new do
    from "markdown"
    toc
    bibliography "literature.bib"
    to "html"
    csl "apa.csl"
    output "report_with_references.html"
end << File.read("report.md")

Constants

DEFAULT_OPTION_SEP

Use a readable option separator on Unix-like systems, but fall back to a space on Windows.

OPTIONS

For each pandoc command-line option, a method is defined on Paru::Pandoc, following the naming rules given in the class description.

PARU_PANDOC_PATH

Path to the pandoc executable used by paru.

Public Class Methods

info()

Gather information about the pandoc installation. It runs +pandoc --version+ and extracts pandoc’s version number and default data directory. This method is typically used in scripts that use Paru to automate the use of pandoc.

@return [Info] Pandoc’s version, such as “[2.10.1]” and the data directory, such as “/home/huub/.pandoc”.

# File lib/paru/pandoc.rb, line 99
def self.info()
    @@info
end
new(&block)

Create a new Pandoc converter, optionally configured by a block with pandoc options. See {#configure} on how to configure a converter.

@param block [Proc] an optional configuration block.

# File lib/paru/pandoc.rb, line 107
def initialize(&block)
    @options = {}
    configure(&block) if block_given?
end

Public Instance Methods

<<(input)
Alias for: convert
configure(&block)

Configure this Pandoc converter with a block. In the block you can call all pandoc options as methods on this converter. In multi-word options, each dash (-) is replaced by an underscore (_).

Pandoc has a number of command-line options. Most are simple options, such as flags, that can be set only once. Other options can occur more than once, such as the css option: to add more than one CSS file to a generated standalone HTML file, use the css option once for each stylesheet to include. Still other options take a value of the form key[:value] and can also occur multiple times, such as metadata.

All options are specified in the file pandoc_options.yaml. If an option can occur only once, its value in that YAML file is its default value. If an option can occur multiple times, its value is an array with one element, the default value.

@param block [Proc] the options to pandoc

@return [Pandoc] this Pandoc converter

@example Configure converting HTML to LaTeX with a LaTeX engine

converter.configure do
    from 'html'
    to 'latex'
    latex_engine 'lualatex'
end
# File lib/paru/pandoc.rb, line 138
def configure(&block)
    instance_eval(&block)
    self
end
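Because configure evaluates the block with instance_eval, self inside the block is the converter itself, which is why option methods can be called without a receiver. The following is a minimal standalone sketch of that mechanism. It uses a hypothetical `TinyConfig` class with two hand-written option methods rather than Paru itself.

```ruby
# Hypothetical class illustrating the instance_eval pattern used by configure.
class TinyConfig
  attr_reader :settings

  def initialize(&block)
    @settings = {}
    configure(&block) if block_given?
  end

  # Run the block with self set to this object, then return self so that
  # further calls can be chained onto the result.
  def configure(&block)
    instance_eval(&block)
    self
  end

  def from(format)
    @settings[:from] = format
  end

  def to(format)
    @settings[:to] = format
  end
end

config = TinyConfig.new do
  from "html"
  to "latex"
end

# Reconfiguring later only overwrites the options named in the new block.
config.configure { to "docx" }

p config.settings
```

One consequence of this design is that instance variables and methods of the calling object are not reachable through an implicit receiver inside the block, while local variables of the surrounding scope still are.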
convert(input)

Converts input string to output string using the pandoc invocation configured in this Pandoc instance.

@param input [String] the input string to convert

@return [String] the converted output as a string. Note: for some formats, output to STDOUT is not supported (see pandoc’s manual) and the result string will be empty.

The following two examples are equivalent:

@example Using convert

output = converter.convert 'this is a *strong* word'

@example Using <<

output = converter << 'this is a *strong* word'
# File lib/paru/pandoc.rb, line 158
def convert(input)
    run_converter to_command, input
end
Also aliased as: <<
convert_file(input_file)

Converts an input file to output string using the pandoc invocation configured in this Pandoc instance. The path to the input file is appended to that invocation.

@param input_file [String] the path to the input file to convert

@return [String] the converted output as a string. Note: for some formats, output to STDOUT is not supported (see pandoc’s manual) and the result string will be empty.

@example Using convert_file

output = converter.convert_file 'files/document.md'
# File lib/paru/pandoc.rb, line 174
def convert_file(input_file)
    run_converter "#{to_command} #{input_file}"
end
to_command(option_sep = DEFAULT_OPTION_SEP)

Create a string representation of this converter’s pandoc command line invocation. This is useful for debugging purposes.

@param option_sep [String] the string to separate options with

@return [String] this converter’s command-line invocation string.

# File lib/paru/pandoc.rb, line 183
def to_command(option_sep = DEFAULT_OPTION_SEP)
    "#{escape(@@pandoc_exec)}\t#{to_option_string option_sep}"
end

Private Instance Methods

escape(str)
# File lib/paru/pandoc.rb, line 264
def escape(str)
    if Gem.win_platform?
        escaped = str.gsub("\\", "\\\\")
        "\"#{escaped}\""
    else
        str.shellescape
    end
end
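The two branches of escape can be tried in isolation. The sketch below is illustrative only: the file name and the Windows path are made-up values, and it makes no claim about how much escaping a Windows shell actually requires. It shows what String#shellescape produces, and how a gsub replacement string treats backslashes.

```ruby
require "shellwords"

# Unix-like branch: String#shellescape backslash-escapes characters that are
# special to the shell, so the value reaches the program as one argument.
title = "My Report (draft).md"
escaped = title.shellescape
puts escaped

# Shellwords.split undoes the escaping, which gives a quick round-trip check.
p Shellwords.split("--output=#{escaped}")

# Windows branch: in a gsub replacement string, two backslash characters
# stand for one literal backslash, so this call returns the path unchanged.
path = 'C:\tools\pandoc.exe'
unchanged = path.gsub("\\", "\\\\")
puts unchanged == path

# The block form uses its return value literally and doubles each backslash.
doubled = path.gsub("\\") { "\\\\" }
puts doubled
```

If doubled backslashes are the intent on Windows, the block form (or a longer replacement string) is one way to get them. Whether that is needed inside double quotes depends on the shell that ends up running the command.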
run_converter(command, input = nil)
# File lib/paru/pandoc.rb, line 273
def run_converter(command, input = nil)
    begin
        output = ''
        error = ''
        status = 0

        Open3.popen3(command) do |stdin, stdout, stderr, thread|
            stdin << input unless input.nil?
            stdin.close
            output << stdout.read
            error << stderr.read
            status = thread.value.exitstatus
        end

        warn error unless error.empty?

        if 0 < status
            # pandoc exited with an error
            raise Paru::Error.new "error while running:\n\n#{command}\n\nPandoc responded with:\n\n#{error}\n"
        end

        output
    rescue Paru::Error => err
        raise err
    rescue StandardError => err
        raise Error.new "Unable to run pandoc via command '#{command}': #{err.message}"
    end
end
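The same run-and-check pattern can be sketched with Open3.capture3, which is a different call from the popen3 used above: it writes the input and collects stdout and stderr in one step. To keep the sketch runnable without pandoc, the child process is the current Ruby interpreter running a one-line script. That command, the `run_command` helper, and the `CommandError` class are assumptions made for illustration.

```ruby
require "open3"
require "rbconfig"

# Hypothetical error class standing in for Paru::Error.
class CommandError < StandardError; end

# Run a command given as an argument list, feed it input on stdin, and return
# what it writes to stdout. Raise when the exit status signals failure.
def run_command(argv, input = nil)
  output, error, status = Open3.capture3(*argv, stdin_data: input.to_s)
  warn error unless error.empty?
  unless status.success?
    raise CommandError, "error while running #{argv.join(' ')}:\n#{error}"
  end
  output
end

# Stand-in for pandoc: a child Ruby process that upcases its input.
upcase = [RbConfig.ruby, "-e", "print STDIN.read.upcase"]
puts run_command(upcase, "hello *world*")

# A non-zero exit status is turned into an exception.
begin
  run_command([RbConfig.ruby, "-e", "exit 3"])
rescue CommandError => e
  puts "failed as expected: #{e.message.lines.first}"
end
```

Passing the command as an argument list bypasses the shell, so no escaping is needed in this variant. The listing above instead builds a single command string, which is why it relies on escape.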
to_option_string(option_sep)
# File lib/paru/pandoc.rb, line 189
def to_option_string(option_sep)
    options_arr = []
    @options.each do |option, value|
        option_string = "--#{option.to_s.gsub '_', '-'}"

        case value
        when TrueClass then
            # Flags don't have a value, only its name
            # For example: --standalone
            options_arr.push "#{option_string}"
        when FalseClass then
            # Skip this option; consider a flag with value false as unset
        when Array then
            # This option can occur multiple times: list each with its value.
            # For example: --css=main.css --css=print.css
            options_arr.push value.map {|val| "#{option_string}=#{escape(val.to_s)}"}.join(option_sep)
        else
            # All options that aren't flags and can occur only once have the
            # same pattern: --option=value
            options_arr.push "#{option_string}=#{escape(value.to_s)}"
        end
    end
    options_arr.join(option_sep)
end
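The translation from an options hash to a command-line string can be tried outside Paru with a small standalone sketch. The `option_string` function and the sample options are hypothetical. The sketch only covers the Unix-like escaping branch, and it uses a plain space as the default separator.

```ruby
require "shellwords"

# Turn an options hash into a pandoc-style option string.
def option_string(options, sep = " ")
  parts = options.flat_map do |name, value|
    flag = "--#{name.to_s.tr('_', '-')}"
    case value
    when true  then [flag]   # enabled flag: name only
    when false then []       # disabled flag: left out
    when Array then value.map { |v| "#{flag}=#{v.to_s.shellescape}" }
    else            ["#{flag}=#{value.to_s.shellescape}"]
    end
  end
  parts.join(sep)
end

options = {
  from: "markdown",
  standalone: true,
  toc: false,
  css: ["main.css", "print style.css"],
  reference_docx: "styled.docx"
}

puts option_string(options)
# prints: --from=markdown --standalone --css=main.css --css=print\ style.css --reference-docx=styled.docx
```

Passing a separator such as a backslash followed by a newline and a tab would print one option per line, which is the kind of readable output the DEFAULT_OPTION_SEP description refers to.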