class Nokogiri::XML::ParseOptions

Options that control the parsing behavior for XML::Document, XML::DocumentFragment, HTML4::Document, HTML4::DocumentFragment, XSLT::Stylesheet, and XML::Schema.

These options directly expose libxml2's parse options, which are all boolean in the sense that an option is “on” or “off”.

💡 Note that HTML5 parsing has a separate, orthogonal set of options due to the nature of the HTML5 specification. See Nokogiri::HTML5.

⚠ Not all parse options are supported on JRuby. Nokogiri will attempt to invoke the equivalent behavior in Xerces/NekoHTML on JRuby when it's possible.

Setting and unsetting parse options

You can build your own combinations of parse options by using any of the following methods:

ParseOptions method chaining

Every option has an equivalent method in lowercase. You can chain these methods together to set various combinations.

# Set the HUGE & PEDANTIC options
po = Nokogiri::XML::ParseOptions.new.huge.pedantic
doc = Nokogiri::XML::Document.parse(xml, nil, nil, po)

Every option has an equivalent no{option} method in lowercase. You can call these methods on an instance of ParseOptions to unset the option.

# Set the HUGE & PEDANTIC options
po = Nokogiri::XML::ParseOptions.new.huge.pedantic

# later we want to modify the options
po.nohuge # Unset the HUGE option
po.nopedantic # Unset the PEDANTIC option

💡 Note that some options begin with “no”, leading to the logical but perhaps unintuitive double negative:

po.nocdata # Set the NOCDATA parse option
po.nonocdata # Unset the NOCDATA parse option

💡 Note that negation is not available for STRICT, which is itself a negation of all other features.

Using Ruby Blocks

Most parsing methods will accept a block for configuration of parse options, and we recommend chaining the setter methods:

doc = Nokogiri::XML::Document.parse(xml) { |config| config.huge.pedantic }

ParseOptions constants

You can also use the constants declared under Nokogiri::XML::ParseOptions to set various combinations. They are bits in a bitmask, and so can be combined with bitwise operators:

po = Nokogiri::XML::ParseOptions.new(Nokogiri::XML::ParseOptions::HUGE | Nokogiri::XML::ParseOptions::PEDANTIC)
doc = Nokogiri::XML::Document.parse(xml, nil, nil, po)

Constants

BIG_LINES

Support line numbers up to long int (default is a short int). On by default for XML::Document, XML::DocumentFragment, HTML4::Document, HTML4::DocumentFragment, XSLT::Stylesheet, and XML::Schema.

COMPACT

Compact small text nodes. Off by default.

⚠ No modification of the DOM tree is allowed after parsing. libxml2 may crash if you try to modify the tree.

DEFAULT_HTML

The options mask used by default for parsing HTML4::Document and HTML4::DocumentFragment.

DEFAULT_SCHEMA

The options mask used by default for parsing XML::Schema.

DEFAULT_XML

The options mask used by default for parsing XML::Document and XML::DocumentFragment.

DEFAULT_XSLT

The options mask used by default for parsing XSLT::Stylesheet.

DTDATTR

Default DTD attributes. On by default for XSLT::Stylesheet.

DTDLOAD

Load external subsets. On by default for XSLT::Stylesheet.

⚠ It is UNSAFE to set this option when parsing untrusted documents.

DTDVALID

Validate with the DTD. Off by default.

HUGE

Relax any hardcoded limit from the parser. Off by default.

⚠ There may be a performance penalty when this option is set.

NOBASEFIX

Do not fix up XInclude xml:base URIs. Off by default.

NOBLANKS

Remove blank nodes. Off by default.

NOCDATA

Merge CDATA as text nodes. On by default for XSLT::Stylesheet.

NODICT

Do not reuse the context dictionary. Off by default.

NOENT

Substitute entities. Off by default.

⚠ This option enables entity substitution, contrary to what the name implies.

⚠ It is UNSAFE to set this option when parsing untrusted documents.

NOERROR

Suppress error reports. On by default for HTML4::Document and HTML4::DocumentFragment.

NONET

Forbid network access. On by default for XML::Document, XML::DocumentFragment, HTML4::Document, HTML4::DocumentFragment, XSLT::Stylesheet, and XML::Schema.

⚠ It is UNSAFE to unset this option when parsing untrusted documents.

NOWARNING

Suppress warning reports. On by default for HTML4::Document and HTML4::DocumentFragment.

NOXINCNODE

Do not generate XInclude START/END nodes. Off by default.

NSCLEAN

Remove redundant namespace declarations. Off by default.

OLD10

Parse using XML-1.0 before update 5. Off by default.

PEDANTIC

Enable pedantic error reporting. Off by default.

RECOVER

Recover from errors. On by default for XML::Document, XML::DocumentFragment, HTML4::Document, HTML4::DocumentFragment, XSLT::Stylesheet, and XML::Schema.

SAX1

Use the SAX1 interface internally. Off by default.

STRICT

Strict parsing

XINCLUDE

Implement XInclude substitution. Off by default.

Attributes

options[RW]
to_i[RW]

Public Class Methods

new(options = STRICT)
# File lib/nokogiri/xml/parse_options.rb, line 165
def initialize(options = STRICT)
  @options = options
end

Public Instance Methods

==(other)
# File lib/nokogiri/xml/parse_options.rb, line 198
def ==(other)
  other.to_i == to_i
end

inspect()
Calls superclass method
# File lib/nokogiri/xml/parse_options.rb, line 204
def inspect
  options = []
  self.class.constants.each do |k|
    options << k.downcase if send(:"#{k.downcase}?")
  end
  super.sub(/>$/, " " + options.join(", ") + ">")
end

strict()
# File lib/nokogiri/xml/parse_options.rb, line 189
def strict
  @options &= ~RECOVER
  self
end

strict?()
# File lib/nokogiri/xml/parse_options.rb, line 194
def strict?
  @options & RECOVER == STRICT
end