Package org.w3c.tidy

Class Node


  • public class Node
    extends java.lang.Object
    Used for elements and text nodes. The element name is null for text nodes. start and end are offsets into lexbuf, which contains the textual content of all elements in the parse tree. parent and content allow traversal of the parse tree in any direction. Attributes are represented as a linked list of AttVal nodes, which hold the strings for attribute/value pairs.
    Version:
    $Revision: 1107 $ ($Author: aditsu $)
    Author:
    Dave Raggett dsr@w3.org, Andy Quick ac.quick@sympatico.ca (translation to Java), Fabrizio Giustina
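    The linkage described above can be sketched as follows. This is a minimal illustration, not the JTidy implementation: TreeSketchNode, append, childLabels, pathFromRoot, and the "#text" label are hypothetical names, and only the parent, prev, next, content, and last links are modeled.

```java
// Hypothetical stand-in for org.w3c.tidy.Node: only the linkage fields are modeled.
class TreeSketchNode {
    final String element;       // null for text nodes
    TreeSketchNode parent;      // enclosing node
    TreeSketchNode prev, next;  // previous and next siblings
    TreeSketchNode content;     // first child
    TreeSketchNode last;        // last child

    TreeSketchNode(String element) {
        this.element = element;
    }

    // Label used when printing; "#text" is an illustrative choice.
    String label() {
        return element == null ? "#text" : element;
    }

    // Append a child while keeping all five links consistent.
    void append(TreeSketchNode child) {
        child.parent = this;
        child.prev = last;
        child.next = null;
        if (last != null) {
            last.next = child;
        } else {
            content = child;
        }
        last = child;
    }

    // Downward traversal: labels of the direct children, in document order.
    String childLabels() {
        StringBuilder sb = new StringBuilder();
        for (TreeSketchNode n = content; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(n.label());
        }
        return sb.toString();
    }

    // Upward traversal: path from the topmost ancestor down to this node.
    String pathFromRoot() {
        String path = label();
        for (TreeSketchNode n = parent; n != null; n = n.parent) {
            path = n.label() + "/" + path;
        }
        return path;
    }
}
```

    Following content and next walks down and across the tree, while following parent walks back up, which is what makes traversal in any direction possible.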
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected org.w3c.dom.Node adapter
      DOM adapter.
      static short ASP_TAG
      node type: asp tag.
      protected AttVal attributes
      Attribute/Value linked list.
      static short CDATA_TAG
      node type: CDATA.
      protected boolean closed
      true if closed by explicit end tag.
      static short COMMENT_TAG
      node type: comment.
      protected Node content
      Contained node.
      static short DOCTYPE_TAG
      node type: doctype.
      protected java.lang.String element
      Tag name.
      protected int end
      end of span onto text array.
      static short END_TAG
      End tag.
      protected boolean implicit
      true if inferred.
      static short JSTE_TAG
      node type: jste tag.
      protected Node last
      last node.
      protected boolean linebreak
      true if followed by a line break.
      protected Node next
      next node.
      protected Node parent
      parent node.
      static short PHP_TAG
      node type: php tag.
      protected Node prev
      previous node.
      static short PROC_INS_TAG
      node type: processing instruction.
      static short ROOT_NODE
      node type: root.
      static short SECTION_TAG
      node type: section tag.
      protected int start
      start of span onto text array.
      static short START_END_TAG
      Start-end (self-closing) tag, e.g. <br/>.
      static short START_TAG
      Start tag.
      protected Dict tag
      tag's dictionary definition.
      static short TEXT_NODE
      node type: text.
      protected byte[] textarray
      the text array.
      protected short type
      TextNode, StartTag, EndTag etc.
      protected Dict was
      old tag when it was changed.
      static short XML_DECL
      node type: XML declaration.
    • Constructor Summary

      Constructors 
      Constructor Description
      Node()
      Instantiates a new text node.
      Node​(short type, byte[] textarray, int start, int end)
      Instantiates a new node.
      Node​(short type, byte[] textarray, int start, int end, java.lang.String element, TagTable tt)
      Instantiates a new node.
    • Field Detail

      • START_END_TAG

        public static final short START_END_TAG
        Start-end (self-closing) tag, e.g. <br/>.
        See Also:
        Constant Field Values
      • SECTION_TAG

        public static final short SECTION_TAG
        node type: section tag.
        See Also:
        Constant Field Values
      • parent

        protected Node parent
        parent node.
      • prev

        protected Node prev
        previous node.
      • next

        protected Node next
        next node.
      • last

        protected Node last
        last node.
      • start

        protected int start
        start of span onto text array.
      • end

        protected int end
        end of span onto text array.
      • textarray

        protected byte[] textarray
        the text array.
      • type

        protected short type
        TextNode, StartTag, EndTag etc.
      • closed

        protected boolean closed
        true if closed by explicit end tag.
      • implicit

        protected boolean implicit
        true if inferred.
      • linebreak

        protected boolean linebreak
        true if followed by a line break.
      • was

        protected Dict was
        old tag when it was changed.
      • tag

        protected Dict tag
        tag's dictionary definition.
      • element

        protected java.lang.String element
        Tag name.
      • attributes

        protected AttVal attributes
        Attribute/Value linked list.
      • content

        protected Node content
        Contained node.
      • adapter

        protected org.w3c.dom.Node adapter
        DOM adapter.
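        As an illustration of how start, end, and textarray fit together, the sketch below decodes the bytes of one span. It assumes that end is exclusive and that the bytes are UTF-8; neither is stated on this page, and SpanSketch and text are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class SpanSketch {
    // Decode the bytes in [start, end) of a shared buffer.
    // Assumptions: end is exclusive and the bytes are UTF-8.
    static String text(byte[] textarray, int start, int end) {
        return new String(textarray, start, end - start, StandardCharsets.UTF_8);
    }
}
```

        Because every node only stores offsets, many nodes can share one buffer without copying their text.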
    • Constructor Detail

      • Node

        public Node()
        Instantiates a new text node.
      • Node

        public Node​(short type,
                    byte[] textarray,
                    int start,
                    int end)
        Instantiates a new node.
        Parameters:
        type - node type: Node.ROOT_NODE | Node.DOCTYPE_TAG | Node.COMMENT_TAG | Node.PROC_INS_TAG | Node.TEXT_NODE | Node.START_TAG | Node.END_TAG | Node.START_END_TAG | Node.CDATA_TAG | Node.SECTION_TAG | Node.ASP_TAG | Node.JSTE_TAG | Node.PHP_TAG | Node.XML_DECL
        textarray - array of bytes contained in the Node
        start - start position
        end - end position
      • Node

        public Node​(short type,
                    byte[] textarray,
                    int start,
                    int end,
                    java.lang.String element,
                    TagTable tt)
        Instantiates a new node.
        Parameters:
        type - node type: Node.ROOT_NODE | Node.DOCTYPE_TAG | Node.COMMENT_TAG | Node.PROC_INS_TAG | Node.TEXT_NODE | Node.START_TAG | Node.END_TAG | Node.START_END_TAG | Node.CDATA_TAG | Node.SECTION_TAG | Node.ASP_TAG | Node.JSTE_TAG | Node.PHP_TAG | Node.XML_DECL
        textarray - array of bytes contained in the Node
        start - start position
        end - end position
        element - tag name
        tt - tag table instance
    • Method Detail

      • getAttrByName

        public AttVal getAttrByName​(java.lang.String name)
        Returns an attribute with the given name in the current node.
        Parameters:
        name - attribute name.
        Returns:
        AttVal instance or null if no attribute with the given name is found
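        A lookup over a linked list of attributes might look like the sketch below. AttrSketch is a hypothetical stand-in for AttVal, and the exact, case-sensitive name comparison is an assumption of the sketch rather than documented behavior.

```java
// Hypothetical stand-in for AttVal: a singly linked list of name/value pairs.
class AttrSketch {
    final String name;
    final String value;
    AttrSketch next;

    AttrSketch(String name, String value, AttrSketch next) {
        this.name = name;
        this.value = value;
        this.next = next;
    }

    // Walk the list from head and return the first match, or null if none.
    // Exact, case-sensitive comparison is an assumption of this sketch.
    static AttrSketch findByName(AttrSketch head, String name) {
        for (AttrSketch a = head; a != null; a = a.next) {
            if (a.name != null && a.name.equals(name)) {
                return a;
            }
        }
        return null;
    }
}
```

        Callers should check the result for null before reading the value, mirroring the contract stated above.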
      • checkAttributes

        public void checkAttributes​(Lexer lexer)
        Default method for checking an element's attributes.
        Parameters:
        lexer - Lexer
      • repairDuplicateAttributes

        public void repairDuplicateAttributes​(Lexer lexer)
        The same attribute name can't be used more than once in each element. Discard or join attributes according to configuration.
        Parameters:
        lexer - Lexer
      • addAttribute

        public void addAttribute​(java.lang.String name,
                                 java.lang.String value)
        Adds an attribute to the node.
        Parameters:
        name - attribute name
        value - attribute value
      • removeAttribute

        public void removeAttribute​(AttVal attr)
        Remove an attribute from node and then free it.
        Parameters:
        attr - attribute to remove
      • findDocType

        public Node findDocType()
        Find the doctype element.
        Returns:
        doctype node or null if not found
      • discardDocType

        public void discardDocType()
        Discard the doctype node.
      • discardElement

        public static Node discardElement​(Node element)
        Remove node from markup tree and discard it.
        Parameters:
        element - discarded node
        Returns:
        next node
      • insertNodeAtStart

        public void insertNodeAtStart​(Node node)
        Insert a node into markup tree.
        Parameters:
        node - to insert
      • insertNodeAtEnd

        public void insertNodeAtEnd​(Node node)
        Insert node into markup tree.
        Parameters:
        node - Node to insert
      • insertNodeAsParent

        public static void insertNodeAsParent​(Node element,
                                              Node node)
        Insert node into markup tree in place of element, which is moved to become the child of the node.
        Parameters:
        element - child node. Will be inserted as a child of node
        node - parent node
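        The pointer updates implied by this description can be sketched as follows. WrapSketch, append, and insertAsParent are hypothetical names and the code is an illustration under those assumptions, not the JTidy source.

```java
// Hypothetical node type with the same five links as the fields documented above.
class WrapSketch {
    final String name;
    WrapSketch parent, prev, next, content, last;

    WrapSketch(String name) {
        this.name = name;
    }

    // Append a child at the end of this node's content.
    void append(WrapSketch child) {
        child.parent = this;
        child.prev = last;
        child.next = null;
        if (last != null) {
            last.next = child;
        } else {
            content = child;
        }
        last = child;
    }

    // Put node where element was, then make element the only child of node.
    static void insertAsParent(WrapSketch element, WrapSketch node) {
        // node takes over element's position among its siblings
        node.parent = element.parent;
        node.prev = element.prev;
        node.next = element.next;
        if (node.prev != null) {
            node.prev.next = node;
        }
        if (node.next != null) {
            node.next.prev = node;
        }
        if (node.parent != null) {
            if (node.parent.content == element) {
                node.parent.content = node;
            }
            if (node.parent.last == element) {
                node.parent.last = node;
            }
        }
        // element becomes the only child of node
        element.parent = node;
        element.prev = null;
        element.next = null;
        node.content = element;
        node.last = element;
    }
}
```

        Wrapping the middle child b of a, b, c in a new div leaves the parent with a, div, c, and b as the sole child of div.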
      • insertNodeBeforeElement

        public static void insertNodeBeforeElement​(Node element,
                                                   Node node)
        Insert node into markup tree before element.
        Parameters:
        element - reference node. node will be inserted before it
        node - node to insert
      • insertNodeAfterElement

        public void insertNodeAfterElement​(Node node)
        Insert node into markup tree after this element.
        Parameters:
        node - new node to insert
      • trimEmptyElement

        public static void trimEmptyElement​(Lexer lexer,
                                            Node element)
        Trim an empty element.
        Parameters:
        lexer - Lexer
        element - empty node to be removed
      • trimTrailingSpace

        public static void trimTrailingSpace​(Lexer lexer,
                                             Node element,
                                             Node last)
        This maps <em>hello </em><strong>world</strong> to <em>hello</em> <strong>world</strong>. If the last child of element is a text node, then trim the trailing white space character, moving it to after element's end tag.
        Parameters:
        lexer - Lexer
        element - node
        last - last child of element
      • escapeTag

        protected static Node escapeTag​(Lexer lexer,
                                        Node element)
        Escapes the given tag.
        Parameters:
        lexer - Lexer
        element - node to be escaped
        Returns:
        escaped node
      • isBlank

        public boolean isBlank​(Lexer lexer)
        Is the node content empty or blank? Assumes node is a text node.
        Parameters:
        lexer - Lexer
        Returns:
        true if the node content is empty or blank
      • trimInitialSpace

        public static void trimInitialSpace​(Lexer lexer,
                                            Node element,
                                            Node text)
        This maps <p>hello<em> world</em> to <p>hello <em>world</em>. Trims initial space by moving it before the start tag or, if this element is the first in its parent's content, by discarding the space.
        Parameters:
        lexer - Lexer
        element - parent node
        text - text node
      • trimSpaces

        public static void trimSpaces​(Lexer lexer,
                                      Node element)
        Move initial and trailing space out. This routine maps hello<em> world</em> to hello <em>world</em>, and <em>hello </em><strong>world</strong> to <em>hello</em> <strong>world</strong>.
        Parameters:
        lexer - Lexer
        element - Node
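        At the level of a single text span, trimming one leading or trailing space amounts to adjusting the start or end offset. The sketch below shows only that offset arithmetic; it does not model moving the space outside the element's tags. TrimSketch and its methods are hypothetical, and it assumes an exclusive end offset and UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

class TrimSketch {
    // New start offset: skip one leading space, if the span has one.
    static int trimInitial(byte[] buf, int start, int end) {
        return (start < end && buf[start] == ' ') ? start + 1 : start;
    }

    // New end offset (exclusive): drop one trailing space, if the span has one.
    static int trimTrailing(byte[] buf, int start, int end) {
        return (start < end && buf[end - 1] == ' ') ? end - 1 : end;
    }

    // Decode the span [start, end) for inspection; UTF-8 is an assumption.
    static String text(byte[] buf, int start, int end) {
        return new String(buf, start, end - start, StandardCharsets.UTF_8);
    }
}
```

        An empty span is returned unchanged by both helpers, so they are safe to call on text nodes that are already blank.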
      • isDescendantOf

        public boolean isDescendantOf​(Dict tag)
        Is this node contained in a given tag?
        Parameters:
        tag - tag to look for among this node's ancestors
        Returns:
        true if node is contained in tag
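        Such a containment check can be sketched as a walk up the parent chain. AncestorSketch is hypothetical, and it compares tag names as strings for simplicity, whereas the documented method takes a Dict.

```java
// Hypothetical node with only a tag name and a parent link.
class AncestorSketch {
    final String tag;
    final AncestorSketch parent;

    AncestorSketch(String tag, AncestorSketch parent) {
        this.tag = tag;
        this.parent = parent;
    }

    // True if any ancestor (not this node itself) carries the given tag.
    boolean isDescendantOf(String wanted) {
        for (AncestorSketch p = parent; p != null; p = p.parent) {
            if (p.tag != null && p.tag.equals(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```

        Starting the walk at the parent means a node is never reported as a descendant of its own tag.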
      • insertDocType

        public static void insertDocType​(Lexer lexer,
                                         Node element,
                                         Node doctype)
        The doctype has been found after other tags, and needs moving to before the html element.
        Parameters:
        lexer - Lexer
        element - document
        doctype - doctype node to insert at the beginning of element
      • findBody

        public Node findBody​(TagTable tt)
        Find the body node.
        Parameters:
        tt - tag table
        Returns:
        body node
      • isElement

        public boolean isElement()
        Is the node an element?
        Returns:
        true if type is START_TAG | START_END_TAG
      • moveBeforeTable

        public static void moveBeforeTable​(Node row,
                                           Node node,
                                           TagTable tt)
        Unexpected content in table row is moved to just before the table in accordance with Netscape and IE. This code assumes that node hasn't been inserted into the row.
        Parameters:
        row - Row node
        node - Node which should be moved before the table
        tt - tag table
      • fixEmptyRow

        public static void fixEmptyRow​(Lexer lexer,
                                       Node row)
        If a table row is empty then insert an empty cell. This practice is consistent with browser behavior and avoids potential problems with row spanning cells.
        Parameters:
        lexer - Lexer
        row - row node
      • coerceNode

        public static void coerceNode​(Lexer lexer,
                                      Node node,
                                      Dict tag)
        Coerce a node.
        Parameters:
        lexer - Lexer
        node - Node
        tag - tag dictionary reference
      • removeNode

        public void removeNode()
        Extract this node and its children from a markup tree.
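        Extracting a node together with its children can be sketched as unlinking it from its siblings and parent while leaving its own content untouched. RemoveSketch, append, and remove are hypothetical names used only for this illustration.

```java
// Hypothetical node type with parent, sibling, and child links.
class RemoveSketch {
    final String name;
    RemoveSketch parent, prev, next, content, last;

    RemoveSketch(String name) {
        this.name = name;
    }

    // Append a child at the end of this node's content.
    void append(RemoveSketch child) {
        child.parent = this;
        child.prev = last;
        child.next = null;
        if (last != null) {
            last.next = child;
        } else {
            content = child;
        }
        last = child;
    }

    // Unlink this node from its siblings and parent; its children stay attached.
    void remove() {
        if (prev != null) {
            prev.next = next;
        }
        if (next != null) {
            next.prev = prev;
        }
        if (parent != null) {
            if (parent.content == this) {
                parent.content = next;
            }
            if (parent.last == this) {
                parent.last = prev;
            }
        }
        parent = null;
        prev = null;
        next = null;
    }
}
```

        After the call the node is the root of a detached subtree, so it can be re-inserted elsewhere with one of the insert methods.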
      • insertMisc

        public static boolean insertMisc​(Node element,
                                         Node node)
        Insert a node at the end.
        Parameters:
        element - parent node
        node - will be inserted at the end of element
        Returns:
        true if the node has been inserted
      • isNewNode

        public boolean isNewNode()
        Is this a new (user defined) node? Used to determine how attributes without values should be printed. This was introduced to deal with user defined tags e.g. Cold Fusion.
        Returns:
        true if this node represents a user-defined tag.
      • hasOneChild

        public boolean hasOneChild()
        Does the node have one (and only one) child?
        Returns:
        true if the node has one child
      • findHTML

        public Node findHTML​(TagTable tt)
        Find the "html" element.
        Parameters:
        tt - tag table
        Returns:
        html node
      • findHEAD

        public Node findHEAD​(TagTable tt)
        Find the head tag.
        Parameters:
        tt - tag table
        Returns:
        head node
      • checkNodeIntegrity

        public boolean checkNodeIntegrity()
        Checks for node integrity.
        Returns:
        false if node is not consistent
      • addClass

        public void addClass​(java.lang.String classname)
        Add a css class to the node. If a class attribute already exists adds the value to the existing attribute.
        Parameters:
        classname - css class name
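        The append-or-create behavior described above can be sketched with a plain map of attributes. ClassAttrSketch is hypothetical, the map stands in for the node's attribute list, and the single-space separator is an assumption based on how CSS class lists are written.

```java
import java.util.Map;

class ClassAttrSketch {
    // Create the class attribute if absent, otherwise append to its value.
    // The space separator is an assumption of this sketch.
    static void addClass(Map<String, String> attrs, String classname) {
        attrs.merge("class", classname, (existing, added) -> existing + " " + added);
    }
}
```

        Calling addClass twice with "note" and then "wide" on an empty map leaves a single class attribute holding both names.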
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
        See Also:
        Object.toString()
      • getAdapter

        protected org.w3c.dom.Node getAdapter()
        Returns a DOM Node which wraps the current tidy Node.
        Returns:
        org.w3c.dom.Node instance
      • cloneNode

        protected Node cloneNode​(boolean deep)
        Clone this node.
        Parameters:
        deep - if true deep clone the node (also clones all the contained nodes)
        Returns:
        cloned node
      • setType

        protected void setType​(short newType)
        Setter for node type.
        Parameters:
        newType - a valid node type constant
      • isJavaScript

        public boolean isJavaScript()
        Used to check script node for script language.
        Returns:
        true if the script node contains javascript
      • expectsContent

        public boolean expectsContent()
        Does the node expect contents?
        Returns:
        false if this node should be empty