class DOMReader

Constants

TAG_ATTR
TAG_CLOSE
TAG_COMMON
TAG_OPEN
TAG_SPECIAL
TAG_TEXT

Attributes

root[RW]
tag_count[RW]

Public Class Methods

new()

Sets up the parser state:

@stack stores open tags only, because only open tags can have children. It is seeded with the DOCUMENT node when parser_script runs.
@html stores the HTML source.
@index stores the current tag offset.
@root is the tag tree.
@tag_count counts open tags, special tags and text tags.

# File lib/domparser/parser_script.rb, line 25
def initialize
  @stack = []
  @html = nil
  @index = 0
  @root = Node.new('DOCUMENT', nil, 'general', 0)
  @tag_count = 0
end
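The Node class itself is not shown in this page. The following is a minimal sketch of what it might look like, inferred only from how DOMReader calls it (`Node.new(tag, offset, type, depth)` plus the `parent`, `children` and `attributes` accessors). The defaults and field names are assumptions, not the library's actual definition.

```ruby
# Hypothetical Node, reconstructed from its call sites in DOMReader.
# The real class may differ; argument defaults here are assumptions.
class Node
  attr_accessor :tag, :offset, :type, :depth, :parent, :children, :attributes

  def initialize(tag, offset, type = nil, depth = 0)
    @tag = tag            # raw tag string, or the text of a text node
    @offset = offset      # distance from the current index to the tag start
    @type = type          # nil until add_tag_type sets it (assumed default)
    @depth = depth        # used as the indent width when printing
    @parent = nil
    @children = []
    @attributes = {}
  end
end

# Mirrors what initialize and parser_script do with the root node.
root = Node.new('DOCUMENT', nil, 'general', 0)
stack = [root]
```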

Public Instance Methods

parser_script(file_path)

Gets each new node and processes it according to its type: open tag, close tag or special tag. The loop stops once only one element is left on the stack.

# File lib/domparser/parser_script.rb, line 36
def parser_script file_path
  read_file file_path
  @stack << @root # initialize the @stack
  loop do
    cur_node = get_new_tag @html, @index
    processing cur_node
    break if @stack.length == 1
  end
  # puts @root
  @root
end
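The stack-driven loop can be illustrated with a much smaller stand-in. This sketch uses plain hashes instead of Node objects and a hypothetical `TAG` pattern in place of TAG_COMMON. Unlike the method above, it does not record close tags, text or special tags, so it only shows how the stack decides parentage and when the loop ends.

```ruby
# Simplified illustration of the open/close stack discipline.
# TAG is a hypothetical pattern, not the library's TAG_COMMON.
TAG = /<(\/)?(\w+)[^>]*>/

def build_tree(html)
  root = { name: 'DOCUMENT', children: [] }
  stack = [root]
  index = 0
  while (m = TAG.match(html, index))
    if m[1]                        # close tag: the current element is done
      stack.pop
    else                           # open tag: attach to parent, then push
      node = { name: m[2], children: [] }
      stack.last[:children] << node
      stack << node
    end
    index = m.end(0)               # continue right after the consumed tag
    break if stack.length == 1     # only the root is left
  end
  root
end

tree = build_tree('<html><body><p>hi</p></body></html>')
```

Because the loop also ends when no further tag matches, this sketch terminates on input that never returns the stack to a single element.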
simple_print_parser(data)

Recursively prints all the tags in the data structure. A simpler variant prints only each node's tag:

  def simple_print_parser data
    puts data.tag
    return if data.children.empty?
    data.children.each do |child|
      print " " * child.depth
      simple_print_parser child
    end
  end

# File lib/domparser/parser_script.rb, line 59
def simple_print_parser data
  if data.type.nil?
    puts data.tag
  elsif data.attributes.empty?
    puts "<#{data.type}>"
  else
    string = ""
    data.attributes.each do |key, value|
      if key == :class
        string << key.to_s << "='"
        value.each do |class_value|
          string << class_value << " "
        end
        string << "' "
      else
        string << key.to_s << "=" << "'" << value << "'" << " "
      end
    end
    puts "<#{data.type} #{string.strip}>"
  end
  return if data.children.empty?
  data.children.each do |child|
    print " " * child.depth
    simple_print_parser child
  end
end
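The attribute string built in the `else` branch can also be written with `map` and `join`. This is a sketch of an alternative, not the library's code; the `attributes` hash below is made-up sample data in the shape that set_class_attr and set_normal_attr produce. As listed, the class branch above appears to leave a space before the closing quote, which this form avoids.

```ruby
# Sample data: :class holds an array, other keys hold strings.
attributes = { id: 'main', class: %w[box wide] }

string = attributes.map do |key, value|
  value = value.join(' ') if value.is_a?(Array)  # class list -> "box wide"
  "#{key}='#{value}'"
end.join(' ')

line = "<div #{string}>"
```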

Private Instance Methods

add_attributes(node)
# File lib/domparser/parser_script.rb, line 163
def add_attributes node
  attributes = node.tag.scan(TAG_ATTR) # Here I use the scan instead of match to get all attributes
  unless attributes.nil?
    attributes.each do |attribute|
      attribute[0] == "class" ? set_class_attr(attribute, node) : set_normal_attr(attribute, node)
    end
  end
end
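The difference between `match` and `scan` that the comment mentions can be seen in a short example. `ATTR` is a hypothetical pattern used for illustration; the library's TAG_ATTR is not shown here. Note that `String#scan` returns an empty array rather than nil when nothing matches, so the nil check above never skips the loop.

```ruby
# ATTR is an assumed pattern with two groups: attribute name and value.
ATTR = /(\w+)\s*=\s*['"]([^'"]*)['"]/

tag = %q(<div id="main" class="box wide">)

first_only = tag.match(ATTR)   # MatchData for the first attribute only
all_pairs  = tag.scan(ATTR)    # one [name, value] pair per attribute

none = '<br>'.scan(ATTR)       # no attributes: empty array, not nil
```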
add_tag_type(node)
# File lib/domparser/parser_script.rb, line 197
def add_tag_type node
  node.type = node.tag.match(TAG_OPEN)[1].to_sym
end
add_text(node)
# File lib/domparser/parser_script.rb, line 152
def add_text node
  text_match = @html[(@index - 1)..(@index + node.offset + 1)].match(TAG_TEXT)
  unless text_match.nil?
    text = text_match[1].strip
    t_node = Node.new(text, nil, 'text')
    setup_relation t_node
    increment_depth t_node
    @tag_count += 1
  end
end
add_to_stack(node)
# File lib/domparser/parser_script.rb, line 201
def add_to_stack node
  @stack << node
end
get_new_tag(html_string, index)

Gets the next tag; the index changes on each run. Matches TAG_COMMON against the unread part of the HTML to obtain a MatchData object. If there is a match, takes its string form from the MatchData, gets the offset of the beginning of the tag, and creates a new node for the tag.

# File lib/domparser/parser_script.rb, line 95
def get_new_tag html_string, index
  new_tag = html_string[index..-1].match(TAG_COMMON)
  new_tag = new_tag[0] unless new_tag.nil?
  tag_offset = html_string[index..-1] =~ TAG_COMMON
  new_node = Node.new(new_tag, tag_offset)
end
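The tag string and its offset can both be read from a single MatchData object, since `MatchData#begin(0)` returns the same position that `=~` does. The sketch below shows this on a small string; `COMMON` is a hypothetical stand-in for TAG_COMMON.

```ruby
# COMMON is an assumed pattern matching any tag.
COMMON = /<[^>]+>/

html  = '<p>Hello <b>world</b></p>'
index = 3                          # just past "<p>"
rest  = html[index..-1]            # "Hello <b>world</b></p>"

match  = rest.match(COMMON)
tag    = match && match[0]         # string form of the matched tag
offset = match && match.begin(0)   # position of the tag within `rest`
```

The offset is relative to the slice starting at `index`, not to the whole string, which is why increment_index adds it to `@index`.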
increment_depth(node)

Small helper methods

# File lib/domparser/parser_script.rb, line 188
def increment_depth node
  node.depth = node.parent.depth + 2
end
increment_index(node)
# File lib/domparser/parser_script.rb, line 205
def increment_index node
  @index += node.tag.length + node.offset
end
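A worked example of the index arithmetic, using made-up values: the new index is the old index plus the gap before the tag plus the tag's own length, which lands on the first character after the tag.

```ruby
html   = '<p>Hello <b>world</b></p>'
index  = 3        # start of the unread part, just past "<p>"
offset = 6        # position of "<b>" within html[index..-1]
tag    = '<b>'

index += tag.length + offset       # 3 + 3 + 6
remaining = html[index..-1]        # what the next search will see
```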
opentag?(node)
# File lib/domparser/parser_script.rb, line 209
def opentag? node
  !!node.tag.match(TAG_OPEN)
end
process_closetag(node)

When a close tag is found, the last element in the stack must be its matching open tag, so that element is popped. The close tag is then related to the new last element in the stack. The add_text step must run before @stack.pop so that the text is attached to the previous open tag.

# File lib/domparser/parser_script.rb, line 144
def process_closetag node
  add_text node
  @stack.pop
  setup_relation node
  increment_index node
  increment_depth node
end
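The ordering matters because each relation is made against whatever is on top of the stack at that moment. A minimal sketch with hashes in place of Node objects (the data is invented) shows where the text and the close tag end up:

```ruby
root = { name: 'DOCUMENT', children: [] }
p_el = { name: '<p>', children: [] }
root[:children] << p_el
stack = [root, p_el]

# Step 1 (add_text): text before "</p>" attaches to the element still on top.
stack.last[:children] << { name: 'hi', children: [] }

# Step 2: the open tag is finished, so it leaves the stack.
stack.pop

# Step 3 (setup_relation): the close tag attaches to the new top, its parent.
stack.last[:children] << { name: '</p>', children: [] }
```

Swapping steps 1 and 2 would attach the text to the parent element instead of to `<p>`.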
process_opentag(node)

For an open tag: set up the parent-child connection with the last element in the stack (via process_special), then push the node onto the stack.

# File lib/domparser/parser_script.rb, line 135
def process_opentag node
  process_special node
  add_to_stack node
end
process_special(node)

For a special tag: add the text that precedes it, set up its relationship, then add its type and attributes.

# File lib/domparser/parser_script.rb, line 122
def process_special node
  add_text node
  setup_relation node
  add_tag_type node
  add_attributes node
  increment_index node
  @tag_count += 1
  increment_depth node
end
processing(node)

Processes open tags, close tags and special tags separately.

# File lib/domparser/parser_script.rb, line 108
def processing node
  if special? node
    process_special node
  elsif opentag? node
    process_opentag node
  else
    process_closetag node
  end
end
read_file(file_path)

Reads the file and strips all newline (\n) characters.

# File lib/domparser/parser_script.rb, line 103
def read_file file_path
  @html = File.read(file_path).gsub("\n", "")
end
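One thing to be aware of: `gsub("\n", "")` leaves carriage returns in place, so a file with Windows line endings keeps its `\r` characters. The sketch below compares it with `String#delete`, which removes every character in the given set. It works on an in-memory string rather than a file, and the `delete` variant is a suggestion, not what the library does.

```ruby
raw = "<p>\r\n  Hello\n</p>\n"

newline_only = raw.gsub("\n", "")   # what read_file does; "\r" survives
both         = raw.delete("\r\n")   # removes every "\r" and every "\n"
```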
set_class_attr(attribute, node)
# File lib/domparser/parser_script.rb, line 172
def set_class_attr attribute, node
  node.attributes[:class] = []
  classes = attribute[1]
  classes.split(" ").each do |one_class|
    node.attributes[:class] << one_class.strip
  end
end
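Since `String#split` with no argument (or with a single space) already splits on runs of whitespace and drops leading and trailing whitespace, the same result can be had in one line. This is a sketch of an equivalent form with an invented `attribute` pair; a plain hash stands in for `node.attributes`.

```ruby
attribute  = ['class', ' box   wide ']   # [name, value] pair as scan returns it
attributes = {}                          # stand-in for node.attributes

attributes[:class] = attribute[1].split  # whitespace-separated class names
```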
set_normal_attr(attribute, node)
# File lib/domparser/parser_script.rb, line 180
def set_normal_attr attribute, node
  name = attribute[0].to_sym # transform it to symbol
  value = attribute[1]
  node.attributes[name] = value
end
setup_relation(node)
# File lib/domparser/parser_script.rb, line 192
def setup_relation node
  @stack.last.children << node
  node.parent = @stack.last
end
special?(node)
# File lib/domparser/parser_script.rb, line 213
def special? node
  !!node.tag.match(TAG_SPECIAL)
end