DocParser
DocParser is a web scraping/screen scraping tool.
You can use it to easily scrape information out of HTML documents.
The gem is called docparser. You can find the documentation here.
Features
- XPath and CSS support through Nokogiri
- Support for parallel processing of the documents
- 6 Output formats:
  - CSV
  - XLSX
  - HTML
  - YAML
  - JSON
  - Screen (for debugging and development)
- And more! (easy to extend)
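To make the output formats concrete, here is a minimal sketch of how one set of scraped rows could be written as CSV, JSON, and YAML. It uses only Ruby's standard libraries and does not use DocParser's own output classes; the `HEADER` and `ROWS` data and the helper names (`records`, `to_csv`, `to_json_text`, `to_yaml_text`) are assumptions made for illustration.

```ruby
require 'csv'
require 'json'
require 'yaml'

# Hypothetical scraped data: one header row plus one row per document.
HEADER = ['Title', 'URL'].freeze
ROWS = [
  ['First page', 'http://example.com/1'],
  ['Second page', 'http://example.com/2']
].freeze

# Turn each row into a hash keyed by the header names.
def records(header, rows)
  rows.map { |row| header.zip(row).to_h }
end

# Render the rows as CSV text, header line first.
def to_csv(header, rows)
  CSV.generate do |csv|
    csv << header
    rows.each { |row| csv << row }
  end
end

# Render the rows as a JSON array of objects.
def to_json_text(header, rows)
  JSON.pretty_generate(records(header, rows))
end

# Render the rows as a YAML sequence of mappings.
def to_yaml_text(header, rows)
  records(header, rows).to_yaml
end

puts to_csv(HEADER, ROWS)
puts to_json_text(HEADER, ROWS)
puts to_yaml_text(HEADER, ROWS)
```

Keeping the rows as plain arrays and converting them only at output time is one way to let several formats share the same scraped data.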
Installation
Add this line to your application's Gemfile:
    gem 'docparser'
And then execute:
    bundle
Or install it yourself using:
    gem install docparser
Usage
See example.rb
Todo
- Better examples and documentation
Contributing
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request