WebScraper 4.7.2

WebScraper uses the Integrity v6 engine to quickly scan a website and can currently output the data as CSV or JSON.

  • Easy to scan a site – just enter the starting URL and press “Go”
  • Easy to export – choose the columns you want
  • Plenty of extraction options, including HTML elements with certain classes or IDs, regular expressions, or the entire content in a number of formats (HTML, plain text, Markdown)
  • Configuration of various limits on the crawl and the output file size
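
As a rough illustration of the extraction and export options listed above, the Python sketch below pulls text from elements with a given class, applies a regular expression, and writes only the chosen columns as CSV or JSON. This is not WebScraper's own code: ClassTextExtractor, extract_row, export, the "title" class and the price pattern are all hypothetical names chosen for the example.

```python
import csv
import io
import json
import re
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collects text inside elements carrying a given class (hypothetical helper).

    Assumes well-formed markup with no void elements (<br>, <img>) inside a match.
    """

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0  # greater than 0 while inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.wanted_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_row(url, html):
    """Build one output row: text by class plus a regular-expression match."""
    parser = ClassTextExtractor("title")  # assumed class name
    parser.feed(html)
    price = re.search(r"\$(\d+\.\d{2})", html)  # assumed price format
    return {
        "url": url,
        "title": " ".join(parser.chunks),
        "price": price.group(1) if price else "",
    }


def export(rows, columns, fmt="csv"):
    """Serialise only the chosen columns, as CSV (default) or JSON."""
    selected = [{c: row[c] for c in columns} for row in rows]
    if fmt == "json":
        return json.dumps(selected, indent=2)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(selected)
    return buf.getvalue()
```

Calling export(rows, ["title", "price"]) would return CSV text with just those two columns, while passing fmt="json" returns the same selection as a JSON array.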

Version 4.7.2:

  • Small but important enhancement to the whitelisting rules. If a page meets the ‘output filter’ rules (meaning it is an ‘information page’ or ‘detail page’), it is included in the crawl regardless of the scan blacklist / whitelist rules. This makes it easier to set up WebScraper when you want to limit the scan to search results or a certain section of the site but still gather information from detail pages that don’t meet those scan rules.
  • Some updates to the context help and other small fixes / enhancements.
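
The 4.7.2 rule change described above can be sketched in Python roughly as follows. This is only an illustration of the precedence, not WebScraper's implementation: the function name should_crawl is hypothetical, rules are assumed to be simple substring matches on the URL, and an empty whitelist is assumed to allow everything.

```python
def should_crawl(url, whitelist, blacklist, output_filter):
    """Decide whether a page is included in the crawl (hypothetical sketch)."""
    # Pages matching the output filter (detail pages) are always included,
    # regardless of the scan blacklist / whitelist rules.
    if any(term in url for term in output_filter):
        return True
    # Otherwise the scan rules apply: the blacklist excludes first.
    if any(term in url for term in blacklist):
        return False
    # Assumption: an empty whitelist means no restriction.
    return not whitelist or any(term in url for term in whitelist)


# Limit the scan to search results, but still collect product detail pages.
whitelist = ["/search"]
blacklist = ["/login"]
output_filter = ["/product/"]
```

With these example rules, a product page such as https://example.com/product/42 is crawled even though it does not match the whitelist, while https://example.com/about is skipped.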

 

REQUIREMENTS

OS X 10.8 or later, 64-bit processor

  • CANNOT DOWNLOAD: Some users may encounter the following error: “This site can’t be reached ...sundryfiles.com’s server IP address could not be found. DNS_PROBE_FINISHED_NXDOMAIN”. In this case, switching to Google DNS should resolve the problem.
  • If the downloaded file cannot be extracted (file corrupted...), make sure you have downloaded the file completely, and avoid WinZip. We recommend using The Unarchiver.
  • If the app does not work or cannot be opened, disabling Gatekeeper resolves the problem in most cases.
Size: 6.73 MB