WebScraper uses the Integrity v6 engine to scan a website quickly, and can currently export the data as CSV or JSON.
- Easy to scan a site – just enter the starting URL and press “Go”
- Easy to export – choose the columns you want
- Plenty of extraction options, including HTML elements with certain classes or IDs, regular expressions, or the entire content in a number of formats (HTML, plain text, Markdown)
- Configuration of various limits on the crawl and the output file size
- Small but important enhancement to whitelisting rules: if a page meets the ‘output filter’ rules (meaning it’s an ‘information page’ or ‘detail page’), it’ll be included in the crawl regardless of the scan blacklist / whitelist rules. This makes it easier to set up WebScraper where you want to limit the scan to search results or a certain section of the site, but still gather information from detail pages that don’t meet those scan rules.
- Some updates to the context help and other small fixes / enhancements.
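
The whitelisting change described above can be pictured as a simple "or" between two checks. The following is only a rough sketch of that logic, not WebScraper's actual implementation: the names `ScanRules`, `passes_scan_rules`, `matches_output_filter` and `should_crawl` are hypothetical, and rules are assumed to be plain "URL contains" matches for the sake of illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScanRules:
    """Hypothetical rule model: each rule is a substring to look for in the URL."""
    whitelist: List[str] = field(default_factory=list)  # if non-empty, URL must contain one
    blacklist: List[str] = field(default_factory=list)  # URL must contain none of these


def passes_scan_rules(url: str, rules: ScanRules) -> bool:
    """Apply the scan blacklist first, then the whitelist (assumed ordering)."""
    if any(term in url for term in rules.blacklist):
        return False
    if rules.whitelist and not any(term in url for term in rules.whitelist):
        return False
    return True


def matches_output_filter(url: str, output_terms: List[str]) -> bool:
    """Treat a page as a 'detail page' if its URL contains any output-filter term."""
    return any(term in url for term in output_terms)


def should_crawl(url: str, rules: ScanRules, output_terms: List[str]) -> bool:
    """A page matching the output filter is crawled regardless of the scan rules."""
    return matches_output_filter(url, output_terms) or passes_scan_rules(url, rules)


# Example: limit the scan to search results, but still collect product pages.
rules = ScanRules(whitelist=["/search"])
output_terms = ["/product/"]
for url in (
    "https://example.com/search?q=lamp",
    "https://example.com/product/42",
    "https://example.com/about",
):
    print(url, should_crawl(url, rules, output_terms))
```

In this sketch the product page is crawled even though it fails the whitelist, which is the situation the enhancement is meant to handle; the "about" page is still skipped.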
OS X 10.8 or later, 64-bit processor