Extractor

Our HTML content extractor pulls plain text out of blog posts and articles, which makes it a good fit for site scraping. It automatically identifies the main content and removes the surrounding “clutter” (boilerplate, templates). Our algorithm is based on the “depth-scanning” principle; it works well for most pages and best for news articles and blog posts. The algorithm is also fast: compared to other web scraping services, we can return the parsed content within milliseconds. We can also execute JavaScript, so AJAX-based websites load their full content before extraction. Results are delivered through a flexible JSON API.
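To make the idea of separating main content from clutter concrete, here is a minimal Python sketch. It is not our depth-scanning algorithm; it is a generic stand-in heuristic that assumes the main content is the container holding the most paragraph text, and it ignores text inside navigation, header, footer, and sidebar elements. The names `MainContentParser` and `extract_main_text` are made up for this illustration.

```python
from html.parser import HTMLParser

# Elements treated as candidate content containers (an assumption of this sketch).
CONTAINERS = {"article", "main", "section", "div", "td", "body"}
# Elements whose text is treated as clutter and ignored.
SKIP = {"script", "style", "nav", "header", "footer", "aside"}


class MainContentParser(HTMLParser):
    """Groups paragraph text by its nearest enclosing container."""

    def __init__(self):
        super().__init__()
        self.stack = []        # open containers as (tag, id) pairs
        self.next_id = 0
        self.skip_depth = 0    # > 0 while inside a clutter element
        self.in_p = False
        self.current = []      # text pieces of the paragraph being read
        self.blocks = {}       # container id -> list of paragraph strings

    def _flush(self):
        # Store the paragraph collected so far under the innermost container.
        if self.in_p:
            text = " ".join("".join(self.current).split())
            if text:
                key = self.stack[-1][1] if self.stack else 0
                self.blocks.setdefault(key, []).append(text)
        self.in_p = False
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in CONTAINERS:
            self._flush()
            self.next_id += 1
            self.stack.append((tag, self.next_id))
        elif tag == "p":
            self._flush()      # tolerate paragraphs without a closing tag
            self.in_p = True

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in CONTAINERS:
            self._flush()
            # Close the innermost open container with this tag name.
            for i in range(len(self.stack) - 1, -1, -1):
                if self.stack[i][0] == tag:
                    del self.stack[i:]
                    break
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self.in_p and self.skip_depth == 0:
            self.current.append(data)


def extract_main_text(html):
    """Return the paragraphs of the container with the most paragraph text."""
    parser = MainContentParser()
    parser.feed(html)
    parser.close()
    parser._flush()
    if not parser.blocks:
        return ""
    best = max(parser.blocks.values(), key=lambda ps: sum(len(p) for p in ps))
    return "\n\n".join(best)
```

For a page with a navigation bar, a sidebar, an article, and a footer, `extract_main_text` returns only the article paragraphs. Real pages are far messier than this heuristic can handle, which is the reason to use a dedicated extractor.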

But a live demo is worth more than a thousand words, so check it out:

Live demo:


Our extractor can also extract images, headlines, links, and metadata (OpenGraph).
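As a rough illustration of what these elements look like in a page, the following Python sketch collects them with the standard library's `html.parser`. It is not our extractor's implementation, and the names `PageElementsParser` and `extract_elements` are hypothetical. It assumes OpenGraph data appears in `<meta property="og:..." content="...">` tags, which is how the OpenGraph protocol defines them.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class PageElementsParser(HTMLParser):
    """Collects headlines, links, images and OpenGraph metadata."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self.links = []
        self.images = []
        self.opengraph = {}
        self._heading = None   # text pieces of the heading being read, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("property") or "").startswith("og:"):
            # OpenGraph tags: <meta property="og:title" content="...">
            self.opengraph[a["property"]] = a.get("content") or ""
        elif tag == "a" and a.get("href"):
            self.links.append(a["href"])
        elif tag == "img" and a.get("src"):
            self.images.append(a["src"])
        elif tag in HEADINGS:
            self._heading = []

    def handle_data(self, data):
        if self._heading is not None:
            self._heading.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._heading is not None:
            text = " ".join("".join(self._heading).split())
            if text:
                self.headlines.append(text)
            self._heading = None


def extract_elements(html):
    """Return the collected page elements as a plain dictionary."""
    parser = PageElementsParser()
    parser.feed(html)
    parser.close()
    return {
        "headlines": parser.headlines,
        "links": parser.links,
        "images": parser.images,
        "opengraph": parser.opengraph,
    }
```

The returned dictionary can be serialized with `json.dumps`. The key names here are chosen for the example and do not describe the response format of our API.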

Interested? Buy an API key with NANO (XRB).