Our HTML content extractor pulls plain text out of blog posts and articles, which makes it perfect for site scraping. It automatically identifies the main content and removes the surplus “clutter” (boilerplate, templates) around it. It works best on news articles and blog posts.

But a live demo is worth more than a thousand words. Check this out:

Textracto demo


Our extractor can also pull out images, headlines, links and metadata (OpenGraph).
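To give a feel for what OpenGraph metadata looks like inside a page, here is a minimal Python sketch that uses only the standard library. It is an illustration and not how our extractor works internally; the names `OpenGraphParser`, `extract_opengraph` and `SAMPLE_HTML` are made up for this example, and the sample page is invented.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..."> tags into a dict (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # OpenGraph data lives in <meta> tags; ignore everything else.
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value if a property appears more than once.
            self.metadata.setdefault(prop, content)


def extract_opengraph(html):
    """Return a dict of OpenGraph properties found in an HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.metadata


# Invented sample page, used only to demonstrate the parser.
SAMPLE_HTML = """
<html><head>
<title>Example post</title>
<meta property="og:title" content="Example post" />
<meta property="og:type" content="article" />
<meta property="og:image" content="https://example.com/cover.png" />
<meta name="description" content="Not an OpenGraph tag" />
</head><body><p>Hello</p></body></html>
"""

if __name__ == "__main__":
    print(extract_opengraph(SAMPLE_HTML))
```

The sketch only reads `og:` properties from `<meta>` tags. Identifying the main content, images, headlines and links of a real page takes considerably more than this.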

Interested? Get a free API key.