In this talk I’m going to introduce Frontera, Scrapinghub’s new open-source framework. Frontera lets you build real-time distributed web crawlers as well as crawlers focused on individual websites.
Along with the framework description, I’ll demonstrate how to build a distributed crawler using Scrapy, Kafka and HBase, and hopefully present some statistics on the Spanish internet collected with the newly built crawler. Happy EuroPythoning!