EuroPython 2015

Web Scraping Best Practises

Python is a fantastic language for writing web scrapers. There is a large ecosystem of useful projects and a great developer community. However, it can be confusing once you go beyond the simpler scrapers typically covered in tutorials.

In this talk, we will explore some common real-world scraping tasks. You will learn best practices and gain a deeper understanding of which tools and techniques to use and how to handle the most challenging web scraping projects.

We will cover crawling and extracting data at different scales - from small websites to large focussed crawls. This will include an overview of automated extraction techniques. We’ll touch on common difficulties like rendering pages in browsers, proxy management, and crawl architecture.

Slides: https://speakerdeck.com/shaneaevans/web-scraping-best-practises

Tuesday 21 July at 11:00
