Scraping the Web with Scrapy


Dave has been developing with Django for 3 years in sunny Austin, TX, where he is an organizer for the Austin Web Python User Group. He worked for the Texas Tribune, building data apps and contributing to Armstrong, the open source CMS framework built on top of Django. He is now freelancing after his latest attempt at the startup game crashed. When he's not scraping e-commerce sites, he enjoys shooting sports and barbecuing beef. The worst bet he ever lost forced him to retake the SAT at the age of 27, because he came in dead last in a fantasy football league.

Python has great tools like Django and Flask for taking your database and turning it into HTML pages, but what if you want to take somebody else's HTML pages and build a database from them? Scrapy is a library for building web spiders that will simplify your web scraping tasks immensely. Friends don't let friends use raw urllib2.
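
To make the "HTML pages in, database out" idea concrete, here is a minimal sketch that uses only Python's standard library, not Scrapy itself. The sample markup, the `ProductParser` class, the `scrape_to_db` helper, and the `products` table are all hypothetical names invented for illustration; they do not come from the talk. The page is an inline string, so nothing is downloaded.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample page; a real spider would download this instead.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Brisket rub</span><span class="price">7.50</span></li>
  <li class="product"><span class="name">Smoker thermometer</span><span class="price">19.99</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collect (name, price) pairs from span.name / span.price elements."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished (name, price) tuples
        self._field = None    # class of the span we are currently inside
        self._current = {}    # fields gathered for the current <li>

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans we care about.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and "name" in self._current and "price" in self._current:
            self.rows.append((self._current["name"], float(self._current["price"])))
            self._current = {}


def scrape_to_db(html, conn):
    """Parse the HTML and store each product as a row; return the row count."""
    parser = ProductParser()
    parser.feed(html)
    conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", parser.rows)
    conn.commit()
    return len(parser.rows)
```

Calling `scrape_to_db(SAMPLE_HTML, sqlite3.connect(":memory:"))` fills an in-memory table from the sample page. Even this toy version needs hand-written parser state for a single page layout, and it does no fetching, link following, or retrying; that surrounding work is the kind of task the abstract suggests handing to a spider library.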

