Contribute Media
A thank you to everyone who has made this possible: Read More

How to “Scrape” Together a Great Dataset Using Things You Find on the Internet Using Python & SciP


Using Jupyter notebooks and scikit-learn, you’ll predict whether a movie is likely to win an Oscar or be a box office hit. I’ll walk through the most important steps of creating an effective dataset using information that you find on the Internet: asking a question your data can answer, writing a web scraper, and answering those questions using nothing but Python libraries and data from the Internet. To illustrate how these steps fit together, I walk through building a dataset from IMDB data and use it to predict what makes a winning Oscar movie. To preview our findings, check out this trailer and the github repository!

Improve this page