When we want to extract the contents of a website automatically, we often find that it does not offer any API for the data we need, so it becomes necessary to use scraping techniques to recover the data from the Web.
Introduction to webscraping and python tools
Some of the most powerful tools for extracting data from web pages can be found in the Python ecosystem, among which we highlight Beautiful Soup, PyQuery and Scrapy.
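As a quick illustration of what these libraries do, here is a minimal Beautiful Soup sketch (assuming the `beautifulsoup4` package is installed; the HTML snippet is invented for the example) that pulls links out of a document:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# A tiny made-up document standing in for a downloaded page.
html = """
<html><body>
  <h1>Example page</h1>
  <ul>
    <li><a href="/a">First</a></li>
    <li><a href="/b">Second</a></li>
  </ul>
</body></html>
"""

# Parse the document and extract each link's text and target.
soup = BeautifulSoup(html, "html.parser")
links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]
print(links)  # [('First', '/a'), ('Second', '/b')]
```

PyQuery offers a jQuery-like selector API over the same kind of document, while Scrapy is a full crawling framework rather than just a parser.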
Asyncio with aiohttp for asynchronous requests
I will give an introduction to the asyncio and aiohttp modules, explaining basic concepts such as coroutines and event loops, and compare them with the requests module. The most important thing is to understand why the combination of asyncio + aiohttp performs better than the requests module.
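The core idea can be sketched with the standard library alone (here `asyncio.sleep` stands in for network latency so the example runs offline; with aiohttp the awaited call would be a `session.get(url)` instead):

```python
import asyncio
import time

# A coroutine is a function whose execution can be suspended at each
# "await", letting the event loop run other tasks in the meantime.
async def fetch(url):
    await asyncio.sleep(0.1)       # simulated I/O wait (stand-in for aiohttp)
    return f"body of {url}"

async def main():
    urls = [f"http://example.com/{i}" for i in range(10)]
    # gather() schedules all coroutines on the event loop at once,
    # so the ten 0.1 s waits overlap instead of adding up.
    return await asyncio.gather(*(fetch(u) for u in urls))

start = time.perf_counter()
pages = asyncio.run(main())        # the event loop drives the coroutines
elapsed = time.perf_counter() - start
print(len(pages), elapsed)         # 10 pages in roughly 0.1 s, not ~1 s
```

A blocking client like requests would perform the ten waits one after another (about 1 second total); the event loop overlaps them, which is where the performance gap comes from.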
Scraping data asynchronously
I will show an example that integrates some of the scraping tools mentioned above, such as BeautifulSoup or Scrapy, with asyncio + aiohttp, and measure the performance improvement compared with the requests module.
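The integration can be sketched as follows. To keep the example runnable offline, canned responses stand in for real pages; in a real scraper the `fetch` coroutine would use aiohttp (for instance `async with session.get(url) as resp: return await resp.text()`). The URLs and page contents are invented for illustration, and `beautifulsoup4` is assumed to be installed:

```python
import asyncio
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Canned responses standing in for real pages (offline stand-in for aiohttp).
PAGES = {
    "http://example.com/1": "<html><title>Page one</title></html>",
    "http://example.com/2": "<html><title>Page two</title></html>",
}

async def fetch(url):
    await asyncio.sleep(0.05)          # simulated network latency
    return PAGES[url]

async def scrape(url):
    html = await fetch(url)            # non-blocking download
    # Parsing is synchronous and CPU-bound: BeautifulSoup runs between
    # awaits, once the page body has arrived.
    return BeautifulSoup(html, "html.parser").title.string

async def main():
    # Downloads overlap on the event loop; parsing happens as each
    # body becomes available.
    return await asyncio.gather(*(scrape(u) for u in PAGES))

titles = asyncio.run(main())
print(titles)  # ['Page one', 'Page two']
```

The same structure works with any synchronous parser: the asynchronous part is only the download, so swapping BeautifulSoup for PyQuery changes just the `scrape` body.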