With data more prevalent and accessible than ever, quantitative data analysis has become a core competency for the modern business and research analyst. Ironically, “analysis” is only part of the job of a bona fide data analyst: extracting, cleaning, exploring, wrangling, validating and visualizing are now equally important.
I come from a traditional analyst background where the primary technical skill required was Microsoft Excel. However, I quickly found this limiting: the data transformations I performed were not easily replicable, I couldn’t query information housed in databases, and I couldn’t scrape data tables from the web. Either I had to rely on others (typically engineers) to do this for me, or we decided not to do the analysis at all (i.e. it wasn’t “feasible”).
After a long and arduous process, I learned how to do most of these things myself, and I’m convinced most analysts would also benefit from adopting the “engineering” mindset. Analysts will always need domain expertise, but they must increasingly apply modern software development skills in their day-to-day jobs. They should be data experts first, and subject matter experts second.
Thanks to some great open-source libraries, the Python community is burgeoning with data analysts and scientists. In this talk we’ll cover how to take your data from start to finish, including:
We’ll also touch on some softer skills, such as data skepticism, sanity checking and exploratory data analysis. By the end of this talk, audience members should have a good understanding of the “data pipeline”: all the steps required to get data from its raw, unrefined state to a structured and presentable story.