Data cleaning is one of the most important tasks in data science but it is unglamorous, underappreciated and under-discussed. These are some common tasks involved in data cleaning but not limited to: - Merging/ appending - Checking completeness of data - Checking of valid values - De-duplication - Handling of missing values - Recoding
Most, if not all, of the time, the datasets that we have to analyze are unclean. i.e. they are not necessarily complete/ accurate/ valid. This will impact the accuracy of our analysis if we do not clean them properly.
This talk covers how to perform data cleaning and understanding using primarily Pandas and Numpy. If you’re new to data analytics/ data science and are interested how to use Python to perform analysis, or if you're an Excel user hoping to move to Python, this talk might be for you.
Participants should be at least familiar with the basics of Python programming.