The goal of this tutorial is to give people with no prior knowledge of Python, but with some knowledge of programming, enough scaffolding to use Python-based tools for data science research and development, such as the open source pandas and scikit-learn libraries or the Atigeo xPatterns analytics framework.
The first part of the tutorial will cover basic data science concepts alongside basic Python programming concepts, using code and data examples drawn from the UCI mushroom dataset. The Python concepts will include data structures (strings, lists, tuples, dictionaries), control structures (conditionals and loops), file I/O, and defining and calling functions.
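As a rough illustration of how these basics fit together, the sketch below reads comma-separated records and counts the values in one column. The sample rows, the column layout (class label first), and the function names load_instances and count_values are assumptions made up for this example; they are not taken from the tutorial materials or from the actual dataset file.

```python
import io

# Hypothetical sample rows (NOT real dataset records): each line holds a
# class label followed by attribute values, separated by commas.
SAMPLE = """e,x,s,y
p,x,s,n
e,b,s,w
p,x,y,w
"""


def load_instances(file_obj):
    """Read comma-separated lines into a list of lists of strings."""
    instances = []
    for line in file_obj:          # loop over the lines of a file-like object
        line = line.strip()        # string method: drop the trailing newline
        if line:                   # conditional: skip blank lines
            instances.append(line.split(','))
    return instances


def count_values(instances, index):
    """Count how often each value occurs in the column at `index`."""
    counts = {}                    # dictionary mapping value -> count
    for instance in instances:
        value = instance[index]
        counts[value] = counts.get(value, 0) + 1
    return counts


# io.StringIO stands in for a file opened with open(...), so the sketch
# runs without any data file on disk.
instances = load_instances(io.StringIO(SAMPLE))
label_counts = count_values(instances, 0)
```

With a real data file, the io.StringIO object would be replaced by a file opened in a with open(...) block; the rest of the code would stay the same.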
The second part of the tutorial will focus on constructing a simple decision tree based on the ID3 algorithm and using it to classify instances from the UCI mushroom dataset. This part will also introduce recursion, Python classes (object-oriented programming), and running Python scripts with command-line arguments.
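The following is a minimal sketch of an ID3-style decision tree, showing how recursion and a Python class can combine for this task. It is not the tutorial's own implementation: the class name SimpleDecisionTree, the helper functions, the nested-dictionary tree representation, and the toy data are all assumptions chosen for illustration. It assumes categorical attributes and a class label stored at index 0 of each instance.

```python
import math
from collections import Counter


def entropy(instances, class_index=0):
    """Shannon entropy (in bits) of the class labels in `instances`."""
    counts = Counter(inst[class_index] for inst in instances)
    total = len(instances)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def partition(instances, attr_index):
    """Group instances by their value for the attribute at `attr_index`."""
    groups = {}
    for inst in instances:
        groups.setdefault(inst[attr_index], []).append(inst)
    return groups


def information_gain(instances, attr_index, class_index=0):
    """Entropy reduction obtained by splitting on one attribute."""
    groups = partition(instances, attr_index)
    remainder = sum(len(g) / len(instances) * entropy(g, class_index)
                    for g in groups.values())
    return entropy(instances, class_index) - remainder


class SimpleDecisionTree:
    """Hypothetical ID3-style tree; leaves are labels, inner nodes are dicts."""

    def __init__(self, instances, attr_indexes, class_index=0):
        self.class_index = class_index
        self.root = self._build(instances, attr_indexes)

    def _build(self, instances, attr_indexes):
        labels = Counter(inst[self.class_index] for inst in instances)
        majority = labels.most_common(1)[0][0]
        # Base cases: all labels agree, or no attributes remain to split on.
        if len(labels) == 1 or not attr_indexes:
            return majority
        # Greedy step: choose the attribute with the highest information gain.
        best = max(attr_indexes,
                   key=lambda i: information_gain(instances, i,
                                                  self.class_index))
        remaining = [i for i in attr_indexes if i != best]
        # Recursive step: build one subtree per observed attribute value.
        branches = {value: self._build(subset, remaining)
                    for value, subset in partition(instances, best).items()}
        return {'attribute': best, 'branches': branches, 'default': majority}

    def classify(self, instance):
        """Walk from the root to a leaf; unseen values use the node default."""
        node = self.root
        while isinstance(node, dict):
            value = instance[node['attribute']]
            node = node['branches'].get(value, node['default'])
        return node


# Toy data (made up): the label at index 0 is fully determined by index 1.
TOY = [['e', 'a', 'x'],
       ['e', 'a', 'y'],
       ['p', 'b', 'x'],
       ['p', 'b', 'y']]

tree = SimpleDecisionTree(TOY, attr_indexes=[1, 2])
predicted = tree.classify(['?', 'a', 'y'])
```

In this toy example the attribute at index 1 separates the two classes completely, so the tree splits on it once and every branch ends in a leaf. A script version could read the data file name from sys.argv instead of using an inline list.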
Slides available here: https://github.com/gumption/Python_for_Data_Science