PyData Berlin 2016
Since April 2015 our group has studied the allergic rhinitis of a single subject, with the goal of building a machine-learned model that predicts the need for antihistamines. Approximately 30% of the world's population suffers from allergies, and we aim to provide a methodology that others can use to identify the drivers of their own symptoms. This is a "citizen science" project, currently focused on one individual and a year's worth of self-reported antihistamine usage, sneezing data and geolocated points. We'll discuss the available external data (including the London Air project's pollution readings, weather, diet, exercise and commute data), exploratory data analysis, our approach to feature engineering from time-series and text sources, and our modeling progress. The data-logging iPhone app and the data preparation tools are all open source. Python tools discussed include scikit-learn, seaborn, statsmodels and textract. We'll also review our distributed working practices.