Filmed at PyData London 2017 www.pydata.org
Description
In this talk we explore the results of a short, collaborative ‘hack’ project that used Python machine learning tools and open data to look for evidence linking higher diesel engine and tyre emissions (PM10 and PM2.5 particulates, and NO2) in certain residential areas with a greater likelihood of dementia diagnoses among the residents of those areas.
Abstract
Overview
Alzheimer’s is a neurodegenerative disease that currently affects over 55 million people worldwide, and this number is set to increase with trends in population growth and demographics. Meanwhile, particulate emission levels in large cities across Europe and the world have come under scrutiny: diesel emissions, driven by a sharp rise in diesel vehicle purchases (a result of governments' push to reduce CO2 emissions), have been found to cause many respiratory problems.
The lungs and breathing function may not be the only parts of the body affected: more than one scientific study (see the link below for an example) has now found a higher incidence of dementia (in particular Alzheimer’s) among people who have lived in highly polluted urban areas for long periods of their lives.
This talk is about a short, collaborative ‘hack’ project that used Python machine learning tools and open data to test this hypothesis, looking for evidence that either supports or refutes a link between higher diesel engine and tyre emissions (PM10 and PM2.5 particulates, and NO2) in a given residential area and a greater likelihood of dementia diagnoses among the residents of that area.
The hope is that this ‘pilot’ study might help a funded venture take root, one that, for example, raises awareness of the true impact of heavy diesel car pollution in dense conurbations.
Questions asked
Can we visualize the likelihood of incidence of dementia per person on a heat map for a city or for the UK?
Can we similarly visualize annual particulate exposure per resident? Do the two maps look similar?
Do various sources of data, when placed under the magnifying glass of data science, corroborate the evidence gathered by recent studies?
Can a classifier reasonably determine your likelihood of developing dementia based on where you have lived for most of your life?
Where should you live and work to lower your chance of developing Alzheimer’s later in life?
How can the findings be used to push city officials to improve city air quality?
How can we raise awareness?

Challenges encountered on the way
How to interpolate in order to cover 'gaps' between emission monitoring stations, incorporating additional data sources (for example road network structure and traffic intensity levels)?
Sensitive medical data: how to merge open medical datasets securely without revealing personal information?
Which types of machine learning models are best suited to this problem?

Approach
Starting at a high level, we can examine whether a higher incidence of dementia cases is linked with an increase in airborne pollutants (from diesel sources) at each of the following levels:
At a country level (e.g. for the Netherlands or the UK)
At a city level (e.g. for Bristol, Eindhoven and/or London)
At a borough level (e.g. for Camden, London)
At a street level (by postcode)
At individual sufferer level (with de-personalized datasets, based on 1,000 blog posters)

Other research inputs (articles and papers)
http://www.sciencemag.org/news/2017/01/brain-pollution-evidence-builds-dirty-air-causes-alzheimer-s-dementia
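
Illustrative sketch: filling gaps between monitoring stations
The interpolation challenge listed above (estimating pollutant levels between monitoring stations) can be approached in many ways. The sketch below shows one simple baseline, inverse distance weighting, and is not necessarily the method used in the project. The function name `idw_interpolate`, the station coordinates and the PM2.5 readings are all hypothetical, made up purely for illustration. Coordinates are assumed to be in a projected system (for example kilometres on a grid), not raw latitude and longitude, and the sketch ignores the road network and traffic data that the talk mentions as additional inputs.

```python
import numpy as np


def idw_interpolate(station_xy, station_values, query_xy, power=2.0):
    """Estimate values at query points by inverse distance weighting.

    Hypothetical helper for illustration; assumes projected x/y coordinates.
    """
    station_xy = np.asarray(station_xy, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)

    # Pairwise distances: one row per query point, one column per station.
    diff = query_xy[:, None, :] - station_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Closer stations get larger weights; clamp to avoid division by zero.
    weights = 1.0 / np.maximum(dist, 1e-12) ** power
    estimates = (weights * station_values).sum(axis=1) / weights.sum(axis=1)

    # A query point that coincides with a station takes that station's reading.
    at_station = dist < 1e-12
    hit_rows = at_station.any(axis=1)
    estimates[hit_rows] = station_values[at_station.argmax(axis=1)[hit_rows]]
    return estimates


# Made-up example: three stations and their annual mean PM2.5 readings.
stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
pm25 = [10.0, 20.0, 30.0]

# Two query points: one on top of the first station, one equidistant from all.
queries = [(0.0, 0.0), (5.0, 5.0)]
estimates = idw_interpolate(stations, pm25, queries)
print(estimates)
```

The second query point is equally far from all three stations, so its estimate is simply the mean of the three readings. A larger `power` makes the nearest station dominate more strongly; a real analysis would need to validate that choice against held-out stations.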