PyData London 2016
This talk details techniques used to train state-of-the-art neural networks on chemical data, from data processing to predictive screening, all in Python!
The phrase "deep learning" has gained buzzword status in recent years. Regardless of whether the unprecedented hype surrounding the technique is well placed, it has established itself as state of the art in many fields, most notably in image and speech recognition and natural language processing, where tech giants such as Google, Facebook, Baidu, Microsoft and Apple have invested billions of dollars in driving the technology forward.
Less well known is that deep learning has also been shown to be state of the art for Quantitative Structure-Activity Relationship (QSAR) problems: deep learning approaches won both the Kaggle Merck Molecular Activity Challenge and the recent Tox21 toxicity challenge. These problems aim to predict, given only the chemical structure of a compound, how the application of that compound would affect the activity of a biological system.
This talk presents research undertaken as part of my PhD. I will cover:
- data formatting and preprocessing using sqlalchemy and pandas
- feature extraction from a chemical structure using the RDKit cheminformatics toolkit
- training of models using Theano and Keras
- results and comparison to other techniques