Description
Link to slides: https://www.slideshare.net/secret/2a5Xz9Sgc3D5GU
Those folks in computer vision keep publishing amazing ideas about how to apply convolutions to images. What about those of us who work with text? Can't we enjoy convolutions as well? In this talk I'll review some convolutional architectures that worked great for images and were adapted to text, and confront the hardest parts of getting them to work in TensorFlow.
Abstract
The go-to architecture for deep learning on sequences such as text is the RNN, particularly its LSTM variants. While remarkably effective, RNNs are painfully slow due to their sequential nature. Convolutions allow us to process a whole sequence in parallel, greatly reducing the time required to train and infer. One of the most important advances in convolutional architectures has been the use of gating to conquer the vanishing gradient problem, thus allowing arbitrarily deep networks to be trained efficiently.
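To make the gating idea concrete, here is a minimal sketch of a GLU-style gated 1-D convolution in TensorFlow. The function name, shapes, and hyperparameters are illustrative assumptions, not taken from the talk's slides.

```python
import tensorflow as tf

# Sketch of a gated 1-D convolution block (GLU-style gating).
# Assumes inputs of shape (batch, time, channels); names are illustrative.
def gated_conv1d(x, filters, kernel_size=3):
    # Two convolutions run over the whole sequence at once (no recurrence):
    # one produces candidate features, the other a sigmoid gate.
    features = tf.keras.layers.Conv1D(filters, kernel_size, padding="same")(x)
    gate = tf.keras.layers.Conv1D(filters, kernel_size, padding="same",
                                  activation="sigmoid")(x)
    # The gate controls how much of each feature passes through,
    # which helps gradients flow through deep stacks of layers.
    return features * gate

# Example: a batch of 8 sequences, 100 time steps, 128-dim embeddings.
x = tf.random.normal((8, 100, 128))
y = gated_conv1d(x, filters=128)
print(y.shape)  # (8, 100, 128)
```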
In this talk we'll review the key innovations in the DenseNet architecture and show how to adapt it to text. We'll go over "deconvolution" operators and dilated convolutions as a means of handling long-range dependencies. Finally, we'll look at convolutions applied to [translation](https://arxiv.org/abs/1610.10099) at the character level.
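As a rough illustration of how dilated convolutions reach long-range dependencies, the sketch below stacks 1-D convolutions with exponentially growing dilation rates so the receptive field widens with depth. Layer counts and sizes here are assumptions made for the example, not settings from the talk.

```python
import tensorflow as tf

# Sketch: stacked dilated 1-D convolutions whose receptive field grows
# exponentially with depth, covering long-range context without recurrence.
def dilated_stack(x, filters=128, kernel_size=3, num_layers=4):
    for i in range(num_layers):
        # Dilation rates 1, 2, 4, 8, ...: each layer looks at increasingly
        # distant positions; 4 layers with kernel 3 see about 31 time steps.
        x = tf.keras.layers.Conv1D(filters, kernel_size,
                                   dilation_rate=2 ** i,
                                   padding="same",
                                   activation="relu")(x)
    return x

x = tf.random.normal((8, 100, 128))
y = dilated_stack(x)
print(y.shape)  # (8, 100, 128)
```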
The goal of this talk is to demonstrate the practical advantages and relative ease with which these methods can be applied; as such, we will focus on the ideas and their implementations (in TensorFlow) rather than on the math.