Image segmentation has advanced considerably in recent years with the development of state-of-the-art methods based on Fully Convolutional Networks (FCNs). The objective of image segmentation is to label every pixel in an image with the class of its enclosing object or region. The problem is extremely challenging because a method must classify and localize well at the same time. Despite its difficulty, image segmentation is an important problem with many applications in medicine, autonomous driving and other fields. In our talk, we go through the theory behind recent state-of-the-art FCN-based methods for image segmentation and present our library, which aims to give users a simple way to apply these methods to their own problems.
Methods based on Convolutional Neural Networks (CNNs) have pushed the performance on a broad array of problems, including image classification (1) and object detection (2). The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is one of the main image classification competitions. Its training data contains 1000 categories and approximately 1.2 million images, and the best-performing approaches on this dataset are all based on CNNs. Moreover, CNNs trained on this dataset serve as a good initialization for other tasks such as object detection and image segmentation (2) (3).
However, the partial built-in invariance of CNNs to translations, rotations and other transformations makes it hard to use pretrained CNNs for image segmentation. While these invariance properties benefit image classification, they hurt image segmentation, where strong localization properties are required (3).
Recent work introduced Fully Convolutional Networks (3), an adaptation of image classification CNNs that makes it possible to use them for image segmentation while reducing the negative effect of the invariance properties. In our talk, we briefly describe the basic building blocks of CNNs (convolutional layers, pooling layers, fully connected layers, etc.), explain why they show superior performance according to recent papers, and explain how these CNNs can be converted into FCNs in order to perform image segmentation. We conclude with a demonstration of how our library can be used to train FCNs for image segmentation on a particular dataset.
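The conversion of a classification CNN into an FCN rests on the fact that a fully connected layer applied to a fixed-size feature map computes the same thing as a convolution whose kernel covers that whole map. The following is a minimal NumPy sketch of that equivalence, not code from our library: the function names, the 7x7x4 feature map and the 3 output classes are all illustrative assumptions.

```python
import numpy as np

def dense(features, weights, bias):
    """Fully connected layer: flatten the (H, W, C) feature map, then multiply."""
    return features.reshape(-1) @ weights + bias

def conv_valid(features, kernel, bias):
    """Naive 'valid' convolution (cross-correlation), no stride or padding.

    features: (H, W, C), kernel: (kh, kw, C, K), result: (H-kh+1, W-kw+1, K).
    """
    kh, kw, _, num_classes = kernel.shape
    out_h = features.shape[0] - kh + 1
    out_w = features.shape[1] - kw + 1
    out = np.empty((out_h, out_w, num_classes))
    for i in range(out_h):
        for j in range(out_w):
            patch = features[i:i + kh, j:j + kw, :]
            # Contract all three patch axes against the kernel's first three.
            out[i, j] = np.tensordot(patch, kernel, axes=3) + bias
    return out

# Hypothetical sizes: a 7x7 feature map with 4 channels and 3 classes.
rng = np.random.default_rng(0)
H, W, C, K = 7, 7, 4, 3
weights = rng.normal(size=(H * W * C, K))
bias = rng.normal(size=K)

# Reinterpret the dense weight matrix as a 7x7 convolution kernel.
kernel = weights.reshape(H, W, C, K)

features = rng.normal(size=(H, W, C))
dense_scores = dense(features, weights, bias)
conv_scores = conv_valid(features, kernel, bias)  # shape (1, 1, K)

# On a larger input the same kernel yields a spatial map of class scores,
# which the dense layer could not process at all.
larger = rng.normal(size=(9, 9, C))
score_map = conv_valid(larger, kernel, bias)  # shape (3, 3, K)
```

On the original input size the two layers agree, while on the larger input the convolutional form produces one score vector per spatial position. That coarse score map is what an FCN then upsamples to a per-pixel prediction.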
We plan to structure our talk in the following way:
- Basic building blocks of Convolutional Neural Networks (CNNs), based on the "A guide to convolution arithmetic for deep learning" resource (4).
- Live demonstration of how these CNNs can be applied to image classification, based on our blog post (5).
- Live demonstration and explanation of how CNNs can be converted into FCNs, based on our blog post (5).
- Live demonstration and explanation of how interpolation can be reformulated in terms of convolution and integrated into the network architecture, based on our blog post (6).
- Live demonstration and explanation of how FCNs can be trained on the PASCAL VOC general image segmentation dataset, based on our blog post (7).
- Demonstration of how our library (8), implemented using TensorFlow, was used to train these models for the segmentation of medical images, based on our recent paper (9).
- Demonstration of the same library ported to PyTorch, and an explanation of why it is easier to use.
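To illustrate the item on reformulating interpolation as convolution, here is a minimal NumPy sketch of bilinear upsampling written as a transposed convolution with a fixed kernel. It is an illustration under stated assumptions rather than code from our library or blog posts: the function names are hypothetical, the input is a single-channel image, and pixels outside the image are treated as zero.

```python
import numpy as np

def bilinear_kernel(factor):
    """2D bilinear interpolation kernel for an integer upsampling factor."""
    size = 2 * factor - factor % 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    positions = np.arange(size)
    filt_1d = 1 - np.abs(positions - center) / factor
    # Bilinear interpolation is separable, so the 2D kernel is an outer product.
    return np.outer(filt_1d, filt_1d)

def transposed_conv2d(image, kernel, stride):
    """Naive transposed convolution: each input pixel stamps a scaled kernel."""
    h, w = image.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            top, left = i * stride, j * stride
            out[top:top + k, left:left + k] += image[i, j] * kernel
    return out

def upsample_bilinear(image, factor):
    """Upsample a 2D image by `factor`, cropping the result to (H*factor, W*factor)."""
    kernel = bilinear_kernel(factor)
    full = transposed_conv2d(image, kernel, factor)
    crop = (kernel.shape[0] - factor) // 2
    return full[crop:full.shape[0] - crop, crop:full.shape[1] - crop]

# A constant image should stay constant away from the borders.
image = np.ones((4, 4))
upsampled = upsample_bilinear(image, factor=2)  # shape (8, 8)
```

Because the upsampling is expressed as a convolution-type layer, it can sit inside the network, and its kernel can either stay fixed at the bilinear values or serve as an initialization that is refined during training. Border pixels come out attenuated in this sketch because of the zero-padding assumption.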
Conclusion and discussion
In our talk, we introduce the audience to recent advances in image segmentation research, briefly cover the theory behind them and show how some of the recent state-of-the-art image segmentation methods can be applied to a particular task using our library.
Biography and additional information
Daniil Pakhomov is a PhD student at Johns Hopkins University. His main research areas are general image segmentation and segmentation of medical images.
The content of our blog posts was well received by the machine learning community. Some of the posts received promotional tweets from the official Kaggle account and others (10). The author previously gave a talk at the EuroSciPy 2016 conference (11) and has contributed to the tensorflow/models, Theano and scikit-image repositories. A similar talk by the author was accepted for presentation at SciPy 2017; this talk is an extended and improved version of it.