Transfer learning refers to methods that leverage a model trained in one domain to achieve better results on tasks in a related domain, i.e., we transfer the knowledge gained in one domain to a new one. This talk centers on recent developments in deep learning that facilitate transfer learning in NLP. We will discuss the Transformer architecture and its extensions, such as GPT and BERT.
The classic supervised machine learning paradigm is based on learning a single predictive model for a task, in isolation, using a single dataset. This approach requires a large number of training examples and performs best on well-defined, narrow tasks. Transfer learning, by contrast, leverages a model trained in one domain to improve results on tasks in a related domain. Models trained this way also show better generalization properties.
Computer vision has seen great success with transfer learning: models trained on the ImageNet dataset have been fine-tuned to achieve state-of-the-art results on many other problems. In the last two years, NLP has also witnessed the emergence of several transfer learning methods and architectures that significantly improved the state of the art on a wide range of NLP tasks.
We will present an overview of modern transfer learning methods in NLP: how models are pre-trained, what information the representations they learn capture, and how these models can be integrated into and adapted for downstream NLP tasks, with examples and case studies.
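To make the core idea concrete, here is a minimal, self-contained sketch of the pretrain-then-fine-tune recipe on toy linear tasks in NumPy. Everything in it (the synthetic data, the SVD standing in for a learned encoder, all variable names) is an illustrative assumption, not material from the talk: a representation is learned on data-rich source tasks, frozen, and reused on a related target task that has very few labels, where it beats training from scratch.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_feat, n_src, n_tgt, n_test = 10, 4, 1000, 8, 500

# Ground truth: source and target tasks share the same low-dimensional
# feature map W_true; only the task-specific heads differ.
W_true = rng.normal(size=(d_in, d_feat))
H_src = rng.normal(size=(d_feat, 6))   # six related source tasks
h_tgt = rng.normal(size=d_feat)        # one new target task

X_src = rng.normal(size=(n_src, d_in))
Y_src = X_src @ W_true @ H_src

# "Pretraining": fit the data-rich source tasks, then factor the fitted
# map to extract a shared representation (this SVD stands in for the
# learned encoder of a deep model).
M, *_ = np.linalg.lstsq(X_src, Y_src, rcond=None)
U, S, Vt = np.linalg.svd(M, full_matrices=False)
W_pre = U[:, :d_feat]                  # frozen "pretrained" features

# Target task: only n_tgt = 8 labelled examples, fewer than d_in = 10.
X_tgt = rng.normal(size=(n_tgt, d_in))
y_tgt = X_tgt @ W_true @ h_tgt
X_test = rng.normal(size=(n_test, d_in))
y_test = X_test @ W_true @ h_tgt

# Fine-tuning: train only a small head on top of the frozen features.
h, *_ = np.linalg.lstsq(X_tgt @ W_pre, y_tgt, rcond=None)
transfer_mse = np.mean((X_test @ W_pre @ h - y_test) ** 2)

# Baseline: learn the target task from scratch on the few examples
# (minimum-norm least squares on the raw inputs).
w, *_ = np.linalg.lstsq(X_tgt, y_tgt, rcond=None)
scratch_mse = np.mean((X_test @ w - y_test) ** 2)

print(f"transfer MSE: {transfer_mse:.2e}, scratch MSE: {scratch_mse:.2e}")
```

Because the target head has only 4 parameters while the from-scratch model must estimate 10 from 8 examples, the transferred representation generalizes while the baseline cannot; the same sample-efficiency argument motivates fine-tuning pre-trained language models.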