Recent work by Kiela et al. (2018) shows that multi-modal classification models combining image and text far outperform both text-only and image-only models. This talk reviews work that extends Kiela et al.'s (2018) research by investigating whether classification accuracy can be improved by applying transfer learning to the language-processing component. The model's performance on the MM-IMDb dataset (Arevalo et al. 2017) is analyzed and compared to the baseline reported by Kiela et al. (2018).
The work is implemented in PyTorch, and the talk will cover the details of the implementation and compare the model's performance with the results reported in Kiela et al. (2018). Attendees should have a basic familiarity with neural networks built in PyTorch for NLP and computer vision.
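To make the setting concrete, here is a minimal sketch of a late-fusion multi-modal classifier in PyTorch. All names and dimensions (`LateFusionClassifier`, `text_dim`, `image_dim`, the hidden size) are hypothetical choices for illustration, not the talk's actual architecture; the sketch simply concatenates precomputed text and image features and classifies the result, rather than using the gated multimodal units of Arevalo et al. (2017).

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Hypothetical sketch: concatenate text and image feature
    vectors, then classify with a small feed-forward head."""

    def __init__(self, text_dim=768, image_dim=2048,
                 hidden_dim=512, num_labels=23):
        # num_labels=23 matches the genre labels of MM-IMDb,
        # a multi-label classification task
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_labels),
        )

    def forward(self, text_feats, image_feats):
        # Late fusion: join the two modalities by concatenation
        fused = torch.cat([text_feats, image_feats], dim=-1)
        # Raw logits; pair with BCEWithLogitsLoss for multi-label training
        return self.head(fused)

model = LateFusionClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 2048))
print(tuple(logits.shape))  # (4, 23)
```

Transfer learning would enter by replacing the random `text_feats` above with embeddings from a pretrained language model, which is the variation the talk evaluates against the Kiela et al. (2018) baseline.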
References:
Arevalo, J., Solorio, T., Montes-y-Gómez, M., & González, F. A. 2017. Gated multimodal units for information fusion. arXiv preprint arXiv:1702.01992.
Kiela, D., Grave, E., Joulin, A., & Mikolov, T. 2018. Efficient large-scale multi-modal classification. arXiv preprint arXiv:1802.02892.
Feedback form: https://python.it/feedback-1754
On Saturday 4 May at 10:45 **See schedule**