Multimedia automatic learning has drawn attention from companies and governments for a significant number of applications for automated recommendations, classification, and human brain understatement. In recent years, and an increased amount of research has explored using deep neural networks for multimedia related tasks.
Some government security and surveillance applications are automated detections of illegal and violent behaviors, child pornography and traffic infractions. Companies worldwide are looking for content-based recommendation systems that can personalize clients consumption and interactions by understanding the human perception of memorability, interestingness, attractiveness, aesthetics. For these fields like event detection, multimedia affect and perceptual analysis are turning towards Artificial Neural Networks. In this talk, I will present the theory behind multi-modal fusion using deep learning and some open challenges and their state-of-the-art.
Multi-modal sources of information are the next big step for AI. In this talk, I will present the use of deep learning techniques for automated multi-modal applications and some open benchmarks.