Our voices are no longer a mystery to speech recognition (SR) software. The technology powering these services has amazed us with its ability to understand human speech. This talk covers the intricate details of state-of-the-art SR algorithms, with live demos of Project DeepSpeech.
One widely quoted prediction holds that “50% of all searches will be voice searches by 2020”. The world’s technology giants have bet heavily on voice search, personal digital assistants, IoT devices and similar services. Solving speech recognition is a herculean task, given the complexity of data such as the human voice.
The talk will cover a brief history of speech recognition algorithms and the challenges of building these systems, and then explain how to build an advanced speech recognition system using deep learning. For illustration, we will take a deep dive into Project DeepSpeech, an open source speech-to-text engine developed by Mozilla Research, based on Baidu’s Deep Speech research paper and implemented using Google’s TensorFlow library.
Speech recognition is not only about the technology. There are broader concerns and challenges around how these AI models are becoming part of our day-to-day lives, including their biases. The bigger question revolves around the centralization of these AI services. Projects like Common Voice address this problem by enabling everyone to take part in this revolution. Part of the talk will focus on how people should approach this type of research, keeping community and humanitarian benefits as the first priority.