By 2025, there will be over 8 million voice assistants in the world. They are found on your mobile phone, in your home, in your car, and, over time, will be embedded in many cyber-physical systems across the world. At the same time, there are over 7,000 "living languages" spoken in the world.
But voice assistants support just a fraction of these languages. Moreover, accents and diversity _within_ a spoken language are not well handled by voice assistants. For example, African American voices are much less likely to be correctly recognised by the speech recognition algorithms used within voice assistants. And as we start to interact with systems using voice, we have a human desire to listen to voices we resonate with. Voices like ours. For many people, there are no synthesised voices that reflect their heritage, language, and gender expression.
There are several techno-social reasons behind this state of affairs.
* The intent of a commercial voice assistant is to make money. This shapes technical development: some languages are seen as more lucrative than others, irrespective of the number of speakers of that language. For example, there is more voice assistant support for Icelandic, a language spoken by 314,000 people, than there is for Kiswahili, a language spoken by over 100,000,000 people in Eastern Africa. Why? Money.
* The big tech companies behind voice assistants typically have poor gender and racial diversity in their talent pool. Diversity in developers leads to diversity in development.
* The data used for training speech recognition and speech synthesis models often has racial and gender biases. These can stem from selection bias, but also from broader systemic issues of inequality, such as the use of voice assistant technology to gather data, and the affordability of both that technology and its prerequisites, such as internet access.
* Many languages are considered "low-resource languages". This means they often don't have written transcriptions, which are needed to train machine learning models. Those creating transcriptions often face the "transcription bottleneck": a workflow impediment that means the creation of resources consumes significant labour time.
There are many established and emerging open source tools - many in Python - and movements that _individually_ are addressing aspects of this broader techno-social system. **Together**, they can effect change so that _everyone, everywhere can be afforded the benefits of voice technology_.
Produced by NDV: https://youtube.com/channel/UCQ7dFBzZGlBvtU2hCecsBBg?sub_confirmation=1
Python, PyCon, PyConAU, PyConline
Sat Sep 5 14:05:00 2020 at Floperator