Automatic speech recognition (ASR, or speech-to-text) technology has been around for quite a while now. Recent advances in deep learning have brought impressive accuracy levels, with models getting just short of 100% accuracy on the LibriSpeech benchmark.
And yet, speech recognition has had only limited penetration into our daily lives. ASR systems continue to struggle with long-tail accents and remain painfully monolingual. Worse, there are no reliable ASR systems for the majority of the world's languages, many of which are spoken by hundreds of millions of people.
The gap between impressive benchmark numbers and poor real-world adoption becomes even more evident when you consider that the $50 billion-a-year transcription industry is still overwhelmingly manual.
At Bhasa, we set out to close this gap on both the technical and product fronts. Our speech recognition models are built to work for every individual, regardless of accent or language. Just as important, our biggest challenge, and therefore our main focus, is building product experiences that keep the technology in the background and concentrate on solving our customers' problems.