About Bhasa

Automatic speech recognition (ASR, or speech-to-text) technology has been around for quite a while. Recent advances in deep learning have brought about impressive accuracy levels in speech recognition, with models getting just short of 100% accuracy on the LibriSpeech dataset.

And yet, speech recognition has not penetrated people's daily lives. ASR systems continue to struggle with long-tail accents and remain painfully monolingual. Worse, no reliable ASR system is available for a good majority of languages, many of them spoken by hundreds of millions of people around the world.

This large gap between impressive benchmark numbers and poor adoption of ASR technology becomes even more evident when you consider that the transcription industry, worth $50 billion a year, still relies overwhelmingly on human operators.

At Bhasa, we set out to close this gap on both the technical and product fronts. Our speech recognition models are meant to work for every individual, regardless of accent. We aspire to cover as many languages as possible and to enable transcription of multilingual conversations. Our biggest challenge, and consequently our focus, remains creating product experiences that keep the technology in the background and solve problems for our customers.

What Bhasa Does


Create any number of custom models with just 5 minutes of annotated audio-text data.


Upload your audio or video file. Bhasa supports the MP3, MP4, and WAV formats.


Choose between generic and custom transcription models to transcribe the file.


Edit the generated transcript with an easy-to-use transcription editor.