How Indian Startups Are Building AI for Vernacular Languages No One Else Cares About
"The billion people who come online won't be typing in English. They'll be speaking in Tulu, Bodo, Bhojpuri and Gondi. The question is: will AI be ready for them?"
The Language Wall No One Talks About
I still remember a conversation with a farmer. He was excited to learn how artificial intelligence could help him with his crops. But when he opened his phone and tried to use ChatGPT in his native language, the AI just didn't work.
This is the reality for hundreds of millions of Indians who live on the wrong side of what researchers call the Language Wall. It's a barrier that separates those who can access AI from those who cannot, based on the language they speak.
India has over 1,600 languages and dialects. OpenAI's ChatGPT supports only a dozen Indian languages. Google's Gemini supports nine. The rest, thousands of languages and dialects, are ignored.
Something exciting is happening. A few Indian startups are building AI from scratch for languages that have no data.
Part 1: The Scale of What's Being Ignored
Before we talk about these startups, let's understand the problem.
India is like a continent of languages. There are 22 scheduled languages, over 780 distinct languages and more than 19,500 dialects. Of India's more than 1.4 billion people, 85% do not speak English fluently.
The result? When you type a question in Tulu, an AI trained on English doesn't understand the context. A word that means one thing in Karnataka Tulu means something different in the hills.
Part 2: Why Big Tech Won't Fix This
Here's the truth: Silicon Valley has no financial incentive to build AI for Tulu.
Training a large language model requires digital text. English has trillions of tokens. Hindi has hundreds of billions. But Tulu? Bodo? Kashmiri?
These languages have virtually no digital footprint. Few written records. Almost no Wikipedia articles. Barely any transcribed audio. Building AI for them means building the data from scratch — expensive, slow, and with a limited immediate market.
Part 3: Meet the Startups Building What No One Else Will
🌿 TuluAI - Giving a Voice to 2 Million Coastal Speakers
Amrith Shenava launched a translation app, discovered that Tulu had zero digital data, and so built TuluAI. TuluAI holds storytelling sessions in which local residents narrate stories and simulate conversations. The team also collects WhatsApp voice notes and verifies every sample.
🌿 Aakhor AI - The 25-Year-Old Taking on Bodo and Assamese
Kabyanil Talukdar's startup, Aakhor AI, runs community workshops and voice-note drives via WhatsApp groups.
🔬 KashmiriGPT - Because Even Kashmir Deserves Its AI
A small startup is building an AI model that understands Kashmiri, a language with almost no digital training data.
🚀 Sarvam AI - India's Most Ambitious Language AI Bet
Sarvam AI raised $53 million and was selected by the Indian government to build India's first homegrown sovereign large language model.
🚀 Krutrim - India's First AI Unicorn
Krutrim became India's first AI unicorn, valued at $1 billion.
Krutrim's LLM was trained on over 2 trillion tokens and can understand and generate text in all 22 scheduled Indian languages. Its flagship product, Kruti, is India's agentic AI assistant, capable of booking cabs, ordering food and navigating everyday Indian life, all through voice commands in the user's native language.
What sets Krutrim apart from other multilingual models is its frugal DNA: built for India's infrastructure realities, it is optimized to run efficiently without supercomputers, making it viable for schools, startups and government services that can't afford heavy compute costs.
🔬 BharatGPT - The Reliance-IIT Collaboration
BharatGPT is perhaps the most structurally ambitious project in India's vernacular AI landscape. Developed collaboratively by Reliance and nine IITs, it is focused on India-centric data and aims to bring accurate AI understanding to the full spectrum of Indian languages and dialects: not just the top five, but the long tail.
Its early deployments have been in banking and healthcare. Major Indian banks have used BharatGPT for customer support in regional languages, handling loan applications and account queries; medical institutions have deployed it for patient consultation scheduling in local languages.
Part 4: The Government Bet - The IndiaAI Mission
The Indian government has made AI a national priority. The Bhashini platform, launched by the Ministry of Electronics and Information Technology under the National Language Translation Mission, provides free cloud APIs, open datasets in all 22 scheduled languages and direct incentives for AI startups focusing on regional languages.
The IndiaAI Mission has committed backing to sovereign AI development. NITI Aayog's AI roadmap includes a proposal called Digital ShramSetu, a mission to empower India's 490 million workers with digital access through voice-first vernacular AI interfaces.
The vision is that by 2035, voice-first, AI-powered interfaces will make digital platforms universally accessible.
India's vernacular AI ecosystem has attracted a $990 million funding surge. The sector has expanded more than 3.7 times in the past year alone. The country now ranks first globally in AI skill penetration, with 263% growth in AI talent since 2016.
Part 5: The Real-World Impact - Who Benefits and How
Vernacular AI is already changing lives.
For Farmers: A Tamil-speaking farmer in Tamil Nadu can now ask an AI chatbot, in Tamil, about weather conditions, mandi prices, the best crop varieties for the season and government subsidy schemes.
For Healthcare: Patients who previously couldn't describe symptoms in English can now interact with AI-powered health interfaces in Marathi, Odia or Punjabi.
For Education: Startups are building AI tutors that can teach mathematics in Bhojpuri, explain science concepts in Kannada and prepare students for exams in Marathi.
For Finance: With UPI penetration deep in rural India, the next phase is voice-enabled payments in local languages.
For Businesses: Over 60% of online shoppers in India prefer browsing in their regional language. Voice AI for customer support is already reducing costs by 60-70% for companies that adopt it.
Part 6: The Mountain to Climb
For all the progress, the challenges ahead are formidable.
Building a language model requires large quantities of clean, labeled text and audio data. For low-resource Indian languages, startups are building it from scratch, community by community, voice note by voice note.
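To make the idea of building a dataset one verified voice note at a time more concrete, here is a minimal sketch in Python. It assumes a simple rule: a voice note enters the training set only after two independent community reviewers approve it and nobody rejects it. The class names, the two-approval threshold and the sample entries are all illustrative assumptions, not a description of any startup's actual pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceSample:
    """One community-contributed voice note (hypothetical structure)."""
    sample_id: str
    language: str
    transcript: str
    approvals: set = field(default_factory=set)
    rejections: set = field(default_factory=set)


class VerificationQueue:
    """Holds samples until enough independent reviewers have approved them."""

    def __init__(self, min_approvals=2):
        # Assumed threshold: how many distinct reviewers must approve a sample.
        self.min_approvals = min_approvals
        self.samples = {}

    def submit(self, sample):
        self.samples[sample.sample_id] = sample

    def review(self, sample_id, reviewer, approved):
        sample = self.samples[sample_id]
        # A reviewer's latest verdict replaces any earlier one from them.
        sample.approvals.discard(reviewer)
        sample.rejections.discard(reviewer)
        if approved:
            sample.approvals.add(reviewer)
        else:
            sample.rejections.add(reviewer)

    def verified(self):
        # Accepted: enough approvals and no outstanding rejection.
        return [
            s for s in self.samples.values()
            if len(s.approvals) >= self.min_approvals and not s.rejections
        ]


queue = VerificationQueue(min_approvals=2)
queue.submit(VoiceSample("note-001", "Tulu", "<transcript of a folk story>"))
queue.submit(VoiceSample("note-002", "Tulu", "<transcript of a market chat>"))

queue.review("note-001", "reviewer_a", approved=True)
queue.review("note-001", "reviewer_b", approved=True)
queue.review("note-002", "reviewer_a", approved=True)
queue.review("note-002", "reviewer_b", approved=False)

print([s.sample_id for s in queue.verified()])  # ['note-001']
```

Requiring agreement between reviewers is slow, which is part of why this kind of data is expensive to build, but it keeps mislabeled or misheard samples out of a dataset that is too small to absorb much noise.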
Dialect complexity makes things harder still. India's linguistic diversity means that within a single language like Hindi, the version spoken in Uttar Pradesh differs meaningfully from Bhojpuri, Awadhi or Maithili.
Investor bias is a structural challenge. Funding tends to concentrate on Hindi, Tamil and Telugu, the languages with the largest user bases and the most commercial potential.
Policy uncertainty remains a gap. India lacks data governance standards for AI development.
Part 7: Why This Matters Beyond India
India's AI revolution has implications that extend far beyond its borders.
There are roughly 7,000 living languages in the world. Of those, perhaps 20 receive AI attention. The other 6,980 are digitally invisible, their speakers shut out of the transformative technology of our era.
The methods being developed by these startups (community-sourced data collection, WhatsApp-based voice drives, cultural annotation frameworks, lightweight models designed for low-bandwidth environments) are not just solutions for India. They are blueprints for the world.
India's startups are proving that it can be done. They are proving that a 25-year-old founder in Assam can build what Google won't. They are proving that community-sourced data is not a workaround; it is the honest way to capture the real meaning of a language.
That is a lesson the global AI industry desperately needs to learn.
The Language of Inclusion
The AI revolution has made a promise: that everyone, anywhere, can use intelligence, no matter where they come from.
That promise isn't true yet.
If you speak a language that doesn't matter to Silicon Valley, you're left out. You're not left out because you're poor or didn't get an education. You're left out because you were born into a language that wasn't included in a training dataset.
Startups like TuluAI, Aakhor AI, KashmiriGPT, Sarvam AI, Krutrim, BharatGPT and many more are working to make that promise real. They're doing it one voice note at a time, one storytelling session at a time, one WhatsApp audio clip at a time.
They're not waiting for OpenAI to care about languages like Tulu. They know it won't.
By building what no one else will, they're showing that every language has a way of looking at the world that can't be replaced, and that we need to build a future that includes all languages, or it's not a future for everyone.
📌 Share this if you think language shouldn't be a barrier to technology. The revolution is about using all languages, or it's not complete.