Bhasha Setu: Language Data & GenAI
Bridging language gaps through inclusive data science and AI initiatives that understand Indian languages
ARTPARK and IISc, along with partners, are working to create language data and AI that understands all Indians, so that Digital India is inclusive.
We believe language is a barrier that billions have to transcend in order to access the benefits of the internet and digital transformation. Language is inherently diverse, and in India it changes with geography and over time. Working with an ecosystem of partners, we are building Indian language datasets and generative AI solutions for non-English speakers.
Datasets for an inclusive India
- Spoken language changes continuously with geography. It does not change abruptly at state or district boundaries. We are building pan-India speech and text datasets, truly representative of India's rich diversity.
- Synthesizing Speech in Indian languages (SYSPIN) is an initiative to create a text-to-speech (TTS) synthesizer in nine Indian languages: Hindi, Bengali, Marathi, Telugu, Bhojpuri, Kannada, Magadhi, Chhattisgarhi and Maithili.
- Speech recognition in agriculture and finance for the poor is an initiative to create resources and release them as an open-source digital public good, to spur research and innovation in speech recognition in nine Indian languages: Hindi, Bengali, Marathi, Telugu, Bhojpuri, Kannada, Magadhi, Chhattisgarhi and Maithili.
GenAI Applications
ARTPARK is a double winner of the BMGF Global Grand Challenges for Catalyzing Equitable AI Use.
LLM Chatbot
We are building an LLM-based assistant to support workers in accessing timely, accurate and actionable medical knowledge and guidance while training for and managing high-risk pregnancies.
BelonggAI
Billions of people in low- and middle-income countries (LMICs) with marginalized identities (gender, caste, disability, sexual orientation, ethnicity, religion) face suboptimal outcomes because of bias and poorly designed programs and policies that ignore their unique needs. These intersectional considerations are often overlooked in Sustainable Development Goal (SDG) activities, leaving those needs unaddressed.
BelonggAI and ARTPARK are developing an LLM-based tool to help development practitioners, funders, and researchers uncover and address these exclusions and make their work more inclusive of marginalized groups.