Data Curation Associate-West Zone
Location: Bangalore
Type: Full-time Consultant
About the Team
You will be a part of the Language Data & AI team at ARTPARK and IISc.
We are focused on building an ecosystem for AI in the Indian languages space. To this end, we create datasets and models for enabling applications for broad societal impact. We are running some of the largest data initiatives in the world and these ambitious India-wide programs are creating high-quality open-source speech and text datasets spanning every district to accelerate the state-of-the-art in NLP (Natural Language Processing).
You will be part of this core team at ARTPARK.
Overview of the project
As part of an ambitious nation-wide program, you will help create unique, high-quality open-source speech and text datasets spanning every district to accelerate the state-of-the-art in NLP (Natural Language Processing) in Indian languages.
With several large projects under BhashaSetu, ARTPARK’s vision is to spearhead the creation of an inclusive Digital India by propelling AI advancements in Indic languages, with projects spanning speech data collection, curation and advanced language modelling.
You will be part of the operations team which drives data collection and curation across all projects in the BhashaSetu program, working closely with the ARTPARK team.
Role & key responsibilities
You will be responsible for the data from one or more of the following states:
Gujarat
Rajasthan
Maharashtra
Goa
Understand the requirements for data curation and their implications for the AI models being built. Study the requirements thoroughly as and when guideline documentation is received
Find and recruit suitable curation experts as required by the project, through search and contacts (e.g., NGOs, local institutes, etc.) in the districts/areas that you are managing
Design task flyers and identify every available way to reach individuals and language experts (local to a district) who could be interested in data curation and quality checking
Contact applicants, as well as those who did not apply, through phone calls and WhatsApp
Host project awareness calls with potential experts to drive understanding of their tasks
Manage day-to-day curation operations for audio and transcription data
Train the recruited experts in the required tasks and provide relevant training documentation
Assign daily workloads based on the experts' availability and coordinate closely to get the work done
Supervise their performance and review their work on a daily basis
Skills and background
Should be a native speaker of the local language of the following districts:
Marathi (primarily from Washim, Gondia, Mumbai Suburban)
Gujarati (primarily from Navsari, Valsad, Devbhoomi Dwarka, Gandhinagar)
Hindi (primarily from Umaria, Dhar, Katni, Bhopal)
Hindi, Rajasthani, Marwari (primarily from Barmer, Jaisalmer, Jaipur)
Should have good verbal and written communication skills in both English and the local language
Should be good at managing multiple people working remotely and getting tasks done through them
Skills: Microsoft Office (Excel, Word, PowerPoint) and Google Workspace (Docs, Sheets, Slides)
Good-to-haves
Experience in data curation
Experience in speech data annotation and labelling
Experience in working with data sourcing and annotation companies
ARTPARK at IISc drives impact through innovations in AI & Robotics, by harnessing the best of research/academia, startups/industry, and government/nonprofits.
Our pioneering platform initiatives in language data & AI and health data & AI are driving national-scale impact with stakeholders such as MeitY’s Bhashini, Office of PSA, ICMR, States and Cities.
These platforms are in pursuit of our vision – AI for All.