Data Curation Associate-North Zone

Location: Bangalore

Type: Full-time Consultant

About the Team

You will be a part of the Language Data & AI team at ARTPARK and IISc.

We are focused on building an ecosystem for AI in the Indian languages space. To this end, we create datasets and models for enabling applications for broad societal impact. We are running some of the largest data initiatives in the world and these ambitious India-wide programs are creating high-quality open-source speech and text datasets spanning every district to accelerate the state-of-the-art in NLP (Natural Language Processing).

You will be part of this core team at ARTPARK.

Overview of the project

As part of an ambitious nation-wide program, you will help create unique, high-quality open-source speech and text datasets spanning every district to accelerate the state-of-the-art in NLP (Natural Language Processing) in Indian languages.

With several large projects under BhashaSetu, ARTPARK’s vision is to spearhead the creation of an inclusive digital India by propelling AI advancements in Indic languages, with projects spanning speech data collection, curation and advanced language modelling.

You will be part of the operations team, which drives data collection and curation across all projects in the BhashaSetu program, working closely with the ARTPARK team.

Role & key responsibilities

  1. You will be responsible for the data from one or more of the following states:

    • Punjab

    • Haryana

    • Delhi

    • Himachal Pradesh

    • Uttarakhand

    • Uttar Pradesh

    • Bihar

  2. Understand the requirements for data curation and their implications for the AI models built. Study the requirements thoroughly as and when guideline documentation is received.

  3. Find and recruit the right curation experts as required by the project, through searches and contacts (e.g., NGOs, local institutes) in the districts/areas that you are managing

    • Design task flyers and identify all ways to reach individuals and language experts (local to a district) who could be interested in data curation and quality checking

    • Contact applicants, as well as those who did not apply, by phone call and WhatsApp

    • Host project awareness calls with potential experts to drive understanding of their tasks

  4. Manage day-to-day curation operations for audio and transcription data

    • Train the recruited experts in the required tasks and provide relevant training documentation

    • Assign daily workloads based on their availability and coordinate closely to get the work done

    • Supervise their performance and review their work on a daily basis

Skills and background

  1. Should be a native speaker of the local language of the following districts:

    • Punjabi - (primarily from Kapurthala, Fazilka, Pathankot)

    • Haryanvi, Jatu - (primarily from Charkhi Dadri, Jhajjar, Rohtak)

    • Hindi - (primarily from Chandigarh, Delhi, Lucknow)

  2. Should be good at verbal and written communication in both English and the local language

  3. Should be good at managing multiple people working remotely and getting tasks done through them

  4. Skills: Microsoft Office (Excel, Word, PowerPoint) and Google Workspace (Docs, Sheets, Slides)

Good-to-haves

  1. Experience in data curation

  2. Experience in speech data annotation and labelling

  3. Experience in working with data sourcing and annotation companies

ARTPARK at IISc drives impact through innovations in AI & Robotics, by harnessing the best of research/academia, startups/industry, and government/nonprofits.

Our pioneering platform initiatives in language data & AI and health data & AI are driving national-scale impact with stakeholders such as MeitY’s Bhashini, Office of PSA, ICMR, States and Cities.

These platforms are in pursuit of our vision – AI for All.
