Speech ML Engineer

Location: Bangalore

Type: Full-time Consultant

About the Team

You will be a part of the Language Data & AI team at ARTPARK and IISc.

We are focused on building an ecosystem for AI in Indian languages. To this end, we create datasets and models that enable applications with broad societal impact. We are running some of the largest data initiatives in the world, and these ambitious India-wide programs are creating high-quality open-source speech and text datasets spanning every district to accelerate the state of the art in NLP (Natural Language Processing).

You will be part of this core engineering and research team, working closely with leading NLP researchers at IISc, the leadership of ARTPARK, and NLP researchers at some of the world’s top tech companies.


Overview of the project

As part of an ambitious nationwide program, you will help create unique, high-quality open-source speech and text datasets spanning every district to accelerate the state of the art in NLP (Natural Language Processing) for Indian languages.

With several large projects under BhashaSetu spanning speech data collection, curation and advanced language modelling, ARTPARK’s vision is to spearhead the creation of an inclusive digital India by propelling AI advancements in Indic languages.

Role & key responsibilities

As a Machine Learning Engineer (MLE), you will be responsible for building robust machine learning solutions in collaboration with dedicated software engineers.

Responsibilities

  • Data preparation: building scripts to preprocess large volumes of data.

  • Model training: building models with ASR toolkits such as Kaldi, ESPnet and SpeechBrain.

  • Research: implementing and running experiments for research on ASR, signal representation and assessment.

  • MLOps: deploying trained ASR models.

Early-stage proof of concept (PoC)

  • Set up and structure codebases that support an interactive ML experimentation process as well as quick initial deployments

  • Develop and maintain toolsets and processes for ensuring the reproducibility of results

  • Conduct code reviews with other technical team members at various stages of the PoC

  • Develop, extend or adopt a reliable, Colab-like environment for ML

Late-stage PoC

  • Develop ETL pipelines. Set up and maintain feature stores, databases and data catalogs.

  • Develop and support model metrics

Responsibilities during production deployment

  • Develop and support A/B testing. Set up continuous integration and continuous delivery (CI/CD) processes and pipelines for models

  • Develop and support continuous model monitoring

  • Define and publish service-level agreements (SLAs) for model serving, covering model latency, throughput and reliability

  • Provide L1/L2/L3 support for model debugging

  • Develop and support model serving environments

  • Perform model compression and distillation

Requirements & qualifications

  • Candidates should have strong knowledge of ML/AI concepts and expert-level knowledge of how to apply them.

  • (Essential) Bachelor’s degree in Computer Science, Electrical Engineering or a related field

  • (Essential) Prior deep learning experience with PyTorch or TensorFlow

  • (Essential) Proficiency in MATLAB, Python and/or Bash scripting

  • (Preferred) Coursework in linear algebra, probability, signal processing, machine learning, etc.

  • (Preferred) Familiarity with speech processing

  • (Preferred) Familiarity with the basics of natural language processing and language modelling

  • (Preferred) Experience with deep learning on speech tasks

  • (Preferred) Experience deploying deep learning models

ARTPARK at IISc drives impact through innovations in AI & Robotics by harnessing the best of research/academia, startups/industry, and government/nonprofits.

Our pioneering platform initiatives in language data & AI and health data & AI are driving national-scale impact with stakeholders such as MeitY’s Bhashini, the Office of the PSA, ICMR, and states and cities.

These platforms are in pursuit of our vision – AI for All.
