Speech ML Engineer
Location: Bangalore
Type: Full-time Consultant
About the Team
You will be a part of the Language Data & AI team at ARTPARK and IISc.
We are focused on building an ecosystem for AI in the Indian languages space. To this end, we create datasets and models that enable applications with broad societal impact. We run some of the largest data initiatives in the world. These ambitious India-wide programs are creating high-quality open-source speech and text datasets spanning every district to advance the state of the art in NLP (Natural Language Processing).
You will be part of this core engineering and research team, working closely with leading NLP researchers at IISc and the leadership of ARTPARK, in addition to NLP researchers at the world’s top tech companies.
Learn more about our initiatives
Overview of the project
As part of an ambitious nationwide program, you will help create unique, high-quality open-source speech and text datasets spanning every district to advance the state of the art in NLP (Natural Language Processing) in Indian languages.
With several large projects under BhashaSetu, ARTPARK’s vision is to spearhead the creation of an inclusive Digital India by propelling AI advancements in Indic languages, with projects spanning speech data collection, curation, and advanced language modelling.
Role & key responsibilities
As an MLE, you will be responsible for building robust machine learning solutions in collaboration with dedicated software engineers.
Responsibilities
Data preparation: building scripts to preprocess large amounts of data.
Training models: building models with ASR toolkits such as Kaldi, ESPnet, and SpeechBrain.
Research: implementing and running experiments for research on ASR, signal representation, and assessment.
MLOps: deploying trained ASR models.
Early stage of proof of concept (PoC)
Set up and structure codebases that support an interactive ML experimentation process, as well as quick initial deployments
Develop and maintain toolsets and processes for ensuring the reproducibility of results
Code reviews with other technical team members at various stages of the PoC
Develop, extend, or adopt a reliable, Colab-like environment for ML
Late PoC
Develop ETL pipelines. Set up and maintain feature stores, databases, and data catalogs.
Develop and support model metrics
Responsibilities during production deployment
Develop and support A/B testing. Set up continuous integration and continuous delivery (CI/CD) processes and pipelines for models
Develop and support continuous model monitoring
Define and publish service-level agreements (SLAs) for model serving. Such agreements cover model latency, throughput, and reliability
L1/L2/L3 support for model debugging
Develop and support model serving environments
Model compression and distillation
Requirements & qualifications
Candidates should possess a strong knowledge of ML/AI concepts, and expert-level knowledge of how those concepts can be applied.
(Essential) Bachelor’s in Computer Science, Electrical Engineering, or a related field
(Essential) Prior deep learning experience with PyTorch/TensorFlow
(Essential) Proficient in MATLAB and/or Python and/or Bash scripting
(Preferred) Coursework in linear algebra, probability, signal processing, machine learning, etc.
(Preferred) Familiar with speech processing
(Preferred) Familiar with the basics of natural language processing and language modelling
(Preferred) Experience with deep learning on speech tasks
(Preferred) Experience with deploying deep learning models
ARTPARK at IISc drives impact through innovations in AI & Robotics, by harnessing the best of research/academia, startups/industry, and government/nonprofits.
Our pioneering platform initiatives in language data & AI and health data & AI are driving national-scale impact with stakeholders such as MeitY’s Bhashini, Office of PSA, ICMR, States and Cities.
These platforms are in pursuit of our vision – AI for All.