Data Engineer
Location: Bangalore
Type: Full-time
Overview of the role
ARTPARK’s One-Health team tackles interconnected challenges in human, animal, and environmental health through collaborative and interdisciplinary efforts.
Working with city, state, and national governments, we support data-driven public health responses to endemic, epidemic, and climate-related threats through innovative solutions leveraging statistical and AI/ML-based approaches.
In this role, you will have the opportunity to engage with leading experts in disease modelling, climate-health systems, engineering, and public health, both nationally and internationally, in a dynamic and highly motivated environment.
Responsibilities
Integrate and structure data from diverse sources into a coherent, harmonised format ready for use by advanced computational models.
Develop and automate a robust, scalable data and ETL pipeline using cutting-edge technologies to ensure smooth data flow, reliability, and real-time processing.
Work with data analysts and computational epidemiologists to design and deploy simple, accessible, and scalable data access mechanisms and policies while ensuring strict data governance that complies with relevant laws and policies.
Engage in exhaustive data cataloguing and documentation for all data acquired from various sources, and maintain a repository of the standards and processes applied to the data.
Streamline the data flow so that computational and simulation modellers can easily access and utilise the data in their models without manual intervention.
Manage different types of data, including complex spatiotemporal datasets such as semi-structured and unstructured data, climate data, and image datasets.
Apply state-of-the-art data standardisation techniques, leveraging AI and machine learning, including large language models (LLMs), to convert unstructured and semi-structured data into clean, usable formats for production-grade models.
Requirements
Bachelor's degree in computer science, engineering, mathematics, or a related quantitative scientific discipline. A master's degree is preferred.
3-5 years of experience in similar roles.
Demonstrable experience in developing and implementing ETL pipelines.
Expertise in Data Engineering and Automation: Proven experience designing and implementing robust data pipelines using tools like AWS cloud services and Python. Prior experience working on and maintaining open-source stacks is highly desirable.
Expertise in Database Management and Data Modelling: Deep knowledge of database management, schema design, and data modelling. Working closely with the computational epidemiology team, you will design databases and structures that align with their requirements, ensuring the data is well-organised and ready for analysis.
Prior experience with AI and Machine Learning Integration is desirable but not required.
ARTPARK at IISc drives impact through innovations in AI & Robotics by harnessing the best of research/academia, startups/industry, and government/nonprofits.
Our pioneering platform initiatives in language data & AI and health data & AI are driving national-scale impact with stakeholders such as MeitY’s Bhashini, Office of PSA, ICMR, States and Cities. At ARTPARK, you will work with the best researchers in the country and around the world in a strong data-driven environment and have the opportunity to address systemic issues and implement solutions.
These platforms are in pursuit of our vision – AI for All.