As a Data Operations Engineer Associate in Technology and Operations Management, you will collaborate with a dedicated team of social scientists to implement robust data pipelines that realize the potential of the firm’s data for impactful research. You will be responsible for designing and maintaining the codebases that ensure data are delivered accurately, on time, and as efficiently as possible. Additionally, you will be at the forefront of identifying and exploring new data assets that can expand the scope or improve the quality of Institute research products.
Job Responsibilities:
- Conceptualize, implement, and maintain data pipelines and infrastructure that transform administrative data into data products for research use.
- Write robust, modular, and high-quality production code and algorithms to support data pipelines.
- Perform exploratory analysis on upstream administrative data to identify high-value data sources that meet the Institute’s needs.
- Implement large ETL jobs in a cost-efficient manner, ensuring optimal performance and resource utilization.
- Manage timelines and expectations across multiple simultaneous projects in a dynamic and demanding environment.
- Write and maintain comprehensive, user-friendly documentation for data assets to facilitate ease of use and understanding.
- Troubleshoot and resolve data issues identified by research teams, ensuring data integrity and reliability.
- Identify hidden problems and patterns in data, using these insights to implement root-cause solutions.
- Train researchers on the use and understanding of data assets, ensuring they can effectively leverage these resources for their work.
Required qualifications, capabilities, and skills:
- Bachelor’s degree in a relevant discipline (e.g. computer science)
- 2+ years of relevant experience, including implementing and maintaining data pipelines, programming data-intensive applications, and troubleshooting data quality issues
- Proficiency with SQL, particularly Data Definition Language, Data Manipulation Language, and Data Query Language usage
- Proficiency in Python or Scala, particularly for ETL purposes
- Proficiency with big data technology such as Spark
- Hands-on practical experience in data engineering, particularly in creating robust, business-as-usual (BAU) data assets for analytical teams
- Experience in developing, debugging, and maintaining code in a complex corporate environment
- Experience in writing and maintaining robust data documentation for a non-technical audience
- Demonstrated knowledge of database management and data asset management
Preferred qualifications, capabilities, and skills:
- Experience working with AWS Cloud technologies (e.g. Glue, EMR)
- Experience working with Databricks