Pipeline Architecture & Development: Design, build, and maintain scalable, reliable, and high-performance data pipelines using Azure Data Factory, Databricks, and Spark to support machine learning and analytics workloads.
ETL & Data Integration: Lead the creation and optimization of ETL processes for structured and unstructured data, ensuring data quality, lineage, and compliance with enterprise standards.
Feature Store Management: Develop and manage feature stores to streamline the reusability and governance of engineered features for AI and ML models.
MLOps and Automation: Implement robust MLOps workflows that enable continuous integration, deployment, monitoring, and retraining of ML models in production environments.
Collaboration: Partner with data scientists, ML engineers, and business stakeholders to translate analytic requirements into scalable data solutions, and provide guidance on best practices for data engineering in an AI context.
Communication: Communicate clearly and effectively with cross-functional teams, including data scientists, analysts, and business stakeholders.
Cloud and Security: Ensure secure data movement and storage by applying security best practices and compliance protocols within Azure cloud environments.
Continuous Improvement: Stay current with advancements in data engineering, MLOps, and cloud technologies, and proactively apply new techniques to enhance existing solutions.
Skills and attributes for success
Education: Bachelor’s or Master’s degree in Computer Science, Information Management, Data Engineering, or a closely related technical field.
Experience: 4-6 years of professional experience in data engineering, with a strong emphasis on MLOps, data pipelines, and large-scale ETL/ELT processes.
Technical Skills: Advanced proficiency in Azure Data Factory, Databricks, and Apache Spark for designing and implementing complex data pipelines and distributed data processing jobs; expertise in ETL/ELT processes, data wrangling, and integrating diverse data sources into unified, high-quality datasets suitable for AI and analytics applications.
MLOps and Workflow Management: Hands-on experience with feature store management and supporting ML and AI workflows in production environments; solid understanding of MLOps principles, including CI/CD for machine learning, automation of model deployment, and monitoring practices.
Programming and Scripting: Proficient in SQL and Python for data transformation, orchestration, and automation tasks; familiarity with additional programming languages (e.g., Scala, R) is a plus.
Cloud Technologies: Experience with cloud-native tools and services (e.g., Azure, AWS, Google Cloud) for data storage, processing, and orchestration.
Data Security and Compliance: Working knowledge of data security, privacy, and compliance regulations (e.g., GDPR, HIPAA) in cloud-based solutions, ensuring data governance and protection.
Analytical and Problem-Solving Skills: Strong analytical skills to troubleshoot data-related issues and optimize data workflows for performance and efficiency.