The ideal candidate will also have experience supporting the end-to-end ML lifecycle, including model training and experiment tracking, with MLflow experience as a strong asset. As part of our AI and Machine Learning team, you will be instrumental in enabling advanced analytics and delivering personalized user experiences.
About the role:
- Feature Engineering & Pipeline Development: Develop and maintain end-to-end ML feature engineering pipelines using Databricks, ensuring data is consistently structured to support ML models effectively.
- Data Integration & Management: Integrate diverse data sources (clickstreams, user behaviour, demographic data, etc.) and tailor data integration processes to optimize data quality and performance.
- Medallion Architecture Expertise: Build ETL/ELT pipelines that follow the bronze, silver, and gold layers of the medallion architecture, ensuring efficient data structuring for ML workflows.
- Model Training & Experiment Tracking: Support ML model training and calibration through optimized data pipelines, using MLflow for experiment tracking, model versioning, and performance monitoring.
- Query Optimization & Low-Latency Pipelines: Design and implement optimized queries and low-latency data pipelines to support real-time and batch model inference in production.
- CI/CD & Deployment: Apply CI/CD best practices to ensure smooth and efficient pipeline deployments, with automated testing for consistent performance.
- Data Governance & Compliance: Ensure pipelines meet security and compliance standards, particularly for PII, and manage metadata and master data across the data catalogue.
- Collaboration: Work closely with data scientists, data stewards, and other teams to align data ingestion and transformation efforts with business requirements.
About you:
- Experience: A minimum of 4 years in data engineering, focusing on ML feature engineering, ETL pipeline development, and data preparation for machine learning.
- Databricks & Medallion Architecture: Proven expertise in managing ETL/ELT pipelines on Databricks, with a solid understanding of the medallion architecture.
- ML Lifecycle & MLflow: Familiarity with the ML lifecycle and experience using MLflow to track model training, calibration, and experiments is highly desirable.
- Spark & Big Data Technologies: Advanced skills in Apache Spark for big data processing and analytics.
- Programming & Querying: Strong skills in Python for data manipulation and in SQL, including query optimization and performance tuning.
- Low-Latency Data Pipelines: Experience in building and optimizing pipelines for low-latency model inference and serving in production environments.
- CI/CD & System Integration: Familiarity with continuous integration and deployment practices for ETL/ELT pipeline development.
- Data Pipeline Management: Expertise in managing data pipelines, ensuring adherence to security, compliance, and best practices.
- Metadata & Master Data Management: Competency in managing metadata and master data within a technical data catalogue.
- You are a detail-oriented ML Data Engineer passionate about building scalable, efficient data pipelines tailored for machine learning.
- You thrive in a collaborative environment, working effectively with cross-functional teams to drive data-driven insights and personalized solutions.
- You are proactive in troubleshooting, monitoring, and optimizing data pipelines to support high-performance ML models in production.
We work hard to embrace diversity and inclusion and encourage everyone at McAfee to bring their authentic selves to work every day. We offer a variety of social programs, flexible work hours and family-friendly benefits to all of our employees.
- Bonus Program
- Pension and Retirement Plans
- Medical, Dental and Vision Coverage
- Paid Time Off
- Paid Parental Leave
- Support for Community Involvement