8+ years of experience in machine learning, MLOps, or software engineering roles, with at least 3 years in a technical leadership or lead engineer role.
Proven experience designing, building, and operationalizing large-scale machine learning systems in a production environment.
Expert knowledge of cloud platforms (preferably Azure) for ML workloads, including infrastructure as code, resource provisioning, and cost management.
Deep expertise in ML frameworks (e.g., TensorFlow, PyTorch, scikit-learn) and MLOps tools (e.g., Kubeflow, MLflow, Azure ML).
Strong experience with containerization and container orchestration (Docker, Kubernetes) and with microservices architecture.
Solid understanding of security, privacy, and compliance requirements in machine learning systems (e.g., GDPR, CCPA, Responsible AI principles).
Experience leading cross-functional teams and projects, influencing stakeholders, and driving decision-making processes.
Preferred Qualifications:
Advanced degree in Computer Science, Data Science, or a related field.
Experience with distributed computing frameworks (e.g., Apache Spark) for large-scale data processing and model training.
Hands-on experience with advanced machine learning techniques (e.g., reinforcement learning, generative models, transfer learning).
Strong knowledge of automation and orchestration tools for model monitoring and retraining in production.
Excellent communication and presentation skills, with the ability to influence technical and business stakeholders.
Key Responsibilities:
Technical Leadership:
Lead and mentor a team of machine learning and MLOps engineers in developing and deploying machine learning models and systems at scale.
Provide technical leadership across the entire ML lifecycle, from data ingestion to model development, deployment, monitoring, and governance.
Drive MLOps best practices so that the team deploys and operates ML models in a modern, scalable, and secure way.
ML Architecture & Pipeline Design:
Architect end-to-end machine learning pipelines, ensuring scalability, robustness, and maintainability in production environments.
Design and implement CI/CD pipelines for ML model training, testing, deployment, and monitoring, with a focus on automation and shorter time to market.
Optimize model performance and cost efficiency through techniques such as distributed training, model pruning, and hardware acceleration (e.g., GPUs, TPUs).
Model Governance & Compliance:
Define and enforce model governance policies including versioning, reproducibility, monitoring, and auditing to ensure compliance with regulatory and ethical standards.
Lead efforts to ensure ML models adhere to Microsoft’s security and compliance guidelines, including privacy, fairness, and responsible AI practices.
Design frameworks for model validation and drift detection to keep models performing reliably in production.
Cross-Functional Collaboration:
Collaborate with data science, software engineering, and product teams to integrate ML models into scalable, production-ready systems.
Influence the broader organization’s ML strategy by advocating for new technologies, tools, and approaches to improve the performance, scalability, and security of ML models.
Serve as a liaison between technical teams and senior leadership, translating business needs into technical solutions.
Innovation & Continuous Improvement:
Stay current with the latest trends and advancements in machine learning and MLOps to ensure that the team is adopting the best tools and practices.
Identify bottlenecks and pain points in current ML workflows, and spearhead initiatives to improve efficiency and effectiveness.
Lead proof-of-concept (PoC) efforts for new tools, frameworks, or methods to keep the platform cutting-edge.
Mentorship & Talent Development:
Provide mentorship to engineers across the team, fostering a culture of growth and continuous learning.
Take an active role in talent development, conducting code reviews, guiding architecture decisions, and providing technical feedback.
Identify skill gaps within the team and create development plans to elevate the team’s technical competencies.