Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience.
Experience with machine learning workflows and integrating ML models into production pipelines.
Expertise in distributed systems and big data technologies such as Hive, Presto, Spark, or Azure equivalents.
Solid programming skills in C#, .NET, SQL, and Python (or equivalent), with a focus on scalable and cost-effective solutions.
Deep understanding of distributed systems, stream processing, and high-performance computing.
Proven ability to automate data auditing and implement data lineage tracking tools to reduce operational overhead.
Experience handling large-scale, high-volume datasets with an emphasis on cost optimization.
Knowledge of CI/CD pipelines, containerized environments, and cloud infrastructure.
Preferred Qualifications
Familiarity with data visualization tools for delivering operational insights.
Proven experience in data privacy compliance and governance practices.
Hands-on experience in building and deploying machine learning models in production environments.
Solid communication and collaboration skills to work effectively with diverse teams.
Responsibilities
Architect & Build: Develop large-scale, highly available data pipelines (batch and streaming) that power real-time machine learning and analytics across Microsoft Ads.
ML Pipeline Integration: Collaborate with data scientists to integrate models (e.g., LLMs, ranking algorithms, and fraud detection classifiers) into production workflows.
Optimize & Scale: Leverage Azure big data frameworks (ADF, AML), SCOPE, COSMOS, Spark, or similar technologies to optimize data processing, reduce latency, and manage costs effectively.
Data Quality & Governance: Implement frameworks for auditing, lineage tracking, and automated validation to ensure data fidelity, compliance, and privacy.
Reliability & SLAs: Define, monitor, and enforce performance SLAs for mission-critical data flows in a 24x7 environment.
Automation & Tooling: Develop CI/CD pipelines and monitoring and alerting tools to reduce manual overhead and streamline deployments.
Dashboards & Visualization: Develop dashboards using Power BI or similar tools to visualize data pipeline operations.
Leadership & Collaboration: Work cross-functionally with product managers, ML researchers, and software engineers; mentor junior engineers and guide architectural best practices.