This position requires you to be local to the San Francisco area or Redmond area and work in the office 3 days a week.
Responsibilities
- Build, maintain, and enhance ETL pipelines that process large-scale data with low latency and high throughput to support Copilot operations.
- Design and maintain high-throughput, low-latency experimentation reporting pipelines that enable data scientists and product teams to measure model performance and user engagement.
- Own data quality initiatives, including monitoring, alerting, validation, and remediation processes, to ensure data integrity across all downstream systems.
- Implement robust schema management solutions that enable quick and seamless schema evolution without disrupting downstream consumers.
- Develop and maintain data infrastructure that supports real-time and batch processing requirements for machine learning model training and inference.
- Collaborate with ML engineers and data scientists to optimize data access patterns and improve pipeline performance for model evaluation workflows.
- Design scalable data architectures that can handle growing data volumes and evolving business requirements.
- Implement comprehensive monitoring and observability solutions for data pipelines, including SLA tracking and automated alerting.
- Partner with cross-functional teams to understand data requirements and translate them into efficient technical solutions.
Required Qualifications
- Doctorate in Computer Science, Data Engineering, Software Engineering, or related field AND 4 years of data engineering experience (e.g., building ETL pipelines, managing distributed data systems, implementing data quality frameworks)
- OR Master's Degree in Computer Science, Data Engineering, Software Engineering, or related field AND 6 years of data engineering experience (e.g., building ETL pipelines, managing distributed data systems, implementing data quality frameworks)
- OR Bachelor's Degree in Computer Science, Data Engineering, Software Engineering, or related field AND 8 years of data engineering experience (e.g., building ETL pipelines, managing distributed data systems, implementing data quality frameworks)
- OR equivalent experience.
- Experience building and maintaining production data pipelines at scale using technologies such as Apache Spark, Kafka, or similar distributed processing frameworks.
- Experience writing production-quality Python, Scala, or Java code for data processing applications.
- Experience building and scaling experimentation frameworks.
- Experience with cloud data platforms (Azure, AWS, or GCP) and their data services.
- Experience with schema management and data governance practices.
Preferred Qualifications
- Doctorate in Computer Science, Data Engineering, Software Engineering, or related field AND 8 years of data engineering experience (e.g., building ETL pipelines, managing distributed data systems, implementing data quality frameworks)
- OR Master's Degree in Computer Science, Data Engineering, Software Engineering, or related field AND 10 years of data engineering experience (e.g., building ETL pipelines, managing distributed data systems, implementing data quality frameworks)
- OR Bachelor's Degree in Computer Science, Data Engineering, Software Engineering, or related field AND 12 years of data engineering experience (e.g., building ETL pipelines, managing distributed data systems, implementing data quality frameworks)
- OR equivalent experience.
- Experience with real-time data processing and streaming architectures.
- Experience with data orchestration tools such as Airflow, Prefect, or similar workflow management systems.
- Experience with containerization technologies (Docker, Kubernetes) for data pipeline deployment.
- Demonstrated experience with data quality frameworks and monitoring solutions.