12+ years relevant technical engineering experience
OR Bachelor's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 5+ years technical engineering experience
OR Master's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 4+ years technical engineering experience
OR Doctorate Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 2+ years technical engineering experience.
8+ years of work experience managing manufacturing quality in the electronics industry.
5+ years of direct engineering experience in hardware system issue resolution for GPU Servers.
Versed in filtering applicable debug data, such as telemetry and logs, to identify and investigate hardware failure signatures.
Preferred Qualifications:
Bachelor's Degree in manufacturing, materials, mechanical, electrical, or industrial engineering, or related field AND 7+ years experience in a manufacturing environment/repair
OR Master's Degree in manufacturing, materials, mechanical, electrical, or industrial engineering, or related field AND 6+ years experience in a high-volume manufacturing environment
OR Doctorate in manufacturing, materials, mechanical, electrical, or industrial engineering, or related field AND 3+ years experience in a manufacturing environment/repair
OR 9+ years equivalent experience.
Patents or a track record of engineering excellence.
12+ years of experience working with modern server architectures, including an understanding of GPU and CPU methods for failure analysis, debugging, or validation.
8+ years of system-level server debugging experience, with an understanding of power, system, and network environments.
3+ years of direct GPU-related engineering experience in issue debugging and test log review.
Leadership skills and ability to collaborate with diverse teams and drive a call to action.
Expertise in root cause analysis and corrective action methods to identify contributing factors of production defects.
Ability to analyze large data sets, extract key insights, and effectively present and communicate the results.
Strong communication and project management skills.
Responsibilities
Develop and implement a robust supplier quality management strategy to ensure the data center hardware is manufactured at the highest level of quality standards.
Lead the quality issue and improvement task force to contain, mitigate, and resolve the top quality issues impacting global data centers.
Conduct debug and failure analysis for GPU subsystems in the Azure fleet and drive resolution with partners and suppliers.
Drive the continuous improvement process based on Root Cause Analysis (RCA) and identified opportunities.
Deliver quality readouts based on your telemetry data analysis to bring clarity across the organization on status, actions, and next steps for issue resolution.
Establish Critical-to-Quality performance metrics to measure and improve product quality.
Act as the voice of quality in the hardware change management process, ensuring quality requirements are considered, met, and improved.