Expoint - all jobs in one place

Microsoft Data Center Engagement Lead 
United States 
316396926

10.12.2024


As the Data Center Engagement Lead, you will play a pivotal role in shaping next-generation supercomputing systems. You will contribute to the design process, oversee buildout and validation pipelines, ensure timely delivery, and proactively drive operational excellence. Additionally, you will engage deeply with strategic customers, directly influencing their business outcomes while indirectly benefiting the broader Azure ecosystem. Your work will enable the next wave of growth and innovation in AI and high-performance computing (HPC) in the cloud.

Required Qualifications:
  • Bachelor's Degree in Computer Science or a related technical discipline AND 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience
  • 3+ years of experience in cloud infrastructure, or developing/running/troubleshooting AI/HPC applications on clusters
  • 3+ years of experience in multiple DataCenter technologies: power, cooling, IT hardware, telemetry
  • 3+ years of experience in DataCenter operational logistics

Other Requirements:
  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
    • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications:
  • PhD or Master's Degree in Computer Science or a related technical field AND 10+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience
  • Operational experience running large-scale HPC systems or infrastructure situated in cloud environments
  • Previous experience with GPU-based HPC systems
  • Expertise in Cloud Computing, Virtualization and Container Technologies

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:

Microsoft will accept applications for the role until December 16, 2024.


Responsibilities
  • Partner with cross-organizational teams to drive architecture, design, development, and deployment of end-to-end solutions to manage core infrastructure, including current and next-generation datacenter, IT hardware, and power and cooling technologies.
  • Drive operational excellence by developing strategies and execution plans to improve key metrics such as Job Mean Time to Interrupt, Nodes in Service, and Mean Time to Resolve on flagship supercomputers.
  • Drive prioritization of key issues and tactical decision-making, mindful of resourcing and staffing constraints. Monitor SLAs across partner teams and champion efforts to improve efficiency within those staffing and resourcing constraints.
  • Define and drive the development of the integrated telemetry and data pipelines needed to provide real-time alerting and monitoring of job-impacting incidents.
  • Partner with teams on continuous learning and continuous improvement programs by leading the resolution of complex incidents, driving root cause analyses, and championing initiatives to minimize future customer impact.
  • Lead and grow a team of engineers to build scalable services while championing a growth mindset, diversity and inclusion, and our model, coach, care management philosophy.