Expoint - all jobs in one place

Finding the best job has never been easier

Limitless High-tech career opportunities - Expoint

Microsoft Senior High Performance Computing Software Engineer 
United States 
42049378

10.12.2024

As a Senior High Performance Computing (HPC) Software Engineer, you will be critical in designing and delivering the next generations of AI training, AI inferencing, virtual desktop, video and gaming infrastructure for Azure. You will be challenged across a wide spectrum of hardware architectures, network types and processor types. You will help define and deliver an end-to-end vertical view, with continuous focus on customer value, quality, performance and automation.

Your mission will be to help ensure Azure platform is consistent on performance, can scale on-demand, and engineered to withstand the unparalleled computing demand from the customer workloads. You will help building a test-driven engineering culture to reduce regressions and bugs in production and will set a higher bar for infrastructure quality.
Required Qualifications:
  • Bachelor's Degree in Computer Science or related technical field AND 4+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, or Python
    • OR equivalent experience
  • 4+ years of experience in High Performance Computing (HPC) or Machine Learning

Other Requirements:

  • Ability to meet Microsoft, customer and/or government security screening requirements are required for this role. These requirements include, but are not limited to the following specialized security screenings:
    • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

  • Bachelor's Degree in Computer Science or related technical field AND8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • ORMaster's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience
  • 2+ years of experience with Deep Learning, and AI Infrastructure including Diagonostic, Profiling and Performance Analysis Tools
  • Experience on with Distributed System,High Performance Computing / Machine Learning middleware andCo-Designing Hardware-Software
  • Experience on Profiling and Performance Analysis Tools


Responsibilities
  • Willing to dive deeply into any level or layer of a problem andlearn emerging technologies, from hardware to software.
  • Evaluate and make recommendations that advance Azure infrastructure for AI and other GPU-based workloads.
  • Leads by example within the team by producing extensible and maintainable code. Optimizes, debugs, refactors, and reuses code to improve performance and maintainability, effectiveness, and return on investment (ROI). Applies metrics to drive the quality and stability of code, as well as appropriate coding patterns and best practices.
  • Ensures alignment with partners' expectations. Considers partner teams across organizations and their end goals for products to drive and achieve desirable user experiences and fitting dynamic needs of partners/customers through product development.
  • Drives identification of dependencies and the development of design documents for a product, application, service, or platform.
  • Creates, implements, optimizes, debugs, refactors, and reuses code to establish and improve performance and maintainability, effectiveness, and return on investment (ROI).
  • Acts as a Designated Responsible Individual (DRI) and guides other engineers by developing and following the playbook, working on call to monitor system/product/service for degradation, downtime, or interruptions, alerting stakeholders about status and initiates actions to restore system/product/service for simple and complex problems when appropriate.