Expoint - all jobs in one place

Microsoft Senior Supercomputing Software Engineer 
United States 
983201212

24.12.2024

Required Qualifications:
  • Bachelor's Degree in Computer Science or related technical or scientific field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience.
  • 3+ years of experience in operating AI/HPC systems, developing and running AI/HPC applications on clusters, or operating Cloud Infrastructure.
  • 2+ years of specialized experience in at least one of the following: AI/HPC system management, high-speed networks, HPC storage, or managing cloud infrastructure.

Other Requirements:

  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
    • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications:
  • Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience.
  • Previous experience with running and troubleshooting machine learning workloads on GPU-based HPC systems.
  • Experience with Cloud Computing, Virtualization and Container Technologies.
  • Familiarity with the HPC software stack.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:


Microsoft will accept applications for the role until January 6, 2025.

Responsibilities
  • Be part of a comprehensive systems management team focused on operational excellence and customer success.
  • Build tools and analyze key system metrics and telemetry to proactively identify and debug HPC system issues.
  • Partner with customers, vendors, and other teams within Azure to drive comprehensive solutions for operating world-class supercomputers in the public cloud environment.
  • Help ensure the Azure platform performs consistently, can scale on demand, and is engineered to withstand the unparalleled computing demand of customer workloads.
  • Contribute to a test-driven engineering culture that reduces regressions and bugs in production and sets a higher bar for infrastructure quality.