Microsoft Principal Software Engineer 
United States, Washington 
585498160

01.05.2024

As a Principal Software Engineer in the Azure HPC/AI team, you will play a critical role in designing and delivering the next generations of our platform by solving technical problems at all levels of the stack, contributing to our codebases to enable new features on our VMs, working on architectural proposals, and collaborating with our industry partners.


Required Qualifications:

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, or Python
    • OR equivalent experience
  • 2+ years of experience in High Performance Computing (HPC) or Machine Learning

Other Requirements:

  • This role requires the ability to meet Microsoft, customer, and/or government security screening requirements. These requirements include, but are not limited to, the following specialized security screenings:
    • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

  • Master’s or PhD degree in computer science or related areas
  • Familiarity with Machine Learning and AI Infrastructure
  • Familiarity with Operating Systems fundamentals and virtualization technologies
  • Experience developing and/or debugging low level system software
  • Experience with Profiling and Performance Analysis Tools
  • Familiarity with Cloud Computing technologies
  • Experience in Distributed Systems
  • Experience in High Performance Computing / Machine Learning frameworks and middleware
  • Experience in Hardware-Software Co-Design
  • Familiarity with Hardware Accelerators

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:

Responsibilities:
  • Analyzes functionality, integration, and performance issues at various levels of the HW/SW stack on current and future generations of AI training platforms.
  • Designs and codes solutions that improve the functional correctness, stability, and performance of AI-training-oriented VM offerings and related services. When appropriate, drives internal partner teams or industry partners to implement such solutions.
  • Leads by example within the team by producing extensible and maintainable code. Optimizes, debugs, refactors, and reuses code to improve performance, maintainability, effectiveness, and return on investment (ROI). Applies metrics to drive the quality and stability of code, as well as appropriate coding patterns and best practices.
  • Holds accountability as a Designated Responsible Individual (DRI), working on-call to monitor systems, products, and services for degradation, downtime, or interruptions, and mentors other engineers across products and solutions.
  • Develops a playbook for the team to resolve issues.
  • Coordinates people and resources to ensure DRI responsibilities are covered across teams.
  • Maintains communication with key partners across the Microsoft ecosystem of engineers.
  • Acts as a key contact for leadership to ensure alignment with partners' expectations. Considers partner teams across organizations and their end goals for products in order to deliver desirable user experiences and meet the evolving needs of partners and customers through product development.
  • Your mission will be to help ensure that the Azure platform delivers consistent performance, scales on demand, and is engineered to withstand the unparalleled computing demands of customer workloads. You will help build a test-driven engineering culture that reduces regressions and bugs in production, and you will set a higher bar for infrastructure quality.
  • Embody our and .