Expoint - all jobs in one place

Microsoft Senior Software Engineer - High Performance Computing 
United States, Washington 
892315648

17.07.2024

You will play a critical role in delivering and maintaining the infrastructure for our cloud supercomputers and enabling the revolution of AI. You will be responsible for independently owning the delivery and burn-in of clusters into Azure, ensuring that the hardware is stable for customers to run their applications. This will involve working closely with hardware vendors and other teams to ensure that the clusters are properly configured and optimized for performance across CPU (Central Processing Unit), accelerators, and network infrastructure, as well as tracking progress during all stages of the process.

In addition, you will be responsible for automating the quality process and debugging issues as they arise, ensuring successful resolution. This will involve developing and maintaining tools and processes to automate testing and ensure that quality is built into every step of the development process. You will also work closely with other teams to diagnose and resolve issues, to ensure that our customers have a seamless experience using our cloud supercomputers, and to act as the voice of the customer in representing their issues.

Your attention to detail will be critical in this role: you will be responsible for ensuring that quality is always front and center, and you should have the desire to identify and isolate potential issues in the early phases of the project. This will involve reviewing system-level specifications, code, and configurations, and working with other teams to identify and address any issues that arise. You will also be responsible for documenting processes and procedures, and for ensuring that our team follows the industry’s best practices and standards for software development and deployment.


Qualifications

Required Qualifications:

  • Bachelor's Degree in Computer Science, or related technical discipline AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Shell, PowerShell or Python
    • OR equivalent experience.
  • 4+ years of experience with automation and with debugging complex issues across host/network/performance.

Other Requirements:

  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

  • Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Shell, PowerShell or Python
    • OR Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Shell, PowerShell or Python
    • OR equivalent experience.
  • 8+ years of experience with automation and with debugging complex issues at cloud scale, including host/network/performance
  • 8+ years of experience in large scale and distributed system automation/execution frameworks

Certain roles may be eligible for benefits and other compensation.

Responsibilities

  • Collaborates with appropriate stakeholders to determine user requirements for a scenario.
  • Drives identification of dependencies and the development of design documents for a product, application, service, or platform.
  • Creates, implements, optimizes, debugs, refactors, and reuses code to establish and improve performance and maintainability, effectiveness, and return on investment (ROI).
  • Leverages subject-matter expertise of product features and partners with appropriate stakeholders (e.g., project managers) to drive a workgroup's project plans, release plans, and work items.
  • Acts as a Designated Responsible Individual (DRI) and guides other engineers by developing and following the playbook, working on call to monitor the system/product/service for degradation, downtime, or interruptions, alerting stakeholders about status, and initiating actions to restore the system/product/service for simple and complex problems when appropriate.
  • Proactively seeks new knowledge and adapts to new trends, technical solutions, and patterns that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale.