Expoint - all jobs in one place

Microsoft Software Engineer 
India, Karnataka, Bengaluru 
437237275

17.09.2024

AI Platform. You will partner with top engineering talent within Azure and across Azure to work on cluster orchestration, job scheduling, storage, networking, and operating system integration. Your work will enable various AI languages and run-times on Azure to bring distributed deep learning training and inferencing to life. In addition, you will build infrastructure components to build, deploy, and service highly available and scalable Microsoft Service Fabric and Kubernetes clusters under your care. You will lead development and customer support from the frontline and architecture, service excellence guidelines and a high-quality bar

a track record for delivering engineering and service excellence on a mid-to-large scale service

We believe that building a planet-scale AI supercomputer from the ground up, which addresses the fundamental pain points of data scientists and AI practitioners and takes AI to unprecedented scale, is

What Is Azure

is a globally distributed, multi-tenant service that provides robust and competitive AI infrastructure (and storage) for AI training and inferencing. By abstracting workloads from underlying infrastructure, Azure creates a shared pool of resources that can be dynamically provisioned for full of expensive GPU, and enabling data scientists to productively build, scale, experiment, and iterate their models on top of a robust, performant, scalable and cost-effective distributed infrastructure built for AI. In Azure to apply the best ideas from AI, ML, distributed systems, distributed databases, machine learning, information retrieval, networking, and security.


Required and Preferred -

  • + years of experience coding in one of C#, C, C++, Rust, or Go
  • Experience working with the Linux operating system and Kubernetes cluster orchestration
  • Experience with improving service operations or engineering fundamentals
  • Excellent collaboration skills
  • A master’s or bachelor’s degree in computer science or a related field
  • At least 3 years of experience building and shipping production software or services

Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Responsibilities:

  • Deliver a robust container orchestration platform for Azure AI Infrastructure
  • Design and build the scheduling sub-system that is responsible for delivering on the SLAs for AI training and inferencing workloads
  • Design and build storage and caching systems for efficient DNN training and inferencing
  • Design and build control plane APIs for creation and management of training jobs and inference model metadata
  • Deliver node management, fault detection and node repair as a service to improve job/model reliability
  • Deliver world-class monitoring systems and telemetry pipelines to enhance service and job observability for both end-users and operators
  • Codify security and compliance requirements by building and strengthening system defenses against malicious attacks and exploits
  • Leverage performance and profiling tools to identify hot spots and bottlenecks across hardware and software boundaries (CPU, GPU, microcode, OS, and networking code) and drive end-to-end job performance