Expoint - all jobs in one place

Microsoft Principal Software Engineer 
India, Karnataka, Bengaluru 
522463057

11.12.2024

The AI Infrastructure team is looking for passionate engineers to build the largest deep-learning infrastructure service at Microsoft. In this role you will build new components that bring the latest innovations in AI infrastructure onto the Azure ML platform. You will partner with top engineering talent within Azure AI Infrastructure and across Azure to work on cluster orchestration, job scheduling, storage, networking, and operating system integration. Your work will enable various AI languages and runtimes on Azure AI Infrastructure, bringing distributed deep learning training and inferencing to life. In addition, you will build infrastructure components to build, deploy, and service the highly available and scalable Microsoft Service Fabric and Kubernetes clusters under your care. You will lead development and customer support from the front line, along with architecture, service excellence guidelines, and a high quality bar.

The ideal candidate has a track record of delivering engineering and service excellence on a mid-to-large scale service.

AI Infrastructure. We believe that building a planet-scale AI supercomputer from the ground up, one that addresses the fundamental pain points of data scientists and AI practitioners and takes AI to unprecedented scale, is

What Is Azure AI Infrastructure

AI Infrastructure is a globally distributed, multi-tenant service that provides robust and competitive AI infrastructure (and storage) for AI training and inferencing. By abstracting workloads from the underlying infrastructure, Azure AI Infrastructure creates a shared pool of resources that can be dynamically provisioned for full utilization of expensive GPUs, enabling data scientists to productively build, scale, experiment with, and iterate on their models on top of a robust, performant, scalable, and cost-effective distributed infrastructure built for AI. In Azure AI Infrastructure, you will apply the best ideas from AI, ML, distributed systems, distributed databases, information retrieval, networking, and security.


Qualifications
  • 10+ years of experience coding in one of C#, C, C++, Rust, or Go
  • Experience working with the Linux operating system and Kubernetes cluster orchestration
  • Experience with improving service operations or engineering fundamentals
  • Excellent collaboration skills
  • A Master's or Bachelor's degree in computer science or a related field
  • At least 8 years of experience building and shipping production software or services

The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position requires passing the Microsoft Cloud background check upon hire or transfer and every two years thereafter.

Responsibilities
  • Work on the architecture, design, and development of the core AI Infrastructure services that support large scale AI training and inferencing
  • Develop, test, and maintain backend services written in C#, Go, Rust, and C++, hosted on Kubernetes/Service Fabric clusters and Docker containers
  • Enhance systems and applications to ensure high stability, efficiency, and maintainability, low latency, and tight cloud security
  • Provide operational support and DRI responsibilities for the product
  • Develop and foster a deep understanding of the machine learning systems and concepts and their usage by our customers
  • Collaborate closely with engineers and data scientists within the team, internal Microsoft Research teams, and external enterprises to build better solutions together
  • Provide vision, expertise, and technical leadership to other team members
  • Help to grow talent in these areas