Expoint - all jobs in one place

Microsoft System Engineer 
India, Karnataka, Bengaluru 
116493331

30.07.2024

Microsoft Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) is the team behind Microsoft’s expanding Cloud Infrastructure and is responsible for powering Microsoft’s “Intelligent Cloud” mission. SCHIE delivers the core infrastructure and foundational technologies for Microsoft's more than 200 online businesses, including Bing, MSN, Office 365, Xbox Live, Teams, OneDrive, and the Microsoft Azure platform, globally with our server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions. Our focus is on smart growth, high efficiency, and delivering a trusted experience to customers and partners worldwide, and we are looking for passionate, high-energy engineers to help achieve that mission.

AI & Advanced Systems Engineering (CAASE) is responsible for expanding Microsoft’s Cloud Infrastructure to enable Microsoft’s mission to empower every person and every organization on the planet to achieve more. The CAASE team is instrumental in delivering world-class, innovative hardware to ensure a high-quality experience for the millions of Microsoft Azure customers. #CAASE #AZURE #Cloud

The ability to deploy new offerings and hardware infrastructure on time, in high volume, with high quality and at the lowest cost is of paramount importance. To achieve this goal, the CAASE team is instrumental in defining and delivering operational measures of success for hardware manufacturing and in improving the planning process, quality, delivery, and sustainability related to Microsoft cloud hardware. We are looking for seasoned engineers with a dedicated passion for customer-focused solutions, and industry knowledge to envision and implement future technical solutions that will manage the Cloud infrastructure.

We are looking for a System Engineer to join the team.


Qualifications

Required Qualifications

  • B.Tech/MS in Electrical/Computer/Electronics Engineering or related degree
  • 7+ years of relevant experience in server systems/platforms design and/or validation for enterprise or cloud market segments, in compute and/or AI systems/platforms design and development.
  • 5+ years of hands-on experience in cloud-grade front-end and back-end network architecture and implementation.
  • Experience in post-silicon validation, platform bring-up, system integration, functional validation, and server platform validation.
  • Good grasp of Ethernet: Physical, Data Link, and Network layers, congestion control, QoS, and traffic classes.
  • Understanding of Clos networks, routing (BGP, ECMP), lossless networks, and congestion handling (DCQCN, PFC, CBFC).
  • Understanding of NPU architecture and its relation to network performance, such as bandwidth, RTT latencies, and packet size diversity.
  • Understanding of networking hardware: QSFP-DD cables, DACs, AECs, cable backplanes, NICs, PHYs, and switches.
  • Ability to define validation test cases to qualify the end-to-end network across functionality, performance, and scale testing.
  • Ability to troubleshoot network issues at multiple layers: Physical, Data Link, Network, and Protocol.
  • Great to have: understanding of AI networks, network collectives, traffic profiles in AI networks, and Ultra Ethernet.
  • Experience in platform-level test architecture and usage of debug tools (Lauterbach, Arium, ARM JTAG tools, debug emulators, or equivalent).
  • Experience in debugging complex system-level issues, with the ability to root-cause them and identify potential fixes down to board hardware, signal integrity, CPLD/FPGA, thermal, firmware, and OS components, is required.
  • Programming Skills: Perl / Python / Shell Scripting.
  • Communication skills (verbal and written) to interface with cross-functional technical teams within and/or outside the organization.

Preferred Qualifications

  • Experience in evaluating off-the-shelf OEM hardware designs, HW/FW/OS interactions, platform config trade-offs, performance tuning, and optimizations is required.
  • Knowledge of high-volume silicon (SoCs, GPUs, or FPGAs), compute, storage, manufacturing, and deployment.
  • In-depth experience with operating systems (Windows and/or Linux), system firmware (BIOS, BMC), and system security (hardware and software).
  • Functional knowledge of secure boot, attestation, FW update & recovery on server platform architectures.
  • Advanced troubleshooting and debugging skills. Familiar with networking, power, rack device management, and remote access environments.
  • Experience in debugging complex system-level issues, with the ability to root-cause them and identify potential fixes down to board hardware, signal integrity, CPLD, thermal, firmware, and OS components, is required.
  • Understanding of AI/ML workloads and how to validate software stacks, such as TensorFlow.
  • Strong verbal and written communication and presentation skills.

Responsibilities

  • State-of-the-art AI, compute, storage, networking, and accelerator hardware solutions.
  • Plan and lead system debug activities. Work with cross-organization teams in defining pre-silicon platform bring-up, test, and validation execution. Own and drive platform bring-up with the SoC, as well as test and validation plans and execution.
  • Be able to lead cross-functional/cross-org work groups, driving innovative solutions and solving complex problems.
  • Analyze new interfaces and subsystems to develop integration plans, analyze power efficiency, debug integration issues, and provide recommendations.
  • Define system behavior and concept of operations for the platform to ensure compatibility with Microsoft Azure datacenter software, serviceability, telemetry, and customer expectations.
  • Perform NUDD (new, unique, different, and difficult) technology and feature analysis and provide risk assessment and mitigations.
  • Drive technical requirements and ensure the solution is flexible and scalable across the full (HW/FW/SW) stack.
  • Enable platform and solution level discussions, influencing architecture of the product, and delivering to product goals across quality, reliability, and performance.
  • Collaborate with internal, external, and open-source partners to onboard innovative technologies in a seamless manner.