Required/Minimum Qualifications
- Master's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 9+ years technical engineering experience OR Bachelor's Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 11+ years technical engineering experience OR equivalent experience.
- 10+ years designing and operating large-scale L2/L3 Ethernet fabrics for HPC/AI or hyperscale services.
- 5+ years of experience with Ethernet, RDMA/RoCEv2, congestion control (ECN/PFC, DCQCN, HPCC, TIMELY), routing (BGP/ECMP, IS-IS/OSPF), and load balancing (CONGA/HULA/PLB).
- 5+ years of experience with switch/NIC architecture (ASIC pipelines, queueing/scheduling, buffers, telemetry, hash/ECMP behaviors) and optics (DR/FR/LR, PAM-4, FEC).
- 5+ years of experience with traffic generation and analysis (Ixia/Keysight, TRex, pktgen, iperf, perfetto), switch/NIC telemetry, and packet capture (INT, ERSPAN, SPAN, pcaps).
- 3+ years of experience managing engineers (hiring, mentoring, performance management, org health).
Other Requirements:
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Preferred Qualifications
- Experience optimizing networks for AI collectives (all-reduce, all-gather, expert routing) and distributed training systems.
- Familiarity with programmable data planes (P4, eBPF/XDP), in-network telemetry/compute, and NIC offloads (GRO/TSO/LRO, DPDK).
- Depth in buffer management and queue disciplines (DWRR, WFQ, Deficit Round Robin, QCN, VOQ) and QoS for multi-tenant clusters.
- Experience with optic/PHY roadmaps (800G/1.6T, linear pluggables, CPO/LPO, FEC trade-offs) and DC power/cooling constraints affecting network design.
- Contributions to standards bodies/consortia (drafts, presentations) and vendor co-development.
- Proven track record shipping production network designs with measurable latency/throughput improvements and reliability gains.
- Proficiency in Python/Go and automation frameworks (Ansible/Terraform) for test, measurement, and CI.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
Microsoft will accept applications for the role until October 24th, 2025.