
Microsoft Principal Software Engineering Manager - GPU Inference Optimization
Taiwan, Taoyuan City 
149860283


The Search Ads R&D team builds an online advertising ecosystem that connects users, advertisers, and the search engine.
This is a lead role focused on GPU inference optimization of large and small language models: it requires hands-on software development skills and experience, along with the ability to lead the team's efforts by applying the model-coach-care practices. We're looking for someone with a demonstrated history of solving hard technical problems who is motivated to tackle the hardest problems in building a full end-to-end AI stack. An entrepreneurial approach and the ability to take initiative and move fast are essential.


Qualifications

• Bachelor's degree in computer science or a related technical field AND 5+ years of technical engineering experience coding in languages including, but not limited to, C/C++, CUDA, or ROCm, OR equivalent experience

• Practical experience writing new GPU kernels, going beyond running GPU workloads with existing library kernels

• Quick learning, good communication skills (fluent in English), and solid problem-solving skills

• Cross-team collaboration skills and the desire to work in a team of researchers and developers

• Experience in low-level performance analysis and optimization, including proficiency with GPU profiling tools such as NVIDIA Visual Profiler and NVIDIA Nsight Compute, is a plus

• Familiarity with LLM inference optimization; experience developing a popular inference framework such as TensorRT-LLM, SGLang, or vLLM is a plus

Responsibilities
• Lead software development in C/C++, Python, and GPU languages such as CUDA, ROCm, or Triton.
• Analyze metrics, identify opportunities based on offline and online testing, and develop and deliver robust and scalable solutions.
• Work with cutting-edge hardware stacks and a fast-moving software stack to deliver best-in-class inference at optimal cost.
• Engage with key partners to understand and implement inference and training optimization for state-of-the-art LLMs and other models.