Required/Minimum Qualifications
- Doctorate in a relevant field OR equivalent experience.
- 4+ years of combined experience, including 2+ years of industry experience in low-precision model optimization and quantization for LLM workloads.
Other Qualifications
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Preferred Qualifications
- Experience publishing academic papers as a lead author or essential contributor.
- Experience participating in top conferences in a relevant research domain.
- Proven track record of developing production-scale software for model compression and performance optimization.
- Proficiency with deep learning frameworks and inference runtimes such as PyTorch, TensorFlow, TensorRT, and ONNX Runtime.
- In-depth understanding of Transformer and LLM architectures, including model optimization techniques such as quantization, pruning, neural architecture search (NAS), knowledge distillation, sharding/parallelism, KV cache optimization, and FlashAttention.
- Hands-on experience setting up large-scale evaluation frameworks for SOTA LLMs and fine-tuning large models.
- Programming skills in Python, C, and C++.
- Excellent communication skills and a team-oriented mindset.
- Hands-on experience implementing and optimizing low-level linear algebra routines and custom BLAS kernels would be a plus.
- Deep knowledge of mixed-precision arithmetic unit microarchitecture would be a plus.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
Microsoft will accept applications for the role until July 18, 2025.