
Microsoft Senior ML Research Engineer – LLM Quantization & Model Optimization 
Taiwan, Taoyuan City 
Job ID: 733180218

Posted: 17.07.2025

Required/Minimum Qualifications

  • Doctorate in a relevant field OR equivalent experience.
  • 4+ years of combined experience, including 2+ years of industry experience in low-precision model optimization and quantization for LLM workloads.

Other Qualifications

  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications

  • Experience publishing academic papers as a lead author or essential contributor.
  • Experience participating in a top conference in a relevant research domain.
  • Proven track record in developing production-scale software for model compression and performance optimization.
  • Proficient with deep learning frameworks and inference runtimes such as PyTorch, TensorFlow, TensorRT, and ONNX Runtime.
  • In-depth understanding of Transformer and LLM architecture, including model optimization techniques such as quantization, pruning, neural architecture search (NAS), knowledge distillation, sharding/parallelism, KV cache optimization, and FlashAttention (see the quantization sketch after this list).
  • Hands-on experience setting up large-scale evaluation frameworks for SOTA LLMs and fine-tuning large models.
  • Programming skills in Python, C, and C++.
  • Excellent communication skills and a team-oriented mindset.
  • Hands-on experience implementing and optimizing low-level linear algebra routines and custom BLAS kernels would be a plus.
  • Deep knowledge of mixed-precision arithmetic unit microarchitecture would be a plus.
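For illustration only, and not part of the posting's requirements: a minimal sketch of post-training dynamic quantization in PyTorch, one of the optimization techniques named above. The toy feed-forward block and layer sizes are assumptions chosen for brevity.

```python
# Minimal sketch: post-training dynamic int8 quantization of Linear layers
# in PyTorch. The toy model is an assumption standing in for a Transformer
# feed-forward block; it is not tied to any particular production codebase.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(512, 2048),
    nn.GELU(),
    nn.Linear(2048, 512),
)
model.eval()

# Weights of nn.Linear modules are quantized to int8 ahead of time;
# activations are quantized dynamically at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.no_grad():
    fp32_out = model(x)
    int8_out = quantized(x)

# Quick check of the accuracy impact on this toy input.
print("max abs error:", (fp32_out - int8_out).abs().max().item())
```

In production LLM work this would typically give way to weight-only or activation-aware schemes, but the workflow is the same: quantize, then compare against the full-precision baseline.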

Certain roles may be eligible for benefits and other compensation; additional benefits and pay information can be found here. Microsoft will accept applications for this role until July 18, 2025.


Responsibilities
  • Design and develop novel quantization techniques to enable efficient deployment of LLM inference and training in Microsoft’s Azure production environments.
  • Drive software development and proof-of-concept efforts for model optimization tooling to streamline deployment of quantized models.
  • Analyze performance bottlenecks in state-of-the-art LLM architectures and drive performance improvements.
  • Prototype and evaluate emerging low-precision data formats through proof-of-concept implementations (a minimal sketch follows this list).
  • Co-design model architecture optimized for low-precision deployment in close collaboration with companywide AI teams.
  • Work cross-functionally with data scientists and ML researchers/engineers to align on model accuracy and performance goals.
  • Partner with hardware architecture and AI software framework teams to ensure end-to-end system efficiency.
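As a rough illustration of the low-precision prototyping described above (an editorial sketch, not Microsoft tooling): symmetric per-tensor "fake" quantization can estimate the accuracy impact of a candidate format before any kernel or hardware support exists. The 4-bit width and tensor shape below are assumptions.

```python
# Minimal sketch: simulate a low-precision weight format with symmetric
# per-tensor "fake" quantization (round onto an integer grid, then
# dequantize) and measure the resulting error. Bit width and shapes are
# illustrative assumptions.
import torch

def fake_quantize_symmetric(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Round w onto a signed num_bits integer grid and map it back to float."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 7 for a 4-bit signed format
    scale = w.abs().max() / qmax              # single per-tensor scale
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale                          # dequantized approximation of w

w = torch.randn(4096, 4096)                   # stand-in for an LLM weight matrix
w_q = fake_quantize_symmetric(w, num_bits=4)

# Quantization error as a cheap proxy for downstream accuracy impact;
# a real evaluation would run the quantized model on held-out tasks.
print("mean abs error:", (w - w_q).abs().mean().item())
print("relative Frobenius error:", (torch.linalg.norm(w - w_q) / torch.linalg.norm(w)).item())
```

Per-channel or group-wise scales usually recover more accuracy at the same bit width; the per-tensor version keeps the sketch short.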